TY - JOUR
T1 - hc-OTU
T2 - A Fast and Accurate Method for Clustering Operational Taxonomic Units Based on Homopolymer Compaction
AU - Park, Seunghyun
AU - Choi, Hyun Soo
AU - Lee, Byunghan
AU - Chun, Jongsik
AU - Won, Joong Ho
AU - Yoon, Sungroh
N1 - Publisher Copyright: © 2004-2012 IEEE.
PY - 2018/3/1
Y1 - 2018/3/1
N2 - To assess the genetic diversity of an environmental sample in metagenomics studies, the amplicon sequences of 16S rRNA genes need to be clustered into operational taxonomic units (OTUs). Many existing tools for OTU clustering trade off accuracy against computational efficiency. We propose a novel OTU clustering algorithm, hc-OTU, which achieves high accuracy and fast runtime by exploiting homopolymer compaction and k-mer profiling to significantly reduce the time needed to compute pairwise distances between amplicon sequences. We comprehensively compare the proposed method with other widely used methods, including UCLUST, CD-HIT, MOTHUR, ESPRIT, ESPRIT-TREE, and CLUSTOM, using nine different experimental datasets and many evaluation metrics, such as normalized mutual information, adjusted Rand index, measure of concordance, and F-score. Our evaluation reveals that the proposed method achieves accuracy comparable to that of MOTHUR and ESPRIT-TREE, two widely used OTU clustering methods, while delivering orders-of-magnitude speedups.
AB - To assess the genetic diversity of an environmental sample in metagenomics studies, the amplicon sequences of 16S rRNA genes need to be clustered into operational taxonomic units (OTUs). Many existing tools for OTU clustering trade off accuracy against computational efficiency. We propose a novel OTU clustering algorithm, hc-OTU, which achieves high accuracy and fast runtime by exploiting homopolymer compaction and k-mer profiling to significantly reduce the time needed to compute pairwise distances between amplicon sequences. We comprehensively compare the proposed method with other widely used methods, including UCLUST, CD-HIT, MOTHUR, ESPRIT, ESPRIT-TREE, and CLUSTOM, using nine different experimental datasets and many evaluation metrics, such as normalized mutual information, adjusted Rand index, measure of concordance, and F-score. Our evaluation reveals that the proposed method achieves accuracy comparable to that of MOTHUR and ESPRIT-TREE, two widely used OTU clustering methods, while delivering orders-of-magnitude speedups.
KW - 16S rRNA
KW - Clustering algorithm
KW - metagenomics
KW - operational taxonomic unit (OTU)
KW - pyrosequencing
UR - https://www.scopus.com/pages/publications/85044938163
U2 - 10.1109/TCBB.2016.2535326
DO - 10.1109/TCBB.2016.2535326
M3 - Article
C2 - 26930691
AN - SCOPUS:85044938163
SN - 1545-5963
VL - 15
SP - 441
EP - 451
JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics
JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics
IS - 2
ER -