TY - GEN
T1 - Rapid and robust denoising of pyrosequenced amplicons for metagenomics
AU - Lee, Byunghan
AU - Park, Joonhong
AU - Yoon, Sungroh
PY - 2012
Y1 - 2012
N2 - Metagenomic sequencing has become a crucial tool for obtaining a gene catalogue of operational taxonomic units (OTUs) in a microbial community. High-throughput pyrosequencing is a next-generation sequencing technique that is very popular in microbial community analysis because its read lengths are longer than those of alternative methods. Computational tools are indispensable for processing raw data from pyrosequencers; in particular, noise removal is a critical data-mining step for obtaining robust sequence reads. However, the slow speed of existing denoisers has become a bottleneck in the whole pyrosequencing process, not to mention hindering efforts to improve robustness. To address these issues, we propose a new approach that can substantially accelerate the denoising process. With our approach, denoising 62,873 pyrosequenced amplicons from a mixture of 91 full-length 16S rRNA clones takes only about 2 hours, compared with nearly 2.5 days using existing software tools. Furthermore, our approach can effectively reduce overestimation of the number of OTUs, producing on average 6.7 times fewer species-level OTUs than a state-of-the-art alternative under the same conditions. We hope that, leveraging our approach, metagenomic sequencing will become an even more appealing tool for microbial community analysis.
AB - Metagenomic sequencing has become a crucial tool for obtaining a gene catalogue of operational taxonomic units (OTUs) in a microbial community. High-throughput pyrosequencing is a next-generation sequencing technique that is very popular in microbial community analysis because its read lengths are longer than those of alternative methods. Computational tools are indispensable for processing raw data from pyrosequencers; in particular, noise removal is a critical data-mining step for obtaining robust sequence reads. However, the slow speed of existing denoisers has become a bottleneck in the whole pyrosequencing process, not to mention hindering efforts to improve robustness. To address these issues, we propose a new approach that can substantially accelerate the denoising process. With our approach, denoising 62,873 pyrosequenced amplicons from a mixture of 91 full-length 16S rRNA clones takes only about 2 hours, compared with nearly 2.5 days using existing software tools. Furthermore, our approach can effectively reduce overestimation of the number of OTUs, producing on average 6.7 times fewer species-level OTUs than a state-of-the-art alternative under the same conditions. We hope that, leveraging our approach, metagenomic sequencing will become an even more appealing tool for microbial community analysis.
KW - Amplicons
KW - Biomedical informatics
KW - Cluster analysis
KW - GPU
KW - Metagenomics
KW - Pyrosequencing
UR - https://www.scopus.com/pages/publications/84874036155
U2 - 10.1109/ICDM.2012.68
DO - 10.1109/ICDM.2012.68
M3 - Conference contribution
AN - SCOPUS:84874036155
SN - 9780769549057
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 954
EP - 959
BT - Proceedings - 12th IEEE International Conference on Data Mining, ICDM 2012
T2 - 12th IEEE International Conference on Data Mining, ICDM 2012
Y2 - 10 December 2012 through 13 December 2012
ER -