TY - JOUR
T1 - LncRNAnet
T2 - Long non-coding RNA identification using deep learning
AU - Baek, Junghwan
AU - Lee, Byunghan
AU - Kwon, Sunyoung
AU - Yoon, Sungroh
N1 - Publisher Copyright: © The Author(s) 2018. Published by Oxford University Press. All rights reserved.
PY - 2018/11/15
Y1 - 2018/11/15
N2 - Motivation: Long non-coding RNAs (lncRNAs) are important regulatory elements in biological processes. LncRNAs share similar sequence characteristics with messenger RNAs, but they play completely different roles, thus providing novel insights for biological studies. The development of next-generation sequencing has helped in the discovery of lncRNA transcripts. However, the experimental verification of numerous transcriptomes is time consuming and costly. To alleviate these issues, a computational approach is needed to distinguish lncRNAs from the transcriptomes. Results: We present a deep learning-based approach, lncRNAnet, to identify lncRNAs that incorporates recurrent neural networks for RNA sequence modeling and convolutional neural networks for detecting stop codons to obtain an open reading frame indicator. lncRNAnet performed clearly better than the other tools for sequences of short lengths, on which most lncRNAs are distributed. In addition, lncRNAnet successfully learned features and showed 7.83%, 5.76%, 5.30% and 3.78% improvements over the alternatives on a human test set in terms of specificity, accuracy, F1-score and area under the curve, respectively. Availability and implementation: Data and codes are available in http://data.snu.ac.kr/pub/lncRNAnet.
AB - Motivation: Long non-coding RNAs (lncRNAs) are important regulatory elements in biological processes. LncRNAs share similar sequence characteristics with messenger RNAs, but they play completely different roles, thus providing novel insights for biological studies. The development of next-generation sequencing has helped in the discovery of lncRNA transcripts. However, the experimental verification of numerous transcriptomes is time consuming and costly. To alleviate these issues, a computational approach is needed to distinguish lncRNAs from the transcriptomes. Results: We present a deep learning-based approach, lncRNAnet, to identify lncRNAs that incorporates recurrent neural networks for RNA sequence modeling and convolutional neural networks for detecting stop codons to obtain an open reading frame indicator. lncRNAnet performed clearly better than the other tools for sequences of short lengths, on which most lncRNAs are distributed. In addition, lncRNAnet successfully learned features and showed 7.83%, 5.76%, 5.30% and 3.78% improvements over the alternatives on a human test set in terms of specificity, accuracy, F1-score and area under the curve, respectively. Availability and implementation: Data and codes are available in http://data.snu.ac.kr/pub/lncRNAnet.
UR - https://www.scopus.com/pages/publications/85054534141
U2 - 10.1093/bioinformatics/bty418
DO - 10.1093/bioinformatics/bty418
M3 - Article
C2 - 29850775
AN - SCOPUS:85054534141
SN - 1367-4803
VL - 34
SP - 3889
EP - 3897
JO - Bioinformatics
JF - Bioinformatics
IS - 22
ER -