Towards Inferring Nanopore Sequencing Ionic Currents from Nucleotide Chemical Structures
bioRxiv (Cold Spring Harbor Laboratory) (2020)
Abstract
The characteristic ionic currents of nucleotide kmers are commonly used to analyze nanopore sequencing readouts. We present a graph convolutional network (GCN)-based deep learning framework that predicts the characteristic ionic currents of kmers from their chemical structures. We show that such a framework can generalize the chemical information of the 5-methyl group from thymine to cytosine, as demonstrated by correctly predicting the characteristic ionic currents of 5-methylcytosine-containing DNA 6mers. This sheds light on the de novo detection of nucleotide modifications.
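The abstract does not spell out the architecture, so the following is only a minimal NumPy sketch of the general technique it names: graph convolutions over a molecule's atom graph (nodes are non-hydrogen atoms), followed by pooling into one scalar that stands in for a predicted characteristic current. The layer uses the symmetric normalization popularized by Kipf and Welling. The function names, the toy 4-atom chain, the random atom features, and the untrained random weights are all hypothetical illustrations, not the authors' framework.

```python
import numpy as np

def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of an atom adjacency matrix."""
    a_hat = adj + np.eye(adj.shape[0])        # add a self-loop to every atom
    deg = a_hat.sum(axis=1)                    # degree of each atom, self-loop included
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    return d_inv_sqrt @ a_hat @ d_inv_sqrt

def gcn_layer(a_norm: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph convolution: average neighbour features, project with w, apply ReLU."""
    return np.maximum(a_norm @ h @ w, 0.0)

def predict_current(adj, features, weights, readout) -> float:
    """Hypothetical pipeline: stacked GCN layers -> mean pooling -> linear readout."""
    a_norm = normalized_adjacency(adj)
    h = features
    for w in weights:
        h = gcn_layer(a_norm, h, w)
    graph_embedding = h.mean(axis=0)           # pool per-atom embeddings into one vector
    return float(graph_embedding @ readout)    # scalar stand-in for the kmer's current

# Usage with a toy, hand-built heavy-atom graph (hypothetical; untrained weights).
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)    # 4 atoms bonded in a chain
features = rng.normal(size=(4, 8))             # hypothetical 8-dim atom features
weights = [rng.normal(size=(8, 16)), rng.normal(size=(16, 16))]
readout = rng.normal(size=16)
print(predict_current(adj, features, weights, readout))
```

In practice the weights would be learned by regressing against a measured kmer model; mean pooling is one simple choice that makes the output independent of atom ordering.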
### Competing Interest Statement
The authors have declared no competing interest.
Kmer
: a DNA or RNA sequence of length k.
Canonical kmer
: a kmer composed purely of unmodified nucleotides, i.e., {A, T, G, C} for DNA and {A, U, G, C} for RNA.
Characteristic ionic current
: the ionic current produced by a given kmer is usually modeled as a Gaussian distribution; the mean of this distribution is referred to as the characteristic ionic current.
Kmer model
: a table listing kmers and their corresponding characteristic ionic currents in nanopore sequencing. To avoid confusion, the deep learning model is referred to as the “framework” throughout the paper.
Framework
: in this paper, “framework” specifically refers to the deep learning model that predicts characteristic ionic currents from kmer chemical structures.
GCN
: Graph Convolutional Network.
CNN
: Convolutional Neural Network.
NN
: Neural Network.
RMSE
: Root Mean Square Error.
R
: Pearson correlation coefficient.
BA
: Balanced accuracy.
5mC
: 5-methylcytosine.
6mA
: N6-methyladenine.
I
: Inosine.
SMILES
: Simplified Molecular-Input Line-Entry System, a notation that encodes chemical structures as character strings.
Atom
: specifically refers to non-hydrogen atoms throughout the paper.
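The definitions of kmer, canonical kmer, characteristic ionic current, and kmer model fit together as in the short sketch below. It assumes raw current samples have already been grouped by kmer, fits one Gaussian per kmer (keeping its mean as the characteristic ionic current), and looks up the expected current of each overlapping kmer along a sequence. The function names and the picoampere sample values are hypothetical and purely illustrative.

```python
from itertools import product
from statistics import mean, stdev

DNA = "ACGT"  # canonical DNA nucleotides; use "ACGU" for RNA

def canonical_kmers(k: int, alphabet: str = DNA) -> list[str]:
    """Enumerate every kmer built only from unmodified nucleotides (len(alphabet)**k)."""
    return ["".join(p) for p in product(alphabet, repeat=k)]

def fit_kmer_model(events: dict[str, list[float]]) -> dict[str, tuple[float, float]]:
    """Fit a Gaussian (mean, std) per kmer; the mean is the characteristic current."""
    # stdev needs at least two samples, so sparsely observed kmers are skipped
    return {kmer: (mean(x), stdev(x)) for kmer, x in events.items() if len(x) >= 2}

def expected_signal(seq: str, model: dict[str, tuple[float, float]], k: int) -> list[float]:
    """Slide a length-k window along seq and look up each kmer's characteristic current."""
    return [model[seq[i:i + k]][0] for i in range(len(seq) - k + 1)]

print(len(canonical_kmers(6)))  # 4096 canonical DNA 6mers

# Hypothetical current samples (pA) for two 3mers, for illustration only.
events = {"ACG": [80.1, 81.3, 79.6], "CGT": [95.0, 94.2, 96.1]}
model = fit_kmer_model(events)
print(expected_signal("ACGT", model, k=3))
```

The canonical 4^k table is what a modification such as 5mC falls outside of, which is why predicting currents from chemical structure rather than from a fixed lookup table matters.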
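For reference, the evaluation metrics abbreviated above (RMSE, Pearson R, and BA) have standard textbook definitions, sketched below in plain Python. This is not code from the paper; balanced accuracy is computed here as the mean of per-class recall, which is its usual definition.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def pearson_r(x, y):
    """Pearson correlation coefficient: covariance divided by the product of spreads."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, which is robust to imbalanced class sizes."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

print(rmse([1, 2, 3], [1, 2, 5]))
print(pearson_r([1, 2, 3], [2, 4, 6]))
print(balanced_accuracy([1, 1, 1, 0], [1, 1, 0, 0]))
```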