A Bayesian Model Based Computational Analysis Of The Relationship Between Bisulfite Accessible Single-Stranded DNA In Chromatin And Somatic Hypermutation Of Immunoglobulin Genes

PLOS Computational Biology (2021)

Abstract
Author summary: To make protective antibodies against various pathogens, the enzyme activation-induced deaminase (AID) introduces mutations into single-stranded DNA (ssDNA) in the variable region of immunoglobulin genes (IGHVs) in B cells, as part of a process called somatic hypermutation (SHM). Here, a bisulfite assay, together with deep sequencing, was used to characterize the accessible ssDNA that represents the substrate of AID in B cells. To deal with issues such as noise in the data, we developed a novel algorithm to more accurately identify bisulfite accessible ssDNA regions (BARs) and applied it to the IGHV4-34 immunoglobulin gene in a human B cell line. Using the new algorithm, we found that the locations of these BARs recurred in certain subregions of the IGHV4-34 gene. The average size of the BARs is approximately 15 bp, which is close to the size of a transcription bubble. We also found that some potential G-quadruplex DNA structures in the IGHV4-34 gene co-located with the BARs but on the opposite DNA strand. Furthermore, we found that most of the AID-induced mutations were near to, but not within, BARs, suggesting alternative mechanisms for targeting somatic hypermutation.

The B cells in our body generate protective antibodies by introducing somatic hypermutations (SHM) into the variable region of immunoglobulin genes (IgVs). The mutations are generated by activation-induced deaminase (AID), which converts cytosine to uracil in single-stranded DNA (ssDNA) generated during transcription. Attempts have been made to correlate SHM with ssDNA using bisulfite to chemically convert cytosines that are accessible in the intact chromatin of mutating B cells. These studies have been complicated by the use of different definitions of "bisulfite accessible regions" (BARs). Recently, deep sequencing has provided much larger datasets of such regions, but computational methods are needed to enable this analysis.
Here we leveraged the deep-sequencing approach with unique molecular identifiers and developed a novel Hidden Markov Model-based Bayesian segmentation algorithm to characterize the ssDNA regions in the IGHV4-34 gene of the human Ramos B cell line. Combining hierarchical clustering and our new Bayesian model, we identified recurrent BARs in certain subregions of both the top and bottom strands of this gene. With this new system, we found that the average size of BARs is about 15 bp. We also identified potential G-quadruplex DNA structures in this gene and found that the BARs co-locate with G-quadruplex structures in the opposite strand. Various correlation analyses showed no direct site-to-site relationship between the bisulfite accessible ssDNA and all sites of SHM, but most of the highly AID-mutated sites are within 15 bp of a BAR. In summary, we developed a novel platform to study single-stranded DNA in chromatin at base-pair resolution that reveals potential relationships among BARs, SHM and G-quadruplexes. This platform could be applied to genome-wide studies in the future.
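
The abstract does not specify the segmentation model, so the following is only a minimal sketch of the general idea behind HMM-based segmentation of bisulfite data, not the authors' algorithm. It assumes a two-state model (protected vs. accessible) over the cytosines of a single sequenced molecule, where each observation is 1 if the cytosine was converted and 0 otherwise. The function names, the emission probabilities `p_conv`, and the self-transition probabilities `p_stay` are all hypothetical values chosen for illustration.

```python
import math


def viterbi_two_state(obs, p_conv=(0.05, 0.9), p_stay=(0.9, 0.9), p_start=(0.5, 0.5)):
    """Return the most likely state path for a list of 0/1 observations.

    State 0 = protected (conversion is rare), state 1 = accessible
    (conversion is common). All probabilities are illustrative assumptions.
    """
    if not obs:
        return []

    def emit(state, o):
        # Log-probability of seeing a converted (1) or unconverted (0) cytosine.
        p = p_conv[state]
        return math.log(p if o == 1 else 1.0 - p)

    def trans(a, b):
        # Log-probability of moving from state a to state b.
        return math.log(p_stay[a] if a == b else 1.0 - p_stay[a])

    score = [math.log(p_start[s]) + emit(s, obs[0]) for s in range(2)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        new_score, pointers = [], []
        for s in range(2):
            cands = [score[a] + trans(a, s) for a in range(2)]
            best = 0 if cands[0] >= cands[1] else 1
            new_score.append(cands[best] + emit(s, o))
            pointers.append(best)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def accessible_runs(path):
    """Collapse a state path into (first_index, last_index) runs of state 1."""
    runs, start = [], None
    for i, s in enumerate(path):
        if s == 1 and start is None:
            start = i
        elif s != 1 and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(path) - 1))
    return runs


if __name__ == "__main__":
    molecule = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0]
    print(accessible_runs(viterbi_two_state(molecule)))
```

Note that the indices here count cytosines along the molecule, not base pairs; mapping a run back to genomic coordinates would need the position of each cytosine. With these assumed parameters, a single isolated conversion is not enough to open a region, which is the kind of noise tolerance a segmentation model offers over a fixed rule such as "two or more consecutive converted cytosines".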