Quantitative sequence basis for the E. coli transcriptional regulatory network

bioRxiv (Cold Spring Harbor Laboratory) (2022)

Abstract
The transcriptional regulatory network (TRN) of E. coli consists of thousands of interactions between regulators and DNA sequences. Inherently, the DNA sequence is the primary determinant of the TRN; however, it is well established that the presence of a DNA binding motif does not guarantee a functional regulatory protein binding site. Thus, the extent to which the TRN architecture can be predicted from the genome DNA sequence alone remains unclear. Here, we developed machine learning (ML) models that predict the TRN structure of E. coli based on genome sequence. Models were constructed successfully (cross-validation AUROC >= 0.8) for 84% (57/68) of valid E. coli regulons identified from top-down analysis of RNA-seq data. We found that: 1) while regulatory motif strength is the most important sequence feature for determining regulon membership, additional features such as DNA shape substantially influence membership; 2) complex regulons involving multiple interacting regulators could be unraveled by machine learning; 3) investigating regulons where initial ML models failed revealed new regulator-specific sequence features that improved model accuracy. Finally, while regulon structure can appear to vary across estimation methods and strains, we found that strong regulatory sequence features underlie both the genes that appear most consistently in regulons across estimation methods and the core regulon genes in the Fur pan-regulon. This work develops a quantitative understanding of the sequence basis of the TRN and suggests a path towards computationally guided control of transcriptional regulation for synthetic biology applications.
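The success criterion quoted in the abstract (cross-validation AUROC >= 0.8) can be illustrated with a minimal sketch. It assumes that per-fold AUROC values are averaged before comparison with the threshold, which the abstract does not specify; the function names and the toy held-out predictions are hypothetical and are not taken from the paper.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random
    negative, with ties counted as 0.5 (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_cv_auroc(folds):
    """Average AUROC over held-out folds; each fold is a (labels, scores) pair.
    Averaging across folds is an assumption made for this sketch."""
    values = [auroc(labels, scores) for labels, scores in folds]
    return sum(values) / len(values)


def passes_threshold(folds, threshold=0.8):
    """True if the mean cross-validation AUROC meets the threshold."""
    return mean_cv_auroc(folds) >= threshold


# Hypothetical held-out predictions for one regulon:
# label 1 = gene is a regulon member, 0 = gene is not a member.
toy_folds = [
    ([1, 1, 0, 0, 0], [0.9, 0.6, 0.7, 0.2, 0.1]),
    ([1, 0, 0, 1, 0], [0.8, 0.3, 0.4, 0.75, 0.1]),
]

if __name__ == "__main__":
    print(round(mean_cv_auroc(toy_folds), 3))
    print(passes_threshold(toy_folds))
```

Under this reading, a regulon counts as successfully modeled when its held-out AUROC reaches 0.8, and the reported 57 of 68 regulons corresponds to 57/68 ≈ 0.84.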
Keywords
quantitative sequence basis