Additional file 1: Supplemental material for discovering and mapping chromatin states using a tree hidden Markov model

semanticscholar(2013)

Abstract
Supplemental material

1 Supplementary Text

1.1 Data processing for ENCODE dataset

We preprocessed the datasets by dividing the genome into 200-bp non-overlapping bins and then binarizing the read counts within each bin, similar to [1]. For binarization, we assigned a value of 1 if the total number of reads within the bin was above the threshold corresponding to a p-value of 10 under a Poisson model, where the Poisson rate λ is the number of reads in all replicates of an experiment divided by the length of the genome.

To reduce computational cost, we segmented the genome into regions with and without chromatin marks and used only the regions with sufficient reads. To do this, binned read counts across all species and all marks were summed into a single track and smoothed by convolution with a 1-D Gaussian kernel with σ = 40 kb. Only regions with at least 0.5 smoothed reads across at least 5 kb were retained as having sufficient signal to include in training. In total, these segments covered 54.8% of the genome, and inference proceeded on each segment in parallel.
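The two preprocessing steps described above (Poisson binarization and selection of regions with sufficient signal) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names, the scaling of λ by the bin size to get an expected count per bin, the use of "count >= threshold" for "above the threshold", and the kernel truncation at 4σ are all assumptions. The p-value cutoff is left as a required argument rather than assumed.

```python
import math
import numpy as np

BIN_SIZE = 200  # bp per bin, as stated in the text


def poisson_threshold(lam, p_value):
    """Smallest count k with P(X >= k) < p_value for X ~ Poisson(lam)."""
    pmf = math.exp(-lam)  # P(X = 0)
    cdf = 0.0             # P(X < k), starting at k = 0
    k = 0
    while 1.0 - cdf >= p_value:
        cdf += pmf
        k += 1
        pmf *= lam / k    # P(X = k) from P(X = k - 1)
    return k


def binarize(counts, total_reads, genome_length, p_value):
    """Return a 0/1 array marking bins whose read count reaches the threshold."""
    # Assumption: the per-base rate (reads / genome length) is scaled by the
    # bin size to give the expected number of reads per bin.
    lam_bin = total_reads / genome_length * BIN_SIZE
    k = poisson_threshold(lam_bin, p_value)
    return (np.asarray(counts) >= k).astype(np.int8)


def gaussian_kernel(sigma_bins, truncate=4.0):
    """Normalized 1-D Gaussian kernel, truncated at `truncate` sigmas (assumed)."""
    radius = int(truncate * sigma_bins + 0.5)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma_bins) ** 2)
    return kernel / kernel.sum()


def retained_segments(summed_counts, sigma_bp=40_000, min_signal=0.5,
                      min_len_bp=5_000):
    """Half-open (start_bin, end_bin) runs where the smoothed track stays high.

    `summed_counts` is the per-bin sum over all species and marks; it should be
    longer than the kernel, since np.convolve(mode="same") returns the longer
    of its two inputs.
    """
    kernel = gaussian_kernel(sigma_bp / BIN_SIZE)
    smoothed = np.convolve(summed_counts, kernel, mode="same")
    above = smoothed >= min_signal
    # Pad with False so every run has both a rising and a falling edge.
    padded = np.concatenate(([False], above, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = edges[::2], edges[1::2]
    min_bins = min_len_bp // BIN_SIZE
    return [(int(s), int(e)) for s, e in zip(starts, ends) if e - s >= min_bins]
```

Each returned segment could then be processed independently, which matches the per-segment parallel inference mentioned in the text.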