Multiple-trait subsampling for optimized ancestral trait reconstruction

bioRxiv (Cold Spring Harbor Laboratory) (2022)

Abstract
Large datasets along with sampling bias represent a challenge for phylodynamic reconstructions, particularly when the study data are obtained from various heterogeneous sources and/or through convenience sampling. In this study, we evaluate the presence of unbalanced sampling distributions by collection date, location, and risk group in HIV-1 subtype C using a comprehensive subsampling strategy, and assess their impact on the reconstruction of the viral spatial and risk group dynamics using phylogenetic comparative methods. Our study shows that the most suitable dataset for ancestral trait reconstruction can be obtained through subsampling by collection date, location, and risk group, particularly using multigene datasets. We also demonstrate that sampling bias is inflated when considerable information for a given trait is unavailable or of poor quality, as we observed for the risk group in the analysis of HIV-1 subtype C. In conclusion, we suggest that, even if traits are not well recorded, deliberately including them, rather than excluding them completely, optimizes the representativeness of the original dataset. Therefore, we advise the inclusion of as many traits as possible with the aid of subsampling approaches in order to optimize the dataset for phylodynamic analysis while reducing the computational burden. This will benefit research communities investigating the evolutionary and spatiotemporal patterns of infectious diseases.

Competing Interest Statement: The authors have declared no competing interest.
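As a rough illustration of subsampling on several traits at once, the sketch below caps the number of sequences kept per unique combination of collection year, location, and risk group. It is not the authors' pipeline: the function name `subsample_by_traits`, the record fields, and the `max_per_stratum` cap are all hypothetical, and missing trait values are assumed to be grouped under an "unknown" label instead of being discarded.

```python
import random
from collections import defaultdict


def subsample_by_traits(records, traits, max_per_stratum, seed=0):
    """Keep at most `max_per_stratum` records per combination of trait values.

    Hypothetical helper: `records` is a list of dicts (one per sequence) and
    `traits` lists the metadata fields that define the strata.
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    strata = defaultdict(list)
    for rec in records:
        # Missing or empty trait values are kept and labelled "unknown"
        # (an assumption of this sketch, not a rule taken from the paper).
        key = tuple(str(rec.get(t) or "unknown") for t in traits)
        strata[key].append(rec)

    kept = []
    for key in sorted(strata):
        group = strata[key]
        if len(group) > max_per_stratum:
            # Over-represented stratum: draw a random sample without replacement.
            group = rng.sample(group, max_per_stratum)
        kept.extend(group)
    return kept


# Made-up metadata: one over-sampled stratum and one small stratum
# whose risk group was not recorded.
records = [
    {"id": f"seq{i}", "year": 2010, "location": "ZA", "risk_group": "heterosexual"}
    for i in range(10)
] + [
    {"id": f"seq{i}", "year": 2005, "location": "IN", "risk_group": None}
    for i in range(10, 12)
]

subset = subsample_by_traits(records, ["year", "location", "risk_group"], max_per_stratum=3)
```

In this toy input the over-sampled stratum is cut down to the cap, while the two sequences with an unrecorded risk group are retained as their own stratum.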
Keywords
reconstruction, multiple-trait