Creating Transparent and Reproducible Pipelines: Best Practices for Tools, Data, and Workflow Management Systems

Human Genome Informatics (2018)

Abstract
Recently, the practice of properly sharing the source code, analysis pipelines, and protocols of published studies has become commonplace in bioinformatics. In addition, there is a plethora of technically mature workflow management systems (WMSs) that offer simple and user-friendly environments where users can submit tools and build transparent, shareable, and reproducible pipelines. Arguably, the adoption of open science policies and the availability of efficient WMSs constitute major progress toward battling the replication crisis, advancing research dissemination, and creating new collaborations. Yet it remains very difficult to include a large range of tools in a scientific pipeline, while certain technical and design choices of modern WMSs discourage users from doing just this. Here we present three sets of easily applicable “best practices” targeting (i) bioinformatics tool developers, (ii) data curators, and (iii) WMS engineers, respectively. These practices aim to make it easier to add tools to a pipeline, to process data directly, and to make WMSs widely hospitable to any external tool or pipeline. We also show how following these guidelines can directly benefit the research community.
Keywords
reproducible pipelines, tools, management systems, best practices