FAIR enough? A perspective on the status of nucleotide sequence data and metadata on public archives

Christiane Hassenrück, Tobias Poprick, Véronique Helfer, Massimiliano Molari, Raissa Meyer, Ivaylo Kostadinov

bioRxiv (2021)

Abstract

Knowledge derived from nucleotide sequence data is increasingly important in the life sciences and in decision making (mainly in biodiversity policy). Metadata standards have been established to facilitate sustainable sequence data management according to the FAIR principles (Findability, Accessibility, Interoperability, Reusability). Here, we review the status of the metadata available for raw-read Illumina amplicon and whole genome shotgun sequencing data that are derived from ecological metagenomic material and accessible at the European Nucleotide Archive (ENA). We also review the compliance of the primary sequence data (fastq files) with data submission requirements. Overall, basic metadata such as geographic coordinates were retrievable in 98% of the cases for this type of sequence data. However, interoperability was not always ensured, and other (mainly conditionally) mandatory parameters were often not provided at all. Metadata standards, such as the ‘Minimum Information about any(x) Sequence (MIxS)’, were used only infrequently, despite a demonstrated positive impact on metadata quality. Furthermore, the sequence data itself did not meet the prescribed requirements in 31 of the 39 studies that were manually inspected. To tackle the most immediate needs for improving FAIR sequence data management, we provide a list of minimal suggestions to researchers, research institutions, funding agencies, reviewers, publishers, and databases. We believe these suggestions could have a large positive impact on the FAIRness of sequence data and metadata, which is crucial for further research and its derived applications.

Competing Interest Statement

The authors have declared no competing interest.