pfb_fhir: A utility to extract clinical data systems into a portable format

Brian Walsh, Jordan A. Lee,Kyle Ellrott

medRxiv (Cold Spring Harbor Laboratory)(2023)

引用 0|浏览3
暂无评分
摘要
Background Fast Healthcare Interoperability Resources (FHIR) is a server specification and data model that allows for EHR systems to represent clinical metadata using a consistent API. There is a critical mass of EHR and clinical trial data stored in FHIR based systems. Research analysts can take advantage of existing FHIR tooling for de-identification, pseudonymization, and anonymization. More recently the BiodataCatalyst consortium has proposed the Portable Format for Bioinformatics (PFB) which is a carrier format for describing raw data and the data model in which it is structured, based on an efficient binary format (AVRO). PFB allows an entire cohort of metadata to be loaded into a research data system. Here, we describe an open source utility that will scan FHIR based systems and create PFB based archives. Results pfb_fhir scans data from FHIR based clinical data systems and converts the data into a self contained PFB file. This utility identifies types, customizations (extensions), and element connections. It then converts all of these components into a graph model compatible for storage in the PFB specification. The structure of the original FHIR system is faithfully reproduced using the PFB schema description system. All records from the system are downloaded, converted and stored as vertices in a graph described by the PFB file. This system has been tested against a number of different FHIR installations, including ones hosted by dbGAP, The Kids First Data Resource and AnVIL. Conclusions pfb\_fhir helps to unlock the potential of EHR and clinical trial data. pfb\_fhir allows researchers to easily scan and store FHIR resources and create self contained PFB archives, called FHIR in PFB. These archive files can easily be moved to new data systems, allowing the clinical data to be connected to more complex genomic analysis and data science platforms. The FHIR in PFB archives generated by pfb_fhir have been loaded into data platforms including the Broad’s Terra system, Gen3 based data system, custom graph query engines and Jupyter notebooks. This flexibility will enable genomics investigators to do more integrated genotype to phenotype association analysis using whichever tools suit their line of research. ### Competing Interest Statement The authors have declared no competing interest. ### Funding Statement 1U24HG010263-01 ### Author Declarations I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes All data produced are available online at * #### Abbreviations ANSI : American National Standards Institute API : Application Programming Interface AVRO : A data serialization framework developed within Apache’s Hadoop project DRS : GA4GH’s Data Repository Service EHR : Electronic Health Record FHIR : Fast Healthcare Interoperability Resources GA4GH : Global Alliance for Genomic Health HL7 : Health Level Seven NCPI : NIH Cloud Platform Interoperability NIH : National Institutes of Health PFB : Portable Format for Bioinformatics URI : Uniform Resource Identifier UUID : Universally Unique Identifier dbGaP : Database of Genotypes and Phenotypes
更多
查看译文
关键词
clinical data systems,extract
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要