A framework for de-identification of free-text data in electronic medical records enabling secondary use.

Australian Health Review: a publication of the Australian Hospital Association (2022)

Abstract
Clinical free-text data represent a vast, untapped source of rich information. If made more accessible for research, they would supplement the information captured in structured fields. Data need to be de-identified before being reused for research. However, a lack of transparency in existing de-identification software tools makes it difficult for data custodians to assess the potential risks associated with releasing de-identified clinical free-text data. This case study describes the development of a framework for releasing de-identified clinical free-text data in two local health districts in NSW, Australia. A sample of clinical documents (n = 14 768 965), including progress notes, nursing and medical assessments and discharge summaries, was used for development. An algorithm was designed to identify and mask patient names without damaging data utility. For each note, the algorithm output (i) the note length before and after de-identification, (ii) the number of patient names and (iii) the number of common words. These outputs were used to iteratively refine the algorithm's performance. This was followed by manual review of a random subset of records by a health information manager. Notes that were not correctly de-identified were fixed, and performance was reassessed until resolution. All notes in this sample were suitably de-identified using this method. Developing a transparent method for de-identifying clinical free-text data enables informed decision-making by data custodians and the safe reuse of clinical free-text data for research and public benefit.
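The abstract does not specify how the algorithm works, so the following is only a minimal Python sketch of the kind of per-note output it describes. Everything here is an assumption for illustration: the `deidentify` function, the `NoteReport` fields, the `[NAME]` mask token, and in particular the reading of "common words" as name tokens that coincide with ordinary words (such as "May") and are left unmasked to preserve utility.

```python
import re
from dataclasses import dataclass

MASK = "[NAME]"  # hypothetical mask token, not taken from the paper


@dataclass
class NoteReport:
    """Per-note audit values, loosely mirroring outputs (i)-(iii)."""
    length_before: int
    length_after: int
    names_masked: int
    common_word_hits: int
    text: str


def deidentify(note: str, patient_names: set[str], common_words: set[str]) -> NoteReport:
    """Mask known patient names in one note and report audit counts.

    Assumption: a name token that is also a common word is counted
    but left in place; this policy is illustrative only.
    """
    names = sorted({n for n in patient_names if n}, key=len, reverse=True)
    if not names:
        return NoteReport(len(note), len(note), 0, 0, note)

    names_masked = 0
    common_word_hits = 0

    def replace(match: re.Match) -> str:
        nonlocal names_masked, common_word_hits
        token = match.group(0)
        if token.lower() in common_words:
            common_word_hits += 1
            return token  # keep ambiguous tokens to protect utility
        names_masked += 1
        return MASK

    # Whole-word, case-insensitive match on any known name.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b",
        re.IGNORECASE,
    )
    masked = pattern.sub(replace, note)
    return NoteReport(len(note), len(masked), names_masked, common_word_hits, masked)


report = deidentify(
    "Pt May Smith seen today. May need review in May.",
    patient_names={"May", "Smith"},
    common_words={"may"},
)
```

In a sketch like this, a large drop in note length or a high count of common-word hits would flag a note for the kind of manual review the abstract describes; the thresholds and review workflow are not given in the source.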