Development of a Natural Language Processing Pipeline to Identify Histological Subtypes and Site of Cancer from Pathology Reports

Faith Sze Ee Ng, Guat Hwa Low, See Boon Tay, Han Jieh Tey, Fun Loon Leong, Choon Hua Thng, Iain Bee Huat Tan, Ryan Shea Ying Cong Tan

Research Square (2022)

Abstract

Purpose: To develop a Natural Language Processing (NLP) pipeline that can determine the histological subtype and site of a patient’s cancer from pathology reports.

Methods: A Spark NLP-based deep learning pipeline was developed to perform named entity recognition (NER) and assertion status detection for histological subtypes, and then to extract key relations of interest to determine the site of a patient’s cancer from pathology reports. We assessed the pipeline's ability to extract histological subtypes and cancer site against manual curation of the same reports.

Results: A total of 1358 reports from 474 patients seen at a single tertiary cancer centre were used in the development and validation of the pipeline. The NLP pipeline achieved a mean accuracy of 99.79% and an F1 score of 84.08% for NER of histological subtypes. The relation extraction (RE) model achieved an average accuracy of 91.96% and an F1 score of 92.45% for the key relations involving histological subtype entities.

Conclusion: We developed an NLP pipeline that can extract histological subtypes and relate them to the site of a patient’s cancer from free-text pathology reports with high accuracy. It has the potential to be deployed for both research and clinical quality processes.
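The NER results pair a very high accuracy (99.79%) with a noticeably lower F1 score (84.08%). One common reason for such a gap, offered here only as an illustration, is class imbalance: if metrics are computed per token and most tokens carry no entity label, accuracy is dominated by the easy majority class while F1 reflects only the entity class. The abstract does not state how the metrics were computed, so the token-level setup, the label names `SUBTYPE` and `O`, the helper `accuracy_and_f1`, and the toy data below are all assumptions made for this sketch, not the authors' evaluation code.

```python
def accuracy_and_f1(gold, pred, positive):
    """Return (accuracy, F1 for the `positive` label) for two equal-length label lists."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    correct = sum(g == p for g, p in pairs)
    tp = sum(g == positive and p == positive for g, p in pairs)  # true positives
    fp = sum(g != positive and p == positive for g, p in pairs)  # false positives
    fn = sum(g == positive and p != positive for g, p in pairs)  # false negatives
    accuracy = correct / len(gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Invented toy data: 100 tokens, only 4 of which are gold SUBTYPE tokens.
gold = ["SUBTYPE"] * 4 + ["O"] * 96
# Hypothetical predictions: the model misses one of the four SUBTYPE tokens.
pred = ["SUBTYPE"] * 3 + ["O"] * 97

acc, f1 = accuracy_and_f1(gold, pred, positive="SUBTYPE")
print(f"accuracy={acc:.4f}, F1={f1:.4f}")
```

In this toy case a single missed entity token out of 100 leaves accuracy at 0.99 while F1 for the entity class drops to about 0.86, which is why F1 is usually the more informative of the two figures for sparse entity labels.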
Keywords: natural language processing pipeline, pathology reports, natural language, cancer