Generalized Zero-Shot Text Classification for ICD Coding

IJCAI 2020(2020)

引用 50|浏览258
暂无评分
摘要
The International Classification of Diseases (ICD) is a list of classification codes for the diagnoses. Automatic ICD coding is a multi-label text classification problem with noisy clinical document inputs and long-tailed label distribution, making it difficult for fine-grained classification on both frequent and zero-shot codes at the same time, i.e. generalized zero-shot ICD coding. In this paper, we propose a latent feature generation framework to improve the prediction on unseen codes without compromising the performance on seen codes. Our framework generates semantically meaningful features for zero-shot codes by exploiting ICD code hierarchical structure and reconstructing the code-relevant keywords with a novel cycle architecture. To the best of our knowledge, this is the first adversarial generative model for generalized zero-shot learning on multi-label text classification. Extensive experiments demonstrate the effectiveness of our approach. On the public MIMIC-III dataset, our methods improve the F1 score from nearly 0 to 20:91% for the zero-shot codes, and increase the AUC score by 3% (absolute improvement) from previous state of the art. Code is available at https://github.com/csong27/gzsl text.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要