Guided Discrete Diffusion for Electronic Health Record Generation
arXiv (2024)
Abstract
Electronic health records (EHRs) are a pivotal data source that enables
numerous applications in computational medicine, e.g., disease progression
prediction, clinical trial design, and health economics and outcomes research.
Despite their wide usability, the sensitive nature of EHRs raises privacy and
confidentiality concerns, which limit potential use cases. To tackle these
challenges, we explore the use of generative models to synthesize artificial,
yet realistic EHRs. While diffusion-based methods have recently demonstrated
state-of-the-art performance in generating other data modalities and overcome
the training instability and mode collapse issues that plague previous
GAN-based approaches, their applications in EHR generation remain
underexplored. The discrete nature of tabular medical code data in EHRs poses
challenges for high-quality data generation, especially for continuous
diffusion models. To this end, we introduce a novel tabular EHR generation
method, EHR-D3PM, which enables both unconditional and conditional generation
using a discrete diffusion model. Our experiments demonstrate that EHR-D3PM
significantly outperforms existing generative baselines on comprehensive
fidelity and utility metrics while incurring lower membership vulnerability
risk. Furthermore, we show that EHR-D3PM is effective as a data augmentation method
and enhances performance on downstream tasks when combined with real data.