Examining Batch Effect in Histopathology as a Distributionally Robust Optimization Problem

bioRxiv (2021)

Abstract
Computer vision (CV) approaches applied to digital pathology have informed biological discovery and the development of tools that support clinical decision-making. However, batch effects in the images can introduce spurious confounders and represent a major challenge to effective analysis and interpretation of these data. Standard methods to avoid learning such confounders include (i) applying image augmentation techniques and (ii) checking the learning process through external validation (e.g., unseen data from a comparable dataset collected at another hospital). Here, we show that the source site of a histopathology slide can be learned from the image using CV algorithms despite image augmentation, and we explore these source-site predictions using interpretability tools. A CV model trained using Empirical Risk Minimization (ERM) risks learning this source-site signal as a spurious correlate in the weak-label regime, which we abate by using a training method with abstention. We find that a patch-based classifier trained using abstention outperformed a model trained using ERM by 9.9%, 10%, and 19.4% F1 in the binary classification tasks of identifying tumor versus normal tissue in lung adenocarcinoma, Gleason score in prostate adenocarcinoma, and tumor tissue grade in clear cell renal cell carcinoma, respectively, at the expense of up to 80% coverage (defined as the percent of tiles not abstained on by the model). Further, by examining the areas on which the model abstained, we find that the model trained using abstention is more robust to heterogeneity, artifacts, and spurious correlates in the tissue. Thus, a method trained with abstention may offer novel insights into relevant areas of the tissue contributing to a particular phenotype.
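The coverage definition above can be made concrete with a minimal sketch. It is not the authors' implementation: the function name `coverage_and_f1`, the use of `None` to mark an abstained tile, and the choice to score F1 on the positive class over non-abstained tiles only are all assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple


def coverage_and_f1(
    y_true: Sequence[int],
    y_pred: Sequence[Optional[int]],
) -> Tuple[float, float]:
    """Return (coverage in percent, F1 on non-abstained tiles).

    Hypothetical convention: y_pred[i] is None when the model abstains
    on tile i, otherwise a binary label (0 or 1).
    """
    # Keep only the tiles on which the model made a prediction.
    kept = [(t, p) for t, p in zip(y_true, y_pred) if p is not None]

    # Coverage: percent of tiles not abstained on by the model.
    coverage = 100.0 * len(kept) / len(y_true) if len(y_true) else 0.0

    # Binary F1 for the positive class (label 1), computed on kept tiles.
    tp = sum(1 for t, p in kept if t == 1 and p == 1)
    fp = sum(1 for t, p in kept if t == 0 and p == 1)
    fn = sum(1 for t, p in kept if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return coverage, f1


# Toy labels for five tiles; the model abstains on two of them.
labels = [1, 1, 0, 0, 1]
preds = [1, None, 0, 1, None]
cov, f1 = coverage_and_f1(labels, preds)
```

Reporting both numbers together makes the trade-off visible: a model can raise its F1 by abstaining on difficult tiles, but only by lowering coverage.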
Together, we suggest using data augmentation methods that help mitigate a digital pathology model’s reliance on potentially spurious visual features, as well as selecting models that can identify features truly relevant for translational discovery and clinical decision support.

### Competing Interest Statement

Eliezer M. Van Allen Disclosures (last updated 10/19/2021). Advisory/Consulting: Tango Therapeutics, Genome Medical, Invitae, Enara Bio, Janssen, Manifold Bio, Monte Rosa. Research support: Novartis, BMS. Equity: Tango Therapeutics, Genome Medical, Syapse, Enara Bio, Manifold Bio, Microsoft, Monte Rosa. Travel reimbursement: Roche/Genentech. Patents: Institutional patents filed on chromatin mutations and immunotherapy response, and methods for clinical interpretation; intermittent legal consulting on patents for Foley & Hoag.
Keywords
histopathology, distributionally robust optimization problem, batch effect, optimization problem