Abstract 4331: Machine learning-based classification of tissue origin of cancer using methylation profiles

Marco A. De Velasco, Kazuko Sskai, Seiichiro Mitani, Yurie Kura, Shuji Minamoto, Takahiro Haeno, Hidetoshi Hayashi, Kazuto Nishio

Cancer Research (2024)

Abstract
Cancer of unknown primary (CUP) is a malignancy with a poor prognosis in which metastatic disease is present but the primary site cannot be identified, even on histological examination. Most patients receive empiric chemotherapy, such as platinum-taxane regimens, but survival times remain short. Patients with poor-prognosis CUP could therefore benefit from drug therapy optimized according to the estimated primary organ. We constructed and evaluated an ensemble learning model that uses methylation profiles of tumor tissues to accurately determine the primary organ. We analyzed methylation data from 890 TCGA samples representing 10 cancer types. After data preprocessing, we selected either the top 10,000 CpG sites ranked by ANOVA and Gain Ratio or 100 CpG sites selected by a Gradient Boosting classifier. Performance was evaluated using several machine learning models, and unsupervised analysis was carried out to examine the relationships among the CpG sites selected by Gradient Boosting. CpG sites selected by ANOVA and Gain Ratio yielded favorable performance across various machine learning models. Using Gradient Boosting as a feature selector reduced the number of CpG sites 100-fold (from 10,000 to 100) without compromising model performance. Ensemble models classified primary organs favorably in both the training and validation sets. In validation, classification accuracy was 91.2%, 93.5%, 89.7%, and 87.7% for Extreme Gradient Boosting, CatBoost, Random Forest, and Gradient Boosting, respectively. Further profiling showed that the selected methylation regions correlated with cancer type and even revealed subgroups within breast and lung cancers. Gradient Boosting as a feature selector for DNA methylation profiling was highly effective in accurately determining tissue origin. Our study outlines an approach in which an embedded machine learning algorithm identifies a select set of informative features from complex, high-dimensional data, which are then used to train models that predict cancer type.

Citation Format: Marco A. De Velasco, Kazuko Sskai, Seiichiro Mitani, Yurie Kura, Shuji Minamoto, Takahiro Haeno, Hidetoshi Hayashi, Kazuto Nishio. Machine learning-based classification of tissue origin of cancer using methylation profiles [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 4331.
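
The filter step described in the abstract (ranking CpG sites by ANOVA and Gain Ratio and keeping the top k) can be illustrated with a minimal plain-Python sketch. It is not the authors' code. The abstract does not state how preprocessing was done or how the ANOVA and Gain Ratio rankings were combined, so each score is simply used on its own here. The toy beta values, the 0.5 binarization threshold used for Gain Ratio, and the helper names `anova_f`, `gain_ratio`, and `top_k_sites` are all hypothetical. The embedded Gradient Boosting selection of 100 CpG sites is not reproduced.

```python
from collections import Counter
from math import log2
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic of one CpG site's beta values across classes."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    means = {y: mean(g) for y, g in groups.items()}
    # Between-class sum of squares: size-weighted spread of class means.
    ss_between = sum(len(g) * (means[y] - grand) ** 2 for y, g in groups.items())
    # Within-class sum of squares: spread of samples around their class mean.
    ss_within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    if ss_within == 0:
        # Degenerate case (no within-class variance): perfectly separating
        # if the class means differ, otherwise uninformative.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def entropy(items):
    """Shannon entropy (bits) of a list of discrete items."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels, threshold=0.5):
    """Information gain ratio of one CpG site after binarizing its beta values.

    The threshold-based binarization is a hypothetical choice for this sketch.
    """
    bins = [v >= threshold for v in values]
    n = len(labels)
    # Expected class entropy after splitting samples on the binarized site.
    conditional = sum(
        (bins.count(b) / n) * entropy([y for y, bb in zip(labels, bins) if bb == b])
        for b in set(bins)
    )
    info_gain = entropy(labels) - conditional
    split_info = entropy(bins)  # intrinsic information of the split itself
    return info_gain / split_info if split_info > 0 else 0.0


def top_k_sites(matrix, labels, score_fn, k):
    """Indices of the k CpG sites (columns) with the highest score_fn value."""
    scored = []
    for j in range(len(matrix[0])):
        column = [row[j] for row in matrix]
        scored.append((score_fn(column, labels), j))
    # Highest score first; ties broken by lower site index.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [j for _, j in scored[:k]]


# Hypothetical toy data: 6 samples x 4 CpG sites (beta values in [0, 1]).
labels = ["breast"] * 3 + ["lung"] * 3
matrix = [
    [0.90, 0.40, 0.60, 0.50],
    [0.85, 0.60, 0.55, 0.50],
    [0.88, 0.50, 0.65, 0.50],
    [0.10, 0.45, 0.40, 0.50],
    [0.15, 0.55, 0.35, 0.50],
    [0.12, 0.50, 0.45, 0.50],
]

print(top_k_sites(matrix, labels, anova_f, k=2))     # [0, 2]
print(top_k_sites(matrix, labels, gain_ratio, k=2))  # [0, 2]
```

Both scores rank sites 0 and 2, which separate the two classes, above the uninformative sites 1 and 3. In a real run, `k` would be 10,000 and the matrix would hold one column per CpG probe. For an embedded step, a fitted gradient boosting model's feature importances (for example, `feature_importances_` on scikit-learn's `GradientBoostingClassifier`) could be ranked in the same way.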