Classification of RNA-Sequencing Data Via Poisson and Negative Binomial Linear Discriminant Analyses: A Methodological Study

Turkiye Klinikleri Journal of Biostatistics (2023)

Abstract

Objective: Microarray and RNA sequencing (RNA-Seq) technologies are frequently employed in genetic data analysis for detecting disease-associated genes, identifying cancer subtypes, and enabling molecular diagnosis. While numerous methods have been proposed for classification problems using microarray data, few methods have been developed for classifying RNA-Seq data. This study aims to compare the performance of novel methods developed for RNA-Seq data on three distinct real-life datasets.

Material and Methods: Cervical cancer, Alzheimer's disease, and kidney cancer RNA-Seq data were utilized in this study. The data were divided into training and test sets in a 70% and 30% ratio, respectively. Various preprocessing steps, such as normalization, power transformation, and variance filtering, were applied to the data. The Poisson Linear Discriminant Analysis (PLDA) and Negative Binomial Linear Discriminant Analysis (NBLDA) models were used for classification, and the predictive performances of these models were compared.

Results: Among the three datasets, the Alzheimer's data exhibited the lowest level of overdispersion, while the cervical cancer data had the highest. The NBLDA model demonstrated superior classification performance compared to the PLDA model. In cases of mild-to-moderate overdispersion, the predictive performance of the PLDA model improved when power transformation was applied, resulting in performance similar to that of the NBLDA model.

Conclusion: PLDA and NBLDA are two novel and promising techniques for classifying RNA-Seq data. The performance of these models is influenced by the degree of overdispersion. In cases of high overdispersion, the NBLDA model is recommended.
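To make the role of overdispersion more concrete, the following is a minimal Python sketch, not the study's implementation. It assumes the commonly used Poisson log-linear formulation of PLDA (counts modeled as Poisson with mean s_i * g_j * d_kj), total-count size factors, no shrinkage of the class effects, and a smoothing constant `beta`. The function names `fit_plda`, `predict_plda`, and `mom_dispersion` are hypothetical, and the moment-based dispersion estimate is a rough diagnostic computed on unnormalized counts. The abstract does not specify any of these details.

```python
import numpy as np


def mom_dispersion(X):
    """Rough per-gene overdispersion: phi = (var - mean) / mean^2, floored at 0.

    Assumption: computed on raw counts without normalization, so this is only
    a crude diagnostic (phi = 0 corresponds to Poisson-like variance).
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    var = X.var(axis=0, ddof=1)
    phi = np.zeros_like(mean)
    np.divide(var - mean, mean ** 2, out=phi, where=mean > 0)
    return np.clip(phi, 0.0, None)


def fit_plda(X, y, beta=1.0):
    """Fit a simplified Poisson LDA (assumed formulation, no shrinkage).

    X: (n_samples, n_genes) raw counts; y: class labels.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    total = X.sum()
    s = X.sum(axis=1) / total          # total-count size factors
    g = X.sum(axis=0)                  # per-gene totals
    d = np.empty((classes.size, X.shape[1]))
    for k, c in enumerate(classes):
        in_c = y == c
        observed = X[in_c].sum(axis=0)       # class-k counts per gene
        expected = s[in_c].sum() * g         # counts expected with no class effect
        d[k] = (observed + beta) / (expected + beta)
    log_prior = np.log(np.array([(y == c).mean() for c in classes]))
    return {"classes": classes, "total": total, "g": g, "d": d,
            "log_prior": log_prior}


def predict_plda(model, X_new):
    """Assign each new sample to the class with the highest Poisson score."""
    X_new = np.asarray(X_new, dtype=float)
    s_new = X_new.sum(axis=1) / model["total"]
    scores = (X_new @ np.log(model["d"]).T
              - np.outer(s_new, model["d"] @ model["g"])
              + model["log_prior"])
    return model["classes"][scores.argmax(axis=1)]
```

In a workflow like the one described, `fit_plda` would be called on the 70% training split and `predict_plda` on the 30% test split. When `mom_dispersion` returns large values for many genes, the Poisson assumption is doubtful, which is the situation where the abstract recommends NBLDA.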
