Optimal supervised reduction of high dimensional transcription data.

IEEE/ACM Transactions on Computational Biology and Bioinformatics (2023)

Navigating high-dimensional transcription datasets remains a persistent problem. The problem is amplified for complex disorders such as cancer, because these disorders are often multigenic traits in which multiple subsets of genes collectively affect the type, stage, and severity of the trait. We are often faced with a trade-off between reducing the dimensionality of our datasets and maintaining the integrity of our data. To accomplish both tasks simultaneously for very high-dimensional transcriptome data on complex multigenic traits, we propose a new supervised technique, Class Separation Transformation (CST). CST does so by reducing the input space to a one-dimensional transformed space that provides optimal separation between the differing classes. Furthermore, CST offers a means of explainable ML, as it computes the relative importance of each feature for its contribution to class distinction, which can lead to deeper insights and discovery. We compare our method with existing state-of-the-art methods on both real and synthetic datasets, demonstrating that CST is more accurate, robust, scalable, and computationally advantageous than those methods. Code used in this paper is available at https://github.com/richiebailey74/CST.
Key words
Supervised reduction, explainable machine learning, optimal class separation
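
The abstract does not spell out how CST computes its transformation, so the sketch below is not CST. It illustrates the general idea of supervised reduction to one dimension using a swapped-in classical technique, Fisher's linear discriminant, on synthetic two-class data. The function names (`fisher_direction`, `feature_importance`), the ridge term, and the synthetic data are all assumptions made for illustration; see the authors' repository for the actual method.

```python
import numpy as np


def fisher_direction(X, y, ridge=1e-3):
    """Unit vector maximizing between-class over within-class variance.

    Illustrative stand-in (Fisher's linear discriminant), NOT the CST method.
    X: (n_samples, n_features) array; y: binary labels in {0, 1}.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X0, X1 = X[y == 0], X[y == 1]
    mean_diff = X1.mean(axis=0) - X0.mean(axis=0)

    # Within-class scatter matrix: sum of the per-class centered outer products.
    C0 = X0 - X0.mean(axis=0)
    C1 = X1 - X1.mean(axis=0)
    Sw = C0.T @ C0 + C1.T @ C1

    # Ridge term (an assumption) keeps Sw invertible when features outnumber samples.
    p = Sw.shape[0]
    Sw += ridge * (np.trace(Sw) / p) * np.eye(p)

    w = np.linalg.solve(Sw, mean_diff)
    return w / np.linalg.norm(w)


def feature_importance(w):
    """Relative importance of each feature as its normalized absolute weight."""
    a = np.abs(w)
    return a / a.sum()


# Synthetic example: 200 samples, 50 features, only the first 3 are informative.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 50))
X[y == 1, :3] += 3.0

w = fisher_direction(X, y)
z = X @ w                          # one-dimensional transformed space
importance = feature_importance(w)
top3 = np.argsort(importance)[-3:]  # indices of the three largest weights
print("class means in 1-D:", z[y == 0].mean(), z[y == 1].mean())
print("top features:", sorted(top3.tolist()))
```

In this sketch the projection `z` plays the role of the one-dimensional transformed space, and the normalized absolute weights give a simple per-feature importance score. A linear discriminant is only one way to obtain such a projection and makes no claim to the optimality described in the abstract.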