The Fault in Our Data Stars: Studying Mitigation Techniques against Faulty Training Data in Machine Learning Applications

Abraham Chan, Arpan Gujarati, Karthik Pattabiraman, Sathish Gopalakrishnan

2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) (2022)

Abstract

Machine learning (ML) has been adopted in many safety-critical applications like automated driving and medical diagnosis. Incorrect decisions by ML models can lead to catastrophic consequences, such as vehicle crashes and inappropriate medical procedures, thereby endangering our lives. The correct behaviour of an ML model is contingent upon the availability of well-labelled training data. However, obtaining large and high-quality training datasets for safety-critical applications is difficult, often resulting in the use of faulty training data. We compare the efficacy of five different error mitigation techniques, derived from a survey of more than 200 related articles, which are designed to tolerate noisy/faulty training data. We experimentally find that the error mitigation capabilities of these techniques vary across datasets, ML models, and different kinds of faults. We further find that ensemble learning offers the highest resilience among all the techniques across different configurations, followed by label smoothing.
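
For readers unfamiliar with the two techniques named at the end of the abstract, the following is a minimal illustrative sketch and not the authors' implementation. It assumes the common formulation of label smoothing (mixing the one-hot target with a uniform distribution) and a simple majority-vote ensemble; the function names and the smoothing factor `alpha=0.1` are hypothetical choices made only for this example.

```python
from collections import Counter


def smooth_labels(label, num_classes, alpha=0.1):
    """Return a smoothed target distribution for an integer class label.

    Assumed formulation: (1 - alpha) * one_hot + alpha / num_classes.
    alpha is an illustrative value, not one taken from the paper.
    """
    off_value = alpha / num_classes
    return [
        (1.0 - alpha) + off_value if k == label else off_value
        for k in range(num_classes)
    ]


def majority_vote(predictions):
    """Combine class predictions from several models by majority vote.

    `predictions` holds one predicted class per ensemble member.
    """
    return Counter(predictions).most_common(1)[0][0]
```

The intuition in both cases is that a single mislabelled training example has less influence: smoothing keeps the model from becoming fully confident in any one (possibly wrong) label, and voting lets models that were not misled outvote one that was.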

Keywords

Error resilience, Machine learning, Training
