Auto-Prep: Efficient and Automated Data Preprocessing Pipeline

Mehwish Bilal, Ghulam Ali, Muhammad Waseem Iqbal, Muhammad Anwar, Muhammad Sheraz Arshad Malik, Rabiah Abdul Kadir

IEEE Access (2022)


Abstract

Data preprocessing is crucial in the machine learning pipeline because the quality of the data, and of the information extracted from it at this stage, directly affects a model's ability to learn. However, each transformation task has many alternative techniques, which can overwhelm an inexperienced user. This paper develops a simple Python-based auto-preprocessing architecture for Automated Machine Learning that offers automated, interactive, and data-driven support to help users perform data preprocessing tasks efficiently. The proposed method provides insights into a dataset and handles standard data preprocessing tasks. First, it detects data problems and presents them to the end user through visualizations. Then, after evaluating state-of-the-art candidate techniques, it recommends the most effective data cleaning and preparation method. For evaluation, the architecture is applied to ten diverse datasets to preprocess the data automatically before passing it to an ML algorithm. The results are compared with those of the same ML algorithm trained on manually preprocessed data. The results show that the approach not only simplifies the whole process but also significantly improves model performance.
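
The detect-then-recommend flow described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`detect_missing`, `recommend_imputer`), the two candidate imputers, and the leave-one-out scoring rule are all assumptions chosen for illustration, since the abstract does not specify which candidates are evaluated or how they are scored.

```python
from statistics import mean, median

# Hypothetical candidate techniques; the abstract does not list the real ones.
CANDIDATES = {"mean": mean, "median": median}


def detect_missing(rows):
    """Return {column: number of None values} for a list of dict rows."""
    counts = {}
    for row in rows:
        for col, val in row.items():
            counts[col] = counts.get(col, 0) + (val is None)
    # Report only the columns that actually have a problem.
    return {col: n for col, n in counts.items() if n > 0}


def recommend_imputer(values):
    """Pick the candidate with the lowest leave-one-out absolute error.

    Each observed value is held out in turn, the candidate statistic is
    computed on the remaining values, and the error against the held-out
    value is averaged. This scoring rule is an assumption of this sketch.
    """
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        raise ValueError("need at least two observed values to score candidates")
    scores = {}
    for name, fn in CANDIDATES.items():
        total = 0.0
        for i, held_out in enumerate(observed):
            rest = observed[:i] + observed[i + 1:]
            total += abs(fn(rest) - held_out)
        scores[name] = total / len(observed)
    best = min(scores, key=scores.get)
    return best, scores


rows = [
    {"age": 1, "city": "a"},
    {"age": None, "city": "b"},
    {"age": 3, "city": None},
    {"age": None, "city": "c"},
]
problems = detect_missing(rows)

# A skewed column: the outlier should make the median the safer choice.
best, scores = recommend_imputer([1, 2, 3, 4, 100, None])
```

In this sketch the recommendation is driven purely by reconstruction error on the observed values; a fuller system could instead score candidates by the downstream model's performance, as the evaluation described in the abstract suggests.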


Keywords

Encoding, Data preprocessing, Machine learning, Feature extraction, Data models, Dimensionality reduction, Support vector machines, Pipelines, Automated machine learning, Feature engineering
