Anovos: A Scalable Feature Engineering Library.

Anindya Datta, Sangaralingam Kajanan, Sinuo Chen, Dat Tran, Sourjya Sen, Ravish Ranjan

Big Data (2022)

In the current era of big data, the amount of data a company can acquire is growing exponentially. However, the data are only meaningful if they are used wisely. This paper introduces Anovos, an open-source library built on top of Apache Spark. It is designed to perform efficient, end-to-end feature engineering at scale (terabytes of data) and helps implement a systematic, procedural data pipeline with enterprise data at one end and model-ready features at the other. Besides improving the current exploratory data analysis process, Anovos introduces a few key innovations: the data stability index, a single metric that indicates how stable an independent variable is over time, and Feature Explorer and Feature Mapper, which are powered by semantic similarity-based AI models and address the cold-start problem of building high-quality predictive features for model training.
scalable feature engineering library