TabText: a Systematic Approach to Aggregate Knowledge Across Tabular Data Structures

Dimitris Bertsimas, Kimberly Villalobos Carballo, Yu Ma, Liangyuan Na, Léonard Boussioux, Cynthia Zeng, Luis R. Soenksen, Ignacio Fuentes

arXiv (2022)


Abstract

Processing and analyzing tabular data productively and efficiently is essential for building successful machine learning applications in fields such as healthcare. However, the lack of a unified framework for representing and standardizing tabular information poses a significant challenge to researchers and professionals alike. In this work, we present TabText, a methodology that leverages the unstructured data format of language to encode tabular data from different table structures and time periods efficiently and accurately. Using two healthcare datasets and four prediction tasks, we show that features extracted via TabText outperform those extracted with traditional processing methods by 2-5%. Furthermore, we analyze the sensitivity of our framework to different choices of sentence representation for missing values, meta information, and language descriptiveness, and provide insights into winning strategies that improve performance.
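
The abstract does not give the exact sentence templates, so the following is only a minimal sketch of the general idea of encoding a table row as text, not the authors' implementation. The function name `row_to_text`, the "X is Y" template, the `descriptions` mapping, and the two missing-value options are all assumptions made for illustration.

```python
import math


def row_to_text(row, descriptions=None, missing="skip"):
    """Turn one table row (a dict of column -> value) into a sentence.

    Hypothetical helper, not taken from the paper.
    descriptions: optional mapping from column names to more descriptive
                  phrases (one possible reading of "language descriptiveness").
    missing:      "skip" drops missing values; any other value states them
                  explicitly as "<name> is missing".
    """
    descriptions = descriptions or {}
    parts = []
    for column, value in row.items():
        name = descriptions.get(column, column)
        is_missing = value is None or (
            isinstance(value, float) and math.isnan(value)
        )
        if is_missing:
            if missing == "skip":
                continue
            parts.append(f"{name} is missing")
        else:
            parts.append(f"{name} is {value}")
    if not parts:
        return ""
    return ", ".join(parts) + "."


# Illustrative row with made-up column names and values.
row = {"age": 63, "hr": 88.0, "lactate": None}
descriptions = {"hr": "heart rate"}

print(row_to_text(row, descriptions))
print(row_to_text(row, descriptions, missing="state"))
```

Because every row becomes a plain string regardless of which columns it has, rows from tables with different schemas can be fed to the same downstream text encoder. The choice of how to phrase missing values and column names is exactly the kind of design decision the abstract says the sensitivity analysis examines.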
