Classification of Poverty Condition Using Natural Language Processing

Muñetón-Santa Guberney, Escobar-Grisales Daniel, López-Pabón Felipe Orlando, Pérez-Toro Paula Andrea, Orozco-Arroyave Juan Rafael

Social Indicators Research (2022)

Abstract

This work introduces a methodology to classify between poor and extremely poor people through Natural Language Processing. The approach serves as a baseline for understanding and classifying poverty from people's discourse using machine learning algorithms. Based on classical and modern word vector representations, we propose two strategies for document-level representations: (1) document-level features based on the concatenation of descriptive statistics and (2) Gaussian mixture models. Three classification methods are systematically evaluated: Support Vector Machines, Random Forest, and Extreme Gradient Boosting. The four best experiments yielded an accuracy of around 55%, while the embeddings based on GloVe word vectors yielded a sensitivity of 79.6%, which could be of great interest to public policy makers seeking to accurately find people who need to be prioritized in social programs.
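
The first document-level strategy mentioned in the abstract, concatenating descriptive statistics of word vectors, can be illustrated with a minimal sketch. The abstract does not say which statistics are used, so the choice of mean, standard deviation, minimum, and maximum here is an assumption, as are the function name `document_features` and the toy 2-dimensional vectors that stand in for real GloVe embeddings.

```python
from statistics import mean, pstdev


def document_features(word_vectors):
    """Concatenate per-dimension mean, std, min, and max of the word vectors.

    The set of statistics is an assumption; the abstract only says
    "descriptive statistics".
    """
    if not word_vectors:
        raise ValueError("document has no word vectors")
    # Transpose: one tuple of values per embedding dimension.
    dims = list(zip(*word_vectors))
    stats = (mean, pstdev, min, max)
    # Layout: [means..., stds..., mins..., maxs...]
    return [float(stat(dim)) for stat in stats for dim in dims]


# Hypothetical toy embeddings standing in for pretrained GloVe vectors.
toy_embeddings = {
    "food": [1.0, 0.0],
    "work": [3.0, 2.0],
    "home": [2.0, 4.0],
}

tokens = ["food", "work", "home", "unknown"]
# Out-of-vocabulary tokens are skipped in this sketch.
vectors = [toy_embeddings[t] for t in tokens if t in toy_embeddings]
features = document_features(vectors)
```

Each document then maps to a fixed-length vector (four statistics times the embedding dimension in this sketch), regardless of how many words it contains, which is what allows classifiers such as Support Vector Machines or Random Forest to be trained on it.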

Keywords

Poverty, Natural language processing, Text classification, Word embedding, Document-level embedding, Machine learning
