Evaluating and mitigating gender bias in machine learning based resume filtering

Gagandeep, Jaskirat Kaur, Sanket Mathur, Sukhpreet Kaur, Anand Nayyar, Simar Preet Singh, Sandeep Mathur

Multimedia Tools and Applications (2024)

Abstract

The shortlisting of resumes for companies is increasingly automated using artificial intelligence; however, training systems for this task embeds strong social biases in the resulting models. Given the importance of mitigating the gender bias present in society, this research introduces a method, termed Gender Masking, that hides gender-specific terms in the data before its similarity with the job requirements is computed. The paper thereby proposes a way to reduce the influence of social biases on machine learning based resume filtering algorithms. In addition, an evaluation method is proposed to justify excluding gender-specific terms when classifying resumes shortlisted for a particular role based on its requirements. The novelty of the proposed method is that, once information has been extracted from the resume using probabilistic indexing, the gender-specific terms are masked. The masked corpus, represented as word encodings, is compared against the stated requirements, and cosine similarity yields a score for how closely the resume matches the posting. The proposed model is evaluated on a gender-swapped corpus to check that the algorithm performs without bias. The evaluation method measures how performance changes when the gender in the text is swapped; this variation reflects the unintended differences the algorithm captures from biases present in society. The experiments are carried out on preprocessed datasets (Online Resume Datasets), of which an average of 15.46% are observed to be affected by gender bias; the proposed method removes this bias. In the reported results, the trained Random Forest model shows an average accuracy increase of 1.2%, outperforming state-of-the-art techniques based on training generic Linear SVM, Logistic Regression, and Multinomial Naive Bayes models.
The model is regularized to a maximum of 100 trees in the ensemble, a maximum depth of 20, and a minimum of 10 samples required to split a node.
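
The masking, cosine-similarity scoring, and gender-swap evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the term lexicon `GENDER_SWAP`, the mask token `MASK`, the bag-of-words encoding, and the helper names (`tokenize`, `mask_gender`, `swap_gender`, `cosine_similarity`, `swap_gap`) are all assumptions made for illustration, and the probabilistic-indexing extraction step is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny lexicon; the paper's actual term list is not given here.
GENDER_SWAP = {
    "he": "she", "she": "he",
    "his": "her", "her": "his",
    "mr": "ms", "ms": "mr",
    "male": "female", "female": "male",
    "man": "woman", "woman": "man",
}
GENDER_TERMS = set(GENDER_SWAP)
MASK = "<gender>"  # assumed placeholder token


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def mask_gender(tokens):
    """Replace every gender-specific token with the mask token."""
    return [MASK if t in GENDER_TERMS else t for t in tokens]


def swap_gender(tokens):
    """Build the gender-swapped counterpart of a token list."""
    return [GENDER_SWAP.get(t, t) for t in tokens]


def cosine_similarity(a_tokens, b_tokens):
    """Cosine similarity between bag-of-words count vectors."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def swap_gap(resume, posting, masked):
    """Absolute change in the similarity score when the resume's gender is swapped."""
    tokens = tokenize(resume)
    swapped = swap_gender(tokens)
    if masked:
        tokens, swapped = mask_gender(tokens), mask_gender(swapped)
    requirements = tokenize(posting)
    return abs(cosine_similarity(tokens, requirements)
               - cosine_similarity(swapped, requirements))


# Made-up example texts; the posting is intentionally gendered.
resume = "She is a python developer; her projects use sql"
posting = "He should know python and sql"

print("gap without masking:", swap_gap(resume, posting, masked=False))
print("gap with masking:   ", swap_gap(resume, posting, masked=True))
```

In this toy example the unmasked score changes when the resume's gender is swapped, because the posting itself contains a gendered term, whereas the masked score does not change. A nonzero gap of this kind is the sort of unintended difference that the gender-swapped evaluation is meant to expose.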

Keywords

Gender Bias, Information extraction, Resume filtering, Resume classification, Word Embeddings, Vectorization, Gender masking
