Deciphering Student Coding Behavior: Interpretable Keystroke Features and Ensemble Strategies for Grade Prediction

Muhammad Fawad Akbar Khan, John Edwards, Paul M. Bodily, Hamid Karimi

2023 IEEE International Conference on Big Data (BigData) (2023)

Abstract

Keystroke data in programming reveals intricate patterns that reflect the behavior of programmers. These patterns hold promise for grade prediction and other applications, providing insights into the skills of both proficient and less proficient programmers. Analyzing these patterns can yield tailored feedback for students who need support, enabling effective interventions. Our study uses a keystroke dataset from the CS1 (Introduction to Computer Science) course at Utah State University. We developed novel features by combining elements such as key presses, timestamps, source locations, and programming terminology, drawing on prior research, our own insights, and an analysis of programming behavior. An ensemble-based feature selection method identifies key features, which are then used for hyperparameter optimization and grade prediction with six classification and three regression algorithms. We categorized grades into three levels: Low, Average, and High. Despite challenges such as class imbalance, plagiarism, limited data per assignment, and the ceiling effect, we attained a weighted F1 score of 78%. We also introduce an ensemble classification strategy that merges Isolation Forest outlier detection with a refined Random Forest classifier, achieving 80% accuracy on our test set. Additionally, we provide a detailed interpretation of our features, supported by results and a case study of our dataset. This research aims to improve the overall quality of undergraduate computer science education. Code and data are available at https://github.com/DSAatUSU/Student-Coding-Behavior.git.
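To make the evaluation metric concrete, here is a minimal Python sketch (not the authors' code) of binning numeric grades into the three levels and scoring predictions with a support-weighted F1. The cutoffs of 70 and 90 and the names grade_level and weighted_f1 are hypothetical; the abstract does not state how the levels were defined. Weighting each class's F1 by its number of true instances is the standard definition of weighted F1 (it matches scikit-learn's average='weighted').

```python
from typing import Sequence


def grade_level(score: float, low_cut: float = 70.0, high_cut: float = 90.0) -> str:
    """Map a numeric grade to Low/Average/High.

    The cutoffs are hypothetical placeholders, not the paper's thresholds.
    """
    if score < low_cut:
        return "Low"
    if score < high_cut:
        return "Average"
    return "High"


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean of per-class F1 scores, weighted by each class's support in y_true."""
    total = len(y_true)
    score = 0.0
    for label in set(y_true):
        support = sum(1 for t in y_true if t == label)
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = support - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        # Classes that appear only in y_pred have zero support, so they add nothing.
        score += f1 * support / total
    return score


# Illustrative toy inputs, not data from the study.
scores = [55, 72, 95, 88, 91, 64]
y_true = [grade_level(s) for s in scores]
y_pred = ["Low", "Average", "High", "High", "High", "Average"]
print(round(weighted_f1(y_true, y_pred), 3))  # → 0.656
```

Because each class's F1 is weighted by its support, a model that does poorly on a rare level is penalized less than under a macro average, so with imbalanced grade levels it can be useful to report both.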

Keywords

Keystroke, Programming, Python, Grade prediction, Machine learning