Exploring regular expression comprehension.

Carl Chapman,Peipei Wang,Kathryn T. Stolee

ASE (2017)

Abstract

The regular expression (regex) is a powerful tool employed in a large variety of software engineering tasks. However, prior work has shown that regexes can be very complex and that developers can find them difficult to compose and understand. This work seeks to identify code smells that impact comprehension. We conducted an empirical study of 42 pairs of behaviorally equivalent but syntactically different regexes with 180 participants and evaluated the understandability of various regex language features. We further analyzed regexes in GitHub to find the community standards, or common usages, of various features. We found that some regex representations are more understandable than others. For example, using a range (e.g., [0-9]) is often more understandable than a default character class (e.g., [\d]). We also found that the DFA size of a regex significantly affects comprehension for the regexes studied: the larger the DFA of a regex (up to size eight), the more understandable it was. Finally, we identify smelly and non-smelly regex representations based on a combination of community standards and understandability metrics.
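
To make "behaviorally equivalent but syntactically different" concrete, here is a minimal Python sketch built on the abstract's own example, the range `[0-9]` versus the default character class `[\d]`. The quantifier `+`, the sample strings, and the helper name `agree_on` are assumptions added for illustration and do not come from the paper. The `re.ASCII` flag is also an assumption: without it, Python's `\d` matches non-ASCII Unicode digits too, so the two forms would not be equivalent. Agreement on a handful of samples only illustrates equivalence and does not prove it.

```python
import re

# Two representations of "one or more digits" (illustrative pair, not from the paper).
# re.ASCII restricts \d to [0-9]; without it, \d also matches other Unicode digits.
RANGE_FORM = re.compile(r"[0-9]+", re.ASCII)
DEFAULT_CLASS_FORM = re.compile(r"[\d]+", re.ASCII)


def agree_on(samples):
    """Return True if both regexes give the same full-match verdict on every sample."""
    return all(
        bool(RANGE_FORM.fullmatch(s)) == bool(DEFAULT_CLASS_FORM.fullmatch(s))
        for s in samples
    )


# Hypothetical sample inputs: some all-digit strings, some not.
SAMPLES = ["0", "2017", "42a", "", "abc", "007"]

if __name__ == "__main__":
    print(agree_on(SAMPLES))
```

Both patterns accept and reject the same sample strings, so any difference a reader experiences between them is one of notation rather than behavior, which is the kind of difference the study measures.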

Keywords

Regular expression comprehension, equivalence class, regex representations
