A comparison of reinforcement learning frameworks for software testing tasks

arXiv (2023)

Abstract
Software testing activities scrutinize the artifacts and the behavior of a software product to find possible defects and ensure that the product meets its expected requirements. Although various software testing approaches have proven promising in revealing defects, some of them lack automation or are only partly automated, which increases testing time, the manpower needed, and overall software testing costs. Recently, Deep Reinforcement Learning (DRL) has been successfully employed in complex testing tasks such as game testing, regression testing, and test case prioritization to automate the process and provide continuous adaptation. Practitioners can employ DRL either by implementing a DRL algorithm from scratch or by using a DRL framework. DRL frameworks offer well-maintained implementations of state-of-the-art DRL algorithms to facilitate and speed up the development of DRL applications. Developers have widely used these frameworks to solve problems in various domains, including software testing. However, to the best of our knowledge, no study empirically evaluates the effectiveness and performance of the algorithms implemented in DRL frameworks. Moreover, the literature lacks guidelines that would help practitioners choose one DRL framework over another. In this paper, we therefore empirically investigate the application of carefully selected DRL algorithms (chosen based on the characteristics of the algorithms and environments) to two important software testing tasks: test case prioritization in the context of Continuous Integration (CI) and game testing. For the game testing task, we conduct experiments on a simple game and use DRL algorithms to explore the game and detect bugs. Results show that some of the selected DRL frameworks, such as Tensorforce, outperform recent approaches from the literature. To prioritize test cases, we run extensive experiments in a CI environment where DRL algorithms from different frameworks are used to rank the test cases. We find cases where our DRL configurations outperform the baseline implementation. Our results show that the performance difference between frameworks' implementations of the same algorithm can be considerable, motivating further investigation. Moreover, we recommend that researchers looking to select a DRL framework run empirical evaluations on benchmark problems to make sure that the DRL algorithms perform as intended.
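As context for how such frameworks are used, the following is a minimal sketch of driving one of the evaluated frameworks, Tensorforce, with one of its bundled algorithm implementations (PPO). The Gym environment, hyperparameters, and episode count are illustrative assumptions rather than the paper's experimental setup; a game-testing or test-prioritization study would substitute a custom environment exposing the relevant states, actions, and rewards.

    from tensorforce import Agent, Environment

    # Wrap a standard Gym task; a custom Environment subclass would model
    # the testing problem instead (e.g., game states or a CI test queue).
    environment = Environment.create(
        environment='gym', level='CartPole-v1', max_episode_timesteps=500
    )

    # Use one of the framework's maintained algorithm implementations.
    # batch_size and learning_rate here are illustrative, not tuned values.
    agent = Agent.create(
        agent='ppo', environment=environment,
        batch_size=10, learning_rate=1e-3
    )

    # Standard act/observe training loop.
    for _ in range(300):
        states = environment.reset()
        terminal = False
        while not terminal:
            actions = agent.act(states=states)
            states, terminal, reward = environment.execute(actions=actions)
            agent.observe(terminal=terminal, reward=reward)

    agent.close()
    environment.close()

Because most DRL frameworks expose their algorithms behind a similarly high-level API, swapping one framework for another is cheap in code but, as the paper's results indicate, not necessarily neutral in performance.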
Keywords
software testing