Applying Pairwise Combinatorial Testing to Large Language Model Testing

Bernhard Garn, Ludwig Kampel, Manuel Leithner, Berina Celic, Ceren Culha, Irene Hiess, Klaus Kieseberg, Marlene Koelbing, Dominik-Philip Schreiber, Michael Wagner, Christoph Wech, Jovan Zivanovic, Dimitris E. Simos

TESTING SOFTWARE AND SYSTEMS, ICTSS 2023 (2023)

Abstract

In this paper, we report on applying combinatorial testing to the testing of large language models (LLMs). Our aim is to pioneer the use of combinatorial testing in the realm of LLMs, e.g., for the generation of additional training or test data. We first describe how to create an input parameter model for the input of an LLM. Based on a given original sentence, we derive new sentences by replacing words with synonyms according to a combinatorial test set, which yields a specified level of coverage over synonyms while diversifying the input efficiently. Assuming that the semantics of the original sentence are retained in the derived sentences, we construct a test oracle based on existing annotations. In an experimental evaluation, we apply pairwise sentence test sets generated from the BoolQ benchmark set [4] to two LLMs (T5 [12] and LLaMa [15]). Having automated test sentence generation as well as test execution and analysis, our experimental evaluations demonstrate the applicability of pairwise combinatorial testing methods to LLMs.
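
The derivation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the greedy pairwise generator, the function names `pairwise_rows` and `derive_sentences`, the example question, and the synonym sets are all assumptions made for illustration. Each word position that has synonyms is treated as one parameter of the input parameter model, and rows are chosen until every pair of synonym choices appears together in at least one derived sentence. The sketch enumerates the full Cartesian product as its candidate pool, so it is only viable for small models.

```python
from itertools import combinations, product


def pairwise_rows(domains):
    """Greedily build a pairwise (strength-2) test set.

    domains: list of lists; domains[i] holds the distinct values of parameter i.
    Returns a list of tuples, one value per parameter.
    """
    k = len(domains)
    index_pairs = list(combinations(range(k), 2))
    # All value pairs that must appear together in at least one row.
    uncovered = {
        (i, a, j, b)
        for i, j in index_pairs
        for a in domains[i]
        for b in domains[j]
    }
    # Assumption: the model is small enough to enumerate every combination.
    candidates = list(product(*domains))

    def pairs_of(row):
        return {(i, row[i], j, row[j]) for i, j in index_pairs}

    rows = []
    while uncovered:
        # Pick the candidate that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda row: len(pairs_of(row) & uncovered))
        rows.append(best)
        uncovered -= pairs_of(best)
    return rows


def derive_sentences(words, synonyms):
    """Derive sentences by substituting synonyms according to a pairwise test set.

    words: tokens of the original sentence.
    synonyms: maps a token position to its list of alternatives
              (the original word is included as one alternative).
    """
    positions = sorted(synonyms)
    domains = [synonyms[p] for p in positions]
    sentences = []
    for row in pairwise_rows(domains):
        derived = list(words)
        for p, value in zip(positions, row):
            derived[p] = value
        sentences.append(" ".join(derived))
    return sentences


# Hypothetical question and synonym sets, for illustration only.
words = "is the movie based on a true story".split()
synonyms = {
    2: ["movie", "film", "picture"],
    6: ["true", "real"],
    7: ["story", "tale", "account"],
}
derived = derive_sentences(words, synonyms)
```

Under the abstract's assumption that synonym replacement preserves meaning, each sentence in `derived` would inherit the annotated answer of the original sentence as its expected output. For this toy model the pairwise set needs fewer sentences than the 18 produced by exhaustive substitution.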

Keywords

large language models, combinatorial testing
