Automatic Metamorphic Test Oracles for Action-Policy Testing.

Proceedings of the International Conference on Automated Planning and Scheduling (2023)

Saarland University | Aalborg University | TU Wien

Abstract
Testing is a promising way to gain trust in learned action policies π. Prior work on action-policy testing in AI planning formalized bugs as states t where π is sub-optimal with respect to a given testing objective. Deciding whether or not t is a bug is as hard as (optimal) planning itself. How can we design test oracles able to recognize some states t to be bugs efficiently? Recent work introduced metamorphic oracles which compare policy behavior on state pairs (s, t) where t is easier to solve; if π performs worse on t than on s, we know that t is a bug. Here, we show how to automatically design such oracles in classical planning, based on simulation relations between states. We introduce two oracle families of this kind: first, morphing query states t to obtain suitable s; second, maintaining and comparing upper bounds on h* across the states encountered during testing. Our experiments on ASNet policies show that these oracles can find bugs much more quickly than the existing alternatives, which are search-based; and that the combination of our oracles with search-based ones almost consistently dominates all other oracles.
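The metamorphic check described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from the paper: the toy task (an agent at integer distance p from the goal at 0, with unit-cost moves of -1, -2 or +1, so that h*(p) = ceil(p/2)), the names `run_policy`, `is_easier` and `metamorphic_oracle`, and the use of a plain "t <= s" test as a stand-in for a simulation relation. The reasoning is: if t is at least as easy as s, then h*(t) <= h*(s) <= cost of π on s, so a strictly higher policy cost on t proves that π is sub-optimal on t.

```python
from typing import Callable, Optional

State = int  # hypothetical toy state: distance from the goal at 0
Policy = Callable[[State], State]  # maps a state to its successor (unit cost)


def run_policy(policy: Policy, state: State, max_steps: int = 50) -> Optional[int]:
    """Return the number of unit-cost steps the policy takes to reach 0,
    or None if it does not get there within max_steps."""
    steps = 0
    while state != 0:
        if steps >= max_steps:
            return None
        state = policy(state)
        steps += 1
    return steps


def is_easier(t: State, s: State) -> bool:
    """Toy stand-in for a simulation relation: in this task a smaller
    distance is never harder, so t <= s implies h*(t) <= h*(s)."""
    return 0 <= t <= s


def metamorphic_oracle(policy: Policy, s: State, t: State) -> bool:
    """Return True only if the pair (s, t) proves that t is a bug."""
    if not is_easier(t, s):
        return False  # the pair allows no conclusion
    cost_s = run_policy(policy, s)
    if cost_s is None:
        return False  # no upper bound on h*(s), hence none on h*(t)
    cost_t = run_policy(policy, t)
    # Failing on t is treated as a bug, assuming max_steps is generous enough.
    return cost_t is None or cost_t > cost_s


def buggy_policy(p: State) -> State:
    """Optimal on even states, but detours via +1 on odd states."""
    return p - 2 if p % 2 == 0 else p + 1


def good_policy(p: State) -> State:
    """Always takes the largest useful step towards the goal."""
    return p - 2 if p >= 2 else 0


print(metamorphic_oracle(buggy_policy, s=4, t=3))
print(metamorphic_oracle(good_policy, s=4, t=3))
```

On the pair (4, 3) the buggy policy needs 2 steps from s = 4 but 3 steps from the easier t = 3, so t is reported as a bug without ever computing h*(3). Note that the oracle is one-sided: a `False` result does not certify that t is bug-free.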
Key words
Automated Testing, Mutation Testing
Related Papers

Curiosity-Driven Testing for Sequential Decision-Making Process

ICSE '24 Proceedings of the IEEE/ACM 46th International Conference on Software Engineering 2024

Cited by 0

New Fuzzing Biases for Action Policy Testing

Proceedings of the International Conference on Automated Planning and Scheduling 2024

Cited by 0

Chat Paper

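Key points: This paper proposes a method for automatically designing metamorphic test oracles in classical planning based on simulation relations. The oracles effectively identify bug states in action policies and improve testing efficiency.

Methods: Using simulation relations between states, the paper introduces two families of metamorphic test oracles: one morphs query states t to obtain suitable s; the other maintains and compares upper bounds on h* across the states encountered during testing.

Experiments: Experiments on ASNet policies show that these oracles find bugs faster than the existing search-based methods, and that combining them with search-based oracles almost always outperforms all other oracles.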
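
The second oracle family mentioned in the summary, maintaining and comparing upper bounds on h* across encountered states, might look roughly like the sketch below. This is an illustrative reading of the idea, not the paper's algorithm: the class `BoundOracle`, its `observe` method, the integer states and the `is_easier` callback (a hypothetical stand-in for a simulation relation, where `is_easier(t, s)` is assumed to imply h*(t) <= h*(s)) are all invented here. Each observed policy cost is an upper bound on h* for that state, and every stored bound for a state s also bounds h* for any state t that is at least as easy as s.

```python
from typing import Callable, Dict, Optional


class BoundOracle:
    """Keeps upper bounds on h* for states seen so far (illustrative sketch)."""

    def __init__(self, is_easier: Callable[[int, int], bool]) -> None:
        # is_easier(t, s) is assumed to imply h*(t) <= h*(s).
        self.is_easier = is_easier
        self.upper: Dict[int, int] = {}  # state -> best known upper bound on h*

    def observe(self, state: int, policy_cost: Optional[int]) -> bool:
        """Record the policy's cost on `state` (None if the policy failed).
        Return True if the stored bounds prove that `state` is a bug."""
        # Bounds inherited from stored states that `state` is at least as easy as.
        inherited = [u for s, u in self.upper.items() if self.is_easier(state, s)]
        bound = min(inherited) if inherited else None
        # A finite inherited bound shows `state` is solvable within that cost,
        # so failing or exceeding the bound proves sub-optimality.
        is_bug = bound is not None and (policy_cost is None or policy_cost > bound)
        # The tightest bound we now know for `state` itself.
        candidates = [c for c in (bound, policy_cost) if c is not None]
        if candidates:
            best = min(candidates)
            self.upper[state] = min(best, self.upper.get(state, best))
        return is_bug


# Toy usage: states are distances to a goal, and a smaller distance is never harder.
oracle = BoundOracle(is_easier=lambda t, s: 0 <= t <= s)
print(oracle.observe(4, 2))  # nothing stored yet, so no conclusion
print(oracle.observe(3, 3))  # compared against the bound stored for state 4
```

Compared with checking one hand-picked pair (s, t), this variant reuses every state met during testing as a potential s, at the price of a linear scan over the stored bounds per observation. A `False` result again means only that no bug could be proven.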