Alleviating overfitting in transformation-interaction-rational symbolic regression with multi-objective optimization

Genetic Programming and Evolvable Machines (2023)

Abstract
Transformation-Interaction-Rational is a representation for symbolic regression that restricts the search space to the ratio of two nonlinear functions, each defined as a linear regression over transformed variables. Its main objective is to bias the search towards simpler expressions while keeping the approximation power of standard approaches. Genetic Programming with this representation performed substantially better than with its predecessor (Interaction-Transformation) and ranked close to the state of the art on a contemporary symbolic regression benchmark. A closer look at these results showed that performance could be further improved by adding selective pressure for smaller expressions when the dataset contains only a few data points. Introducing a penalization term in the fitness measure improved the results on these smaller datasets. One problem with this approach is that it introduces two additional hyperparameters: (i) a criterion for when the penalization should be activated, and (ii) the amount of penalization applied to the fitness function. One way to alleviate the burden of setting these hyperparameters correctly is to pose the search as a multi-objective optimization problem that minimizes both the approximation error and the expression size. The main idea is that the selective pressure towards non-dominated solutions will return the simplest model for each particular approximation error on the Pareto front. In this paper, we extend Transformation-Interaction-Rational to support multi-objective optimization, specifically the NSGA-II algorithm, and apply it to the same benchmark. A detailed analysis of the results shows that multi-objective optimization benefits overall performance on a subset of the benchmarks while keeping results similar to the single-objective approach on the remaining datasets. For the small datasets specifically, we observe a small (and statistically insignificant) improvement, suggesting that further strategies must be explored.
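The two-objective idea described in the abstract (minimize approximation error and expression size, then keep the non-dominated models) can be illustrated with a minimal sketch. This is not the paper's implementation and not full NSGA-II, which additionally ranks the population into successive fronts and uses crowding distance; it shows only the non-dominated filter. The candidate expressions, error values, and sizes below are hypothetical, as are the names `dominates` and `pareto_front`.

```python
from typing import List, Tuple

# A candidate model: (expression, approximation error, expression size).
# All values used below are made up for illustration.
Candidate = Tuple[str, float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on both objectives (error, size)
    and strictly better on at least one. Both objectives are minimized."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


candidates: List[Candidate] = [
    ("x0", 0.90, 1),
    ("x0 / (1 + x1)", 0.40, 5),
    ("x0 * x1", 0.40, 3),
    ("tanh(x0) / (1 + x1**2)", 0.10, 9),
    ("exp(x0) + x1", 0.55, 6),
]

front = pareto_front(candidates)
for expr, err, size in front:
    print(f"error={err:.2f} size={size} expr={expr}")
```

In this toy population, `x0 / (1 + x1)` is discarded because `x0 * x1` reaches the same error with a smaller expression, which is the sense in which the front holds the simplest model for each error level.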
Keywords
Symbolic regression, Genetic programming, Multi-objective