G-NAS: Generalizable Neural Architecture Search for Single Domain Generalization Object Detection
Computing Research Repository (CoRR), 2024
Shanghai Jiao Tong University | Huawei Noah's Ark Lab
Abstract
In this paper, we focus on a realistic yet challenging task, Single Domain Generalization Object Detection (S-DGOD), where only one source domain's data can be used for training object detectors, which must then generalize to multiple distinct target domains. In S-DGOD, both high-capacity fitting and generalization abilities are needed due to the task's complexity. Differentiable Neural Architecture Search (NAS) is known for its high capacity for complex data fitting, and we propose to leverage Differentiable NAS to solve S-DGOD. However, it may confront severe over-fitting issues due to the feature imbalance phenomenon, where parameters optimized by gradient descent are biased toward the easy-to-learn features, which are usually non-causal and spuriously correlated with ground-truth labels, such as background features in object detection data. Consequently, this leads to serious performance degradation, especially when generalizing to unseen target domains with large gaps between the source domain and target domains. To address this issue, we propose the Generalizable loss (G-loss), an OoD-aware objective that prevents NAS from over-fitting by using gradient descent to optimize parameters not only on a subset of easy-to-learn features but also on the remaining predictive features needed for generalization; the overall framework is named G-NAS. Experimental results on the S-DGOD urban-scene datasets demonstrate that the proposed G-NAS achieves SOTA performance compared to baseline methods. Code is available at https://github.com/wufan-cse/G-NAS.
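The G-loss itself is defined in the paper; the abstract only describes its intent. As a loosely related illustration (not the paper's formulation), one common way to keep gradient descent from being dominated by easy-to-learn features is to partition per-sample losses into an easy subset and a harder remainder, then reweight the remainder so it still contributes to the gradient. The function name, parameters, and split heuristic below are hypothetical and chosen only for illustration:

```python
def ood_aware_loss(per_sample_losses, easy_fraction=0.5, hard_weight=1.0):
    """Illustrative sketch (not the paper's G-loss): combine the mean loss
    over the easiest samples (lowest loss) with a weighted mean loss over
    the remaining, harder samples, so optimization is not driven solely by
    easy-to-learn (often spurious) features.

    per_sample_losses: iterable of non-negative per-sample loss values.
    easy_fraction:     fraction of samples treated as "easy-to-learn".
    hard_weight:       weight on the remaining (harder) samples' loss.
    """
    losses = sorted(per_sample_losses)  # ascending: easiest samples first
    k = max(1, int(easy_fraction * len(losses)))
    easy = sum(losses[:k]) / k
    hard_part = losses[k:]
    hard = sum(hard_part) / len(hard_part) if hard_part else 0.0
    return easy + hard_weight * hard

# Example: with losses [0.1, 0.2, 1.0, 2.0] and a 50/50 split,
# easy = 0.15 and hard = 1.5, giving 1.65 at hard_weight=1.0.
print(ood_aware_loss([0.1, 0.2, 1.0, 2.0]))
```

Setting `hard_weight > 1` further emphasizes the harder samples, which in the feature-imbalance view correspond to the remaining predictive features the abstract refers to.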
Key words
Object Detection