s complexity measure, $d$ is the dimension of the representation (usually $d\\ll \\mathcal{C}(\\Phi)$) and $n$ is the number of samples for the new task. Thus the required $n$ is $O(\\kappa d H^4)$ for the sub-optimality to be close to zero, which is much smaller than $O(\\mathcal{C}(\\Phi)^2\\kappa d H^4)$ in the setting without multitask representation learning, whose sub-optimality gap is $\\tilde{O}(H^2\\sqrt{\\frac{\\kappa \\mathcal{C}(\\Phi)^2d}{n}})$. This theoretically explains the power of multitask representation learning in reducing sample complexity. Further, we note that to ensure high sample efficiency, the LAFA criterion $\\kappa$ should be small. In fact, $\\kappa$ varies widely in magnitude depending on the different sampling distribution for new task. This indicates adaptive sampling technique is important to make $\\kappa$ solely depend on $d$. Finally, we provide empirical results of a noisy grid-world environment to corroborate our theoretical findings. ","authors":[{"name":"Rui Lu"},{"id":"540835d9dabfae44f0870362","name":"Gao Huang"},{"name":"Simon S. Du"}],"flags":[{"flag":"affirm_author","person_id":"540835d9dabfae44f0870362"}],"id":"60cbe98b91e011eef576dab8","lang":"en","num_citation":0,"order":1,"pdf":"https:\u002F\u002Fstatic.aminer.cn\u002Fstorage\u002Fpdf\u002Farxiv\u002F21\u002F2106\u002F2106.08053.pdf","title":"On the Power of Multitask Representation Learning in Linear MDP","urls":["https:\u002F\u002Farxiv.org\u002Fabs\u002F2106.08053"],"versions":[{"id":"60cbe98b91e011eef576dab8","sid":"2106.08053","src":"arxiv","year":2021}],"year":2021},{"abstract":" Learning from datasets without interaction with environments (Offline Learning) is an essential step to apply Reinforcement Learning (RL) algorithms in real-world scenarios. However, compared with the single-agent counterpart, offline multi-agent RL introduces more agents with the larger state and action space, which is more challenging but attracts little attention. We demonstrate current offline RL algorithms are ineffective in multi-agent systems due to the accumulated extrapolation error. In this paper, we propose a novel offline RL algorithm, named Implicit Constraint Q-learning (ICQ), which effectively alleviates the extrapolation error by only trusting the state-action pairs given in the dataset for value estimation. Moreover, we extend ICQ to multi-agent tasks by decomposing the joint-policy under the implicit constraint. Experimental results demonstrate that the extrapolation error is reduced to almost zero and insensitive to the number of agents. We further show that ICQ achieves the state-of-the-art performance in the challenging multi-agent offline tasks (StarCraft II). ","authors":[{"name":"Yiqin Yang"},{"name":"Xiaoteng Ma"},{"name":"Chenghao Li"},{"name":"Zewu Zheng"},{"name":"Qiyuan Zhang"},{"id":"540835d9dabfae44f0870362","name":"Gao Huang"},{"name":"Jun Yang"},{"name":"Qianchuan Zhao"}],"flags":[{"flag":"affirm_author","person_id":"540835d9dabfae44f0870362"}],"id":"60c1a0af91e0112cf43c214c","lang":"en","num_citation":0,"order":5,"pdf":"https:\u002F\u002Fstatic.aminer.cn\u002Fstorage\u002Fpdf\u002Farxiv\u002F21\u002F2106\u002F2106.03400.pdf","title":"Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning","urls":["https:\u002F\u002Farxiv.org\u002Fabs\u002F2106.03400"],"versions":[{"id":"60c1a0af91e0112cf43c214c","sid":"2106.03400","src":"arxiv","year":2021}],"year":2021},{"abstract":" Vision Transformers (ViT) have achieved remarkable success in large-scale image recognition. They split every 2D image into a fixed number of patches, each of which is treated as a token. Generally, representing an image with more tokens would lead to higher prediction accuracy, while it also results in drastically increased computational cost. To achieve a decent trade-off between accuracy and speed, the number of tokens is empirically set to 16x16. In this paper, we argue that every image has its own characteristics, and ideally the token number should be conditioned on each individual input. In fact, we have observed that there exist a considerable number of \"easy\" images which can be accurately predicted with a mere number of 4x4 tokens, while only a small fraction of \"hard\" ones need a finer representation. Inspired by this phenomenon, we propose a Dynamic Transformer to automatically configure a proper number of tokens for each input image. This is achieved by cascading multiple Transformers with increasing numbers of tokens, which are sequentially activated in an adaptive fashion at test time, i.e., the inference is terminated once a sufficiently confident prediction is produced. We further design efficient feature reuse and relationship reuse mechanisms across different components of the Dynamic Transformer to reduce redundant computations. Extensive empirical results on ImageNet, CIFAR-10, and CIFAR-100 demonstrate that our method significantly outperforms the competitive baselines in terms of both theoretical computational efficiency and practical inference speed. ","authors":[{"name":"Yulin Wang"},{"name":"Rui Huang"},{"name":"Shiji Song"},{"name":"Zeyi Huang"},{"id":"540835d9dabfae44f0870362","name":"Gao Huang"}],"flags":[{"flag":"affirm_author","person_id":"540835d9dabfae44f0870362"}],"id":"60b712c391e011903fc2bbe2","num_citation":0,"order":4,"pdf":"https:\u002F\u002Fstatic.aminer.cn\u002Fstorage\u002Fpdf\u002Farxiv\u002F21\u002F2105\u002F2105.15075.pdf","title":"Not All Images are Worth 16x16 Words: Dynamic Vision Transformers with Adaptive Sequence Length","urls":["https:\u002F\u002Farxiv.org\u002Fabs\u002F2105.15075"],"versions":[{"id":"60b712c391e011903fc2bbe2","sid":"2105.15075","src":"arxiv","year":2021}],"year":2021},{"abstract":"Reinforcement learning (RL) is a promising technique for designing a model-free controller by interacting with the environment. Several researchers have applied RL to autonomous underwater vehicles (AUVs) for motion control, such as trajectory tracking. However, the existing RL-based controller usually assumes that the unknown AUV dynamics keep invariant during the operation period, limiting its further application in the complex underwater environment. In this article, a novel meta-RL-based control scheme is proposed for trajectory tracking control of AUV in the presence of unknown and time-varying dynamics. To this end, we divide the tracking task for AUV with time-varying dynamics into multiple specific tasks with fixed time-varying dynamics, to which we apply meta-RL for training to distill the general control policy. The obtained control policy can transfer to the testing phase with high adaptability. Inspired by the line-of-sight (LOS) tracking rule, we formulate each specific task as a Markov decision process (MDP) with a well-designed state and reward function. Furthermore, a novel policy network with an attention module is proposed to extract the hidden information of AUV dynamics. The simulation environment with time-varying dynamics is established, and the simulation results reveal the effectiveness of our proposed method.","authors":[{"name":"Peng Jiang"},{"name":"Shiji Song"},{"id":"540835d9dabfae44f0870362","name":"Gao Huang"}],"doi":"10.1109\u002FTNNLS.2021.3079148","flags":[{"flag":"affirm_author","person_id":"540835d9dabfae44f0870362"}],"id":"60b0ca96e4510cd7c8ea3e35","num_citation":0,"order":2,"pages":{"end":"14","start":"1"},"title":"Attention-Based Meta-Reinforcement Learning for Tracking Control of AUV With Time-Varying Dynamics.","urls":["https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpubmed\u002F34029197","https:\u002F\u002Feuropepmc.org\u002Farticle\u002FMED\u002F34029197"],"venue":{"info":{"name":"IEEE transactions on neural networks and learning systems"},"volume":"PP"},"versions":[{"id":"60b0ca96e4510cd7c8ea3e35","sid":"34029197","src":"pubmed","vsid":"101616214","year":2021},{"id":"60e2dc835244ab9dcbfb81f1","sid":"3164385739","src":"mag","vsid":"42080949","year":2021}],"year":2021},{"abstract":" In this paper, we explore the spatial redundancy in video recognition with the aim to improve the computational efficiency. It is observed that the most informative region in each frame of a video is usually a small image patch, which shifts smoothly across frames. Therefore, we model the patch localization problem as a sequential decision task, and propose a reinforcement learning based approach for efficient spatially adaptive video recognition (AdaFocus). In specific, a light-weighted ConvNet is first adopted to quickly process the full video sequence, whose features are used by a recurrent policy network to localize the most task-relevant regions. Then the selected patches are inferred by a high-capacity network for the final prediction. During offline inference, once the informative patch sequence has been generated, the bulk of computation can be done in parallel, and is efficient on modern GPU devices. In addition, we demonstrate that the proposed method can be easily extended by further considering the temporal redundancy, e.g., dynamically skipping less valuable frames. Extensive experiments on five benchmark datasets, i.e., ActivityNet, FCVID, Mini-Kinetics, Something-Something V1&V2, demonstrate that our method is significantly more efficient than the competitive baselines. Code will be available at https:\u002F\u002Fgithub.com\u002Fblackfeather-wang\u002FAdaFocus. ","authors":[{"name":"Yulin Wang"},{"name":"Zhaoxi Chen"},{"name":"Haojun Jiang"},{"name":"Shiji Song"},{"name":"Yizeng Han"},{"id":"540835d9dabfae44f0870362","name":"Gao Huang"}],"flags":[{"flag":"affirm_author","person_id":"540835d9dabfae44f0870362"}],"id":"6099028591e011aa8bcb6e2c","num_citation":0,"order":5,"title":"Adaptive Focus for Efficient Video Recognition","urls":["https:\u002F\u002Farxiv.org\u002Fabs\u002F2105.03245"],"versions":[{"id":"6099028591e011aa8bcb6e2c","sid":"2105.03245","src":"arxiv","year":2021}],"year":2021},{"abstract":" Reusing features in deep networks through dense connectivity is an effective way to achieve high computational efficiency. The recent proposed CondenseNet has shown that this mechanism can be further improved if redundant features are removed. In this paper, we propose an alternative approach named sparse feature reactivation (SFR), aiming at actively increasing the utility of features for reusing. In the proposed network, named CondenseNetV2, each layer can simultaneously learn to 1) selectively reuse a set of most important features from preceding layers; and 2) actively update a set of preceding features to increase their utility for later layers. Our experiments show that the proposed models achieve promising performance on image classification (ImageNet and CIFAR) and object detection (MS COCO) in terms of both theoretical efficiency and practical speed. ","authors":[{"name":"Le Yang"},{"name":"Haojun Jiang"},{"name":"Ruojin Cai"},{"name":"Yulin Wang"},{"name":"Shiji Song"},{"id":"540835d9dabfae44f0870362","name":"Gao Huang"},{"name":"Qi Tian"}],"flags":[{"flag":"affirm_author","person_id":"540835d9dabfae44f0870362"}],"id":"607420b091e011c3d22ae837","num_citation":0,"order":5,"pdf":"https:\u002F\u002Fstatic.aminer.cn\u002Fstorage\u002Fpdf\u002Farxiv\u002F21\u002F2104\u002F2104.04382.pdf","title":"CondenseNet V2: Sparse Feature Reactivation for Deep Networks","urls":["https:\u002F\u002Farxiv.org\u002Fabs\u002F2104.04382","https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2021\u002Fhtml\u002FYang_CondenseNet_V2_Sparse_Feature_Reactivation_for_Deep_Networks_CVPR_2021_paper.html"],"venue":{"info":{"name":"Proceedings of the IEEE\u002FCVF Conference on Computer Vision and Pattern Recognition (CVPR)"}},"versions":[{"id":"607420b091e011c3d22ae837","sid":"2104.04382","src":"arxiv","year":2021},{"id":"60cb2a281bc21f07d0810f76","sid":"cvpr2021#96","src":"conf_cvpr","year":2021}],"year":2021},{"abstract":" Significant progress has been achieved in automating the design of various components in deep networks. However, the automatic design of loss functions for generic tasks with various evaluation metrics remains under-investigated. Previous works on handcrafting loss functions heavily rely on human expertise, which limits their extendibility. Meanwhile, existing efforts on searching loss functions mainly focus on specific tasks and particular metrics, with task-specific heuristics. Whether such works can be extended to generic tasks is not verified and questionable. In this paper, we propose AutoLoss-Zero, the first general framework for searching loss functions from scratch for generic tasks. Specifically, we design an elementary search space composed only of primitive mathematical operators to accommodate the heterogeneous tasks and evaluation metrics. A variant of the evolutionary algorithm is employed to discover loss functions in the elementary search space. A loss-rejection protocol and a gradient-equivalence-check strategy are developed so as to improve the search efficiency, which are applicable to generic tasks. Extensive experiments on various computer vision tasks demonstrate that our searched loss functions are on par with or superior to existing loss functions, which generalize well to different datasets and networks. Code shall be released. ","authors":[{"name":"Hao Li"},{"name":"Tianwen Fu"},{"id":"562c83cd45cedb3398c48fe5","name":"Jifeng Dai"},{"name":"Hongsheng Li"},{"id":"540835d9dabfae44f0870362","name":"Gao Huang"},{"name":"Xizhou Zhu"}],"flags":[{"flag":"affirm_author","person_id":"540835d9dabfae44f0870362"}],"id":"605dbacb91e0113c28655a7b","num_citation":0,"order":4,"title":"AutoLoss-Zero: Searching Loss Functions from Scratch for Generic Tasks","urls":["https:\u002F\u002Farxiv.org\u002Fabs\u002F2103.14026"],"versions":[{"id":"605dbacb91e0113c28655a7b","sid":"2103.14026","src":"arxiv","year":2021}],"year":2021},{"abstract":"This paper addresses the problem of Sketch-Based Image Retrieval (SBIR), for which bridge the gap between the data representations of sketch images and photo images is considered as the key. Previous works mostly focus on learning a feature space to minimize intra-class distances for both sketches and photos. In contrast, we propose a novel loss function, named Euclidean Margin Softmax (EMS), that not only minimizes intra-class distances but also maximizes inter-class distances simultaneously. It enables us to learn a feature space with high discriminability, leading to highly accurate retrieval. In addition, this loss function is applied to a conditional network architecture, which could incorporate the prior knowledge of whether a sample is a sketch or a photo. We show that the conditional information can be conveniently incorporated to the recently proposed Squeeze and Excitation (SE) module, lead to a conditional SE (CSE) module. Extensive experiments are conducted on two widely used SBIR benchmark datasets. Our approach, although being very simple, achieved new state-of-the-art on both datasets, surpassing existing methods by a large margin.","authors":[{"name":"Peng Lu"},{"id":"540835d9dabfae44f0870362","name":"Gao Huang"},{"id":"53f43b18dabfaee43ec5dbb7","name":"Yanwei Fu"},{"id":"53f4826cdabfaedf43690092","name":"Guodong Guo"},{"name":"Hangyu Lin"}],"doi":"","flags":[{"flag":"affirm_author","person_id":"540835d9dabfae44f0870362"}],"id":"5c2c7a9217c44a4e7cf31763","lang":"en","num_citation":8,"order":1,"pages":{"end":"","start":""},"pdf":"https:\u002F\u002Fstatic.aminer.cn\u002Fstorage\u002Fpdf\u002Farxiv\u002F18\u002F1812\u002F1812.04275.pdf","title":"Learning Large Euclidean Margin for Sketch-based Image Retrieval.","urls":["http:\u002F\u002Farxiv.org\u002Fabs\u002F1812.04275","https:\u002F\u002Farxiv.org\u002Fabs\u002F1812.04275"],"venue":{"info":{"name":"arXiv: Computer Vision and Pattern Recognition"},"issue":"","volume":"abs\u002F1812.04275"},"versions":[{"id":"5e8d92949fced0a24b61b240","sid":"journals\u002Fcorr\u002Fabs-1812-04275","src":"dblp","vsid":"journals\u002Fcorr","year":2018},{"id":"5ce2d093ced107d4c6393fbf","sid":"2903978117","src":"mag","vsid":"2597175965","year":2018},{"id":"5f02bcdbdfae54360a4605d5","sid":"1812.04275","src":"arxiv","year":2021}],"year":2021},{"authors":[{"name":"Shu-Li Cheng"},{"name":"Lie-Jun Wang"},{"id":"540835d9dabfae44f0870362","name":"Gao Huang"},{"name":"An-Yu Du"}],"doi":"10.1007\u002Fs11042-019-07753-4","flags":[{"flag":"affirm_author","person_id":"540835d9dabfae44f0870362"}],"id":"5d0b00b18607575390fdb55d","lang":"en","num_citation":0,"order":2,"pages":{"end":"22755","start":"22733"},"title":"A privacy-preserving image retrieval scheme based secure kNN, DNA coding and deep hashing","venue":{"info":{"name":"Multim. Tools Appl."},"issue":"15","volume":"80"},"versions":[{"id":"5d0b00b18607575390fdb55d","sid":"2946753373","src":"mag","vsid":"110206669","year":2019},{"id":"60f2ae375244ab9dcbb99053","sid":"journals\u002Fmta\u002FChengWHD21","src":"dblp","vsid":"journals\u002Fmta","year":2021}],"year":2021},{"abstract":"Abstract Image-text matching aims to find the relationship between image and text data and to establish a connection between them. The main challenge of image-text matching is the fact that images and texts have different data distributions and feature representations. Current methods for image-text matching fall into two basic types: methods that map image and text data into a common space and then use distance measurements and methods that treat image-text matching as a classification problem. In both cases, the two data modes used are image and text data. In our method, we create a fusion layer to extract intermediate modes, thus improving the image-text processing results. We also propose a concise way to update the loss function that makes it easier for neural networks to handle difficult problems. The proposed method was verified on the Flickr30K and MS-COCO datasets and achieved superior matching results compared to existing methods.","authors":[{"name":"Depeng Wang"},{"name":"Liejun Wang"},{"name":"Shiji Song"},{"id":"540835d9dabfae44f0870362","name":"Gao Huang"},{"name":"Yuchen Guo"},{"name":"Shuli Cheng"},{"name":"Naixiang Ao"},{"name":"Anyu Du"}],"doi":"10.1016\u002Fj.neucom.2021.01.124","id":"605021c19e795e8427406e85","lang":"en","num_citation":0,"order":3,"pages":{"end":"259","start":"249"},"title":"Fusion Layer Attention for Image-Text Matching","urls":["https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fabs\u002Fpii\u002FS0925231221002319","https:\u002F\u002Facademic.microsoft.com\u002Fpaper\u002F3132567060\u002Fcitedby\u002Fsearch?q=Fusion+Layer+Attention+for+Image-Text+Matching&qe=RId%253D1985105481&f=&orderBy=0","db\u002Fjournals\u002Fijon\u002Fijon442.html#WangWSHGCAD21","https:\u002F\u002Fdoi.org\u002F10.1016\u002Fj.neucom.2021.01.124"],"venue":{"info":{"name":"Neurocomputing"},"volume":"442"},"versions":[{"id":"605021c19e795e8427406e85","sid":"605021c19e795e8427406e85","src":"user-5d8054e8530c708f9920ccce","year":2021},{"id":"605aa4e3e4510cd7c86f990c","sid":"3132567060","src":"mag","vsid":"45693802","year":2021},{"id":"609f9f5ce4510cd7c8322565","sid":"journals\u002Fijon\u002FWangWSHGCAD21","src":"dblp","vsid":"journals\u002Fijon","year":2021}],"year":2021},{"abstract":"Graph Convolutional Networks (GCNs) have attracted a lot of research interest in the machine learning community in recent years. Although many variants have been proposed, we still lack a systematic view of different GCN models and deep understanding of the relations among them. In this paper, we take a step forward to establish a unified framework for convolution-based graph neural networks, by formulating the basic graph convolution operation as an optimization problem in the graph Fourier space. Under this framework, a variety of popular GCN models, including the vanilla-GCNs, attention-based GCNs and topology-based GCNs, can be interpreted as a same optimization problem but with different carefully designed regularizers. This novel perspective enables a better understanding of the similarities and differences among many widely used GCNs, and may inspire new approaches for designing better models. As a showcase, we also present a novel regularization technique under the proposed framework to tackle the oversmoothing problem in graph convolution. The effectiveness of the newly designed model is validated empirically.","authors":[{"name":"Xuran Pan"},{"name":"Shiji Song"},{"id":"540835d9dabfae44f0870362","name":"Gao Huang"}],"id":"605021c19e795e8427406e83","lang":"en","num_citation":0,"order":2,"pdf":"https:\u002F\u002Fopenreview.net\u002Fpdf?id=zUMD--Fb9Bt","title":"A Unified Framework for Convolution-based Graph Neural Networks","urls":["https:\u002F\u002Facademic.microsoft.com\u002Fpaper\u002F3125667605\u002Fcitedby\u002Fsearch?q=A+Unified+Framework+for+Convolution-based+Graph+Neural+Networks&qe=RId%253D1985105481&f=&orderBy=0"],"versions":[{"id":"605021c19e795e8427406e83","sid":"605021c19e795e8427406e83","src":"user-5d8054e8530c708f9920ccce","year":2021}],"year":2021},{"authors":[{"id":"542dfe6fdabfaed7c7c2e05e","name":"Le Yang"},{"id":"53f633f7dabfae43d83fa719","name":"Shiji Song"},{"name":"Shuang Li"},{"id":"5447f7c0dabfae87b7db6e28","name":"Yiming Chen"},{"id":"540835d9dabfae44f0870362","name":"Gao Huang"}],"doi":"10.1109\u002FTSMC.2019.2931003","flags":[{"flag":"affirm_author","person_id":"540835d9dabfae44f0870362"}],"id":"5db9289147c8f766461defa1","num_citation":1,"order":4,"pages":{"end":"4273","start":"4262"},"title":"Graph Embedding-Based Dimension Reduction With Extreme Learning Machine","venue":{"info":{"name":"IEEE Trans. Syst. Man Cybern. Syst."},"issue":"7","volume":"51"},"versions":[{"id":"5db9289147c8f766461defa1","sid":"2969266941","src":"mag","vsid":"76152103","year":2019},{"id":"60dc41cf1e8bd4ede2dbe496","sid":"journals\u002Ftsmc\u002FYangSLCH21","src":"dblp","vsid":"journals\u002Ftsmc","year":2021}],"year":2021},{"abstract":" Transformer is a ubiquitous model for natural language processing and has attracted wide attentions in computer vision. The attention maps are indispensable for a transformer model to encode the dependencies among input tokens. However, they are learned independently in each layer and sometimes fail to capture precise patterns. In this paper, we propose a novel and generic mechanism based on evolving attention to improve the performance of transformers. On one hand, the attention maps in different layers share common knowledge, thus the ones in preceding layers can instruct the attention in succeeding layers through residual connections. On the other hand, low-level and high-level attentions vary in the level of abstraction, so we adopt convolutional layers to model the evolutionary process of attention maps. The proposed evolving attention mechanism achieves significant performance improvement over various state-of-the-art models for multiple tasks, including image classification, natural language understanding and machine translation. ","authors":[{"id":"54082e5bdabfae8faa62de9a","name":"Yujing Wang"},{"name":"Yaming Yang"},{"name":"Jiangang Bai"},{"id":"53f3640adabfae4b34992817","name":"Mingliang Zhang"},{"id":"542d65d4dabfae11fc468cf4","name":"Jing Bai"},{"id":"560defcb45cedb3397628128","name":"Jing Yu"},{"id":"542a47c5dabfae61d49632fd","name":"Ce Zhang"},{"id":"540835d9dabfae44f0870362","name":"Gao Huang"},{"id":"53f474c3dabfaedf4367defe","name":"Yunhai Tong"}],"flags":[{"flag":"affirm_author","person_id":"540835d9dabfae44f0870362"}],"id":"6038cc6691e011c1c59ed2ba","num_citation":0,"order":7,"pages":{"end":"10980","start":"10971"},"pdf":"https:\u002F\u002Fstatic.aminer.cn\u002Fstorage\u002Fpdf\u002Farxiv\u002F21\u002F2102\u002F2102.12895.pdf","title":"Evolving Attention with Residual Convolutions","urls":["https:\u002F\u002Farxiv.org\u002Fabs\u002F2102.12895","https:\u002F\u002Ficml.cc\u002FConferences\u002F2021\u002FAcceptedPapersInitial","https:\u002F\u002Fdblp.org\u002Frec\u002Fconf\u002Ficml\u002FWangYBZBY0HT21","http:\u002F\u002Fproceedings.mlr.press\u002Fv139\u002Fwang21ab.html"],"venue":{"info":{"name":"ICML"}},"versions":[{"id":"6038cc6691e011c1c59ed2ba","sid":"2102.12895","src":"arxiv","year":2021},{"id":"60bdde338585e32c38af4e57","sid":"icml2021#78","src":"conf_icml","year":2021},{"id":"60f16acf91e011963c8d406c","sid":"conf\u002Ficml\u002FWangYBZBY0HT21","src":"dblp","vsid":"conf\u002Ficml","year":2021}],"year":2021},{"abstract":" Dynamic neural network is an emerging research topic in deep learning. Compared to static models which have fixed computational graphs and parameters at the inference stage, dynamic networks can adapt their structures or parameters to different inputs, leading to notable advantages in terms of accuracy, computational efficiency, adaptiveness, etc. In this survey, we comprehensively review this rapidly developing area by dividing dynamic networks into three main categories: 1) instance-wise dynamic models that process each instance with data-dependent architectures or parameters; 2) spatial-wise dynamic networks that conduct adaptive computation with respect to different spatial locations of image data and 3) temporal-wise dynamic models that perform adaptive inference along the temporal dimension for sequential data such as videos and texts. The important research problems of dynamic networks, e.g., architecture design, decision making scheme, optimization technique and applications, are reviewed systematically. Finally, we discuss the open problems in this field together with interesting future research directions. ","authors":[{"name":"Yizeng Han"},{"id":"540835d9dabfae44f0870362","name":"Gao Huang"},{"name":"Shiji Song"},{"name":"Le Yang"},{"name":"Honghui Wang"},{"name":"Yulin Wang"}],"flags":[{"flag":"affirm_author","person_id":"540835d9dabfae44f0870362"}],"id":"6023d93f91e0119b5fbd9882","num_citation":3,"order":1,"title":"Dynamic Neural Networks: A Survey","urls":["https:\u002F\u002Farxiv.org\u002Fabs\u002F2102.04906"],"versions":[{"id":"6023d93f91e0119b5fbd9882","sid":"2102.04906","src":"arxiv","year":2021}],"year":2021},{"abstract":"Due to the need to store the intermediate activations for back-propagation, end-to-end (E2E) training of deep networks usually suffers from high GPUs memory footprint. This paper aims to address this problem by revisiting the locally supervised learning, where a network is split into gradient-isolated modules and trained with local supervision. We experimentally show that simply training local modules with E2E loss tends to collapse task-relevant information at early layers, and hence hurts the performance of the full model. To avoid this issue, we propose an information propagation (InfoPro) loss, which encourages local modules to preserve as much useful information as possible, while progressively discard task-irrelevant information. As InfoPro loss is difficult to compute in its original form, we derive a feasible upper bound as a surrogate optimization objective, yielding a simple but effective algorithm. In fact, we show that the proposed method boils down to minimizing the combination of a reconstruction loss and a normal cross-entropy\u002Fcontrastive term. Extensive empirical results on five datasets (i.e., CIFAR, SVHN, STL-10, ImageNet and Cityscapes) validate that InfoPro is capable of achieving competitive performance with less than 40% memory footprint compared to E2E training, while allowing using training data with higher-resolution or larger batch sizes under the same GPU memory constraint. Our method also enables training local modules asynchronously for potential training acceleration.","authors":[{"name":"Yulin Wang"},{"name":"Zanlin Ni"},{"id":"53f633f7dabfae43d83fa719","name":"Shiji Song"},{"id":"542dfe6fdabfaed7c7c2e05e","name":"Le Yang"},{"id":"540835d9dabfae44f0870362","name":"Gao Huang"}],"flags":[{"flag":"affirm_author","person_id":"540835d9dabfae44f0870362"}],"id":"600832cd9e795ed227f53133","lang":"en","num_citation":1,"order":4,"pdf":"https:\u002F\u002Fstatic.aminer.cn\u002Fupload\u002Fpdf\u002F10\u002F1449\u002F806\u002F600832cd9e795ed227f53133_0.pdf","title":"Revisiting Locally Supervised Learning: an Alternative to End-to-end Training","urls":["https:\u002F\u002Fopenreview.net\u002Fforum?id=fAbkE6ant2","https:\u002F\u002Farxiv.org\u002Fabs\u002F2101.10832","https:\u002F\u002Fdblp.uni-trier.de\u002Fdb\u002Fjournals\u002Fcorr\u002Fcorr2101.html#abs-2101-10832","https:\u002F\u002Fopenreview.net\u002Fpdf?id=fAbkE6ant2","https:\u002F\u002Fdblp.org\u002Frec\u002Fconf\u002Ficlr\u002FWangNSYH21"],"venue":{"info":{"name":"ICLR"}},"versions":[{"id":"600832cd9e795ed227f53133","sid":"600832cd9e795ed227f53133","src":"user-57f9ed429ed5dbd78af2a05d","year":2021},{"id":"60115cbb91e0117654c5ff5b","sid":"2101.10832","src":"arxiv","year":2021},{"id":"605a9ff2e4510cd7c86861bc","sid":"3126438511","src":"mag","vsid":"2584161585","year":2021},{"id":"60d5b37191e011c8cebefedf","sid":"conf\u002Ficlr\u002FWangNSYH21","src":"dblp","vsid":"conf\u002Ficlr","year":2021}],"year":2021},{"abstract":"Semantic segmentation is a challenging task that needs to handle large scale variations, deformations, and different viewpoints. In this paper, we develop a novel network named Gated Path Selection Network (GPSNet), which aims to adaptively select receptive fields while maintaining the dense sampling capability. In GPSNet, we first design a two-dimensional SuperNet, which densely incorporates features from growing receptive fields. And then, a Comparative Feature Aggregation (CFA) module is introduced to dynamically aggregate discriminative semantic context. In contrast to previous works that focus on optimizing sparse sampling locations on regular grids, GPSNet can adaptively harvest free form dense semantic context information. The derived adaptive receptive fields and dense sampling locations are data-dependent and flexible which can model various contexts of objects. On two representative semantic segmentation datasets, i.e., Cityscapes and ADE20K, we show that the proposed approach consistently outperforms previous methods without bells and whistles.","authors":[{"name":"Qichuan Geng"},{"id":"540572f9dabfae450f3af71b","name":"Hong Zhang"},{"id":"562b237645cedb339898d12d","name":"Xiaojuan Qi"},{"id":"540835d9dabfae44f0870362","name":"Gao Huang"},{"id":"562ffb0f45cedb33998a4b22","name":"Ruigang Yang"},{"id":"542aa81edabfae646d57cdd9","name":"Zhong Zhou"}],"doi":"10.1109\u002FTIP.2020.3046921","id":"5ffae029d4150a363c2719c8","num_citation":3,"order":3,"pages":{"end":"1","start":"1"},"title":"Gated Path Selection Network for Semantic Segmentation.","urls":["https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpubmed\u002F33417546","https:\u002F\u002Fdblp.uni-trier.de\u002Fdb\u002Fjournals\u002Fcorr\u002Fcorr2001.html#abs-2001-06819","https:\u002F\u002Fwww.scilit.net\u002Farticle\u002F014fcc2fea8a9cfbfdc5450ca994bde6"],"venue":{"info":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society"},"issue":"99","volume":"PP"},"versions":[{"id":"5ffae029d4150a363c2719c8","sid":"33417546","src":"pubmed","vsid":"9886191","year":2021},{"id":"603765e5d3485cfff1d98305","sid":"3118882604","src":"mag","vsid":"115304631","year":2021}],"year":2021},{"abstract":" Feature learning for 3D object detection from point clouds is very challenging due to the irregularity of 3D point cloud data. In this paper, we propose Pointformer, a Transformer backbone designed for 3D point clouds to learn features effectively. Specifically, a Local Transformer module is employed to model interactions among points in a local region, which learns context-dependent region features at an object level. A Global Transformer is designed to learn context-aware representations at the scene level. To further capture the dependencies among multi-scale representations, we propose Local-Global Transformer to integrate local features with global features from higher resolution. In addition, we introduce an efficient coordinate refinement module to shift down-sampled points closer to object centroids, which improves object proposal generation. We use Pointformer as the backbone for state-of-the-art object detection models and demonstrate significant improvements over original models on both indoor and outdoor datasets. ","authors":[{"name":"Xuran Pan"},{"name":"Zhuofan Xia"},{"id":"53f633f7dabfae43d83fa719","name":"Shiji Song"},{"id":"53f43999dabfaefedbae66d1","name":"Li Erran Li"},{"id":"540835d9dabfae44f0870362","name":"Gao Huang"}],"id":"5fe1e44d91e0119a161edf76","num_citation":0,"order":4,"pdf":"https:\u002F\u002Fstatic.aminer.cn\u002Fstorage\u002Fpdf\u002Farxiv\u002F20\u002F2012\u002F2012.11409.pdf","title":"3D Object Detection with Pointformer","urls":["https:\u002F\u002Farxiv.org\u002Fabs\u002F2012.11409","https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2021\u002Fhtml\u002FPan_3D_Object_Detection_With_Pointformer_CVPR_2021_paper.html"],"venue":{"info":{"name":"Proceedings of the IEEE\u002FCVF Conference on Computer Vision and Pattern Recognition (CVPR)"}},"versions":[{"id":"5fe1e44d91e0119a161edf76","sid":"2012.11409","src":"arxiv","year":2020},{"id":"60cb2a281bc21f07d08113ee","sid":"cvpr2021#1240","src":"conf_cvpr","year":2021}],"year":2021}],"profilePubsTotal":111,"profilePatentsPage":1,"profilePatents":[],"profilePatentsTotal":0,"profilePatentsEnd":true,"profileProjectsPage":0,"profileProjects":null,"profileProjectsTotal":null,"newInfo":null,"checkDelPubs":[]}};