Context Normalization Layer with Applications

2023 23rd IEEE International Conference on Data Mining Workshops (ICDMW 2023)

Abstract
Deep neural networks (DNNs) have gained prominence in many areas such as computer vision (CV), natural language processing (NLP), robotics, and bioinformatics. While their deep and complex structure enables powerful representation and hierarchical learning, it poses serious challenges during training, such as internal covariate shift, vanishing/exploding gradients, overfitting, and computational complexity. Normalizing neuron activations is an effective strategy for addressing these challenges: it promotes stability, balances learning, and improves generalization and gradient flow. Traditional normalization methods, however, often overlook inherent relationships within the dataset. For example, batch normalization (BN) estimates the mean and standard deviation from randomly constructed mini-batches of unrelated samples, so its performance depends solely on the mini-batch size and ignores any correlation among the samples in a batch. Techniques such as Layer Normalization, Instance Normalization, and Group Normalization estimate normalization parameters per instance, which removes the dependence on mini-batch size. Mixture Normalization (MN) uses a two-step process: (i) fitting a Gaussian mixture model (GMM) to estimate the component parameters, and (ii) normalizing activations with respect to the components they belong to. MN outperforms BN but incurs computational overhead due to the GMM fitting. To overcome these limitations, we propose a novel methodology named "Context Normalization" (CN). Like MN, our approach assumes that the data distribution can be represented as a mixture of Gaussian components. However, whereas MN assumes a priori that the data are partitioned with respect to a set of Gaussian distributions, CN introduces the notion of a concept that captures data relationships through a neural-network classification scheme: samples gathered within a cluster define a context, and the Gaussian component parameters are estimated via supervised, neural network-based concept classification. CN is more precise when clusters are dense rather than sparse. Extensive comparative experiments on various datasets demonstrate the superiority of CN over BN and MN in terms of convergence speed and generalization: CN outperforms BN and MN by a margin of 5% in convergence speed and 10% in performance. These results highlight the importance of capturing the inherent data context when learning the Gaussian component parameters. By harnessing data relationships, the proposed approach enhances deep learning models across a range of applications.
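The abstract does not give the exact formulation of CN, but the core idea it describes can be illustrated with a minimal sketch: instead of standardizing a mini-batch with a single mean and standard deviation (as BN does), each sample is standardized with the statistics of the context (cluster) it belongs to. The snippet below is an illustrative Python/NumPy approximation under that assumption; the function name `context_normalize` and the use of empirical per-context statistics (rather than parameters learned through the paper's supervised concept classifier) are assumptions for demonstration, not the authors' implementation.

```python
import numpy as np

def context_normalize(x, context_ids, eps=1e-5):
    """Standardize each sample with the statistics of its own context.

    x           : (N, D) mini-batch of activations
    context_ids : (N,) integer context (cluster) label per sample
    eps         : small constant for numerical stability

    Unlike batch normalization, which applies one mean/std to the whole
    mini-batch, every sample here is normalized with the mean and
    variance of the samples that share its context label.
    """
    out = np.empty_like(x, dtype=np.float64)
    for c in np.unique(context_ids):
        mask = context_ids == c
        mu = x[mask].mean(axis=0)    # per-context mean
        var = x[mask].var(axis=0)    # per-context variance
        out[mask] = (x[mask] - mu) / np.sqrt(var + eps)
    return out

# Toy usage: two contexts drawn from clearly different distributions.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(5.0, 2.0, (8, 4)),
                    rng.normal(-3.0, 0.5, (8, 4))])
ctx = np.array([0] * 8 + [1] * 8)
x_norm = context_normalize(x, ctx)
print(x_norm[ctx == 0].mean(axis=0))  # approximately zero within each context
```

In the method described by the paper, the context assignments and the Gaussian component parameters would come from a supervised neural-network concept classifier rather than from raw empirical statistics as in this sketch.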
Keywords
Normalization Layer, Neural Network, Deep Neural Network, Normalization Method, Mixture Model, Gaussian Model, Batch Normalization, Normal Parameters, Mixture Components, Gaussian Mixture Model, Gradient Flow, Gaussian Components, Instance Normalization, Concept Note, Internal Covariate Shift, Model Performance, Learning Rate, Convolutional Neural Network, Probability Density Function, Image Patches, Vision Transformer, Domain Adaptation, Neural Network Layers, Normalization Approach, Normalization Techniques, Source Domain, Mini-batch Of Samples, Top-5 Accuracy, Target Task, Target Domain