Share or Not? Learning to Schedule Language-Specific Capacity for Multilingual Translation
International Conference on Learning Representations (ICLR), 2021.
We investigate and improve parameter-sharing strategies in multilingual Transformers by utilizing conditional computation.
Using a mix of shared and language-specific (LS) parameters has shown promise in multilingual neural machine translation (MNMT), but the question of when and where LS capacity matters most is still under-studied. We offer such a study by proposing conditional language-specific routing (CLSR). CLSR employs hard binary gates conditioned on ...