Multimodal Model-Agnostic Meta-Learning via Task-Aware Modulation
Summary
- Adapt the meta-learned (pre-trained) prior parameters to each task drawn from a multimodal task distribution by modulating them with task-specific parameters.
Preliminaries
- Model-Agnostic Meta-Learning (MAML) : Because it places few restrictions on the choice of model, a model-agnostic meta-learner learns meta-parameters (an initialization) from a set of similar tasks so that it can adapt to a novel task from the same distribution with only a few gradient updates.
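A minimal sketch of the MAML inner/outer loop, assuming a toy family of 1-D regression tasks y = a·x with slopes a drawn from [1.5, 2.5] and a scalar model ŷ = w·x (all hypothetical choices for illustration, not the paper's setup). Because the model is a single scalar, the second-order meta-gradient can be written in closed form instead of relying on an autodiff library.

```python
import random


def mse(w, xs, ys):
    """Mean squared error of the scalar model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_derivs(w, xs, ys):
    """First and second derivatives of the MSE with respect to w."""
    n = len(xs)
    grad = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / n
    hess = sum(2 * x * x for x in xs) / n
    return grad, hess


def adapt(w, xs, ys, alpha):
    """Inner loop: one gradient step on the task's support set."""
    grad, _ = mse_derivs(w, xs, ys)
    return w - alpha * grad


def meta_gradient(w, support, query, alpha):
    """d/dw of the query loss evaluated at the adapted parameter (second-order MAML)."""
    (xs_s, ys_s), (xs_q, ys_q) = support, query
    _, hess_s = mse_derivs(w, xs_s, ys_s)
    w_adapted = adapt(w, xs_s, ys_s, alpha)
    grad_q, _ = mse_derivs(w_adapted, xs_q, ys_q)
    # Chain rule through the inner update: d(w_adapted)/dw = 1 - alpha * hess_s.
    return grad_q * (1 - alpha * hess_s)


def sample_task(rng, k=5):
    """Hypothetical task family: support and query sets from y = slope * x."""
    slope = rng.uniform(1.5, 2.5)

    def draw():
        xs = [rng.uniform(-1.0, 1.0) for _ in range(k)]
        return xs, [slope * x for x in xs]

    return draw(), draw()


def maml(meta_steps=2000, alpha=0.1, beta=0.05, batch=4, seed=0):
    """Outer loop: move the shared initialization w along the averaged meta-gradient."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(meta_steps):
        tasks = [sample_task(rng) for _ in range(batch)]
        grad = sum(meta_gradient(w, s, q, alpha) for s, q in tasks) / batch
        w -= beta * grad
    return w
```

With a unimodal slope range like this one, the single learned initialization settles near the middle of the range (around 2), which is exactly the "one shared initialization" assumption that breaks down when the task distribution is multimodal.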
Motivation
- MAML relies on a single common initialization shared across the entire task distribution.
- Tasks sampled from a complex task distribution can require substantially different parameters, which makes it difficult to find a single initialization that is close to all target parameters and thereby limits the diversity of task distributions MAML can learn from.
Proposal
- Goal : develop a framework that can quickly master a novel task drawn from a multimodal task distribution.
By augmenting MAML, the paper proposes a multimodal MAML (MMAML) framework, which modulates its meta-learned prior parameters according to the identified task mode, allowing more efficient fast adaptation.
- Aim : a meta-learner that acquires mode-specific prior parameters and adapts quickly to tasks sampled from a multimodal task distribution.
Modulation and Task Network
In Algorithm 1, N is the number of blocks in the task network. Note that during inner-loop adaptation the task-specific modulation parameters τi are kept fixed and only the meta-learned prior parameters of the task network are updated.
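To make the split between the two parameter sets in Algorithm 1 concrete, here is a minimal sketch with a single scalar block (N = 1) whose output θ·x is modulated FiLM-style by τ = (γ, β). The modulation network that would produce τ from the task's examples is omitted: τ is passed in as a fixed, hand-picked value (a hypothetical stand-in). Only θ receives gradient steps in the inner loop.

```python
def task_forward(theta, tau, x):
    """Single-block task network: the block output theta * x is scaled and shifted by tau = (gamma, beta)."""
    gamma, beta = tau
    return gamma * (theta * x) + beta


def loss_and_grad(theta, tau, xs, ys):
    """MSE and its derivative with respect to theta only; tau is treated as a constant."""
    gamma, _ = tau
    residuals = [task_forward(theta, tau, x) - y for x, y in zip(xs, ys)]
    n = len(xs)
    loss = sum(r * r for r in residuals) / n
    grad_theta = sum(2.0 * r * gamma * x for r, x in zip(residuals, xs)) / n
    return loss, grad_theta


def inner_loop(theta, tau, xs, ys, alpha=0.1, steps=5):
    """Inner-loop sketch: tau is computed once per task and held fixed; only theta is updated."""
    for _ in range(steps):
        _, grad = loss_and_grad(theta, tau, xs, ys)
        theta = theta - alpha * grad
    return theta


# Usage: support set from y = 3x + 1 with a hypothetical tau = (1.5, 1.0).
xs = [-1.0, -0.5, 0.5, 1.0]
ys = [3.0 * x + 1.0 for x in xs]
theta_adapted = inner_loop(1.0, (1.5, 1.0), xs, ys)
```

In full MMAML the outer loop would then update θ together with the weights of the networks that produce τ; that meta-update is not shown here.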
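For the general modulation operation ⊙ in $\phi_i = \theta_i \odot \tau_i$, where $\theta_i$ belongs to the i-th block of the task network and $\tau_i$ is its task-specific modulation parameter, the authors empirically observed that Feature-wise Linear Modulation (FiLM), $\phi_i = \theta_i \odot \tau_{\gamma,i} + \tau_{\beta,i}$, performs better than attention-based (softmax) modulation, $\phi_i = \theta_i \odot \mathrm{softmax}(\tau_i)$.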
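The two modulation operators can be sketched element-wise on a block output as follows. This is a minimal illustration, assuming the softmax is taken across the units of the block and that `block_out` stands in for the i-th block's output; the function names are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_modulate(block_out, tau):
    """Attention-based modulation: phi = block_out ⊙ softmax(tau)."""
    return [h * a for h, a in zip(block_out, softmax(tau))]


def film_modulate(block_out, tau_gamma, tau_beta):
    """FiLM: phi = block_out ⊙ tau_gamma + tau_beta (per-unit scale and shift)."""
    return [h * g + b for h, g, b in zip(block_out, tau_gamma, tau_beta)]
```

One way to read the empirical result: softmax weights are positive and sum to one, so attention can only re-weight units, whereas FiLM can also flip signs, amplify, and shift each unit independently.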
Experiments
- Baseline
- MAML : uses the same architecture as the task network in MMAML
- Multi-MAML
- consists of M (the number of modes) MAML models, each trained only on tasks sampled from a single mode.
- If it outperforms MAML, this indicates that MAML's performance is degraded by the multimodality of the task distribution.
- Tasks
- regression : sinusoidal, linear, quadratic, transformed ℓ1-norm, and hyperbolic tangent functions as the discrete task modes.
- image classification : OMNIGLOT, MINI-IMAGENET, FC100, CUB, AIRCRAFT
- reinforcement learning
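A multimodal regression task sampler in the spirit of the setup above might look like the sketch below. The five modes follow the note; the functional forms and parameter ranges are assumptions for illustration, not the paper's exact settings. The mode label is returned only for bookkeeping (e.g. routing tasks in a per-mode baseline such as Multi-MAML); MMAML itself has to infer the mode from the examples.

```python
import math
import random

MODES = ("sinusoid", "linear", "quadratic", "l1_norm", "tanh")


def sample_function(mode, rng):
    """Draw one regression function from a mode (forms and ranges are hypothetical)."""
    if mode == "sinusoid":
        amp, phase = rng.uniform(0.1, 5.0), rng.uniform(0.0, math.pi)
        return lambda x: amp * math.sin(x + phase)
    if mode == "linear":
        slope, intercept = rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0)
        return lambda x: slope * x + intercept
    if mode == "quadratic":
        a, b, c = rng.uniform(-0.2, 0.2), rng.uniform(-2.0, 2.0), rng.uniform(-3.0, 3.0)
        return lambda x: a * (x - c) ** 2 + b
    if mode == "l1_norm":
        a, b, c = rng.uniform(-0.5, 0.5), rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0)
        return lambda x: a * abs(x - c) + b
    if mode == "tanh":
        a, b, c = rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0)
        return lambda x: a * math.tanh(x - c) + b
    raise ValueError(f"unknown mode: {mode}")


def sample_task(rng, k_shot=5, x_min=-5.0, x_max=5.0, noise_std=0.0):
    """Pick a mode uniformly at random, then draw k_shot (x, y) pairs from one of its functions."""
    mode = rng.choice(MODES)
    f = sample_function(mode, rng)
    xs = [rng.uniform(x_min, x_max) for _ in range(k_shot)]
    ys = [f(x) + rng.gauss(0.0, noise_std) for x in xs]
    return mode, xs, ys
```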
Results
Regression
Image Classification
Related Work
- Model-Agnostic Meta-Learning (MAML)
- attention-based (softmax) modulation
- feature-wise linear modulation (FiLM)
Information
- Authors : Risto Vuorio, Shao-Hua Sun, Hexiang Hu, Joseph J. Lim
- Affiliations : University of Michigan, University of Southern California
- Published : NeurIPS 2019, arXiv
- Code/Project Page : https://vuoristo.github.io/MMAML
- Material : Poster, Slides
Discussion
- If we apply MMAML to NMT + ASR, will it be enough to adapt to different input modalities and different input-length distributions?
- What would be the optimal general modulation operation for seq2seq modeling?