Multimodal Model-Agnostic Meta-Learning via Task-Aware Modulation


  • Key idea: adapt the pre-trained (meta-learned) parameters to a task drawn from a multimodal distribution using a modulation method.


  • Model-Agnostic Meta-Learning (MAML): flexible in the choice of model, model-agnostic meta-learners aim to acquire meta-learned parameters from similar tasks so they can adapt to novel tasks from the same distribution with only a few gradient updates.
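As a concrete illustration, here is a minimal first-order MAML (FOMAML) sketch on a toy one-parameter regression family. The scalar model, task family, and learning rates are illustrative choices, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def task_loss_grad(w, a, x):
    # Squared error for model y = w*x against task target y = a*x, plus its gradient.
    err = (w - a) * x
    return np.mean(err ** 2), np.mean(2 * err * x)

w = 0.5                  # meta-learned initialization (a single scalar for clarity)
alpha, beta = 0.1, 0.01  # inner / outer learning rates (illustrative)

for _ in range(2000):
    meta_grad = 0.0
    for _ in range(4):                        # meta-batch of tasks
        a = rng.uniform(-2.0, 2.0)            # a task = a slope for y = a*x
        x = rng.uniform(-1.0, 1.0, size=10)
        _, g = task_loss_grad(w, a, x)
        w_adapted = w - alpha * g             # one inner gradient step (fast adaptation)
        _, g_adapted = task_loss_grad(w_adapted, a, x)
        meta_grad += g_adapted                # first-order (FOMAML) approximation
    w -= beta * meta_grad / 4                 # outer update of the initialization
```

Since the slopes are drawn symmetrically around zero, the learned initialization settles near the "center" of the task distribution, which is exactly the behavior that breaks down when that distribution is multimodal.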


  • MAML relies on a common single initialization shared across the entire task distribution.
  • Different tasks sampled from a complex task distribution can require substantially different parameters,
    • making it difficult to find a single initialization that is close to all target parameters,
    • limiting the diversity of the task distributions that MAML is able to learn from.


  • Goal: to develop a framework that quickly masters a novel task from a multimodal task distribution.
  • By augmenting MAML, this paper proposes a multimodal MAML (MMAML) framework, which modulates its meta-learned prior parameters according to the identified task mode, allowing more efficient fast adaptation.

  • Aim: to develop a meta-learner that acquires mode-specific prior parameters and adapts quickly to tasks sampled from a multimodal task distribution.

Modulation and Task Network


In Algorithm 1, N is the number of blocks in the task network. Note that the task-specific modulation parameters τi are kept fixed during adaptation; only the meta-learned prior parameters of the task network are updated.
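The structure of that inner loop can be sketched as follows. The modulation network, block shapes, and finite-difference gradients here are stand-ins for the paper's neural networks and backpropagation, chosen so the sketch is self-contained:

```python
import numpy as np

rng = np.random.default_rng(1)

N = 3                                                      # number of blocks in the task network
theta = [0.3 * rng.normal(size=(4, 4)) for _ in range(N)]  # meta-learned prior parameters

def modulation_network(task_embedding):
    # Stand-in for the modulation network: one vector tau_i per task-network block.
    return [np.tanh(task_embedding) for _ in range(N)]

def forward(x, theta, tau):
    h = x
    for W, t in zip(theta, tau):
        h = np.tanh(W @ h) * t                             # block output scaled by its tau_i
    return h

def loss(theta, tau, x, y):
    return np.mean((forward(x, theta, tau) - y) ** 2)

x, y = rng.normal(size=4), rng.normal(size=4)
tau = modulation_network(rng.normal(size=4))               # tau stays FIXED during adaptation
loss0 = loss(theta, tau, x, y)

alpha = 0.01
for _ in range(5):                                         # inner loop: update theta only
    grads = []
    for i, W in enumerate(theta):
        g = np.zeros_like(W)
        for idx in np.ndindex(*W.shape):                   # forward-difference gradient
            theta_p = [w.copy() for w in theta]
            theta_p[i][idx] += 1e-5
            g[idx] = (loss(theta_p, tau, x, y) - loss(theta, tau, x, y)) / 1e-5
        grads.append(g)
    theta = [W - alpha * g for W, g in zip(theta, grads)]
```

The key point mirrored from the algorithm: `tau` identifies the task mode and shifts the prior, while the gradient steps adapt only `theta`.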

For the general modulation operation in the corresponding equation, the authors empirically observed that Feature-wise Linear Modulation (FiLM) performs better than attention-based (softmax) modulation.
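A minimal sketch of the two modulation operations being compared; the feature values here are illustrative:

```python
import numpy as np

def film(h, gamma, beta):
    # Feature-wise Linear Modulation: per-feature scale and shift, tau = (gamma, beta).
    return gamma * h + beta

def attention_softmax(h, scores):
    # Attention-based modulation: softmax weights gate the features multiplicatively.
    w = np.exp(scores - scores.max())
    return (w / w.sum()) * h

h = np.array([1.0, -2.0, 0.5])
print(film(h, np.array([2.0, 0.5, 1.0]), np.array([0.1, 0.0, -0.2])))  # -> [2.1, -1.0, 0.3]
```

One intuition for FiLM's advantage: the additive shift lets modulation move features outside the span of the input, whereas softmax gating can only reweight (and never sign-flip) what is already there.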


  • Baseline
    • MAML: the same architecture as the task network in MMAML.
    • Multi-MAML
      • consists of M (the number of modes) MAML models, each trained specifically on tasks sampled from a single mode.
      • If it outperforms MAML, this indicates that MAML's performance degrades due to the multimodality of the task distribution.
  • Tasks
    • regression: sinusoidal, linear, quadratic, transformed L1 norm, and hyperbolic tangent functions as discrete task modes.
    • image classification: Omniglot, Mini-ImageNet, FC100, CUB, Aircraft
    • reinforcement learning
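The multimodal regression setup can be sketched as a task sampler over the five discrete modes. The exact functional forms and parameter ranges below are assumptions for illustration, not the paper's specification:

```python
import numpy as np

rng = np.random.default_rng(0)

# Discrete task modes from the regression experiments (forms assumed for illustration).
MODES = {
    "sinusoidal": lambda a, b: lambda x: a * np.sin(x + b),
    "linear":     lambda a, b: lambda x: a * x + b,
    "quadratic":  lambda a, b: lambda x: a * x ** 2 + b,
    "l1":         lambda a, b: lambda x: a * np.abs(x - b),   # transformed L1 norm
    "tanh":       lambda a, b: lambda x: a * np.tanh(x + b),
}

def sample_task():
    # Sample a mode, then task parameters within that mode (ranges are illustrative).
    mode = rng.choice(list(MODES))
    a, b = rng.uniform(0.5, 2.0), rng.uniform(-1.0, 1.0)
    return mode, MODES[mode](a, b)

mode, f = sample_task()
x = np.linspace(-5, 5, 10)
y = f(x)   # few-shot regression targets for this sampled task
```

Multi-MAML gets the `mode` label at training time (one model per mode), while MMAML must infer it from the few-shot data alone.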





Image Classification





  • If we apply MMAML to NMT + ASR, would modulation alone be enough to adapt to different input modalities and different input-length distributions?
  • What would be the optimal general modulation operation for seq2seq modeling?