Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distribution Tasks
Summary
- Propose Task-Adaptive Meta-Learning (Bayesian TAML) to handle imbalance in the number of instances across tasks and classes, as well as out-of-distribution tasks at test time.
Motivation
- Classic few-shot classification assumes that the number of instances per task and class is fixed, which is an artificial scenario.
- In the real world:
    - task imbalance: tasks arriving at the model may come with different numbers of training instances
    - class imbalance: the number of training instances per class may vary widely
    - out-of-distribution task: a new task may come from a distribution different from the task distribution the model was trained on

Proposal
- Tasks with a small number of training instances, or tasks close to those seen during meta-training, may want to rely mostly on the meta-knowledge obtained over other tasks, whereas tasks that are out-of-distribution or come with more training data may obtain better solutions when trained in a task-specific manner.
- Propose a model that task- and class-adaptively decides how much to use from the meta-learner, and how much to learn specifically for each task and class.
- Three balancing variables (see the sketch after this list):
    - class-dependent learning rate ω^τ
    - task-dependent learning rate multiplier γ^τ
    - task-dependent modulator z^τ for the initial model parameters θ
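
To make the three roles concrete, here is a minimal PyTorch-style sketch of how they could enter a MAML-style inner loop. The toy linear model, the shapes, and the exact update form are assumptions for illustration, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def forward(params, x):
    # Tiny functional linear classifier, only to make the sketch runnable.
    w, b = params
    return x @ w + b

def inner_loop(theta, z, gamma, omega, support_x, support_y,
               alpha=0.01, num_steps=5):
    # z modulates the meta-learned initialization theta (task-dependent).
    phi = [p * zp for p, zp in zip(theta, z)]
    for _ in range(num_steps):
        logits = forward(phi, support_x)
        # omega weights the per-class losses (class-dependent balancing).
        loss = sum(omega[c] * F.cross_entropy(logits[support_y == c],
                                              support_y[support_y == c])
                   for c in support_y.unique().tolist())
        grads = torch.autograd.grad(loss, phi, create_graph=True)
        # gamma multiplies the base step size alpha (task-dependent).
        phi = [p - gamma * alpha * g for p, g in zip(phi, grads)]
    return phi

# Toy usage: a 3-way task with an imbalanced support set.
theta = [torch.randn(8, 3, requires_grad=True),
         torch.zeros(3, requires_grad=True)]
z = [torch.ones(8, 3), torch.ones(3)]   # in Bayesian TAML these would be
gamma = torch.tensor(1.0)               # sampled from the inference network
omega = torch.ones(3)
x, y = torch.randn(10, 8), torch.randint(0, 3, (10,))
phi = inner_loop(theta, z, gamma, omega, x, y)
```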

- Introduce an amortized variational inference framework to infer the three balancing variables from each task's training set.
- Bayesian modeling: maximize the conditional log-likelihood of the query set given the support set, marginalizing over the balancing variables.
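
A hedged reconstruction of this objective (the notes refer to it as Eq. (4); the notation below is a paraphrase, with D^τ the support set, (X̃^τ, Ỹ^τ) the query set, and φ^τ = {ω^τ, γ^τ, z^τ} the latent balancing variables):

```latex
\max_{\theta}\ \sum_{\tau} \log p\big(\tilde{Y}^{\tau} \mid \tilde{X}^{\tau}, D^{\tau}; \theta\big),
\quad \text{where} \quad
p\big(\tilde{Y}^{\tau} \mid \tilde{X}^{\tau}, D^{\tau}; \theta\big)
= \int p\big(\tilde{Y}^{\tau} \mid \tilde{X}^{\tau}, \phi^{\tau}; \theta\big)\,
      p\big(\phi^{\tau} \mid D^{\tau}\big)\, d\phi^{\tau}
```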

Variational Inference
- Solving Equation (4) is intractable; thus, resort to amortized variational inference with a tractable form of approximate posterior.
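A minimal sketch of what such an amortized posterior could look like (the names, dimensions, mean-pooled set encoding, and diagonal-Gaussian choice are assumptions, not the paper's architecture):

```python
import torch
import torch.nn as nn

class BalancingInferenceNet(nn.Module):
    """Hypothetical amortized posterior q(z, gamma, omega | D; psi)."""
    def __init__(self, feat_dim=64, hidden=128, n_classes=5, n_z=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.head_z = nn.Linear(hidden, 2 * n_z)       # mean, log-var of z
        self.head_gamma = nn.Linear(hidden, 2)         # mean, log-var of gamma
        self.head_omega = nn.Linear(hidden, 2 * n_classes)

    def forward(self, support_feats):
        # Permutation-invariant set encoding of the support examples.
        h = self.encoder(support_feats).mean(dim=0)
        def sample(out):
            mu, logvar = out.chunk(2, dim=-1)
            # Reparameterization trick keeps sampling differentiable.
            return mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return (sample(self.head_z(h)),
                sample(self.head_gamma(h)),
                sample(self.head_omega(h)))
```

In practice the network would also return the (μ, log σ²) pairs so the KL term in the objective below can be computed.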
- The final form of the meta-training minimization objective, with Monte Carlo (MC) approximation (Eq. (8)):
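
A hedged sketch of what this objective plausibly looks like, assuming a standard ELBO with S Monte Carlo samples φ^τ_s drawn from the approximate posterior via the reparameterization trick (ψ denotes the inference-network parameters; the exact KL/prior terms are an assumption):

```latex
\mathcal{L}_{\mathrm{MC}}(\theta, \psi)
= -\frac{1}{S}\sum_{s=1}^{S}
    \log p\big(\tilde{Y}^{\tau} \mid \tilde{X}^{\tau}, \phi^{\tau}_{s}; \theta\big)
  + \mathrm{KL}\big[\, q(\phi^{\tau} \mid D^{\tau}; \psi)\ \big\|\ p(\phi^{\tau}) \,\big],
\qquad \phi^{\tau}_{s} \sim q(\phi^{\tau} \mid D^{\tau}; \psi)
```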

- Alternatively, naively approximate the objective by taking the expectation inside the log-likelihood (i.e., plugging in the posterior means) for computational efficiency (Eq. (9)):
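
Sketched in the same notation, the naive variant evaluates the log-likelihood once at the posterior mean instead of averaging over samples (again a paraphrase, not the paper's exact equation):

```latex
\mathcal{L}_{\mathrm{naive}}(\theta, \psi)
= -\log p\big(\tilde{Y}^{\tau} \mid \tilde{X}^{\tau}, \mathbb{E}_{q}[\phi^{\tau}]; \theta\big)
  + \mathrm{KL}\big[\, q(\phi^{\tau} \mid D^{\tau}; \psi)\ \big\|\ p(\phi^{\tau}) \,\big]
```

This discards the posterior variance, which is why a large gap between the two objectives would indicate that the latent variables carry substantial uncertainty.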

Results

- Table 3 shows that the MC approximation in Eq. (8) largely outperforms the naive approximation in Eq. (9), suggesting that z^τ learns a very large variance.
- In Figure 4, z^τ relocates the initial parameters θ far from their meta-learned position for the OOD tasks given at meta-test time, with larger displacements for highly heterogeneous tasks (Figure 4, right).
    - This allows the model to either stick to, or deviate from, the meta-knowledge based on the similarity between the tasks given at meta-training and meta-test time.
- Related approaches that try to solve a similar OOD problem in few-shot classification using meta-learning:
    - modulation networks for meta-learning
    - task-dependent modulation of batch normalization parameters
- Authors: Hae Beom Lee, Hayeon Lee, Donghyun Na, Saehoon Kim, Minseop Park, Eunho Yang, Sung Ju Hwang
- Affiliations: KAIST, Tmax
- Published: ICLR 2020 (oral), arXiv
- Code: git
Discussion
- In sequence-to-sequence problems, what would be an instance of imbalance?
- Among the many modulation methods in meta-learning, which one would be effective for handling the discrepancy in sequence length between tasks?