# Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distribution Tasks

## Summary

- Proposes Bayesian Task-Adaptive Meta-Learning (Bayesian TAML) to handle task- and class-level imbalance in the training data, as well as out-of-distribution (OOD) tasks at test time.

### Motivation

- Classic few-shot classification assumes a fixed number of instances per task and class, which is an artificial scenario.
- In the real world:
  - Task imbalance: tasks arriving at the model may have different numbers of training instances.
  - Class imbalance: the number of training instances per class may vary widely.
  - Out-of-distribution tasks: a new task may come from a distribution different from the task distribution the model was trained on.

### Proposal

- Tasks with few training instances, or tasks close to those seen during meta-training, may want to rely mostly on the meta-knowledge obtained over other tasks, whereas tasks that are out-of-distribution or come with more training data may obtain better solutions when trained in a task-specific manner.
- The model is proposed to task- and class-adaptively decide how much to take from the meta-learner, and how much to learn specifically for each task and class.
- Three balancing variables:
  - Class-dependent learning rate ωτ
  - Task-dependent learning-rate multiplier γτ
  - Task-dependent modulator of the initial model parameters zτ
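A minimal MAML-style sketch of how the three balancing variables could enter the inner-loop update (a sketch under assumed notation; function and argument names are illustrative, not the paper's code):

```python
import numpy as np

def inner_update(theta, grads_per_class, z, gamma, omega, lr=0.01):
    """One task-adaptive inner-loop step (illustrative sketch).

    theta           : flat parameter vector (meta-learned initialization)
    grads_per_class : list of per-class gradient vectors w.r.t. theta
    z               : task-dependent modulator of the initialization (z^tau)
    gamma           : task-dependent learning-rate multiplier (gamma^tau)
    omega           : per-class weights (omega^tau)
    """
    # Relocate the initialization: OOD tasks can push theta_0 away
    # from the shared meta-knowledge.
    theta0 = theta * z
    # Class-balanced gradient: rare classes can receive larger weight.
    g = sum(w * gc for w, gc in zip(omega, grads_per_class))
    # Task-dependent effective learning rate gamma * lr.
    return theta0 - gamma * lr * g
```

With z near 1 and a small gamma, the task stays close to the meta-knowledge; a large gamma and a z far from 1 push it toward a task-specific solution.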

#### BAYESIAN TASK-ADAPTIVE META-LEARNING

- Introduces a variational inference framework to infer the three balancing variables.
- Bayesian modeling: the goal is to maximize the conditional log-likelihood.
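As a hedged sketch (notation assumed here, not copied from the paper's Eq. (4)), the conditional log-likelihood of the query set marginalizes over the collected balancing variables φ:

```latex
\log p(\mathcal{Y}\mid\mathcal{X},\mathcal{S})
  = \log \int p(\mathcal{Y}\mid\mathcal{X},\theta,\phi)\,
              p(\phi\mid\mathcal{S})\,d\phi,
\qquad \phi=\{\omega^{\tau},\gamma^{\tau},z^{\tau}\}
```

Here S denotes the support set and X, Y the query inputs and labels.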

#### VARIATIONAL INFERENCE

- Solving Equation (4) is intractable; thus, the authors resort to amortized variational inference with a tractable form of approximate posterior.
- The final form of the meta-training minimization objective with Monte-Carlo (MC) approximation:
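A generic amortized-VI objective of this form, with S reparameterized Monte-Carlo samples, looks as follows (a sketch under assumed notation; the paper's Eq. (8) may differ in its details):

```latex
\mathcal{L} \approx
  -\frac{1}{S}\sum_{s=1}^{S}\log p\big(\mathcal{Y}\mid\mathcal{X},\theta(\phi^{(s)})\big)
  + \mathrm{KL}\big(q(\phi\mid\mathcal{S})\,\big\|\,p(\phi)\big),
\qquad \phi^{(s)}\sim q(\phi\mid\mathcal{S})
```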

- A naive alternative approximates the objective by moving the expectation inside the likelihood (i.e., plugging in the posterior mean instead of sampling) for computational efficiency.
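A toy illustration of why the two approximations diverge when the posterior variance is large: for a nonlinear function, the expectation of f(z) is not f of the expectation. The function f and the numbers below are hypothetical, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(z):
    # A nonlinear "loss" in the latent variable; stands in for the
    # network's dependence on the sampled modulator z^tau.
    return z ** 2

mu, sigma = 0.0, 2.0   # posterior mean and (large) std of z
samples = mu + sigma * rng.standard_normal(10_000)  # reparameterization

mc_estimate = f(samples).mean()   # MC approximation: average over samples
naive_estimate = f(mu)            # naive: expectation moved inside

# With large variance the two disagree badly:
# E[z^2] = sigma^2 = 4, while f(E[z]) = 0.
print(mc_estimate, naive_estimate)
```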

### Results

- Table 3 shows that the MC approximation in Eq. (8) largely outperforms the naive approximation in Eq. (9), which suggests that zτ learns a very large variance.
- In Figure 4, zτ actually relocates the initial θ far from the meta-learned initialization for the OOD tasks given at meta-test time, with larger displacements for highly heterogeneous tasks (Figure 4, right).
- This allows the model to either stick to, or deviate from, the meta-knowledge based on the similarity between the tasks given at meta-training and meta-test time.

## Related Work

- Related approaches tackle a similar OOD problem in few-shot classification using meta-learning:
  - Modulation networks for meta-learning
  - Task-dependent modulation of batch normalization parameters
- Authors: Hae Beom Lee, Hayeon Lee, Donghyun Na, Saehoon Kim, Minseop Park, Eunho Yang, Sung Ju Hwang
- Affiliations: KAIST, Tmax
- Published: ICLR 2020 (oral), arXiv
- Code: git
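The task-dependent batch-norm modulation mentioned above can be sketched as a FiLM-like conditioning of the normalization parameters on a task embedding (all names and projection matrices here are hypothetical, not from any of the cited works):

```python
import numpy as np

def modulated_batchnorm(x, task_embedding, W_gamma, W_beta, eps=1e-5):
    """Task-dependent modulation of batch-norm parameters (sketch).

    x              : (batch, features) activations
    task_embedding : vector summarizing the task (e.g., an encoded support set)
    W_gamma, W_beta: hypothetical projections producing per-feature
                     scale and shift from the task embedding
    """
    # Standard batch normalization of the activations.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Task-conditioned scale and shift replace the usual learned
    # gamma/beta; a zero embedding recovers plain normalization.
    gamma = 1.0 + W_gamma @ task_embedding
    beta = W_beta @ task_embedding
    return gamma * x_hat + beta
```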

## Discussion

- In sequence-to-sequence problems, what could be an instance of imbalance?
- Among the many modulation methods in meta-learning, which would be effective for handling differences in sequence length between tasks?