Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distribution Tasks

Summary

  • Proposes Bayesian Task-Adaptive Meta-Learning (Bayesian TAML) to handle imbalance in the number of training instances per task and per class, as well as out-of-distribution tasks at test time.

Motivation

  • Classic few-shot classification assumes that the number of instances per task and per class is fixed, which is an artificial scenario.
  • In the real world:
    1. task imbalance : tasks arriving at the model may have different numbers of training instances
    2. class imbalance : the number of training instances per class may vary widely
    3. out-of-distribution task : a new task may come from a distribution different from the task distribution the model was trained on

image

Proposal

  • Tasks with a small number of training instances, or tasks close to those seen during meta-training, should rely mostly on the meta-knowledge obtained over other tasks, whereas out-of-distribution tasks or tasks with more training data may obtain better solutions when trained in a more task-specific manner.
  • Propose a model that task- and class-adaptively decides how much to use from the meta-learner, and how much to learn specifically for each task and class.
  • Three balancing variables (a minimal sketch of the resulting inner-loop update follows the figure below):
    1. class-dependent learning-rate weight ω^τ
    2. task-dependent learning-rate multiplier γ^τ
    3. task-dependent modulator of the initial model parameters z^τ

image
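To make the roles of the three variables concrete, below is a minimal NumPy sketch of a MAML-style inner loop modulated by them. This is an illustration, not the authors' implementation: the toy softmax-regression model, the names `inner_lr`, `n_steps`, and `class_grads`, and the exact form of the update (θ₀ = θ · z^τ, then steps scaled by γ^τ and class-weighted by ω^τ) are assumptions.

```python
import numpy as np

def class_grads(theta, support_x, support_y, num_classes):
    """Per-class gradients of a softmax-regression loss w.r.t. theta (D x C)."""
    logits = support_x @ theta                        # (N, C)
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    grads = []
    for c in range(num_classes):
        mask = support_y == c
        if not mask.any():                            # class absent in this task
            grads.append(np.zeros_like(theta))
            continue
        err = probs[mask].copy()
        err[:, c] -= 1.0                              # softmax - one-hot for class-c rows
        grads.append(support_x[mask].T @ err / mask.sum())
    return grads                                      # list of C arrays, each (D, C)

def task_adapted_params(theta, z, gamma, omega, support_x, support_y,
                        inner_lr=0.1, n_steps=5):
    """Inner loop modulated by the three balancing variables (illustrative)."""
    theta_t = theta * z                               # z^tau: modulate the shared init
    num_classes = theta.shape[1]
    for _ in range(n_steps):
        grads = class_grads(theta_t, support_x, support_y, num_classes)
        step = sum(w * g for w, g in zip(omega, grads))   # omega^tau: class weights
        theta_t = theta_t - gamma * inner_lr * step       # gamma^tau: lr multiplier
    return theta_t

# Toy usage: a class-imbalanced 3-way task with feature dimension 4.
rng = np.random.default_rng(0)
theta = rng.normal(size=(4, 3)) * 0.1
support_x = rng.normal(size=(7, 4))
support_y = np.array([0, 0, 0, 0, 1, 1, 2])
z, gamma, omega = np.ones_like(theta), 1.5, np.array([0.5, 1.0, 1.5])
adapted = task_adapted_params(theta, z, gamma, omega, support_x, support_y)
```

In the full method these variables are not fixed hyperparameters but are inferred per task from the support set, which is what the Bayesian treatment below formalizes.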

BAYESIAN TASK-ADAPTIVE META-LEARNING

  • Introduce a variational inference framework to infer the three balancing variables.
  • Bayesian modeling: maximize the conditional log-likelihood of the query set given the support set, marginalizing over the balancing variables

image
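As a rough reconstruction of what this objective looks like under the setup above (a sketch in the notation of this summary, not necessarily the paper's exact Equation (4)), write the support set as D^τ, the query set as D̃^τ, and the balancing variables as φ^τ = {z^τ, γ^τ, ω^τ}:

```latex
% Sketch: marginal log-likelihood of the query set given the support set,
% marginalizing over the balancing variables \phi^\tau = \{z^\tau, \gamma^\tau, \omega^\tau\}.
\log p\bigl(\tilde{D}^{\tau} \mid D^{\tau}; \theta\bigr)
  = \log \int p\bigl(\tilde{D}^{\tau} \mid \phi^{\tau}; \theta\bigr)\,
              p\bigl(\phi^{\tau} \mid D^{\tau}\bigr)\, d\phi^{\tau}
```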

VARIATIONAL INFERENCE

  • Solving Equation (4) is intractable; thus, resort to amortized variational inference with a tractable form of the approximate posterior.
  • The final form of the meta-training minimization objective with Monte Carlo (MC) approximation:

image
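The structure of this objective is a per-task negative ELBO. The sketch below is a hedged reconstruction in the notation used here, with S Monte Carlo samples φ^{τ,(s)} drawn from an amortized posterior q(φ^τ | D^τ; ψ); the specific prior p(φ^τ) and the exact weighting of the KL term are assumptions.

```latex
% Sketch: per-task negative ELBO with S Monte Carlo samples
% \phi^{\tau,(s)} \sim q(\phi^{\tau} \mid D^{\tau}; \psi).
\mathcal{L}(\theta, \psi) \approx
  -\frac{1}{S} \sum_{s=1}^{S} \log p\bigl(\tilde{D}^{\tau} \mid \phi^{\tau,(s)}; \theta\bigr)
  + \mathrm{KL}\bigl(q(\phi^{\tau} \mid D^{\tau}; \psi) \,\|\, p(\phi^{\tau})\bigr)
```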

  • For computational efficiency, naively approximate the objective by moving the expectation inside the likelihood term

image
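Read this way (a sketch of the approximation, not the paper's exact statement), the expected log-likelihood over the posterior is replaced by the log-likelihood evaluated at the posterior mean of the balancing variables, so only a single inner loop per task is needed:

```latex
% Sketch: expectation pushed inside the likelihood for efficiency.
\mathbb{E}_{q(\phi^{\tau} \mid D^{\tau}; \psi)}
  \bigl[\log p(\tilde{D}^{\tau} \mid \phi^{\tau}; \theta)\bigr]
  \;\approx\;
  \log p\bigl(\tilde{D}^{\tau} \mid \mathbb{E}_{q}[\phi^{\tau}]; \theta\bigr)
```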

Results

image

Discussion

  • In sequence-to-sequence problems, what would be an instance of imbalance?
  • Among the many modulation methods used in meta-learning, which one would be effective at handling the difference in sequence length between tasks?