Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

Summary

  • The method explicitly trains the model's initial parameters to be sensitive to losses from a given distribution of tasks, so that the learned initialization enables fast and efficient adaptation to a new target task, while remaining agnostic to the form of the model and to the particular learning task.

Preliminary

  • Meta-learning : training a model on a variety of learning tasks such that it can solve new learning tasks using only a small number of training samples.

Proposal

  • The paper proposes a model-agnostic meta-learning algorithm, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems.
  • The algorithm neither expands the number of learned parameters nor places constraints on the model architecture.

(Figure: diagram of MAML, which optimizes for a representation θ that can be quickly adapted to new tasks.)

Methods

When adapting to a new task $\mathcal{T}_i$, the model's parameters $\theta$ become $\theta_i'$ via one (or more) gradient descent steps on that task:

$$\theta_i' = \theta - \alpha \nabla_\theta \mathcal{L}_{\mathcal{T}_i}(f_\theta)$$

where the meta-objective is:

$$\min_\theta \sum_{\mathcal{T}_i \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_i}(f_{\theta_i'}) = \sum_{\mathcal{T}_i \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_i}\big(f_{\theta - \alpha \nabla_\theta \mathcal{L}_{\mathcal{T}_i}(f_\theta)}\big)$$

Note that the meta-optimization is performed over the model parameters θ, whereas the objective is computed using the updated model parameters θ'. In effect, MAML optimizes the initial parameters so that one or a small number of gradient steps on a new task yields maximally effective behavior on that task.
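Putting the two updates together, the following is a minimal sketch of one MAML meta-update in JAX (not the authors' released code; `loss_fn(params, batch)`, the support/query split of each task, and the step sizes `alpha`/`beta` are illustrative assumptions):

```python
import jax

alpha = 0.01   # inner-loop step size (assumed hyperparameter)
beta = 0.001   # outer-loop (meta) step size (assumed hyperparameter)

def inner_update(params, support_batch, loss_fn):
    # One gradient step on the task: theta'_i = theta - alpha * grad L_Ti(f_theta)
    grads = jax.grad(loss_fn)(params, support_batch)
    return jax.tree_util.tree_map(lambda p, g: p - alpha * g, params, grads)

def meta_loss(params, tasks, loss_fn):
    # Meta-objective: sum of task losses evaluated at the adapted parameters theta'_i.
    total = 0.0
    for support_batch, query_batch in tasks:
        adapted = inner_update(params, support_batch, loss_fn)
        total = total + loss_fn(adapted, query_batch)
    return total

def meta_update(params, tasks, loss_fn):
    # The outer gradient differentiates through the inner update, which is where
    # the "gradient through a gradient" (second-order terms) comes from.
    grads = jax.grad(meta_loss)(params, tasks, loss_fn)
    return jax.tree_util.tree_map(lambda p, g: p - beta * g, params, grads)
```

The inner step adapts θ to each task's support data; the outer step takes the query-set losses at θ' and backpropagates them all the way to θ.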

For regression, the per-task loss is the squared error between predictions and targets:

$$\mathcal{L}_{\mathcal{T}_i}(f_\phi) = \sum_{x^{(j)}, y^{(j)} \sim \mathcal{T}_i} \left\| f_\phi(x^{(j)}) - y^{(j)} \right\|_2^2$$

For classification, the per-task loss is the cross-entropy:

$$\mathcal{L}_{\mathcal{T}_i}(f_\phi) = -\sum_{x^{(j)}, y^{(j)} \sim \mathcal{T}_i} \left[ y^{(j)} \log f_\phi(x^{(j)}) + \left(1 - y^{(j)}\right) \log\left(1 - f_\phi(x^{(j)})\right) \right]$$
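As a hedged illustration, the two per-task losses could be written as follows in the sketch above (the `forward` function standing in for f_θ is an assumption, not part of the paper's code):

```python
import jax
import jax.numpy as jnp

def mse_loss(params, batch):
    # Regression: squared error between predictions and regression targets.
    x, y = batch
    pred = forward(params, x)          # forward() = f_theta, assumed to exist
    return jnp.mean(jnp.sum((pred - y) ** 2, axis=-1))

def cross_entropy_loss(params, batch):
    # Classification: cross-entropy between predicted log-probabilities and labels.
    x, y = batch                       # y holds integer class labels
    logits = forward(params, x)
    log_probs = jax.nn.log_softmax(logits)
    return -jnp.mean(jnp.take_along_axis(log_probs, y[:, None], axis=-1))
```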

About First-order approximation

  • The MAML meta-gradient update involves a gradient through a gradient. Computationally, this requires an additional backward pass through f to compute Hessian-vector products.
  • The paper also compares against a first-order approximation of MAML in which these second derivatives are omitted (see the sketch after this list). Surprisingly, however, the performance of this approximation is nearly the same as that obtained with full second derivatives.
  • This suggests that most of the improvement in MAML comes from the gradients of the objective at the post-update parameter values, rather than the second order updates from differentiating through the gradient update.
  • Past work has observed that ReLU neural networks are locally almost linear (Goodfellow et al., 2015), which suggests that second derivatives may be close to zero in most cases, partially explaining the good performance of the first-order approximation.
  • This approximation removes the need for computing Hessian-vector products in an additional backward pass, which the authors found led to roughly a 33% speed-up in network computation.
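As a minimal sketch of this first-order variant, reusing the names from the MAML sketch above (`loss_fn` and `alpha` are the same assumed pieces), the inner-loop gradient can simply be treated as a constant; the adapted parameters are then an offset copy of θ, so differentiating the meta-objective never forms Hessian-vector products:

```python
import jax

def first_order_inner_update(params, support_batch, loss_fn):
    # Same single adaptation step, but the inner gradient is cut out of the
    # autodiff graph; the meta-gradient then equals the gradient of the query
    # loss evaluated at the post-update parameters (no second derivatives).
    grads = jax.grad(loss_fn)(params, support_batch)
    grads = jax.lax.stop_gradient(grads)
    return jax.tree_util.tree_map(lambda p, g: p - alpha * g, params, grads)
```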

Experiment

  • Tasks
    • regression : sinusoid fitting (a sketch of this task distribution follows this list)
    • image classification : Omniglot, MiniImagenet
    • reinforcement learning : 2D navigation and MuJoCo locomotion
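Below is a hedged sketch of the sinusoid task distribution described in the paper's regression experiments (amplitude in [0.1, 5.0], phase in [0, π], inputs sampled uniformly from [-5.0, 5.0]); the support/query split and meta-batch size are illustrative assumptions:

```python
import numpy as np

def sample_sinusoid_task(k_support, k_query, rng):
    # Each task is a sine wave with its own amplitude and phase.
    amplitude = rng.uniform(0.1, 5.0)
    phase = rng.uniform(0.0, np.pi)

    def sample_batch(n):
        x = rng.uniform(-5.0, 5.0, size=(n, 1))
        y = amplitude * np.sin(x - phase)
        return x, y

    # Support set for the inner update, query set for the meta-objective.
    return sample_batch(k_support), sample_batch(k_query)

# Example: a meta-batch of 25 tasks with 10 support and 10 query points each.
rng = np.random.default_rng(0)
tasks = [sample_sinusoid_task(10, 10, rng) for _ in range(25)]
```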

Results

Regression

(Figure: qualitative few-shot adaptation results on the sinusoid regression tasks, comparing MAML to the pretrained baseline.)

Classification

(Table: few-shot classification accuracy on Omniglot and MiniImagenet.)

Information

  • Authors : Chelsea Finn, Pieter Abbeel, Sergey Levine
  • Affiliations : University of California, Berkeley; OpenAI
  • Published : ICML 2017, arXiv
  • Code/Project Page : reg & cla, rl
  • Material : video_rl
  • Blog : https://bair.berkeley.edu/blog/2017/07/18/learning-to-learn/

Discussion

  • How can we reduce the training pipeline from pre-training + fine-tuning to a single, unified meta-learning stage while achieving comparable performance?