Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

Summary

MAML explicitly trains the model parameters to be sensitive to task-specific changes across a given task distribution. This yields a good initialization from which a deep network can adapt to a target task quickly and efficiently, while remaining agnostic to the form of the model and to the particular learning task.
Preliminary
Meta Learning : to train a model on a variety of learning tasks, such that it can solve new learning tasks using only a small number of training samples.

The paper proposes an algorithm that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems.
This algorithm does not expand the number of learned parameters nor place constraints on the model architecture.

The meta-objective is as follows:

Note that the meta-optimization is performed over the model parameters θ, whereas the objective is computed using the updated model parameters θ'
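
The relationship between θ and θ' can be written out explicitly. The following is a sketch in the paper's notation for a single inner gradient step, where α is the inner step size, β the meta step size, p(T) the task distribution, and L_{T_i} the loss on task T_i:

```latex
% Inner (task-specific) adaptation step
\[
\theta_i' = \theta - \alpha \nabla_\theta \mathcal{L}_{\mathcal{T}_i}(f_\theta)
\]

% Meta-objective: loss of the adapted parameters, optimized over the initial parameters
\[
\min_\theta \sum_{\mathcal{T}_i \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_i}(f_{\theta_i'})
= \sum_{\mathcal{T}_i \sim p(\mathcal{T})}
  \mathcal{L}_{\mathcal{T}_i}\bigl(f_{\theta - \alpha \nabla_\theta \mathcal{L}_{\mathcal{T}_i}(f_\theta)}\bigr)
\]

% Meta-update of the initial parameters
\[
\theta \leftarrow \theta - \beta \nabla_\theta \sum_{\mathcal{T}_i \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_i}(f_{\theta_i'})
\]
```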

For regression,

For classification
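
For reference, the two supervised losses can be sketched as follows, assuming φ denotes the model parameters and (x^(j), y^(j)) are input/output pairs sampled from task T_i. The regression loss is the mean-squared error; the classification loss is the cross-entropy, written here with the conventional leading minus sign:

```latex
% Regression: mean-squared error
\[
\mathcal{L}_{\mathcal{T}_i}(f_\phi)
= \sum_{x^{(j)},\, y^{(j)} \sim \mathcal{T}_i} \bigl\lVert f_\phi(x^{(j)}) - y^{(j)} \bigr\rVert_2^2
\]

% Classification: cross-entropy
\[
\mathcal{L}_{\mathcal{T}_i}(f_\phi)
= -\sum_{x^{(j)},\, y^{(j)} \sim \mathcal{T}_i}
  \Bigl[ y^{(j)} \log f_\phi(x^{(j)}) + \bigl(1 - y^{(j)}\bigr) \log\bigl(1 - f_\phi(x^{(j)})\bigr) \Bigr]
\]
```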

The MAML meta-gradient update involves a gradient through a gradient. Computationally, this requires an additional backward pass through f to compute Hessian-vector products.
The paper shows a comparison to a first-order approximation of MAML, in which these second derivatives are omitted. Surprisingly, the performance of this method is nearly the same as that obtained with full second derivatives.
This suggests that most of the improvement in MAML comes from the gradients of the objective at the post-update parameter values, rather than the second order updates from differentiating through the gradient update.
Past work has observed that ReLU neural networks are locally almost linear (Goodfellow et al., 2015), which suggests that second derivatives may be close to zero in most cases, partially explaining the good performance of the first-order approximation.
This approximation removes the need for computing Hessian-vector products in an additional backward pass, which the authors found led to a roughly 33% speed-up in network computation.
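
The difference between the full meta-gradient and its first-order approximation can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a hypothetical scalar model f_w(x) = w * x with a mean-squared-error loss, so that the gradient and Hessian can be written analytically in pure Python. All function names (`mse_loss`, `mse_grad`, `mse_hessian`, `inner_update`, `meta_gradient`) are illustrative.

```python
# Toy illustration of full vs. first-order MAML meta-gradients.
# Assumption: scalar model f_w(x) = w * x with mean-squared-error loss.
# This is a hand-derived sketch, not the paper's released code.

def mse_loss(w, xs, ys):
    """Mean of (w*x - y)^2 over the samples."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mse_grad(w, xs, ys):
    """First derivative of mse_loss with respect to w."""
    return 2.0 * sum(x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)

def mse_hessian(xs):
    """Second derivative of mse_loss with respect to w (constant in w here)."""
    return 2.0 * sum(x * x for x in xs) / len(xs)

def inner_update(w, xs, ys, alpha):
    """One task-specific gradient step: w' = w - alpha * dL/dw."""
    return w - alpha * mse_grad(w, xs, ys)

def meta_gradient(w, support, query, alpha, first_order=False):
    """Derivative of the query loss at w' with respect to the initial w.

    support: (xs, ys) used for the inner update.
    query:   (xs, ys) used to evaluate the adapted parameter w'.
    """
    xs_s, ys_s = support
    xs_q, ys_q = query
    w_prime = inner_update(w, xs_s, ys_s, alpha)
    # Gradient of the query loss evaluated at the post-update parameter.
    g_query = mse_grad(w_prime, xs_q, ys_q)
    if first_order:
        # First-order approximation: treat dw'/dw as 1 (drop second derivatives).
        return g_query
    # Full meta-gradient via the chain rule: dw'/dw = 1 - alpha * Hessian.
    return g_query * (1.0 - alpha * mse_hessian(xs_s))

if __name__ == "__main__":
    support = ([1.0, 2.0], [2.0, 4.0])
    query = ([3.0], [6.0])
    print("full:       ", meta_gradient(0.5, support, query, alpha=0.1))
    print("first-order:", meta_gradient(0.5, support, query, alpha=0.1, first_order=True))
```

In this toy case the two gradients differ only by the scalar factor (1 - alpha * Hessian), so they point in the same direction whenever that factor is positive. For a deep network the factor becomes a matrix, and computing its product with the query gradient is the Hessian-vector product that the first-order variant skips.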

Authors : Chelsea Finn, Pieter Abbeel, Sergey Levine
Affiliations : University of California Berkeley, OpenAI
Published : ICML 2017, arXiv
Code/Project Page : reg & cla, rl
Material : video_rl
Blog : https://bair.berkeley.edu/blog/2017/07/18/learning-to-learn/
Discussion
How can we reduce the training phase from pre-training plus fine-tuning to a single, ultimate meta-learning stage while achieving comparable performance?