fastbook/16_accel_sgd.ipynb
2020-03-18 15:24:13 -03:00

1356 lines
167 KiB

{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide_input": false
},
"outputs": [],
"source": [
"#hide\n",
"from utils import *"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"[[chapter_accel_sgd]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The training process"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since we now know how to create state-of-the-art architectures for computer vision, natural image processing, tabular analysis, and collaborative filtering, and we know how to train them quickly, we're done, right? Not quite yet. We still have to explorea little bit more the training process.\n",
"\n",
"We explained in <<chapter_mnist_basics>> the basis of Stochastic Gradient Descent: pass a minibatch in the model, compare it to our target with the loss function then compute the gradients of this loss function with regards to each weight before updating the weights with the formula:\n",
"\n",
"```python\n",
"new_weight = weight - lr * weight.grad\n",
"```\n",
"\n",
"We implemented this from scratch in a training loop, and also saw that Pytorch provides a simple `nn.SGD` class that does this calculation for each parameter for us. In this chapter, we will build some faster optimizers, using a flexible foundation. But that's not all what we might want to change in the training process. For any tweak of the training loop, we will need a way to add some code to the basis of SGD. The fastai library has a system of callbacks to do this, and we will teach you all about it.\n",
"\n",
"Firs things first, let's start with standard SGD to get a baseline, then we will introduce most commonly used optimizers."
]
},
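{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the update rule above concrete, here is a minimal sketch in plain Python. The toy loss `(w - 3)**2`, the helper name `sgd_fit`, and the values of `lr` and `steps` are assumptions chosen for illustration; none of them come from fastai or PyTorch:\n",
"\n",
"```python\n",
"def sgd_fit(w=0.0, lr=0.1, steps=100):\n",
"    # Hypothetical helper: minimize the toy loss (w - 3)**2 with plain SGD\n",
"    for _ in range(steps):\n",
"        grad = 2 * (w - 3)   # derivative of the loss with respect to w\n",
"        w = w - lr * grad    # new_weight = weight - lr * weight.grad\n",
"    return w\n",
"\n",
"w_final = sgd_fit()\n",
"print(round(w_final, 4))\n",
"```\n",
"\n",
"Each step moves `w` a fraction of the way toward the minimum at 3. A larger `lr` takes bigger steps, and in this toy problem any `lr` above 1 would make the iterates diverge."
]
},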
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Let's start with SGD"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, we'll create a baseline, using plain SGD, and compare it to fastai's default optimizer. We'll start by grabbing Imagenette with the same `get_data` we used in <<chapter_resnet>>:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#hide_input\n",
"def get_data(url, presize, resize):\n",
" path = untar_data(url)\n",
" return DataBlock(\n",
" blocks=(ImageBlock, CategoryBlock), get_items=get_image_files, \n",
" splitter=GrandparentSplitter(valid_name='val'),\n",
" get_y=parent_label, item_tfms=Resize(presize),\n",
" batch_tfms=[*aug_transforms(min_scale=0.5, size=resize),\n",
" Normalize.from_stats(*imagenet_stats)],\n",
" ).dataloaders(path, bs=128)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dls = get_data(URLs.IMAGENETTE_160, 160, 128)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll create a ResNet34 without pretraining, and pass along any arguments received:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def get_learner(**kwargs):\n",
" return cnn_learner(dls, resnet34, pretrained=False,\n",
" metrics=accuracy, **kwargs).to_fp16()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's the default fastai optimizer, with the usual 3e-3 learning rate:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: left;\">\n",
" <th>epoch</th>\n",
" <th>train_loss</th>\n",
" <th>valid_loss</th>\n",
" <th>accuracy</th>\n",
" <th>time</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>2.571932</td>\n",
" <td>2.685040</td>\n",
" <td>0.322548</td>\n",
" <td>00:11</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1</td>\n",
" <td>1.904674</td>\n",
" <td>1.852589</td>\n",
" <td>0.437452</td>\n",
" <td>00:11</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2</td>\n",
" <td>1.586909</td>\n",
" <td>1.374908</td>\n",
" <td>0.594904</td>\n",
" <td>00:11</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"learn = get_learner()\n",
"learn.fit_one_cycle(3, 0.003)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's try plain SGD. We can pass `opt_func` (optimization function) to `cnn_learner` to get fastai to use any optimizer:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"learn = get_learner(opt_func=SGD)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The first thing to look at is `lr_find`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"(0.017378008365631102, 3.019951861915615e-07)"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"learn.lr_find()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It looks like we'll need to use a higher learning rate than we normally use:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: left;\">\n",
" <th>epoch</th>\n",
" <th>train_loss</th>\n",
" <th>valid_loss</th>\n",
" <th>accuracy</th>\n",
" <th>time</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>2.969412</td>\n",
" <td>2.214596</td>\n",
" <td>0.242038</td>\n",
" <td>00:09</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1</td>\n",
" <td>2.442730</td>\n",
" <td>1.845950</td>\n",
" <td>0.362548</td>\n",
" <td>00:09</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2</td>\n",
" <td>2.157159</td>\n",
" <td>1.741143</td>\n",
" <td>0.408917</td>\n",
" <td>00:09</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"learn.fit_one_cycle(3, 0.03, moms=(0,0,0))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"(Because accelerated SGD using momentum with is such a good idea, fastai uses it by default in `fit_one_cycle`, so we turn it off with `moms=(0,0,0)`; we'll be learning about momentum shortly.)\n",
"\n",
"Clearly, plain SGD isn't training as fast as we'd like. So let's learn the tricks to get accelerated training!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A generic optimizer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to build up our accelerated SGD tricks, we'll need to start with a nice flexible optimizer foundation. No library prior to fastai provided such a foundation, but during fastai's development we realized that all optimizer improvements we'd seen in the academic literature could be handled using *optimizer callbacks*. These are small pieces of code that an optimizer can add to the optimizer `step`. They are called by fastai's `Optimizer` class. This is a small class (less than a screen of code); these are the definitions in `Optimizer` of the two key methods that we've been using in this book:\n",
"\n",
"```python\n",
"def zero_grad(self):\n",
" for p,*_ in self.all_params():\n",
" p.grad.detach_()\n",
" p.grad.zero_()\n",
"\n",
"def step(self):\n",
" for p,pg,state,hyper in self.all_params():\n",
" for cb in self.cbs:\n",
" state = _update(state, cb(p, **{**state, **hyper}))\n",
" self.state[p] = state\n",
"```\n",
"\n",
"As we saw when training an MNIST model from scratch, `zero_grad` just loops through the parameters of the model and sets the gradients to zero. It also calls `detach_`, which removes any history of gradient computation, since it won't be needed after `zero_grad`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The more interesting method is `step`, which loops through the callbacks (`cbs`) and calls them to update the parameters (the `_update` function just calls `state.update` if there's anything returned by `cb(...)`). As you can see, `Optimizer` doesn't actually do any SGD steps itself. Let's see how we can add SGD to `Optimizer`.\n",
"\n",
"Here's an optimizer callback that does a single SGD step, by multiplying `-lr` by the gradients, and adding that to the parameter (when `Tensor.add_` in PyTorch is passed two parameters, they are multiplied together before the addition): "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def sgd_cb(p, lr, **kwargs): p.data.add_(-lr, p.grad.data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can pass this to `Optimizer` using the `cbs` parameter; we'll need to use `partial` since `Learner` will call this function to create our optimizer later:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"opt_func = partial(Optimizer, cbs=[sgd_cb])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's see if this trains:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: left;\">\n",
" <th>epoch</th>\n",
" <th>train_loss</th>\n",
" <th>valid_loss</th>\n",
" <th>accuracy</th>\n",
" <th>time</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>2.730918</td>\n",
" <td>2.009971</td>\n",
" <td>0.332739</td>\n",
" <td>00:09</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1</td>\n",
" <td>2.204893</td>\n",
" <td>1.747202</td>\n",
" <td>0.441529</td>\n",
" <td>00:09</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2</td>\n",
" <td>1.875621</td>\n",
" <td>1.684515</td>\n",
" <td>0.445350</td>\n",
" <td>00:09</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"learn = get_learner(opt_func=opt_func)\n",
"learn.fit(3, 0.03)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It's working! So that's how we create SGD from scratch in fastai. Now let's see see what this momentum is exactly."
]
},
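{
"cell_type": "markdown",
"metadata": {},
"source": [
"To recap the mechanics of the section above, here is a simplified, pure-Python sketch of a callback-driven optimizer. It is not fastai's `Optimizer`: the names `ToyParam`, `ToyOptimizer`, `toy_sgd_cb`, and `count_steps` are hypothetical, and parameters are plain floats rather than tensors. It only illustrates how `step` can loop over callbacks and merge any dict they return into the per-parameter state:\n",
"\n",
"```python\n",
"class ToyParam:\n",
"    # Hypothetical stand-in for a tensor: one float value and its gradient\n",
"    def __init__(self, data, grad=0.0):\n",
"        self.data, self.grad = data, grad\n",
"\n",
"class ToyOptimizer:\n",
"    # Simplified sketch of a callback-driven optimizer (not fastai's Optimizer)\n",
"    def __init__(self, params, cbs, **hypers):\n",
"        self.params, self.cbs, self.hypers = list(params), cbs, hypers\n",
"        self.state = {id(p): {} for p in self.params}\n",
"\n",
"    def zero_grad(self):\n",
"        for p in self.params: p.grad = 0.0\n",
"\n",
"    def step(self):\n",
"        for p in self.params:\n",
"            state = self.state[id(p)]\n",
"            for cb in self.cbs:\n",
"                # A callback may return a dict, which is merged into the state\n",
"                res = cb(p, **{**state, **self.hypers})\n",
"                if res is not None: state.update(res)\n",
"\n",
"def toy_sgd_cb(p, lr, **kwargs):\n",
"    # Plain SGD step: move against the gradient, scaled by lr\n",
"    p.data -= lr * p.grad\n",
"\n",
"def count_steps(p, n_steps=0, **kwargs):\n",
"    # Stateful callback: the returned dict is passed back in on the next step\n",
"    return {'n_steps': n_steps + 1}\n",
"\n",
"p = ToyParam(1.0, grad=0.5)\n",
"opt = ToyOptimizer([p], cbs=[count_steps, toy_sgd_cb], lr=0.1)\n",
"opt.step()\n",
"print(p.data, opt.state[id(p)])\n",
"```\n",
"\n",
"The design choice this sketch tries to highlight is that the optimizer itself contains no update rule: all behavior lives in the callbacks, and a callback that needs memory between steps simply returns it as a dict."
]
},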
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Momentum"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"SGD is the idea of taking a step in the direction of the steepest slope at each point of time. But what if we have a ball rolling down the mountain? It won't, at each given point, exactly follow the direction of the gradient, as it will have *momentum*. A ball with more momentum (for instance, a heavier ball) will skip over little bumps and holes, and be more likely to get to the bottom of a bumpy mountain. A ping pong ball, on the other hand, will get stuck in every little crevice.\n",
"\n",
"So how could we bring this idea over to SGD? We can use a moving average, instead of only the current gradient, to make our step:\n",
"\n",
"```python\n",
"weight.avg = beta * weight.avg + (1-beta) * weight.grad\n",
"new_weight = weight - lr * weight.avg\n",
"```\n",
"\n",
"Here `beta` is some number we choose which defines how much momentum to use. If `beta` is zero, then the first equation above becomes `weight.avg = weight.grad`, so we end up with plain SGD. But if it's a number close to one, then the main direction chosen is an average of previous steps. (If you have done a bit of statistics, you may recognize in the first equation an *exponentially weighted moving average*, which is very often used to denoise data and get the underlying tendency.)\n",
"\n",
"Note that we are writing `weight.avg` to highlight the fact we need to store thoe moving averages for each parameter of the model (and they all their own independent moving averages).\n",
"\n",
"<<img_momentum>> shows an example of noisy data for a single parameter, with the momentum curve plotted in red, and the gradients of the parameter plotted in blue. The gradients increase, and then decrease, and the momentum does a good job of following the general trend, without getting too influenced by noise."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide_input": true
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXwAAAD4CAYAAADvsV2wAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAgAElEQVR4nO3dd3hU1dbA4d8mBEhACEhNAEFRBGnRWDCCFAUuIgYVBXvnqlgRDVe5dkWxtw8VvIKiwBUMCCgXBKQjwURRBKkKoYUSigQIyf7+2DNhMpkWppwp630enpCZkzkrgaw5Z++111Zaa4QQQkS/SlYHIIQQIjQk4QshRIyQhC+EEDFCEr4QQsQISfhCCBEjKlsdgDt169bVzZo1szoMIYSIKCtXrtytta7n6rmwTfjNmjUjOzvb6jCEECKiKKX+dPecDOkIIUSMkIQvhBAxQhK+EELECEn4QggRIyThCyFEjJCEL4QQMUISvhBCxIiwrcMXIhpl5eQxctZathUUkpyUwNCeLclITbE6LBEjJOELESJZOXkMm7KKwqJiAPIKChk2ZRWAJH0REjKkI0SIjJy1tjTZ2xUWFTNy1lqLIhKxRhK+ECGyraCwQo8LEWiS8IUIkeSkhAo9LkSgScIXIkSG9mxJQnxcmccS4uMY2rOlRRGJWCOTtkKEiH1iVqp0hFUk4YuoECnljhmpKWEZl4gNkvBFxPO13DFS3hSECBZJ+CLieSp3tCf0k62BlzcJEU1k0lZEPF/KHU+mBt7+JpFXUIjmxJtEVk5eQOIWItTkCl9EvOSkBPJcJH3Hckdf3hScr+b/Pnrc452DXP2LSCNX+CLi+VLu6K0G3tXVfEFhkcuv2VZQyFNZq3hkYu7JXf0fOQK//+7T9yZEIEnCFxEvIzWFl69uS0pSAgpISUrg5avblrna9vam4GrIx51aCfGMX/YX2ulxd0NEWTl5pI+YS/PMGXR6aTbrOnaH1q358MJrSH/5exkiEiETkCEdpdQnQB9gl9a6jYvnFfA20Bs4DNymtf4pEOcWkSHYwx+O5Y72cz0yMbfcudzF4Gt7g4T4OJSiXLK3c34d58nigd98xJm5S1jWpA2DfpxCXs16DD3Yl2e/+Y2Cw0Vefzb27y2voJA4pSjWmhQZThI+CtQY/qfAe8A4N8//AzjT9udC4P9sH0UMCGWXSFfnemRiLtl/7uWFjLZuz+duHqB2YjyJVSqXeZN4ZGKu2/M7Dx053jn0XrOI+5Z9xfgOvXiqx318PPl5npo7hl8anUVucsvSeN39bJy/t2KtvX6NEI4CMqSjtV4A7PVwyFXAOG0sA5KUUo0CcW4R/nytkHEc+kgfMfekhjpcnUsD45f95fH13A35PH3lOSzO7MamEVewOLMbGakppUn9lKN/8/ScD2m6bzsAyvY6juxX/Gflb2bkzLdYmXw2z3YfhFaVGHLFo+w85VTezxpB2tbfuGRTDmjtdmjI07CTdN0UvgjVGH4KsMXh8622x8pQSt2jlMpWSmXn5+eHKDQRbL5WyASiBNLduTR4TIi+zAPYDe3ZkobHD/PPZV9x+8pv+O4/g2l0YDc3XtTUHF9cDFu2wKJF3LZpEfcvmcjoyc/zd5UE7s0YxrHK8QDsTziF+67KpO7hfXw1/gk+nzSct6a/RtXjx1x+H96GnVzdoZyMQLzxivAUqrJM5eKxcsOgWuuPgI8A0tLS3A2TigjjS9mkL4un3HGcH6hkG9d2pWhrHgweDNdfD506lXveY9uDbdtg4UJYsICMBQvI+PVX85qV4tCV4pgzdTjV17SARzebZH/8OABP2758a816/DPjX+w65dQyL7uq0Zncct3zNDi0h9P2bWfIovEkH8jnmdteKBeCu5+jXZxy9WtWMbJJS3QL1RX+VqCJw+eNgW0hOrewmC9lkyfbK975zsBdsu+yIZtZnz4I778PXbvC66+D87EbNsBNN5
k3hG0O/z0ffhhSUmDAABg3DpKT4YUXYMEC4g//TfXZ31G9/qlw9Ch07AhDh8KHH8J338GaNUxbso7rh00gp3ErUpISuOmipqV3EkkJ8fzUvB3TWnfh3fSBDO77OO23r2PS2Edh3TqPP8fKxcepd+jESKq7793xZ+Xtyl02aYluobrCnwYMVkpNwEzW7tdabw/RuYXFfOkS6ctdgCveyinji4t4bMFnDPpxCvtbnA2ffQuvvgqPPQbLlsEnn5hE/fzz8H//B/Hx5o1g1ix480247TaYNw/S0szzHTpAZadfm06dICfHbQx9gb4dW7h93vEOJadjT5ZndKTz0Lvgootg6lS45BLgxM9xyKSfKdaaN6e/zpVrFnLuA+PZm1iLFA8/K1+v3GWTlugWqLLML4EuQF2l1FbMnWw8gNZ6FDATU5K5HlOWeXsgzisih7cukUN7tiyTkMC3XvGeElFaSQHDxz9L++3r2Nj/Fk4fOwoSEmDyZHjtNcjMhJUrYc8eOHQI7roLnnnG/P3OO+GOO2DiRNi0CW65xSR9J4EoN3X5s+nUBq64Arp3h/Hj4dprS48FmD7yU65csxCAW1dOZ1S3Wzz+rHwdMjvZN14RGQKS8LXWA708r4H7A3EuEZ1Otle8Y4KqVXiQ/dVqgFK0jj/GV58/BYUFMHkyp1999YkvUsoMu5x/vknkl14KI0ZA69Ynjpk/Hz74wLwp/P03NG1a7txBHe9u0QKWLIG+feGGG6BOHejWzbx2w0r0mv0OGxs0Iy+hNrflTOf0kc9ypYdz+nrlfrJvvCIyKO1l3M8qaWlpOjs72+owRJizJ924QwdZ/v4tvN/xOj7tdD3zZ79M/Z9XmInW888/+RNs2mSGdh59FJo1K/NU+oi5Lq+GU5ISWJzZzef4Pb7JFRRAejrk5cGiRVC/PnTpAn/9Zd4QDh40Qz7vvAMPPOD2PPZYqxYdJb6kmENVE93GKj2CIptSaqXWuvztKNI8TUQ4eyKa8XEW1YuOcP/yr+jcrwv1f1xkkqA/yR6geXPzOi74O97t0x1CUhJ8+60Zz+/d23y+ebN5rF07c8zFF8Mbb0CbNrBrl/mzc+eJv+/axawt22DXLmocK6QExZLT2jG9XXcuefyecnHJJi3RS67wRXQYO9ZMsIJJiocOQX6++XuQ+HuFX6Gvz8mBzp2hqAimT4fLLjvx3NSpkJFR9vhKlaBePXNHYPuzgUTm7NEcP3iIvmsW0mTfdg5Urc7ETv1p9O8n6NPpbJ++bxHe5ApfRL81a0yFzYAB8NlnJiEGMdmD/+PdFbpDSE2FxYtNff+555Z9rm9fmD3bVA/ZE3ydOibpOzjD9icrJ48ek3+h9eZfuefHKdw951P2LZ7CvDGT6Dqwp0+xi8gkCV9EhzVr4MwzTX381KlmotMLX8aqPR3j76bkFa6IsQ/hOFOq7BW/FyNnraXweAkrG7dmUOPWtNmxntGTn6PRg4PosmEUD1/hvueQiGyS8EVA+ZtET9qaNabKpmlTM25dpYrXOL2Nn/tyjD/j3VZVxDjfQfzasAWZvR7g06+epf+MTxh29A5AVtZGI+mHLwLGl344Ad828PBhePBBk/BTU81jVauaq14PfFlRGuxVpxXp3xNIru4g5p9xPhPbXs4/l0/mrD9X8/0HE6B6dbNGQUQNucIXAePL4h6vx+zfbxJ2tWreT7h8uamj/+MPk/SHDPE5Vl/Gz0Ox6tTTHUKwyiNd3VkAvND9Li7ZnMvrM94kr1Z982b6/fdw3XV+n1OEB7nCFwHjdxI9cAAaNYLERDjjDLPSdMgQGD3a1KDv3m0OPnYMnnzSlCMeOWKS0ttvm1W0PvK25aGvxwSLqzuhRybm0iwAHSwd7ywcHaxanSf+8SAt9m7l0k22/Yk2bvTjuxDhRhK+CBi/k+iuXVBYCH36mPr5vDyz2vXuu02/mnr1oG5dswr1pZfg1lvhl19KV6BWhC8N3Xw5Jljc9fWHAAyDYZL+4s
xuvHV9hzLf46LmqUw4t/eJAyXhRxVJ+CJg/E6iBw6YB+66CyZMgNxcU0+/cSPMnAlvvMGmS3uxJDGZu68eTvpZN5G18VCZ1/K1l7sv4+dWjbGD92GjQM0luPoea7z9xolVxfPnw759fp9HhAcZwxcB40uZYkZqCmjNitc/5mjBAZZ2urL0DWHwqPm8BzzwzTq6N8kzx8bFmdWuzZuT1bAdw/a2ovBM25WvU9VMRXvb+FJhY9WqU2+97+HEm4K/Y/0uv8cpU0wfofnzTcnn7Nmmtl9ENEn4MS7QE4NeE+QPP5Dx+ONk/Pij+XzIlWQBw6asIn1PAQCbiiq7TNTeJnz92UQl3LibWHWUnJQQvAZuqammRfTMmdCvn0n6c+ZI0o9wMqQTwwJeIunN3Xebxl/btpmJ2BYt4K67yH/8Ke79/lOu/+V/AByqkuByyMLbpHA09XJ3nlh1LjK1D4MFfcOS3r0hKwtWrzZJf6+nratFuJOEH8NCtbtRVk4enV+cDaNHM6ddF6ZNmmf6zY8ZA7t3c/ecTxm8dBKdN/3EptqN2GHbBtA5UXubFLayqiYY7BOrm0dcwZvXd3A5lxCSN7l//ONE0u/eXWrzI5gM6cSwUCQL+11Etf3mynBh/ZZMmrmekmoJZHTuDHv30unVeWw5cKzc1zonam8rU6O5l7u7obKQbVjSq5dpWXHVVSbpz5ljKqZERJEr/BgWiiti+11E0hFTTVNQrUbpXURWTh7pry1gy4FjbocsHHmrmrGyqsYqIS0d7dkTpk2DtWvhyisD//oi6OQKP4aF4orYfreQVHgQgP3VTgFOzBfYz60x49Qak6jdTR57mxSOtV7u/jZwc8ftZH6PHmYbyMxM2LEDGjYMwHchQkUSfgwLVrJwZB9yqHXEJPyCBJPw45RyubCoIrtFCSPQb3JeK38uvtgcuHKlWQ0tIoYk/BgX7Cti+12E45BOQnyc23LDSKyoiTZey1tTU02v/RUrJOFHGBnDF0FlH1dvhknk1RrWd9nHxS5SK2qiidfJ/Bo1oFUrkB3pIo4kfBF0GakpPHxuXVCK756+kozUFEv71AjPfJrM79jRrL59+WWz7aKICDKkI1zyZwWu89eOPGU7F48bZyb44kySD8X8gTg5Pk3mv/SS6bHzr3+ZvkejR/u/YbwIOtnEXJTjPGkH5hfelxJHx69NPrCLp74fTe8/lnCoaXNqjPsPXHppsMMXAeDzG35WFtx/v6nYefZZ07bay+YzIrhkE3NRIf70pBk5ay2df13IwJ9nccHWX1EaXu18CzMvH8h8SfYRoUJ3dxkZ0LUr3HcfDB9uupu+/LIk/TAlCV+U427SLq+gkPQRcz0mgk4/ZPHSd++xtVZ9Zp3Zkdc630JerfqoQ+6bgInwcVLN2GrVgs8+g5o14ZVXzKY0b77pNekHa0cv4Z4kfFGOu+X6Ckofd5kI3nqLEd+9y/zm5/HPfsM4En9im0KpvokMJ3N3V5q4a17BiPR8rn/7bfMG0LAhNGgADRqwQVVn9l7ITWjAhg4dubD9aUxemRf4Lp/CI0n4ohxXk3b2VbCOShNBh2R48UUYPpy87r158PxBHNEnKnCk+iZyVLS/Upk7AqV4Iv02fq7dlLsS9nB68SHYuZNDi5fRYOdO/nnMvMbRbyqz5LT2HG2ZzldtL0MrUywYqa2sI4kkfFGOqwoad5txbNt32Cyzf/VVuOUWUsaM4blVO+VWPUJ5a8bmPAxz+NjxsncESvHFOd35wWHFdM8Rc8krKKRa0RHab1/H5euW0XPdMkZ++w6/NWjB6ganl365LLwLLkn4wiXnFbjptl9aR0qX8PoPo2H5NDNp9+67UKlSzPWziSaeSjJdje+742rj+iPx1VjetC3Lm7Zl1lkd+e8XmdQ5vL/M18nQX3BJwo9SzldiXc+ux7w1+Sd91T20Z0vGffA1PX6Zx28NzuDblum8NutdMlZ9D48/DiNGSGVGFPC0PiJ9xFyPO3A50p
iLhKE9W7q8a9hfrQYA1/z6PU327+TLDr0A6Hp2vcB9M6IcqcOPQq7q6J15ras/doyfXnyXw5O+4ufaTbl06yrabPmdEqWopDV7ayRR51ABvPCCWXwjyT7qNc+cUW4ex5uE+DiuOS+lzAQtQP2De/jxg1tLP7+/7xPMaNVJmucFgKc6fGmtEIVcVVo4c7uz1aFD8MYbFDY5jXOfe4xL1izj/qWTSPj7AC/1GMS33//CT0+/zoGEmjxz2SDS4zqSlbstSN+JCCeehlvi3LzhFxYVM29Nfrn+SQUJNTkcX5Ufmp/LqgZn8PJ379KkYIeM4QeZDOlEIV9/acoct3u3GYN/913Yt4/fT2/PW5feyx91T6PJ/h2saHwOKEXSgm0cLW5N4R0fmK+TcrqYMbRnSx6emOvyuRKtXVZygfl/Zp/XKb37BC69ZzR7EmvS6OAeZv7nAd6b+goP3v9uML+FmCdX+FHI14mv0uPefReaNoXnnjOtD5Yu5Zr+L7Lg9PPYUbMuK5q0KR2yKSgsCsk+uCL8ZKSmUDsx3uVzyUkJPjVds3dPTUqIJ79GbUoqxZFXqz5Dez9M+x3rGLNmclBiF4Yk/CjkqhOls9LaeK3NDkYdOphNqr/+Gi66qMLVEnIrHhuevvIct11Ofe2AmpGaQu7TPXjLYWP23y7oxoYBt9Ni/MfwzTfB/jZilgzpRCFXlRZuq3Q2b4a9e+GWW0yPcxt35XnV4iux73D5drhSThcbfOly6usajHLlu0fTYW0u3HYb5OZCkybB/FZiklTpxLrJk+Haa2H5crjggjJPuep1Apx0J00hvFq3Ds491wwxzpgBzZpZHVHECXq3TKVUL+BtIA4YrbUe4fT8bcBIIM/20Hta69GBOLc4OfZkfsO0SQyqVImZJafS1+kYTwuoZCWtCIozz4Rp0+Dqq+HCC0375Y4drY4qavid8JVSccD7wOXAVmCFUmqa1nq106ETtdaD/T2f8J9jnX6bHetZd2pTnpixjpKq1XxK3LKSVvjLY6fMrl1h6VLo08f8/dNPYcAAS+ONFoGYtL0AWK+13qi1PgZMAK4KwOuKICmt09eac3ZuYFXDFlJpI0LGfsGRV1CI5kSnzKycvBMHnX02LFtmhhkHDjQVZGE6/BxJApHwU4AtDp9vtT3m7Bql1C9Kqa+UUi5nY5RS9yilspVS2fn5+QEITbhir6hpdHA3dQ/v59cGZ5R5XIhg8tSCuYy6dc2+uTffDE8/DY89FsIoo1MgEr6rJXbOb8XfAM201u2AOcBYVy+ktf5Ia52mtU6rV096agSLvaKm7Y71APzasEWZx4UIpgq1YK5aFcaOhUGD4I03YMmSIEcX3QKR8LcCjlfsjYEya+211nu01kdtn34MnBeA8woPsnLySB8xl+aZM0gfMbfM7fLTbRO5M3cGDy6ZQLGqxOr6zaVnvQgZXxZolaEUvPaaKdO85x44diyI0UW3QCT8FcCZSqnmSqkqwABgmuMBSqlGDp/2BX4PwHmFG+7GSH/4dBq0bUuPPh0ZPuv/qFl8lFcvvZVT69WWskoRMr4u0CqjRg14/3347TeT/MVJ8btKR2t9XCk1GJiFKcv8RGv9m1LqOSBbaz0NeFAp1Rc4DuwFbvP3vMI9V2OkiQV7OOfBh6BODXNrfMUVND3rLIYBw6wJU8QoXxZvuXTllWbNyHPPQf/+poRTVIgsvIpCzm1slS5h7KSnuWDrb1RbuQLatbMsNiH8sm2bWRGelgZz5khbbhekPXKMcR4LvW/pf+m8OYe3+g6WZC8iW3Ky2Wxn7lyzUbqoELnCD1P2hSl5BYXEKUWx1qT4eOvruLDqwr9W8cWEJ5nZujPF4z4j49zGIfoOhAiSkhLo1AnWroU1a0z5pigV9NYKIrCcd6wqtr0p5/nYe97+3KivV/DONyPJOzUZNWqUJHsR9tz1byo33v/hh5CaCkOGmLJN4RO5wg9DrjYMd+TzNnD2xmjz5kGXLoELUIggcL
U1Z3wlBQqKik/kqdJmfV99AC+9BD/8AJ07WxFyWJIx/AjjbcWrzytit241H9u29TMiIYLPVXVZUYkuk+zBYVXuk0+arpoPPADHj4cy1IglCT8MeVvx6ul5xwVXX0xeTHHVqlCnTqBDFCLgKtLaY1tBISQmwptvwi+/wKhRQYwsekjCD0OedqzytEDFecHVKfk72Jp4qmwyLiJCRVp7lB7brx9cfjkMHw67dgUpsughCT8M2ff9TLH9p46z1RqnJCV4XBHrfEvc6OBu8k6pK10wRURwdaETX0kRH1e21r7MRY9S8M47cOgQPPVUqEKNWFKlE6Yq2nM+Kyev3ERvw4O7Wd6kjXTBFBHB3QpcV4+V+d04+2y4/Xb44guT/KtVsyL8iCAJPwrYh3IcVSoppsGhvWyvWU+6YIqI4e5Cx+vFT0YGfPyxqdjp2TNI0UU+GdKJAq6qG5IP5BNfUszupPrSBVNEv65dISHB7IMr3JKEHwVcDdn0XrsYgIv/OUC6YIrol5AA3bvD9OmyM5YHkvADyFMP+mByNWTT77d5/NqkFT36pockBiEs16cPbNpk2i0IlyThB4hP+3QGiXN1w9m7NtEqfzPFN9wY9HMLETZ69zYfp0+3No4wJgk/QHzepzMIHMs4FXDThkWUVK5M+yH3BP3cQoSNJk2gfXv4739lWMcNqdIJkArt0xkEpdUNR47A6XebSgXZF1jEmvvvN9sgTpgAAwdaHU3YkSv8AKnwPp3B8vHHsH07PPpoaM8rhEUc58465Tej4Oy2MHSoWYwlypCEHyAntU+nr/bt8+24wkLTPfDSS02ZmhBRznnubMuBY9x70e2Ql2d+F0QZkvADxHkc3VsbBJ/Nm2ean82Z4/6YzZtNE6kuXWDHDnj2Wdn6TcQEV3NnSxucxbepl8Prr8Mff1gUWXiSMfwAqmg7BJ/MnGk+LlkCl11m/q41rFoFWVnw9deQm2seb9sW3n7bXOELEQPczZH9++Jb+MefK+G668zvTmJiiCMLT5Lww92ePeZjUREsXmwSfFYWbNhgruI7doSRI83S8hYtrI1ViBCx74zlrhanSuNkGD/elGrefTd8+CHUqBHSGMOR7HhlMVdbupW5S+jYEZYtO/F5fLxZUdivH/TtCw0bhj5oISzkamcsR6U7YqWmwAsvmNbJtWrBXXfB4MHQrFloAw4xTzteScK3kKv/uGX+s5aUQM2a8Pff5tY0I8NcsdSqZWHUQljL0xagKa4umpYuNUOdX31lhkP79oWHHjJDn1E41yVbHIahrJw8hkz62fNirU2bTLIfPRomTjR1xZLsRYxzN26vgMWZ3UqTfWm55tS9pHe4h++mL4XMTFi40FSxdegAY8aY6rYYIQnfAvYr+2I3d1el/6Htdzjt2oUoMiHCny9rXly1Onlk8R6yrr0PtmwxF1FghnmaNIGpU0MQufUk4VvAVSmZo+SkBFN7/8QTZrxREr4QpXxZ8+Kx1UlCAtx5p6lumzfPDPNMnhyS2K0mVToW8NRuISE+jqE9zjKVBXl5sGgRVK0awuiECG/udsZyHLd3N8Zf5ndPKbN2pVkz2Ls3iBGHD0n4FkhOSnD5HzJOKTNhu/wbc8Xx6qtkVWnMyBFz3VfxCBGDXK15sVe8uUv24GY4qE6dmEn4MqRjAXe3pK9f156MuD3w8MPQqxdZ3QZY1nJZiEjiOGbvjgLXrU4k4YtgctuG4awkuP56qF0bxo5l5Ox1lrVcFiKSeJsXA9C42Rs3hhK+DOlYxGUbhjvvhLVrYfZsqF/f8pbLQkQKX34nUtx1rrUnfK2jsi7fkVzhh4svvoBPPoEnnzQraQmjlstChDlvvxPuOtdm5eTxbu4eKC7m8menR/1wqST8IPJ5j9v162HQIEhPh6efLn04qC2XhYgirn5X7Nfq7jrX2sf9/9LVACjcmR/1c2QypBMkzm0T7BOu4DSOqDXccIPpkfPFF1D5xD+JL+VnQoiT+12xj/sXJJwCQK
3Cg2y1zZFF6++YJPwg8bTwo8x/ppwcWLECRo2Cpk3LvU5QWi4LEYUq+rtiH/ffX8100Uw6cqjM49FIhnSCxOcJ18mTIS4OrrkmBFEJIezs4/4FtoRfy5bwo3mOTBJ+kPg84TpliunaV7duCKISQtjZx/1PXOEfjPo5Mkn4QeLThOuqVbBmDVx9dYijE0LY18NUTW7E7sRa3PHzTF7p3SKqh1ADkvCVUr2UUmuVUuuVUpkunq+qlJpoe365UqpZIM4bzrzucbt6NfTpY3bhkeEcISyRkZrCgid7UHfyBFrs2ETf/7xqdUhB5fekrVIqDngfuBzYCqxQSk3TWq92OOxOYJ/WuoVSagDwCnC9v+cOd24nkRYuNJswVK0KP/wgu1YJYbVevWDYMHj5ZTPEeuONVkcUFIGo0rkAWK+13giglJoAXAU4JvyrgGdsf/8KeE8ppXS4brcVQM5bGL5ZaR0XDH8QmjeH776L+u3WhIgYzz1nLsYGDYK0NGgZfWP5gRjSSQG2OHy+1faYy2O01seB/cCpzi+klLpHKZWtlMrOz88PQGjWct6EodfsL7ngiX+yp3V7WLJEkr0QYaB0geRTs8i4ZDBH46tC//5RuRNWIBK+q+YTzlfuvhyD1vojrXWa1jqtXr16AQjNWvZafKVLGP79xwyfO5qZZ13MtVc/Z/p3CCEs5XxRlqtrMPgfj5iCiocesjq8gAtEwt8KNHH4vDGwzd0xSqnKQC0g6tvTbSso5Ibcb5k1ZjB3Zk9lTNpV3J+Ryea/PXf1E0KEhqsFkrObpjKuy0D4+GMYP96iyIIjEAl/BXCmUqq5UqoKMACY5nTMNOBW29+vBebGwvh9Zs4UXpr1PoerVOOhPkN4vvvdaFUpqhd2CBFJ3C2QfO78AXDJJWY8f230tCP3O+HbxuQHA7OA34FJWuvflFLPKaX62g4bA5yqlFoPPAqUK92MOm+9xaD/fcLUtt3pd/NrTD2nKyDNz4QIJ+4uvhrUqQFffmn2v+3fHw4fDnFkwRGQOnyt9Uyt9Vla6zO01i/aHvu31nqa7e9HtNb9tdYttNYX2Ct6ooljZ8xXrn4UHnnE1NePGUNy7equa/GFEJbyuECycWMYO9aM548bZ1GEgVKGykgAABLXSURBVKXCdWQlLS1NZ2dnWx2GTxw7Y/Zes4j3pr7CwhZpFIyfyFUXNLc6PCGEB86l02W6bGoNLVpAq1Ywfbq1gfpIKbVSa53m6jnplhkAjhM/N+bOZHPtRtzTN5O6czdJwhcizHnssqmUWRH/0UdmWCcxMbTBBZj00gkAx4mf6scK+SupEUfjq0Z1m1UhYkafPnDkCMyda3UkfpOEHwCOEz+Jx45yOL5quceFEBGqc2fT8ypChnQ8kYQfAI4TP4lFhRRWqSbVOEJEi6pVTa+dSZNg926ro/GLJPwAcOyMmVh0FGrUkGocIaLJM8/AwYPw+ONWR+IXSfgBkpGawuLMbtQpOco1nWTfWSEihWNJdfqIua43MT/nHHjsMfjPf2DBgtAHGSCS8AOpuBiOmit8IUT4c+6lk1dQyLApq1wn/eHDTcPDe++FY8dKv97rm0UYkYQfSH//bT5Wr25tHEIIn7jqpVNYVMzIWS7aKSQmwnvvmc2L/v1v5n8+g8/en0LVDevQWnt+swgTUodfAR4XaAAcMpsgyxW+EJHBXem025LqK64wW5K+8gpdeIUutof73zCCFU3alL5ZhOuQriR8HzmupoUTt37Zf+5l3pp8thUUcsHxPUwEucIXIkIkJyWQ5yK5eyyp/vJLmD+fOz5eQq3Cg7w54w1a7v6LFU3aAB7eLMKADOn4yN2t3/hlf5WO/x3aUwDA8l1HLYhQCFFRHnvpuFOlCvTowdrzOpN1TheOxsXTpGBH6dPhvP5GEr4PsnLyXF4FQNldXBKKjgDw5W97QhCVEMJfjiXVFW1wOLRnS6pViWdrrQalCT/c19/IkI4X9qEcXyQeMwn/r2PyPipEpP
DYSwf3c3f2r9k1qRFN9u8kxdW8XpiRhO+Fq6EcO0XZK/xE2xV+jTq1gh+YECLo3M3dgcMbRfc0mDCBxZndrAzVJ3Ip6oWnCZgbL2paZvyvTuEBAG66vE3Q4xJCBJ9PZZunnw779kFBQYijqzhJ+F64m4BJSUrghYy2Zcb/rtz0I4cbptCj94WhDVIIERQ+lW02t7VA37QpBBH5RxK+F95m8e0tFTY9fB4d168k8Y5boZL8WIWIBu4u+Mo8Lgk/evg8i//ll1BSAjfdZEmcQojA63p2PZTTYwozll/aSuH0080TG8N/51ZJ+D6wX8W/eX0HAB6ZmFu+b8bnn8N555mt0IQQES8rJ4/JK/Nw3gTW/nlpK4VNf5uk//33oQ6xwiTh+8hjk6XVq+Gnn+Dmm60OUwgRIJ4q9OxKJ3CvucYk/H37QhTdyZGE7yOPs/Wffw5xcTBwoEXRCSECzdcWCdsKCuHaa6GoCL75JshR+UcSvo/c/ePnFRSy6pNJrDytLVl5RSGOSggRLL62SEhOSoDzz4cmTWDy5CBH5R9J+D5y949fregIrXZtYkmDlmHfGlUI4TtXFXrOSiv2lDLDOrNmwYEDIYqw4iTh+8jdP36bnRuorEvITT7LfR9tIUTEcVWhd9NFTd1X7F17rdkAado0K8P2SFor+Mj+j2rvqaG1pse6ZTy68HOKVSVyG5m6/HBujSqEqBhvfXbK6NgRTjsNxo0L2/JsSfgVkJGaQkaHZJgxgzWDHuXsbevYWDuZ+67KZE/1JCC8W6MKIYKoUiW49VZ4/nnYssWM6YcZGdKpCK2hWze48kqaVDpGZt8hXH7X/zGr5cVA+LdGFUIE2W23QeXK0K8f7N1rdTTlSMKnAhsRHz8O8+fD7bdTfeM6LnrmERrWqVHhPtpCiCjVvDlMmQKrVpmLw/x8qyMqI+aHdLy1Py1D29bYtWgB8fEVG98TQsSGPn1MPf5VV0HXrjBnDjRsaHVUgFzhV2zXeu28yFoIIVzo0QNmzDAN1bp0gbzwKNeO+YRfoV3r7QlfObdTEkIIp+HhH2HhO5+ZZH/ppfDXX1aHJwnfp/anziThCyGcuOq3dc/mRH54bzzs3m2SvsUtlGM+4Vdo1/oVK8zH+vVDEJkQIpK4Gx7+1/YaprHa/v0m6a9fb1GEkvArtmv9iy9CvXowYEDI4xRChDePw8PnnQfz5kFhIXTuDGvWhDg6I+ardMDH1XTZ2aZPxssvQ2JiaAITQkSMWgnxFBSWb6BYOjzcvr0p6+7e3Vzp//QTpIS2yi/mr/B99vrrUKsW3Hef1ZEIIcJMVk4efx87Xu7x+Eqq7PDwOefA3LlmeGfo0BBGaEjC98XBg5CVZfpj1KxpdTRCiDAzctZaiorLl23XqFa5/OhB69bw+ONmW9QFC0IUoeFXwldK1VFKzVZKrbN9rO3muGKlVK7tT/i2knMnKwuOHIEbbrA6EiFEmMnKySPPzfh9wWE3e2RkZkLTpvDAA2YFf4j4e4WfCXyvtT4T+N72uSuFWusOtj99/Txn6I0fD82amW54QghhYy/FdMdteXdiohkm/uUXGDUqSNGV52/CvwoYa/v7WCDDz9cLGZ/75+zcaZZGDxwo9fdCiDI87XvrtZniNdeYCdzhw6GgIEgRluVvwm+gtd4OYPvorkC9mlIqWym1TCnl9k1BKXWP7bjs/CA2HfK4Ibmz//4XiovhxhuDFo8QIjJ52v/CazNFpUzVX0EBTJwYhOjK85rwlVJzlFK/uvhzVQXO01RrnQbcALyllDrD1UFa64+01mla67R69epV4OUrpkL9c774Atq1M7PrQgjhwN2QTUpSgm+NFdPSzCTuuHEBjsw1rwlfa32Z1rqNiz9TgZ1KqUYAto+73LzGNtvHjcB8IDVg38FJKPeurDUZv83jWN62so9v3AhLl8pkrRDCpQqt1HdFKbNpypIlIVmB6++QzjTgVtvfbwWmOh+glKqtlKpq+3tdIB1Y7ed5/e
L8rpz+58+8Nf11bt60pOyBEyaYj7KyVgjhQoVW6rtz440m8YfgKt/flbYjgElKqTuBv4D+AEqpNOCfWuu7gFbAh0qpEswbzAittaUJf2jPlmV64N+77L8AXN60+omDtDbVOZdcYvapFEIIF/zeFyMlBS67DD77DJ55xmyVGCR+JXyt9R6gu4vHs4G7bH9fArT15zyB5rghed3ff+aSP38GoFVS/ImDVq2C1avhgw+sCFEIEUtuuQVuvhmWLYOLLw7aaWJ2pW1GagqLM7sx9dAiSEqChAQ4fPjEAV98Yfam7N/fuiCFELGhVy/zMcgrb2M24QOmY93XX8P995uWCYW2ydySErPsuUcPqFvX2hiFENGvbl1o2RIWLQrqaWI74b/6KlSrBg89ZK7w7Ql/yRKzO43U3gshQiU93eSekpKgnSJ22yPv2gWffw6DBpke94mJJ4Z0xo83n/eNvC4QQojIkJWTx8hZa9lWUEhyUgLvNjmHc/d9YkYeWrcOyjlj9wp/6VIoKjItE+DEFf6xYzBpktlxvkYNa2MUQkQlV6v9n9xl68S7eHHQzhu7CX/lSlP+1KEDWTl5/LznKEt/3cJj97wGe/fKYishRNC4Wu3/+ykN2Vc9Kajj+LGb8LOz4ZxzyFq7j2FTVrGfeKoVHeOSFf+jIOEUptaXVgpCiMCyN2102U5ZKVYkny1X+IFS2iHziensXbiUP5u1Kn2nPRJfldqFB+ixbhkzWqbz6lxrd5cXQkQXx2Ecd/5o0R42bDBdeoMgZhK+4w+74cHd1DlUwLhjp5b+8AsrV6VZwXYSi44ytXUXj13whBCiojy1UgbTg6ftdb3NJ0G6yo+ZKh3HH3bbHaZJ0U/1ziBOKYq15kjlKgDknVKPFY1bu9+4QAghToKni8iUpASG9mzJpa3rwrPNzJ63QRAzV/iOP+w2O9ZzXFVidf3mFGtNQnwcR+JNwv+mdWeqVYn3vdudEEL4wFMr5aE9WzJy1lqaPz2H9EFjyOrQIygxxEzCd/xht9uxnnV1m3I0vmppd7vKNUzjtKUX9qx4tzshhPDCXSvlrmfX831DJj/FTMIv/WFrTZud61nVsEVp3+qM1BRufO0xeOopxr51lyR7IUTAuWulPG9Nvu8bMvkpZsbw7Ul87MSF1D28ny3NW5W9kj//fPNHCCGCxFUr5Ucm5ro8NhiFIzGT8MH2w95cFYAh/7oR5EpeCGGx5KQEl6WawSgcicohndJ6+8wZpI+YW3YsbOVKiIsz+9QKIYTF/N4msQKi7grfXm9vHxOzT4CAbVhn5UqzIXmClF0KIaznuCGTvZGafW4x0KIu4bta3GCfAMlIrmx2lOnXz6LohBCiPL+3SfRR1A3puJvoOLx9J1x+uemGee+9IY5KCCGsF3UJ39VER80jh/hy8jPwxx8wbZpU4wghYlLUJXznCZDqRw8zdvKznLVzE0yZAt3L7bkuhBAxIerG8B0nQPbm7+OzaS/SfvsfVJo0CXr3tjg6IYSwTtQlfLBNgLSua7Yo3PSL2crw6qutDksIISwVlQkfrc3Whf/7H4wZI7tXCSEEUTiGD8COHfD11zBsGNxxh9XRCCFEWIjOhL9nj/mYmmptHEIIEUaiM+Hv3Ws+1qljbRxCCBFGJOELIUSMiM5JW0n4QogIlJWTF9SeOpLwhRAiDHht/BgA0TukU7ky1KhhdSRCCOETT40fAyXqEn5WTh5Z369id5UapL8yLyj7QgohRKC5a/wYyJ2voirh22+JKu8voKBajaBuBiyEEIHkboerQO58FVUJ335LlHTkIAUJpwDB2wxYCCECKRQ7X0XVpK391mfUhddSueR4uceFECJchWLnq6hK+PbNgBc1Ty33uBBChLtg73wVVUM6odwMWAghIk1UXeGHcjNgIYSINH4lfKVUf+AZoBVwgdY6281xvYC3gThgtNZ6hD/n9SRUmwELIUSk8XdI51fgamCBuwOUUnHA+8A/gNbAQKVUaz/PK4
QQooL8usLXWv8OoJTydNgFwHqt9UbbsROAq4DV/pxbCCFExYRi0jYF2OLw+VbbY+Uope5RSmUrpbLz8/NDEJoQQsQOr1f4Sqk5QEMXTz2ptZ7qwzlcXf5rVwdqrT8CPgJIS0tzeYwQQoiT4zXha60v8/McW4EmDp83Brb5+ZpCCCEqKBRDOiuAM5VSzZVSVYABwLQQnFcIIYQDpfXJj5wopfoB7wL1gAIgV2vdUymVjCm/7G07rjfwFqYs8xOt9Ys+vHY+8OdJhFUX2H0SXxds4RoXhG9sElfFhWtsElfFnWxsp2mt67l6wq+EH46UUtla6zSr43AWrnFB+MYmcVVcuMYmcVVcMGKLqtYKQggh3JOEL4QQMSIaE/5HVgfgRrjGBeEbm8RVceEam8RVcQGPLerG8IUQQrgWjVf4QgghXJCEL4QQMSKqE75S6jGllFZK1bU6FgCl1PNKqV+UUrlKqf/Z1itYTik1Uim1xhbb10qpJKtjslNK9VdK/aaUKlFKWV4+p5TqpZRaq5Rar5TKtDoeO6XUJ0qpXUqpX62OxZFSqolSap5S6nfbv+NDVscEoJSqppT6USn1sy2uZ62OyZFSKk4plaOUmh7I143ahK+UagJcDvxldSwORmqt22mtOwDTgX9bHZDNbKCN1rod8AcwzOJ4HHltwR0qYd7q+1Ogl9VBuHAcGKK1bgVcBNwfJj+zo0A3rXV7oAPQSyl1kcUxOXoI+D3QLxq1CR94E3gcN43arKC1PuDwaXXCJDat9f+01vZd35dh+h2FBa3171rrtVbHYVPa6ltrfQywt/q2nNZ6AbDX6jicaa23a61/sv39ICaJWb5DkTYO2T6Nt/0Ji99HpVRj4ApgdKBfOyoTvlKqL5Cntf7Z6licKaVeVEptAW4kfK7wHd0BfGt1EGHK51bfojylVDMgFVhubSSGbdgkF9gFzNZah0VcmDY0jwMlgX7hiN3T1lPbZuBfQI/QRmR4ayettX4SeFIpNQwYDDwdDnHZjnkScws+PhQxVSS2MOFzq29RllKqBjAZeNjpTtcyWutioINtzuprpVQbrbWlcyBKqT7ALq31SqVUl0C/fsQmfHdtm5VSbYHmwM+2nbgaAz8ppS7QWu+wKi4XvgBmEKKE7y0updStQB+guw7x4owAtOAOFWn1fRKUUvGYZD9eaz3F6nicaa0LlFLzMXMgVk96pwN9bQ0nqwE1lVKfa61vCsSLR92QjtZ6lda6vta6mda6GeaX9NxQJHtvlFJnOnzaF1hjVSyObJvMPwH01VoftjqeMCatvitImauuMcDvWus3rI7HTilVz16NppRKAC4jDH4ftdbDtNaNbblrADA3UMkeojDhh7kRSqlflVK/YIacwqJEDXgPOAWYbSsZHWV1QHZKqX5Kqa1AR2CGUmqWVbHYJrYHA7Mwk4+TtNa/WRWPI6XUl8BSoKVSaqtS6k6rY7JJB24Gutn+b+Xarl6t1giYZ/tdXIEZww9oCWQ4ktYKQggRI+QKXwghYoQkfCGEiBGS8IUQIkZIwhdCiBghCV8IIWKEJHwhhIgRkvCFECJG/D93auf9SR44sAAAAABJRU5ErkJggg==\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"#hide_input\n",
"#id img_mommentum\n",
"#caption An example of momentum\n",
"#alt Graph showing an example of momentum\n",
"x = np.linspace(-4, 4, 100)\n",
"y = 1 - (x/3) ** 2\n",
"x1 = x + np.random.randn(100) * 0.1\n",
"y1 = y + np.random.randn(100) * 0.1\n",
"plt.scatter(x1,y1)\n",
"idx = x1.argsort()\n",
"beta,avg,res = 0.7,0,[]\n",
"for i in idx:\n",
" avg = beta * avg + (1-beta) * y1[i]\n",
" res.append(avg/(1-beta**(i+1)))\n",
"plt.plot(x1[idx],np.array(res), color='red');"
]
},
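{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see the effect of the moving average on the weight itself, here is a small sketch in plain Python that applies the two momentum equations to a sequence of gradients. The helper name `momentum_steps` and the gradient values are made up for illustration; the gradients flip sign at every step to mimic bouncing between the two walls of a steep valley:\n",
"\n",
"```python\n",
"def momentum_steps(grads, lr=0.1, beta=0.9, w=0.0):\n",
"    # Hypothetical helper: apply the two update equations to a gradient sequence\n",
"    avg, ws = 0.0, []\n",
"    for g in grads:\n",
"        avg = beta * avg + (1 - beta) * g   # moving average of the gradients\n",
"        w = w - lr * avg                    # step along the average\n",
"        ws.append(w)\n",
"    return ws\n",
"\n",
"# Made-up gradients that flip sign at every step\n",
"grads = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]\n",
"plain = momentum_steps(grads, beta=0.0)    # beta=0 recovers plain SGD\n",
"smooth = momentum_steps(grads, beta=0.9)\n",
"print(plain)\n",
"print(smooth)\n",
"```\n",
"\n",
"With `beta=0.0` the weight jumps back and forth by `lr` at every step, while with `beta=0.9` the opposing gradients largely cancel in the average, so the weight barely moves."
]
},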
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It works particularly well if the loss function has narrow canyons we need to navigate: vanilla SGD would send us from one side to the other while SGD with momentum will average those to roll down inside. The parameter `beta` determines the strength of that momentum we are using: with a small beta we stay closer to the actual gradient values whereas with a high beta, we will mostly go in the direction of the average of the gradients and it will take a while before any change in the gradients makes that trend move.\n",
"\n",
"With a large beta, we might miss that the gradients have changed directions and roll over a small local minima which is a desired side-effect: intuitively, when we show a new picture/text/data to our model, it will look like something in the training set but won't be exactly like it. That means it will correspond to a point in the loss function that is closest to the minimum we ended up with at the end of training, but not exactly *at* that minimum. We then would rather end up training in a wide minimum, where nearby points have approximately the same loss (or if you prefer, a point where the loss is as flat as possible). <<img_betas>> shows how the chart in <<img_momentum>> varies as we change beta."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide_input": true
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 864x576 with 4 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"#hide_input\n",
"#id img_betas\n",
"#caption Momentum with different beta values\n",
"#alt Graph showing how the beta value influences momentum\n",
"x = np.linspace(-4, 4, 100)\n",
"y = 1 - (x/3) ** 2\n",
"x1 = x + np.random.randn(100) * 0.1\n",
"y1 = y + np.random.randn(100) * 0.1\n",
"_,axs = plt.subplots(2,2, figsize=(12,8))\n",
"betas = [0.5,0.7,0.9,0.99]\n",
"idx = x1.argsort()\n",
"for beta,ax in zip(betas, axs.flatten()):\n",
"    ax.scatter(x1,y1)\n",
"    avg,res = 0,[]\n",
"    for i in idx:\n",
"        avg = beta * avg + (1-beta) * y1[i]\n",
"        res.append(avg)  # no debiasing here, so each curve starts near 0\n",
"    ax.plot(x1[idx],np.array(res), color='red');\n",
"    ax.set_title(f'beta={beta}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see in these examples that a beta that's too high results in the overall changes in gradient getting ignored. In SGD with momentum, a value of `beta` that is often used is 0.9.\n",
"\n",
"`fit_one_cycle` by default starts with a beta of 0.95, gradually adjusts it to 0.85, and then gradually moves it back to 0.95 at the end of training. Let's see how our training goes with momentum added to plain SGD:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to add momentum to our optimizer, we'll first need to keep track of the moving average gradient, which we can do with another callback. When an optimizer callback returns a dict, it is used to update the state of the optimizer, and is passed back to the optimizer on the next step. So this callback will keep track of the gradient averages in a parameter called `grad_avg`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def average_grad(p, mom, grad_avg=None, **kwargs):\n",
"    if grad_avg is None: grad_avg = torch.zeros_like(p.grad.data)\n",
"    return {'grad_avg': grad_avg*mom + p.grad.data}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To use it, we just have to replace `p.grad.data` with `grad_avg` in our step function:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def momentum_step(p, lr, grad_avg, **kwargs): p.data.add_(grad_avg, alpha=-lr)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"opt_func = partial(Optimizer, cbs=[average_grad,momentum_step], mom=0.9)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Learner` will automatically schedule `mom` and `lr`, so `fit_one_cycle` will even work with our custom `Optimizer`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: left;\">\n",
" <th>epoch</th>\n",
" <th>train_loss</th>\n",
" <th>valid_loss</th>\n",
" <th>accuracy</th>\n",
" <th>time</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>2.856000</td>\n",
" <td>2.493429</td>\n",
" <td>0.246115</td>\n",
" <td>00:10</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1</td>\n",
" <td>2.504205</td>\n",
" <td>2.463813</td>\n",
" <td>0.348280</td>\n",
" <td>00:10</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2</td>\n",
" <td>2.187387</td>\n",
" <td>1.755670</td>\n",
" <td>0.418853</td>\n",
" <td>00:10</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"learn = get_learner(opt_func=opt_func)\n",
"learn.fit_one_cycle(3, 0.03)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 864x288 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"learn.recorder.plot_sched()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We're still not getting great results, so let's see what else we can do."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## RMSProp"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"RMSProp is another variant of SGD introduced by Geoffrey Hinton in [Lecture 6e of his Coursera class](http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf). The main difference with SGD is that it uses an adaptive learning rate: instead of using the same learning rate for every parameter, each parameter gets its own specific learning rate controlled by a global learning rate. That way we can speed up training by giving a high learning rate to the weights that need to change a lot, while the ones that are good enough get a lower learning rate.\n",
"\n",
"How do we decide which parameter should have a high learning rate and which should not? We can look at the gradients to get an idea. Not just the one we computed, but all of them: if they have been close to 0 for a while, it means this parameter will need a higher learning rate because the loss is very flat. Conversely, if they are all over the place, we should probably be careful and pick a low learning rate to avoid divergence. We can't just average the gradients to see if they're changing a lot, since the average of a large positive and a large negative number is close to zero. So we can use the usual trick of taking either the absolute values or the squared values (and then taking the square root after the mean).\n",
"\n",
"Once again, to pick out the general tendency behind the noise, we will use a moving average, specifically the moving average of the gradients squared. Then, we will update the corresponding weight by using the current gradient (for the direction) divided by the square root of this moving average (that way if it's low, the effective learning rate will be higher, and if it's big, the effective learning rate will be lower).\n",
"\n",
"```python\n",
"w.square_avg = alpha * w.square_avg + (1-alpha) * (w.grad ** 2)\n",
"new_w = w - lr * w.grad / math.sqrt(w.square_avg + eps)\n",
"```\n",
"\n",
"The `eps` (*epsilon*) is added for numerical stability (usually set at 1e-8) and the default value for `alpha` is usually 0.99."
]
},
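{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the effect of this update concrete, here is a minimal NumPy sketch of the formula above, applied to a toy quadratic loss that is very steep in one direction and very flat in the other. The function name `rmsprop_step` and the toy loss are assumptions made up for this illustration; this is not fastai's implementation.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def rmsprop_step(w, grad, square_avg, lr=0.01, alpha=0.99, eps=1e-8):\n",
"    # hypothetical helper: one RMSProp update following the formula above\n",
"    square_avg = alpha * square_avg + (1 - alpha) * grad**2\n",
"    new_w = w - lr * grad / np.sqrt(square_avg + eps)\n",
"    return new_w, square_avg\n",
"\n",
"# toy loss 0.5 * sum(scales * w**2): steep along w[0], nearly flat along w[1]\n",
"scales = np.array([100.0, 0.01])\n",
"w = np.array([1.0, 1.0])\n",
"grad = scales * w\n",
"\n",
"# plain SGD with lr=0.01: the two step sizes differ by a factor of 10,000\n",
"sgd_step = 0.01 * grad\n",
"# RMSProp: each gradient is rescaled by its own squared average\n",
"new_w, square_avg = rmsprop_step(w, grad, np.zeros_like(w))\n",
"rms_step = w - new_w\n",
"print(sgd_step, rms_step)\n",
"```\n",
"\n",
"Because each gradient is divided by the root of its own squared average, the parameter with tiny gradients takes a step about as large as the one with huge gradients, which is the per-parameter learning rate described above."
]
},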
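{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can add this to `Optimizer` by doing much the same thing we did for `average_grad`, but with an extra `**2`. Note that, like `average_grad`, this version does not multiply the new squared gradient by `(1-sqr_mom)`, so its result differs from the formula above by that constant factor:"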
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def average_sqr_grad(p, sqr_mom, sqr_avg=None, **kwargs):\n",
"    if sqr_avg is None: sqr_avg = torch.zeros_like(p.grad.data)\n",
"    return {'sqr_avg': sqr_avg*sqr_mom + p.grad.data**2}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And we can define our step function and optimizer as before:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def rms_prop_step(p, lr, sqr_avg, eps, grad_avg=None, **kwargs):\n",
"    denom = sqr_avg.sqrt().add_(eps)\n",
"    p.data.addcdiv_(p.grad, denom, value=-lr)\n",
"\n",
"opt_func = partial(Optimizer, cbs=[average_sqr_grad,rms_prop_step],\n",
"                   sqr_mom=0.99, eps=1e-7)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's try it out:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: left;\">\n",
" <th>epoch</th>\n",
" <th>train_loss</th>\n",
" <th>valid_loss</th>\n",
" <th>accuracy</th>\n",
" <th>time</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>2.766912</td>\n",
" <td>1.845900</td>\n",
" <td>0.402548</td>\n",
" <td>00:11</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1</td>\n",
" <td>2.194586</td>\n",
" <td>1.510269</td>\n",
" <td>0.504459</td>\n",
" <td>00:11</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2</td>\n",
" <td>1.869099</td>\n",
" <td>1.447939</td>\n",
" <td>0.544968</td>\n",
" <td>00:11</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"learn = get_learner(opt_func=opt_func)\n",
"learn.fit_one_cycle(3, 0.003)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Much better! Now we just have to bring these ideas together, and we have Adam, fastai's default optimizer."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Adam"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Adam mixes the ideas of SGD with momentum and RMSProp together: it uses the moving average of the gradients as a direction and divides by the square root of the moving average of the gradients squared to give an adaptive learning rate to each parameter.\n",
"\n",
"There is one other difference in how Adam calculates moving averages: it takes the *unbiased* moving average, which is:\n",
"\n",
"``` python\n",
"w.avg = beta * w.avg + (1-beta) * w.grad\n",
"unbias_avg = w.avg / (1 - (beta**(i+1)))\n",
"```\n",
"\n",
"if we are at the `i`-th iteration (starting at 0, as Python does). This divisor of `1 - (beta**(i+1))` makes sure the unbiased average looks more like the gradients at the beginning (since `beta < 1`, the denominator quickly gets very close to 1).\n",
"\n",
"Putting everything together, our update step looks like:\n",
"``` python\n",
"w.avg = beta1 * w.avg + (1-beta1) * w.grad\n",
"unbias_avg = w.avg / (1 - (beta1**(i+1)))\n",
"w.sqr_avg = beta2 * w.sqr_avg + (1-beta2) * (w.grad ** 2)\n",
"new_w = w - lr * unbias_avg / sqrt(w.sqr_avg + eps)\n",
"```\n",
"\n",
"As for RMSProp, `eps` is usually set to 1e-8, and the default for `(beta1,beta2)` suggested by the literature is `(0.9,0.999)`.\n",
"\n",
"In fastai, Adam is the default optimizer since it allows faster training, but we found that `beta2=0.99` is better suited to the type of schedule we are using. `beta1` is the momentum parameter, which we specify with the argument `moms` in our call to `fit_one_cycle`. As for `eps`, fastai uses a default of 1e-5. `eps` is not just useful for numerical stability: a higher `eps` limits the maximum value of the adjusted learning rate. To take an extreme example, if `eps` is 1, then the adjusted learning rate will never be higher than the base learning rate.\n",
"\n",
"Rather than show all the code for this in the book, we'll let you look at the optimizer notebook in fastai's GitHub repository. You'll see all the code we've seen so far, along with Adam and other optimizers, and lots of examples and tests.\n",
"\n",
"One thing that changes when we go from SGD to Adam is the way we apply weight decay, and it can have important consequences."
]
},
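{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick way to see what the debiasing term does is to feed the moving average a constant gradient. The sketch below uses plain Python and a made-up helper, `debiased_averages`, which is an assumption for this illustration and not part of fastai.\n",
"\n",
"```python\n",
"def debiased_averages(grads, beta=0.9):\n",
"    # hypothetical helper: returns (raw average, debiased average) for every step\n",
"    avg, out = 0.0, []\n",
"    for i, g in enumerate(grads):\n",
"        avg = beta * avg + (1 - beta) * g\n",
"        out.append((avg, avg / (1 - beta**(i + 1))))\n",
"    return out\n",
"\n",
"# five steps with a constant gradient of 1.0\n",
"results = debiased_averages([1.0] * 5)\n",
"for i, (raw, unbiased) in enumerate(results):\n",
"    print(i, round(raw, 4), round(unbiased, 4))\n",
"```\n",
"\n",
"With a constant gradient of 1.0, the raw average starts at 0.1 and only creeps toward 1, because it is still dominated by its initial value of zero. The debiased value stays at 1 (up to floating-point rounding) from the very first step."
]
},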
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Decoupled weight decay"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Weight decay, which we've discussed before, is equivalent (in the case of vanilla SGD) to updating the parameters with:\n",
"\n",
"``` python\n",
"new_weight = weight - lr*weight.grad - lr*wd*weight\n",
"```\n",
"\n",
"This last formula explains why the name of this technique is weight decay, as each weight is decayed by a factor `lr * wd`.\n",
"\n",
"However, this equivalence only holds for standard SGD: as we have seen, with momentum, RMSProp, or Adam, the update applies some additional formulas to the gradient. In those cases, the formula that comes from L2 regularization:\n",
"\n",
"``` python\n",
"weight.grad += wd*weight\n",
"```\n",
"\n",
"is different than weight decay:\n",
"\n",
"``` python\n",
"new_weight = weight - lr*weight.grad - lr*wd*weight\n",
"```\n",
"\n",
"Most libraries use the first formulation, but it was pointed out in [Decoupled Weight Decay Regularization](https://arxiv.org/pdf/1711.05101.pdf) by Ilya Loshchilov and Frank Hutter that the second one is the only correct approach with the Adam optimizer or momentum, which is why fastai makes it its default.\n",
"\n",
"Now you know everything that is hidden behind the line `learn.fit_one_cycle`!\n",
"\n",
"Optimizers are only one part of the training process. When you need to change the training loop with fastai, you can't directly change the code inside the library. Instead, we have designed a system of callbacks that lets you write any tweak in independent blocks that you can then mix and match."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Callbacks"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sometimes you need to change how things work a little bit. In fact, we have already seen examples of this: mixup, FP16 training, resetting the model after each epoch for training RNNs, and so forth. How do we go about making these kinds of tweaks to the training process?\n",
"\n",
"We've seen the basic training loop, which, with the help of the `Optimizer` class, looks like this for a single epoch:\n",
"\n",
"```python\n",
"for xb,yb in dl:\n",
"    loss = loss_func(model(xb), yb)\n",
"    loss.backward()\n",
"    opt.step()\n",
"    opt.zero_grad()\n",
"```\n",
"\n",
"<<basic_loop>> shows how to picture that."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"Basic training loop\" width=\"300\" caption=\"Basic training loop\" id=\"basic_loop\" src=\"images/att_00048.png\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The usual way for deep learning practitioners to customise the training loop is to make a copy of an existing training loop, and then insert the code necessary for their particular changes into it. This is how nearly all code that you find online will look. But it has some very serious problems.\n",
"\n",
"It's not very likely that some particular tweaked training loop is going to meet your particular needs. There are hundreds of changes that can be made to a training loop, which means there are billions and billions of possible permutations. You can't just copy one tweak from a training loop here, another from a training loop there, and expect them all to work together. Each will be based on different assumptions about the environment that it's working in, use different naming conventions, and expect the data to be in different formats.\n",
"\n",
"We need a way to allow users to insert their own code at any part of the training loop, but in a consistent and well-defined way. Computer scientists have already come up with an answer to this question: the callback. A callback is a piece of code that you write, and inject into another piece of code at some predefined point. In fact, callbacks have been used with deep learning training loops for years. The problem is that only a small subset of places that may require code injection have been available in previous libraries, and, more importantly, callbacks were not able to do all the things they needed to do.\n",
"\n",
"In order to be just as flexible as manually copying and pasting a training loop and directly inserting code into it, a callback must be able to read every possible piece of information available in the training loop, modify all of it as needed, and fully control when a batch, an epoch, or even the whole training loop should be terminated. fastai is the first library to provide all of this functionality. It modifies the training loop so it looks like <<cb_loop>>."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"Training loop with callbacks\" width=\"550\" caption=\"Training loop with callbacks\" id=\"cb_loop\" src=\"images/att_00049.png\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The real test of whether this works has been borne out over the last couple of years: every single new paper implemented, or user request fulfilled, for modifying the training loop has been achieved entirely by using the fastai callback system. The training loop itself has not required modifications. <<some_cbs>> shows just a few of the callbacks that have been added."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"Some fastai callbacks\" width=\"500\" caption=\"Some fastai callbacks\" id=\"some_cbs\" src=\"images/att_00050.png\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The reason that this is important for all of us is that it means that whatever idea we have in our head, we can implement it. We need never dig into the source code of PyTorch or fastai and hack together some one-off system to try out our ideas. And when we do implement our own callbacks to develop our own ideas, we know that they will work together with all of the other functionality provided by fastai, so we will get progress bars, mixed precision training, hyperparameter annealing, and so forth.\n",
"\n",
"Another advantage is that it makes it easy to gradually remove or add functionality and perform ablation studies. You just need to adjust the list of callbacks you pass along to your fit function."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an example, here is the fastai source code that is run for each batch of the training loop:\n",
"\n",
"```python\n",
"try:\n",
"    self._split(b); self('begin_batch')\n",
"    self.pred = self.model(*self.xb); self('after_pred')\n",
"    self.loss = self.loss_func(self.pred, *self.yb); self('after_loss')\n",
"    if not self.training: return\n",
"    self.loss.backward(); self('after_backward')\n",
"    self.opt.step(); self('after_step')\n",
"    self.opt.zero_grad()\n",
"except CancelBatchException: self('after_cancel_batch')\n",
"finally: self('after_batch')\n",
"```\n",
"\n",
"The calls of the form `self('...')` are where the callbacks are called. As you see, after every step a callback is called. The callback will receive the entire state of training, and can also modify it. For instance, as you see above, the input data and target labels are in `self.xb` and `self.yb` respectively. A callback can modify these to modify the data the training loop sees. It can also modify `self.loss`, or even modify the gradients.\n",
"\n",
"Let's see how this works in practice by writing a `Callback`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating a callback"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When you want to write your own callback, the full list of available events is:\n",
"\n",
"- `begin_fit`:: called before doing anything, ideal for initial setup.\n",
"- `begin_epoch`:: called at the beginning of each epoch, useful for any behavior you need to reset at each epoch.\n",
"- `begin_train`:: called at the beginning of the training part of an epoch.\n",
"- `begin_batch`:: called at the beginning of each batch, just after drawing said batch. It can be used to do any setup necessary for the batch (like hyper-parameter scheduling) or to change the input/target before it goes in the model (change of the input with techniques like mixup for instance).\n",
"- `after_pred`:: called after computing the output of the model on the batch. It can be used to change that output before it's fed to the loss.\n",
"- `after_loss`:: called after the loss has been computed, but before the backward pass. It can be used to add any penalty to the loss (AR or TAR in RNN training for instance).\n",
"- `after_backward`:: called after the backward pass, but before the update of the parameters. It can be used to do any change to the gradients before said update (gradient clipping for instance).\n",
"- `after_step`:: called after the step and before the gradients are zeroed.\n",
"- `after_batch`:: called at the end of a batch, for any clean-up before the next one.\n",
"- `after_train`:: called at the end of the training phase of an epoch.\n",
"- `begin_validate`:: called at the beginning of the validation phase of an epoch, useful for any setup needed specifically for validation.\n",
"- `after_validate`:: called at the end of the validation part of an epoch.\n",
"- `after_epoch`:: called at the end of an epoch, for any clean-up before the next one.\n",
"- `after_fit`:: called at the end of training, for final clean-up.\n",
"\n",
"This list is available as attributes of the special variable `event`; so just type `event.` and hit `Tab` in your notebook to see a list of all the options"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's take a look at an example. Do you recall how in <<chapter_nlp_dive>> we needed to ensure that our special `reset` method was called at the start of training and validation for each epoch? We used the `ModelReseter` callback provided by fastai to do this for us. But how did `ModelReseter` do that exactly? Here's the full actual source code to that class:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class ModelReseter(Callback):\n",
"    def begin_train(self): self.model.reset()\n",
"    def begin_validate(self): self.model.reset()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Yes, that's actually it! It just does what we said in the paragraph above: at the beginning of the training phase and of the validation phase of each epoch, call a method named `reset`.\n",
"\n",
"Callbacks are often \"short and sweet\" like this one. In fact, let's look at one more. Here's the fastai source for the callback that adds RNN regularization (*AR* and *TAR*):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class RNNRegularizer(Callback):\n",
"    def __init__(self, alpha=0., beta=0.): self.alpha,self.beta = alpha,beta\n",
"\n",
"    def after_pred(self):\n",
"        self.raw_out,self.out = self.pred[1],self.pred[2]\n",
"        self.learn.pred = self.pred[0]\n",
"\n",
"    def after_loss(self):\n",
"        if not self.training: return\n",
"        if self.alpha != 0.:\n",
"            self.learn.loss += self.alpha * self.out[-1].float().pow(2).mean()\n",
"        if self.beta != 0.:\n",
"            h = self.raw_out[-1]\n",
"            if len(h)>1:\n",
"                self.learn.loss += self.beta * (h[:,1:] - h[:,:-1]\n",
"                                               ).float().pow(2).mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> stop: Go back to where we discussed TAR and AR regularization, and compare to the code here. Make sure you understand what it's doing, and why."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In both of these examples, notice how we can access attributes of the training loop by directly checking `self.model` or `self.pred`. That's because a `Callback` will always try to get an attribute it doesn't have inside the `Learner` associated to it. This is a shortcut for `self.learn.model` or `self.learn.pred`. Note that this shortcut works for reading attributes, but not for writing them, which is why when `RNNRegularizer` changes the loss or the predictions, you see `self.learn.loss = ` or `self.learn.pred = `. "
]
},
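{
"cell_type": "markdown",
"metadata": {},
"source": [
"The read-versus-write asymmetry can be reproduced in a few lines of plain Python. The sketch below assumes the shortcut is built on `__getattr__`, which Python only calls when normal attribute lookup fails; `MiniLearner` and `MiniCallback` are hypothetical stand-ins invented for this illustration, and fastai's real classes may be implemented differently.\n",
"\n",
"```python\n",
"class MiniLearner:\n",
"    # hypothetical stand-in for Learner, holding a single piece of state\n",
"    def __init__(self): self.loss = 1.0\n",
"\n",
"class MiniCallback:\n",
"    # hypothetical stand-in for Callback\n",
"    def __init__(self, learn): self.learn = learn\n",
"    def __getattr__(self, name):\n",
"        # reached only when the callback itself has no such attribute\n",
"        return getattr(self.learn, name)\n",
"\n",
"learn = MiniLearner()\n",
"cb = MiniCallback(learn)\n",
"\n",
"read_value = cb.loss              # reading is delegated to learn.loss\n",
"cb.loss = 5.0                     # writing creates a new attribute on the callback only\n",
"shadowed = (cb.loss, learn.loss)  # the learner still holds its old value\n",
"cb.learn.loss = 2.0               # writing through learn is what reaches the learner\n",
"```\n",
"\n",
"Under this assumption, assigning to `cb.loss` silently shadows the learner's value instead of changing it, which is why the callbacks above always write through `self.learn`."
]
},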
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When writing a callback, the following attributes of `Learner` are available:\n",
"\n",
"- `model`: the model used for training/validation\n",
"- `data`: the underlying `DataLoaders`\n",
"- `loss_func`: the loss function used\n",
"- `opt`: the optimizer used to update the model parameters\n",
"- `opt_func`: the function used to create the optimizer\n",
"- `cbs`: the list containing all `Callback`s\n",
"- `dl`: current `DataLoader` used for iteration\n",
"- `x`/`xb`: last input drawn from `self.dl` (potentially modified by callbacks). `xb` is always a tuple (potentially with one element) and `x` is detuplified. You can only assign to `xb`.\n",
"- `y`/`yb`: last target drawn from `self.dl` (potentially modified by callbacks). `yb` is always a tuple (potentially with one element) and `y` is detuplified. You can only assign to `yb`.\n",
"- `pred`: last predictions from `self.model` (potentially modified by callbacks)\n",
"- `loss`: last computed loss (potentially modified by callbacks)\n",
"- `n_epoch`: the number of epochs in this training\n",
"- `n_iter`: the number of iterations in the current `self.dl`\n",
"- `epoch`: the current epoch index (from 0 to `n_epoch-1`)\n",
"- `iter`: the current iteration index in `self.dl` (from 0 to `n_iter-1`)\n",
"\n",
"The following attributes are added by `TrainEvalCallback` and should be available unless you went out of your way to remove that callback:\n",
"\n",
"- `train_iter`: the number of training iterations done since the beginning of this training\n",
"- `pct_train`: from 0. to 1., the percentage of training iterations completed\n",
"- `training`: flag to indicate if we're in training mode or not\n",
"\n",
"The following attribute is added by `Recorder` and should be available unless you went out of your way to remove that callback:\n",
"\n",
"- `smooth_loss`: an exponentially-averaged version of the training loss"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Callbacks can also interrupt any part of the training loop by using a system of exceptions."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Callback ordering and exceptions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sometimes, callbacks need to be able to tell fastai to skip over a batch, or an epoch, or stop training altogether. For instance, consider `TerminateOnNaNCallback`. This handy callback will automatically stop training any time the loss becomes infinite or `NaN` (*not a number*). Here's the fastai source for this callback:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class TerminateOnNaNCallback(Callback):\n",
"    run_before=Recorder\n",
"    def after_batch(self):\n",
"        if torch.isinf(self.loss) or torch.isnan(self.loss):\n",
"            raise CancelFitException"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The way it tells the training loop to interrupt training at this point is to `raise CancelFitException`. The training loop catches this exception and does not run any further training or validation. The callback control flow exceptions available are:\n",
"\n",
"- `CancelBatchException`:: Skip the rest of this batch and go to `after_batch`\n",
"- `CancelTrainException`:: Skip the rest of the training part of the epoch and go to `after_train`\n",
"- `CancelValidException`:: Skip the rest of the validation part of the epoch and go to `after_validate`\n",
"- `CancelEpochException`:: Skip the rest of this epoch and go to `after_epoch`\n",
"- `CancelFitException`:: Interrupt training and go to `after_fit`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can detect whether one of those exceptions has occurred, and add code that executes right after it, with the following events:\n",
"\n",
"- `after_cancel_batch`:: reached immediately after a `CancelBatchException` before proceeding to `after_batch`\n",
"- `after_cancel_train`:: reached immediately after a `CancelTrainException` before proceeding to `after_train`\n",
"- `after_cancel_valid`:: reached immediately after a `CancelValidException` before proceeding to `after_validate`\n",
"- `after_cancel_epoch`:: reached immediately after a `CancelEpochException` before proceeding to `after_epoch`\n",
"- `after_cancel_fit`:: reached immediately after a `CancelFitException` before proceeding to `after_fit`"
]
},
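{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see how these exceptions and events fit together, here is a heavily simplified sketch of a loop that handles two of them. `MiniLoop`, `SkipOdd`, and `StopAtThree` are hypothetical names invented for this illustration, and the callbacks receive the loop as an argument, which is a simplification rather than the way fastai passes state. The exception classes are redefined locally so the example is self-contained.\n",
"\n",
"```python\n",
"class CancelBatchException(Exception): pass\n",
"class CancelFitException(Exception): pass\n",
"\n",
"class MiniLoop:\n",
"    # hypothetical, stripped-down training loop that only records events\n",
"    def __init__(self, cbs): self.cbs, self.log = cbs, []\n",
"\n",
"    def __call__(self, event):\n",
"        # record the event, then run every callback that defines it\n",
"        self.log.append(event)\n",
"        for cb in self.cbs:\n",
"            getattr(cb, event, lambda loop: None)(self)\n",
"\n",
"    def one_batch(self, b):\n",
"        try:\n",
"            self.b = b\n",
"            self('begin_batch')\n",
"            self.log.append(f'train on {b}')\n",
"        except CancelBatchException: self('after_cancel_batch')\n",
"        finally: self('after_batch')\n",
"\n",
"    def fit(self, batches):\n",
"        try:\n",
"            self('begin_fit')\n",
"            for b in batches: self.one_batch(b)\n",
"        except CancelFitException: self('after_cancel_fit')\n",
"        finally: self('after_fit')\n",
"\n",
"class SkipOdd:\n",
"    # skips every odd-numbered batch\n",
"    def begin_batch(self, loop):\n",
"        if loop.b % 2: raise CancelBatchException\n",
"\n",
"class StopAtThree:\n",
"    # stops the whole fit once batch 3 has finished\n",
"    def after_batch(self, loop):\n",
"        if loop.b == 3: raise CancelFitException\n",
"\n",
"loop = MiniLoop([SkipOdd(), StopAtThree()])\n",
"loop.fit(range(6))\n",
"print(loop.log)\n",
"```\n",
"\n",
"In this sketch the odd batches are skipped but still reach `after_batch`, and the fit stops after batch 3 but still reaches `after_fit`, because both events sit in a `finally` clause."
]
},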
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sometimes, callbacks need to be called in a particular order. In the case of `TerminateOnNaNCallback`, it's important that `Recorder` runs its `after_batch` after this callback, to avoid registering an NaN loss. You can specify `run_before` (this callback must run before ...) or `run_after` (this callback must run after ...) in your callback to ensure the ordering that you need."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have seen how to tweak the training loop of fastai to do anything we need, let's take a step back and dig a little bit deeper in the foundations of that training loop."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Questionnaire"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. What is the equation for a step of SGD, in math or code (as you prefer)?\n",
"1. What do we pass to `cnn_learner` to use a non-default optimizer?\n",
"1. What are optimizer callbacks?\n",
"1. What does `zero_grad` do in an optimizer?\n",
"1. What does `step` do in an optimizer? How is it implemented in the general optimizer?\n",
"1. Rewrite `sgd_cb` to use the `+=` operator, instead of `add_`.\n",
"1. What is momentum? Write out the equation.\n",
"1. What's a physical analogy for momentum? How does it apply in our model training settings?\n",
"1. What does a bigger value for momentum do to the gradients?\n",
"1. What are the default values of momentum for 1cycle training?\n",
"1. What is RMSProp? Write out the equation.\n",
"1. What do the squared values of the gradients indicate?\n",
"1. How does Adam differ from momentum and RMSProp?\n",
"1. Write out the equation for Adam.\n",
"1. Calculate the value of `unbias_avg` and `w.avg` for a few batches of dummy values.\n",
"1. What's the impact of having a high eps in Adam?\n",
"1. Read through the optimizer notebook in fastai's repo, and execute it.\n",
"1. In what situations do dynamic learning rate methods like Adam change the behaviour of weight decay?\n",
"1. What are the four steps of a training loop?\n",
"1. Why is the use of callbacks better than writing a new training loop for each tweak you want to add?\n",
"1. What are the necessary points in the design of fastai's callback system that make it as flexible as copying and pasting bits of code?\n",
"1. How can you get the list of events available to you when writing a callback?\n",
"1. Write the `ModelResetter` callback (without peeking).\n",
"1. How can you access the necessary attributes of the training loop inside a callback? When can you use or not use the shortcut that goes with it?\n",
"1. How can a callback influence the control flow of the training loop?\n",
"1. Write the `TerminateOnNaN` callback (without peeking if possible).\n",
"1. How do you make sure your callback runs after or before another callback?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Look up the \"rectified Adam\" paper and implement it using the general optimizer framework, and try it out. Search for other recent optimizers that work well in practice, and pick one to implement.\n",
"1. Look at the mixed precision callback along with its documentation. Try to understand what each event and line of code does.\n",
"1. Implement your own version of the learning rate finder from scratch. Compare it with fastai's version.\n",
"1. Look at the source code of the callbacks that ship with fastai. See if you can find one that's similar to what you're looking to do, to get some inspiration."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Foundations of Deep Learning: Wrap up"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Congratulations, you have made it to the end of the \"foundations of deep learning\" section. You now understand how all of fastai's applications and most important architectures are built, and the recommended ways to train them, and have all the information you need to build these from scratch. Whilst you probably won't need to create your own training loop, or batchnorm layer, for instance, knowing what is going on behind the scenes is very helpful for debugging, profiling, and deploying your solutions.\n",
"\n",
"Since you understand all of the foundations of fastai's applications now, be sure to spend some time digging through fastai's source notebooks, and running and experimenting with parts of them, since you can see exactly how everything in fastai is developed.\n",
"\n",
"In the next section, we will be looking even further under the covers, to see how the actual forward and backward passes of a neural network are done, and we will see what tools are at our disposal to get better performance. We will then finish up with a project that brings together everything we have learned throughout the book, which we will use to build a method for interpreting convolutional neural networks."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"jupytext": {
"split_at_heading": true
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}