diff --git a/04_mnist_basics.ipynb b/04_mnist_basics.ipynb
index 6394775..4168d91 100644
--- a/04_mnist_basics.ipynb
+++ b/04_mnist_basics.ipynb
@@ -3150,7 +3150,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "We can use these gradients to improve our parameters. We'll need to pick a learning rate (we'll discuss how to do that in practice in the next chapter; for now we'll just pick `1.0`):"
+    "We can use these gradients to improve our parameters. We'll need to pick a learning rate (we'll discuss how to do that in practice in the next chapter; for now we'll just pick `1e-5`(0.00001)):"
    ]
   },
   {
@@ -3621,13 +3621,13 @@
    "source": [
     "Now that we have a loss function which is suitable to drive SGD, we can consider some of the details involved in the next phase of the learning process, which is *step* (i.e., change or update) the weights based on the gradients. This is called an optimisation step.\n",
     "\n",
-    "In order to take an optimiser step we need to calculate the loss over one or more data items. How many should we use? We could calculate it for the whole dataset, and take the average, or we could calculate it for a single data item. But neither of these is ideal. Calculating it for the whole dataset would take a very long time. Calculating it for a single item would not use much inofmration, and so it would result in a very imprecise and unstable gradient. That is, you'd be going to the trouble of updating the weights but taking into account only how that would improve the model's performance on that single item.\n",
+    "In order to take an optimiser step we need to calculate the loss over one or more data items. How many should we use? We could calculate it for the whole dataset, and take the average, or we could calculate it for a single data item. But neither of these is ideal. Calculating it for the whole dataset would take a very long time. Calculating it for a single item would not use much information, and so it would result in a very imprecise and unstable gradient. That is, you'd be going to the trouble of updating the weights but taking into account only how that would improve the model's performance on that single item.\n",
     "\n",
     "So instead we take a compromise between the two: we calculate the average loss for a few data items at a time. This is called a *mini-batch*. The number of data items in the mini batch is called the *batch size*. A larger batch size means that you will get a more accurate and stable estimate of your datasets gradient on the loss function, but it will take longer, and you will get less mini-batches per epoch. Choosing a good batch size is one of the decisions you need to make as a deep learning practitioner to train your model quickly and accurately. We will talk about how to make this choice throughout this book.\n",
     "\n",
     "Another good reason for using mini-batches rather than calculating the gradient on individual data items is that, in practice, we nearly always do our training on an accelerator such as a GPU. These accelerators only perform well if they have lots of work to do at a time. So it is helpful if we can give them lots of data items to work on at a time. Using mini-batches is one of the best ways to do this. However, if you give them too much data to work on at once, they run out of memory--making GPUs happy is also tricky!.\n",
     "\n",
-    "As we've seen, in the discussion of data augmentation, we get better generalisation if we can very things during training. A simple and effective thing we can vary during training is what data items we put in each mini batch. Rather than simply enumerating our data set in order for every epoch, instead what we normally do in practice is to randomly shuffle it on every epoch, before we create mini batches. PyTorch and fastai provide a class that will do the shuffling and mini batch collation for you, called `DataLoader`.\n",
+    "As we've seen, in the discussion of data augmentation, we get better generalisation if we can vary things during training. A simple and effective thing we can vary during training is what data items we put in each mini batch. Rather than simply enumerating our data set in order for every epoch, instead what we normally do in practice is to randomly shuffle it on every epoch, before we create mini batches. PyTorch and fastai provide a class that will do the shuffling and mini batch collation for you, called `DataLoader`.\n",
     "\n",
     "A `DataLoader` can take any Python collection, and turn it into an iterator over many batches, like so:"
    ]
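Note on the hunks above: the first drops the suggested learning rate from `1.0` to `1e-5`, and the second fixes typos in the prose introducing mini-batches and the optimisation step. Below is a minimal, self-contained sketch of the update step that text describes, with the loss averaged over a mini-batch. It is illustrative only, not the notebook's own code: the linear model, the squared-error-style loss, and the names `params`, `xb`, and `yb` are placeholders chosen for the example.

import torch

# Illustrative sketch only: one SGD step at the learning rate the first hunk
# switches to (1e-5), averaging the loss over a mini-batch as the second
# hunk's text describes. The model and data here are stand-ins.
lr = 1e-5
params = torch.randn(28*28, requires_grad=True)  # one weight per pixel of a 28x28 image
xb = torch.randn(64, 28*28)                      # a mini-batch of 64 flattened images
yb = torch.randint(0, 2, (64,)).float()          # dummy binary labels

preds = (xb @ params).sigmoid()                  # forward pass through a toy linear model
loss = ((preds - yb) ** 2).mean()                # average the loss over the mini-batch
loss.backward()                                  # gradients accumulate in params.grad

with torch.no_grad():
    params -= lr * params.grad                   # the optimisation "step"
    params.grad.zero_()                          # clear gradients before the next step

Clearing `params.grad` after the step matters because PyTorch accumulates gradients across calls to `backward()` rather than overwriting them.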
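The sentence the second hunk ends on leads into the notebook's `DataLoader` demonstration. As a rough sketch of the behaviour it describes, using PyTorch's `torch.utils.data.DataLoader` (the collection `range(15)` and the batch size of 5 are arbitrary example values, not necessarily what the notebook uses):

from torch.utils.data import DataLoader

# Any indexable Python collection can act as the dataset; shuffle=True re-shuffles
# the items each epoch before they are collated into mini-batches.
coll = range(15)
dl = DataLoader(coll, batch_size=5, shuffle=True)
for batch in dl:
    print(batch)  # e.g. tensor([ 3, 12,  0,  8, 14]); the order changes every epoch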