diff --git a/01_intro.ipynb b/01_intro.ipynb index 39b96cc..7138821 100644 --- a/01_intro.ipynb +++ b/01_intro.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -57,7 +57,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "TK Add an introduction here. Todo when preface is settled" + "TK Add an introduction here. Todo when preface is settled. Maybe the `Deep learning is for everyone` can be this intro." ] }, { @@ -91,14 +91,14 @@ "|======\n", "```\n", "\n", - "Deep learning is a computer technique to extract and transform data – with use cases ranging from human speech recognition to animal imagery classification – by using multiple layers of neural networks. Each of these layers takes the inputs from previous layers and progressively refines them. The algorithms involved can train the layers by learning to minimize errors and improve their own accuracy." + "Deep learning is a computer technique to extract and transform data – with use cases ranging from human speech recognition to animal imagery classification – by using multiple layers of neural networks. Each of these layers takes the inputs from previous layers and progressively refines them. The algorithms involved can train the layers by learning to minimize errors and improve their own accuracy (we will discuss those in detail in the next section)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Deep learning has power, flexibility, and simplicity. That's why we believe it should be applied across many disciplines. These include the social and physical sciences, the arts, medicine, finance, scientific research, and much more. To give a personal example, despite having no background in medicine, Jeremy started Enlitic, a company that uses deep learning algorithms to diagnose illness and disease. And Enlitic now does better than doctors in certain cases. 
TK Melissa: Give an example\n", + "Deep learning has power, flexibility, and simplicity. That's why we believe it should be applied across many disciplines. These include the social and physical sciences, the arts, medicine, finance, scientific research, and much more. To give a personal example, despite having no background in medicine, Jeremy started Enlitic, a company that uses deep learning algorithms to diagnose illness and disease. And Enlitic now does better than doctors in certain cases. TK Jeremy: Give an example\n", "\n", "Here's a list of some of the thousands of tasks that deep learning (or methods heavily using deep learning) is now the best in the world at:\n", "\n", @@ -117,7 +117,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Deep learning is based on a type of models called neural networks. Before we explain to you all about it, let's start with a bit of history." + "Deep learning is based on a type of model called neural networks. Before we explain all about them to you, let's start with a bit of history." ] }, { @@ -167,7 +167,7 @@ "\n", "> : _…people are smarter than today's computers because the brain employs a basic computational architecture that is more suited to deal with a central aspect of the natural information processing tasks that people are so good at. 
…we will introduce a computational framework for modeling cognitive processes that seems… closer than other frameworks to the style of computation as it might be done by the brain._ (Parallel distributed processing, chapter 1)\n", "\n", - "TK Melissa: Tell the reader what the takeaways from this are in your own words, before you dive into the list of requirements.\n", + "TK Jeremy: Tell the reader what the takeaways from this are in your own words, before you dive into the list of requirements.\n", "\n", "It defined \"Parallel Distributed Processing\" as requiring:\n", "\n", @@ -182,7 +182,7 @@ "\n", "We will learn in this book about how modern neural networks handle each of these requirements. In the 1980's most models were built with a second layer of neurons, thus avoiding the problem that had been identified by Minsky (this was their \"pattern of connectivity among units\", to use the framework above). And indeed, neural networks were widely used during the 80s and 90s for real, practical projects. However, again a misunderstanding of the theoretical issues held back the field. In theory, adding just one extra layer of neurons was enough to allow any mathematical model to be approximated with these neural networks, but in practice such networks were often too big and slow to be useful.\n", "\n", - "Although there were researchers 30 years ago showing that to get good performance in practice you need to use even more layers of neurons, it is only in the last decade that this has been more widely appreciated. Thanks to this understanding, along with the improved ability to use these in practice thanks to improvements in computer hardware, increases in data availability, and algorithmic tweaks that allow neural networks to be trained faster and more easily, neural networks are now finally living up to their potential. 
We now have what Rosenblatt had promised: \"a machine capable of perceiving, recognizing and identifying its surroundings without any human training or control\". And you will learn how to build them in this book." + "Although researchers showed 30 years ago that to get practical good performance you need to use even more layers of neurons, it is only in the last decade that this has been more widely appreciated. Neural networks are now finally living up to their potential, thanks to the understanding to use more layers as well as improved ability to do so thanks to improvements in computer hardware, increases in data availability, and algorithmic tweaks that allow neural networks to be trained faster and more easily. We now have what Rosenblatt had promised: \"a machine capable of perceiving, recognizing and identifying its surroundings without any human training or control\". And you will learn how to build them in this book." ] }, { @@ -233,7 +233,7 @@ "source": [ "Since we are going to be spending a lot of time together, let's get to know each other a bit… We are Sylvain and Jeremy, your guides on this journey. We hope that you will find us well suited for this position.\n", "\n", - "Jeremy has been using and teaching machine learning for around 30 years. He started using neural networks 25 years ago. During this time he has led many companies and projects which have machine learning at their core, including founding the first company to focus on deep learning and medicine, Enlitic, and taking on the role of Pres and chief scientist of the world's largest machine learning community, Kaggle. He is the co-founder, along with Dr Rachel Thomas, of fast.ai, the organisation that built the course this book is based on.\n", + "Jeremy has been using and teaching machine learning for around 30 years. He started using neural networks 25 years ago. 
During this time he has led many companies and projects which have machine learning at their core, including founding the first company to focus on deep learning and medicine, Enlitic, and taking on the role of President and Chief Scientist of the world's largest machine learning community, Kaggle. He is the co-founder, along with Dr Rachel Thomas, of fast.ai, the organisation that built the course this book is based on.\n", "\n", "From time to time you will hear directly from us, in sidebars like this one from Jeremy:" ] @@ -310,7 +310,7 @@ "source": [ "The hardest part of deep learning is artisanal: how do you know if you've got enough data; whether it is in the right format; if your model is training properly; and if it's not, what should you do about it? That is why we believe in learning by doing. As with basic data science skills, with deep learning you only get better through practical experience. Trying to spend too much time on the theory can be counterproductive. The key is to just code and try to solve problems: the theory can come later, when you have context and motivation.\n", "\n", - "There will be times when the journey will feel hard. Times where you feel stuck. Don't give up! Rewind through the book to find the last bit where you definitely weren't stuck, and then read slowly through from there to find the first thing that isn't clear. Then try some code experiments yourself, and Google around for more tutorials on whatever the issue you're stuck with is--often you'll find some different angle on the material which might help it to click. Also, it's ok to not understand everything on first reading. Trying to understand the material serially before proceeding can sometimes be hard. Sometimes things click into place after you got more context from parts down the road, from having a bigger picture. 
So if you do get stuck on a section, try moving on anyway and make a note to come back to it later.\n", + "There will be times when the journey will feel hard. Times where you feel stuck. Don't give up! Rewind through the book to find the last bit where you definitely weren't stuck, and then read slowly through from there to find the first thing that isn't clear. Then try some code experiments yourself, and Google around for more tutorials on whatever the issue you're stuck with is--often you'll find some different angle on the material which might help it to click. Also, it's expected and normal to not understand everything (especially the code) on first reading. Trying to understand the material serially before proceeding can sometimes be hard. Sometimes things click into place after you got more context from parts down the road, from having a bigger picture. So if you do get stuck on a section, try moving on anyway and make a note to come back to it later.\n", "\n", "Remember, you don't need any particular academic background to succeed at deep learning. Many important breakthroughs are made in research and industry by folks without a PhD, such as the paper [Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks](https://arxiv.org/abs/1511.06434), one of the most influential papers of the last decade, with over 5000 citations, which was written by Alec Radford when he was an under-graduate. 
Even at Tesla, where they're trying to solve the extremely tough challenge of making a self-driving car, CEO [Elon Musk says](https://twitter.com/elonmusk/status/1224089444963311616):\n", "\n", @@ -335,7 +335,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Whether you're excited to identify if plants are diseased from pictures of their leaves, auto-generate knitting patterns, diagnose TB from x-rays, or determine when a raccoon is using your cat door, we will get you using deep learning on your own problems (via pre-trained models from others) as quickly as possible, and then will progressively drill into more details. You'll learn how to use deep learning to solve your own problems at state-of-the-art accuracy within the first 30 minutes of the next chapter! (And feel free to skip straight to there now if you're dying to get coding right away.) There is a pernicious myth out there that you need to have computing resources and datasets the size of those at Google to be able to do deep learning, and it's not true.\n", + "Whether you're excited to identify if plants are diseased from pictures of their leaves, auto-generate knitting patterns, diagnose TB from x-rays, or determine when a raccoon is using your cat door, we will get you using deep learning on your own problems (via pre-trained models from others) as quickly as possible, and then will progressively drill into more details. You'll learn how to use deep learning to solve your own problems at state-of-the-art accuracy within the first 30 minutes of the next chapter! (And feel free to skip straight there now if you're dying to get coding right away.) There is a pernicious myth out there that you need to have computing resources and datasets the size of those at Google to be able to do deep learning, and it's not true.\n", "\n", "So, what sort of tasks make for good test cases? 
You could train your model to distinguish between Picasso and Monet paintings or to pick out pictures of your daughter instead of pictures of your son. It helps to focus on your hobbies and passions–setting yourself four or five little projects rather than striving to solve a big, grand problem tends to work better when you're getting started. Since it is easy to get stuck, trying to be too ambitious too early can often backfire. Then, once you've got the basics mastered, aim to complete something you're really proud of!" ] }, @@ -344,7 +344,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "> J: Deep learning can be set to work on almost any problem. For instance, my first startup was a company called FastMail, which provided enhanced email services when it launched in 1999 (and still does to this day). In 2002 I set it up to use a primitive form of deep learning – single-layer neural networks – to help to categorise emails and stop customers from receiving spam." + "> J: Deep learning can be set to work on almost any problem. For instance, my first startup was a company called FastMail, which provided enhanced email services when it launched in 1999 (and still does to this day). In 2002 I set it up to use a primitive form of deep learning – single-layer neural networks – to help categorise emails and stop customers from receiving spam." ] }, { @@ -434,7 +434,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "To do nearly everything in this course, you'll need access to a computer with an NVIDIA GPU (unfortunately other brands of GPU are not fully supported by the main deep learning libraries). However, we don't recommend you buy one; in fact, even if you already have one, we don't suggest you use it just yet! Setting up a computer takes time and energy, and you want all your energy to focus on deep learning right now. Therefore, we instead suggest you rent access to a computer that already has everything you need preinstalled and ready to go. 
Costs can be as little as US$0.25 per hour while you're using it, and some options are even free." + "To do nearly everything in this book, you'll need access to a computer with an NVIDIA GPU (unfortunately other brands of GPU are not fully supported by the main deep learning libraries). However, we don't recommend you buy one; in fact, even if you already have one, we don't suggest you use it just yet! Setting up a computer takes time and energy, and you want all your energy to focus on deep learning right now. Therefore, we instead suggest you rent access to a computer that already has everything you need preinstalled and ready to go. Costs can be as little as US$0.25 per hour while you're using it, and some options are even free." ] }, { @@ -452,7 +452,7 @@ "\n", "> A: My two cents: heed this advice! If you like computers you will be tempted to setup your own box. Beware! It is feasible but surprisingly involved and distracting. There is a good reason this book is not titled, _Everything you ever wanted to know about Ubuntu system administration, NVIDIA driver installation, apt-get, conda, pip, and Jupyter notebook configuration_. That would be a book of its own. Having designed and deployed our production machine learning infrastructure at work, I can testify it has its satisfactions but it is as unrelated to understanding models as maintaining an airplane is from flying one.\n", "\n", - "Each option shown on the book website includes a tutorial; after completing the tutorial, you will end up with a screen looking like this:" + "Each option shown on the book website includes a tutorial; after completing the tutorial, you will end up with a screen looking like <>." ] }, { @@ -501,7 +501,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "To open a notebook, just click on it. 
The notebook will open, and it will look something like this (note that there may be slight differences in details across different platforms; you can ignore those differences):" + "To open a notebook, just click on it. The notebook will open, and it will look something like <> (note that there may be slight differences in details across different platforms; you can ignore those differences):" ] }, { @@ -533,7 +533,7 @@ "source": [ "When you click on a cell it will be selected. Click on the cell now which begins with the line \"# CLICK ME\". The first character in that line represents a comment in Python, so is ignored when executing the cell. The rest of the cell is, believe it or not, a complete system for creating and training a state-of-the-art model for recognizing cats versus dogs. So, let's train it now! To do so, just press shift-enter on your keyboard, or press the \"play\" button on the toolbar. Then, wait a few minutes while the following things happen:\n", "\n", - "1. A dataset containing called the [Oxford-IIT Pet Dataset](http://www.robots.ox.ac.uk/~vgg/data/pets/) that contains 7,349 images of cats and dogs from 37 different breeds will be downloaded from the fast.ai datasets collection to your GPU server, and will then be extracted\n", + "1. A dataset called the [Oxford-IIIT Pet Dataset](http://www.robots.ox.ac.uk/~vgg/data/pets/) that contains 7,349 images of cats and dogs from 37 different breeds will be downloaded from the fast.ai datasets collection to the GPU server you are using, and will then be extracted\n", "2. A *pretrained model* will be downloaded from the Internet, which has already been trained on 1.3 million images, using a competition winning model\n", "3. 
The pretrained model will be *fine-tuned* using the latest advances in transfer learning, to create a model that is specially customised for recognising dogs and cats\n", "\n", @@ -542,7 +542,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -561,9 +561,9 @@ " \n", " \n", " 0\n", - " 0.167097\n", - " 0.032373\n", - " 0.008796\n", + " 0.188673\n", + " 0.012107\n", + " 0.002706\n", " 00:14\n", " \n", " \n", @@ -592,9 +592,9 @@ " \n", " \n", " 0\n", - " 0.044406\n", - " 0.008025\n", - " 0.002706\n", + " 0.056649\n", + " 0.009575\n", + " 0.004736\n", " 00:18\n", " \n", " \n", @@ -647,7 +647,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -656,7 +656,7 @@ "2" ] }, - "execution_count": 3, + "execution_count": null, "metadata": {}, "output_type": "execute_result" } @@ -674,17 +674,17 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "metadata": {}, "outputs": [ { "data": { - "image/png": "\n", + "image/png": "\n", "text/plain": [ - "" + "" ] }, - "execution_count": 4, + "execution_count": null, "metadata": {}, "output_type": "execute_result" } @@ -705,20 +705,20 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "So, how do we know if this model is any good? You can see the error rate (proportion of images that were incorrectly identified) printed as the last column of the table. As you can see, the model is nearly perfect, even though the training time was only a few seconds (not including the one-time downloading of the dataset and pretrained model). In fact, the accuracy you've achieved already is far better than anybody had ever achieved just 10 years ago!\n", + "So, how do we know if this model is any good? You can see the error rate (proportion of images that were incorrectly identified) printed as the second last column of the table. 
As you can see, the model is nearly perfect, even though the training time was only a few seconds (not including the one-time downloading of the dataset and pretrained model). In fact, the accuracy you've achieved already is far better than anybody had ever achieved just 10 years ago!\n", "\n", "Finally, let's check that this model actually works. Go and get a photo of a dog, or a cat; if you don't have one handy, just search Google images and download an image that you find there. Now execute the cell with `uploader` defined. It will output a button you can click, so you can select the image you want to classify." ] }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { - "model_id": "396649734fe5456893208bc29eb1d8db", + "model_id": "22613e22f0c9475cb9696273da3c3d60", "version_major": 2, "version_minor": 0 }, @@ -782,7 +782,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "Is this a cat?: True; Probability it's a cat: 0.999060\n" + "Is this a cat?: True; Probability it's a cat: 0.999992\n" ] } ], @@ -821,7 +821,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "hide_input": true + "hide_input": false }, "outputs": [ { @@ -875,7 +875,7 @@ "\n" ], "text/plain": [ - "" + "" ] }, "execution_count": null, @@ -926,7 +926,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "To understand this statement, we need to understand what Samuel means by a *weight assignment*. To do so, we need to change our basic program model diagram above, and replace it with something like this (where *inputs* might be the pixels of a photo, and *results* might be the word \"dog\" or \"cat\"):" + "To understand this statement, we need to understand what Samuel means by a *weight assignment*. 
To do so, we need to change our basic program model of <>, and replace it with something like <> (where *inputs* might be the pixels of a photo, and *results* might be the word \"dog\" or \"cat\"):" ] }, { @@ -999,7 +999,7 @@ "\n" ], "text/plain": [ - "" + "" ] }, "execution_count": null, @@ -1009,6 +1009,8 @@ ], "source": [ "#hide_input\n", + "#caption A program using weight assignment\n", + "#id weight_assignment\n", "gv('''model[shape=box3d width=1 height=0.7]\n", "inputs->model->results; weights->model''')" ] @@ -1023,7 +1025,7 @@ "\n", "Finally, he says we need *a mechanism for altering the weight assignment so as to maximize the performance*. For instance, he could look at the difference in weights between the winning model and the losing model, and adjust the weights a little further in the winning *direction*. We can now see why he said that such a procedure *could be made entirely automatic and... a machine so programmed would \"learn\" from its experience*.\n", "\n", - "Here is the full picture of Samuel's idea of training a machine learning model:" + "<> shows the full picture of Samuel's idea of training a machine learning model." ] }, { @@ -1141,7 +1143,7 @@ "For instance, the *results* for a checkers model are the moves that are made, and the *performance*\n", "is the win or loss (possibly also including the number of moves the game lasted).\n", "\n", - "Note that once the model is trained, we can think of the weights as being *part of the model*, since we're not varying them any more. Therefore actually *using* a model after it's trained looks like this:" + "Note that once the model is trained, we can think of the weights as being *part of the model*, since we're not varying them any more. Therefore actually *using* a model after it's trained looks like <>." 
] }, { @@ -1202,7 +1204,7 @@ "\n" ], "text/plain": [ - "" + "" ] }, "execution_count": null, @@ -1212,6 +1214,8 @@ ], "source": [ "#hide_input\n", + "#caption Using a trained model as a program\n", + "#id using_model\n", "gv('''model[shape=box3d width=1 height=0.7]\n", "inputs->model->results''')" ] @@ -1279,7 +1283,7 @@ "- The measure of *performance* is called the *loss* (or *cost* or *error*);\n", "- The loss depends not only on the predictions, but also the correct *labels* (or *targets*), e.g. \"dog\" or \"cat\".\n", "\n", - "After making these changes, our diagram in <> looks like this:" + "After making these changes, our diagram in <> looks like <>." ] }, { @@ -1393,6 +1397,8 @@ ], "source": [ "#hide_input\n", + "#caption Detailed training loop\n", + "#id detailed_loop\n", "gv('''ordering=in\n", "model[shape=box3d width=1 height=0.7 label=architecture]\n", "inputs->model->predictions; parameters->model; labels->loss; predictions->loss\n", @@ -1406,7 +1412,7 @@ "We can now see some critically important things about training a deep learning model:\n", "\n", "- A model can not be created without data;\n", - "- A model model can only learn to operate on the patterns seen in the input data used to train it;\n", + "- A model can only learn to operate on the patterns seen in the input data used to train it;\n", "- This learning approach only creates *predictions*, not recommended *actions*;\n", "- It's not enough to just have examples of input data; we need *labels* for that data too (e.g. pictures of dogs and cats aren't enough to train a model; we need a label for each one, saying which ones are dogs, and which are cats).\n", "\n", @@ -1440,7 +1446,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### What our image recognizer did" + "### How our image recognizer works" ] }, { @@ -1492,15 +1498,13 @@ " label_func=is_cat, item_tfms=Resize(224))\n", "```\n", "\n", - "The third line tells fastai what kind of dataset we have, and how it is structured. 
There are various different classes for different kinds of deep learning dataset and problem--here we're using `ImageDataLoaders`. The first part of the class name will generally be the type of data you have, such as image, or text. The second part will generally be the type of problem you are solving, such as classification, or regression.\n", + "The fourth line tells fastai what kind of dataset we have, and how it is structured. There are various different classes for different kinds of deep learning dataset and problem--here we're using `ImageDataLoaders`. The first part of the class name will generally be the type of data you have, such as image, or text. The second part will generally be the type of problem you are solving, such as classification, or regression.\n", "\n", - "The other important piece of information that we have to tell fastai is how to get the labels from the dataset. Computer vision datasets are normally structured in such a way that the label for an image is part of the file name, or path, most commonly the parent folder name. Fastai comes with a number of standardized labelling methods, and ways to write your own. Here we define a function `is_cat` which labels cats based on a filename rule provided by the dataset creators.\n", - "\n", - "TK Sylvain. Check conversion here, there is a problem with formatting\n", + "The other important piece of information that we have to tell fastai is how to get the labels from the dataset. Computer vision datasets are normally structured in such a way that the label for an image is part of the file name, or path, most commonly the parent folder name. Fastai comes with a number of standardized labelling methods, and ways to write your own. Here we define a function on the third line: `is_cat` which labels cats based on a filename rule provided by the dataset creators.\n", "\n", "Finally, we define the `Transform`s that we need. 
A `Transform` contains code that is applied automatically during training; fastai includes many pre-defined `Transform`s, and adding new ones is as simple as creating a Python function. There are two kinds: `item_tfms` are applied to each item (in this case, each item is resized to a 224 pixel square); `batch_tfms` are applied to a *batch* of items at a time using the GPU, so they're particularly fast (we'll see many examples of these throughout this book).\n", "\n", - "Why 224 pixels? This is the standard size for historical reasons (old pretrained models require this size exactly), but you can pass pretty much anything. If you increase the size, you'll often get a model with better results (since it will be able to focus on more details) but at the price of speed and memory consumption; or visa versa if you decrease the size. " + "Why 224 pixels? This is the standard size for historical reasons (old pretrained models require this size exactly), but you can pass pretty much anything. If you increase the size, you'll often get a model with better results (since it will be able to focus on more details) but at the price of speed and memory consumption; or vice versa if you decrease the size. " ] }, { @@ -1522,21 +1526,21 @@ "\n", "Even when your model has not fully memorized all your data, earlier on in training it may have memorized certain parts of it. As a result, the longer you train for, the better your accuracy will get on the training set; and the validation set accuracy will also improve for a while, but eventually it will start getting worse, as the model starts to memorize the training set, rather than finding generalizable underlying patterns in the data. 
When this happens, we say that the model is *over-fitting*.\n", "\n", - "Here's an example of what happens when you overfit, using a simplified example where we have just one parameter, and some randomly generated data based on the function `x**2`; as you see, although the predictions in the overfit model are accurate for data near the observed data, they are way off when outside of that range:" + "<> shows what happens when you overfit, using a simplified example where we have just one parameter, and some randomly generated data based on the function `x**2`; as you see, although the predictions in the overfit model are accurate for data near the observed data, they are way off when outside of that range." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "\"Example" + "\"Example" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "**Overfitting is the singe most important and challenging issue** when training for all machine learning practitioners, and all algorithms. As we will see, it is very easy to create a model that does a great job at making predictions on the exact data which it has been trained on, but it is much harder to make predictions on data that it has never seen before. And of course this is the data that will actually matter in practice. For instance, if you create a hand-written digit classifier (as we will very soon!) and use it to recognise numbers written on cheques, then you are never going to see any of the numbers that the model was trained on -- every cheque will have slightly different variations of writing to deal with. We will learn many methods to avoid overfitting in this book. However, you should only use those methods after you have confirmed that overfitting is actually occurring (i.e. you have actually observed the validation accuracy getting worse during training). 
We often see practitioners using over-fitting avoidance techniques even when they have enough data that they didn't need to do so, ending up with a model that could be less accurate than what they could have gotten." + "**Overfitting is the single most important and challenging issue** when training for all machine learning practitioners, and all algorithms. As we will see, it is very easy to create a model that does a great job at making predictions on the exact data which it has been trained on, but it is much harder to make predictions on data that it has never seen before. And of course this is the data that will actually matter in practice. For instance, if you create a hand-written digit classifier (as we will very soon!) and use it to recognise numbers written on cheques, then you are never going to see any of the numbers that the model was trained on -- every cheque will have slightly different variations of writing to deal with. We will learn many methods to avoid overfitting in this book. However, you should only use those methods after you have confirmed that overfitting is actually occurring (i.e. you have actually observed the validation accuracy getting worse during training). We often see practitioners using over-fitting avoidance techniques even when they have enough data that they didn't need to do so, ending up with a model that could be less accurate than what they could have achieved." ] }, { @@ -1554,7 +1558,7 @@ "learn = cnn_learner(dls, resnet34, metrics=error_rate)\n", "```\n", "\n", - "The fourth line tells fastai to create a *convolutional neural network* (CNN), and selects what *architecture* to use (i.e. what kind of model to create), what data we want to train it on, and what *metric* to use. A CNN is the current state of the art approach to creating computer vision models. We'll be learning all about how they work in this book. 
Their structure is inspired by how the human vision system works.\n", + "The fifth line tells fastai to create a *convolutional neural network* (CNN), and selects what *architecture* to use (i.e. what kind of model to create), what data we want to train it on, and what *metric* to use. A CNN is the current state of the art approach to creating computer vision models. We'll be learning all about how they work in this book. Their structure is inspired by how the human vision system works.\n", "\n", "There are many different architectures in fastai, which we will be learning about in this book, as well as discussing how to create your own. Most of the time, however, picking an architecture isn't a very important part of the deep learning process. It's something that academics love to talk about, but in practice it is unlikely to be something you need to spend much time on. There are some standard architectures that work most of the time, and in this case we're using one called _ResNet_ that we'll be learning a lot about during the book; it is both fast and accurate for many datasets and problems. The \"34\" in `resnet34` refers to the number of layers in this variant of the architecture (other options are \"18\", \"50\", \"101\", and \"152\"). Models using architectures with more layers take longer to train, and are more prone to overfitting (i.e. you can't train them for as many epochs before the accuracy on the validation set starts getting worse). On the other hand, when using more data, they can be quite a bit more accurate.\n", "\n", @@ -1580,7 +1584,7 @@ "learn.fine_tune(1)\n", "```\n", "\n", - "The fifth line tells fastai how to *fit* the model. As we've discussed, the architecture only describes a *template* for a mathematical function; but it doesn't actually do anything until we provide values for the millions of parameters it contains.\n", + "The sixth line tells fastai how to *fit* the model. 
As we've discussed, the architecture only describes a *template* for a mathematical function; but it doesn't actually do anything until we provide values for the millions of parameters it contains.\n", "\n", "This is the key to deep learning — how to fit the parameters of a model to get it to solve your problem. In order to fit a model, we have to provide at least one piece of information: how many times to look at each image (known as number of *epochs*). The number of epochs you select will largely depend on how much time you have available, and how long you find it takes in practice to fit your model. If you select a number that is too small, you can always train for more epochs later.\n", "\n", @@ -1619,46 +1623,58 @@ "source": [ "At this stage we have an image recogniser that is working very well, but we have no idea what it is actually doing! Although many people complain that deep learning results in impenetrable \"black box\" models (that is, something that gives predictions but that no one can understand), this really couldn't be further from the truth. There is a vast body of research showing how to deeply inspect deep learning models, and get rich insights from them.\n", "\n", - "In 2013 a PhD student, Matt Zeiler, and his supervisor, Rob Fergus, published the paper [Visualizing and Understanding Convolutional Networks](https://arxiv.org/pdf/1311.2901.pdf), which showed how to visualise the neural network weights learned in each layer of a model. They carefully analysed the model that won the 2012 ImageNet competition, and used this analysis to greatly improve the model, such that they were able to go on to win the 2013 competition! 
Here is the picture that they published of the first two layers' weights:" + "In 2013 a PhD student, Matt Zeiler, and his supervisor, Rob Fergus, published the paper [Visualizing and Understanding Convolutional Networks](https://arxiv.org/pdf/1311.2901.pdf), which showed how to visualise the neural network weights learned in each layer of a model. They carefully analysed the model that won the 2012 ImageNet competition, and used this analysis to greatly improve the model, such that they were able to go on to win the 2013 competition! <> is the picture that they published of the first layers' weights." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "\"Activations" + "\"Activations" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This picture requires some explanation. For each layer, the image part with the light grey background shows the reconstructed weights pictures, and the other section shows the parts of the training images which most strongly matched each set of weights. For layer 1, what we can see is that the model has discovered weights which represent diagonal, horizontal, and vertical edges, as well as various different gradients. (Note that for each layer only a subset of the features are shown; in practice there are thousands across all of the layers.) These are the basic building blocks that it has created automatically for computer vision. They have been widely analysed by neuroscientists and computer vision researchers, and it turns out that these learned building blocks are very similar to the basic visual machinery in the human eye, as well as the handcrafted computer vision features that were developed prior to the days of deep learning. The next layer is represented in <>." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\"Activations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "This picture requires some explanation. 
For each layer, the image part with the light grey background shows the reconstructed weights pictures, and the other section shows the parts of the training images which most strongly matched each set of weights. For layer 1, what we can see is that the model has discovered weights which represent diagonal, horizontal, and vertical edges, as well as various different gradients. (Note that for each layer only a subset of the features are shown; in practice there are thousands across all of the layers.) These are the basic building blocks that it has created automatically for computer vision. They have been widely analysed by neuroscientists and computer vision researchers, and it turns out that these learned building blocks are very similar to the basic visual machinery in the human eye, as well as the handcrafted computer vision features that were developed prior to the days of deep learning.\n", - "\n", "For layer 2, there are nine examples of weight reconstructions for each of the features found by the model. We can see that the model has learned to create feature detectors that look for corners, repeating lines, circles, and other simple patterns. These are built from the basic building blocks developed in the first layer. For each of these, the right-hand side of the picture shows small patches from actual images which these features most closely match. For instance, the particular pattern in row 2 column 1 matches the gradients and textures associated with sunsets.\n", "\n", - "Here is the image from the paper showing the results of reconstructing the features of layer 3:" + "<> shows the image from the paper showing the results of reconstructing the features of layer 3." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "\"Activations" + "\"Activations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "As you can see by looking at the right-hand side of this picture, the features are now able to identify and match with higher levels' semantic components, such as car wheels, text, and flower petals. Using these components layers four and five can identify even higher-level concepts:" + "As you can see by looking at the right-hand side of this picture, the features are now able to identify and match with higher-level semantic components, such as car wheels, text, and flower petals. Using these components, layers four and five can identify even higher-level concepts, as shown in <>." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "\"Activations" + "\"Activations" ] }, { @@ -1683,14 +1699,14 @@ "source": [ "An image recogniser can, as its name suggests, only recognise images. But a lot of things can be represented as images, which means that an image recogniser can learn to complete many tasks.\n", "\n", - "For instance, a sound can be converted to a spectrogram, which is a chart that shows the amount of each frequency at each time in an audio file. Fast.ai student Ethan Sutin used this approach to easily beat the published accuracy on [environmental sound detection](https://medium.com/@etown/great-results-on-audio-classification-with-fastai-library-ccaf906c5f52) using a dataset of 8732 urban sounds. fastai's `show_batch` clearly shows how each different sound has a quite distinctive spectrogram:" + "For instance, a sound can be converted to a spectrogram, which is a chart that shows the amount of each frequency at each time in an audio file. Fast.ai student Ethan Sutin used this approach to easily beat the published accuracy on [environmental sound detection](https://medium.com/@etown/great-results-on-audio-classification-with-fastai-library-ccaf906c5f52) using a dataset of 8732 urban sounds. 
fastai's `show_batch` clearly shows how each different sound has a quite distinctive spectrogram, as you can see in <>." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "\"show_batch" + "\"show_batch" ] }, { @@ -1725,7 +1741,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Another example comes from the paper [Malware Classification with Deep Convolutional Neural Networks](https://ieeexplore.ieee.org/abstract/document/8328749) which explains that \"the malware binary file is divided into 8-bit sequences which are then converted to equivalent decimal values. This decimal vector is reshaped and gray-scale image is generated that represent the malware sample\", like in <>" + "Another example comes from the paper [Malware Classification with Deep Convolutional Neural Networks](https://ieeexplore.ieee.org/abstract/document/8328749) which explains that \"the malware binary file is divided into 8-bit sequences which are then converted to equivalent decimal values. This decimal vector is reshaped and gray-scale image is generated that represent the malware sample\", like in <>." ] }, { @@ -1769,7 +1785,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We have introduced quite a few new terms, <> contains a handy recap of most of them.\n", + "We just covered a lot of information so let's recap briefly. <> provides a handy list.\n", "\n", "```asciidoc\n", "[[dljargon]]\n", @@ -2105,7 +2121,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "This model is using the the IMDb dataset from the paper [Learning Word Vectors for Sentiment Analysis]((http://ai.stanford.edu/~amaas/data/sentiment/)). It works well with movie reviews of many thousands of words. But let's test it out on a very short one, to see it do its thing:" + "This model is using the IMDb dataset from the paper [Learning Word Vectors for Sentiment Analysis](https://ai.stanford.edu/~amaas/data/sentiment/). It works well with movie reviews of many thousands of words. 
But let's test it out on a very short one, to see it do its thing:" ] }, { @@ -2168,11 +2184,11 @@ "learn.fine_tune(4, 1e-2)\n", "```\n", "\n", - "The outputs themselves can be deceiving: they have the results of the last time the cell was executed, but if you change the code inside a cell without executing it, you will keep them.\n", + "The outputs themselves can be deceiving: they have the results of the last time the cell was executed, but if you change the code inside a cell without executing it, the old (misleading) results will remain.\n", "\n", "Except when we mention it explicitly, the notebooks provided on the book website are meant to be run in order, from top to bottom. In general, when experimenting, you will find yourself executing cells in any order to go fast (which is a super neat feature of Jupyter Notebooks) but once you have explored and arrive at the final version of your code, make sure you can run the cells of your notebooks in order (your future self won't necessarily remember the convoluted path you took otherwise!). \n", "\n", - "In edit mode, pressing `0` twice will restart the *kernel* (which is the engine powering your notebook). This will wipe your state clean and make it as if you had just started in the notebook. Click on the \"Cell\" menu and then on \"Run All Above\" to run all cells above the point where you are. We have found this to be very useful when developing the fastai library." + "In command mode, pressing `0` twice will restart the *kernel* (which is the engine powering your notebook). This will wipe your state clean and make it as if you had just started in the notebook. Click on the \"Cell\" menu and then on \"Run All Above\" to run all cells above the point where you are. We have found this to be very useful when developing the fastai library." ] }, { @@ -2723,7 +2739,7 @@ "\n", "One case might be if you are looking at time series data. 
For a time series, choosing a random subset of the data will be both too easy (you can look at the data both before and after the dates you are trying to predict) and not representative of most business use cases (where you are using historical data to build a model for use in the future). If your data includes the date and you are building a model to use in the future, you will want to choose a continuous section with the latest dates as your validation set (for instance, the last two weeks or last month of the available data).\n", "\n", - "Suppose you want to split the time series data in <> into training and validation sets:" + "Suppose you want to split the time series data in <> into training and validation sets." ] }, { @@ -2737,7 +2753,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "A random subset is a poor choice (too easy to fill in the gaps, and not indicative of what you'll need in production), as we can see in <>" + "A random subset is a poor choice (too easy to fill in the gaps, and not indicative of what you'll need in production), as we can see in <>." ] }, { @@ -2765,7 +2781,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "For example, Kaggle had a competition to [predict the sales in a chain of Ecuadorian grocery stores](https://www.kaggle.com/c/favorita-grocery-sales-forecasting). 
Kaggle's *training data* ran from Jan 1 2013 to Aug 15 2017 and the test data spanned Aug 16 2017 to Aug 31 2017. That way, the competition organizer ensured that entrants were making predictions for a time period that was *in the future*, from the perspective of their model. This is similar to the way quant hedge fund traders do *back-testing* to check whether their models are predictive of future periods, based on past data." ] }, { @@ -2774,21 +2790,14 @@ "source": [ "After time series, a second common case is when you can easily anticipate ways the data you will be making predictions for in production may be *qualitatively different* from the data you have to train your model with.\n", "\n", - "In the Kaggle [distracted driver competition](https://www.kaggle.com/c/state-farm-distracted-driver-detection), the independent data are pictures of drivers at the wheel of a car, and the dependent variable is a category such as texting, eating, or safely looking ahead. If you were the insurance company building a model from this data, note that you would be most interested in how the model performs on drivers you haven't seen before (since you would likely have training data only for a small group of people). This is true of the Kaggle competition as well: the test data consists of people that weren't used in the training set." + "In the Kaggle [distracted driver competition](https://www.kaggle.com/c/state-farm-distracted-driver-detection), the independent data are pictures of drivers at the wheel of a car, and the dependent variable is a category such as texting, eating, or safely looking ahead. Lots of pictures were of the same drivers in different positions, as we can see in <>. If you were the insurance company building a model from this data, note that you would be most interested in how the model performs on drivers you haven't seen before (since you would likely have training data only for a small group of people). 
This is true of the Kaggle competition as well: the test data consists of people that weren't used in the training set." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "\"A" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\"A" + "\"Two" ] }, { @@ -2941,5 +2950,5 @@ } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 } diff --git a/02_production.ipynb b/02_production.ipynb index 0237d68..74790ed 100644 --- a/02_production.ipynb +++ b/02_production.ipynb @@ -29,25 +29,29 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The five lines of code we've seen in <> are just one small part of the process of using deep learning in practice. In this chapter, we're going to use a computer vision example to look at the end-to-end process of creating a deep learning application. More specifically: we're going to build a bear classifier! In the process, we'll discuss the capabilities and constraints of deep learning, learn about how to create datasets, look at possible gotchas when using deep learning in practice, and more. Let's start with how you should frame your problem.\n", + "The five lines of code we saw in <> are just one small part of the process of using deep learning in practice. In this chapter, we're going to use a computer vision example to look at the end-to-end process of creating a deep learning application. More specifically: we're going to build a bear classifier! In the process, we'll discuss the capabilities and constraints of deep learning, learn about how to create datasets, look at possible gotchas when using deep learning in practice, and more. Many of the key points will apply equally well to other deep learning problems, such as we showed in <>. 
If you work through a problem similar in key respects to our example problems, we expect you to get excellent results with little code, quickly.\n", "\n", - "TK: the next section title seems a bit inadequate, let's double check" + "Let's start with how you should frame your problem." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Picking a problem" + "## The practice of deep learning" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "We've seen that deep learning can solve a lot of challenging problems quickly and with little code. However, deep learning isn't magic! We often talk to people who overestimate both the constraints, and the capabilities of deep learning. Both of these can be problems: underestimating the capabilities means that you might not even try things which could be very beneficial; underestimating the constraints might mean that you fail to consider and react to important issues.\n", + "We've seen that deep learning can solve a lot of challenging problems quickly and with little code. As a beginner there's a sweet spot of problems that are similar enough to our example problems that you can very quickly get extremely useful results. However, deep learning isn't magic! The same 5 lines of code won't work on every problem anyone can think of today. Underestimating the constraints and overestimating the capabilities of deep learning may lead to frustratingly poor results. At least until you gain some experience to solve the problems that arise. Overestimating the constraints and underestimating the capabilities of deep learning may mean you do not attempt a solvable problem because you talk yourself out of it. \n", "\n", - "The best thing to do is to keep an open mind. 
If you remain open to the possibility that deep learning might solve part of your problem with less data or complexity than you expect, then it is possible to design a process where you can find the specific capabilities and constraints related to your particular problem as you work through the process. This doesn't mean making any risky bets — we will show you how you can gradually roll out models so that they don't create significant risks, and can even backtest them prior to putting them in production." + "We often talk to people who overestimate both the constraints, and the capabilities of deep learning. Both of these can be problems: underestimating the capabilities means that you might not even try things which could be very beneficial; underestimating the constraints might mean that you fail to consider and react to important issues.\n", + "\n", + "The best thing to do is to keep an open mind. If you remain open to the possibility that deep learning might solve part of your problem with less data or complexity than you expect, then it is possible to design a process where you can find the specific capabilities and constraints related to your particular problem as you work through the process. This doesn't mean making any risky bets — we will show you how you can gradually roll out models so that they don't create significant risks, and can even backtest them prior to putting them in production.\n", + "\n", + "Let's start with how you should frame your problem." ] }, { @@ -103,7 +107,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "First things first, let's make sure that deep learning cn be any good at the problem you are considering. In general, here is a summary of the state of deep learning is at the start of 2020. However, things move very fast, and by the time you read this some of these constraints may no longer exist. 
We will try to keep the book website up-to-date; in addition, a Google search for \"what can AI do now\" there is likely to provide some up-to-date information." + "Let's start by considering whether deep learning can be any good at the problem you are looking to work on. In general, here is a summary of the state of deep learning at the start of 2020. However, things move very fast, and by the time you read this some of these constraints may no longer exist. We will try to keep the book website up-to-date; in addition, a Google search for \"what can AI do now\" is likely to provide some up-to-date information." ] }, { @@ -117,9 +121,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "There are many domains in which deep learning has not been used to analyse images yet, but those where it has been tried have nearly universally shown that computers can recognise what items are in an image at least as well as people can — even specially trained people, such as radiologists. This is known as *object recognition*. Deep learning is also good at recognizing whereabouts objects in an image are, and can highlight their location and name each found object. This is known as *object detection* (there is also a variant of this we saw in <>, where every pixel is categorized based on what kind of object it is part of--this is called *segmentation*). Deep learning algorithms are generally not good at recognizing images that are significantly different in structure or style to those used to train the model. For instance, if there were no black-and-white images in the training data, the model may well do poorly on black-and-white images. If the training data did not contain hand-drawn images then the model will probably do poorly on hand-drawn images. 
There is no general way to check what types of image are missing in your training set, but we will show in this chapter some ways to try to recognize when unexpected image types arise in the data when the model is being used in production (this is known as checking for *out of domain* data).\n", + "There are many domains in which deep learning has not been used to analyse images yet, but those where it has been tried have nearly universally shown that computers can recognise what items are in an image at least as well as people can — even specially trained people, such as radiologists. This is known as *object recognition*. Deep learning is also good at recognizing whereabouts objects in an image are, and can highlight their location and name each found object. This is known as *object detection* (there is also a variant of this we saw in <>, where every pixel is categorized based on what kind of object it is part of--this is called *segmentation*). Deep learning algorithms are generally not good at recognizing images that are significantly different in structure or style to those used to train the model. For instance, if there were no black-and-white images in the training data, the model may do poorly on black-and-white images. If the training data did not contain hand-drawn images then the model will probably do poorly on hand-drawn images. There is no general way to check what types of image are missing in your training set, but we will show in this chapter some ways to try to recognize when unexpected image types arise in the data when the model is being used in production (this is known as checking for *out of domain* data). TK previous chapter showed a parabola overfitting. possibly a good place to show out of domain data outside the sampled parabola?\n", "\n", - "One major challenge for object detection systems is that image labelling can be slow and expensive. 
There is a lot of work at the moment going into tools to try to make this labelling faster and more easy, and require less handcrafted labels to train accurate object detection models. One approach which is particularly helpful is to synthetically generate variations of input images, such as by rotating them, or changing their brightness and contrast; this is called *data augmentation* and also works well for text and other types of model. We will be discussing it in detail in this chapter.\n", + "One major challenge for object detection systems is that image labelling can be slow and expensive. There is a lot of work at the moment going into tools to try to make this labelling faster and easier, and require less handcrafted labels to train accurate object detection models. One approach which is particularly helpful is to synthetically generate variations of input images, such as by rotating them, or changing their brightness and contrast; this is called *data augmentation* and also works well for text and other types of model. We will be discussing it in detail in this chapter.\n", "\n", "Another point to consider is that although your problem might not look like a computer vision problem, it might be possible with a little imagination to turn it into one. For instance, if what you are trying to classify is sounds, you might try converting the sounds into images of their acoustic waveforms and then training a model on those images." ] @@ -135,11 +139,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Just like in computer vision, computers are very good at categorising both short and long documents based on categories such as spam, sentiment, author, source website, and so forth. We are not aware of any rigourous work done in this area to compare to human performance, but anecdotally it seems to us that deep learning performance is similar to human performance here. 
Deep learning is also very good at generating context-appropriate text, such as generating replies to social media posts, and imitating a particular author's style. It is also good at making this content compelling to humans, and has been shown to be even more compelling than human-generated text. However, deep learning is currently not good at generating *correct* responses! We don't currently have a reliable way to, for instance, combine a knowledge base of medical information, along with a deep learning model for generating medically correct natural language responses. This is very dangerous, because it is so easy to create content which appears to a layman to be compelling, but actually is entirely incorrect.\n", + "Just like in computer vision, computers are very good at categorising both short and long documents based on categories such as spam, sentiment (e.g. is the review positive or negative), author, source website, and so forth. We are not aware of any rigorous work done in this area to compare to human performance, but anecdotally it seems to us that deep learning performance is similar to human performance here. Deep learning is also very good at generating context-appropriate text, such as generating replies to social media posts, and imitating a particular author's style. It is also good at making this content compelling to humans, and has been shown to be even more compelling than human-generated text. However, deep learning is currently not good at generating *correct* responses! We don't currently have a reliable way to, for instance, combine a knowledge base of medical information, along with a deep learning model for generating medically correct natural language responses. 
This is very dangerous, because it is so easy to create content which appears to a layman to be compelling, but actually is entirely incorrect.\n", "\n", "Another concern is that context-appropriate, highly compelling responses on social media can be used at massive scale — thousands of times greater than any troll farm previously seen — to spread disinformation, create unrest, and encourage conflict. As a rule of thumb, text generation will always be technologically a bit ahead of the ability of models to recognize automatically generated text. For instance, it is possible to use a model that can recognize artificially generated content to actually improve the generator that creates that content, until the classification model is no longer able to complete its task.\n", "\n", - "Despite these issues, deep learning can be used to translate text from one language to another, summarize long documents into something which can be digested more quickly, find all mentions of a concept of interest, and so forth. Unfortunately, the translation or summary could well include completely incorrect information! However, it is already good enough that many people are using the systems — for instance Google's online translation system (and every other online service we are aware of) is based on deep learning." + "Despite these issues, deep learning can be used to translate text from one language to another, summarize long documents into something which can be digested more quickly, find all mentions of a concept of interest, and many more. Unfortunately, the translation or summary could well include completely incorrect information! However, it is already good enough that many people are using the systems — for instance Google's online translation system (and every other online service we are aware of) is based on deep learning." 
] }, { @@ -155,7 +159,7 @@ "source": [ "The ability of deep learning to combine text and images into a single model is, generally, far better than most people intuitively expect. For example, a deep learning model can be trained on input images, and output captions written in English, and can learn to generate surprisingly appropriate captions automatically for new images! But again, we have the same warning that we discussed in the previous section: there is no guarantee that these captions will actually be correct.\n", "\n", - "Because of this serious issue we generally recommend that deep learning be used not as a entirely automated process, but as part of a process in which the model and a human user interact closely. This can potentially make humans orders of magnitude more productive than they would be with entirely manual methods, and actually result in more accurate processes than using a human alone. For instance, an automatic system can be used to identify potential strokes directly from CT scans, send a high priority alert to have potential/scans looked at quickly. There is only a three-hour window to treat strokes, so this fast feedback loop could save lives. At the same time, however, all scans could continue to be sent to radiologists in the usual way, so there would be no reduction in human input. Other deep learning models could automatically measure items seen on the scan, and insert those measurements into report, warn the radiologist about findings that they may have missed, and tell the radiologist about other cases which might be relevant." + "Because of this serious issue we generally recommend that deep learning be used not as an entirely automated process, but as part of a process in which the model and a human user interact closely. This can potentially make humans orders of magnitude more productive than they would be with entirely manual methods, and actually result in more accurate processes than using a human alone. 
For instance, an automatic system can be used to identify potential strokes directly from CT scans, and send a high priority alert to have those scans looked at quickly. There is only a three-hour window to treat strokes, so this fast feedback loop could save lives. At the same time, however, all scans could continue to be sent to radiologists in the usual way, so there would be no reduction in human input. Other deep learning models could automatically measure items seen on the scan, and insert those measurements into reports, warning the radiologist about findings that they may have missed, and telling the radiologist about other cases which might be relevant." ] }, { @@ -169,7 +173,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "For analysing timeseries and tabular data, deep learning has recently been making great strides. However, deep learning is generally used as part of a ensemble of multiple types of model. If you already have a system that is using random forests or gradient boosting machines (popular tabular modelling tools that we will learn about soon) then switching to, or adding, deep learning may not result in any dramatic improvement. Deep learning does greatly increase the variety of columns that you can include, for example columns containing natural language (e.g. book titles, reviews, etc), and *high cardinality categorical* columns (i.e. something that contains a large number of discrete choices, such as zip code or product id). On the downside, deep learning models generally take longer to train than random forests or gradient boosting machines, although this is changing thanks to libraries such as [RAPIDS](https://rapids.ai/), which provides GPU acceleration for the whole modeling pipeline. We cover the pros and cons of all these methods in detail in <> in this book." + "For analysing timeseries and tabular data, deep learning has recently been making great strides. 
However, deep learning is generally used as part of an ensemble of multiple types of model. If you already have a system that is using random forests or gradient boosting machines (popular tabular modelling tools that we will learn about soon) then switching to, or adding, deep learning may not result in any dramatic improvement. Deep learning does greatly increase the variety of columns that you can include, for example columns containing natural language (e.g. book titles, reviews, etc), and *high cardinality categorical* columns (i.e. something that contains a large number of discrete choices, such as zip code or product id). On the downside, deep learning models generally take longer to train than random forests or gradient boosting machines, although this is changing thanks to libraries such as [RAPIDS](https://rapids.ai/), which provides GPU acceleration for the whole modeling pipeline. We cover the pros and cons of all these methods in detail in <> in this book." ] }, { @@ -256,16 +260,28 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "For many types of projects, you may be able to find all the data you need online. The project we'll be completing in this chapter is a *bear detector*. It will discriminate between three types of bear: grizzly, black, and teddy bear. There are many images on the internet of each type of bear we can use. We just need a way to find them and download them. We've provided a tool you can use for this purpose, so you can follow along with this chapter, creating your own image recognition application for whatever kinds of object you're interested in. In the fast.ai course, thousands of students have presented their work on the course forums, displaying everything from Trinidad hummingbird varieties, to Panama bus types, and even an application that helped one student let his fiancee recognize his sixteen cousins during Christmas vacation!" + "For many types of projects, you may be able to find all the data you need online. 
The project we'll be completing in this chapter is a *bear detector*. It will discriminate between three types of bear: grizzly, black, and teddy bear. There are many images on the Internet of each type of bear we can use. We just need a way to find them and download them. We've provided a tool you can use for this purpose, so you can follow along with this chapter, creating your own image recognition application for whatever kinds of object you're interested in. In the fast.ai course, thousands of students have presented their work on the course forums, displaying everything from Trinidad hummingbird varieties, to Panama bus types, and even an application that helped one student let his fiancee recognize his sixteen cousins during Christmas vacation!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "To download images, you should sign up at Microsoft for *Bing Image Search*. You will be given a key, which you can either paste over `os.environ('AZURE_SEARCH_KEY')` below, or you can set in your terminal as:\n", - "\n", - " export AZURE_SEARCH_KEY=your_key_here" + "As at the time of writing, Bing Image Search is the best option we know of for finding and downloading images. It's free for up to 1000 queries per month, and each query can download up to 150 images. However, something better might have come along between when we wrote this and when you're reading the book, so be sure to check out [book.fast.ai](https://book.fast.ai) where we'll let you know our current recommendation." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> important: Services that can be used for creating datasets come and go all the time, and their features, interfaces, and pricing change regularly too. In this section, we'll show how to use one particular provider, *Bing Image Search*, using the service they have as this book was written. 
We'll be providing more options and more up-to-date information on the [book website](https://book.fast.ai), so be sure to have a look there now to get the most current information on how to download images from the web to create a dataset for deep learning." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To download images with Bing Image Search, you should sign up at Microsoft for *Bing Image Search*. You will be given a key, which you can either paste here, replacing \"XXX\":" ] }, { @@ -274,14 +290,44 @@ "metadata": {}, "outputs": [], "source": [ - "key = os.environ['AZURE_SEARCH_KEY']" + "key = 'XXX'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "As at the time of writing, Bing Image Search is the best option we know of for finding and downloading images. It's free for up to 1000 queries per month, and each query can download up to 150 images. However, something better might have come along between when we wrote this and when you're reading the book, so be sure to check out [book.fast.ai](https://book.fast.ai) where we'll let you know our current recommendation." + "...or, if you're comfortable at the command line, you can set it in your terminal with:\n", + "\n", + " export AZURE_SEARCH_KEY=your_key_here\n", + "\n", + "and then restart Jupyter notebooks, and finally execute in this notebook:\n", + "\n", + "```python\n", + "key = os.environ['AZURE_SEARCH_KEY']\n", + "```\n", + "\n", + "Once you've set `key`, you can use `search_images_bing`. This function is provided by the small `utils` class included in the book. 
Remember, if you're not sure where a symbol is defined, you can just type it in your notebook to find out (or prefix with `?` to get help, including the name of the file where it's defined, or with `??` to get its source code):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "search_images_bing" ] }, { @@ -428,7 +474,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Often when we download files from the internet, there's a few that are corrupt. Let's check:" + "Often when we download files from the Internet, there are a few that are corrupt. Let's check:" ] }, { @@ -462,6 +508,22 @@ "failed" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To remove the failed images, we can use `unlink` on each. Note that, like most fastai functions that return a collection, `verify_images` returns an object of type `L`, which includes the `map` method. This calls the passed function on each element of the collection." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "failed.map(Path.unlink);" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -490,7 +552,7 @@ "```\n", "It tells us what argument the function accepts (`fns`) then shows us the source code and the file it comes from. 
Looking at that source code, we can see it applies the function `verify_image` in parallel and keeps only the ones for which the result of that function is `False`, which is consistent with the doc string: it finds the images in `fns` that can't be opened.\n", "\n", - "Here are the commands that are very useful in jupyter notebooks:\n", + "Here are the commands that are very useful in Jupyter notebooks:\n", "\n", "- at any point, if you don't remember the exact spelling of a function or argument name, you can press \"tab\" to get auto-completion suggestions.\n", "- when inside the parentheses of a function, pressing \"shift\" and \"tab\" simultaneously will display a window with the signature of the function and short documentation. Pressing it twice will expand the documentation and pressing it three times will open a full window with the same information at the bottom of your screen.\n", @@ -507,22 +569,6 @@ "### End sidebar" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To remove the failed images, we can use `unlink` on each. Note that, like most fastai functions that return a collection, `verify_images` returns an object of type `L`, which includes the `map` method. This calls the passed function on each element of the collection." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "failed.map(Path.unlink);" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -541,7 +587,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "So with this as your training data, you would end up not with a healthy skin detector, but a *young white woman touching her face* detector! Be sure to think carefully about the types of data that you might expect to see in practice in your application, and check carefully to ensure that all these types are reflected in your model's source data. (Thanks to Deb Raji, who came up with the *healthy skin* example. 
See her paper *Actionable Auditing: Investigating the Impact of Publicly Naming Biased Performance Results of Commercial AI Products* for more fascinating insights into model bias.)" + "So with this as your training data, you would end up not with a healthy skin detector, but a *young white woman touching her face* detector! Be sure to think carefully about the types of data that you might expect to see in practice in your application, and check carefully to ensure that all these types are reflected in your model's source data.footnote:[Thanks to Deb Raji, who came up with the *healthy skin* example. See her paper *Actionable Auditing: Investigating the Impact of Publicly Naming Biased Performance Results of Commercial AI Products* for more fascinating insights into model bias.]" ] }, { @@ -562,7 +608,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Now that we have downloaded and verified of the data that we want to use, we need to turn it into a `DataLoaders` object. `DataLoaders` is a thin class which just stores whatever `DataLoader` objects you pass to it, and makes them available as `train` and `valid` . Although it's a very simple class, it's very important in fastai: it provides the data for your model. The key functionality in `DataLoaders` is provided with just these 4 lines of code (it has some other minor functionality we'll skip over for now):\n", + "Now that we have downloaded and verified the data that we want to use, we need to turn it into a `DataLoaders` object. `DataLoaders` is a thin class which just stores whatever `DataLoader` objects you pass to it, and makes them available as `train` and `valid` . Although it's a very simple class, it's very important in fastai: it provides the data for your model. 
The key functionality in `DataLoaders` is provided with just these 4 lines of code (it has some other minor functionality we'll skip over for now):\n", "\n", "```python\n", "class DataLoaders(GetAttr):\n", @@ -646,7 +692,7 @@ "item_tfms=Resize(128)\n", "```\n", "\n", - "Our images are all different sizes, and this is a problem for deep learning: we don't feed the model one image at a time but several (what we call a *mini-batch*) of them. To group them in a big array (usually called *tensor*) that is going to go through our model, they all need to be of the same size. So we need to add a transform twhich will resize these images to the same size. *item transforms* are pieces of code which run on each individual item, whether it be an image, category, or so forth. fastai includes many predefined transforms; we will use the `Resize` transform here.\n", + "Our images are all different sizes, and this is a problem for deep learning: we don't feed the model one image at a time but several (what we call a *mini-batch*) of them. To group them in a big array (usually called *tensor*) that is going to go through our model, they all need to be of the same size. So we need to add a transform which will resize these images to the same size. *item transforms* are pieces of code which run on each individual item, whether it be an image, category, or so forth. fastai includes many predefined transforms; we will use the `Resize` transform here.\n", "\n", "This command has given us a `DataBlock` object. This is like a *template* for creating a `DataLoaders`. We still need to tell fastai the actual source of our data — in this case, the path where the images can be found." ] @@ -748,7 +794,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "All of these approaches seem somewhat wasteful, or problematic. 
If we squished or stretch the images then the end up unrealistic shapes, leading to a model that learns that things look different to how they actually are, which we would expect to result in lower accuracy. If we crop the images then we remove some of the features that allow us to recognize them. For instance, if we were trying to recognise the breed of dog or cat, we may end up cropping out a key part of the body or the face necessary to distinguish between similar breeds. If we pad the images then we have a whole lot of empty space, which is just wasted computation for our model, and results in a lower effective resolution for the part of the image we actually use.\n", + "All of these approaches seem somewhat wasteful, or problematic. If we squish or stretch the images then they end up as unrealistic shapes, leading to a model that learns that things look different to how they actually are, which we would expect to result in lower accuracy. If we crop the images then we remove some of the features that allow us to recognize them. For instance, if we were trying to recognise the breed of dog or cat, we may end up cropping out a key part of the body or the face necessary to distinguish between similar breeds. If we pad the images then we have a whole lot of empty space, which is just wasted computation for our model, and results in a lower effective resolution for the part of the image we actually use.\n", "\n", "Instead, what we normally do in practice is to randomly select part of the image, and crop to just that part. On each epoch (which is one complete pass through all of our images in the dataset) we randomly select a different part of each image. This means that our model can learn to focus on, and recognize, different features in our images. 
It also reflects how images work in the real world; different photos of the same thing may be framed in slightly different ways.\n", "\n", @@ -784,7 +830,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "> note: The second line in this code is a little bit magic, and you absolutely don't have to understand it at this point. So feel free to ignore the entirety of this paragraph! This is for just if you're curious… Showing different randomly varied versions of the same image is not something we normally have to do in deep learning, so it's not something that fastai provides directly. Therefore to draw the picture of data augmentation on the same image, we had to take advantage of fastai's sophisticated customisation features. DataLoader has a method called `get_idx`, which is called to decide which items should be selected next. Normally when we are training, this returns a random permutation of all of the indexes in the dataset. But pretty much everything in fastai can be changed, including how the `get_idx` method is defined, which means we can change how we sample data. So in this case, we are replacing it with a version which always returns the number one. That way, our DataLoader shows the same image again and again! This is a great example of the flexibility that fastai provides. " + "> note: The `…get_idx` assignment in this code is a little bit magic, and you absolutely don't have to understand it at this point. So feel free to ignore the entirety of this paragraph! This is just if you're curious… Showing different randomly varied versions of the same image is not something we normally have to do in deep learning, so it's not something that fastai provides directly. Therefore to draw the picture of data augmentation on the same image, we had to take advantage of fastai's sophisticated customisation features. DataLoader has a method called `get_idx`, which is called to decide which items should be selected next. 
Normally when we are training, this returns a random permutation of all of the indexes in the dataset. But pretty much everything in fastai can be changed, including how the `get_idx` method is defined, which means we can change how we sample data. So in this case, we are replacing it with a version which always returns the number one. That way, our DataLoader shows the same image again and again! This is a great example of the flexibility that fastai provides. " ] }, { @@ -807,7 +853,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Data augmentation refers to creating random variations of our input data, such that they appear a different, but are not expected to change the meaning of the data. Examples of common data augmentation for images are rotation, flipping, perspective warping, brightness changes, contrast changes, and much more. For natural photo images such as the ones we are using here, there is a standard set of augmentations which we have found work pretty well, and are provided with the get transforms function. Because the images are now all the same size, we can apply these augmentations to an entire batch of them using the GPU, which will save a lot of time. To tell fastai we want to use these transforms to a batch, we use the `batch_tfms` parameter. (Note that's we're not using `RandomResizedCrop` in this example, so you can see the differences more clearly; we're also using double the amount of augmentation compared to the default, for the same reason)." + "Data augmentation refers to creating random variations of our input data, such that they appear different, but are not expected to change the meaning of the data. Examples of common data augmentation for images are rotation, flipping, perspective warping, brightness changes, contrast changes, and much more. 
For natural photo images such as the ones we are using here, there is a standard set of augmentations which we have found work pretty well, and are provided with the `aug_transforms` function. Because the images are now all the same size, we can apply these augmentations to an entire batch of them using the GPU, which will save a lot of time. To tell fastai we want to use these transforms on a batch, we use the `batch_tfms` parameter. (Note that we're not using `RandomResizedCrop` in this example, so you can see the differences more clearly; we're also using double the amount of augmentation compared to the default, for the same reason)." ] }, { @@ -855,7 +901,7 @@ "source": [ "Time to use the same lines of code as in <> to train our bear classifier.\n", "\n", - "We don't have a lot of data for our pblem (150 pictures of each sort of bear at most), so to train our model, we'll use `RandomResizedCrop` and default `aug_transforms` for our model, on an image size of 224px, which is fairly standard for image classification." + "We don't have a lot of data for our problem (150 pictures of each sort of bear at most), so to train our model, we'll use `RandomResizedCrop` and default `aug_transforms` for our model, on an image size of 224px, which is fairly standard for image classification." ] }, { @@ -1017,7 +1063,9 @@ "source": [ "Each row here represents all the black, grizzly, and teddy bears in our dataset, respectively. Each column represents the images which the model predicted as black, grizzly, and teddy bears, respectively. Therefore, the diagonal of the matrix shows the images which were classified correctly, and the other, off diagonal, cells represent those which were classified incorrectly. This is called a *confusion matrix* and is one of the many ways that fastai allows you to view the results of your model. It is (of course!) calculated using the validation set. 
With the color coding, the goal is to have white everywhere, except the diagonal where we want dark blue. Our bear classifier isn't making many mistakes!\n", "\n", - "It's helpful to see where exactly our errors are occuring, to see whether it's due to a dataset problem (e.g. images that aren't bears at all, or are labelled incorrectly, etc), or a model problem (e.g. perhaps it isn't handling images taken with unusual lighting, or from a different angle, etc.) To do this, we can sort out images by their *loss*. The *loss* is a number that is higher if the model is incorrect (and especially if it's also confident of its incorrect answer), or if it's correct, but not confident of its correct answer. (We'll learn how loss is calculated later in the book.) `plot_top_losses` shows us the images with the highest loss in our dataset. As the title of the output says, each image is labeled with four things: prediction, actual (target label), loss, and probability. The *probability* here is the confidence level, from zero to one, that the model has assigned to its prediction." + "It's helpful to see where exactly our errors are occurring, to see whether they're due to a dataset problem (e.g. images that aren't bears at all, or are labelled incorrectly, etc), or a model problem (e.g. perhaps it isn't handling images taken with unusual lighting, or from a different angle, etc). To do this, we can sort our images by their *loss*.\n", + "\n", + "The *loss* is a number that is higher if the model is incorrect (and especially if it's also confident of its incorrect answer), or if it's correct, but not confident of its correct answer. In a couple of chapters we'll learn in depth how loss is calculated and used in the training process. For now, `plot_top_losses` shows us the images with the highest loss in our dataset. As the title of the output says, each image is labeled with four things: prediction, actual (target label), loss, and probability. 
The *probability* here is the confidence level, from zero to one, that the model has assigned to its prediction." ] }, { @@ -1103,7 +1151,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "\"Cleaner" + "\"Cleaner" ] }, { @@ -1166,7 +1214,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Once you've got a model you're happy with, you need to save it, so that you can then copy it over to a server where you'll use it in production. Do you remember exactly what a model is? It consists of two parts: the *architecture*, and the trained *parameters*. The easiest way to save a model is to save both of these, because that way when you load a model you can be sure that you have the matching architecture and parameters. To save both parts, use the `export` method.\n", + "Once you've got a model you're happy with, you need to save it, so that you can then copy it over to a server where you'll use it in production. Remember that a model consists of two parts: the *architecture*, and the trained *parameters*. The easiest way to save a model is to save both of these, because that way when you load a model you can be sure that you have the matching architecture and parameters. To save both parts, use the `export` method.\n", "\n", "This method even saves the definition of how to create your `DataLoaders`. This is important, because otherwise you would have to redefine how to transform your data in order to use your model in production. When you call `export`, fastai will save a file called `export.pkl`." ] @@ -1681,14 +1729,14 @@ "\n", "There are quite a few upsides to this approach. The initial installation is easier, because you only have to deploy a small GUI application, which connects to the server to do all the heavy lifting. More importantly perhaps, upgrades of that core logic can happen on your server, rather than needing to be distributed to all of your users. 
Your server can have a lot more memory and processing capacity than most edge devices, and it is far easier to scale those resources if your model becomes more demanding. The hardware that you will have on a server is going to be more standard and more easily supported by fastai and PyTorch, so you don't have to compile your model into a different form.\n", "\n", - "There are downsides too, of course. Your application will require a network connection, and there will be some latency each time the model is called. It takes a while for a neural network model to run anyway, so this additional network latency may not make a big difference to your users in practice. In fact, since you can use better hardware on the server, the overall latency may even be less! If your application uses sensitive data then your users may be concerned about an approach which sends that data to a remote server, so sometimes privacy considerations will mean that you need to run the model on the edge device. Sometimes this can be avoided by having a *on premise* server, such as inside a company's firewall. Managing the complexity and scaling the server can create additional overhead, whereas if your model runs on the edge devices then each user is bringing their own compute resources, which leads to easier scaling with an increasing number of users (also known as _horizontal scaling_)." + "There are downsides too, of course. Your application will require a network connection, and there will be some latency each time the model is called. It takes a while for a neural network model to run anyway, so this additional network latency may not make a big difference to your users in practice. In fact, since you can use better hardware on the server, the overall latency may even be less! 
If your application uses sensitive data then your users may be concerned about an approach which sends that data to a remote server, so sometimes privacy considerations will mean that you need to run the model on the edge device. Sometimes this can be avoided by having an *on premise* server, such as inside a company's firewall. Managing the complexity and scaling the server can create additional overhead, whereas if your model runs on the edge devices then each user is bringing their own compute resources, which leads to easier scaling with an increasing number of users (also known as _horizontal scaling_)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "> A: I've had a chance to see up close how the mobile ML landscape is changing in my work. We offer an iPhone app that depends on computer vision and for years we ran our own computer vision models in the cloud. This was the only way to do it then since those models needed significant memory and compute resources and took minutes to process. This approach required building not only the models (fun!) but infrastructure to ensure a certain number of \"compute worker machines\" was absolutely always running (scary), that more machines would automatically come online if traffic increased, that there was stable storage for large inputs and outputs, that the iOS app could know and tell the user how their job was doing, etc... Nowadays, Apple provides APIs for converting models to run efficiently on device and most iOS devices have dedicated ML hardware, so we run our new models on device. So, in a few years that strategy has gone from impossible to possible but it's still not easy. In our case it's worth it, for a faster user experiene and to worry less about servers. What works for you will depend, realistically, on the user experience you're trying to create and what you personally find it easy to do. If you really know how to run servers, do it. 
If you really know how to build native mobile apps, do that. There are many roads up the hill.\n", + "> A: I've had a chance to see up close how the mobile ML landscape is changing in my work. We offer an iPhone app that depends on computer vision and for years we ran our own computer vision models in the cloud. This was the only way to do it then since those models needed significant memory and compute resources and took minutes to process. This approach required building not only the models (fun!) but infrastructure to ensure a certain number of \"compute worker machines\" was absolutely always running (scary), that more machines would automatically come online if traffic increased, that there was stable storage for large inputs and outputs, that the iOS app could know and tell the user how their job was doing, etc... Nowadays, Apple provides APIs for converting models to run efficiently on device and most iOS devices have dedicated ML hardware, so we run our new models on device. So, in a few years that strategy has gone from impossible to possible but it's still not easy. In our case it's worth it, for a faster user experience and to worry less about servers. What works for you will depend, realistically, on the user experience you're trying to create and what you personally find it easy to do. If you really know how to run servers, do it. If you really know how to build native mobile apps, do that. There are many roads up the hill.\n", "\n", "Overall, we'd recommend using a simple CPU-based server approach where possible, for as long as you can get away with it. If you're lucky enough to have a very successful application, then you'll be able to justify the investment in more complex deployment approaches at that time.\n", "\n", @@ -1706,7 +1754,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "In practice, a deep learning model will be just one piece of a much bigger system. 
As we discussed at the start of this chapter, a *data product* requires thinking about the entire end to end process within which our model lives.\n", + "In practice, a deep learning model will be just one piece of a much bigger system. As we discussed at the start of this chapter, a *data product* requires thinking about the entire end to end process within which our model lives. In this book, we can't hope to cover all the complexity of managing deployed data products, such as managing multiple versions of models, A/B testing, canarying, refreshing the data (should we just grow and grow our datasets all the time, or should we regularly remove some of the old data?), handling data labelling, monitoring all this, detecting model rot, and so forth. However, there is an excellent book that covers many deployment issues, which is [Building Machine Learning Powered Applications](https://www.amazon.com/Building-Machine-Learning-Powered-Applications/dp/149204511X), by Emmanuel Ameisen. In this section, we will give an overview of some of the most important issues to consider.\n", "\n", "One of the biggest issues with this is that understanding and testing the behavior of a deep learning model is much more difficult than with most code that you would write. With normal software development you can analyse the exact steps that the software is taking, and carefully study which of these steps match the desired behaviour that you are trying to create. But with a neural network the behavior emerges from the model's attempt to match the training data, rather than being exactly defined.\n", "\n", @@ -1757,13 +1805,6 @@ "> j: I started a company 20 years ago called *Optimal Decisions* which used machine learning and optimisation to help giant insurance companies set their pricing, impacting tens of billions of dollars of risks. We used the approaches described above to manage the potential downsides of something that might go wrong. 
Also, before we worked with our clients to put anything in production, we tried to simulate the impact by testing the end to end system on their previous year's data. It was always quite a nerve-wracking process, putting these new algorithms in production, but every rollout was successful." ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "As you analyze the results while deploying your model progressively, you should check for the following unexpected behaviors." - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -1777,20 +1818,13 @@ "source": [ "One of the biggest challenges in rolling out a model is that your model may change the behaviour of the system it is a part of. For instance, consider YouTube's recommendation system. A couple of years ago Google talked about how they had introduced reinforcement learning (closely related to deep learning, but where your loss function represents a result which could be a long time after an action occurs) to improve their recommendation system. They described how they used an algorithm which made recommendations such that watch time would be optimised.\n", "\n", - "However, human beings tend to be drawn towards controversial content. This meant that videos about wings like conspiracy theories started to get recommended more and more by the recommendation system. Furthermore, it turns out that the kinds of people that are interested in conspiracy theories are also people that watch a lot of online videos! So, they started to get drawn more and more towards YouTube. The increasing number of conspiracy theorists watching YouTube resulted in the algorithm recommending more and more conspiracy theories and other extremist content, which resulted in more extremists watching videos on YouTube, and more people watching YouTube developing extremist views, which led to the algorithm recommending more extremist content... 
The system became so out of control that in February 2019 it led the New York Times to run the headline \"YouTube Unleashed a Conspiracy Theory Boom. Can It Be Contained?\"\n", + "However, human beings tend to be drawn towards controversial content. This meant that videos about things like conspiracy theories started to get recommended more and more by the recommendation system. Furthermore, it turns out that the kinds of people that are interested in conspiracy theories are also people that watch a lot of online videos! So, they started to get drawn more and more towards YouTube. The increasing number of conspiracy theorists watching YouTube resulted in the algorithm recommending more and more conspiracy theories and other extremist content, which resulted in more extremists watching videos on YouTube, and more people watching YouTube developing extremist views, which led to the algorithm recommending more extremist content... The system became so out of control that in February 2019 it led the New York Times to run the headline \"YouTube Unleashed a Conspiracy Theory Boom. Can It Be Contained?\"footnote:[https://www.nytimes.com/2019/02/19/technology/youtube-conspiracy-stars.html]\n", "\n", "A helpful exercise prior to rolling out a significant machine learning system is to consider this question: \"what would happen if it went really, really well?\" In other words, what if the predictive power was extremely high, and its ability to influence behaviour was extremely significant? In that case, who would be most impacted? What would the most extreme results potentially look like? How would you know what was really going on?\n", "\n", "Such a thought exercise might help you to construct a more careful rollout plan, ongoing monitoring systems, and human oversight. 
Of course, human oversight isn't useful if it isn't listened to; so make sure that there are reliable and resilient communication channels so that the right people will be aware of issues, and will have the power to fix them." ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Congratulations, you have finished your first deep learning project! To help with understanding the material, we really recommend you start writing about what you learned." - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -1924,5 +1958,5 @@ } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 } diff --git a/03_ethics.ipynb b/03_ethics.ipynb index b237e82..74a329b 100644 --- a/03_ethics.ipynb +++ b/03_ethics.ipynb @@ -18,7 +18,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Acknowledgement: Dr Rachel Thomas" + "**Acknowledgement: Dr Rachel Thomas**" ] }, { @@ -28,13 +28,6 @@ "This chapter was co-authored by Dr Rachel Thomas, the co-founder of fast.ai, and founding director of the Center for Applied Data Ethics at the University of San Francisco. It largely follows a subset of her syllabus for the \"Introduction to Data Ethics\" course that she developed." ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Introduction to data ethics" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -73,7 +66,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Getting started with some examples" + "## Key examples for data ethics" ] }, { @@ -130,7 +123,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Dr. Latanya Sweeney is a professor at Harvard and director of their data privacy lab. In the paper [Discrimination in Online Ad Delivery](https://arxiv.org/abs/1301.6822) she describes her discovery that googling her name resulted in advertisements saying \"Latanya Sweeney arrested\" even although she is the only Latanya Sweeney and has never been arrested. 
However when she googled other names, such as Kirsten Lindquist, she got more neutral ads, even though Kirsten Lindquist has been arrested three times." + "Dr. Latanya Sweeney is a professor at Harvard and director of their data privacy lab. In the paper [Discrimination in Online Ad Delivery](https://arxiv.org/abs/1301.6822) (see <>) she describes her discovery that googling her name resulted in advertisements saying \"Latanya Sweeney arrested\" even although she is the only Latanya Sweeney and has never been arrested. However when she googled other names, such as Kirsten Lindquist, she got more neutral ads, even though Kirsten Lindquist has been arrested three times." ] }, { @@ -153,7 +146,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## So what?" + "TK Jeremy: \"Why does this matter?\" as an alternative title." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### So what?" ] }, { @@ -222,7 +222,7 @@ "\n", "These are not just algorithm questions. They are data product design questions. But the product managers, executives, judges, journalists, doctors… whoever ends up developing and using the system of which your model is a part will not be well-placed to understand the decisions that you made, let alone change them.\n", "\n", - "For instance, two studies found that Amazon’s facial recognition software produced [inaccurate](https://www.nytimes.com/2018/07/26/technology/amazon-aclu-facial-recognition-congress.html) and [racially biased results](https://www.theverge.com/2019/1/25/18197137/amazon-rekognition-facial-recognition-bias-race-gender). Amazon claimed that the researchers should have changed the default parameters. However, it turned out that [Amazon was not instructing police departments](https://gizmodo.com/defense-of-amazons-face-recognition-tool-undermined-by-1832238149) that use its software to do this either. 
There was, presumably, a big distance between the researchers that developed these algorithms, and the Amazon documentation staff that wrote the guidelines provided to the police. A lack of tight integration led to serious problems for society, the police, and Amazon themselves. It turned out that their system erroneously *matched* 28 members of congress to criminal mugshots! (And these members of congress wrongly matched to criminal mugshots disproportionately included people of color.)" + "For instance, two studies found that Amazon’s facial recognition software produced [inaccurate](https://www.nytimes.com/2018/07/26/technology/amazon-aclu-facial-recognition-congress.html) and [racially biased results](https://www.theverge.com/2019/1/25/18197137/amazon-rekognition-facial-recognition-bias-race-gender). Amazon claimed that the researchers should have changed the default parameters, but did not explain how this would change the racially biased results. Furthermore, it turned out that [Amazon was not instructing police departments](https://gizmodo.com/defense-of-amazons-face-recognition-tool-undermined-by-1832238149) that use its software to do this either. There was, presumably, a big distance between the researchers that developed these algorithms, and the Amazon documentation staff that wrote the guidelines provided to the police. A lack of tight integration led to serious problems for society, the police, and Amazon themselves. It turned out that their system erroneously *matched* 28 members of congress to criminal mugshots! (And these members of congress wrongly matched to criminal mugshots disproportionately included people of color, as seen in <>.)" ] }, { @@ -255,6 +255,7 @@ "metadata": {}, "source": [ "Data ethics is a big field, and we can't cover everything. 
Instead, we're going to pick a few topics which we think are particularly relevant:\n", + "\n", "- need for recourse and accountability\n", "- feedback loops\n", "- bias\n", @@ -265,14 +266,21 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Errors and recourse" + "TK Jeremy-Rachel: Explain why those topics are important and transition to errors and recourse." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "In a complex system it is easy for no one person to feel responsible for outcomes. While this is understandable, it does not lead to good results. In the example above of the Arkansas healthcare system in which a bug led to people with cerebral palsy losing access to needed care, the creator of the algorithm blamed government officials, and government officials could blame those who implemented the software. NYU professor danah boyd described this phenomenon: \"bureaucracy has often been used to evade responsibility, and today's algorithmic systems are extending bureaucracy.\"\n", + "### Errors and recourse" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In a complex system it is easy for no one person to feel responsible for outcomes. While this is understandable, it does not lead to good results. In the earlier example of the Arkansas healthcare system in which a bug led to people with cerebral palsy losing access to needed care, the creator of the algorithm blamed government officials, and government officials could blame those who implemented the software. NYU professor danah boyd described this phenomenon: \"bureaucracy has often been used to evade responsibility, and today's algorithmic systems are extending bureaucracy.\"\n", "\n", "An additional reason why recourse is so necessary, is because data often contains errors. Mechanisms for audits and error-correction are crucial. 
A database of suspected gang members maintained by California law enforcement officials was found to be full of errors, including 42 babies who had been added to the database when they were less than 1 year old (28 of whom were marked as “admitting to being gang members”). In this case, there was no process in place for correcting mistakes or removing people once they’ve been added. Another example is the US credit report system; in a large-scale study of credit reports by the FTC in 2012, it was found that 26% of consumers had at least one mistake in their files, and 5% had errors that could be devastating. Yet, the process of getting such errors corrected is incredibly slow and opaque. When public-radio reporter Bobby Allyn discovered that he was erroneously listed as having a firearms conviction, it took him \"more than a dozen phone calls, the handiwork of a county court clerk and six weeks to solve the problem. And that was only after I contacted the company’s communications department as a journalist.\" (as covered in the article [How the careless errors of credit reporting agencies are ruining people’s lives](https://www.washingtonpost.com/posteverything/wp/2016/09/08/how-the-careless-errors-of-credit-reporting-agencies-are-ruining-peoples-lives/))\n", "\n", @@ -283,14 +291,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Feedback loops" + "### Feedback loops" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "The New York Times published another article on YouTube's recommendation system, titled [On YouTube’s Digital Playground, an Open Gate for Pedophiles](https://www.nytimes.com/2019/06/03/world/americas/youtube-pedophiles.html). The article started with this chilling story:" + "We have already explained in <> how an algorithm can interact with its environment to create a feedback loop, making predictions that reinforce actions taken in the field, which lead to predictions even more pronounced in the same direction. 
The New York Times published another article on YouTube's recommendation system, titled [On YouTube’s Digital Playground, an Open Gate for Pedophiles](https://www.nytimes.com/2019/06/03/world/americas/youtube-pedophiles.html). The article started with this chilling story:" ] }, { @@ -312,7 +320,7 @@ "\n", "Part of the problem here is the centrality of metrics in driving a financially important system. When an algorithm has a metric to optimise, as you have seen, it will do everything it can to optimise that number. This tends to lead to all kinds of edge cases, and humans interacting with a system will search for, find, and exploit these edge cases and feedback loops for their advantage.\n", "\n", - "There are signs that this is exactly what has happened with YouTube's recommendation system. The Guardian ran an article [How an ex-YouTube insider investigated its secret algorithm](https://www.theguardian.com/technology/2018/feb/02/youtube-algorithm-election-clinton-trump-guillaume-chaslot) about Guillaume Chaslot, an ex-YouTube engineer who created AlgoTransparency, which tracks these issues. Chaslot published this chart, following the release of Robert Mueller's \"Report on the Investigation Into Russian Interference in the 2016 Presidential Election\":" + "There are signs that this is exactly what has happened with YouTube's recommendation system. The Guardian ran an article [How an ex-YouTube insider investigated its secret algorithm](https://www.theguardian.com/technology/2018/feb/02/youtube-algorithm-election-clinton-trump-guillaume-chaslot) about Guillaume Chaslot, an ex-YouTube engineer who created AlgoTransparency, which tracks these issues. Chaslot published the chart in <>, following the release of Robert Mueller's \"Report on the Investigation Into Russian Interference in the 2016 Presidential Election\"." 
] }, { @@ -342,6 +350,13 @@ "> : \"once people join a single conspiracy-minded \\[Facebook\\] group, they are algorithmically routed to a plethora of others. Join an anti-vaccine group, and your suggestions will include anti-GMO, chemtrail watch, flat Earther (yes, really), and ‘curing cancer naturally’ groups. Rather than pulling a user out of the rabbit hole, the recommendation engine pushes them further in.\"" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It is extremely important to keep in mind that this kind of behavior can happen, and to either anticipate a feedback loop or take positive action to break it when you see the first signs of it in your own projects. Another thing to keep in mind is bias." + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -355,7 +370,7 @@ "source": [ "Discussions of bias online tend to get pretty confusing pretty fast. The word bias mean so many different things. Statisticians often think that when data ethicists are talking about bias that they're talking about the statistical definition of the term bias. But they're not. And they're certainly not talking about the bias is that appear in the weights and bias is which are the parameters of your model!\n", "\n", - "What they're talking about is the social science concept of bias. In [A Framework for Understanding Unintended Consequences of Machine Learning](https://arxiv.org/abs/1901.10002) MIT's Suresh and Guttag describe six types of bias in machine learning, summarized in this figure from their paper:" + "What they're talking about is the social science concept of bias. In [A Framework for Understanding Unintended Consequences of Machine Learning](https://arxiv.org/abs/1901.10002) MIT's Suresh and Guttag describe six types of bias in machine learning, summarized in <> from their paper." ] }, { @@ -365,6 +380,13 @@ "\"A" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "TK Jeremy: \"Why only four? 
Tell the reader.\" If you have anything interesting to say about that here, otherwise we can ignore." + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -454,7 +476,7 @@ "\n", "One of the MIT researchers, Joy Buolamwini, warned, \"We have entered the age of automation overconfident yet underprepared. If we fail to make ethical and inclusive artificial intelligence, we risk losing gains made in civil rights and gender equity under the guise of machine neutrality\".\n", "\n", - "Part of the issue appears to be a systematic imbalance in the make up of popular datasets used for training models. The abstract to the paper [No Classification without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World](https://arxiv.org/abs/1711.08536) states, \"We analyze two large, publicly available image data sets to assess geo-diversity and find that these data sets appear to exhibit an observable amerocentric and eurocentric representation bias. Further, we analyze classifiers trained on these data sets to assess the impact of these training distributions and find strong differences in the relative performance on images from different locales\". Here is one of the charts from the paper, showing the geographic make up of what was, at the time (and still, as this book is being written), the two most important image datasets for training models:" + "Part of the issue appears to be a systematic imbalance in the make up of popular datasets used for training models. The abstract to the paper [No Classification without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World](https://arxiv.org/abs/1711.08536) states, \"We analyze two large, publicly available image data sets to assess geo-diversity and find that these data sets appear to exhibit an observable amerocentric and eurocentric representation bias. 
Further, we analyze classifiers trained on these data sets to assess the impact of these training distributions and find strong differences in the relative performance on images from different locales\". <> shows one of the charts from the paper, showing the geographic make up of what was, at the time (and still, as this book is being written), the two most important image datasets for training models." ] }, { @@ -468,7 +490,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The vast majority of the images are from the United States and other Western countries, leading to models trained on ImageNet performing worse on scenes from other countries and cultures. For instance, [research](https://arxiv.org/pdf/1906.02659.pdf) found that such models are worse at identifying household items (such as soap, spices, sofas, or beds) from lower-income countries. Below is an image from the paper, [Does Object Recognition Work for Everyone?](https://arxiv.org/pdf/1906.02659.pdf)." + "The vast majority of the images are from the United States and other Western countries, leading to models trained on ImageNet performing worse on scenes from other countries and cultures. For instance, [research](https://arxiv.org/pdf/1906.02659.pdf) found that such models are worse at identifying household items (such as soap, spices, sofas, or beds) from lower-income countries. <> shows an image from the paper, [Does Object Recognition Work for Everyone?](https://arxiv.org/pdf/1906.02659.pdf)." ] }, { @@ -482,7 +504,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "As we will discuss shortly, in addition, the vast majority of AI researchers and developers are young white men. Most projects that we have seen do most user testing using friends and families of the immediate product development group. 
Given this, the kinds of problems we saw above should not be surprising.\n", + "TK Jeremy: \"Tell the reader what the figure shows, what's the takeaway?\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As we will discuss shortly, in addition, the vast majority of AI researchers and developers are young white men. Most projects that we have seen do most user testing using friends and families of the immediate product development group. Given this, the kinds of problems we just discussed should not be surprising.\n", "\n", "Similar historical bias is found in the texts used as data for natural language processing models. This crops up in downstream machine learning tasks in many ways. For instance, until last year Google Translate showed systematic bias in how it translated the Turkish gender-neutral pronoun \"bir\" into English. For instance, when applied to jobs which are often associated with males, it used \"he\", and when applied to jobs which are often associated with females, it used \"she\":" ] @@ -494,6 +523,13 @@ "" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "TK Jeremy: Link to the study needed" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -553,7 +589,7 @@ "source": [ "The abstract of the paper [Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting](https://arxiv.org/abs/1901.09451) notes that there is gender imbalance in occupations (e.g. females are more likely to be nurses, and males are more likely to be pastors), and says that: \"differences in true positive rates between genders are correlated with existing gender imbalances in occupations, which may compound these imbalances\".\n", "\n", - "What this is saying is that the researchers noticed that models predicting occupation did not only reflect the actual gender imbalance in the underlying population, but actually amplified it! This is quite common, particularly for simple models. 
When there is some clear, easy to see underlying relationship, a simple model will often simply assume that that relationship holds all the time. As the show with the paper, for occupations which had a higher percentage of females, the model tended to overestimate the prevalence of that occupation:" + "What this is saying is that the researchers noticed that models predicting occupation did not only reflect the actual gender imbalance in the underlying population, but actually amplified it! This is quite common, particularly for simple models. When there is some clear, easy to see underlying relationship, a simple model will often simply assume that that relationship holds all the time. As <> from the paper shows, for occupations which had a higher percentage of females, the model tended to overestimate the prevalence of that occupation." ] }, { @@ -567,14 +603,16 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "For example, in the training dataset, 14.6% of surgeons were women, yet in the model predictions, only 11.6% of the true positives were women." + "For example, in the training dataset, 14.6% of surgeons were women, yet in the model predictions, only 11.6% of the true positives were women. The model is thus amplifying the bias existing in the training set.\n", + "\n", + "Now that we have seen that these biases exist, what can we do to mitigate them?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Addressing different types of bias" + "## Addressing different types of bias" ] }, { @@ -597,10 +635,10 @@ "source": [ "We often hear this question — \"humans are biased, so does algorithmic bias even matter?\" This comes up so often, there must be some reasoning that makes sense to the people that ask it, but it doesn't seem very logically sound to us! Independently of whether this is logically sound, it's important to realise that algorithms and people are different. Machine learning, particularly so. 
Consider these points about machine learning algorithms:\n", "\n", - " - *Machine learning can create feedback loops*: small amounts of bias can very rapidly, exponentially increase due to feedback loops\n", - " - *Machine learning can amplify bias*: human bias can lead to larger amounts of machine learning bias\n", - " - *Algorithms & humans are used differently*: human decision makers and algorithmic decision makers are not used in a plug-and-play interchangeable way in practice. For instance, algorithmic decisions are more likely to be implemented at scale and without a process for recourse. Furthermore, people are more likely to mistakenly believe that the result of an algorithm is objective and error-free.\n", - " - *Technology is power*. And with that comes responsibility.\n", + " - _Machine learning can create feedback loops_:: small amounts of bias can very rapidly, exponentially increase due to feedback loops\n", + " - _Machine learning can amplify bias_:: human bias can lead to larger amounts of machine learning bias\n", + " - _Algorithms & humans are used differently_:: human decision makers and algorithmic decision makers are not used in a plug-and-play interchangeable way in practice. For instance, algorithmic decisions are more likely to be implemented at scale and without a process for recourse. Furthermore, people are more likely to mistakenly believe that the result of an algorithm is objective and error-free.\n", + " - _Technology is power_:: And with that comes responsibility.\n", "\n", "As the Arkansas healthcare example showed, machine learning is often implemented in practice not because it leads to better outcomes, but because it is cheaper and more efficient. Cathy O'Neill, in her book *Weapons of Math Destruction*, described the pattern of how the privileged are processed by people, the poor are processed by algorithms. This is just one of a number of ways that algorithms are used differently than human decision makers. 
Others include:\n", "\n", @@ -614,14 +652,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Data contains errors" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Because data is likely to contain errors, mechanisms for audits and error-correction are important. A database of suspected gang members maintained by California law enforcement officials was found to be full of errors, including 42 babies who had been added to the database when they were less than 1 year old (28 of whom were marked as *admitting to being gang members*). In this case, there was no process in place for correcting mistakes or removing people once they’ve been added. Another example is the US credit report system; in a large-scale study of credit reports by the FTC in 2012, it was found that 26% of consumers had at least one mistake in their files, and 5% had errors that could be devastating. Yet, the process of getting such errors corrected is incredibly slow and opaque. When public-radio reporter Bobby Allyn discovered that he was erroneously listed as having a firearms conviction, it took him \"more than a dozen phone calls, the handiwork of a county court clerk and six weeks to solve the problem. And that was only after I contacted the company’s communications department as a journalist.\" (as covered in the article [How the careless errors of credit reporting agencies are ruining people’s lives](https://www.washingtonpost.com/posteverything/wp/2016/09/08/how-the-careless-errors-of-credit-reporting-agencies-are-ruining-peoples-lives/))" + "TK Jeremy: Takeaway for readers and transition to disinformation." ] }, { @@ -662,6 +693,13 @@ "One proposed approach is to develop some form of digital signature, implement it in a seamless way, and to create norms that we should only trust content which has been verified. 
Head of the Allen Institute on AI, Oren Etzioni, wrote such a proposal in an article titled [How Will We Prevent AI-Based Forgery?](https://hbr.org/2019/03/how-will-we-prevent-ai-based-forgery), \"AI is poised to make high-fidelity forgery inexpensive and automated, leading to potentially disastrous consequences for democracy, security, and society. The specter of AI forgery means that we need to act to make digital signatures de rigueur as a means of authentication of digital content.\"" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "TK Jeremy: Wrap up section and transition to next. Also change next title to What to do about bla or What to do with foo." + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -683,6 +721,13 @@ "- increase diversity" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's walk through each step next, starting with analyzing a project you are working on." + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -705,6 +750,13 @@ " - How diverse is the team that built it?" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "TK Jeremy: Expand--add some additional details and takeaways from the reader. What will they get out of doing this and how should they go about it? Then transition to \"Process to Implement\"" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -726,6 +778,13 @@ " - Who might use this product that we didn’t expect to use it, or for purposes we didn’t initially intend?" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "TK Jeremy: Add takeaways and transition to Ethical Lenses" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -739,11 +798,11 @@ "source": [ "Another useful resource from the Markkula Center is [Conceptual Frameworks in Technology and Engineering Practice](https://www.scu.edu/ethics-in-technology-practice/conceptual-frameworks/). 
This considers how different foundational ethical lenses can help identify concrete issues, and lays out the following approaches and key questions:\n", "\n", - " - The Rights Approach: Which option best respects the rights of all who have a stake?\n", - " - The Justice Approach: Which option treats people equally or proportionately?\n", - " - The Utilitarian Approach: Which option will produce the most good and do the least harm?\n", - " - The Common Good Approach: Which option best serves the community as a whole, not just some members?\n", - " - The Virtue Approach: Which option leads me to act as the sort of person I want to be?" + " - The Rights Approach:: Which option best respects the rights of all who have a stake?\n", + " - The Justice Approach:: Which option treats people equally or proportionately?\n", + " - The Utilitarian Approach:: Which option will produce the most good and do the least harm?\n", + " - The Common Good Approach:: Which option best serves the community as a whole, not just some members?\n", + " - The Virtue Approach:: Which option leads me to act as the sort of person I want to be?" ] }, { @@ -807,6 +866,13 @@ "In philosophy, and especially philosophy of ethics, this is one of the most effective tools: first, come up with a process, definition, set of questions, etc, which is designed to resolve some problem. Then try to come up with an example where that apparent solution results in a proposal that no-one would consider acceptable. This can then lead to a further refinement of the solution." ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "TK Jeremy: Add takeaways for the reader and transition to Role of Policy." + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -821,16 +887,25 @@ "The ethical issues that arise in the use of automated decision systems, such as machine learning, can be complex and far-reaching. 
To better address them, we will need thoughtful policy, in addition to the ethical efforts of those in industry. Neither is sufficient on its own.\n", "\n", "Policy is the appropriate tool for addressing:\n", + "\n", "- Negative externalities\n", "- Misaligned economic incentives\n", "- “Race to the bottom” situations\n", "- Enforcing accountability.\n", "\n", "Ethical behavior in industry is necessary as well, since:\n", + "\n", "- Law will not always keep up\n", "- Edge cases will arise in which practitioners must use their best judgement." ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "TK Jeremy: Expand this section. What does this mean for the reader? Add transition to The Power of Diversity" + ] + }, { "cell_type": "markdown", "metadata": {}, diff --git a/04_mnist_basics.ipynb b/04_mnist_basics.ipynb index ead7153..ad4d6f7 100644 --- a/04_mnist_basics.ipynb +++ b/04_mnist_basics.ipynb @@ -27,6 +27,15 @@ "# Under the hood: training a digit classifier" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that we’ve seen what it looks like to actually train a variety of models, let’s now dig under the hood and see exactly what is going on. We’ll start with computer vision, and will use that to introduce many of the key concepts of deep learning. In future chapters we’ll do deep dives into other applications as well, and we’ll see how to use these insights to both improve our model’s accuracy, speed up its training, and turn it into a real working web application.\n", + "\n", + "First, let's start with how images are represented in a computer, then we will make our way up to how to classify different types of images." + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -38,8 +47,6 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Now that we’ve seen what it looks like to actually train a variety of models, let’s now dig under the hood and see exactly what is going on. 
We’ll start with computer vision, and will use that to introduce many of the key concepts of deep learning. In future chapters we’ll do deep dives into other applications as well, and we’ll see how to use these insights to both improve our model’s accuracy, speed up its training, and turn it into a real working web application.\n", - "\n", "In order to understand what happens in a computer vision model, we first have to understand how computers handle images. We'll use one of the most famous datasets in computer vision, [MNIST](https://en.wikipedia.org/wiki/MNIST_database), for our experiments. MNIST contains hand-written digits, collected by the National Institute of Standards and Technology, and collated into a machine learning dataset by Yann Lecun and his colleagues. Lecun used MNIST in 1998 to demonstrate [Lenet 5](http://yann.lecun.com/exdb/lenet/), the first computer system to demonstrate practically useful recognition of hand-written digit sequences. This was one of the most important breakthroughs in the history of AI." ] }, @@ -1750,6 +1757,13 @@ "> j: When I first came across this \"L1\" thingie, I looked it up to see what on Earth it meant, found on Google that it is a _vector norm_ using _absolute value_, so looked up _vector norm_ and started reading: _Given a vector space V over a field F of the real or complex numbers, a norm on V is a nonnegative-valued any function p: V → \\[0,+∞) with the following properties: For all a ∈ F and all u, v ∈ V, p(u + v) ≤ p(u) + p(v)..._ Then I stopped reading. \"Ugh, I'll never understand math!\" I thought, for the thousandth time. Since then I've learned that every time these complex mathy bits of jargon come up in practice, it turns out I can replace them with a tiny bit of code! Like the _L1 loss_ is just equal to `(a-b).abs().mean()`, where `a` and `b` are tensors. I guess mathy folks just think differently to me... 
I'll make sure, in this book, every time some mathy jargon comes up, I'll give you the little bit of code it's equal to as well, and explain in common sense terms what's going on." ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the above code we completed various mathematical operations on *PyTorch tensors*. If you've done some numeric programming in Pytorch before, you may recognize these as being similar to *Numpy arrays*. Let's have a look at those two very important classes." + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -1761,7 +1775,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "In the above code we completed various mathematical operations on *PyTorch tensors*. If you've done some numeric programming in Pytorch before, you may recognize these as being similar to *Numpy arrays*. [Numpy](https://numpy.org/) is the most widely used library for scientific and numeric programming in Python, and provides very similar functionality and a very similar API to that provided by PyTorch; however, it does not support using the GPU, or calculating gradients, which are both critical for deep learning. Therefore, in this book we will generally use PyTorch tensors instead of NumPy arrays, where possible. (Note that fastai adds some features to NumPy and PyTorch to make them a bit more similar to each other; if any code in this book doesn't work on your computer, it's possible that you forgot to include a line at the start of your notebook such as: `from fastai.vision.all import *`.)\n", + "[Numpy](https://numpy.org/) is the most widely used library for scientific and numeric programming in Python, and provides very similar functionality and a very similar API to that provided by PyTorch; however, it does not support using the GPU, or calculating gradients, which are both critical for deep learning. Therefore, in this book we will generally use PyTorch tensors instead of NumPy arrays, where possible. 
(Note that fastai adds some features to NumPy and PyTorch to make them a bit more similar to each other; if any code in this book doesn't work on your computer, it's possible that you forgot to include a line at the start of your notebook such as: `from fastai.vision.all import *`.)\n", "\n", "So, what's an array? And what's a tensor?\n", "\n", @@ -2012,14 +2026,21 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Broadcasting and metrics" + "So, is our baseline model any good? To quantify this, we will use a metric." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "So, is our baseline model any good? To quantify this, we will use a metric. A metric is a number which is calculated from the predictions of our model, and the correct labels in our dataset, and tells us something about how good our model is. For instance, we could use either of the functions we saw in the previous section, mean squared error or mean absolute error, and take the average of them over the whole dataset. However, neither of these are numbers that are very understandable to most people; in practice, we normally use *accuracy* as the metric for classification models.\n", + "## Metrics and broadcasting" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A metric is a number which is calculated from the predictions of our model, and the correct labels in our dataset, and tells us something about how good our model is. For instance, we could use either of the functions we saw in the previous section, mean squared error or mean absolute error, and take the average of them over the whole dataset. However, neither of these are numbers that are very understandable to most people; in practice, we normally use *accuracy* as the metric for classification models.\n", "\n", "As we've discussed, we need to use a *validation set* to calculate our metric. 
That means we need to do is remove some of the data from training entirely, so it is not seen by the model at all. As it turns out, the creators of the MNIST dataset have already done this for us. Do you remember how there was a whole separate directory called \"valid\"? That's what this directory is for!\n", "\n", @@ -2300,7 +2321,7 @@ "\n", "> : _Suppose we arrange for some automatic means of testing the effectiveness of any current weight assignment in terms of actual performance and provide a mechanism for altering the weight assignment so as to maximize the performance. We need not go into the details of such a procedure to see that it could be made entirely automatic and to see that a machine so programed would \"learn\" from its experience._\n", "\n", - "As we discussed, this is the key to allowing us to have something which can get better and better — to learn. But our pixel similarity approach does not really do this. We do not have any kind of weight assignment, or any way of improving based on testing the effectiveness of a weight assignment. In other words, we can't really improve our pixel similarity approach by modifying a set of parameters. In order to take advantage of the power of deep learning, we will first have to represent our task in the way that Arthur Samuel described it.\n", + "As we discussed, this is the key to allowing us to have something which can get better and better — to learn. But our pixel similarity approach does not really do this. We do not have any kind of weight assignment, or any way of improving based on testing the effectiveness of a weight assignment. In other words, we can't really improve our pixel similarity approach by modifying a set of parameters (which will be the SGD part, as we will see). 
In order to take advantage of the power of deep learning, we will first have to represent our task in the way that Arthur Samuel described it.\n", "\n", "Instead of trying to find the similarity between an image and a \"ideal image\" we could instead look at each individual pixel, and come up with a set of weights for each pixel, such that the highest weights are associated with those pixels most likely to be black for a particular category. For instance, pixels towards the bottom right are not very likely to be activated for a seven, so they should have a low weight for a seven, but are more likely to be activated for an eight, so they should have a high weight for an eight. This can be represented as a function for each possible category, for instance the probability of being the number eight:\n", "\n", @@ -2448,14 +2469,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "These seven steps are the key to the training of all deep learning models, and we'll be using the seven terms in the above diagram throughout this book. That deep learning turns out to rely entirely on these steps is extremely surprising and counter-intuitive. It's amazing that this process can solve such complex problems. But, as you'll see, it really does!\n", + "These seven steps, illustrated in <> are the key to the training of all deep learning models, and we'll be using the seven terms in the above diagram throughout this book. That deep learning turns out to rely entirely on these steps is extremely surprising and counter-intuitive. It's amazing that this process can solve such complex problems. But, as you'll see, it really does!\n", "\n", "There are many different ways to do each of these seven steps, and we will be learning about them throughout the rest of this book. These are the details which make a big difference for deep learning practitioners. 
But it turns out that the general approach to each one generally follows some basic principles:\n", "\n", - "- **Initialize**: we initialise the parameters to random values. This may sound surprising. There are certainly other choices we could make, such as initialising them to the percentage of times that that pixel is activated for that category. But since we already know that we have a routine to improve these weights, it turns out that just starting with random weights works perfectly well\n", - "- **Loss**: This is the thing Arthur Samuel refered to: \"*testing the effectiveness of any current weight assignment in terms of actual performance*\". We need some function that will return a number that is small if the performance of the model is good, and vice versa (the standard approach is to treat a small loss as good, and a large loss as bad, although this is just a convention)\n", - "- **Step**: A simple way to figure out whether a weight should be increased a bit, or decreased a bit, would be just to try it. Increase the weight by a small amount, and see if the loss goes up or down. Once you find the correct direction, you could then change that amount by a bit more, and a bit less, until you find an amount which works well. However, this is slow! As we will see, the magic of calculus allows us to directly figure out which direction, and roughly how much, to change each weight, without having to try all these small changes, by calculating *gradients*. This is just a performance optimisation, we would get exactly the same results by using the slower manual process as well\n", - "- **Stop**: We have already discussed how to choose how many epochs to train a model for. This is where that decision is applied. For our digit classifier, we would keep training until the accuracy of the model started getting worse, or we ran out of time." + "- **Initialize**:: we initialise the parameters to random values. This may sound surprising. 
There are certainly other choices we could make, such as initialising them to the percentage of times that that pixel is activated for that category. But since we already know that we have a routine to improve these weights, it turns out that just starting with random weights works perfectly well\n", + "- **Loss**:: This is the thing Arthur Samuel refered to: \"*testing the effectiveness of any current weight assignment in terms of actual performance*\". We need some function that will return a number that is small if the performance of the model is good, and vice versa (the standard approach is to treat a small loss as good, and a large loss as bad, although this is just a convention)\n", + "- **Step**:: A simple way to figure out whether a weight should be increased a bit, or decreased a bit, would be just to try it. Increase the weight by a small amount, and see if the loss goes up or down. Once you find the correct direction, you could then change that amount by a bit more, and a bit less, until you find an amount which works well. However, this is slow! As we will see, the magic of calculus allows us to directly figure out which direction, and roughly how much, to change each weight, without having to try all these small changes, by calculating *gradients*. This is just a performance optimisation, we would get exactly the same results by using the slower manual process as well\n", + "- **Stop**:: We have already discussed how to choose how many epochs to train a model for. This is where that decision is applied. For our digit classifier, we would keep training until the accuracy of the model started getting worse, or we ran out of time." 
] }, { @@ -2544,7 +2565,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "\"A" + "\"A" ] }, { @@ -2558,7 +2579,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "\"An" + "\"An" ] }, { @@ -2776,15 +2797,20 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Stepping with a learning rate" + "The gradient only tells us the slope of our function, it doesn't actually tell us how far to adjust the parameters. It gives us some idea of how far to adjust them; if the slope is very large, then that may suggest that we have more adjustments to do, whereas if the slope is very small, that may suggest that we are close to the optimal value." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Stepping with a learning rate" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "The gradient only tells us the slope of our function, it doesn't actually tell us how far to adjust the parameters. It gives us some idea of how far to adjust them; if the slope is very large, then that may suggest that we have more adjustments to do, whereas if the slope is very small, that may suggest that we are close to the optimal value.\n", - "\n", "Deciding how to change our parameters based on the value of the gradients is an important part of the deep learning process. Nearly all approaches start with the basic idea of multiplying the gradient by some small number, called the *learning rate* (LR). The learning rate is often a number between 0.001 and 0.1, although it could be anything. Often, people select a learning rate just by trying a few, and finding which results in the best model after training (we'll show you a better approach later in this book, called the *learning rate finder*). 
Once you've picked a learning rate, you can adjust your parameters using this simple function:\n", "\n", "```\n", @@ -2793,7 +2819,7 @@ "\n", "This is known as *stepping* your parameters, using a *optimiser step*.\n", "\n", - "If you pick a learning rate that's too low, it can mean having to do for a lot of steps:" + "If you pick a learning rate that's too low, it can mean having to do a lot of steps. <> illustrates that." ] }, { @@ -2807,7 +2833,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Although picking a learning rate that's too high is even worse--it can actually result in the loss getting *worse*!" + "Although picking a learning rate that's too high is even worse--it can actually result in the loss getting *worse*, as we see in <>!" ] }, { @@ -2821,7 +2847,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "If the learning rate is too high, it may also \"bounce\" around, rather than actually diverging; this has the result of taking many steps to train successfully:" + "If the learning rate is too high, it may also \"bounce\" around, rather than actually diverging; <> shows how this has the result of taking many steps to train successfully." ] }, { @@ -2831,6 +2857,13 @@ "\"An" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's apply all of this in an end-to-end example." + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -3371,7 +3404,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "`torch.where(a,b,c)` is the same as running the list comprehension `[b[i] if a[i] else c[i] for i in range(len(a))]`, except it works on tensors, at C/CUDA speed. (It's important to learn about PyTorch functions like this, because looping over tensors in Python performs at Python speed, not C/CUDA speed!) Try running `help(torch.where)` now to read the docs for this function, or, better still, look it up on the PyTorch documentation site." 
+ "`torch.where(a,b,c)` is the same as running the list comprehension `[b[i] if a[i] else c[i] for i in range(len(a))]`, except it works on tensors, at C/CUDA speed. \n", + "\n", + "> note: It's important to learn about PyTorch functions like this, because looping over tensors in Python performs at Python speed, not C/CUDA speed!\n", + "\n", + "Try running `help(torch.where)` now to read the docs for this function, or, better still, look it up on the PyTorch documentation site." ] }, { @@ -3448,6 +3485,13 @@ "mnist_loss(tensor([0.9, 0.4, 0.8]),tgt)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "One problem with `mnist_loss` as currently defined is that it assumes that inputs are always between zero and one. We need to ensure, then, that this is actually the case! As it happens, there is a function that does exactly that--it always outputs a number between zero and one and it's called sigmoid." + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -3459,7 +3503,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "One problem with `mnist_loss` as currently defined is that it assumes that inputs are always between zero and one. We need to ensure, then, that this is actually the case! As it happens, there is a function that does exactly that--it always outputs a number between one and one. This function is called *sigmoid* and is defined by:" + "The function called *sigmoid* is defined by:" ] }, { @@ -3531,7 +3575,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Stochastic gradient descent and mini-batches" + "### SGD and mini-batches" ] }, { @@ -3631,6 +3675,13 @@ "list(dl)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We are now ready to write our first training loop for a model using SGD!" 
+ ] + }, { "cell_type": "markdown", "metadata": {}, @@ -3642,7 +3693,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "In code, our process will be implemented something like this for each epoch:\n", + "it's time to implement the graph we saw in <>. In code, our process will be implemented something like this for each epoch:\n", "\n", "```python\n", "for x,y in dl:\n", @@ -3885,7 +3936,7 @@ "source": [ "Whilst we could use a python for loop to calculate the prediction for each image, that would be very slow. Because Python loops don't run on the GPU, and because Python is a slow language for loops in general, we need to represent as much of the computation in a model as possible using higher-level functions.\n", "\n", - "In this case, there's an extremely convenient mathematical operation that calculates `w*x` for every row of a matrix--it's called *matrix multiplication*. Here's what matrix multiplication looks like (diagram from Wikipedia):" + "In this case, there's an extremely convenient mathematical operation that calculates `w*x` for every row of a matrix--it's called *matrix multiplication*. <> show what matrix multiplication looks like (diagram from Wikipedia)." ] }, { @@ -4090,7 +4141,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Our only remaining step will be to update the weights and bias based on the gradient and learning rate. When we do so, we have to tell PyTorch not to take the gradient of this step too, otherwise things will get very confusing! If we assign to the `data` attribute of a tensor then PyTorch will not take the gradient of that step. Here's our basic training loop for an epoch:" + "Our only remaining step will be to update the weights and bias based on the gradient and learning rate. When we do so, we have to tell PyTorch not to take the gradient of this step too, otherwise things will get very confusing when we try to compute the derivative at the next batch! 
If we assign to the `data` attribute of a tensor then PyTorch will not take the gradient of that step. Here's our basic training loop for an epoch:" ] }, { @@ -4274,7 +4325,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Looking good! We're already about at the same accuracy as our \"pixel similarity\" approach, and we've created a general purpose foundation we can build on." + "Looking good! We're already about at the same accuracy as our \"pixel similarity\" approach, and we've created a general purpose foundation we can build on. Our next step will be to create an object that will handle the SGD step for us. In PyTorch, it's called an *optimizer*." ] }, { @@ -4288,7 +4339,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Because this is such a useful general foundation, PyTorch provides some useful classes to make it easier to implement. The first we'll use is to replace our `linear()` function with PyTorch's `nn.Linear` *module*. A \"module\" is an object of a class that inherits from the PyTorch `nn.Module` class. Objects of this class behave identically to a standard Python function, in that you can call it using parentheses, and it will return the activations of a model.\n", + "Because this is such a general foundation, PyTorch provides some useful classes to make it easier to implement. The first we'll use is to replace our `linear()` function with PyTorch's `nn.Linear` *module*. A \"module\" is an object of a class that inherits from the PyTorch `nn.Module` class. Objects of this class behave identically to a standard Python function, in that you can call it using parentheses, and it will return the activations of a model.\n", "\n", "`nn.Linear` does the same thing as our `init_params` and `linear` together. It contains both the *weights* and *bias* in a single class. 
Here's how we replicate our model from the previous section:" ] @@ -4649,7 +4700,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "So far we have a general procedure for optimising the parameters of a function, and we have tried it out on a very boring function: a simple linear classifier. A linear classifier is very constrained in terms of what it can do. Let's instead use a neural network. Here is the entire definition of a basic neural network:" + "So far we have a general procedure for optimising the parameters of a function, and we have tried it out on a very boring function: a simple linear classifier. A linear classifier is very constrained in terms of what it can do. To make it a bit more complex (and able to handle more tasks), we need to add a non-linearity between two linear classifiers, and this is what will give us a neural network.\n", + "\n", + "Here is the entire definition of a basic neural network:" ] }, { @@ -4692,7 +4745,7 @@ "source": [ "The key point about this is that `w1` has 30 output activations (which means that `w2` must have 30 input activations, so they match). That means that the first layer can construct 30 different features, each representing some different mix of pixels. You can change that `30` to anything you like, to make the model more or less complex.\n", "\n", - "That little function `res.max(tensor(0.0))` is called a *rectified linear unit*, also known as *ReLU*. I think we can all agree that *rectified linear unit* sounds pretty fancy and complicated... But actually, there's nothing more to it than `res.max(tensor(0.0))`, in other words: replace every negative number with a zero. This tiny function is also available in PyTorch as `F.relu`:" + "That little function `res.max(tensor(0.0))` is called a *rectified linear unit*, also known as *ReLU*. We think we can all agree that *rectified linear unit* sounds pretty fancy and complicated... 
But actually, there's nothing more to it than `res.max(tensor(0.0))`, in other words: replace every negative number with a zero. This tiny function is also available in PyTorch as `F.relu`:" ] }, { @@ -4730,8 +4783,20 @@ "source": [ "The basic idea is that by using more linear layers, we can have our model do more computation, and therefore model more complex functions. But there's no point just putting one linear layout directly after another one, because when we multiply things together and then at them up multiple times, that can be replaced by multiplying different things together and adding them up just once! That is to say, a series of any number of linear layers in a row can be replaced with a single linear layer with a different set of parameters.\n", "\n", - "But if we put a non-linear function between them, such as max, then this is no longer true. Now, each linear layer is actually somewhat decoupled from the other ones, and can do its own useful work. The max function is particularly interesting, because it operates as a simple \"if\" statement. For any arbitrarily wiggly function, we can approximate it as a bunch of lines joined together; to make it more close to the wiggly function, we just have to use shorter lines.\n", - "\n", + "But if we put a non-linear function between them, such as max, then this is no longer true. Now, each linear layer is actually somewhat decoupled from the other ones, and can do its own useful work. The max function is particularly interesting, because it operates as a simple \"if\" statement. For any arbitrarily wiggly function, we can approximate it as a bunch of lines joined together; to make it more close to the wiggly function, we just have to use shorter lines." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> s: Mathematically, we say the composition of two linear functions is another linear function. 
So we can stack as many linear classifiers as we want on top of each other, and without non-linear functions between them, it will just be the same as one linear classifier." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "Amazingly enough, it can be mathematically proven that this little function can solve any computable problem to an arbitrarily high level of accuracy, if you can find the right parameters for `w1` and `w2`, and if you make these matrices big enough. This is known as the *universal approximation theorem* . The three lines of code that we have here are known as *layers*. The first and third are known as *linear layers*, and the second line of code is known variously as a *nonlinearity*, or *activation function*.\n", "\n", "Just like the previous section, we can replace this code with something a bit simpler, by taking advantage of PyTorch:" @@ -5217,7 +5282,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Deep learning" + "## Jargon recap" ] }, { @@ -5250,7 +5315,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### _Choose Your Own Adventure_ reminder" + "#### _Choose Your Own Adventure_ reminder" ] }, { diff --git a/05_pet_breeds.ipynb b/05_pet_breeds.ipynb index d4cf37c..9008ff1 100644 --- a/05_pet_breeds.ipynb +++ b/05_pet_breeds.ipynb @@ -2466,6 +2466,31 @@ "display_name": "Python 3", "language": "python", "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.5" + }, + "toc": { + "base_numbering": 1, + "nav_menu": {}, + "number_sections": false, + "sideBar": true, + "skip_h1_title": true, + "title_cell": "Table of Contents", + "title_sidebar": "Contents", + "toc_cell": false, + "toc_position": {}, + "toc_section_display": true, + "toc_window_display": false } }, "nbformat": 4, diff --git 
a/06_multicat.ipynb b/06_multicat.ipynb index e0a107e..4c0220a 100644 --- a/06_multicat.ipynb +++ b/06_multicat.ipynb @@ -1898,6 +1898,31 @@ "display_name": "Python 3", "language": "python", "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.5" + }, + "toc": { + "base_numbering": 1, + "nav_menu": {}, + "number_sections": false, + "sideBar": true, + "skip_h1_title": true, + "title_cell": "Table of Contents", + "title_sidebar": "Contents", + "toc_cell": false, + "toc_position": {}, + "toc_section_display": true, + "toc_window_display": false } }, "nbformat": 4, diff --git a/08_collab.ipynb b/08_collab.ipynb index 213f7ad..c650451 100644 --- a/08_collab.ipynb +++ b/08_collab.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -55,7 +55,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -73,7 +73,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -152,7 +152,7 @@ "4 166 346 1 886397596" ] }, - "execution_count": 3, + "execution_count": null, "metadata": {}, "output_type": "execute_result" } @@ -188,7 +188,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -204,7 +204,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -220,7 +220,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -229,7 +229,7 @@ "2.1420000000000003" ] }, - "execution_count": 6, + "execution_count": null, "metadata": {}, "output_type": 
"execute_result" } @@ -261,7 +261,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -277,7 +277,7 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -286,7 +286,7 @@ "-1.611" ] }, - "execution_count": 8, + "execution_count": null, "metadata": {}, "output_type": "execute_result" } @@ -345,7 +345,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -412,7 +412,7 @@ "4 5 Copycat (1995)" ] }, - "execution_count": 9, + "execution_count": null, "metadata": {}, "output_type": "execute_result" } @@ -432,7 +432,7 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -517,7 +517,7 @@ "4 306 242 5 876503793 Kolya (1996)" ] }, - "execution_count": 10, + "execution_count": null, "metadata": {}, "output_type": "execute_result" } @@ -536,7 +536,7 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -637,7 +637,7 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -660,7 +660,7 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -669,7 +669,7 @@ "tensor([-0.4586, -0.9915, -0.4052, -0.3621, -0.5908])" ] }, - "execution_count": 13, + "execution_count": null, "metadata": {}, "output_type": "execute_result" } @@ -688,7 +688,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -697,7 +697,7 @@ "tensor([-0.4586, -0.9915, -0.4052, -0.3621, -0.5908])" ] }, - "execution_count": 14, + "execution_count": null, "metadata": {}, "output_type": "execute_result" } @@ -755,7 +755,7 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": null, "metadata": {}, 
"outputs": [], "source": [ @@ -773,7 +773,7 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -782,7 +782,7 @@ "'Hello Sylvain, nice to meet you.'" ] }, - "execution_count": 16, + "execution_count": null, "metadata": {}, "output_type": "execute_result" } @@ -803,7 +803,7 @@ }, { "cell_type": "code", - "execution_count": 17, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -829,7 +829,7 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -838,7 +838,7 @@ "torch.Size([64, 2])" ] }, - "execution_count": 18, + "execution_count": null, "metadata": {}, "output_type": "execute_result" } @@ -857,7 +857,7 @@ }, { "cell_type": "code", - "execution_count": 19, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -874,7 +874,7 @@ }, { "cell_type": "code", - "execution_count": 20, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -944,7 +944,7 @@ }, { "cell_type": "code", - "execution_count": 21, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -962,7 +962,7 @@ }, { "cell_type": "code", - "execution_count": 22, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -1036,7 +1036,7 @@ }, { "cell_type": "code", - "execution_count": 23, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -1065,7 +1065,7 @@ }, { "cell_type": "code", - "execution_count": 24, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -1153,7 +1153,7 @@ }, { "cell_type": "code", - "execution_count": 25, + "execution_count": null, "metadata": { "hide_input": true }, @@ -1205,7 +1205,7 @@ }, { "cell_type": "code", - "execution_count": 26, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -1291,7 +1291,7 @@ }, { "cell_type": "code", - "execution_count": 30, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -1300,7 +1300,7 @@ "(#0) []" ] }, - 
"execution_count": 30, + "execution_count": null, "metadata": {}, "output_type": "execute_result" } @@ -1321,7 +1321,7 @@ }, { "cell_type": "code", - "execution_count": 32, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -1331,7 +1331,7 @@ "tensor([1., 1., 1.], requires_grad=True)]" ] }, - "execution_count": 32, + "execution_count": null, "metadata": {}, "output_type": "execute_result" } @@ -1352,7 +1352,7 @@ }, { "cell_type": "code", - "execution_count": 37, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -1364,7 +1364,7 @@ " [ 0.8159]], requires_grad=True)]" ] }, - "execution_count": 37, + "execution_count": null, "metadata": {}, "output_type": "execute_result" } @@ -1379,7 +1379,7 @@ }, { "cell_type": "code", - "execution_count": 41, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -1388,7 +1388,7 @@ "torch.nn.parameter.Parameter" ] }, - "execution_count": 41, + "execution_count": null, "metadata": {}, "output_type": "execute_result" } @@ -1406,7 +1406,7 @@ }, { "cell_type": "code", - "execution_count": 58, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -1423,7 +1423,7 @@ }, { "cell_type": "code", - "execution_count": 59, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -1452,7 +1452,7 @@ }, { "cell_type": "code", - "execution_count": 57, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -2227,31 +2227,6 @@ "display_name": "Python 3", "language": "python", "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.5" - }, - "toc": { - "base_numbering": 1, - "nav_menu": {}, - "number_sections": false, - "sideBar": true, - "skip_h1_title": true, - "title_cell": "Table of Contents", - "title_sidebar": "Contents", - "toc_cell": false, - "toc_position": {}, - 
"toc_section_display": true, - "toc_window_display": false } }, "nbformat": 4, diff --git a/11_nlp_dive.ipynb b/11_nlp_dive.ipynb index 08962b7..767b4d3 100644 --- a/11_nlp_dive.ipynb +++ b/11_nlp_dive.ipynb @@ -1281,31 +1281,6 @@ "display_name": "Python 3", "language": "python", "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.5" - }, - "toc": { - "base_numbering": 1, - "nav_menu": {}, - "number_sections": false, - "sideBar": true, - "skip_h1_title": true, - "title_cell": "Table of Contents", - "title_sidebar": "Contents", - "toc_cell": false, - "toc_position": {}, - "toc_section_display": true, - "toc_window_display": false } }, "nbformat": 4, diff --git a/12_better_rnn.ipynb b/12_better_rnn.ipynb index 39c859a..88d9e23 100644 --- a/12_better_rnn.ipynb +++ b/12_better_rnn.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "code", - "execution_count": 15, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -12,7 +12,7 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": null, "metadata": { "hide_input": false }, @@ -62,7 +62,7 @@ }, { "cell_type": "code", - "execution_count": 17, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -133,7 +133,7 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -154,7 +154,7 @@ }, { "cell_type": "code", - "execution_count": 19, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -399,7 +399,7 @@ }, { "cell_type": "code", - "execution_count": 20, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -432,7 +432,7 @@ }, { "cell_type": "code", - "execution_count": 21, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -462,7 +462,7 
@@ }, { "cell_type": "code", - "execution_count": 22, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -471,7 +471,7 @@ "tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])" ] }, - "execution_count": 22, + "execution_count": null, "metadata": {}, "output_type": "execute_result" } @@ -482,7 +482,7 @@ }, { "cell_type": "code", - "execution_count": 23, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -491,7 +491,7 @@ "(tensor([0, 1, 2, 3, 4]), tensor([5, 6, 7, 8, 9]))" ] }, - "execution_count": 23, + "execution_count": null, "metadata": {}, "output_type": "execute_result" } @@ -516,7 +516,7 @@ }, { "cell_type": "code", - "execution_count": 24, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -538,7 +538,7 @@ }, { "cell_type": "code", - "execution_count": 25, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -736,7 +736,7 @@ }, { "cell_type": "code", - "execution_count": 29, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -821,7 +821,7 @@ }, { "cell_type": "code", - "execution_count": 48, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -853,7 +853,7 @@ }, { "cell_type": "code", - "execution_count": 55, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -871,7 +871,7 @@ }, { "cell_type": "code", - "execution_count": 50, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -888,7 +888,7 @@ }, { "cell_type": "code", - "execution_count": 54, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -1122,34 +1122,6 @@ "display_name": "Python 3", "language": "python", "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.5" - }, - "toc": { - "base_numbering": 1, - "nav_menu": { - "height": "245px", - "width": "258px" - }, - 
"number_sections": false, - "sideBar": true, - "skip_h1_title": true, - "title_cell": "Table of Contents", - "title_sidebar": "Contents", - "toc_cell": false, - "toc_position": {}, - "toc_section_display": true, - "toc_window_display": false } }, "nbformat": 4, diff --git a/14_deep_conv.ipynb b/14_deep_conv.ipynb index 52c4826..87e1134 100644 --- a/14_deep_conv.ipynb +++ b/14_deep_conv.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -28,12 +28,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Since we are so good at recognizing threes from sevens, let's move onto something harder—recognized all 10 digits. That means we'll need to use `MNIST` instead of `MNIST_SAMPLE`:" + "Since we are so good at recognizing threes from sevens, let's move onto something harder—recognizing all 10 digits. That means we'll need to use `MNIST` instead of `MNIST_SAMPLE`:" ] }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -42,7 +42,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -52,7 +52,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -61,7 +61,7 @@ "(#2) [Path('testing'),Path('training')]" ] }, - "execution_count": 4, + "execution_count": null, "metadata": {}, "output_type": "execute_result" } @@ -79,7 +79,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -104,7 +104,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -147,7 +147,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -163,7 +163,7 @@ "source": [ "Let's start with a basic CNN 
as a baseline. We'll use the same as we had in the last chapter, but with one tweak: we'll use more activations.\n", "\n", - "As we discussed, we generally want to double the number of filters each time we have a stride 2 layer. So, one way to increase the number of filters throughout our network is to double the number of activations in the first layer them – then every layer after that will end up twice as big as the previous version as well.\n", + "As we discussed, we generally want to double the number of filters each time we have a stride 2 layer. So, one way to increase the number of filters throughout our network is to double the number of activations in the first layer – then every layer after that will end up twice as big as the previous version as well.\n", "\n", "But there is a subtle problem with this. Consider the kernel which is being applied to each pixel. By default, we use a 3x3 pixel kernel. That means that there are a total of 3×3 = 9 pixels that the kernel is being applied to at each location. Previously, our first layer had four filters output. That meant that there were four values being computed from nine pixels at each location. Think about what happens if we double this output to 8 filters. Then when we apply our kernel we would be using nine pixels to calculate eight numbers. That means that it isn't really learning much at all — the output size is almost the same as the input size. Neural networks will only create useful features if they're forced to do so—that is, that the number of outputs from an operation is smaller than the number of inputs.\n", "\n", @@ -172,7 +172,7 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -191,12 +191,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "As you'll see in a moment, we're going to look inside our models while they're training, in order to try to find ways to make them train better. 
To do this, we use the `ActivationStats` callback, which records the mean, standard deviation, and histogram of activations of every trainable layer (as we've seen, callbacks are used to add behavior to the training loop; we'll see how they work in <>)." + "As you'll see in a moment, we're going to look inside our models while they're training in order to try to find ways to make them train better. To do this, we use the `ActivationStats` callback, which records the mean, standard deviation, and histogram of activations of every trainable layer (as we've seen, callbacks are used to add behavior to the training loop; we'll see how they work in <>)." ] }, { "cell_type": "code", - "execution_count": 9, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -212,7 +212,7 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -225,7 +225,7 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -270,14 +270,14 @@ "source": [ "This didn't train at all well! Let's find out why.\n", "\n", - "One handy feature of callbacks that you pass to `Learner` is that they are made available automatically, with the same name as the callback class, except in `camel_case`. So our `ActivationStats` callback can be accessed through `activation_stats`. In fact--I'm sure you remember `learn.recorder`... can you guess how that is implemented? That's right, it's a callback called `Recorder`!\n", + "One handy feature of the callbacks passed to `Learner` is that they are made available automatically, with the same name as the callback class, except in `snake_case`. So our `ActivationStats` callback can be accessed through `activation_stats`. In fact--I'm sure you remember `learn.recorder`... can you guess how that is implemented? 
That's right, it's a callback called `Recorder`!\n", "\n", "`ActivationStats` includes some handy utilities for plotting the activations during training. `plot_layer_stats(idx)` plots the mean and standard deviation of the activations of layer number `idx`, along with the percent of activations near zero. Here's the first layer's plot:" ] }, { "cell_type": "code", - "execution_count": 12, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -306,7 +306,7 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -349,7 +349,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -358,7 +358,7 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -406,7 +406,7 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -446,7 +446,7 @@ "source": [ "Our initial weights are not well suited to the task we're trying to solve. Therefore, it is dangerous to begin training with a high learning rate: we may very well make the training diverge instantly, as we've seen above. We probably don't want to end training with a high learning rate either, so that we don't skip over a minimum. But we want to train at a high learning rate for the rest of training, because we'll be able to train more quickly. Therefore, we should change the learning rate during training, from low, to high, and then back to low again.\n", "\n", - "Leslie Smith (yes, the same guy that invented the learning rate finder!) 
developed this idea in his article [Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates](https://arxiv.org/abs/1708.07120) by designing a schedule for learning rate separated in two phases: one were the learning rate grows from the minimum value to the maximum value (*warm-up*) then one where it decreases back to the minimum value (*annealing*). Smith called this combination of approaches *1cycle training*.\n", + "Leslie Smith (yes, the same guy that invented the learning rate finder!) developed this idea in his article [Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates](https://arxiv.org/abs/1708.07120) by designing a schedule for learning rate separated into two phases: one where the learning rate grows from the minimum value to the maximum value (*warm-up*), and then one where it decreases back to the minimum value (*annealing*). Smith called this combination of approaches *1cycle training*.\n", "\n", "1cycle training allows us to use a much higher maximum learning rate than other types of training, which gives two benefits:\n", "\n", @@ -457,14 +457,14 @@ "\n", "Then, once we have found a nice smooth area for our parameters, we then want to find the very best part of that area, which means we have to bring out learning rates down again. This is why 1cycle training has a gradual learning rate warmup, and a gradual learning rate cooldown. Many researchers have found that in practice this approach leads to more accurate models, and trains more quickly. That is why it is the approach that is used by default for `fine_tune` in fastai.\n", "\n", - "Later in this book we'll learn all about *momentum* in SGD. Briefly, momentum is a technique where the optimizer takes a step not only in the direction of the gradients, but also continues in the direction of previous steps. 
Leslie Smith introduced cyclical momentums in [A disciplined approach to neural network hyper-parameters: Part 1](https://arxiv.org/pdf/1803.09820.pdf). It suggests that the momentum vary in the opposite direction of the learning rate: when we are at high learning rate, we use less momentum, and we use more again in the annealing phase.\n", + "Later in this book we'll learn all about *momentum* in SGD. Briefly, momentum is a technique where the optimizer takes a step not only in the direction of the gradients, but also continues in the direction of previous steps. Leslie Smith introduced cyclical momentums in [A disciplined approach to neural network hyper-parameters: Part 1](https://arxiv.org/pdf/1803.09820.pdf). It suggests that the momentum varies in the opposite direction of the learning rate: when we are at high learning rate, we use less momentum, and we use more again in the annealing phase.\n", "\n", "We can use 1cycle training in fastai by calling `fit_one_cycle`:" ] }, { "cell_type": "code", - "execution_count": 17, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -477,7 +477,7 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -527,7 +527,7 @@ }, { "cell_type": "code", - "execution_count": 19, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -564,7 +564,7 @@ }, { "cell_type": "code", - "execution_count": 20, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -595,7 +595,7 @@ }, { "cell_type": "code", - "execution_count": 21, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -656,7 +656,7 @@ }, { "cell_type": "code", - "execution_count": 22, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -729,7 +729,7 @@ }, { "cell_type": "code", - "execution_count": 23, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -749,7 +749,7 @@ }, { "cell_type": "code", - "execution_count": 24, + "execution_count": null, 
"metadata": {}, "outputs": [ { @@ -797,7 +797,7 @@ }, { "cell_type": "code", - "execution_count": 25, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -830,7 +830,7 @@ }, { "cell_type": "code", - "execution_count": 26, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -1037,31 +1037,6 @@ "display_name": "Python 3", "language": "python", "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.5" - }, - "toc": { - "base_numbering": 1, - "nav_menu": {}, - "number_sections": false, - "sideBar": true, - "skip_h1_title": true, - "title_cell": "Table of Contents", - "title_sidebar": "Contents", - "toc_cell": false, - "toc_position": {}, - "toc_section_display": true, - "toc_window_display": false } }, "nbformat": 4, diff --git a/16_arch_details.ipynb b/16_arch_details.ipynb index 6c5492a..5a6dc7d 100644 --- a/16_arch_details.ipynb +++ b/16_arch_details.ipynb @@ -444,31 +444,6 @@ "display_name": "Python 3", "language": "python", "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.5" - }, - "toc": { - "base_numbering": 1, - "nav_menu": {}, - "number_sections": false, - "sideBar": true, - "skip_h1_title": true, - "title_cell": "Table of Contents", - "title_sidebar": "Contents", - "toc_cell": false, - "toc_position": {}, - "toc_section_display": true, - "toc_window_display": false } }, "nbformat": 4, diff --git a/17_accel_sgd.ipynb b/17_accel_sgd.ipynb index 4c6ffcd..d0c1e09 100644 --- a/17_accel_sgd.ipynb +++ b/17_accel_sgd.ipynb @@ -953,31 +953,6 @@ "display_name": "Python 3", "language": "python", "name": "python3" - }, - 
"language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.5" - }, - "toc": { - "base_numbering": 1, - "nav_menu": {}, - "number_sections": false, - "sideBar": true, - "skip_h1_title": true, - "title_cell": "Table of Contents", - "title_sidebar": "Contents", - "toc_cell": false, - "toc_position": {}, - "toc_section_display": true, - "toc_window_display": false } }, "nbformat": 4, diff --git a/18_callbacks.ipynb b/18_callbacks.ipynb index 475640a..91281a0 100644 --- a/18_callbacks.ipynb +++ b/18_callbacks.ipynb @@ -392,31 +392,6 @@ "display_name": "Python 3", "language": "python", "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.5" - }, - "toc": { - "base_numbering": 1, - "nav_menu": {}, - "number_sections": false, - "sideBar": true, - "skip_h1_title": true, - "title_cell": "Table of Contents", - "title_sidebar": "Contents", - "toc_cell": false, - "toc_position": {}, - "toc_section_display": true, - "toc_window_display": false } }, "nbformat": 4, diff --git a/20_CAM.ipynb b/20_CAM.ipynb index 9d0c824..82957c9 100644 --- a/20_CAM.ipynb +++ b/20_CAM.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "metadata": { "hide_input": false }, @@ -55,7 +55,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -140,7 +140,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -157,7 +157,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "metadata": {}, 
"outputs": [], "source": [ @@ -174,7 +174,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -191,7 +191,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -207,7 +207,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -223,7 +223,7 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -232,7 +232,7 @@ "tensor([[2.7374e-09, 1.0000e+00]], device='cuda:5')" ] }, - "execution_count": 8, + "execution_count": null, "metadata": {}, "output_type": "execute_result" } @@ -250,7 +250,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -259,7 +259,7 @@ "(#2) [False,True]" ] }, - "execution_count": 9, + "execution_count": null, "metadata": {}, "output_type": "execute_result" } @@ -284,7 +284,7 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -293,7 +293,7 @@ "torch.Size([1, 3, 224, 224])" ] }, - "execution_count": 10, + "execution_count": null, "metadata": {}, "output_type": "execute_result" } @@ -304,7 +304,7 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -313,7 +313,7 @@ "torch.Size([2, 7, 7])" ] }, - "execution_count": 11, + "execution_count": null, "metadata": {}, "output_type": "execute_result" } @@ -334,7 +334,7 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -369,7 +369,7 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -385,7 +385,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": null, "metadata": {}, "outputs": [], 
"source": [ @@ -406,7 +406,7 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -440,7 +440,7 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -462,7 +462,7 @@ }, { "cell_type": "code", - "execution_count": 17, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -484,7 +484,7 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -494,7 +494,7 @@ }, { "cell_type": "code", - "execution_count": 19, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -526,7 +526,7 @@ }, { "cell_type": "code", - "execution_count": 20, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -540,7 +540,7 @@ }, { "cell_type": "code", - "execution_count": 21, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -557,7 +557,7 @@ }, { "cell_type": "code", - "execution_count": 22, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -637,31 +637,6 @@ "display_name": "Python 3", "language": "python", "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.5" - }, - "toc": { - "base_numbering": 1, - "nav_menu": {}, - "number_sections": false, - "sideBar": true, - "skip_h1_title": true, - "title_cell": "Table of Contents", - "title_sidebar": "Contents", - "toc_cell": false, - "toc_position": {}, - "toc_section_display": true, - "toc_window_display": false } }, "nbformat": 4, diff --git a/21_learner.ipynb b/21_learner.ipynb index f352f07..81054d7 100644 --- a/21_learner.ipynb +++ b/21_learner.ipynb @@ -23,7 +23,7 @@ "source": [ "This final chapter (other than the conclusion, and the online 
chapters) is going to look a bit different. We will have far more code, and far less pros than previous chapters. We will introduce new Python keywords and libraries without discussing them. This chapter is meant to be the start of a significant research project for you. You see, we are going to implement all of the key pieces of the fastai and PyTorch APIs from scratch, building on nothing other than the components that we developed in <>! The key goal here is to end up with our own `Learner` class, and some callbacks--enough to be able to train a model on Imagenette, including examples of each of the key techniques we've studied. On the way to building Learner, we will be creating Module, Parameter, and even our own parallel DataLoader… and much more.\n", "\n", - "The end of chapter questionnaire is particularly important for this chapter. This is where we will be getting you started on the many interesting directions that you could take, using this chapter as your starting out point. What we really saying is: follow through with this chapter on your computer, not on paper, and do lots of experiments, web searches, and whatever else you need to understand what's going on. You've built up the skills and expertise to do this in the rest of this book, so we think you are going to go great!" + "The end of chapter questionnaire is particularly important for this chapter. This is where we will be getting you started on the many interesting directions that you could take, using this chapter as your starting out point. What we are really saying is: follow through with this chapter on your computer, not on paper, and do lots of experiments, web searches, and whatever else you need to understand what's going on. You've built up the skills and expertise to do this in the rest of this book, so we think you are going to go great!" 
] }, { diff --git a/images/0_jupyter.png b/images/0_jupyter.png index edba285..a358186 100644 Binary files a/images/0_jupyter.png and b/images/0_jupyter.png differ diff --git a/images/Dropout.png b/images/Dropout.png index fc9c1a8..fcc51ce 100644 Binary files a/images/Dropout.png and b/images/Dropout.png differ diff --git a/images/Dropout1.png b/images/Dropout1.png index dad87e6..0d6d169 100644 Binary files a/images/Dropout1.png and b/images/Dropout1.png differ diff --git a/images/LSTM.png b/images/LSTM.png index 41e2d93..30e7c1a 100644 Binary files a/images/LSTM.png and b/images/LSTM.png differ diff --git a/images/att_00000.png b/images/att_00000.png index 8e12ffe..092a75e 100644 Binary files a/images/att_00000.png and b/images/att_00000.png differ diff --git a/images/att_00001.png b/images/att_00001.png index dbd0732..861f39e 100644 Binary files a/images/att_00001.png and b/images/att_00001.png differ diff --git a/images/att_00002.png b/images/att_00002.png index e04fa2a..40c710b 100644 Binary files a/images/att_00002.png and b/images/att_00002.png differ diff --git a/images/att_00003.png b/images/att_00003.png index 555d26e..e497368 100644 Binary files a/images/att_00003.png and b/images/att_00003.png differ diff --git a/images/att_00004.png b/images/att_00004.png index 6e6f8cc..21577bf 100644 Binary files a/images/att_00004.png and b/images/att_00004.png differ diff --git a/images/att_00005.png b/images/att_00005.png index 721bc8f..3c39e1b 100644 Binary files a/images/att_00005.png and b/images/att_00005.png differ diff --git a/images/att_00006.png b/images/att_00006.png index b68aa7b..5feff29 100644 Binary files a/images/att_00006.png and b/images/att_00006.png differ diff --git a/images/att_00007.png b/images/att_00007.png index e04fa2a..40c710b 100644 Binary files a/images/att_00007.png and b/images/att_00007.png differ diff --git a/images/att_00008.png b/images/att_00008.png index 555d26e..e497368 100644 Binary files a/images/att_00008.png and 
b/images/att_00008.png differ diff --git a/images/att_00009.png b/images/att_00009.png index 6e6f8cc..21577bf 100644 Binary files a/images/att_00009.png and b/images/att_00009.png differ diff --git a/images/att_00010.png b/images/att_00010.png index 721bc8f..3c39e1b 100644 Binary files a/images/att_00010.png and b/images/att_00010.png differ diff --git a/images/att_00011.png b/images/att_00011.png index b68aa7b..5feff29 100644 Binary files a/images/att_00011.png and b/images/att_00011.png differ diff --git a/images/att_00012.png b/images/att_00012.png index d2a05dd..8abb382 100644 Binary files a/images/att_00012.png and b/images/att_00012.png differ diff --git a/images/att_00013.png b/images/att_00013.png index 62a324d..bb5618d 100644 Binary files a/images/att_00013.png and b/images/att_00013.png differ diff --git a/images/att_00014.png b/images/att_00014.png index 1856310..b922165 100644 Binary files a/images/att_00014.png and b/images/att_00014.png differ diff --git a/images/att_00015.png b/images/att_00015.png index f551cf5..f9a7c1e 100644 Binary files a/images/att_00015.png and b/images/att_00015.png differ diff --git a/images/att_00016.png b/images/att_00016.png index ce94b4e..d46cd16 100644 Binary files a/images/att_00016.png and b/images/att_00016.png differ diff --git a/images/att_00017.png b/images/att_00017.png index 9ff8f3a..94d65a8 100644 Binary files a/images/att_00017.png and b/images/att_00017.png differ diff --git a/images/att_00018.png b/images/att_00018.png index 8e50df2..ddb5eea 100644 Binary files a/images/att_00018.png and b/images/att_00018.png differ diff --git a/images/att_00019.png b/images/att_00019.png index a223387..bcb48ed 100644 Binary files a/images/att_00019.png and b/images/att_00019.png differ diff --git a/images/att_00020.png b/images/att_00020.png index 26e686e..747c201 100644 Binary files a/images/att_00020.png and b/images/att_00020.png differ diff --git a/images/att_00021.png b/images/att_00021.png index 63de323..e00dd02 
100644 Binary files a/images/att_00021.png and b/images/att_00021.png differ diff --git a/images/att_00022.png b/images/att_00022.png index 62f0986..4630e88 100644 Binary files a/images/att_00022.png and b/images/att_00022.png differ diff --git a/images/att_00023.png b/images/att_00023.png index 99e8be6..ca3f514 100644 Binary files a/images/att_00023.png and b/images/att_00023.png differ diff --git a/images/att_00024.png b/images/att_00024.png index 9ec3434..a9c1f7f 100644 Binary files a/images/att_00024.png and b/images/att_00024.png differ diff --git a/images/att_00025.png b/images/att_00025.png index 15c4682..64465f6 100644 Binary files a/images/att_00025.png and b/images/att_00025.png differ diff --git a/images/att_00026.png b/images/att_00026.png index b717939..e76ecf4 100644 Binary files a/images/att_00026.png and b/images/att_00026.png differ diff --git a/images/att_00027.png b/images/att_00027.png index 8cebce1..a16ca0c 100644 Binary files a/images/att_00027.png and b/images/att_00027.png differ diff --git a/images/att_00028.png b/images/att_00028.png index bcbc411..8a34a4c 100644 Binary files a/images/att_00028.png and b/images/att_00028.png differ diff --git a/images/att_00029.png b/images/att_00029.png index 9c1b12d..20e5dbe 100644 Binary files a/images/att_00029.png and b/images/att_00029.png differ diff --git a/images/att_00030.png b/images/att_00030.png index 00c58f5..84a8f8d 100644 Binary files a/images/att_00030.png and b/images/att_00030.png differ diff --git a/images/att_00031.png b/images/att_00031.png index edc3e52..2f57fdf 100644 Binary files a/images/att_00031.png and b/images/att_00031.png differ diff --git a/images/att_00032.png b/images/att_00032.png index 1e117ca..40ac666 100644 Binary files a/images/att_00032.png and b/images/att_00032.png differ diff --git a/images/att_00033.png b/images/att_00033.png index 220c369..4ceb02c 100644 Binary files a/images/att_00033.png and b/images/att_00033.png differ diff --git a/images/att_00034.png 
b/images/att_00034.png index e02bbc5..2cda351 100644 Binary files a/images/att_00034.png and b/images/att_00034.png differ diff --git a/images/att_00035.png b/images/att_00035.png index e252d32..8a71cd5 100644 Binary files a/images/att_00035.png and b/images/att_00035.png differ diff --git a/images/att_00036.png b/images/att_00036.png index b2eb07c..e02df25 100644 Binary files a/images/att_00036.png and b/images/att_00036.png differ diff --git a/images/att_00037.png b/images/att_00037.png index e6090f5..5ea5428 100644 Binary files a/images/att_00037.png and b/images/att_00037.png differ diff --git a/images/att_00038.png b/images/att_00038.png index ec6a0cd..10702f7 100644 Binary files a/images/att_00038.png and b/images/att_00038.png differ diff --git a/images/att_00039.png b/images/att_00039.png index 011fca3..d69a3f0 100644 Binary files a/images/att_00039.png and b/images/att_00039.png differ diff --git a/images/att_00040.png b/images/att_00040.png index 54621fe..aca0292 100644 Binary files a/images/att_00040.png and b/images/att_00040.png differ diff --git a/images/att_00041.png b/images/att_00041.png index deac415..5b32634 100644 Binary files a/images/att_00041.png and b/images/att_00041.png differ diff --git a/images/att_00042.png b/images/att_00042.png index 321add7..fe42139 100644 Binary files a/images/att_00042.png and b/images/att_00042.png differ diff --git a/images/att_00043.png b/images/att_00043.png index 5b1dc67..ce34eaa 100644 Binary files a/images/att_00043.png and b/images/att_00043.png differ diff --git a/images/att_00044.png b/images/att_00044.png index 8a6f579..f8160d8 100644 Binary files a/images/att_00044.png and b/images/att_00044.png differ diff --git a/images/att_00045.png b/images/att_00045.png index 8e283ed..1907f9d 100644 Binary files a/images/att_00045.png and b/images/att_00045.png differ diff --git a/images/att_00046.png b/images/att_00046.png index 42dc575..b19c31a 100644 Binary files a/images/att_00046.png and b/images/att_00046.png 
differ diff --git a/images/att_00047.png b/images/att_00047.png index 2a08960..975d096 100644 Binary files a/images/att_00047.png and b/images/att_00047.png differ diff --git a/images/att_00048.png b/images/att_00048.png index f88319c..f6c01ed 100644 Binary files a/images/att_00048.png and b/images/att_00048.png differ diff --git a/images/att_00049.png b/images/att_00049.png index 229828f..c41a47a 100644 Binary files a/images/att_00049.png and b/images/att_00049.png differ diff --git a/images/att_00050.png b/images/att_00050.png index b0b907d..5c86f9d 100644 Binary files a/images/att_00050.png and b/images/att_00050.png differ diff --git a/images/att_00051.png b/images/att_00051.png index 547eaec..83f9468 100644 Binary files a/images/att_00051.png and b/images/att_00051.png differ diff --git a/images/att_00052.png b/images/att_00052.png index 2b0f38f..b8991ab 100644 Binary files a/images/att_00052.png and b/images/att_00052.png differ diff --git a/images/att_00053.png b/images/att_00053.png index af53b15..28c6128 100644 Binary files a/images/att_00053.png and b/images/att_00053.png differ diff --git a/images/att_00054.png b/images/att_00054.png index 2729935..03d8c42 100644 Binary files a/images/att_00054.png and b/images/att_00054.png differ diff --git a/images/att_00055.png b/images/att_00055.png index 2745d59..d53c42a 100644 Binary files a/images/att_00055.png and b/images/att_00055.png differ diff --git a/images/att_00056.png b/images/att_00056.png index d944304..a8bc11b 100644 Binary files a/images/att_00056.png and b/images/att_00056.png differ diff --git a/images/att_00057.png b/images/att_00057.png index f6d1e6d..17d572d 100644 Binary files a/images/att_00057.png and b/images/att_00057.png differ diff --git a/images/att_00058.png b/images/att_00058.png index d9ac5a1..86dd849 100644 Binary files a/images/att_00058.png and b/images/att_00058.png differ diff --git a/images/att_00059.png b/images/att_00059.png index 87e8ae6..9598a90 100644 Binary files 
a/images/att_00059.png and b/images/att_00059.png differ diff --git a/images/att_00060.png b/images/att_00060.png index 5e708b4..f25fd81 100644 Binary files a/images/att_00060.png and b/images/att_00060.png differ diff --git a/images/att_00061.png b/images/att_00061.png index 1912167..1af8595 100644 Binary files a/images/att_00061.png and b/images/att_00061.png differ diff --git a/images/att_00062.png b/images/att_00062.png index 9d6e758..25ad1b8 100644 Binary files a/images/att_00062.png and b/images/att_00062.png differ diff --git a/images/att_00063.png b/images/att_00063.png index 737b25e..4f61e2c 100644 Binary files a/images/att_00063.png and b/images/att_00063.png differ diff --git a/images/att_00064.png b/images/att_00064.png index 202669d..50c5ead 100644 Binary files a/images/att_00064.png and b/images/att_00064.png differ diff --git a/images/att_00065.png b/images/att_00065.png index f2911db..997170e 100644 Binary files a/images/att_00065.png and b/images/att_00065.png differ diff --git a/images/att_00066.png b/images/att_00066.png index 0217de2..2fc99e2 100644 Binary files a/images/att_00066.png and b/images/att_00066.png differ diff --git a/images/att_00067.png b/images/att_00067.png index 613ed9c..841f132 100644 Binary files a/images/att_00067.png and b/images/att_00067.png differ diff --git a/images/att_00068.png b/images/att_00068.png index c995e85..ffe29d4 100644 Binary files a/images/att_00068.png and b/images/att_00068.png differ diff --git a/images/att_00069.png b/images/att_00069.png index ae4ead1..8db76f5 100644 Binary files a/images/att_00069.png and b/images/att_00069.png differ diff --git a/images/att_00070.png b/images/att_00070.png index edbdd02..e614898 100644 Binary files a/images/att_00070.png and b/images/att_00070.png differ diff --git a/images/att_00071.png b/images/att_00071.png index 856fc5c..1483d74 100644 Binary files a/images/att_00071.png and b/images/att_00071.png differ diff --git a/images/chapter1_add.png 
b/images/chapter1_add.png index 2647935..aa92da5 100644 Binary files a/images/chapter1_add.png and b/images/chapter1_add.png differ diff --git a/images/chapter1_busy.png b/images/chapter1_busy.png index d4198d7..4e0407d 100644 Binary files a/images/chapter1_busy.png and b/images/chapter1_busy.png differ diff --git a/images/chapter1_markdown.png b/images/chapter1_markdown.png index c38852f..55d97b3 100644 Binary files a/images/chapter1_markdown.png and b/images/chapter1_markdown.png differ diff --git a/images/chapter1_new_notebook.png b/images/chapter1_new_notebook.png index ca52feb..61c4942 100644 Binary files a/images/chapter1_new_notebook.png and b/images/chapter1_new_notebook.png differ diff --git a/images/chapter1_run.png b/images/chapter1_run.png index 7a02242..0d69367 100644 Binary files a/images/chapter1_run.png and b/images/chapter1_run.png differ diff --git a/images/chapter1_save.png b/images/chapter1_save.png index d232d49..1321fb6 100644 Binary files a/images/chapter1_save.png and b/images/chapter1_save.png differ diff --git a/images/chapter1_terminal.png b/images/chapter1_terminal.png index b0265d3..3b1c7a4 100644 Binary files a/images/chapter1_terminal.png and b/images/chapter1_terminal.png differ diff --git a/images/chapter2_bouncing.PNG b/images/chapter2_bouncing.PNG index 99124a9..982896a 100644 Binary files a/images/chapter2_bouncing.PNG and b/images/chapter2_bouncing.PNG differ diff --git a/images/chapter2_derivative.PNG b/images/chapter2_derivative.PNG index df6a1bc..4bf0238 100644 Binary files a/images/chapter2_derivative.PNG and b/images/chapter2_derivative.PNG differ diff --git a/images/chapter2_layer1and2.PNG b/images/chapter2_layer1and2.PNG index 0709ed8..a4fcc2c 100644 Binary files a/images/chapter2_layer1and2.PNG and b/images/chapter2_layer1and2.PNG differ diff --git a/images/chapter2_layer3.PNG b/images/chapter2_layer3.PNG index e8ba124..6d5c534 100644 Binary files a/images/chapter2_layer3.PNG and b/images/chapter2_layer3.PNG differ diff 
--git a/images/chapter2_layer4and5.PNG b/images/chapter2_layer4and5.PNG index 97b5129..3b49d47 100644 Binary files a/images/chapter2_layer4and5.PNG and b/images/chapter2_layer4and5.PNG differ diff --git a/images/chapter2_sgd.PNG b/images/chapter2_sgd.PNG index 8bb27bb..28e59cc 100644 Binary files a/images/chapter2_sgd.PNG and b/images/chapter2_sgd.PNG differ diff --git a/images/chapter4_1cycle_schedule.png b/images/chapter4_1cycle_schedule.png index eac204f..576bd78 100644 Binary files a/images/chapter4_1cycle_schedule.png and b/images/chapter4_1cycle_schedule.png differ diff --git a/images/chapter4_overfit.png b/images/chapter4_overfit.png index 68dec22..eb4cc32 100644 Binary files a/images/chapter4_overfit.png and b/images/chapter4_overfit.png differ diff --git a/images/chapter7_neuron.png b/images/chapter7_neuron.png index e5a8e6c..778c767 100644 Binary files a/images/chapter7_neuron.png and b/images/chapter7_neuron.png differ diff --git a/images/chapter9_bottleneck.png b/images/chapter9_bottleneck.png index 7afbbea..dd868e7 100644 Binary files a/images/chapter9_bottleneck.png and b/images/chapter9_bottleneck.png differ diff --git a/images/chapter9_cat_conv.png b/images/chapter9_cat_conv.png index 5ef6991..da9d963 100644 Binary files a/images/chapter9_cat_conv.png and b/images/chapter9_cat_conv.png differ diff --git a/images/chapter9_conv_basic.png b/images/chapter9_conv_basic.png index dd39b7c..54300d8 100644 Binary files a/images/chapter9_conv_basic.png and b/images/chapter9_conv_basic.png differ diff --git a/images/chapter9_conv_pad.png b/images/chapter9_conv_pad.png index b13301f..62e3503 100644 Binary files a/images/chapter9_conv_pad.png and b/images/chapter9_conv_pad.png differ diff --git a/images/chapter9_conv_rgb.png b/images/chapter9_conv_rgb.png index e92353b..efd6330 100644 Binary files a/images/chapter9_conv_rgb.png and b/images/chapter9_conv_rgb.png differ diff --git a/images/chapter9_conv_stride.png b/images/chapter9_conv_stride.png index 
d7eaf3f..88f6261 100644 Binary files a/images/chapter9_conv_stride.png and b/images/chapter9_conv_stride.png differ diff --git a/images/chapter9_loss_landscape.png b/images/chapter9_loss_landscape.png index 0c40bc0..4ab0603 100644 Binary files a/images/chapter9_loss_landscape.png and b/images/chapter9_loss_landscape.png differ diff --git a/images/chapter9_skip_connection.png b/images/chapter9_skip_connection.png index 60a29b3..74506ee 100644 Binary files a/images/chapter9_skip_connection.png and b/images/chapter9_skip_connection.png differ diff --git a/images/colorful_summ.png b/images/colorful_summ.png index cf108ee..81a3a79 100644 Binary files a/images/colorful_summ.png and b/images/colorful_summ.png differ diff --git a/images/cover.png b/images/cover.png index 15912ad..933b56d 100644 Binary files a/images/cover.png and b/images/cover.png differ diff --git a/images/driver_phone.png b/images/driver_phone.png index 20a64a2..989ad9d 100644 Binary files a/images/driver_phone.png and b/images/driver_phone.png differ diff --git a/images/driver_phone2.png b/images/driver_phone2.png index a024d1d..fa2f37f 100644 Binary files a/images/driver_phone2.png and b/images/driver_phone2.png differ diff --git a/images/drivetrain-approach.png b/images/drivetrain-approach.png index 814ec06..c2480f5 100644 Binary files a/images/drivetrain-approach.png and b/images/drivetrain-approach.png differ diff --git a/images/ethics/image1.png b/images/ethics/image1.png index e0cae59..f0e866b 100644 Binary files a/images/ethics/image1.png and b/images/ethics/image1.png differ diff --git a/images/ethics/image10.png b/images/ethics/image10.png index 8755ab2..207cf3f 100644 Binary files a/images/ethics/image10.png and b/images/ethics/image10.png differ diff --git a/images/ethics/image11.png b/images/ethics/image11.png index d130c77..017199d 100644 Binary files a/images/ethics/image11.png and b/images/ethics/image11.png differ diff --git a/images/ethics/image12.png b/images/ethics/image12.png index 
27684c9..fed0575 100644 Binary files a/images/ethics/image12.png and b/images/ethics/image12.png differ diff --git a/images/ethics/image13.png b/images/ethics/image13.png index 064ad92..03b0e8e 100644 Binary files a/images/ethics/image13.png and b/images/ethics/image13.png differ diff --git a/images/ethics/image14.png b/images/ethics/image14.png index 68b8a4f..f93d6c1 100644 Binary files a/images/ethics/image14.png and b/images/ethics/image14.png differ diff --git a/images/ethics/image16.png b/images/ethics/image16.png index 3671f28..48978c6 100644 Binary files a/images/ethics/image16.png and b/images/ethics/image16.png differ diff --git a/images/ethics/image17.png b/images/ethics/image17.png index f9095ba..27545c2 100644 Binary files a/images/ethics/image17.png and b/images/ethics/image17.png differ diff --git a/images/ethics/image2.png b/images/ethics/image2.png index a9be469..a54e2e9 100644 Binary files a/images/ethics/image2.png and b/images/ethics/image2.png differ diff --git a/images/ethics/image4.png b/images/ethics/image4.png index 7fa7b94..7d311ac 100644 Binary files a/images/ethics/image4.png and b/images/ethics/image4.png differ diff --git a/images/ethics/image5.png b/images/ethics/image5.png index fe5926e..909f141 100644 Binary files a/images/ethics/image5.png and b/images/ethics/image5.png differ diff --git a/images/ethics/image6.png b/images/ethics/image6.png index 0263728..7a75e40 100644 Binary files a/images/ethics/image6.png and b/images/ethics/image6.png differ diff --git a/images/ethics/image7.png b/images/ethics/image7.png index 2e8cb03..37b4f10 100644 Binary files a/images/ethics/image7.png and b/images/ethics/image7.png differ diff --git a/images/ethics/image8.png b/images/ethics/image8.png index 83877e1..0fe0319 100644 Binary files a/images/ethics/image8.png and b/images/ethics/image8.png differ diff --git a/images/fast_template/image1.png b/images/fast_template/image1.png index 6e3bdac..9ff57b9 100644 Binary files 
a/images/fast_template/image1.png and b/images/fast_template/image1.png differ diff --git a/images/fast_template/image10.png b/images/fast_template/image10.png index 1d08559..e06e8d5 100644 Binary files a/images/fast_template/image10.png and b/images/fast_template/image10.png differ diff --git a/images/fast_template/image11.png b/images/fast_template/image11.png index 9f2ddff..725d460 100644 Binary files a/images/fast_template/image11.png and b/images/fast_template/image11.png differ diff --git a/images/fast_template/image12.png b/images/fast_template/image12.png index 37960c7..2d01a59 100644 Binary files a/images/fast_template/image12.png and b/images/fast_template/image12.png differ diff --git a/images/fast_template/image13.png b/images/fast_template/image13.png index 266110e..f6b4815 100644 Binary files a/images/fast_template/image13.png and b/images/fast_template/image13.png differ diff --git a/images/fast_template/image14.png b/images/fast_template/image14.png index 4a56112..27bddfc 100644 Binary files a/images/fast_template/image14.png and b/images/fast_template/image14.png differ diff --git a/images/fast_template/image15.png b/images/fast_template/image15.png index ddcb0fe..a6efa7f 100644 Binary files a/images/fast_template/image15.png and b/images/fast_template/image15.png differ diff --git a/images/fast_template/image16.png b/images/fast_template/image16.png index 10b3192..3e662d1 100644 Binary files a/images/fast_template/image16.png and b/images/fast_template/image16.png differ diff --git a/images/fast_template/image2.png b/images/fast_template/image2.png index d221317..7e5453e 100644 Binary files a/images/fast_template/image2.png and b/images/fast_template/image2.png differ diff --git a/images/fast_template/image3.png b/images/fast_template/image3.png index 7ecf264..5641140 100644 Binary files a/images/fast_template/image3.png and b/images/fast_template/image3.png differ diff --git a/images/fast_template/image4.png b/images/fast_template/image4.png 
index d25c0a6..023a78c 100644 Binary files a/images/fast_template/image4.png and b/images/fast_template/image4.png differ diff --git a/images/fast_template/image5.png b/images/fast_template/image5.png index 28a41f0..f4de339 100644 Binary files a/images/fast_template/image5.png and b/images/fast_template/image5.png differ diff --git a/images/fast_template/image6.png b/images/fast_template/image6.png index 52aef3a..7350800 100644 Binary files a/images/fast_template/image6.png and b/images/fast_template/image6.png differ diff --git a/images/fast_template/image7.png b/images/fast_template/image7.png index 90cb8fc..d0bd236 100644 Binary files a/images/fast_template/image7.png and b/images/fast_template/image7.png differ diff --git a/images/fast_template/image8.png b/images/fast_template/image8.png index 86e3a9f..b3b661d 100644 Binary files a/images/fast_template/image8.png and b/images/fast_template/image8.png differ diff --git a/images/fast_template/image9.png b/images/fast_template/image9.png index 267e97e..87196a7 100644 Binary files a/images/fast_template/image9.png and b/images/fast_template/image9.png differ diff --git a/images/gitblog/commit.png b/images/gitblog/commit.png index 4b8312b..25d757d 100644 Binary files a/images/gitblog/commit.png and b/images/gitblog/commit.png differ diff --git a/images/gitblog/image1.png b/images/gitblog/image1.png index ddcb0fe..a6efa7f 100644 Binary files a/images/gitblog/image1.png and b/images/gitblog/image1.png differ diff --git a/images/gitblog/image2.png b/images/gitblog/image2.png index 10b3192..3e662d1 100644 Binary files a/images/gitblog/image2.png and b/images/gitblog/image2.png differ diff --git a/images/gitblog/image3.png b/images/gitblog/image3.png index 077736d..50495ca 100644 Binary files a/images/gitblog/image3.png and b/images/gitblog/image3.png differ diff --git a/images/gitblog/image4.png b/images/gitblog/image4.png index f3c5798..143b76a 100644 Binary files a/images/gitblog/image4.png and 
b/images/gitblog/image4.png differ diff --git a/images/gitblog/image5.png b/images/gitblog/image5.png index 8972dbc..eea557d 100644 Binary files a/images/gitblog/image5.png and b/images/gitblog/image5.png differ diff --git a/images/pratchett.png b/images/pratchett.png index c33f6a4..7f3bb81 100644 Binary files a/images/pratchett.png and b/images/pratchett.png differ diff --git a/images/sklearn_features.png b/images/sklearn_features.png index 32466bf..48de773 100644 Binary files a/images/sklearn_features.png and b/images/sklearn_features.png differ diff --git a/images/tarsier.png b/images/tarsier.png index 8c7a5f6..8ca1bbd 100644 Binary files a/images/tarsier.png and b/images/tarsier.png differ diff --git a/images/timeseries1.png b/images/timeseries1.png index 9d3b987..9132290 100644 Binary files a/images/timeseries1.png and b/images/timeseries1.png differ diff --git a/images/timeseries2.png b/images/timeseries2.png index 5af3a51..b98c65e 100644 Binary files a/images/timeseries2.png and b/images/timeseries2.png differ diff --git a/images/timeseries3.png b/images/timeseries3.png index 6816358..aad1de3 100644 Binary files a/images/timeseries3.png and b/images/timeseries3.png differ