mirror of
https://github.com/fastai/fastbook.git
synced 2025-04-04 01:40:44 +00:00
4882 lines
277 KiB
Plaintext
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#hide\n",
"from fastai2.vision.all import *\n",
"from utils import *\n",
"\n",
"matplotlib.rc('image', cmap='Greys')"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"[[chapter_mnist_basics]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Under the hood: training a digit classifier"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pixels: the foundations of computer vision"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we’ve seen what it looks like to train a variety of models, let’s dig under the hood and see exactly what is going on. We’ll start with computer vision, and will use that to introduce many of the key concepts of deep learning. In future chapters we’ll do deep dives into other applications as well, and we’ll see how to use these insights to improve our model’s accuracy, speed up its training, and turn it into a real working web application.\n",
"\n",
"In order to understand what happens in a computer vision model, we first have to understand how computers handle images. We'll use one of the most famous datasets in computer vision, [MNIST](https://en.wikipedia.org/wiki/MNIST_database), for our experiments. MNIST contains handwritten digits, collected by the National Institute of Standards and Technology, and collated into a machine learning dataset by Yann LeCun and his colleagues. LeCun used MNIST in 1998 to demonstrate [LeNet-5](http://yann.lecun.com/exdb/lenet/), the first computer system to demonstrate practically useful recognition of handwritten digit sequences. This was one of the most important breakthroughs in the history of AI."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Sidebar: Tenacity and deep learning"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The story of deep learning is one of tenacity and grit from a handful of dedicated researchers. After early hopes (and hype!) neural networks went out of favor in the 1990s and 2000s, and just a handful of researchers kept trying to make them work well. Three of them, Yann LeCun, Geoff Hinton, and Yoshua Bengio, were awarded the highest honor in computer science, the Turing Award (generally considered the \"Nobel Prize of computer science\"), after triumphing despite the deep skepticism and lack of interest of the wider machine learning and statistics community.\n",
"\n",
"<img src=\"images/turing_300.jpg\" id=\"dl_fathers\" caption=\"Left to right, Yann LeCun, Geoffrey Hinton and Yoshua Bengio\" alt=\"Picture of Yann LeCun, Geoffrey Hinton and Yoshua Bengio\">\n",
"\n",
"Geoff Hinton has told of how even academic papers showing dramatically better results than anything previously published would be rejected from top journals and conferences, just because they used a neural network. Yann LeCun's work on convolutional neural networks, which we will study in the next section, showed that these models could read handwritten text--something that had never been achieved before. However, his breakthrough was ignored by most researchers, even as it was used commercially to read 10% of the checks in the US!\n",
"\n",
"In addition to these three Turing Award winners, there are many other researchers who have battled to get us to where we are today. For instance, Jürgen Schmidhuber (who many believe should have shared in the Turing Award) pioneered many important ideas, including working on the *LSTM* architecture with his student Sepp Hochreiter (widely used for speech recognition and other text modeling tasks, and used in the IMDb example in <<chapter_intro>>). Perhaps most important of all, Paul Werbos invented back-propagation for neural networks, the technique shown in this chapter and used universally for training neural networks. His development was almost entirely ignored for decades, but today it is the most important foundation of modern AI.\n",
"\n",
"There is a lesson here for all of us! On your deep learning journey you will face many obstacles, both technical and (even more difficult) personal, in the form of people around you who don't believe you'll be successful. There's one *guaranteed* way to fail, and that's to stop trying. We've seen that the only consistent trait amongst every fast.ai student who has gone on to be a world-class practitioner is that they are all very tenacious."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## End sidebar"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For this initial tutorial we are just going to try to create a model that can recognize \"3\"s and \"7\"s. So let's download a sample of MNIST that contains images of just these digits:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"path = untar_data(URLs.MNIST_SAMPLE)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#hide\n",
"Path.BASE_PATH = path"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see what's in this directory by using `ls()`, a method added by fastai. This method returns an object of a special fastai class called `L`, which has all the functionality of Python's built-in `list`, plus a lot more. One of its handy features is that, when printed, it displays the count of items before listing the items themselves (if there are more than 10 items, it just shows the first few)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(#9) [Path('cleaned.csv'),Path('item_list.txt'),Path('trained_model.pkl'),Path('models'),Path('valid'),Path('labels.csv'),Path('export.pkl'),Path('history.csv'),Path('train')]"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"path.ls()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The MNIST dataset follows a very common layout for machine learning datasets: separate folders for the *training set*, which is used to train a model, and the *validation set* (and/or *test set*), which is used to evaluate the model (we'll be talking a lot about these concepts very soon!). Let's see what's inside the training set:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(#2) [Path('train/7'),Path('train/3')]"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(path/'train').ls()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There's a folder of \"3\"s, and a folder of \"7\"s. In machine learning parlance, we say that \"3\" and \"7\" are the *labels* in this dataset. Let's take a look in one of these folders (using `sorted` to ensure we all get the same order of files):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(#6131) [Path('train/3/10.png'),Path('train/3/10000.png'),Path('train/3/10011.png'),Path('train/3/10031.png'),Path('train/3/10034.png'),Path('train/3/10042.png'),Path('train/3/10052.png'),Path('train/3/1007.png'),Path('train/3/10074.png'),Path('train/3/10091.png')...]"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"threes = (path/'train'/'3').ls().sorted()\n",
"sevens = (path/'train'/'7').ls().sorted()\n",
"threes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we might expect, it's full of image files. Let’s take a look at one now. Here’s an image of a handwritten number ‘3’, taken from the famous MNIST dataset of handwritten numbers:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAABwAAAAcCAAAAABXZoBIAAAA9ElEQVR4nM3Or0sDcRjH8c/pgrfBVBjCgibThiKIyTWbWF1bORhGwxARxH/AbtW0JoIGwzXRYhJhtuFY2q1ocLgbe3sGReTuuWbwkx6+r+/zQ/pncX6q+YOldSe6nG3dn8U/rTQ70L8FCGJUewvxl7NTmezNb8xIkvKugr1HSeMP6SrWOVkoTEuSyh0Gm2n3hQyObMnXnxkempRrvgD+gokzwxFAr7U7YXHZ8x4A/Dl7rbu6D2yl3etcw/F3nZgfRVI7rXM7hMUUqzzBec427x26rkmlkzEEa4nnRqnSOH2F0UUx0ePzlbuqMXAHgN6GY9if5xP8dmtHFfwjuQAAAABJRU5ErkJggg==\n",
"text/plain": [
"<PIL.PngImagePlugin.PngImageFile image mode=L size=28x28 at 0x7F24CDF87F50>"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"im3_path = threes[1]\n",
"im3 = Image.open(im3_path)\n",
"im3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we are using the `Image` class from the *Python Imaging Library* (PIL), which is the most widely used Python package for opening, manipulating, and viewing images. Jupyter knows about PIL images, so it displays the image for us automatically.\n",
"\n",
"In a computer, everything is represented as a number. To view the numbers that make up this image, we have to convert it to a *NumPy array* or a *PyTorch tensor*. For instance, here are a few numbers from the top-left of the image, converted to a NumPy array:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 0, 0, 0, 0, 0, 0],\n",
" [ 0, 0, 0, 0, 0, 29],\n",
" [ 0, 0, 0, 48, 166, 224],\n",
" [ 0, 93, 244, 249, 253, 187],\n",
" [ 0, 107, 253, 253, 230, 48],\n",
" [ 0, 3, 20, 20, 15, 0]], dtype=uint8)"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"array(im3)[4:10,4:10]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"...and the same thing as a PyTorch tensor:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[ 0, 0, 0, 0, 0, 0],\n",
" [ 0, 0, 0, 0, 0, 29],\n",
" [ 0, 0, 0, 48, 166, 224],\n",
" [ 0, 93, 244, 249, 253, 187],\n",
" [ 0, 107, 253, 253, 230, 48],\n",
" [ 0, 3, 20, 20, 15, 0]], dtype=torch.uint8)"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tensor(im3)[4:10,4:10]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can slice the array to pick just a part with the top of the digit in it, and then use a Pandas DataFrame to color-code the values using a gradient, which shows us clearly how the image is created from the pixel values:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\" >\n",
|
||
" #T_507bccae_5414_11ea_8833_a372988ddbd9row0_col0 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row0_col1 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row0_col2 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row0_col3 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row0_col4 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row0_col5 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row0_col6 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row0_col7 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row0_col8 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row0_col9 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row0_col10 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row0_col11 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row0_col12 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row0_col13 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row0_col14 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row0_col15 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row0_col16 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row0_col17 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row1_col0 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row1_col1 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row1_col2 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row1_col3 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row1_col4 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row1_col5 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #efefef;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row1_col6 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #7c7c7c;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row1_col7 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #4a4a4a;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row1_col8 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #000000;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row1_col9 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #000000;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row1_col10 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #000000;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row1_col11 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #606060;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row1_col12 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #4d4d4d;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row1_col13 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #7c7c7c;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row1_col14 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #bbbbbb;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row1_col15 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row1_col16 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row1_col17 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row2_col0 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row2_col1 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row2_col2 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row2_col3 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #e4e4e4;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row2_col4 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #6b6b6b;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row2_col5 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #000000;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row2_col6 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #000000;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row2_col7 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #000000;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row2_col8 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #171717;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row2_col9 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #4b4b4b;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row2_col10 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #010101;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row2_col11 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #000000;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row2_col12 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #000000;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row2_col13 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #000000;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row2_col14 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #171717;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row2_col15 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row2_col16 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row2_col17 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row3_col0 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row3_col1 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #272727;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row3_col2 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #0a0a0a;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row3_col3 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #050505;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row3_col4 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #000000;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row3_col5 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #333333;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row3_col6 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #e6e6e6;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row3_col7 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #fafafa;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row3_col8 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #fbfbfb;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row3_col9 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #fdfdfd;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row3_col10 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #fafafa;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row3_col11 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #4b4b4b;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row3_col12 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #000000;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row3_col13 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #000000;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row3_col14 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #171717;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row3_col15 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row3_col16 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row3_col17 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row4_col0 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row4_col1 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #000000;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row4_col2 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #000000;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row4_col3 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #000000;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row4_col4 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #1b1b1b;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row4_col5 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #e0e0e0;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row4_col6 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row4_col7 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row4_col8 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row4_col9 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row4_col10 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row4_col11 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #4e4e4e;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row4_col12 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #000000;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row4_col13 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #000000;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row4_col14 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #767676;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row4_col15 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row4_col16 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row4_col17 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row5_col0 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row5_col1 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #fcfcfc;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row5_col2 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #f6f6f6;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row5_col3 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #f6f6f6;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row5_col4 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #f8f8f8;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row5_col5 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row5_col6 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row5_col7 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row5_col8 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row5_col9 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row5_col10 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #e8e8e8;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row5_col11 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #222222;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row5_col12 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #000000;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row5_col13 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #090909;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row5_col14 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #d0d0d0;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row5_col15 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row5_col16 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row5_col17 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row6_col0 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row6_col1 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row6_col2 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row6_col3 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row6_col4 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row6_col5 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row6_col6 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row6_col7 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row6_col8 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row6_col9 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row6_col10 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #060606;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row6_col11 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #000000;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row6_col12 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #090909;\n",
|
||
" color: #f1f1f1;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row6_col13 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #979797;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row6_col14 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row6_col15 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row6_col16 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row6_col17 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row7_col0 {\n",
|
||
" font-size: 6pt;\n",
|
||
" background-color: #ffffff;\n",
|
||
" color: #000000;\n",
|
||
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row7_col1 {\n",
|
||
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row7_col2 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row7_col3 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row7_col4 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row7_col5 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row7_col6 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row7_col7 {\n",
" font-size: 6pt;\n",
" background-color: #f8f8f8;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row7_col8 {\n",
" font-size: 6pt;\n",
" background-color: #b6b6b6;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row7_col9 {\n",
" font-size: 6pt;\n",
" background-color: #252525;\n",
" color: #f1f1f1;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row7_col10 {\n",
" font-size: 6pt;\n",
" background-color: #010101;\n",
" color: #f1f1f1;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row7_col11 {\n",
" font-size: 6pt;\n",
" background-color: #060606;\n",
" color: #f1f1f1;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row7_col12 {\n",
" font-size: 6pt;\n",
" background-color: #999999;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row7_col13 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row7_col14 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row7_col15 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row7_col16 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row7_col17 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row8_col0 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row8_col1 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row8_col2 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row8_col3 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row8_col4 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row8_col5 {\n",
" font-size: 6pt;\n",
" background-color: #f9f9f9;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row8_col6 {\n",
" font-size: 6pt;\n",
" background-color: #6b6b6b;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row8_col7 {\n",
" font-size: 6pt;\n",
" background-color: #101010;\n",
" color: #f1f1f1;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row8_col8 {\n",
" font-size: 6pt;\n",
" background-color: #010101;\n",
" color: #f1f1f1;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row8_col9 {\n",
" font-size: 6pt;\n",
" background-color: #020202;\n",
" color: #f1f1f1;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row8_col10 {\n",
" font-size: 6pt;\n",
" background-color: #010101;\n",
" color: #f1f1f1;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row8_col11 {\n",
" font-size: 6pt;\n",
" background-color: #545454;\n",
" color: #f1f1f1;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row8_col12 {\n",
" font-size: 6pt;\n",
" background-color: #f1f1f1;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row8_col13 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row8_col14 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row8_col15 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row8_col16 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row8_col17 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row9_col0 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row9_col1 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row9_col2 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row9_col3 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row9_col4 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row9_col5 {\n",
" font-size: 6pt;\n",
" background-color: #f7f7f7;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row9_col6 {\n",
" font-size: 6pt;\n",
" background-color: #060606;\n",
" color: #f1f1f1;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row9_col7 {\n",
" font-size: 6pt;\n",
" background-color: #030303;\n",
" color: #f1f1f1;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row9_col8 {\n",
" font-size: 6pt;\n",
" background-color: #010101;\n",
" color: #f1f1f1;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row9_col9 {\n",
" font-size: 6pt;\n",
" background-color: #020202;\n",
" color: #f1f1f1;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row9_col10 {\n",
" font-size: 6pt;\n",
" background-color: #010101;\n",
" color: #f1f1f1;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row9_col11 {\n",
" font-size: 6pt;\n",
" background-color: #000000;\n",
" color: #f1f1f1;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row9_col12 {\n",
" font-size: 6pt;\n",
" background-color: #181818;\n",
" color: #f1f1f1;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row9_col13 {\n",
" font-size: 6pt;\n",
" background-color: #303030;\n",
" color: #f1f1f1;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row9_col14 {\n",
" font-size: 6pt;\n",
" background-color: #a9a9a9;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row9_col15 {\n",
" font-size: 6pt;\n",
" background-color: #fefefe;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row9_col16 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row9_col17 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row10_col0 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row10_col1 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row10_col2 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row10_col3 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row10_col4 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row10_col5 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row10_col6 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row10_col7 {\n",
" font-size: 6pt;\n",
" background-color: #e8e8e8;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row10_col8 {\n",
" font-size: 6pt;\n",
" background-color: #bababa;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row10_col9 {\n",
" font-size: 6pt;\n",
" background-color: #bababa;\n",
" color: #000000;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row10_col10 {\n",
" font-size: 6pt;\n",
" background-color: #393939;\n",
" color: #f1f1f1;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row10_col11 {\n",
" font-size: 6pt;\n",
" background-color: #000000;\n",
" color: #f1f1f1;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row10_col12 {\n",
" font-size: 6pt;\n",
" background-color: #000000;\n",
" color: #f1f1f1;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row10_col13 {\n",
" font-size: 6pt;\n",
" background-color: #000000;\n",
" color: #f1f1f1;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row10_col14 {\n",
" font-size: 6pt;\n",
" background-color: #000000;\n",
" color: #f1f1f1;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row10_col15 {\n",
" font-size: 6pt;\n",
" background-color: #000000;\n",
" color: #f1f1f1;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row10_col16 {\n",
" font-size: 6pt;\n",
" background-color: #000000;\n",
" color: #f1f1f1;\n",
" } #T_507bccae_5414_11ea_8833_a372988ddbd9row10_col17 {\n",
" font-size: 6pt;\n",
" background-color: #ffffff;\n",
" color: #000000;\n",
" }</style><table id=\"T_507bccae_5414_11ea_8833_a372988ddbd9\" ><thead> <tr> <th class=\"blank level0\" ></th> <th class=\"col_heading level0 col0\" >0</th> <th class=\"col_heading level0 col1\" >1</th> <th class=\"col_heading level0 col2\" >2</th> <th class=\"col_heading level0 col3\" >3</th> <th class=\"col_heading level0 col4\" >4</th> <th class=\"col_heading level0 col5\" >5</th> <th class=\"col_heading level0 col6\" >6</th> <th class=\"col_heading level0 col7\" >7</th> <th class=\"col_heading level0 col8\" >8</th> <th class=\"col_heading level0 col9\" >9</th> <th class=\"col_heading level0 col10\" >10</th> <th class=\"col_heading level0 col11\" >11</th> <th class=\"col_heading level0 col12\" >12</th> <th class=\"col_heading level0 col13\" >13</th> <th class=\"col_heading level0 col14\" >14</th> <th class=\"col_heading level0 col15\" >15</th> <th class=\"col_heading level0 col16\" >16</th> <th class=\"col_heading level0 col17\" >17</th> </tr></thead><tbody>\n",
" <tr>\n",
" <th id=\"T_507bccae_5414_11ea_8833_a372988ddbd9level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row0_col0\" class=\"data row0 col0\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row0_col1\" class=\"data row0 col1\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row0_col2\" class=\"data row0 col2\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row0_col3\" class=\"data row0 col3\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row0_col4\" class=\"data row0 col4\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row0_col5\" class=\"data row0 col5\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row0_col6\" class=\"data row0 col6\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row0_col7\" class=\"data row0 col7\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row0_col8\" class=\"data row0 col8\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row0_col9\" class=\"data row0 col9\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row0_col10\" class=\"data row0 col10\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row0_col11\" class=\"data row0 col11\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row0_col12\" class=\"data row0 col12\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row0_col13\" class=\"data row0 col13\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row0_col14\" class=\"data row0 col14\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row0_col15\" class=\"data row0 col15\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row0_col16\" class=\"data row0 col16\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row0_col17\" class=\"data row0 col17\" >0</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_507bccae_5414_11ea_8833_a372988ddbd9level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row1_col0\" class=\"data row1 col0\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row1_col1\" class=\"data row1 col1\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row1_col2\" class=\"data row1 col2\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row1_col3\" class=\"data row1 col3\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row1_col4\" class=\"data row1 col4\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row1_col5\" class=\"data row1 col5\" >29</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row1_col6\" class=\"data row1 col6\" >150</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row1_col7\" class=\"data row1 col7\" >195</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row1_col8\" class=\"data row1 col8\" >254</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row1_col9\" class=\"data row1 col9\" >255</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row1_col10\" class=\"data row1 col10\" >254</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row1_col11\" class=\"data row1 col11\" >176</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row1_col12\" class=\"data row1 col12\" >193</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row1_col13\" class=\"data row1 col13\" >150</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row1_col14\" class=\"data row1 col14\" >96</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row1_col15\" class=\"data row1 col15\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row1_col16\" class=\"data row1 col16\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row1_col17\" class=\"data row1 col17\" >0</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_507bccae_5414_11ea_8833_a372988ddbd9level0_row2\" class=\"row_heading level0 row2\" >2</th>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row2_col0\" class=\"data row2 col0\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row2_col1\" class=\"data row2 col1\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row2_col2\" class=\"data row2 col2\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row2_col3\" class=\"data row2 col3\" >48</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row2_col4\" class=\"data row2 col4\" >166</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row2_col5\" class=\"data row2 col5\" >224</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row2_col6\" class=\"data row2 col6\" >253</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row2_col7\" class=\"data row2 col7\" >253</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row2_col8\" class=\"data row2 col8\" >234</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row2_col9\" class=\"data row2 col9\" >196</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row2_col10\" class=\"data row2 col10\" >253</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row2_col11\" class=\"data row2 col11\" >253</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row2_col12\" class=\"data row2 col12\" >253</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row2_col13\" class=\"data row2 col13\" >253</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row2_col14\" class=\"data row2 col14\" >233</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row2_col15\" class=\"data row2 col15\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row2_col16\" class=\"data row2 col16\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row2_col17\" class=\"data row2 col17\" >0</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_507bccae_5414_11ea_8833_a372988ddbd9level0_row3\" class=\"row_heading level0 row3\" >3</th>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row3_col0\" class=\"data row3 col0\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row3_col1\" class=\"data row3 col1\" >93</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row3_col2\" class=\"data row3 col2\" >244</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row3_col3\" class=\"data row3 col3\" >249</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row3_col4\" class=\"data row3 col4\" >253</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row3_col5\" class=\"data row3 col5\" >187</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row3_col6\" class=\"data row3 col6\" >46</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row3_col7\" class=\"data row3 col7\" >10</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row3_col8\" class=\"data row3 col8\" >8</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row3_col9\" class=\"data row3 col9\" >4</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row3_col10\" class=\"data row3 col10\" >10</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row3_col11\" class=\"data row3 col11\" >194</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row3_col12\" class=\"data row3 col12\" >253</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row3_col13\" class=\"data row3 col13\" >253</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row3_col14\" class=\"data row3 col14\" >233</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row3_col15\" class=\"data row3 col15\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row3_col16\" class=\"data row3 col16\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row3_col17\" class=\"data row3 col17\" >0</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_507bccae_5414_11ea_8833_a372988ddbd9level0_row4\" class=\"row_heading level0 row4\" >4</th>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row4_col0\" class=\"data row4 col0\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row4_col1\" class=\"data row4 col1\" >107</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row4_col2\" class=\"data row4 col2\" >253</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row4_col3\" class=\"data row4 col3\" >253</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row4_col4\" class=\"data row4 col4\" >230</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row4_col5\" class=\"data row4 col5\" >48</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row4_col6\" class=\"data row4 col6\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row4_col7\" class=\"data row4 col7\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row4_col8\" class=\"data row4 col8\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row4_col9\" class=\"data row4 col9\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row4_col10\" class=\"data row4 col10\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row4_col11\" class=\"data row4 col11\" >192</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row4_col12\" class=\"data row4 col12\" >253</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row4_col13\" class=\"data row4 col13\" >253</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row4_col14\" class=\"data row4 col14\" >156</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row4_col15\" class=\"data row4 col15\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row4_col16\" class=\"data row4 col16\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row4_col17\" class=\"data row4 col17\" >0</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_507bccae_5414_11ea_8833_a372988ddbd9level0_row5\" class=\"row_heading level0 row5\" >5</th>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row5_col0\" class=\"data row5 col0\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row5_col1\" class=\"data row5 col1\" >3</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row5_col2\" class=\"data row5 col2\" >20</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row5_col3\" class=\"data row5 col3\" >20</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row5_col4\" class=\"data row5 col4\" >15</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row5_col5\" class=\"data row5 col5\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row5_col6\" class=\"data row5 col6\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row5_col7\" class=\"data row5 col7\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row5_col8\" class=\"data row5 col8\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row5_col9\" class=\"data row5 col9\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row5_col10\" class=\"data row5 col10\" >43</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row5_col11\" class=\"data row5 col11\" >224</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row5_col12\" class=\"data row5 col12\" >253</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row5_col13\" class=\"data row5 col13\" >245</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row5_col14\" class=\"data row5 col14\" >74</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row5_col15\" class=\"data row5 col15\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row5_col16\" class=\"data row5 col16\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row5_col17\" class=\"data row5 col17\" >0</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_507bccae_5414_11ea_8833_a372988ddbd9level0_row6\" class=\"row_heading level0 row6\" >6</th>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row6_col0\" class=\"data row6 col0\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row6_col1\" class=\"data row6 col1\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row6_col2\" class=\"data row6 col2\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row6_col3\" class=\"data row6 col3\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row6_col4\" class=\"data row6 col4\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row6_col5\" class=\"data row6 col5\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row6_col6\" class=\"data row6 col6\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row6_col7\" class=\"data row6 col7\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row6_col8\" class=\"data row6 col8\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row6_col9\" class=\"data row6 col9\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row6_col10\" class=\"data row6 col10\" >249</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row6_col11\" class=\"data row6 col11\" >253</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row6_col12\" class=\"data row6 col12\" >245</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row6_col13\" class=\"data row6 col13\" >126</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row6_col14\" class=\"data row6 col14\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row6_col15\" class=\"data row6 col15\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row6_col16\" class=\"data row6 col16\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row6_col17\" class=\"data row6 col17\" >0</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_507bccae_5414_11ea_8833_a372988ddbd9level0_row7\" class=\"row_heading level0 row7\" >7</th>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row7_col0\" class=\"data row7 col0\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row7_col1\" class=\"data row7 col1\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row7_col2\" class=\"data row7 col2\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row7_col3\" class=\"data row7 col3\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row7_col4\" class=\"data row7 col4\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row7_col5\" class=\"data row7 col5\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row7_col6\" class=\"data row7 col6\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row7_col7\" class=\"data row7 col7\" >14</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row7_col8\" class=\"data row7 col8\" >101</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row7_col9\" class=\"data row7 col9\" >223</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row7_col10\" class=\"data row7 col10\" >253</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row7_col11\" class=\"data row7 col11\" >248</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row7_col12\" class=\"data row7 col12\" >124</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row7_col13\" class=\"data row7 col13\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row7_col14\" class=\"data row7 col14\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row7_col15\" class=\"data row7 col15\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row7_col16\" class=\"data row7 col16\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row7_col17\" class=\"data row7 col17\" >0</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_507bccae_5414_11ea_8833_a372988ddbd9level0_row8\" class=\"row_heading level0 row8\" >8</th>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row8_col0\" class=\"data row8 col0\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row8_col1\" class=\"data row8 col1\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row8_col2\" class=\"data row8 col2\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row8_col3\" class=\"data row8 col3\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row8_col4\" class=\"data row8 col4\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row8_col5\" class=\"data row8 col5\" >11</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row8_col6\" class=\"data row8 col6\" >166</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row8_col7\" class=\"data row8 col7\" >239</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row8_col8\" class=\"data row8 col8\" >253</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row8_col9\" class=\"data row8 col9\" >253</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row8_col10\" class=\"data row8 col10\" >253</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row8_col11\" class=\"data row8 col11\" >187</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row8_col12\" class=\"data row8 col12\" >30</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row8_col13\" class=\"data row8 col13\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row8_col14\" class=\"data row8 col14\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row8_col15\" class=\"data row8 col15\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row8_col16\" class=\"data row8 col16\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row8_col17\" class=\"data row8 col17\" >0</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_507bccae_5414_11ea_8833_a372988ddbd9level0_row9\" class=\"row_heading level0 row9\" >9</th>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row9_col0\" class=\"data row9 col0\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row9_col1\" class=\"data row9 col1\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row9_col2\" class=\"data row9 col2\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row9_col3\" class=\"data row9 col3\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row9_col4\" class=\"data row9 col4\" >0</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row9_col5\" class=\"data row9 col5\" >16</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row9_col6\" class=\"data row9 col6\" >248</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row9_col7\" class=\"data row9 col7\" >250</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row9_col8\" class=\"data row9 col8\" >253</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row9_col9\" class=\"data row9 col9\" >253</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row9_col10\" class=\"data row9 col10\" >253</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row9_col11\" class=\"data row9 col11\" >253</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row9_col12\" class=\"data row9 col12\" >232</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row9_col13\" class=\"data row9 col13\" >213</td>\n",
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row9_col14\" class=\"data row9 col14\" >111</td>\n",
|
||
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row9_col15\" class=\"data row9 col15\" >2</td>\n",
|
||
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row9_col16\" class=\"data row9 col16\" >0</td>\n",
|
||
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row9_col17\" class=\"data row9 col17\" >0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th id=\"T_507bccae_5414_11ea_8833_a372988ddbd9level0_row10\" class=\"row_heading level0 row10\" >10</th>\n",
|
||
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row10_col0\" class=\"data row10 col0\" >0</td>\n",
|
||
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row10_col1\" class=\"data row10 col1\" >0</td>\n",
|
||
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row10_col2\" class=\"data row10 col2\" >0</td>\n",
|
||
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row10_col3\" class=\"data row10 col3\" >0</td>\n",
|
||
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row10_col4\" class=\"data row10 col4\" >0</td>\n",
|
||
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row10_col5\" class=\"data row10 col5\" >0</td>\n",
|
||
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row10_col6\" class=\"data row10 col6\" >0</td>\n",
|
||
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row10_col7\" class=\"data row10 col7\" >43</td>\n",
|
||
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row10_col8\" class=\"data row10 col8\" >98</td>\n",
|
||
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row10_col9\" class=\"data row10 col9\" >98</td>\n",
|
||
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row10_col10\" class=\"data row10 col10\" >208</td>\n",
|
||
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row10_col11\" class=\"data row10 col11\" >253</td>\n",
|
||
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row10_col12\" class=\"data row10 col12\" >253</td>\n",
|
||
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row10_col13\" class=\"data row10 col13\" >253</td>\n",
|
||
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row10_col14\" class=\"data row10 col14\" >253</td>\n",
|
||
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row10_col15\" class=\"data row10 col15\" >187</td>\n",
|
||
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row10_col16\" class=\"data row10 col16\" >22</td>\n",
|
||
" <td id=\"T_507bccae_5414_11ea_8833_a372988ddbd9row10_col17\" class=\"data row10 col17\" >0</td>\n",
|
||
" </tr>\n",
|
||
" </tbody></table>"
|
||
],
|
||
"text/plain": [
|
||
"<pandas.io.formats.style.Styler at 0x7f24e23b2190>"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"#hide_output\n",
|
||
"im3_t = tensor(im3)\n",
|
||
"df = pd.DataFrame(im3_t[4:15,4:22])\n",
|
||
"df.style.set_properties(**{'font-size':'6pt'}).background_gradient('Greys')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"<img width=\"453\" id=\"output_pd_pixels\" src=\"images/att_00058.png\">"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"You can see that the background white pixels are stored as the number zero, black is the number 255, and shades of grey are between the two. This image contains 28 pixels across and 28 pixels down, for a total of 768 pixels. (This is much smaller than an image that you would get from a phone camera, which has millions of pixels, but is a convenient size for our initial learning and experiments. We will build up to bigger, full-colour images soon.)\n",
|
||
"\n",
|
||
"So, now you've seen what an image looks like to a computer, let's recall our goal: create a model that can recognise “3”s and “7”s. How might you go about getting a computer to do that?\n",
|
||
"\n",
|
||
"> stop: Before you read on, take a moment to think about how a computer might be able to recognize these two different digits. What kind of features might it be able to look at? How might it be able to identify these features? How could it combine them together? Learning works best when you try to solve problems yourself, rather than just reading somebody else's answers; so step away from this book for a few minutes, grab a piece of paper and pen, and jot some ideas down…"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## First try: pixel similarity"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"So, here is a first idea: how about we find the average pixel value for every pixel of the threes and do the same for each of the sevens. Then, to classify a digit see which of these two group averages it is most similar to. This certainly seems like it should be better than nothing, so it will make a good baseline."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"> note: A _baseline_ is a simple model which you are confident should perform reasonably well. It should be very simple to implement, and very easy to test, so that you can then test each of your improved ideas, and make sure they are always better than your baseline. Without starting with a sensible baseline, it is very difficult to know whether your super fancy models are actually any good. One good approach to creating a baseline is doing what we have done here: think of a simple, easy to implement model. Another good approach is to search around to find other people that have solved similar problems to yours, and download and run their code on your dataset. Ideally, try both of these!"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Step one for our simple model is to get the average of pixel values for each of our two groups. In the process of doing this, we will learn a lot of neat Python numeric programming tricks!\n",
|
||
"\n",
|
||
"Let's create a tensor containing all of our threes stacked together. We already know how to create a tensor containing a single image. To create a tensor for every image in a directory, we can use a list comprehension. (Notice also that we use Jupyter to do some little checks of our work along the way; in this case, making sure that the number of returned items seems reasonable):"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"(6131, 6265)"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"seven_tensors = [tensor(Image.open(o)) for o in sevens]\n",
|
||
"three_tensors = [tensor(Image.open(o)) for o in threes]\n",
|
||
"len(three_tensors),len(seven_tensors)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
 note: List">
"> note: List and dictionary comprehensions are a wonderful feature of Python. Many Python programmers use them every day, including all of the authors of this book; they are part of \"idiomatic Python\". But programmers coming from other languages may have never seen them before. There are a lot of great tutorials just a web search away, so we won't spend a long time discussing them now. Here is a quick explanation and example to get you started. A list comprehension looks like this: `new_list = [f(o) for o in a_list if o>0]`. This would return every element of `a_list` that is greater than zero, after passing it to the function `f`. There are three parts here: the collection you are iterating over (`a_list`), an optional filter (`if o>0`), and something to do to each element (`f(o)`). It's not only shorter to write, but usually faster than the alternative ways of creating the same list with a loop."
|
||
]
|
||
},
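{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the three parts of a comprehension (the list `a_list` and the function `f` are placeholders invented for this illustration), here is the list comprehension from the note next to the loop it replaces, plus the dictionary form:\n",
"\n",
"```python\n",
"a_list = [-2, -1, 0, 1, 2, 3]  # made-up input data\n",
"\n",
"def f(o):\n",
"    # Hypothetical stand-in for 'something to do to each element'\n",
"    return o * 10\n",
"\n",
"# Collection (a_list), optional filter (if o > 0), per-element function (f(o))\n",
"new_list = [f(o) for o in a_list if o > 0]\n",
"\n",
"# The equivalent explicit loop\n",
"looped = []\n",
"for o in a_list:\n",
"    if o > 0:\n",
"        looped.append(f(o))\n",
"\n",
"# A dictionary comprehension has the same three parts, but builds key: value pairs\n",
"new_dict = {o: f(o) for o in a_list if o > 0}\n",
"\n",
"assert new_list == looped == [10, 20, 30]\n",
"assert new_dict == {1: 10, 2: 20, 3: 30}\n",
"```"
]
},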
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"We'll also check that one of the images looks okay. Since we now have tensors (which Jupyter by default will print as values), rather than PIL images (which Jupyter by default will display as an image), we need to use fastai's show_image function to display it:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAEQAAABECAYAAAA4E5OyAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAADjElEQVR4nO2aPyh9YRjHP/f4k38L5X+ysohsUpTBhEVMJGUyGAwWg0kGkcFqlMFIyv+kSGIwKWUiUvKn5P/9DXrvcR+He+695957+vV8llPnvvd9n77n2/s8z3tOIBgMothYqQ7Ab6ggAhVEoIIIVBBBeoTf/+cUFHC6qQ4RqCACFUSggghUEIEKIlBBBCqIQAURRKpUPeHh4QGAyclJAI6PjwFYXl4GIBgMEgh8FY59fX0A3N7eAlBTUwNAU1MTAC0tLQmNVR0iCEQ4MYupl7m4uABgYmICgJWVFQDOz8/DxhUVFQFQX18fGvMbxcXFAFxeXsYSkhPay7jBkz1ke3sbgLa2NgBeX18BeH9/B6CzsxOAnZ0dAAoLCwFC+4ZlWXx8fISNXVpa8iK0qFGHCDxxyN3dHQBPT09h98vLywGYmpoCoKys7Nc5LMsKu0p6enrijtMN6hCBJ1nm8/MTgOfn57D75mlnZWVFnOPq6gqAxsZGwM5I2dnZAOzu7gJQW1vrJiQ3aJZxgyd7iHFCTk5OzHNUVlYCdmYyzjDVrYfO+BN1iCApvYzk5eUFgM3NTQCGhoZCzsjMzARgenoagIGBgaTGpg4RJMUhpnIdHh4GYH5+HrDrl++0t7cD0NXVlYzQfqAOESSk25WY+iQ/Px8g1LeYqxMlJSUAlJaWAjAyMgLYvY7pg+LAcYKkCCIxRdjJyUno3tjYGAD7+/t//tcIMjc3B0Bubm6sYWhh5oaUOMSJt7c3wHaPScn9/f2O4w8PDwGoq6uLdUl1iBtSUpg5kZGRAUBFRQUAvb29AKyurgKwsLAQNn5tbQ2IyyGOqEMEvnGIxKTV39JrdXV1QtZVhwh8k2Uke3t7ADQ3NwP2sYDh5uYGgIKCgliX0CzjBt/tIWdnZwAMDg4CP51h6pK8vLyErK8OEfhmDzF1RUdHB2AfIhnMEePp6Slg1y1xoHuIG1K6h1xfXwMwOzvL+Pg48PVpxHfMS+6trS3AE2f8iTpE4KlDzBPf2NgA7I9bHh8fATg4OADg6OgIsM807u/vQ3OkpaUB9qvLmZkZIHFZRaIOEXiaZbq7uwFYXFyMOpDW1lYARkdHAWhoaIh6jijRLOMGTx1iPnIxtUQkzEHy+vo6VVVVXwHFf3jsFnWIG3xTqaYAdYgbVBCBCiJQQQQqiCBSL5O0osAvqEMEKohABRGoIAIVRKCCCP4B/PMI7HrW9/wAAAAASUVORK5CYII=\n",
|
||
"text/plain": [
|
||
"<Figure size 72x72 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"show_image(three_tensors[1]);"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"We want to take the average of the pixels, which means we need to combine all of the items of this list into a single three-dimensional tensor. The most common way to describe such a tensor is to call it a *rank-3 tensor*. We often need to stack up individual tensors in a collection into a single tensor. Unsurprisingly, PyTorch comes with a function called `stack`. Some things in PyTorch, such as taking a mean, require us to cast our integer types to float types. Since we'll be needing this later, we'll cast our tensor to `float` now. Casting in PyTorch is as simple as typing the name of the type you wish to cast to, and treating it as a method. Generally when images are floats, the pixels are expected to be be zero and one, so we also divide by 255 here."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"torch.Size([6131, 28, 28])"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"stacked_sevens = torch.stack(seven_tensors).float()/255\n",
|
||
"stacked_threes = torch.stack(three_tensors).float()/255\n",
|
||
"stacked_threes.shape"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Perhaps the most important attribute of tensor is its shape. This tells you the length of each axis. In this case, we can see that we have 6131 images, each of size 28 x 28 pixels. There is nothing specifically about this tensor that says that the first axis is the number of images, the second is the height, and the third is the width — the semantics of a tensor are entirely up to us, and how we construct it. As far as PyTorch is concerned, it is just a bunch of numbers in memory.\n",
|
||
"\n",
|
||
"The length of a tensor's shape is its rank."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"3"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"len(stacked_threes.shape)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"> Important: it's really important for you to commit to memory and practice these bits of tensor jargon: _rank_ is the number of axes or dimensions in a tensor; _shape_ is the size of each axis of a tensor."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"You can also get a tensor's rank directly with `ndim`."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"3"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"stacked_threes.ndim"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Now we can calculate the mean across all of these tensors by taking the mean of dimension zero."
|
||
]
|
||
},
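{
"cell_type": "markdown",
"metadata": {},
"source": [
"Written as a formula (in our own notation, introduced only for this illustration), let $x_{n,i,j}$ be the pixel in row $i$ and column $j$ of image $n$, with $N$ images stacked along dimension zero. Taking the mean over that dimension gives one average value per pixel position:\n",
"\n",
"$$\\bar{x}_{i,j} = \\frac{1}{N} \\sum_{n=1}^{N} x_{n,i,j}$$\n",
"\n",
"The image axis disappears, so for our rank-3 tensor of threes the result is a single rank-2 tensor of size 28 x 28."
]
},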
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAEQAAABECAYAAAA4E5OyAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAE1klEQVR4nO2byU8jPRTEf2En7AgQO4gDi9hO8P9fOIE4AGIR+xoU1kAgQAKZA6o4eUMU6O7R983IdbE66XYHv3K98rOJ5fN5PByq/usf8H+DHxADPyAGfkAM/IAY1FT4/l9OQbGvPvQMMfADYuAHxMAPiIEfEINKWSYSaL1kW/t9MWKx2JfX5T6PCp4hBpEwxEb+/f0dgFwuB0A2mwXg6emppH1+fgbg9fWVj48PAGprawGIx+MANDc3A9DU1ARAfX19yX3V1dVAeQb9FJ4hBqEYIkZYJmQyGQBub28BuLi4AGBnZweAvb09AM7PzwFIJpOFPsSEvr4+AMbHxwGYnZ0FYHR0FICuri7AMUiMqar6jHFQpniGGARiSDmtkDZcXV0BsL+/D8Da2hoAu7u7ABwcHACOOalUipeXl5J3tLW1AXB6elrS58LCAgBTU1MA9Pf3A44pYkhQeIYYhGKIMoMYoigXZw+AmprP13R2dgJuvo+NjQGf2qNnbm5uSvp4e3sD4O7uDnC6I41Rn8pK+m1eQyJCJFlG0VDkW1tbARgaGgKgo6MDcFEX5CFyuVwhIx0dHQFwcnICOF3SO8RGsbOc+w0KzxCDUAyRoivSmse6lvIrGymqgqKayWRIJBIAXF9fl/Ste6RD8il6V11dXcn93qlGjEAMURQUFUVP19ISO8/FFGWfx8dH4NNjbGxsALC1tQU4DWloaAAc2wYGBgCXXRobGwHHyrDwDDGIREMsY8QMtVrjKMtcXl4CsLm5CcDq6irr6+uAc7fqc35+HoDe3l7AOVNlsqjWMIW/KdTT/yBCaUglVyjNSKVSgIv+0tISACsrKwAsLy8XHKhYJScqBrS0tABOU6JmhuAZYhBKQyxTBF0rm2ilKp3Q6lcMSSQSBWbIX6gyJv2RPxHburu7S+6zlbOgiLTIbBd9tjygHytBnJ6eBmBkZKTQh50KelZlABWZ2tvbATeFoiol+iljEMnizk4Zu9iTiVIZUNcyZvl8vsAmlR+TySRAwdI/PDwAcHh4CMDw8DDgmGItfFCj5hliEKpAZDWjXDnAFnGkGcXPSyskmipEizlKy/peDBJTrFELWijyDDH4EUMsI2xrmaPoKDVqnlsUM0T3SDO03NcC0pYrldpticFrSEQIpCG2uKxWURLEEEVLGcBuFcRisd88ixiirKN32ixiWesXdxEjlIbYTWw7nxUt6YLdqC4uHIsRcqTb29uA8yF6l5yptEV9e6f6hxDKh2i+q/CjzSQ5UGUCRdEu3IR0Os3x8THgmCEfoj61ua1FXU9PD+BKi9apBoVniMGPGGLnp90qSKfTgCsEyV1KY3SfXcmmUqlCiUDPaAtzcHAQcI50cnIScAUkOVT5FJ9lIkYgDZGiKyrSBm0JCPf394DTA2UQfS6NyWazBdboGMTMzAwAc3NzACwuLgIwMTEBOA0pVw8JCs8Qg0AaomhK2VUA1haBXcvY+8/OzgDnQuPxeEEjdERCtZNymmFLh2Gzi+AZYhCrcIzgyy8rHcOUY1V2kQtVK99SfBRTkbetdMkew4xg+8H/e8h3EIghZW8uc4S70tFu+N3jVGojgGfIdxApQ/4yeIZ8B35ADPyAGFRyqtH+d85fAM8QAz8gBn5ADPyAGPgBMfADYvALMumtb+Vr5kIAAAAASUVORK5CYII=\n",
|
||
"text/plain": [
|
||
"<Figure size 72x72 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"mean3 = stacked_threes.mean(0)\n",
|
||
"show_image(mean3);"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"According to this dataset, this is the ideal number three! Let's do the same thing for the sevens, but let's put all the steps together at once to save some time:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAEQAAABECAYAAAA4E5OyAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAElUlEQVR4nO2bSUszWRSGn3JKUGOM84giCA7oQnHj33ejiKCI4MIpxhES5ylx6oW8dTvnMyaWJd1033dzqVRqyLlPnelWgvf3d7yc6v7pG/i3yRvEyBvEyBvEyBvEqKHK/v9yCAo++9ATYuQNYuQNYuQNYuQNYuQNYuQNYuQNYuQNYlQtU42kaj2Wz/YHwaeJY+TvRZUnxOhHhGimNb69vZWNr6+vX25r/Ep1dR9zVl9f/3HDDQ1l29qvUQRFJckTYvQtQiwRmvGXlxcAnp6eALi9vQXg+voagMvLSwAuLi7KxoeHh/A4nUOjrtHU1ARAR0cHAENDQwAMDw8D0NvbC0BraysAjY2NgCPou6R4QowiEaJZfH5+BhwR+XwegIODAwC2t7cB2NvbA+Dw8BCAo6MjAK6urgAoFovhuUSdRhEiMubn5wFYXFwEYGFhoWx/JZ9SqzwhRpEIUXTQrN7f3wNQKBQA2N/fB2B3dxdwpIicm5ubsvMlEgkSiQTgyBB1Oufj4yPgfMXo6GjZtXWc5KNMTKqJkGqZp2ZDnr25uRmA9vZ2AEZGRgDo7u4GnF/Q/nQ6HRKiGRdNq6urgPM3IsFe0+YlUeUJMaqJEJv9aRaUNYqIzs5OAMbGxsr29/f3A44Mbff19QEffkEzrBxlaWkJgLOzM8DlOJlMpmxsa2sDXP4RNbpInhCjb0WZaoTYGkVEaDuVSgGOJM1uQ0NDmNtYKdroXD09PYDzS+l0GnCE/LQa9oQYRap2K1WglhRFDn2/paUF+LPuCIIgPEY5irLb8/NzwNUyqmEGBgbKrql7+akiPTKSNYx+oH64DCKDCXttS6+vr2G43dzcBGB9fR1wj8zU1BTgHhWFbHsuSamCT91/qB81iKyTFSkiwTo6jXo8NIulUolsNgvA8vIyADs7O4CjTKFcox6VuFuKnhCjSIRU8iX2ubUNJdtYUnGYz+dZWVkBYG1tDXCJ2OzsLACTk5OAC7vJZLLs2nGR4gkximUZwhZalZrOkvarpM/lcmxsbAAuzKoQVENoZmYGcOFXfsq2Cn1iFrNiiTJ22/oSjXYZQknY1tYWuVwOcDM/NzcHOB8yODgIuKRO+YdtGVa6t1rlCTGK1YdUyg7tfvmO09NTALLZbLgkoTxjenoagImJCcBlprbMj4sMyRNiFOtityVBks8olUqAawIpGy0UCiEBKt7Gx8cB6OrqAqrnHT4P+SXFSkiljFRkaGnz5OQEcO3BIAjCdqKWF7TwpMrZRpXfei3CE2IUCyGVXouwC1la6jw+PgZcvZJKpcJ2oho/2lZe8tPlhVrlCTH6FR+ihrF8h7peWmy6u7sDXE6RyWTCaKJqVv0O+Y64apVq8oQY/corVYou8hHKTLUtMuQnEolE+OLL3z+D6C++RJUnxCiWarfaYriI0EKVljIVhZLJZLjgpIxVmWmlrvpvyRNiFFSZ3Zr+YmZ9iH3lStFGPqRYLJYdFwRBSJHIkA+xL9HF2CHzfzGrRbEQ8sdBFc5po9JX37cE/EIe4gmpRdUI+d/JE2LkDWLkDWLkDWLkDWLkDWL0F7hnDWZImx+vAAAAAElFTkSuQmCC\n",
|
||
"text/plain": [
|
||
"<Figure size 72x72 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"mean7 = stacked_sevens.mean(0)\n",
|
||
"show_image(mean7);"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's now pick a \"3\", and measure its *distance* from each of these \"ideal digits\".\n",
|
||
"\n",
|
||
"> stop: How would you calculate how similar a particular image is from each of our ideal digits? Remember to step away from this book and jot down some ideas, before you move on! Research shows that recall and understanding improves dramatically when you are *engaged* with the learning process by solving problems, experimenting, and trying new ideas yourself\n",
|
||
"\n",
|
||
"Here's our sample \"3\":"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAEQAAABECAYAAAA4E5OyAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAADjElEQVR4nO2aPyh9YRjHP/f4k38L5X+ysohsUpTBhEVMJGUyGAwWg0kGkcFqlMFIyv+kSGIwKWUiUvKn5P/9DXrvcR+He+695957+vV8llPnvvd9n77n2/s8z3tOIBgMothYqQ7Ab6ggAhVEoIIIVBBBeoTf/+cUFHC6qQ4RqCACFUSggghUEIEKIlBBBCqIQAURRKpUPeHh4QGAyclJAI6PjwFYXl4GIBgMEgh8FY59fX0A3N7eAlBTUwNAU1MTAC0tLQmNVR0iCEQ4MYupl7m4uABgYmICgJWVFQDOz8/DxhUVFQFQX18fGvMbxcXFAFxeXsYSkhPay7jBkz1ke3sbgLa2NgBeX18BeH9/B6CzsxOAnZ0dAAoLCwFC+4ZlWXx8fISNXVpa8iK0qFGHCDxxyN3dHQBPT09h98vLywGYmpoCoKys7Nc5LMsKu0p6enrijtMN6hCBJ1nm8/MTgOfn57D75mlnZWVFnOPq6gqAxsZGwM5I2dnZAOzu7gJQW1vrJiQ3aJZxgyd7iHFCTk5OzHNUVlYCdmYyzjDVrYfO+BN1iCApvYzk5eUFgM3NTQCGhoZCzsjMzARgenoagIGBgaTGpg4RJMUhpnIdHh4GYH5+HrDrl++0t7cD0NXVlYzQfqAOESSk25WY+iQ/Px8g1LeYqxMlJSUAlJaWAjAyMgLYvY7pg+LAcYKkCCIxRdjJyUno3tjYGAD7+/t//tcIMjc3B0Bubm6sYWhh5oaUOMSJt7c3wHaPScn9/f2O4w8PDwGoq6uLdUl1iBtSUpg5kZGRAUBFRQUAvb29AKyurgKwsLAQNn5tbQ2IyyGOqEMEvnGIxKTV39JrdXV1QtZVhwh8k2Uke3t7ADQ3NwP2sYDh5uYGgIKCgliX0CzjBt/tIWdnZwAMDg4CP51h6pK8vLyErK8OEfhmDzF1RUdHB2AfIhnMEePp6Slg1y1xoHuIG1K6h1xfXwMwOzvL+Pg48PVpxHfMS+6trS3AE2f8iTpE4KlDzBPf2NgA7I9bHh8fATg4OADg6OgIsM807u/vQ3OkpaUB9qvLmZkZIHFZRaIOEXiaZbq7uwFYXFyMOpDW1lYARkdHAWhoaIh6jijRLOMGTx1iPnIxtUQkzEHy+vo6VVVVXwHFf3jsFnWIG3xTqaYAdYgbVBCBCiJQQQQqiCBSL5O0osAvqEMEKohABRGoIAIVRKCCCP4B/PMI7HrW9/wAAAAASUVORK5CYII=\n",
|
||
"text/plain": [
|
||
"<Figure size 72x72 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"a_3 = stacked_threes[1]\n",
|
||
"show_image(a_3);"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"We can't just add up the differences between the pixels of this image and the ideal digit. Why not?...\n",
|
||
"\n",
|
||
"Because some will be too high, some will be too low, but overall these will balance out. Instead, there's two main ways data scientists measure *distance* in this context:\n",
|
||
"\n",
|
||
"- Take the mean of the *absolute value* of differences (_absolute value_ is the function that replaces negative values with positive values). This is called the *mean absolute difference* or *L1 norm*\n",
|
||
"- Take the mean of the *square* of differences (which makes everything positive) and then take the *square root* (which *undoes* the squaring). This is called the *root mean squared error (RMSE)* or *L2 norm*.\n",
|
||
"\n",
|
||
"> important: in this book we generally assume that you have completed high school maths, and remember at least some of it... But everybody forgets some things! It all depends on what you happen to have had reason to practice in the meantime. Perhaps you have forgotten what a _square root_ is, or exactly how they work. No problem! Any time you come across a maths concept that is not explained fully in this book, don't just keep moving on, but instead stop and look it up. Make sure you understand the basic idea of what that maths concept is, how it works, and why we might be using it. One of the best places to refresh your understanding is Khan Academy. For instance, Khan Academy has a great [introduction to square roots](https://www.khanacademy.org/math/algebra/x2f8bb11595b61c86:rational-exponents-radicals/x2f8bb11595b61c86:radicals/v/understanding-square-roots)."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's try both of these now:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"(tensor(0.1114), tensor(0.2021))"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"dist_3_abs = (a_3 - mean3).abs().mean()\n",
|
||
"dist_3_sqr = ((a_3 - mean3)**2).mean().sqrt()\n",
|
||
"dist_3_abs,dist_3_sqr"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"(tensor(0.1586), tensor(0.3021))"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"dist_7_abs = (a_3 - mean7).abs().mean()\n",
|
||
"dist_7_sqr = ((a_3 - mean7)**2).mean().sqrt()\n",
|
||
"dist_7_abs,dist_7_sqr"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"In both cases, the distance between our `3` and the \"ideal\" `3` is less than the distance to the ideal `7`. So our simple model will give the right prediction in this case."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"> s: Intuitively, the difference between L1 norm and mean squared error (*MSE*) is that the latter will penalize more heavily bigger mistakes than the former (and be more lenient with small mistakes)."
|
||
]
|
||
},
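{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small made-up example (numbers chosen only for illustration) shows the effect. Suppose two images each differ from the ideal digit on two pixels: the first by $0.5$ and $0.5$, the second by $0.1$ and $0.9$. The mean absolute difference is the same for both, but the mean squared error is larger when the error is concentrated in one big mistake:\n",
"\n",
"$$\\frac{0.5 + 0.5}{2} = \\frac{0.1 + 0.9}{2} = 0.5, \\qquad \\frac{0.5^2 + 0.5^2}{2} = 0.25 \\;<\\; \\frac{0.1^2 + 0.9^2}{2} = 0.41$$\n",
"\n",
"Once squared, the small error of $0.1$ contributes only $0.01$, while the large error of $0.9$ contributes $0.81$."
]
},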
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"PyTorch already provides both of these as *loss functions*. You'll find these inside `torch.nn.functional`, which the PyTorch team recommends importing as `F` (and is available by default under that name in fastai). Here *MSE* stands for *mean squared error*, and *L1* refers to the standard mathematical jargon for *mean absolute value* (in math it's called the *L1 norm*)."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"(tensor(0.1586), tensor(0.3021))"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"F.l1_loss(a_3.float(),mean7), F.mse_loss(a_3,mean7).sqrt()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"> j: When I first came across this \"L1\" thingie, I looked it up to see what on Earth it meant, found on Google that it is a _vector norm_ using _absolute value_, so looked up _vector norm_ and started reading: _Given a vector space V over a field F of the real or complex numbers, a norm on V is a nonnegative-valued any function p: V → \\[0,+∞) with the following properties: For all a ∈ F and all u, v ∈ V, p(u + v) ≤ p(u) + p(v)..._ Then I stopped reading. \"Ugh, I'll never understand math!\" I thought, for the thousandth time. Since then I've learned that every time these complex mathy bits of jargon come up in practice, it turns out I can replace them with a tiny bit of code! Like the _L1 loss_ is just equal to `(a-b).abs().mean()`, where `a` and `b` are tensors. I guess mathy folks just think differently to me... I'll make sure, in this book, every time some mathy jargon comes up, I'll give you the little bit of code it's equal to as well, and explain in common sense terms what's going on."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### NumPy arrays and PyTorch tensors"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"In the above code we completed various mathematical operations on *PyTorch tensors*. If you've done some numeric programming in Pytorch before, you may recognize these as being similar to *Numpy arrays*. [Numpy](https://numpy.org/) is the most widely used library for scientific and numeric programming in Python, and provides very similar functionality and a very similar API to that provided by PyTorch; however, it does not support using the GPU, or calculating gradients, which are both critical for deep learning. Therefore, in this book we will generally use PyTorch tensors instead of NumPy arrays, where possible. (Note that fastai adds some features to NumPy and PyTorch to make them a bit more similar to each other; if any code in this book doesn't work on your computer, it's possible that you forgot to include a line at the start of your notebook such as: `from fastai.vision.all import *`.)\n",
|
||
"\n",
|
||
"So, what's an array? And what's a tensor?\n",
|
||
"\n",
|
||
"And why should you care?"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"A numpy array is multidimensional table of data, with all items of the same type. Since that can be any type at all, they could even be arrays of arrays, with the innermost array potentially being different sizes — this is called a \"jagged array\". By \"multidimensional table\" we mean, for instance, a list (dimension of one), a table or matrix (dimension of two), a \"table of tables\" or a \"cube\" (dimension of three), and so forth. If the items are all of some simple type such as an integer or a float then numpy will store them as a compact C data structure in memory. This is where numpy shines. Numpy has a wide variety of operators and methods which can run computations on these compact structures at the same speed as optimized C, because they are written in optimized C!\n",
|
||
"\n",
|
||
"**Arrays and tensors can finish computations many thousands of times faster than using pure Python!**\n",
|
||
"A PyTorch tensor is nearly the same thing. It, too, is a multidimensional table of data, with all items of the same type. However, they cannot be just any old type — they have to be a basic numeric type. Therefore, a PyTorch tensor cannot be a jagged array. It is always a regularly shaped multidimensional rectangular structure. The vast majority of methods and operators supported by numpy on these structures are also supported by PyTorch. But PyTorch has the very big benefit that these structures can live on the GPU, in which case this computation will be optimised for the GPU. And furthermore, PyTorch can automatically calculate derivatives of these operations, including combinations of them. As you'll see, it would be impossible to do deep learning in practice without this capability.\n",
|
||
"\n",
|
||
"> s: If you don't know what C is, do not worry as you won't need it at all. In a nutshell, it's a low-level (low-level means more similar to the language that computers use internally) language that is very fast compared to Python. To take advantage of its speed while programming in Python, try to avoid as much as possible writing loops and replace them by commands that work directly on arrays or tensors.\n",
|
||
"\n",
|
||
"Perhaps the most important new coding skill for a Python programmer to learn is how to effectively use the array/tensor APIs. We will be showing lots more tricks later in this book, but here's a summary of the key things you need to know for now."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"To create an array or tensor, pass a list (or list of lists, or list of lists of lists, etc), to `array()` or `tensor()`:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"data = [[1,2,3],[4,5,6]]\n",
|
||
"arr = array (data)\n",
|
||
"tns = tensor(data)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"array([[1, 2, 3],\n",
|
||
" [4, 5, 6]])"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"arr # numpy"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"tensor([[1, 2, 3],\n",
|
||
" [4, 5, 6]])"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"tns # pytorch"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"All the operations below are shown on tensors - the syntax and results for NumPy arrays is idential.\n",
|
||
"\n",
|
||
"You can select a row:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"tensor([4, 5, 6])"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"tns[1]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"...or a column, using `:` to indicate *all of the first axis* (we sometimes refer to the dimensions of tensors/arrays as *axes*):"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"tensor([2, 5])"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"tns[:,1]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"We can combine these, along with Python slice syntax (`[start:end]`, `end` being excluded)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"tensor([5, 6])"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"tns[1,1:3]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"We can use the standard operators:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"tensor([[2, 3, 4],\n",
|
||
" [5, 6, 7]])"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"tns+1"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Tensors have a type:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"'torch.LongTensor'"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"tns.type()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Tensors will automatically change from int to float if needed"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"tensor([[1.5000, 3.0000, 4.5000],\n",
|
||
" [6.0000, 7.5000, 9.0000]])"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"tns*1.5"
|
||
]
|
||
},
|
||
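As a quick check of the claim that NumPy arrays behave the same way, here is a minimal sketch that repeats the operations above on a NumPy array. It assumes NumPy is imported explicitly as `np` (the notebook gets `array` from its star imports instead), and the variable names `row`, `col`, `part`, `plus_one` and `scaled` are made up for this illustration. The exact integer width reported by `dtype` can differ between platforms.

```python
import numpy as np

data = [[1, 2, 3], [4, 5, 6]]
arr = np.array(data)

row = arr[1]          # second row
col = arr[:, 1]       # second column: all of the first axis, index 1 of the second
part = arr[1, 1:3]    # second row, columns 1 and 2 (the end index 3 is excluded)
plus_one = arr + 1    # elementwise addition
scaled = arr * 1.5    # multiplying by a float yields a float array

print(row, col, part)
print(plus_one)
print(arr.dtype, scaled.dtype)
```

The indexing and arithmetic lines are character-for-character what you would write for `tns`; only the construction and the type names differ.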
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Broadcasting and metrics"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"So, is our baseline model any good? To quantify this, we will use a metric. A metric is a number which is calculated from the predictions of our model, and the correct labels in our dataset, and tells us something about how good our model is. For instance, we could use either of the functions we saw in the previous section, mean squared error or mean absolute error, and take the average of them over the whole dataset. However, neither of these are numbers that are very understandable to most people; in practice, we normally use *accuracy* as the metric for classification models.\n",
|
||
"\n",
|
||
"As we've discussed, we need to use a *validation set* to calculate our metric. That means we need to do is remove some of the data from training entirely, so it is not seen by the model at all. As it turns out, the creators of the MNIST dataset have already done this for us. Do you remember how there was a whole separate directory called \"valid\"? That's what this directory is for!\n",
|
||
"\n",
|
||
"So to start with, let's create tensors for our threes and sevens from that directory."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"(torch.Size([1010, 28, 28]), torch.Size([1028, 28, 28]))"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"valid_3_tens = torch.stack([tensor(Image.open(o)) for o in (path/'valid'/'3').ls()])\n",
|
||
"valid_3_tens = valid_3_tens.float()/255\n",
|
||
"valid_7_tens = torch.stack([tensor(Image.open(o)) for o in (path/'valid'/'7').ls()])\n",
|
||
"valid_7_tens = valid_7_tens.float()/255\n",
|
||
"valid_3_tens.shape,valid_7_tens.shape"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Now we need a function that decides if a digit is a 3 or a 7. We need to know which of our \"ideal digits\" its closer to. First, we need a function that calculates the distance from a dataset to an ideal image. It turns out we can do that very simply, in this case calculating the mean absolute error:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"tensor(0.1114)"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"def mnist_distance(a,b): return (a-b).abs().mean((-1,-2))\n",
|
||
"mnist_distance(a_3, mean3)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Something very interesting happens when we run this function on the whole set of threes in the validation set:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"(tensor([0.1050, 0.1526, 0.1186, ..., 0.1122, 0.1170, 0.1086]),\n",
|
||
" torch.Size([1010]))"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"valid_3_dist = mnist_distance(valid_3_tens, mean3)\n",
|
||
"valid_3_dist, valid_3_dist.shape"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"It's returned the distance for every single image, as a vector (i.e. rank 1 tensor) of length 1010 (the number of threes in our validation set). How did that happen? Have a look again at our function `mnist_distance`, and you'll see we have there `(a-b)`. The magic trick is that PyTorch, when it sees two tensors of different ranks, will `broadcast` the tensor with the smaller rank to have the same size as the one with the larger rank. Then, when PyTorch sees an operation on two tensors of the same rank, it completes the operation on each corresponding element of the two tensors, and returns the tensor result. For instance:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"tensor([2, 3, 4])"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"tensor([1,2,3]) + tensor([1,1,1])"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"So in this case, PyTorch treats `mean3`, a rank 2 tensor representing a single image, as if it was 1010 copies of the same image, and then subtracts each of those copies from each \"three\" in our validation set. What shape would you expect this tensor to have? Try to figure it out yourself before you look at the answer below:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"torch.Size([1010, 28, 28])"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"(valid_3_tens-mean3).shape"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"We are calculating the difference between the \"ideal 3\" and each of 1010 threes in the validation set, for each of `28x28` images, resulting in the shape `1010,28,28`.\n",
|
||
"\n",
|
||
"There's a couple of really cool things to know about this operation we just did:\n",
|
||
"\n",
|
||
"- PyTorch doesn't *actually* copy `mean3` 1010 times. Instead, it just *pretends* as if it was a tensor of that shape, but doesn't actually allocate any additional memory\n",
|
||
"- It does the whole calculation in C (or, if you're using a GPU, in CUDA, the equivalent of C on the GPU), tens of thousands of times faster than pure Python (up to millions of times faster on a GPU!)\n",
|
||
"\n",
|
||
"This is true of all broadcasting and elementwise operations and functions done in PyTorch. **It's the most important technique for you to know to create efficient PyTorch code.**\n",
|
||
"\n",
|
||
"Next in `mnist_distance` we see `abs()`. You might be able to guess now what this does when applied to a tensor... It applies the method to each individual element in the tensor, and returns a tensor of the results (that is, it applies the method \"elementwise\"). So in this case, we'll get back 1010 absolute values.\n",
|
||
"\n",
|
||
"Finally, our function calls `mean((-1,-2))`. In Python, `-1` refers to the last element, and `-2` refers to the second last. So in this case, this tells PyTorch that we want to take the mean of the last two axes of the tensor. After taking the mean over the last two axes, we are left with just the first axis, which is why our final size was `(1010)`.\n",
|
||
"\n",
|
||
"We'll be learning lots more about broadcasting throughout this book, especially in <<chapter_foundations>>, and will be practising it regularly too.\n",
|
||
"\n",
|
||
"We can use this `mnist_distance` to figure out whether an image is a three or not by using the logic: if the distance between the digit in question and the ideal 3 is less than the distance to the ideal 7, then it's a 3. This function will automatically do broadcasting and be applied elementwise, just like all PyTorch functions and operators."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"def is_3(x): return mnist_distance(x,mean3) < mnist_distance(x,mean7)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's test it on our example case (note also that when we convert the boolean response to a float, we get a `1.0` for true and `0.0` for false):"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"(tensor(True), tensor(1.))"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"is_3(a_3), is_3(a_3).float()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"And testing it on the full validation set of threes:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"tensor([True, True, True, ..., True, True, True])"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"is_3(valid_3_tens)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Now we can calculate the accuracy for each of threes and sevens, by taking the average of that function for all threes, and it's inverse for all sevens:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"(tensor(0.9168), tensor(0.9854), tensor(0.9511))"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"accuracy_3s = is_3(valid_3_tens).float() .mean()\n",
|
||
"accuracy_7s = (1 - is_3(valid_7_tens).float()).mean()\n",
|
||
"\n",
|
||
"accuracy_3s,accuracy_7s,(accuracy_3s+accuracy_7s)/2"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"This looks like a pretty good start! We're getting over 90% accuracy on both threes and sevens.\n",
|
||
"\n",
|
||
"But let's be honest: threes and sevens are very different looking digits. And we're only classifying two out of the ten possible digits so far. So we're going to need to do better! To do better, perhaps we should try some deep learning."
|
||
]
|
||
},
|
||
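To recap the broadcasting steps from this section without needing the MNIST files, here is a minimal sketch on stand-in data. The random tensors `imgs`, `ideal_a` and `ideal_b` are assumptions made up for this illustration (they take the place of `valid_3_tens`, `mean3` and `mean7`), so the printed accuracy-style number is meaningless; only the shapes matter. The call to `expand` is used to show the "pretend copy" explicitly.

```python
import torch

# Stand-in data (assumption): 5 random "images" rather than the 1010 validation threes,
# and two random "ideal" images in place of mean3 and mean7.
torch.manual_seed(0)
imgs = torch.rand(5, 28, 28)
ideal_a = torch.rand(28, 28)
ideal_b = torch.rand(28, 28)

def mnist_distance(a, b):
    # (a-b) broadcasts b across the first axis of a; the mean is over the last two axes
    return (a - b).abs().mean((-1, -2))

diff = imgs - ideal_a                  # rank 3 minus rank 2 gives shape (5, 28, 28)
dist = mnist_distance(imgs, ideal_a)   # shape (5,): one distance per image

# expand() returns a broadcast view of the requested shape without copying the data
explicit = (imgs - ideal_a.expand(5, 28, 28)).abs().mean((-1, -2))

# elementwise comparison of two rank-1 tensors gives one boolean per image
closer_to_a = mnist_distance(imgs, ideal_a) < mnist_distance(imgs, ideal_b)
print(diff.shape, dist.shape, closer_to_a.float().mean())
```

The explicit `expand` version and the implicit broadcast give the same distances, which is a handy way to convince yourself of what PyTorch is doing behind the scenes.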
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Stochastic Gradient descent (SGD)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Do you remember the way that Arthur Samuel described machine learning, which we quoted in <<chapter_intro>>:\n",
|
||
"\n",
|
||
"> : _Suppose we arrange for some automatic means of testing the effectiveness of any current weight assignment in terms of actual performance and provide a mechanism for altering the weight assignment so as to maximize the performance. We need not go into the details of such a procedure to see that it could be made entirely automatic and to see that a machine so programed would \"learn\" from its experience._\n",
|
||
"\n",
|
||
"As we discussed, this is the key to allowing us to have something which can get better and better — to learn. But our pixel similarity approach does not really do this. We do not have any kind of weight assignment, or any way of improving based on testing the effectiveness of a weight assignment. In other words, we can't really improve our pixel similarity approach by modifying a set of parameters. In order to take advantage of the power of deep learning, we will first have to represent our task in the way that Arthur Samuel described it.\n",
|
||
"\n",
|
||
"Instead of trying to find the similarity between an image and a \"ideal image\" we could instead look at each individual pixel, and come up with a set of weights for each pixel, such that the highest weights are associated with those pixels most likely to be black for a particular category. For instance, pixels towards the bottom right are not very likely to be activated for a seven, so they should have a low weight for a seven, but are more likely to be activated for an eight, so they should have a high weight for an eight. This can be represented as a function for each possible category, for instance the probability of being the number eight:\n",
|
||
"\n",
|
||
"```\n",
|
||
"def pr_eight(x,w) = (x*w).sum()\n",
|
||
"```"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Here we are assuming that X is the image, represented as a vector. In other words, with all of the rows stacked up end to end into a single long line. And we are assuming that the weights are a vector W. If we have this function, then we just need some way to update the weights to make them a little bit better. With such an approach, we can repeat that step a number of times, making the weights better and better, until they are as good as we can make them.\n",
|
||
"\n",
|
||
"We want to find the specific values for the vector W which causes our function to be high for those images that are actually an eight, and low for those images which are not. Searching for the best vector W is a way to search for the best function for recognising eights. (Because we are not yet using a deep neural network, we are limited by what our function can actually do — we are going to fix that constraint later in this chapter.) \n",
|
||
"\n",
|
||
"To be more specific, here are the steps that we are going to require, to turn this function into a machine learning classifier:\n",
|
||
"\n",
|
||
"1. *Initialize* the weights\n",
|
||
"1. For each image, use these weights to *predict* whether it appears to be a three or a seven\n",
|
||
"1. Based on these predictions, calculate how good the model is (its *loss*)\n",
|
||
"1. Calculate the *gradient*, which measures for each weight, how changing that weight would change the loss\n",
|
||
"1. *Step* all weights based on that calculation\n",
|
||
"1. Go back to the second step, and *repeat* the process\n",
|
||
"1. ...until you decide to *stop* the training process (for instance because the model is good enough, or you don't want to wait any longer)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"hide_input": true
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/svg+xml": [
|
||
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
|
||
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
|
||
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
|
||
"<!-- Generated by graphviz version 2.40.1 (20161225.0304)\n",
|
||
" -->\n",
|
||
"<!-- Title: G Pages: 1 -->\n",
|
||
"<svg width=\"591pt\" height=\"78pt\"\n",
|
||
" viewBox=\"0.00 0.00 591.49 78.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
|
||
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 74)\">\n",
|
||
"<title>G</title>\n",
|
||
"<polygon fill=\"#ffffff\" stroke=\"transparent\" points=\"-4,4 -4,-74 587.4867,-74 587.4867,4 -4,4\"/>\n",
|
||
"<!-- init -->\n",
|
||
"<g id=\"node1\" class=\"node\">\n",
|
||
"<title>init</title>\n",
|
||
"<ellipse fill=\"none\" stroke=\"#000000\" cx=\"27\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
|
||
"<text text-anchor=\"middle\" x=\"27\" y=\"-14.3\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">init</text>\n",
|
||
"</g>\n",
|
||
"<!-- predict -->\n",
|
||
"<g id=\"node2\" class=\"node\">\n",
|
||
"<title>predict</title>\n",
|
||
"<ellipse fill=\"none\" stroke=\"#000000\" cx=\"126.0969\" cy=\"-18\" rx=\"35.194\" ry=\"18\"/>\n",
|
||
"<text text-anchor=\"middle\" x=\"126.0969\" y=\"-14.3\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">predict</text>\n",
|
||
"</g>\n",
|
||
"<!-- init->predict -->\n",
|
||
"<g id=\"edge1\" class=\"edge\">\n",
|
||
"<title>init->predict</title>\n",
|
||
"<path fill=\"none\" stroke=\"#000000\" d=\"M54.0787,-18C62.3227,-18 71.6196,-18 80.7269,-18\"/>\n",
|
||
"<polygon fill=\"#000000\" stroke=\"#000000\" points=\"80.8626,-21.5001 90.8626,-18 80.8625,-14.5001 80.8626,-21.5001\"/>\n",
|
||
"</g>\n",
|
||
"<!-- loss -->\n",
|
||
"<g id=\"node3\" class=\"node\">\n",
|
||
"<title>loss</title>\n",
|
||
"<ellipse fill=\"none\" stroke=\"#000000\" cx=\"225.1938\" cy=\"-52\" rx=\"27\" ry=\"18\"/>\n",
|
||
"<text text-anchor=\"middle\" x=\"225.1938\" y=\"-48.3\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">loss</text>\n",
|
||
"</g>\n",
|
||
"<!-- predict->loss -->\n",
|
||
"<g id=\"edge2\" class=\"edge\">\n",
|
||
"<title>predict->loss</title>\n",
|
||
"<path fill=\"none\" stroke=\"#000000\" d=\"M155.2932,-28.0172C166.6224,-31.9043 179.6698,-36.3808 191.4018,-40.406\"/>\n",
|
||
"<polygon fill=\"#000000\" stroke=\"#000000\" points=\"190.2859,-43.7234 200.8806,-43.6582 192.5577,-37.1023 190.2859,-43.7234\"/>\n",
|
||
"</g>\n",
|
||
"<!-- gradient -->\n",
|
||
"<g id=\"node4\" class=\"node\">\n",
|
||
"<title>gradient</title>\n",
|
||
"<ellipse fill=\"none\" stroke=\"#000000\" cx=\"361.8403\" cy=\"-52\" rx=\"39.7935\" ry=\"18\"/>\n",
|
||
"<text text-anchor=\"middle\" x=\"361.8403\" y=\"-48.3\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">gradient</text>\n",
|
||
"</g>\n",
|
||
"<!-- loss->gradient -->\n",
|
||
"<g id=\"edge3\" class=\"edge\">\n",
|
||
"<title>loss->gradient</title>\n",
|
||
"<path fill=\"none\" stroke=\"#000000\" d=\"M252.5178,-52C269.4967,-52 291.836,-52 311.8929,-52\"/>\n",
|
||
"<polygon fill=\"#000000\" stroke=\"#000000\" points=\"312.1329,-55.5001 322.1329,-52 312.1328,-48.5001 312.1329,-55.5001\"/>\n",
|
||
"</g>\n",
|
||
"<!-- step -->\n",
|
||
"<g id=\"node5\" class=\"node\">\n",
|
||
"<title>step</title>\n",
|
||
"<ellipse fill=\"none\" stroke=\"#000000\" cx=\"465.4867\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
|
||
"<text text-anchor=\"middle\" x=\"465.4867\" y=\"-14.3\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">step</text>\n",
|
||
"</g>\n",
|
||
"<!-- gradient->step -->\n",
|
||
"<g id=\"edge4\" class=\"edge\">\n",
|
||
"<title>gradient->step</title>\n",
|
||
"<path fill=\"none\" stroke=\"#000000\" d=\"M394.0665,-41.4286C405.9515,-37.5298 419.4492,-33.1021 431.4862,-29.1535\"/>\n",
|
||
"<polygon fill=\"#000000\" stroke=\"#000000\" points=\"432.7754,-32.4142 441.1862,-25.9715 430.5935,-25.7629 432.7754,-32.4142\"/>\n",
|
||
"</g>\n",
|
||
"<!-- step->predict -->\n",
|
||
"<g id=\"edge6\" class=\"edge\">\n",
|
||
"<title>step->predict</title>\n",
|
||
"<path fill=\"none\" stroke=\"#000000\" d=\"M438.4132,-18C380.3272,-18 243.2155,-18 171.5401,-18\"/>\n",
|
||
"<polygon fill=\"#000000\" stroke=\"#000000\" points=\"171.4571,-14.5001 161.4571,-18 171.4571,-21.5001 171.4571,-14.5001\"/>\n",
|
||
"<text text-anchor=\"middle\" x=\"287.1938\" y=\"-21.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">repeat</text>\n",
|
||
"</g>\n",
|
||
"<!-- stop -->\n",
|
||
"<g id=\"node6\" class=\"node\">\n",
|
||
"<title>stop</title>\n",
|
||
"<ellipse fill=\"none\" stroke=\"#000000\" cx=\"556.4867\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
|
||
"<text text-anchor=\"middle\" x=\"556.4867\" y=\"-14.3\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">stop</text>\n",
|
||
"</g>\n",
|
||
"<!-- step->stop -->\n",
|
||
"<g id=\"edge5\" class=\"edge\">\n",
|
||
"<title>step->stop</title>\n",
|
||
"<path fill=\"none\" stroke=\"#000000\" d=\"M492.7897,-18C501.068,-18 510.3085,-18 519.1272,-18\"/>\n",
|
||
"<polygon fill=\"#000000\" stroke=\"#000000\" points=\"519.203,-21.5001 529.203,-18 519.203,-14.5001 519.203,-21.5001\"/>\n",
|
||
"</g>\n",
|
||
"</g>\n",
|
||
"</svg>\n"
|
||
],
|
||
"text/plain": [
|
||
"<graphviz.files.Source at 0x7f24cd580910>"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"#id gradient_descent\n",
|
||
"#caption The gradient descent process\n",
|
||
"#alt Graph showing the steps for Gradient Descent\n",
|
||
"gv('''\n",
|
||
"init->predict->loss->gradient->step->stop\n",
|
||
"step->predict[label=repeat]\n",
|
||
"''')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"These seven steps are the key to the training of all deep learning models, and we'll be using the seven terms in the above diagram throughout this book. That deep learning turns out to rely entirely on these steps is extremely surprising and counter-intuitive. It's amazing that this process can solve such complex problems. But, as you'll see, it really does!\n",
|
||
"\n",
|
||
"There are many different ways to do each of these seven steps, and we will be learning about them throughout the rest of this book. These are the details which make a big difference for deep learning practitioners. But it turns out that the general approach to each one generally follows some basic principles:\n",
|
||
"\n",
|
||
"- **Initialize**: we initialise the weights to random values. This may sound surprising. There are certainly other choices we could make, such as initialising them to the percentage of times that that pixel is activated for that category. But since we already know that we have a routine to improve these weights, it turns out that just starting with random weights works perfectly well\n",
|
||
"- **Loss**: This is the thing Arthur Samuel refered to: \"*testing the effectiveness of any current weight assignment in terms of actual performance*\". We need some function that will return a number that is small if the performance of the model is good, and vice versa (the standard approach is to treat a small loss as good, and a large loss as bad, although this is just a convention)\n",
|
||
"- **Step**: A simple way to figure out whether a weight should be increased a bit, or decreased a bit, would be just to try it. Increase the weight by a small amount, and see if the loss goes up or down. Once you find the correct direction, you could then change that amount by a bit more, and a bit less, until you find an amount which works well. However, this is slow! As we will see, the magic of calculus allows us to directly figure out which direction, and roughly how much, to change each weight, without having to try all these small changes, by calculating *gradients*. This is just a performance optimisation, we would get exactly the same results by using the slower manual process as well\n",
|
||
"- **Stop**: We have already discussed how to choose how many epochs to train a model for. This is where that decision is applied. For our digit classifier, we would keep training until the accuracy of the model started getting worse, or we ran out of time."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's look at a picture of what this would look like. First we will define a very simple function, the quadratic — let's pretend that this is our loss function:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"def f(x): return x**2"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Here is a graph of that function:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "\n",
|
||
"text/plain": [
|
||
"<Figure size 432x288 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"plot_function(f, 'x', 'x**2')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"The sequence of steps we described above starts by picking some random value for a parameter, and calculating the value of the loss:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "\n",
|
||
"text/plain": [
|
||
"<Figure size 432x288 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"plot_function(f, 'x', 'x**2')\n",
|
||
"plt.scatter(-1.5, f(-1.5), color='red');"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Now we look to see what would happen if we increased or decreased our parameter by a little bit — the *adjustment*. This is simply the slope at a particular point:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"<img alt=\"A graph showing the squared function with the slope at one point\" width=\"400\" caption=\"The slope of a function\" src=\"images/grad_illustration.svg\" id=\"slope\"/>"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"We can change our weight by a little in the direction of the slop, calculate our loss and adjustment again, and repeat this a few times. Eventually, we will get to the lowest point on our curve:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"<img alt=\"An illustration of gradient descent\" width=\"400\" caption=\"Gradient descent\" src=\"images/chapter2_perfect.svg\" id=\"descent\"/>"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"This basic idea goes all the way back to Isaac Newton, who pointed out that we can optimise arbitrary functions in this way. Regardless of how complicated our functions become, this basic approach of gradient descent will not significantly change. The only minor changes we will see later in this book are some handy ways we can make it faster, by finding better steps."
|
||
]
|
||
},
|
||
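The loop described above can be sketched in a few lines of plain Python for our pretend loss `x**2`. This is only an illustration under some assumptions: the slope is written out by hand as `2*x` rather than computed automatically, the step size `lr = 0.1` and the 20 iterations are arbitrary choices, and the names `slope`, `lr` and `history` are made up for this sketch. The starting value is the same point as the red dot plotted earlier.

```python
def f(x): return x**2          # the pretend loss from above
def slope(x): return 2 * x     # slope of x**2, written by hand for this sketch

x = -1.5      # starting value: the red dot in the earlier plot
lr = 0.1      # step size (an assumed value)
history = [f(x)]
for _ in range(20):
    # move a little in the direction that lowers the loss:
    # a negative slope pushes x up, a positive slope pushes it down
    x = x - lr * slope(x)
    history.append(f(x))

print(x, history[-1])
```

Each pass recomputes the slope at the new position, so the steps shrink as we approach the bottom of the curve, and the recorded loss goes down at every step.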
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## The gradient"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"The one magic step is the bit where we calculate the *gradients*. As we mentioned, we can use calculus as a performance optimization; it allows us to more quickly calculate whether our loss will go up or down when we adjust our parameters up or down. In other words, the gradients will tell us how much we have to change each weight to make our model better.\n",
|
||
"\n",
|
||
"Perhaps you remember back to your high school calculus class: the *derivative* of a function tells you how much a change in the parameters of a function will change its result. Don't worry, lots of us forget our calculus once high school is behind us! But you will have to have some intuitive understanding of what a derivative is before you continue, so if this is all very fuzzy in your head, head over to Khan Academy and complete the lessons on basic derivatives. You won't have to know how to calculate them yourselves, you just have to know what a derivative is.\n",
|
||
"\n",
|
||
"The key point about a derivative is this: for any function, such as the quadratic function we saw before, we can calculate its derivative. The derivative is another function. It calculates the change, rather than the value. For instance, the derivative of the quadratic function at the value three tells us how rapidly the function changes at the value three. More specifically, you may remember from high school that gradient is defined as \"rise/run\", that is, the change in the value of the function, divided by the change in the value of the parameter. When we know how our function will change, then we know what we need to do to make it smaller. This is the key to machine learning: having a way to change the parameters of a function to make it smaller. Calculus provides us with a computational shortcut, the derivative, which lets us directly calculate the gradient of our functions."
|
||
]
|
||
},
|
||
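To make "rise/run" concrete, here is a small sketch that estimates the slope of the quadratic at the value three by actually nudging the input and measuring the change. The helper name `rise_over_run` and the nudge size `run=1e-4` are assumptions chosen for this illustration; this is the slow "just try it" approach, not how PyTorch computes gradients.

```python
def f(x): return x**2

def rise_over_run(fn, x, run=1e-4):
    # change in the function's value divided by the change in its input
    return (fn(x + run) - fn(x)) / run

estimate = rise_over_run(f, 3.0)
print(estimate)   # close to 6, the slope of x**2 at x=3
```

The smaller the nudge, the closer this estimate gets to the true derivative; calculus gives us the exact answer directly, without the nudging.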
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"One important thing to be aware of: our function has lots of weights that we need to adjust, so when we calculate the derivative we won't get back one number, but lots of them — a gradient for every weight. But there is nothing mathematically tricky here; you can calculate the derivative with respect to one weight, and treat all the other ones as constant. Then repeat that for each weight. This is how all of the gradients are calculated, for every weight.\n",
|
||
"\n",
|
||
"We mentioned just now that you won't have to calculate any gradients yourselves. How can that be? Amazingly enough, PyTorch is able to automatically compute the derivative of nearly any function! What's more, it does it very fast. Most of the time, it will be at least as fast as any derivative function that you can create by hand. Let's see an example. First, pick a value (which must be a tensor) we want gradients at:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"xt = tensor(3.).requires_grad_()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Notice the special method `requires_grad_`? That's the magical incantation we use to tell PyTorch that we want to calculate gradients for that value.\n",
|
||
"\n",
|
||
"Now we calculate our function with that value (notice how PyTorch prints not just the value calculated, but also a note that it has a gradient function it'll be using to calculate our gradient when needed):"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"tensor(9., grad_fn=<PowBackward0>)"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"yt = f(xt)\n",
|
||
"yt"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Finally, we tell PyTorch to calculate the gradients for us:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"yt.backward()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"> note: The \"backward\" here refers to \"back propagation\", which is the name given to the process of calculating the derivative of each layer (we'll see how this is done exactly in chapter <chapter_foundations>, when we calculate the gradients of a deep neural net from scratch). This is called the \"backward pass\" of the network, as opposed to the \"forward pass\", which is where the activations are calculated. Life would probably be easier if `backward` was just called `calculate_grad`, but deep learning folks really do like to add jargon everywhere they can!"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"We can now view the gradients by checking the `grad` attribute of our tensor:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"tensor(6.)"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"xt.grad"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"If you remember your high school calculus rules, the derivative of `x**2` is `2*x`, and we have `x=3`, so the gradient should be `2*3=6`, which is what PyTorch calculated for us!\n",
|
||
"\n",
|
||
"Now we'll repeat the above steps, but with a vector argument for our function:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"tensor([ 3., 4., 10.], requires_grad=True)"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"xt = tensor([3.,4.,10.]).requires_grad_()\n",
|
||
"xt"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"...and adding `sum()` to our function so it can take a vector (i.e. a *rank-1 tensor*), and return a scalar (i.e. a *rank-0 tensor*):"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"tensor(125., grad_fn=<SumBackward0>)"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"def f(x): return (x**2).sum()\n",
|
||
"\n",
|
||
"yt = f(xt)\n",
|
||
"yt"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Our gradients are `2*xt`, as we'd expect!"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"tensor([ 6., 8., 20.])"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"yt.backward()\n",
|
||
"xt.grad"
|
||
]
|
||
},
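As a sanity check on the gradients above, here is a minimal sketch that approximates each partial derivative of `(x**2).sum()` with a finite difference in plain Python (no PyTorch). The helper name `numeric_grad` and the step size `eps` are assumptions made for illustration and are not part of the notebook. It nudges one input at a time while holding the others constant, which is the per-weight idea described earlier.

```python
def f(xs):
    # Plain-Python version of (x**2).sum()
    return sum(x**2 for x in xs)

def numeric_grad(func, xs, eps=1e-6):
    # Central difference: nudge one coordinate at a time, hold the rest constant
    grads = []
    for i in range(len(xs)):
        up = list(xs)
        up[i] += eps
        down = list(xs)
        down[i] -= eps
        grads.append((func(up) - func(down)) / (2 * eps))
    return grads

point = [3.0, 4.0, 10.0]
approx = numeric_grad(f, point)
exact = [2 * x for x in point]  # analytic gradient, 2*x for each element
```

The two lists should agree to several decimal places. Autograd gets the same answer without the extra function evaluations that a finite difference needs for every input.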
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## The loss function"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"As we've seen, if we are going to calculate gradients (which we need), then we need some *loss function* that represents how good our model is. The obvious approach would be to use the accuracy for this purpose. In this case, we would calculate our prediction for each image, and then calculate the overall accuracy (remember, at first we simply use random weights), and then calculate the gradients of each weight with respect to that accuracy calculation.\n",
|
||
"\n",
|
||
"Unfortunately, we have a significant technical problem here. The gradient of a function is its *slope*, or its steepness, which can be defined as *rise over run* -- that is, how much the value of function goes up or down, divided by how much you changed the input. We can write this in maths: `(y_new-y_old) / (x_new-x_old)`. Specifically, it is defined when x_new is very similar to x_old, meaning that their difference is very small. But accuracy only changes at all when a prediction changes from a 3 to a 7, or vice versa. So the problem is that a small change in weights from from x_old to x_new isn't likely to cause any prediction to change, so `(y_new - y_old)` will be zero. (In other words, the gradient is zero almost everywhere.) As a result, a very small change in the value of a weight will often not actually change the accuracy at all. This means it is not useful to use accuracy as a loss function. When we use accuracy as a loss function, most of the time our gradients will actually be zero, and the model will not be able to learn from that number. That is not much use at all!\n",
|
||
"\n",
|
||
 s: In mathematical terms">
"> s: In mathematical terms, accuracy is a function that is constant almost everywhere (except at the threshold, 0.5), so its derivative is nil almost everywhere (and infinity at the threshold). This gives gradients that are zero or infinite, which are useless for a gradient descent update.\n",
|
||
"\n",
|
||
"Instead, we want a loss function which, when our weights result in slightly better predictions, gives us a slightly better loss. So what does a \"slightly better prediction\" look like, exactly? Well, in this case, it means that, if the correct answer is a 3, then the score is a little higher, or if the correct answer is a 7, then the score is a little lower. Here is a simple implementation of just such a function, assuming that `inputs` are numbers between zero and one:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"def mnist_loss(inputs, targets):\n",
|
||
" return torch.where(targets==1, 1-inputs, inputs).mean()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Here, we're assuming that `targets` contains `1` for any digit which is meant to be a three, and `0` otherwise. Let's look at an example:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"tgt = tensor([1,0,1])\n",
|
||
"inp = tensor([0.9, 0.4, 0.2])"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"`torch.where(a,b,c)` is the same as running the list comprehension `[b[i] if a[i] else c[i] for i in range(len(a))]`, except it works on tensors, at C/CUDA speed. (It's important to learn about PyTorch functions like this, because looping over tensors in Python performs at Python speed, not C/CUDA speed!) Try running `help(torch.where)` now to read the docs for this function, or, better still, look it up on the PyTorch documentation site."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"tensor([0.1000, 0.4000, 0.8000])"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"torch.where(tgt==1, 1-inp, inp)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"You can see that this function will return a lower number if the predictions are more accurate, and more confident for accurate predictions (higher absolute values) and less confident for inaccurate predictions. In PyTorch, we always assume that a lower value of a loss function is better."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"tensor(0.4333)"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"mnist_loss(inp,tgt)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"For instance, if we change our prediction for the one \"false\" target from `0.2` to `0.8` the loss will go down, indicating that this is a better prediction."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"tensor(0.2333)"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"mnist_loss(tensor([0.9, 0.4, 0.8]),tgt)"
|
||
]
|
||
},
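To make the contrast with accuracy concrete, here is a small plain-Python sketch (list-based, not the tensor code used in the notebook). The names `accuracy` and `mnist_loss_py` and the 0.5 threshold are assumptions for illustration. It nudges one wrong prediction slightly towards its target and measures how each quantity responds.

```python
def accuracy(preds, targets, threshold=0.5):
    # Fraction of predictions that fall on the correct side of the threshold
    hits = [(p > threshold) == (t == 1) for p, t in zip(preds, targets)]
    return sum(hits) / len(hits)

def mnist_loss_py(preds, targets):
    # Same rule as mnist_loss: distance from 1 for threes, distance from 0 otherwise
    dists = [1 - p if t == 1 else p for p, t in zip(preds, targets)]
    return sum(dists) / len(dists)

tgt = [1, 0, 1]
before = [0.9, 0.4, 0.2]
after = [0.9, 0.4, 0.3]  # third prediction nudged slightly towards its target

acc_change = accuracy(after, tgt) - accuracy(before, tgt)
loss_change = mnist_loss_py(after, tgt) - mnist_loss_py(before, tgt)
```

Here `acc_change` is zero, because no prediction crossed the threshold, while `loss_change` is slightly negative. That small but non-zero response is what gives gradient descent something to work with.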
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### Sigmoid"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"One problem with `mnist_loss` as currently defined is that it assumes that inputs are always between zero and one. We need to ensure, then, that this is actually the case! As it happens, there is a function that does exactly that--it always outputs a number between one and one. This function is called *sigmoid* and is defined by:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"def sigmoid(x): return 1/(1+torch.exp(-x))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Pytorch actually already defines this for us, so we don’t really need our own version. This is an important function in deep learning, since we often want to ensure values between zero and one. This is what it looks like:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "\n",
|
||
"text/plain": [
|
||
"<Figure size 432x288 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"plot_function(torch.sigmoid, title='Sigmoid', min=-4, max=4)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's update `mnist_loss` to first apply `sigmoid` to the inputs:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"def mnist_loss(inputs, targets):\n",
|
||
" inputs = inputs.sigmoid()\n",
|
||
" return torch.where(targets==1, 1-inputs, inputs).mean()"
|
||
]
|
||
},
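For readers who want to see the squashing behaviour on ordinary floats, here is a minimal sketch using only the standard library. The name `sigmoid_py` is an assumption for illustration. The branch on the sign of `x` is a common way to avoid `math.exp` raising `OverflowError` for very negative inputs; it is not something the notebook's tensor version needs.

```python
import math

def sigmoid_py(x):
    # Logistic function for a single float, written to avoid overflow in exp
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    z = math.exp(x)  # safe: x is negative, so exp(x) is at most 1
    return z / (1 + z)

inputs = (-4.0, -1.0, 0.0, 1.0, 4.0)
values = [sigmoid_py(x) for x in inputs]
```

Every entry of `values` lies strictly between zero and one, the outputs increase with the input, and an input of `0.0` maps to exactly `0.5`.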
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### Sidebar: loss versus metric"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"We now have two terms which are somewhat similar: loss and metric. They are similar because they are both measures of how well your model is performing. The key difference, though, is that the loss must be a function which has a meaningful derivative. It can't have big flat sections, and large jumps, but instead must be reasonably smooth. Therefore, sometimes it does not really reflect exactly what we are trying to achieve, but is something that is a compromise between our real goal, and a function that can be optimised using its gradient. The loss function is calculated for each item in our dataset, and then at the end of an epoch these are all averaged, and the overall mean loss is reported for the epoch.\n",
|
||
"\n",
|
||
"Metrics, on the other hand, are the numbers that we really care about. These are the things which are printed at the end of each epoch, and tell us how our model is really doing. It is important that we learn to focus on these metrics, rather than the loss, when judging the performance of a model."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### End sidebar"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Stepping with a learning rate"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"The gradient only tells us the slope of our function, it doesn't actually tell us how far to adjust the parameters. It gives us some idea of how far to adjust them; if the slope is very large, then that may suggest that we have more adjustments to do, whereas if the slope is very small, that may suggest that we are close to the optimal value.\n",
|
||
"\n",
|
||
"Deciding how to change our parameters based on the value of the gradients is an important part of the deep learning process. Nearly all approaches start with the basic idea of multiplying the gradient by some small number, called the *learning rate* (LR). The learning rate is often a number between 0.001 and 0.1, although it could be anything. Often, people select a learning rate just by trying a few, and finding which results in the best model after training (we'll show you a better approach later in this book, called the *learning rate finder*). Once you've picked a learning rate, you can adjust your parameters using this simple function:\n",
|
||
"\n",
|
||
"```\n",
|
||
"w -= gradient(w) * lr\n",
|
||
"```\n",
|
||
"\n",
|
||
"This is known as *stepping* your parameters, using a *optimiser step*.\n",
|
||
"\n",
|
||
"If you pick a learning rate that's too low, it can mean having to do for a lot of steps:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"<img alt=\"An illustration of gradient descent with a LR too low\" width=\"400\" caption=\"Gradient descent with low LR\" src=\"images/chapter2_small.svg\" id=\"descent_small\"/>"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Although picking a learning rate that's too high is even worse--it can actually result in the loss getting *worse*!"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"<img alt=\"An illustration of gradient descent with a LR too high\" width=\"400\" caption=\"Gradient descent with high LR\" src=\"images/chapter2_div.svg\" id=\"descent_div\"/>"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"If the learning rate is too high, it may also \"bounce\" around, rather than actually diverging; this has the result of taking many steps to train successfully:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"<img alt=\"An illustation of gradient descent with a bouncy LR\" width=\"400\" caption=\"Gradient descent with bouncy LR\" src=\"images/chapter2_bouncy.svg\" id=\"descent_bouncy\"/>"
|
||
]
|
||
},
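The three regimes pictured above can be reproduced with a toy problem. This is a minimal sketch, assuming the function being minimised is `f(w) = w**2` (gradient `2*w`) and using hand-picked learning rates. The function name `descend` and the starting point are illustrative choices, not taken from the notebook.

```python
def descend(lr, w=3.0, steps=20):
    # Minimise f(w) = w**2, whose gradient is 2*w
    history = [w]
    for _ in range(steps):
        grad = 2 * w
        w -= grad * lr  # the optimiser step: w -= gradient(w) * lr
        history.append(w)
    return history

low = descend(lr=0.01)    # creeps towards 0 very slowly
ok = descend(lr=0.3)      # converges quickly
bouncy = descend(lr=0.9)  # overshoots and flips sign each step, but still shrinks
high = descend(lr=1.1)    # overshoots further on every step and diverges
```

On this toy problem each step multiplies `w` by `1 - 2*lr`, so any learning rate above `1.0` makes the magnitude grow. Real loss surfaces are not this simple, which is why the learning rate usually has to be found by experiment.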
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### Summarizing gradient descent"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"To summarize, at the beginning, the weights of our model can be random (training *from scratch*) or come from of a pretrained model (*transfer learning*). In the first case, the output we will get from our inputs won't have anything to do with what we want, and even in the second case, it's very likely the pretrained model won't be very good at the speficic task we are targetting. So the model will need to *learn* better weights.\n",
|
||
"\n",
|
||
"To do this, we will compare the outputs the model gives us with our targets (we have labelled data, so we know what result the model should give) using a *loss function*, which returns a number that needs to be as low as possible. Our weights need to be improved. To do this, we take a few data items (such as images) that we feed to our model. After going through our model, we compare to the corresponding targets using our loss function. The score we get tells us how wrong our predictions were, and we will change the weights a little bit to make it slightly better.\n",
|
||
"\n",
|
||
"To find how to change the weights to make the loss a bit better, we use calculus to calculate the *gradient* (actually, we let PyTorch do it for us!) Let's imagine you are lost in the mountains with your car parked at the lowest point. To find your way, you might wander in a random direction but that probably won't help much. Since you know you your vehicle is at the lowest point, you would be better to go downhill. By always taking a step in the direction of the steepest slope, you should eventually arrive at your destination. We use the gradient to tell us how big a step to take; specifically, we multiply the gradient by a number we choose called the *learning rate* to decide on the step size."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Stochastic gradient descent and mini-batches"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"In order to take an optimiser step we need to calculate the loss over one or more data items. We could calculate it for the whole dataset, and take the average, or we could calculate it for a single data item. But neither of these sounds ideal — calculating it for the whole dataset would take a very long time, but calculating it for a single item would result in a very imprecise and unstable gradient. So instead we take a compromise between the two: we calculate the average loss for a few data items at a time. This is called a *mini-batch*. The number of data items in the mini batch is called the *batch size*. A larger batch size means that you will get a more accurate and stable estimate of your datasets gradient on the loss function, but it will take longer, and you will get less mini-batches per epoch. Choosing a good batch size is one of the decisions you need to make as a deep learning practitioner to train your model quickly and accurately. We will talk about how to make this choice throughout this book.\n",
|
||
"\n",
|
||
"Another good reason for using mini-batches rather than calculating the gradient on individual data items is that, in practice, we nearly always do our training on an accelerator such as a GPU. These accelerators only perform well if they have lots of work to do at a time. So it is helpful if we can give them lots of data items to work on at a time. Using mini-batches is one of the best ways to do this. (Although if you give them too much data to work on at once, they run out of memory--making GPUs happy is tricky!)\n",
|
||
"\n",
|
||
"As we've seen, in the discussion of data augmentation, we get better generalisation if we can very things during training. A simple and effective thing we can vary during training is what data items we put in each mini batch. Rather than simply enumerating our data set in order for every epoch, instead what we normally do in practice is to randomly shuffle it on every epoch, before we create mini batches. PyTorch and fastai provide a class that will do the shuffling and mini batch collation for you, called `DataLoader`.\n",
|
||
"\n",
|
||
"A `DataLoader` can take any Python collection, and turn it into an iterator over many batches, like so:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"[tensor([9, 3, 6, 8, 0]),\n",
|
||
" tensor([13, 1, 14, 4, 12]),\n",
|
||
" tensor([ 7, 11, 2, 5, 10])]"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"coll = range(15)\n",
|
||
"dl = DataLoader(coll, batch_size=5, shuffle=True)\n",
|
||
"list(dl)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"For training a model, we don't just want any Python collection, but a collection containing independent and dependent variables. A collection that contains tuples of independent and dependent variables is known in PyTorch as a Dataset. Here's an example of an extremely simple Dataset:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"(#26) [(0, 'a'),(1, 'b'),(2, 'c'),(3, 'd'),(4, 'e'),(5, 'f'),(6, 'g'),(7, 'h'),(8, 'i'),(9, 'j')...]"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"ds = L(enumerate(string.ascii_lowercase))\n",
|
||
"ds"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"When we pass a Dataset to a DataLoader we will get back many batches which are themselves tuples of independent and dependent variable many batches:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"[(tensor([ 7, 19, 17, 13, 25, 15]), ('h', 't', 'r', 'n', 'z', 'p')),\n",
|
||
" (tensor([11, 9, 23, 21, 3, 16]), ('l', 'j', 'x', 'v', 'd', 'q')),\n",
|
||
" (tensor([12, 2, 18, 22, 14, 24]), ('m', 'c', 's', 'w', 'o', 'y')),\n",
|
||
" (tensor([ 1, 0, 20, 4, 6, 10]), ('b', 'a', 'u', 'e', 'g', 'k')),\n",
|
||
" (tensor([8, 5]), ('i', 'f'))]"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"dl = DataLoader(ds, batch_size=6, shuffle=True)\n",
|
||
"list(dl)"
|
||
]
|
||
},
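To show what the shuffling and batching amounts to, here is a rough plain-Python sketch. The generator name `mini_batches` and its `seed` parameter are assumptions for illustration. Unlike the real `DataLoader`, it does not collate items into tensors or split each batch into separate independent and dependent parts; it only groups the `(x, y)` tuples.

```python
import random

def mini_batches(items, batch_size, shuffle=True, seed=None):
    # Yield successive lists of at most batch_size items
    idxs = list(range(len(items)))
    if shuffle:
        # seed=None gives a different order on each call; a fixed seed repeats it
        random.Random(seed).shuffle(idxs)
    for start in range(0, len(idxs), batch_size):
        yield [items[i] for i in idxs[start:start + batch_size]]

ds = list(enumerate("abcdefghijklmnopqrstuvwxyz"))
batches = list(mini_batches(ds, batch_size=6, seed=42))
batch_sizes = [len(b) for b in batches]
```

With 26 items and a batch size of 6 this gives four full batches and a final batch of 2, matching the shape of the `DataLoader` output above. Every item appears exactly once per pass.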
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Putting it all together"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"In code, our process will be implemented something like this for each epoch:\n",
|
||
"\n",
|
||
"```python\n",
|
||
"for x,y in dl:\n",
|
||
" pred = model(x)\n",
|
||
" loss = loss_func(pred, y)\n",
|
||
" loss.backward()\n",
|
||
" parameters -= parameters.grad * lr\n",
|
||
"```\n",
|
||
"\n",
|
||
"We already have our `x`s--that's the images themselves. We'll concatenate them all into a single tensor, and also change them from a list of matrices (a rank 3 tensor) to a list of vectors (a rank 2 tensor). We can do this using `view`, which is a PyTorch method that changes the shape of a tensor without changing its contents. `-1` is a special parameter to `view`. It means: make this axis as big as necessary to fit all the data."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"train_x = torch.cat([stacked_threes, stacked_sevens]).view(-1, 28*28)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"We need a label for each. We'll use `1` for threes and `0` for sevens:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"(torch.Size([12396, 784]), torch.Size([12396, 1]))"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"train_y = tensor([1]*len(threes) + [0]*len(sevens)).unsqueeze(1)\n",
|
||
"train_x.shape,train_y.shape"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"A Dataset in PyTorch is required to return a tuple of `(x,y)` when indexed. Python provides a `zip` function which, when combined with `list`, provides a simple way to get this functionality:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"(torch.Size([784]), tensor([1]))"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"dset = list(zip(train_x,train_y))\n",
|
||
"x,y = dset[0]\n",
|
||
"x.shape,y"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"This is enough to allow us to create a `DataLoader`:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"(torch.Size([256, 784]), torch.Size([256, 1]))"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"dl = DataLoader(dset, batch_size=256)\n",
|
||
"xb,yb = first(dl)\n",
|
||
"xb.shape,yb.shape"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"We'll do the same for the validation set:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"valid_x = torch.cat([valid_3_tens, valid_7_tens]).view(-1, 28*28)\n",
|
||
"valid_y = tensor([1]*len(valid_3_tens) + [0]*len(valid_7_tens)).unsqueeze(1)\n",
|
||
"valid_dset = list(zip(valid_x,valid_y))\n",
|
||
"valid_dl = DataLoader(valid_dset, batch_size=256)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Now we need an (initially random) weight for every pixel (this is the *initialize* step in our 7-step process):"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"def init_params(size, std=1.0): return (torch.randn(size)*std).requires_grad_()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"weights = init_params((28*28,1))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"The function `weights*pixels` won't be flexible enough--it is always equal to zero when the pixels are equal to zero (i.e. it's *intercept* is zero). You might remember from high school math that the formula for a line is `y=w*x+b`; we still need the `b`. We'll initialize it to a random number too:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"bias = init_params(1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"In neural networks, the `w` in the equation `y=w*x+b` is called the *weights*, and the `b` is called the *bias*. Together, the weights and bias make up the *parameters*."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"> jargon: Parameters: the *weights* and *biases* of a model. The weights are the `w` in the equation `w*x+b`, and the biases are the `b` in that equation."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"We can now calculate a prediction for one image:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"tensor([4.5118], grad_fn=<AddBackward0>)"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"(train_x[0]*weights.T).sum() + bias"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"We need a way to do this for all the images in a mini-batch. Let's create a mini-batch of size 4 for testing:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"torch.Size([4, 784])"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"batch = train_x[:4]\n",
|
||
"batch.shape"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Whilst we could use a python for loop to calculate the prediction for each image, that would be very slow. Because Python loops don't run on the GPU, and because Python is a slow language for loops in general, we need to represent as much of the computation in a model as possible using higher-level functions.\n",
|
||
"\n",
|
||
"In this case, there's an extremely convenient mathematical operation that calculates `w*x` for every row of a matrix--it's called *matrix multiplication*. Here's what matrix multiplication looks like (diagram from Wikipedia):"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"<img alt=\"Matrix multiplication\" width=\"400\" caption=\"Matrix multiplication\" src=\"images/matmul2.svg\" id=\"matmul\"/>"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"This image shows two matrices, `A` and `B` being multiplied together. Each item of the result, which we'll call `AB`, contains each item of its corresponding row of `A` multiplied by each item of its corresponding column of `B`, added together. For instance, row 1 column 2 (the orange dot with a red border) is calculated as $a_{1,1} * b_{1,2} + a_{1,2} * b_{2,2}$. If you need a refresher on matrix multiplication, we suggest you take a look at the great *Introduction to Matrix Multiplcation* on *Khan Academy*, since this is the most important mathematical operation in deep learning.\n",
|
||
"\n",
|
||
"In Python, matrix multiplication is represented with the `@` operator. Let's try it:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"tensor([[ 4.5118],\n",
|
||
" [ 3.6536],\n",
|
||
" [11.2975],\n",
|
||
" [14.1164]], grad_fn=<AddBackward0>)"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"def linear1(xb): return xb@weights + bias\n",
|
||
"preds = linear1(batch)\n",
|
||
"preds"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"The first element is the same as we calculated before, as we'd expect. This equation, `batch@weights + bias`, is one of the two fundamental equations of any neural network (the other one is the *activation function*, which we'll see in a moment).\n",
|
||
"\n",
|
||
"The `mnist_loss` function we wrote earlier already works on a mini-batch, thanks to the magic of broadcasting! Here's the loss for our mini-batch:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"tensor(0.0090, grad_fn=<MeanBackward0>)"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"loss = mnist_loss(preds, train_y[:4])\n",
|
||
"loss"
|
||
]
|
||
},
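For anyone curious what `xb@weights + bias` computes for each row of the mini-batch, here is a tiny plain-Python sketch with made-up numbers. The name `linear_py` and the values are assumptions for illustration. It treats `weights` as a single column, as in `linear1`, and is far slower than the tensor version.

```python
def linear_py(xb, weights, bias):
    # xb: list of rows; weights: one weight per column; bias: a single float
    return [sum(x * w for x, w in zip(row, weights)) + bias for row in xb]

xb = [[1.0, 2.0, 3.0],
      [0.0, 1.0, 0.0]]
weights = [0.5, -1.0, 2.0]
bias = 0.25
preds = linear_py(xb, weights, bias)
```

Each prediction is the weighted sum of one row plus the bias, so the first entry of `preds` is `1*0.5 + 2*(-1) + 3*2 + 0.25`. Matrix multiplication performs the same row-by-column sums for the whole batch in one operation.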
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Now we can calculate the gradients:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"(torch.Size([784, 1]), tensor(-0.0013), tensor([-0.0088]))"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"loss.backward()\n",
|
||
"weights.grad.shape,weights.grad.mean(),bias.grad"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's put that all in a function:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"def calc_grad(xb, yb, model):\n",
|
||
" preds = model(xb)\n",
|
||
" loss = mnist_loss(preds, yb)\n",
|
||
" loss.backward()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"...and test it:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"(tensor(-0.0025), tensor([-0.0177]))"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"calc_grad(batch, train_y[:4], linear1)\n",
|
||
"weights.grad.mean(),bias.grad"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"But look what happens if we call it twice:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"(tensor(-0.0038), tensor([-0.0265]))"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"calc_grad(batch, train_y[:4], linear1)\n",
|
||
"weights.grad.mean(),bias.grad"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"The gradients have changed! The reason for this is that `loss.backward` actually *adds* the gradients of `loss` to any gradients that are currently stored. So we have to set the current gradients to zero first."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"weights.grad.zero_()\n",
|
||
"bias.grad.zero_();"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"> note: Methods in PyTorch that end in an underscore modify their object *in-place*. For instance, `bias.zero_()` sets all elements of the tensor `bias` to zero."
|
||
]
|
||
},
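{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check of the accumulation described above, write $g$ for the gradient of the loss with respect to `bias` on this mini-batch (the symbols $g$ and $n$ are introduced here purely for illustration). The parameters did not change between calls, so each call to `backward` adds the same $g$, and after $n$ calls the stored gradient is:\n",
"\n",
"$$g_{\\text{stored}} = n \\cdot g$$\n",
"\n",
"This is consistent with the outputs shown earlier: the bias gradients `-0.0088`, `-0.0177`, and `-0.0265` are approximately $1\\times$, $2\\times$, and $3\\times$ the first value, up to rounding."
]
},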
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Our only remaining step will be to update the weights and bias based on the gradient and learning rate. When we do so, we have to tell PyTorch not to take the gradient of this step too, otherwise things will get very confusing! If we assign to the `data` attribute of a tensor then PyTorch will not take the gradient of that step. Here's our basic training loop for an epoch:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"def train_epoch(model, lr, params):\n",
|
||
" for xb,yb in dl:\n",
|
||
" calc_grad(xb, yb, model)\n",
|
||
" for p in params:\n",
|
||
" p.data -= p.grad*lr\n",
|
||
" p.grad.zero_()"
|
||
]
|
||
},
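{
"cell_type": "markdown",
"metadata": {},
"source": [
"The update inside `train_epoch` can also be written as a formula. This is only a restatement of the code above, using $p$ for any one parameter tensor and $\\text{lr}$ for the learning rate (notation introduced here for illustration). For each mini-batch, every parameter is moved a small step against its gradient:\n",
"\n",
"$$p \\leftarrow p - \\text{lr} \\cdot \\frac{\\partial\\,\\text{loss}}{\\partial p}$$\n",
"\n",
"After the step, the stored gradient is reset to zero so that the next mini-batch starts from a clean slate."
]
},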
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"We also want to know how we're doing, by looking at the accuracy of the validation set. To decide if an output represents a 3 or a 7, we can just check whether it's greater than zero. So our accuracy for each item can be calculated (using broadcasting, so no loops!) with:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"tensor([[True],\n",
|
||
" [True],\n",
|
||
" [True],\n",
|
||
" [True]])"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"(preds>0.0).float() == train_y[:4]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"That gives us this function to calculate our validation accuracy:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"def batch_accuracy(xb, yb):\n",
|
||
" preds = xb.sigmoid()\n",
|
||
" correct = (preds>0.5) == yb\n",
|
||
" return correct.float().mean()"
|
||
]
|
||
},
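{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that `batch_accuracy` thresholds the sigmoid of the output at 0.5, whereas the earlier check compared the raw output with zero. The two tests agree, because the sigmoid function (written $\\sigma$ here) is strictly increasing and maps 0 to 0.5:\n",
"\n",
"$$\\sigma(x) = \\frac{1}{1 + e^{-x}} > 0.5 \\iff x > 0$$\n",
"\n",
"So either form gives the same predictions; the sigmoid version simply matches how the loss function interprets the outputs."
]
},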
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"We can check it works:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"tensor(1.)"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"batch_accuracy(linear1(batch), train_y[:4])"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"...and then putting the batches together:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"def validate_epoch(model):\n",
|
||
" accs = [batch_accuracy(model(xb), yb) for xb,yb in valid_dl]\n",
|
||
" return round(torch.stack(accs).mean().item(), 4)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"0.4403"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"validate_epoch(linear1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"That's our starting point. Let's train for one epoch, and see if the accuracy improves:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"0.4992"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"lr = 1.\n",
|
||
"params = weights,bias\n",
|
||
"train_epoch(linear1, lr, params)\n",
|
||
"validate_epoch(linear1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"0.6772 0.8081 0.914 0.9453 0.9565 0.9619 0.9624 0.9633 0.9658 0.9677 0.9702 0.9716 0.9721 0.9736 0.9741 0.9745 0.9765 0.977 0.977 0.9765 "
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"for i in range(20):\n",
|
||
" train_epoch(linear1, lr, params)\n",
|
||
" print(validate_epoch(linear1), end=' ')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Looking good! We're already about at the same accuracy as our \"pixel similarity\" approach, and we've created a general purpose foundation we can build on."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### Creating an optimizer"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Because this is such a useful general foundation, PyTorch provides some useful classes to make it easier to implement. The first we'll use is to replace our `linear()` function with PyTorch's `nn.Linear` *module*. A \"module\" is an object of a class that inherits from the PyTorch `nn.Module` class. Objects of this class behave identically to a standard Python function, in that you can call it using parentheses, and it will return the activations of a model.\n",
|
||
"\n",
|
||
"`nn.Linear` does the same thing as our `init_params` and `linear` together. It contains both the *weights* and *bias* in a single class. Here's how we replicate our model from the previous section:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"linear_model = nn.Linear(28*28,1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Every PyTorch module knows what parameters it has that can be trained; they are available through the `parameters` method:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"(torch.Size([1, 784]), torch.Size([1]))"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"w,b = linear_model.parameters()\n",
|
||
"w.shape,b.shape"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"We can use this information to create an optimizer:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"class BasicOptim:\n",
|
||
" def __init__(self,params,lr): self.params,self.lr = list(params),lr\n",
|
||
"\n",
|
||
" def step(self, *args, **kwargs):\n",
|
||
" for p in self.params: p.data -= p.grad.data * self.lr\n",
|
||
"\n",
|
||
" def zero_grad(self, *args, **kwargs):\n",
|
||
" for p in self.params: p.grad = None"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"We can create our optimizer by passing in the model's parameters:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"opt = BasicOptim(linear_model.parameters(), lr)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Our training loop can now be simplified to:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"def train_epoch(model):\n",
|
||
" for xb,yb in dl:\n",
|
||
" calc_grad(xb, yb, model)\n",
|
||
" opt.step()\n",
|
||
" opt.zero_grad()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Our validation function doesn't need to change at all:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"0.6714"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"validate_epoch(linear_model)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's put our little training loop in a function, to make things simpler:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"def train_model(model, epochs):\n",
|
||
" for i in range(epochs):\n",
|
||
" train_epoch(model)\n",
|
||
" print(validate_epoch(model), end=' ')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"The results are the same as the previous section."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"0.4932 0.7935 0.8477 0.9165 0.9346 0.9482 0.956 0.9634 0.9658 0.9673 0.9702 0.9717 0.9731 0.9751 0.9756 0.9765 0.9775 0.978 0.9785 0.9785 "
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"train_model(linear_model, 20)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"fastai provides the `SGD` class which, by default, does the same thing as our `BasicOptim`:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"0.4932 0.771 0.8594 0.918 0.9355 0.9492 0.9575 0.9634 0.9658 0.9682 0.9692 0.9717 0.9731 0.9751 0.9756 0.977 0.977 0.9785 0.9785 0.9785 "
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"linear_model = nn.Linear(28*28,1)\n",
|
||
"opt = SGD(linear_model.parameters(), lr)\n",
|
||
"train_model(linear_model, 20)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"fastai also provides `Learner.fit`, which we can use instead of `train_model`. To create a `Learner` we first need to create `DataLoaders`, by passing in our training and validation `DataLoader`s:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"dls = DataLoaders(dl, valid_dl)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"To create a `Learner` without using an application (such as `cnn_learner`) we need to pass in all the information that we've created in this chapter: the `DataLoaders`, the model, the optimization function (which will be passed the parameters), the loss function, and optionally any metrics to print:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"learn = Learner(dls, nn.Linear(28*28,1), opt_func=SGD,\n",
|
||
" loss_func=mnist_loss, metrics=batch_accuracy)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Now we can call `fit`:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: left;\">\n",
|
||
" <th>epoch</th>\n",
|
||
" <th>train_loss</th>\n",
|
||
" <th>valid_loss</th>\n",
|
||
" <th>batch_accuracy</th>\n",
|
||
" <th>time</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0.636918</td>\n",
|
||
" <td>0.503445</td>\n",
|
||
" <td>0.495584</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0.500283</td>\n",
|
||
" <td>0.192597</td>\n",
|
||
" <td>0.839549</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>2</td>\n",
|
||
" <td>0.184349</td>\n",
|
||
" <td>0.182295</td>\n",
|
||
" <td>0.833660</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>3</td>\n",
|
||
" <td>0.081278</td>\n",
|
||
" <td>0.107260</td>\n",
|
||
" <td>0.912169</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>4</td>\n",
|
||
" <td>0.043316</td>\n",
|
||
" <td>0.078320</td>\n",
|
||
" <td>0.932777</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>5</td>\n",
|
||
" <td>0.028503</td>\n",
|
||
" <td>0.062712</td>\n",
|
||
" <td>0.946025</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>6</td>\n",
|
||
" <td>0.022414</td>\n",
|
||
" <td>0.052999</td>\n",
|
||
" <td>0.955348</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>7</td>\n",
|
||
" <td>0.019704</td>\n",
|
||
" <td>0.046531</td>\n",
|
||
" <td>0.962218</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>8</td>\n",
|
||
" <td>0.018323</td>\n",
|
||
" <td>0.041979</td>\n",
|
||
" <td>0.965653</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>9</td>\n",
|
||
" <td>0.017486</td>\n",
|
||
" <td>0.038622</td>\n",
|
||
" <td>0.966634</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>"
|
||
],
|
||
"text/plain": [
|
||
"<IPython.core.display.HTML object>"
|
||
]
|
||
},
|
||
"metadata": {},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"learn.fit(10, lr=lr)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"As you can see, there's nothing magic about the PyTorch and fastai classes. They are just convenient pre-packaged pieces that make your life a bit easier! (They also provide a lot of extra functionality we'll be using in future chapters.)\n",
|
||
"\n",
|
||
"With these classes, we can now replace our linear model with a neural network."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Adding a non-linearity"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"So far we have a general procedure for optimising the parameters of a function, and we have tried it out on a very boring function: a simple linear classifier. A linear classifier is very constrained in terms of what it can do. Let's instead use a neural network. Here is the entire definition of a basic neural network:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"def simple_net(xb): \n",
|
||
" res = xb@w1 + b1\n",
|
||
" res = res.max(tensor(0.0))\n",
|
||
" res = res@w2 + b2\n",
|
||
" return res"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"That's it! All we have in `simple_net` is two linear classifiers with a max function between them.\n",
|
||
"\n",
|
||
"Here, `w1` and `w2` are weight tensors, and `b1` and `b2` are bias tensors; that is, parameters that are initially randomly initialised, just like we did in the previous section."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"w1 = init_params((28*28,30))\n",
|
||
"b1 = init_params(30)\n",
|
||
"w2 = init_params((30,1))\n",
|
||
"b2 = init_params(1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"The key point about this is that `w1` has 30 output activations (which means that `w2` must have 30 input activations, so they match). That means that the first layer can construct 30 different features, each representing some different mix of pixels. You can change that `30` to anything you like, to make the model more or less complex.\n",
|
||
"\n",
|
||
"That little function `res.max(tensor(0.0))` is called a *rectified linear unit*, also known as *ReLU*. I think we can all agree that *rectified linear unit* sounds pretty fancy and complicated... But actually, there's nothing more to it than `res.max(tensor(0.0))`, in other words: replace every negative number with a zero. This tiny function is also available in PyTorch as `F.relu`:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "\n",
|
||
"text/plain": [
|
||
"<Figure size 432x288 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"plot_function(F.relu)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
 j: There is an enormous">
"> j: There is an enormous amount of jargon in deep learning, such as _rectified linear unit_. The vast majority of this jargon is no more complicated than can be implemented in a short line of code in Python, as we saw in this example. The reality is that for academics to get their papers published they need to make them sound as impressive and sophisticated as possible. One of the ways that they do that is to introduce jargon. Unfortunately, this has the result that the field ends up becoming far more intimidating and difficult to get into than it should be. You do have to learn the jargon, because otherwise papers and tutorials are not going to mean much to you. But that doesn't mean you have to find the jargon intimidating. Just remember, when you come across a word or phrase that you haven't seen before, it will almost certainly turn out to refer to a very simple concept."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"The basic idea is that by using more linear layers, we can have our model do more computation, and therefore model more complex functions. But there's no point just putting one linear layout directly after another one, because when we multiply things together and then at them up multiple times, that can be replaced by multiplying different things together and adding them up just once! That is to say, a series of any number of linear layers in a row can be replaced with a single linear layer with a different set of parameters.\n",
|
||
"\n",
|
||
"But if we put a non-linear function between them, such as max, then this is no longer true. Now, each linear layer is actually somewhat decoupled from the other ones, and can do its own useful work. The max function is particularly interesting, because it operates as a simple \"if\" statement. For any arbitrarily wiggly function, we can approximate it as a bunch of lines joined together; to make it more close to the wiggly function, we just have to use shorter lines.\n",
|
||
"\n",
|
||
"Amazingly enough, it can be mathematically proven that this little function can solve any computable problem to an arbitrarily high level of accuracy, if you can find the right parameters for `w1` and `w2`, and if you make these matrices big enough. This is known as the *universal approximation theorem* . The three lines of code that we have here are known as *layers*. The first and third are known as *linear layers*, and the second line of code is known variously as a *nonlinearity*, or *activation function*.\n",
|
||
"\n",
|
||
"Just like the previous section, we can replace this code with something a bit simpler, by taking advantage of PyTorch:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"simple_net = nn.Sequential(\n",
|
||
" nn.Linear(28*28,30),\n",
|
||
" nn.ReLU(),\n",
|
||
" nn.Linear(30,1)\n",
|
||
")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"`nn.Sequential` creates a module which will call each of the listed layers or functions in turn.\n",
|
||
"\n",
|
||
"`F.relu` is a function, not a PyTorch module. `nn.ReLU` is a PyTorch module that does exactly the same thing. Most functions that can appear in a model also have identical forms that are modules. Generally, it's just a case of replacing `F` with `nn`, and changing the capitalization. When using `nn.Sequential` PyTorch requires us to use the module version. Since modules are classes, we have to instantiate them, which is why you see `nn.ReLU()` above. Because `nn.Sequential` is a module, we can get its parameters--which will return a list of all the parameters of all modules it contains.\n",
|
||
"\n",
|
||
"Let's try it out! For deeper models, we may need to use a lower learning rate and a few more epochs."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"learn = Learner(dls, simple_net, opt_func=SGD,\n",
|
||
" loss_func=mnist_loss, metrics=batch_accuracy)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: left;\">\n",
|
||
" <th>epoch</th>\n",
|
||
" <th>train_loss</th>\n",
|
||
" <th>valid_loss</th>\n",
|
||
" <th>batch_accuracy</th>\n",
|
||
" <th>time</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0.294820</td>\n",
|
||
" <td>0.416238</td>\n",
|
||
" <td>0.504907</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0.141692</td>\n",
|
||
" <td>0.216893</td>\n",
|
||
" <td>0.816487</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>2</td>\n",
|
||
" <td>0.079073</td>\n",
|
||
" <td>0.110840</td>\n",
|
||
" <td>0.921001</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>3</td>\n",
|
||
" <td>0.052444</td>\n",
|
||
" <td>0.075782</td>\n",
|
||
" <td>0.941119</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>4</td>\n",
|
||
" <td>0.040078</td>\n",
|
||
" <td>0.059658</td>\n",
|
||
" <td>0.957802</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>5</td>\n",
|
||
" <td>0.033729</td>\n",
|
||
" <td>0.050542</td>\n",
|
||
" <td>0.962709</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>6</td>\n",
|
||
" <td>0.030057</td>\n",
|
||
" <td>0.044751</td>\n",
|
||
" <td>0.965653</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>7</td>\n",
|
||
" <td>0.027653</td>\n",
|
||
" <td>0.040775</td>\n",
|
||
" <td>0.967615</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>8</td>\n",
|
||
" <td>0.025914</td>\n",
|
||
" <td>0.037867</td>\n",
|
||
" <td>0.969087</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>9</td>\n",
|
||
" <td>0.024563</td>\n",
|
||
" <td>0.035642</td>\n",
|
||
" <td>0.970069</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>10</td>\n",
|
||
" <td>0.023465</td>\n",
|
||
" <td>0.033873</td>\n",
|
||
" <td>0.972031</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>11</td>\n",
|
||
" <td>0.022547</td>\n",
|
||
" <td>0.032421</td>\n",
|
||
" <td>0.972031</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>12</td>\n",
|
||
" <td>0.021761</td>\n",
|
||
" <td>0.031202</td>\n",
|
||
" <td>0.973013</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>13</td>\n",
|
||
" <td>0.021081</td>\n",
|
||
" <td>0.030153</td>\n",
|
||
" <td>0.974485</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>14</td>\n",
|
||
" <td>0.020482</td>\n",
|
||
" <td>0.029238</td>\n",
|
||
" <td>0.974485</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>15</td>\n",
|
||
" <td>0.019949</td>\n",
|
||
" <td>0.028429</td>\n",
|
||
" <td>0.975957</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>16</td>\n",
|
||
" <td>0.019472</td>\n",
|
||
" <td>0.027706</td>\n",
|
||
" <td>0.976938</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>17</td>\n",
|
||
" <td>0.019039</td>\n",
|
||
" <td>0.027055</td>\n",
|
||
" <td>0.977429</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>18</td>\n",
|
||
" <td>0.018645</td>\n",
|
||
" <td>0.026466</td>\n",
|
||
" <td>0.977920</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>19</td>\n",
|
||
" <td>0.018283</td>\n",
|
||
" <td>0.025931</td>\n",
|
||
" <td>0.977920</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>20</td>\n",
|
||
" <td>0.017950</td>\n",
|
||
" <td>0.025441</td>\n",
|
||
" <td>0.978901</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>21</td>\n",
|
||
" <td>0.017641</td>\n",
|
||
" <td>0.024991</td>\n",
|
||
" <td>0.979882</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>22</td>\n",
|
||
" <td>0.017353</td>\n",
|
||
" <td>0.024576</td>\n",
|
||
" <td>0.979882</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>23</td>\n",
|
||
" <td>0.017084</td>\n",
|
||
" <td>0.024192</td>\n",
|
||
" <td>0.980373</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>24</td>\n",
|
||
" <td>0.016832</td>\n",
|
||
" <td>0.023837</td>\n",
|
||
" <td>0.980864</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>25</td>\n",
|
||
" <td>0.016595</td>\n",
|
||
" <td>0.023506</td>\n",
|
||
" <td>0.981354</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>26</td>\n",
|
||
" <td>0.016371</td>\n",
|
||
" <td>0.023198</td>\n",
|
||
" <td>0.981354</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>27</td>\n",
|
||
" <td>0.016159</td>\n",
|
||
" <td>0.022910</td>\n",
|
||
" <td>0.981845</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>28</td>\n",
|
||
" <td>0.015959</td>\n",
|
||
" <td>0.022641</td>\n",
|
||
" <td>0.981845</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>29</td>\n",
|
||
" <td>0.015768</td>\n",
|
||
" <td>0.022389</td>\n",
|
||
" <td>0.981845</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>30</td>\n",
|
||
" <td>0.015587</td>\n",
|
||
" <td>0.022154</td>\n",
|
||
" <td>0.981845</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>31</td>\n",
|
||
" <td>0.015414</td>\n",
|
||
" <td>0.021932</td>\n",
|
||
" <td>0.981845</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>32</td>\n",
|
||
" <td>0.015249</td>\n",
|
||
" <td>0.021725</td>\n",
|
||
" <td>0.981845</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>33</td>\n",
|
||
" <td>0.015092</td>\n",
|
||
" <td>0.021529</td>\n",
|
||
" <td>0.982336</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>34</td>\n",
|
||
" <td>0.014941</td>\n",
|
||
" <td>0.021345</td>\n",
|
||
" <td>0.982336</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>35</td>\n",
|
||
" <td>0.014796</td>\n",
|
||
" <td>0.021171</td>\n",
|
||
" <td>0.982826</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>36</td>\n",
|
||
" <td>0.014658</td>\n",
|
||
" <td>0.021007</td>\n",
|
||
" <td>0.982826</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>37</td>\n",
|
||
" <td>0.014524</td>\n",
|
||
" <td>0.020852</td>\n",
|
||
" <td>0.982826</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>38</td>\n",
|
||
" <td>0.014396</td>\n",
|
||
" <td>0.020704</td>\n",
|
||
" <td>0.983317</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <td>39</td>\n",
|
||
" <td>0.014272</td>\n",
|
||
" <td>0.020564</td>\n",
|
||
" <td>0.983317</td>\n",
|
||
" <td>00:00</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>"
|
||
],
|
||
"text/plain": [
|
||
"<IPython.core.display.HTML object>"
|
||
]
|
||
},
|
||
"metadata": {},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"#hide_output\n",
|
||
"learn.fit(40, 0.1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"We're not showing the 40 lines of output here to save room; the training process is recorded in `learn.recorder`, with the table of output stored in the `values` attribute, so we can plot the accuracy over training as:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "\n",
|
||
"text/plain": [
|
||
"<Figure size 432x288 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"plt.plot(L(learn.recorder.values).itemgot(2));"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"...and we can view the final accuracy:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"0.983316957950592"
|
||
]
|
||
},
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"learn.recorder.values[-1][2]"
|
||
]
|
||
},
|
||
{
"cell_type": "markdown",
"metadata": {},
"source": [
"At this point we have something that is rather magical:\n",
"\n",
"1. A function that can solve any problem to any level of accuracy (the neural network) given the correct set of parameters\n",
"1. A way to find the best set of parameters for any function (stochastic gradient descent)\n",
"\n",
"This is why deep learning can do things which seem rather magical. Believing that this combination of simple techniques can really solve any problem is one of the biggest steps that we find many students have to take. It seems too good to be true. It seems like things should be more difficult and complicated than this. Our recommendation: try it out! We have just taken our own recommendation and tried this model on the MNIST threes and sevens. Since we are doing everything from scratch ourselves (except for calculating the gradients) you know that there is no special magic hiding behind the scenes…\n",
"\n",
"There is no need to stop at just two linear layers. We can add as many as we want, as long as we add a nonlinearity between each pair of linear layers. As we will learn, however, the deeper the model gets, the harder it is to optimise the parameters in practice. Later in this book we will learn about some simple but brilliantly effective techniques for training deeper models.\n",
"\n",
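"As a minimal sketch of what adding a layer looks like, here is a stack of three linear layers with a ReLU between each pair. The name `deeper_net` and the widths 30 and 20 are illustrative assumptions, not a model trained in this chapter:\n",
"\n",
"```python\n",
"import torch\n",
"from torch import nn\n",
"\n",
"# Hypothetical model: the widths 30 and 20 are arbitrary choices for illustration.\n",
"deeper_net = nn.Sequential(\n",
"    nn.Linear(28*28, 30),\n",
"    nn.ReLU(),  # nonlinearity between the first and second linear layers\n",
"    nn.Linear(30, 20),\n",
"    nn.ReLU(),  # nonlinearity between the second and third linear layers\n",
"    nn.Linear(20, 1)\n",
")\n",
"\n",
"# A fake mini-batch of 4 flattened 28x28 images, used only to check the shapes.\n",
"xb = torch.randn(4, 28*28)\n",
"preds = deeper_net(xb)\n",
"print(preds.shape)\n",
"```\n",
"\n",
"Without the `nn.ReLU` layers, the three linear layers would collapse into a single linear function, which is why the nonlinearities are needed.\n",
"\n",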
"We already know that a single nonlinearity with two linear layers is enough to approximate any function. So why would we use deeper models? The reason is performance. With a deeper model (that is, one with more layers) we do not need to use as many parameters; it turns out that we can use smaller matrices, with more layers, and get better results than we would get with larger matrices and fewer layers.\n",
"\n",
"That means that we can train the model more quickly, and it will take up less memory. In the 1990s researchers were so focused on the universal approximation theorem that very few were experimenting with more than one nonlinearity. This theoretical but not practical foundation held back the field for years. Some researchers, however, did experiment with deep models, and eventually were able to show that these models could perform much better in practice. Eventually, theoretical results were developed which showed why this happens. Today, it is extremely unusual to find anybody using a neural network with just one nonlinearity.\n",
"\n",
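"To make the point about parameter counts concrete, here is a rough sketch that only counts weights and biases for stacks of linear layers. The helper `count_params` and the layer sizes are illustrative assumptions, with 784 inputs standing in for a flattened 28x28 image:\n",
"\n",
"```python\n",
"def count_params(sizes):\n",
"    'Total weights plus biases for a stack of linear layers with the given sizes.'\n",
"    return sum(n_in*n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))\n",
"\n",
"# One hidden layer of 30 units: two linear layers, one nonlinearity.\n",
"shallow = count_params([784, 30, 1])\n",
"# Three narrower hidden layers of 16 units: four linear layers, three nonlinearities.\n",
"deep = count_params([784, 16, 16, 16, 1])\n",
"\n",
"print(shallow, deep)\n",
"```\n",
"\n",
"The deeper stack in this sketch has fewer parameters than the shallow one. Whether it also trains to a better result is an empirical question that this count does not answer.\n",
"\n",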
"Here is what happens when we train an 18-layer model using the same approach we saw in <<chapter_intro>>:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: left;\">\n",
" <th>epoch</th>\n",
" <th>train_loss</th>\n",
" <th>valid_loss</th>\n",
" <th>accuracy</th>\n",
" <th>time</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>0.125685</td>\n",
" <td>0.026256</td>\n",
" <td>0.992640</td>\n",
" <td>00:11</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"dls = ImageDataLoaders.from_folder(path)\n",
"learn = cnn_learner(dls, resnet18, pretrained=False,\n",
" loss_func=F.cross_entropy, metrics=accuracy)\n",
"learn.fit_one_cycle(1, 0.1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Nearly 100% accuracy! That's a big difference compared to our simple neural net. But as you'll learn in the remainder of this book, there are just a few little tricks you need to use to get such great results from scratch yourself. You already know the key foundational pieces. (Of course, even once you know all the tricks, you'll nearly always want to work with the pre-built classes provided by PyTorch and fastai, because they save you having to think about all the little details yourself.)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deep learning"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Congratulations: you now know how to create and train a deep neural network from scratch! There have been quite a few steps to get to this point, but you might be surprised at how simple it really is.\n",
"\n",
"Now that we are at this point, it is a good opportunity to define, and review, some jargon and concepts.\n",
"\n",
"A neural network contains a lot of numbers, but they are only of two types: numbers that are calculated, and the parameters that these numbers are calculated from. This gives us the two most important pieces of jargon to learn:\n",
"\n",
"- *activations*: numbers that are calculated (both by linear and nonlinear layers)\n",
"- *parameters*: numbers that are randomly initialised, and optimised (that is, the numbers that define the model)\n",
"\n",
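"As a small sketch of the distinction, assume an arbitrary tiny layer with 3 inputs and 2 outputs. The names `lin`, `lin_acts`, and `relu_acts` are made up for illustration:\n",
"\n",
"```python\n",
"import torch\n",
"from torch import nn\n",
"\n",
"lin = nn.Linear(3, 2)  # hypothetical tiny layer: 3 inputs, 2 outputs\n",
"\n",
"# Parameters: randomly initialised numbers that define the layer and get optimised.\n",
"param_shapes = [tuple(p.shape) for p in lin.parameters()]\n",
"print(param_shapes)\n",
"\n",
"# Activations: numbers calculated from the inputs and the parameters.\n",
"x = torch.randn(5, 3)             # a fake batch of 5 items\n",
"lin_acts = lin(x)                 # activations of the linear layer\n",
"relu_acts = torch.relu(lin_acts)  # activations of the nonlinear layer\n",
"print(lin_acts.shape, relu_acts.shape)\n",
"```\n",
"\n",
"The two parameter tensors here are the weight matrix and the bias vector of the layer; everything computed from `x` is an activation.\n",
"\n",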
"We will often talk in this book about activations and parameters. Remember that they have very specific meanings. They are numbers. They are not abstract concepts, but actual specific numbers that are in your model. Part of becoming a good deep learning practitioner is getting used to the idea of actually looking at your activations and parameters, plotting them, and testing whether they are behaving correctly.\n",
"\n",
"Our activations and parameters are all contained in tensors. These are simply regularly shaped arrays; a matrix is one example. Matrices have rows and columns; we call these the *axes* or *dimensions*. The number of dimensions of a tensor is its *rank*. There are some special tensors:\n",
"\n",
"- rank zero: scalar\n",
"- rank one: vector\n",
"- rank two: matrix\n",
"\n",
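"As a quick illustration with arbitrary values, PyTorch reports the rank of a tensor as `ndim`, which equals the length of its `shape`:\n",
"\n",
"```python\n",
"import torch\n",
"\n",
"scalar = torch.tensor(3.0)                       # rank zero\n",
"vector = torch.tensor([1.0, 2.0, 3.0])           # rank one\n",
"matrix = torch.tensor([[1.0, 2.0], [3.0, 4.0]])  # rank two\n",
"\n",
"for t in (scalar, vector, matrix):\n",
"    # The rank is the number of axes, that is, the length of the shape.\n",
"    print(t.ndim, len(t.shape), tuple(t.shape))\n",
"```\n",
"\n",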
"A neural network contains a number of layers. Each layer is either linear or nonlinear. We generally alternate between these two kinds of layers in a neural network. Sometimes people refer to both a linear layer and its subsequent nonlinearity together as a single *layer*. Yes, this is confusing. Sometimes a nonlinearity is referred to as an activation function.\n",
"\n",
"TK: Table jargon recap"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### _Choose Your Own Adventure_ reminder"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Did you choose to skip over chapters 2 & 3, in your excitement to peek under the hood? Well, here's your reminder to head back to chapter 2 now, because you'll be needing to know that stuff very soon!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Questionnaire"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. How is a greyscale image represented on a computer? How about a color image?\n",
"1. How are the files and folders in the `MNIST_SAMPLE` dataset structured? Why?\n",
"1. Explain how the \"pixel similarity\" approach to classifying digits works.\n",
"1. What is a list comprehension? Create one now that selects odd numbers from a list and doubles them.\n",
"1. What is a \"rank 3 tensor\"?\n",
"1. What is the difference between tensor rank and shape? How do you get the rank from the shape?\n",
"1. What are RMSE and L1 norm?\n",
"1. How can you apply a calculation on thousands of numbers at once, many thousands of times faster than a Python loop?\n",
"1. Create a 3x3 tensor or array containing the numbers from 1 to 9. Double it. Select the bottom right 4 numbers.\n",
"1. What is broadcasting?\n",
"1. Are metrics generally calculated using the training set, or the validation set? Why?\n",
"1. What is SGD?\n",
"1. Why does SGD use mini-batches?\n",
"1. What are the 7 steps in SGD for machine learning?\n",
"1. How do we initialize the weights in a model?\n",
"1. What is \"loss\"?\n",
"1. Why can't we always use a high learning rate?\n",
"1. What is a \"gradient\"?\n",
"1. Do you need to know how to calculate gradients yourself?\n",
"1. Why can't we use accuracy as a loss function?\n",
"1. Draw the sigmoid function. What is special about its shape?\n",
"1. What is the difference between loss and metric?\n",
"1. What is the function to calculate new weights using a learning rate?\n",
"1. What does the `DataLoader` class do?\n",
"1. Write pseudo-code showing the basic steps taken each epoch for SGD.\n",
"1. Create a function which, if passed two arguments `[1,2,3,4]` and `'abcd'`, returns `[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]`. What is special about that output data structure?\n",
"1. What does `view` do in PyTorch?\n",
"1. What are the \"bias\" parameters in a neural network? Why do we need them?\n",
"1. What does the `@` operator do in Python?\n",
"1. What does the `backward` method do?\n",
"1. Why do we have to zero the gradients?\n",
"1. What information do we have to pass to `Learner`?\n",
"1. Show Python or pseudo-code for the basic steps of a training loop.\n",
"1. What is \"ReLU\"? Draw a plot of it for values from `-2` to `+2`.\n",
"1. What is an \"activation function\"?\n",
"1. What's the difference between `F.relu` and `nn.ReLU`?\n",
"1. The universal approximation theorem shows that any function can be approximated as closely as needed using just one nonlinearity. So why do we normally use more?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Create your own implementation of `Learner` from scratch, based on the training loop shown in this chapter.\n",
"1. Complete all the steps in this chapter using the full MNIST dataset (that is, for all digits, not just threes and sevens). This is a significant project and will take you quite a bit of time to complete! You'll need to do some of your own research to figure out how to overcome some obstacles you'll meet on the way."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"jupytext": {
"split_at_heading": true
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}