fastbook/clean/13_convolutions.ipynb
Jeremy Howard dd985841b6 clean
2020-09-03 15:58:27 -07:00

{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#hide\n",
"!pip install -Uqq fastbook\n",
"import fastbook\n",
"fastbook.setup_book()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#hide\n",
"from fastai.vision.all import *\n",
"from fastbook import *\n",
"\n",
"matplotlib.rc('image', cmap='Greys')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Convolutional Neural Networks"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The Magic of Convolutions"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"top_edge = tensor([[-1,-1,-1],\n",
"                   [ 0, 0, 0],\n",
"                   [ 1, 1, 1]]).float()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"path = untar_data(URLs.MNIST_SAMPLE)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#hide\n",
"Path.BASE_PATH = path"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"im3 = Image.open(path/'train'/'3'/'12.png')\n",
"show_image(im3);"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"im3_t = tensor(im3)\n",
"im3_t[0:3,0:3] * top_edge"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"(im3_t[0:3,0:3] * top_edge).sum()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df = pd.DataFrame(im3_t[:10,:20])\n",
"df.style.set_properties(**{'font-size':'6pt'}).background_gradient('Greys')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"(im3_t[4:7,6:9] * top_edge).sum()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"(im3_t[7:10,17:20] * top_edge).sum()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def apply_kernel(row, col, kernel):\n",
"    # Sum of the elementwise product of the kernel and the 3x3 window centered at (row, col)\n",
"    return (im3_t[row-1:row+2,col-1:col+2] * kernel).sum()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"apply_kernel(5,7,top_edge)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Mapping a Convolution Kernel"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"[[(i,j) for j in range(1,5)] for i in range(1,5)]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"rng = range(1,27)\n",
"top_edge3 = tensor([[apply_kernel(i,j,top_edge) for j in rng] for i in rng])\n",
"\n",
"show_image(top_edge3);"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"left_edge = tensor([[-1,1,0],\n",
"                    [-1,1,0],\n",
"                    [-1,1,0]]).float()\n",
"\n",
"left_edge3 = tensor([[apply_kernel(i,j,left_edge) for j in rng] for i in rng])\n",
"\n",
"show_image(left_edge3);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Convolutions in PyTorch"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"diag1_edge = tensor([[ 0,-1, 1],\n",
"                     [-1, 1, 0],\n",
"                     [ 1, 0, 0]]).float()\n",
"diag2_edge = tensor([[ 1,-1, 0],\n",
"                     [ 0, 1,-1],\n",
"                     [ 0, 0, 1]]).float()\n",
"\n",
"edge_kernels = torch.stack([left_edge, top_edge, diag1_edge, diag2_edge])\n",
"edge_kernels.shape"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mnist = DataBlock((ImageBlock(cls=PILImageBW), CategoryBlock),\n",
"                  get_items=get_image_files,\n",
"                  splitter=GrandparentSplitter(),\n",
"                  get_y=parent_label)\n",
"\n",
"dls = mnist.dataloaders(path)\n",
"xb,yb = first(dls.valid)\n",
"xb.shape"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"xb,yb = to_cpu(xb),to_cpu(yb)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"edge_kernels.shape,edge_kernels.unsqueeze(1).shape"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"edge_kernels = edge_kernels.unsqueeze(1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"batch_features = F.conv2d(xb, edge_kernels)\n",
"batch_features.shape"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"show_image(batch_features[0,0]);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Strides and Padding"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Understanding the Convolution Equations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Our First Convolutional Neural Network"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating the CNN"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"simple_net = nn.Sequential(\n",
"    nn.Linear(28*28,30),\n",
"    nn.ReLU(),\n",
"    nn.Linear(30,1)\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"simple_net"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"broken_cnn = sequential(\n",
"    nn.Conv2d(1,30, kernel_size=3, padding=1),\n",
"    nn.ReLU(),\n",
"    nn.Conv2d(30,1, kernel_size=3, padding=1)\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"broken_cnn(xb).shape"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def conv(ni, nf, ks=3, act=True):\n",
"    res = nn.Conv2d(ni, nf, stride=2, kernel_size=ks, padding=ks//2)\n",
"    if act: res = nn.Sequential(res, nn.ReLU())\n",
"    return res"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"simple_cnn = sequential(\n",
"    conv(1 ,4),            #14x14\n",
"    conv(4 ,8),            #7x7\n",
"    conv(8 ,16),           #4x4\n",
"    conv(16,32),           #2x2\n",
"    conv(32,2, act=False), #1x1\n",
"    Flatten(),\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"simple_cnn(xb).shape"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"learn = Learner(dls, simple_cnn, loss_func=F.cross_entropy, metrics=accuracy)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"learn.summary()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"learn.fit_one_cycle(2, 0.01)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Understanding Convolution Arithmetic"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"m = learn.model[0]\n",
"m"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"m[0].weight.shape"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"m[0].bias.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Receptive Fields"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### A Note About Twitter"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Color Images"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"im = image2tensor(Image.open('images/grizzly.jpg'))\n",
"im.shape"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"show_image(im);"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"_,axs = subplots(1,3)\n",
"for bear,ax,color in zip(im,axs,('Reds','Greens','Blues')):\n",
"    show_image(255-bear, ax=ax, cmap=color)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Improving Training Stability"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"path = untar_data(URLs.MNIST)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#hide\n",
"Path.BASE_PATH = path"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"path.ls()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def get_dls(bs=64):\n",
"    return DataBlock(\n",
"        blocks=(ImageBlock(cls=PILImageBW), CategoryBlock),\n",
"        get_items=get_image_files,\n",
"        splitter=GrandparentSplitter('training','testing'),\n",
"        get_y=parent_label,\n",
"        batch_tfms=Normalize()\n",
"    ).dataloaders(path, bs=bs)\n",
"\n",
"dls = get_dls()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dls.show_batch(max_n=9, figsize=(4,4))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### A Simple Baseline"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def conv(ni, nf, ks=3, act=True):\n",
"    res = nn.Conv2d(ni, nf, stride=2, kernel_size=ks, padding=ks//2)\n",
"    if act: res = nn.Sequential(res, nn.ReLU())\n",
"    return res"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def simple_cnn():\n",
"    return sequential(\n",
"        conv(1 ,8, ks=5),        #14x14\n",
"        conv(8 ,16),             #7x7\n",
"        conv(16,32),             #4x4\n",
"        conv(32,64),             #2x2\n",
"        conv(64,10, act=False),  #1x1\n",
"        Flatten(),\n",
"    )"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from fastai.callback.hook import *"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def fit(epochs=1):\n",
"    learn = Learner(dls, simple_cnn(), loss_func=F.cross_entropy,\n",
"                    metrics=accuracy, cbs=ActivationStats(with_hist=True))\n",
"    learn.fit(epochs, 0.06)\n",
"    return learn"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"learn = fit()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"learn.activation_stats.plot_layer_stats(0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"learn.activation_stats.plot_layer_stats(-2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Increase Batch Size"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dls = get_dls(512)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"learn = fit()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"learn.activation_stats.plot_layer_stats(-2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1cycle Training"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def fit(epochs=1, lr=0.06):\n",
"    learn = Learner(dls, simple_cnn(), loss_func=F.cross_entropy,\n",
"                    metrics=accuracy, cbs=ActivationStats(with_hist=True))\n",
"    learn.fit_one_cycle(epochs, lr)\n",
"    return learn"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"learn = fit()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"learn.recorder.plot_sched()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"learn.activation_stats.plot_layer_stats(-2)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"learn.activation_stats.color_dim(-2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Batch Normalization"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def conv(ni, nf, ks=3, act=True):\n",
"    layers = [nn.Conv2d(ni, nf, stride=2, kernel_size=ks, padding=ks//2)]\n",
"    layers.append(nn.BatchNorm2d(nf))\n",
"    if act: layers.append(nn.ReLU())\n",
"    return nn.Sequential(*layers)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"learn = fit()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"learn.activation_stats.color_dim(-4)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"learn = fit(5, lr=0.1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Questionnaire"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. What is a \"feature\"?\n",
"1. Write out the convolutional kernel matrix for a top edge detector.\n",
"1. Write out the mathematical operation applied by a 3×3 kernel to a single pixel in an image.\n",
"1. What is the value of a convolutional kernel applied to a 3×3 matrix of zeros?\n",
"1. What is \"padding\"?\n",
"1. What is \"stride\"?\n",
"1. Create a nested list comprehension to complete any task that you choose.\n",
"1. What are the shapes of the `input` and `weight` parameters to PyTorch's 2D convolution?\n",
"1. What is a \"channel\"?\n",
"1. What is the relationship between a convolution and a matrix multiplication?\n",
"1. What is a \"convolutional neural network\"?\n",
"1. What is the benefit of refactoring parts of your neural network definition?\n",
"1. What is `Flatten`? Where does it need to be included in the MNIST CNN? Why?\n",
"1. What does \"NCHW\" mean?\n",
"1. Why does the third layer of the MNIST CNN have `7*7*(1168-16)` multiplications?\n",
"1. What is a \"receptive field\"?\n",
"1. What is the size of the receptive field of an activation after two stride 2 convolutions? Why?\n",
"1. Run *conv-example.xlsx* yourself and experiment with *trace precedents*.\n",
"1. Have a look at Jeremy's or Sylvain's list of recent Twitter \"like\"s, and see if you find any interesting resources or ideas there.\n",
"1. How is a color image represented as a tensor?\n",
"1. How does a convolution work with a color input?\n",
"1. What method can we use to see the data in a `DataLoaders` object?\n",
"1. Why do we double the number of filters after each stride-2 conv?\n",
"1. Why do we use a larger kernel in the first conv with MNIST (with `simple_cnn`)?\n",
"1. What information does `ActivationStats` save for each layer?\n",
"1. How can we access a learner's callback after training?\n",
"1. What are the three statistics plotted by `plot_layer_stats`? What does the x-axis represent?\n",
"1. Why are activations near zero problematic?\n",
"1. What are the upsides and downsides of training with a larger batch size?\n",
"1. Why should we avoid using a high learning rate at the start of training?\n",
"1. What is 1cycle training?\n",
"1. What are the benefits of training with a high learning rate?\n",
"1. Why do we want to use a low learning rate at the end of training?\n",
"1. What is \"cyclical momentum\"?\n",
"1. What callback tracks hyperparameter values during training (along with other information)?\n",
"1. What does one column of pixels in the `color_dim` plot represent?\n",
"1. What does \"bad training\" look like in `color_dim`? Why?\n",
"1. What trainable parameters does a batch normalization layer contain?\n",
"1. What statistics are used to normalize in batch normalization during training? How about during validation?\n",
"1. Why do models with batch normalization layers generalize better?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further Research"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. What features other than edge detectors have been used in computer vision (especially before deep learning became popular)?\n",
"1. There are other normalization layers available in PyTorch. Try them out and see what works best. Learn about why other normalization layers have been developed, and how they differ from batch normalization.\n",
"1. Try moving the activation function before the batch normalization layer in `conv` (as defined in this notebook, it comes after). Does it make a difference? See what you can find out about what order is recommended, and why."
]
}
],
"metadata": {
"jupytext": {
"split_at_heading": true
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}