mirror of
https://github.com/fastai/fastbook.git
synced 2025-04-04 01:40:44 +00:00
1654 lines
33 KiB
Plaintext
{
|
||
"cells": [
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"#hide\n",
|
||
"! [ -e /content ] && pip install -Uqq fastai # upgrade fastai on colab\n",
|
||
"import fastbook\n",
|
||
"fastbook.setup_book()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"#hide\n",
|
||
"from fastai.vision.all import *\n",
|
||
"from fastbook import *\n",
|
||
"\n",
|
||
"matplotlib.rc('image', cmap='Greys')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"# Under the Hood: Training a Digit Classifier"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Pixels: The Foundations of Computer Vision"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Sidebar: Tenacity and Deep Learning"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## End sidebar"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"path = untar_data(URLs.MNIST_SAMPLE)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"#hide\n",
|
||
"Path.BASE_PATH = path"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"path.ls()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"(path/'train').ls()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"threes = (path/'train'/'3').ls().sorted()\n",
|
||
"sevens = (path/'train'/'7').ls().sorted()\n",
|
||
"threes"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"im3_path = threes[1]\n",
|
||
"im3 = Image.open(im3_path)\n",
|
||
"im3"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"array(im3)[4:10,4:10]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"tensor(im3)[4:10,4:10]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"im3_t = tensor(im3)\n",
|
||
"df = pd.DataFrame(im3_t[4:15,4:22])\n",
|
||
"df.style.set_properties(**{'font-size':'6pt'}).background_gradient('Greys')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## First Try: Pixel Similarity"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"seven_tensors = [tensor(Image.open(o)) for o in sevens]\n",
|
||
"three_tensors = [tensor(Image.open(o)) for o in threes]\n",
|
||
"len(three_tensors),len(seven_tensors)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"show_image(three_tensors[1]);"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"stacked_sevens = torch.stack(seven_tensors).float()/255\n",
|
||
"stacked_threes = torch.stack(three_tensors).float()/255\n",
|
||
"stacked_threes.shape"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"len(stacked_threes.shape)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"stacked_threes.ndim"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"mean3 = stacked_threes.mean(0)\n",
|
||
"show_image(mean3);"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"mean7 = stacked_sevens.mean(0)\n",
|
||
"show_image(mean7);"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"a_3 = stacked_threes[1]\n",
|
||
"show_image(a_3);"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"dist_3_abs = (a_3 - mean3).abs().mean()\n",
|
||
"dist_3_sqr = ((a_3 - mean3)**2).mean().sqrt()\n",
|
||
"dist_3_abs,dist_3_sqr"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"dist_7_abs = (a_3 - mean7).abs().mean()\n",
|
||
"dist_7_sqr = ((a_3 - mean7)**2).mean().sqrt()\n",
|
||
"dist_7_abs,dist_7_sqr"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"F.l1_loss(a_3.float(),mean7), F.mse_loss(a_3,mean7).sqrt()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### NumPy Arrays and PyTorch Tensors"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"data = [[1,2,3],[4,5,6]]\n",
|
||
"arr = array (data)\n",
|
||
"tns = tensor(data)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"arr # numpy"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"tns # pytorch"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"tns[1]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"tns[:,1]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"tns[1,1:3]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"tns+1"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"tns.type()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"tns*1.5"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Computing Metrics Using Broadcasting"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
"valid_3_tens = torch.stack([tensor(Image.open(o))\n",
"                            for o in (path/'valid'/'3').ls()])\n",
"valid_3_tens = valid_3_tens.float()/255\n",
"valid_7_tens = torch.stack([tensor(Image.open(o))\n",
"                            for o in (path/'valid'/'7').ls()])\n",
"valid_7_tens = valid_7_tens.float()/255\n",
"valid_3_tens.shape,valid_7_tens.shape"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"def mnist_distance(a,b): return (a-b).abs().mean((-1,-2))\n",
|
||
"mnist_distance(a_3, mean3)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"valid_3_dist = mnist_distance(valid_3_tens, mean3)\n",
|
||
"valid_3_dist, valid_3_dist.shape"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"tensor([1,2,3]) + tensor(1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"(valid_3_tens-mean3).shape"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"def is_3(x): return mnist_distance(x,mean3) < mnist_distance(x,mean7)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"is_3(a_3), is_3(a_3).float()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"is_3(valid_3_tens)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"accuracy_3s = is_3(valid_3_tens).float() .mean()\n",
|
||
"accuracy_7s = (1 - is_3(valid_7_tens).float()).mean()\n",
|
||
"\n",
|
||
"accuracy_3s,accuracy_7s,(accuracy_3s+accuracy_7s)/2"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Stochastic Gradient Descent (SGD)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"gv('''\n",
|
||
"init->predict->loss->gradient->step->stop\n",
|
||
"step->predict[label=repeat]\n",
|
||
"''')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"def f(x): return x**2"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"plot_function(f, 'x', 'x**2')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"plot_function(f, 'x', 'x**2')\n",
|
||
"plt.scatter(-1.5, f(-1.5), color='red');"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### Calculating Gradients"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"xt = tensor(3.).requires_grad_()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"yt = f(xt)\n",
|
||
"yt"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"yt.backward()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"xt.grad"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"xt = tensor([3.,4.,10.]).requires_grad_()\n",
|
||
"xt"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"def f(x): return (x**2).sum()\n",
|
||
"\n",
|
||
"yt = f(xt)\n",
|
||
"yt"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"yt.backward()\n",
|
||
"xt.grad"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### Stepping With a Learning Rate"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### An End-to-End SGD Example"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"time = torch.arange(0,20).float(); time"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"speed = torch.randn(20)*3 + 0.75*(time-9.5)**2 + 1\n",
|
||
"plt.scatter(time,speed);"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
"def f(t, params):\n",
"    a,b,c = params\n",
"    return a*(t**2) + (b*t) + c"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"def mse(preds, targets): return ((preds-targets)**2).mean()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"#### Step 1: Initialize the parameters"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"params = torch.randn(3).requires_grad_()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"#hide\n",
|
||
"orig_params = params.clone()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"#### Step 2: Calculate the predictions"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"preds = f(time, params)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
"def show_preds(preds, ax=None):\n",
"    if ax is None: ax=plt.subplots()[1]\n",
"    ax.scatter(time, speed)\n",
"    ax.scatter(time, to_np(preds), color='red')\n",
"    ax.set_ylim(-300,100)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"show_preds(preds)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"#### Step 3: Calculate the loss"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"loss = mse(preds, speed)\n",
|
||
"loss"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"#### Step 4: Calculate the gradients"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"loss.backward()\n",
|
||
"params.grad"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"params.grad * 1e-5"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"params"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"#### Step 5: Step the weights"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"lr = 1e-5\n",
|
||
"params.data -= lr * params.grad.data\n",
|
||
"params.grad = None"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"preds = f(time,params)\n",
|
||
"mse(preds, speed)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"show_preds(preds)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
"def apply_step(params, prn=True):\n",
"    preds = f(time, params)\n",
"    loss = mse(preds, speed)\n",
"    loss.backward()\n",
"    params.data -= lr * params.grad.data\n",
"    params.grad = None\n",
"    if prn: print(loss.item())\n",
"    return preds"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"#### Step 6: Repeat the process"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"for i in range(10): apply_step(params)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"#hide\n",
|
||
"params = orig_params.detach().requires_grad_()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"_,axs = plt.subplots(1,4,figsize=(12,3))\n",
|
||
"for ax in axs: show_preds(apply_step(params, False), ax)\n",
|
||
"plt.tight_layout()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"#### Step 7: Stop"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### Summarizing Gradient Descent"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"gv('''\n",
|
||
"init->predict->loss->gradient->step->stop\n",
|
||
"step->predict[label=repeat]\n",
|
||
"''')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## The MNIST Loss Function"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"train_x = torch.cat([stacked_threes, stacked_sevens]).view(-1, 28*28)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"train_y = tensor([1]*len(threes) + [0]*len(sevens)).unsqueeze(1)\n",
|
||
"train_x.shape,train_y.shape"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"dset = list(zip(train_x,train_y))\n",
|
||
"x,y = dset[0]\n",
|
||
"x.shape,y"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"valid_x = torch.cat([valid_3_tens, valid_7_tens]).view(-1, 28*28)\n",
|
||
"valid_y = tensor([1]*len(valid_3_tens) + [0]*len(valid_7_tens)).unsqueeze(1)\n",
|
||
"valid_dset = list(zip(valid_x,valid_y))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"def init_params(size, std=1.0): return (torch.randn(size)*std).requires_grad_()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"weights = init_params((28*28,1))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"bias = init_params(1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"(train_x[0]*weights.T).sum() + bias"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"def linear1(xb): return xb@weights + bias\n",
|
||
"preds = linear1(train_x)\n",
|
||
"preds"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
"corrects = (preds>0.0).float() == train_y\n",
|
||
"corrects"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"corrects.float().mean().item()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"with torch.no_grad(): weights[0] *= 1.0001"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"preds = linear1(train_x)\n",
|
||
"((preds>0.0).float() == train_y).float().mean().item()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"trgts = tensor([1,0,1])\n",
|
||
"prds = tensor([0.9, 0.4, 0.2])"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
"def mnist_loss(predictions, targets):\n",
"    return torch.where(targets==1, 1-predictions, predictions).mean()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"torch.where(trgts==1, 1-prds, prds)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"mnist_loss(prds,trgts)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"mnist_loss(tensor([0.9, 0.4, 0.8]),trgts)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### Sigmoid"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"def sigmoid(x): return 1/(1+torch.exp(-x))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"plot_function(torch.sigmoid, title='Sigmoid', min=-4, max=4)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
"def mnist_loss(predictions, targets):\n",
"    predictions = predictions.sigmoid()\n",
"    return torch.where(targets==1, 1-predictions, predictions).mean()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### SGD and Mini-Batches"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"coll = range(15)\n",
|
||
"dl = DataLoader(coll, batch_size=5, shuffle=True)\n",
|
||
"list(dl)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"ds = L(enumerate(string.ascii_lowercase))\n",
|
||
"ds"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"dl = DataLoader(ds, batch_size=6, shuffle=True)\n",
|
||
"list(dl)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Putting It All Together"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"weights = init_params((28*28,1))\n",
|
||
"bias = init_params(1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"dl = DataLoader(dset, batch_size=256)\n",
|
||
"xb,yb = first(dl)\n",
|
||
"xb.shape,yb.shape"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"valid_dl = DataLoader(valid_dset, batch_size=256)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"batch = train_x[:4]\n",
|
||
"batch.shape"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"preds = linear1(batch)\n",
|
||
"preds"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"loss = mnist_loss(preds, train_y[:4])\n",
|
||
"loss"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"loss.backward()\n",
|
||
"weights.grad.shape,weights.grad.mean(),bias.grad"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
"def calc_grad(xb, yb, model):\n",
"    preds = model(xb)\n",
"    loss = mnist_loss(preds, yb)\n",
"    loss.backward()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"calc_grad(batch, train_y[:4], linear1)\n",
|
||
"weights.grad.mean(),bias.grad"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"calc_grad(batch, train_y[:4], linear1)\n",
|
||
"weights.grad.mean(),bias.grad"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"weights.grad.zero_()\n",
|
||
"bias.grad.zero_();"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
"def train_epoch(model, lr, params):\n",
"    for xb,yb in dl:\n",
"        calc_grad(xb, yb, model)\n",
"        for p in params:\n",
"            p.data -= p.grad*lr\n",
"            p.grad.zero_()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"(preds>0.0).float() == train_y[:4]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
"def batch_accuracy(xb, yb):\n",
"    preds = xb.sigmoid()\n",
"    correct = (preds>0.5) == yb\n",
"    return correct.float().mean()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"batch_accuracy(linear1(batch), train_y[:4])"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
"def validate_epoch(model):\n",
"    accs = [batch_accuracy(model(xb), yb) for xb,yb in valid_dl]\n",
"    return round(torch.stack(accs).mean().item(), 4)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"validate_epoch(linear1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"lr = 1.\n",
|
||
"params = weights,bias\n",
|
||
"train_epoch(linear1, lr, params)\n",
|
||
"validate_epoch(linear1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
"for i in range(20):\n",
"    train_epoch(linear1, lr, params)\n",
"    print(validate_epoch(linear1), end=' ')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### Creating an Optimizer"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"linear_model = nn.Linear(28*28,1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"w,b = linear_model.parameters()\n",
|
||
"w.shape,b.shape"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
"class BasicOptim:\n",
"    def __init__(self,params,lr): self.params,self.lr = list(params),lr\n",
"\n",
"    def step(self, *args, **kwargs):\n",
"        for p in self.params: p.data -= p.grad.data * self.lr\n",
"\n",
"    def zero_grad(self, *args, **kwargs):\n",
"        for p in self.params: p.grad = None"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"opt = BasicOptim(linear_model.parameters(), lr)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
"def train_epoch(model):\n",
"    for xb,yb in dl:\n",
"        calc_grad(xb, yb, model)\n",
"        opt.step()\n",
"        opt.zero_grad()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"validate_epoch(linear_model)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
"def train_model(model, epochs):\n",
"    for i in range(epochs):\n",
"        train_epoch(model)\n",
"        print(validate_epoch(model), end=' ')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"train_model(linear_model, 20)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"linear_model = nn.Linear(28*28,1)\n",
|
||
"opt = SGD(linear_model.parameters(), lr)\n",
|
||
"train_model(linear_model, 20)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"dls = DataLoaders(dl, valid_dl)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"learn = Learner(dls, nn.Linear(28*28,1), opt_func=SGD,\n",
"                loss_func=mnist_loss, metrics=batch_accuracy)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"learn.fit(10, lr=lr)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Adding a Nonlinearity"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
"def simple_net(xb):\n",
"    res = xb@w1 + b1\n",
"    res = res.max(tensor(0.0))\n",
"    res = res@w2 + b2\n",
"    return res"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"w1 = init_params((28*28,30))\n",
|
||
"b1 = init_params(30)\n",
|
||
"w2 = init_params((30,1))\n",
|
||
"b2 = init_params(1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"plot_function(F.relu)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
"simple_net = nn.Sequential(\n",
"    nn.Linear(28*28,30),\n",
"    nn.ReLU(),\n",
"    nn.Linear(30,1)\n",
")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"learn = Learner(dls, simple_net, opt_func=SGD,\n",
"                loss_func=mnist_loss, metrics=batch_accuracy)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"learn.fit(40, 0.1)"
|
||
]
|
||
},
|
||
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Each recorder row holds train loss, valid loss, then metrics; index 2 is batch_accuracy\n",
"plt.plot(L(learn.recorder.values).itemgot(2));"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"learn.recorder.values[-1][2]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Going Deeper"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dls = ImageDataLoaders.from_folder(path)\n",
"learn = vision_learner(dls, resnet18, pretrained=False,\n",
"                    loss_func=F.cross_entropy, metrics=accuracy)\n",
"learn.fit_one_cycle(1, 0.1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Jargon Recap"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Questionnaire"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. How is a grayscale image represented on a computer? How about a color image?\n",
"1. How are the files and folders in the `MNIST_SAMPLE` dataset structured? Why?\n",
"1. Explain how the \"pixel similarity\" approach to classifying digits works.\n",
"1. What is a list comprehension? Create one now that selects odd numbers from a list and doubles them.\n",
"1. What is a \"rank-3 tensor\"?\n",
"1. What is the difference between tensor rank and shape? How do you get the rank from the shape?\n",
"1. What are RMSE and L1 norm?\n",
"1. How can you apply a calculation on thousands of numbers at once, many thousands of times faster than a Python loop?\n",
"1. Create a 3×3 tensor or array containing the numbers from 1 to 9. Double it. Select the bottom-right four numbers.\n",
"1. What is broadcasting?\n",
"1. Are metrics generally calculated using the training set, or the validation set? Why?\n",
"1. What is SGD?\n",
"1. Why does SGD use mini-batches?\n",
"1. What are the seven steps in SGD for machine learning?\n",
"1. How do we initialize the weights in a model?\n",
"1. What is \"loss\"?\n",
"1. Why can't we always use a high learning rate?\n",
"1. What is a \"gradient\"?\n",
"1. Do you need to know how to calculate gradients yourself?\n",
"1. Why can't we use accuracy as a loss function?\n",
"1. Draw the sigmoid function. What is special about its shape?\n",
"1. What is the difference between a loss function and a metric?\n",
"1. What is the function to calculate new weights using a learning rate?\n",
"1. What does the `DataLoader` class do?\n",
"1. Write pseudocode showing the basic steps taken in each epoch for SGD.\n",
"1. Create a function that, if passed two arguments `[1,2,3,4]` and `'abcd'`, returns `[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]`. What is special about that output data structure?\n",
"1. What does `view` do in PyTorch?\n",
"1. What are the \"bias\" parameters in a neural network? Why do we need them?\n",
"1. What does the `@` operator do in Python?\n",
"1. What does the `backward` method do?\n",
"1. Why do we have to zero the gradients?\n",
"1. What information do we have to pass to `Learner`?\n",
"1. Show Python or pseudocode for the basic steps of a training loop.\n",
"1. What is \"ReLU\"? Draw a plot of it for values from `-2` to `+2`.\n",
"1. What is an \"activation function\"?\n",
"1. What's the difference between `F.relu` and `nn.ReLU`?\n",
"1. The universal approximation theorem shows that any function can be approximated as closely as needed using just one nonlinearity. So why do we normally use more?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further Research"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Create your own implementation of `Learner` from scratch, based on the training loop shown in this chapter.\n",
"1. Complete all the steps in this chapter using the full MNIST dataset (that is, for all digits, not just 3s and 7s). This is a significant project and will take quite a bit of time to complete! You'll need to do some of your own research to figure out how to overcome the obstacles you meet along the way."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"jupytext": {
"split_at_heading": true
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}