From 48474eccde1955b55b43197d0d3eb525c52b16a2 Mon Sep 17 00:00:00 2001
From: SOVIETIC-BOSS88
Date: Wed, 3 Jun 2020 21:56:40 +0200
Subject: [PATCH] Update 14_resnet.ipynb

---
 14_resnet.ipynb | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/14_resnet.ipynb b/14_resnet.ipynb
index 27ae1e7..fdb0f57 100644
--- a/14_resnet.ipynb
+++ b/14_resnet.ipynb
@@ -362,7 +362,7 @@
     "\n",
     "> : Let us consider a shallower architecture and its deeper counterpart that adds more layers onto it. There exists a solution by construction to the deeper model: the added layers are identity mapping, and the other layers are copied from the learned shallower model.\n",
     "\n",
-    "AS this is an academic paper this process is described in a rather inaccessible way, but the concept is actually very simple: start with a 20-layer neural network that is trained well, and add another 36 layers that do nothing at all (for instance, they could be linear layers with a single weight equal to 1, and bias equal to 0). The result will be a 56-layer network that does exactly the same thing as the 20-layer network, proving that there are always deep networks that should be *at least as good* as any shallow network. But for some reason, SGD does not seem able to find them.\n",
+    "As this is an academic paper this process is described in a rather inaccessible way, but the concept is actually very simple: start with a 20-layer neural network that is trained well, and add another 36 layers that do nothing at all (for instance, they could be linear layers with a single weight equal to 1, and bias equal to 0). The result will be a 56-layer network that does exactly the same thing as the 20-layer network, proving that there are always deep networks that should be *at least as good* as any shallow network. But for some reason, SGD does not seem able to find them.\n",
     "\n",
     "> jargon: Identity mapping: Returning the input without changing it at all. This process is performed by an _identity function_.\n",
     "\n",
@@ -824,7 +824,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The `_make_layer` function is just there to create a series of `n_layers` blocks. The first one is is going from `ch_in` to `ch_out` with the indicated `stride` and all the others are blocks of stride 1 with `ch_out` to `ch_out` tensors. Once the blocks are defined, our model is purely sequential, which is why we define it as a subclass of `nn.Sequential`. (Ignore the `expansion` parameter for now; we'll discuss it in the next section. For now, it'll be `1`, so it doesn't do anything.)\n",
+    "The `_make_layer` function is just there to create a series of `n_layers` blocks. The first one is going from `ch_in` to `ch_out` with the indicated `stride` and all the others are blocks of stride 1 with `ch_out` to `ch_out` tensors. Once the blocks are defined, our model is purely sequential, which is why we define it as a subclass of `nn.Sequential`. (Ignore the `expansion` parameter for now; we'll discuss it in the next section. For now, it'll be `1`, so it doesn't do anything.)\n",
     "\n",
     "The various versions of the models (ResNet-18, -34, -50, etc.) just change the number of blocks in each of those groups. This is the definition of a ResNet-18:"
    ]
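
Note, for context on the two passages this patch touches: the first hunk's paragraph makes the "added identity layers" argument, and the second hunk's paragraph describes how `_make_layer` assembles each group of residual blocks. Below is a minimal, self-contained PyTorch sketch of both ideas. It is not the notebook's actual code: the names `ResBlock`, `make_layer`, and `ResNetSketch` are illustrative, the chapter's real block also uses batchnorm and a different identity-path downsampling, and the `expansion` parameter mentioned in the quoted text is omitted.

import torch
import torch.nn as nn

class ResBlock(nn.Module):
    # Simplified residual block: relu(convs(x) + id_path(x)).
    def __init__(self, ch_in, ch_out, stride=1):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(ch_in, ch_out, 3, stride=stride, padding=1), nn.ReLU(),
            nn.Conv2d(ch_out, ch_out, 3, padding=1))
        # Identity mapping when shapes match; otherwise a strided 1x1 conv
        # reshapes the input so it can be added to the conv-path output.
        self.id_path = (nn.Identity() if ch_in == ch_out and stride == 1
                        else nn.Conv2d(ch_in, ch_out, 1, stride=stride))

    def forward(self, x):
        return torch.relu(self.convs(x) + self.id_path(x))

def make_layer(ch_in, ch_out, n_layers, stride):
    # As the patched paragraph says: the first block goes from ch_in to
    # ch_out with the indicated stride; the remaining n_layers - 1 blocks
    # are stride-1 blocks from ch_out to ch_out.
    return nn.Sequential(
        ResBlock(ch_in, ch_out, stride),
        *[ResBlock(ch_out, ch_out, 1) for _ in range(n_layers - 1)])

class ResNetSketch(nn.Sequential):
    # Purely sequential once the blocks are built, hence nn.Sequential.
    def __init__(self, n_out, layers):
        szs = [64, 64, 128, 256, 512]
        blocks = [make_layer(szs[i], szs[i + 1], n, stride=1 if i == 0 else 2)
                  for i, n in enumerate(layers)]
        super().__init__(
            nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU(),  # stem
            nn.MaxPool2d(3, stride=2, padding=1),
            *blocks,                                              # 4 groups
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),                # head
            nn.Linear(szs[-1], n_out))

model = ResNetSketch(n_out=10, layers=[2, 2, 2, 2])  # [2,2,2,2] -> ResNet-18;
                                                     # [3,4,6,3] -> ResNet-34

# The first hunk's argument, by construction: stacking layers that do
# nothing onto a trained model cannot change its output, so the deeper
# network is at least as good as the shallower one.
x = torch.randn(1, 3, 224, 224)
deeper = nn.Sequential(model, *[nn.Identity() for _ in range(36)])
assert torch.equal(model(x), deeper(x))

The different ResNet variants then differ only in the `layers` list, exactly as the final context line of the patch states.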