mirror of
https://github.com/fastai/fastbook.git
synced 2025-04-04 18:00:48 +00:00
Update and fix #56
This commit is contained in:
parent
07fd07476b
commit
314266a12e
@ -1278,7 +1278,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In other words, to recap, a neural network is a particular kind of machine learning model, which fits right in to Samuel's original conception. Neural networks are special because they are highly flexible, which means they can solve an unusually range of problems just by finding the right weights. This is powerful, because stochastic gradient descent provides us a way to find those weight values automatically.\n",
|
||||
"In other words, to recap, a neural network is a particular kind of machine learning model, which fits right in to Samuel's original conception. Neural networks are special because they are highly flexible, which means they can solve an unusually wide range of problems just by finding the right weights. This is powerful, because stochastic gradient descent provides us a way to find those weight values automatically.\n",
|
||||
"\n",
|
||||
"Having zoomed out, let's now zoom back in and revisit our image classification problem into Samuel's framework.\n",
|
||||
"\n",
|
||||
|
@ -35,7 +35,7 @@
|
||||
"\n",
|
||||
"To be exact, we'll discuss the role of arrays and tensors, and of broadcasting, a powerful technique for using them expressively. We'll explain stochastic gradient descent (SGD), the mechanism for learning by updating weights automatically. We'll discuss the choice of a loss function for our basic classification task, and the role of mini-batches. We'll also describe the math that a basic neural network is actually doing. Finally, we'll put all these pieces together to see them at work.\n",
|
||||
"\n",
|
||||
"In future chapters we’ll do deep dives into other applications as well, and see how these concepts and tools generalize. But this chapter is about laying foundation stones. To be frank, that also makes this one of the harder chapters, because of how these concepts all depend on each other. Like an arch, all the stones need to be in place for the structure to stay up. Also like an arch, once that happens, it's a powerful structure that can support other things. But it requires some patience to assemble.\n",
|
||||
"In future chapters we’ll do deep dives into other applications as well, and see how these concepts and tools generalize. But this chapter is about laying foundation stones. To be frank, that also makes this one of the hardest chapters, because of how these concepts all depend on each other. Like an arch, all the stones need to be in place for the structure to stay up. Also like an arch, once that happens, it's a powerful structure that can support other things. But it requires some patience to assemble.\n",
|
||||
"\n",
|
||||
"So let us begin. The first step is to consider how images are represented in a computer."
|
||||
]
|
||||
@ -2908,7 +2908,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"To understand SGD, it might be easiest to start with a simple, synthetic, example. Let's imagine you were measuring the speed of a roller coaster as it went over the top of a hump. It would start fast, and then get slower as it went up the hill, and then would be slowest at the top, and it would then speed up again as it goes downhill. If you're measuring the speed manually every second for 20 seconds, it might look something like this:"
|
||||
"To understand SGD, it might be easier to start with a simple, synthetic, example. Let's imagine you were measuring the speed of a roller coaster as it went over the top of a hump. It would start fast, and then get slower as it went up the hill, and then would be slowest at the top, and it would then speed up again as it goes downhill. If you're measuring the speed manually every second for 20 seconds, it might look something like this:"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -3401,7 +3401,7 @@
|
||||
"\n",
|
||||
"So we need to choose a loss function. The obvious approach would be to use accuracy, which is our metric, as our loss function as well. In this case, we would calculate our prediction for each image, collect these values to calculate an overall accuracy, and then calculate the gradients of each weight with respect to that overall accuracy.\n",
|
||||
"\n",
|
||||
"Unfortunately, we have a significant technical problem here. The gradient of a function is its *slope*, or its steepness, which can be defined as *rise over run* -- that is, how much the value of function goes up or down, divided by how much you changed the input. We can write this in maths: `(y_new-y_old) / (x_new-x_old)`. Specifically, it is defined when x_new is very similar to x_old, meaning that their difference is very small. But accuracy only changes at all when a prediction changes from a 3 to a 7, or vice versa. So the problem is that a small change in weights from from x_old to x_new isn't likely to cause any prediction to change, so `(y_new - y_old)` will be zero. In other words, the gradient is zero almost everywhere.\n",
|
||||
"Unfortunately, we have a significant technical problem here. The gradient of a function is its *slope*, or its steepness, which can be defined as *rise over run* -- that is, how much the value of function goes up or down, divided by how much you changed the input. We can write this in maths: `(y_new-y_old) / (x_new-x_old)`. Specifically, it is defined when x_new is very similar to x_old, meaning that their difference is very small. But accuracy only changes at all when a prediction changes from a 3 to a 7, or vice versa. So the problem is that a small change in weights from x_old to x_new isn't likely to cause any prediction to change, so `(y_new - y_old)` will be zero. In other words, the gradient is zero almost everywhere.\n",
|
||||
"\n",
|
||||
"As a result, a very small change in the value of a weight will often not actually change the accuracy at all. This means it is not useful to use accuracy as a loss function. When we use accuracy as a loss function, most of the time our gradients will actually be zero, and the model will not be able to learn from that number. That is not much use at all!\n",
|
||||
"\n",
|
||||
@ -3449,7 +3449,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We're using a new function, `torch.where(a,b,c)`. This is the same as running the list comprehension `[b[i] if a[i] else c[i] for i in range(len(a))]`, except it works on tensors, at C/CUDA speed. In plain English, this function will measure how distant each prediction is from 1 if it should be 1, and how distant it is from from 0 if it should be 0, and then it will take the mean of all those distances.\n",
|
||||
"We're using a new function, `torch.where(a,b,c)`. This is the same as running the list comprehension `[b[i] if a[i] else c[i] for i in range(len(a))]`, except it works on tensors, at C/CUDA speed. In plain English, this function will measure how distant each prediction is from 1 if it should be 1, and how distant it is from 0 if it should be 0, and then it will take the mean of all those distances.\n",
|
||||
"\n",
|
||||
"> note: It's important to learn about PyTorch functions like this, because looping over tensors in Python performs at Python speed, not C/CUDA speed!\n",
|
||||
"\n",
|
||||
@ -5378,7 +5378,7 @@
|
||||
"|**gradient** | The derivative of the loss with respect to some parameter of the model\n",
|
||||
"|**backard pass** | Computing the gradients of the loss with respect to all model parameters\n",
|
||||
"|**gradient descent** | Taking a step in the directions opposite to the gradients to make the model parameters a little bit better\n",
|
||||
"|**learning rate** | The size of the step we take when applying SGD to update the paramters of the model\n",
|
||||
"|**learning rate** | The size of the step we take when applying SGD to update the parameters of the model\n",
|
||||
"|=====\n",
|
||||
"```"
|
||||
]
|
||||
@ -5471,10 +5471,25 @@
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"jupytext": {
|
||||
"split_at_heading": true
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@ -389,7 +389,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"More importantly, to take full advantage of transfer learning, we have to define a custom *splitter*. A splitter is a function that tells the fastai library how to split the model in several parameter groups. This is what is used behind the scenes not only train the head of a model when we do transfert learning. \n",
|
||||
"More importantly, to take full advantage of transfer learning, we have to define a custom *splitter*. A splitter is a function that tells the fastai library how to split the model in several parameter groups. This is what is used behind the scenes not only train the head of a model when we do transfer learning. \n",
|
||||
"\n",
|
||||
"Here we want two parameter groups: one for the encoder and one for the head. We can thus define the following splitter (`params` is jsut a function that returns all parameters of a given module):"
|
||||
]
|
||||
@ -821,36 +821,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.3"
|
||||
},
|
||||
"varInspector": {
|
||||
"cols": {
|
||||
"lenName": 16,
|
||||
"lenType": 16,
|
||||
"lenVar": 40
|
||||
},
|
||||
"kernels_config": {
|
||||
"python": {
|
||||
"delete_cmd_postfix": "",
|
||||
"delete_cmd_prefix": "del ",
|
||||
"library": "var_list.py",
|
||||
"varRefreshCmd": "print(var_dic_list())"
|
||||
},
|
||||
"r": {
|
||||
"delete_cmd_postfix": ") ",
|
||||
"delete_cmd_prefix": "rm(",
|
||||
"library": "var_list.r",
|
||||
"varRefreshCmd": "cat(var_dic_list()) "
|
||||
}
|
||||
},
|
||||
"types_to_exclude": [
|
||||
"module",
|
||||
"function",
|
||||
"builtin_function_or_method",
|
||||
"instance",
|
||||
"_Feature"
|
||||
],
|
||||
"window_display": false
|
||||
"version": "3.7.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@ -1318,36 +1318,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.3"
|
||||
},
|
||||
"varInspector": {
|
||||
"cols": {
|
||||
"lenName": 16,
|
||||
"lenType": 16,
|
||||
"lenVar": 40
|
||||
},
|
||||
"kernels_config": {
|
||||
"python": {
|
||||
"delete_cmd_postfix": "",
|
||||
"delete_cmd_prefix": "del ",
|
||||
"library": "var_list.py",
|
||||
"varRefreshCmd": "print(var_dic_list())"
|
||||
},
|
||||
"r": {
|
||||
"delete_cmd_postfix": ") ",
|
||||
"delete_cmd_prefix": "rm(",
|
||||
"library": "var_list.r",
|
||||
"varRefreshCmd": "cat(var_dic_list()) "
|
||||
}
|
||||
},
|
||||
"types_to_exclude": [
|
||||
"module",
|
||||
"function",
|
||||
"builtin_function_or_method",
|
||||
"instance",
|
||||
"_Feature"
|
||||
],
|
||||
"window_display": false
|
||||
"version": "3.7.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@ -4003,10 +4003,25 @@
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"jupytext": {
|
||||
"split_at_heading": true
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@ -247,7 +247,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"opt_func = partial(Optimizer, cbs=[sgd_step])"
|
||||
"opt_func = partial(Optimizer, cbs=[sgd_cb])"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
Loading…
Reference in New Issue
Block a user