Mirror of https://github.com/fastai/fastbook.git, synced 2025-04-04 01:40:44 +00:00

commit a251aae293 (parent e57e315582)

update explanation of negative log loss (cross entropy loss) (#501)

* update explanation of nll
* spelling
* clean
* clean
* add back stuff
* fix lr syntax
@@ -2870,7 +2870,7 @@
 "w -= gradient(w) * lr\n",
 "```\n",
 "\n",
-"This is known as *stepping* your parameters, using an *optimizer step*.\n",
+"This is known as *stepping* your parameters, using an *optimizer step*. Notice how we _subtract_ the `gradient * lr` from the parameter to update it. This moves the parameter in the direction _opposite_ the slope: increasing the parameter when the slope is negative and decreasing the parameter when the slope is positive. We want to step against the slope because our goal in deep learning is to _minimize_ the loss.\n",
 "\n",
 "If you pick a learning rate that's too low, it can mean having to do a lot of steps. <<descent_small>> illustrates that."
 ]
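To see why subtracting `gradient * lr` minimizes the loss, here is a minimal sketch in plain PyTorch; the quadratic toy loss, starting value, and learning rate are assumptions chosen for illustration:

    import torch

    w = torch.tensor(3.0, requires_grad=True)  # assumed starting parameter
    lr = 0.1                                   # assumed learning rate

    for _ in range(5):
        loss = (w - 1.0) ** 2       # toy loss, minimized at w = 1
        loss.backward()             # fills w.grad with d(loss)/dw
        with torch.no_grad():
            w -= w.grad * lr        # the optimizer step: move against the slope
            w.grad.zero_()          # clear the gradient for the next iteration
        print(w.item())             # 2.6, 2.28, 2.024, ... heading toward 1.0

Each step moves `w` against the sign of the slope, so the loss shrinks whether the slope is positive or negative.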
@@ -3004,7 +3004,7 @@
 "\n",
 "If we can solve this problem for the three parameters of a quadratic function, we'll be able to apply the same approach for other, more complex functions with more parameters—such as a neural net. Let's find the parameters for `f` first, and then we'll come back and do the same thing for the MNIST dataset with a neural net.\n",
 "\n",
-"We need to define first what we mean by \"best.\" We define this precisely by choosing a *loss function*, which will return a value based on a prediction and a target, where lower values of the function correspond to \"better\" predictions. For continuous data, it's common to use *mean squared error*:"
+"We need to define first what we mean by \"best.\" We define this precisely by choosing a *loss function*, which will return a value based on a prediction and a target, where lower values of the function correspond to \"better\" predictions. It is important for loss functions to return _lower_ values when predictions are more accurate, as the SGD procedure we defined earlier will try to _minimize_ this loss. For continuous data, it's common to use *mean squared error*:"
 ]
 },
 {
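For reference, a runnable sketch of one common definition of mean squared error (the tensors are assumed example values, not from the notebook):

    import torch

    def mse(preds, targets):
        "Mean squared error: lower is better, matching what SGD minimizes."
        return ((preds - targets) ** 2).mean()

    preds   = torch.tensor([0.9, 2.1, 3.2])  # assumed predictions
    targets = torch.tensor([1.0, 2.0, 3.0])  # assumed targets
    print(mse(preds, targets))               # tensor(0.0200)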
@@ -5853,7 +5853,7 @@
 "split_at_heading": true
 },
 "kernelspec": {
-"display_name": "Python 3",
+"display_name": "Python 3 (ipykernel)",
 "language": "python",
 "name": "python3"
 }
File diff suppressed because one or more lines are too long
@@ -396,7 +396,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"tensor([1,2,3]) + tensor([1,1,1])"
+"tensor([1,2,3]) + tensor(1)"
 ]
 },
 {
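The changed line works because of *broadcasting*: the rank-0 `tensor(1)` is expanded to match the shape of the rank-1 tensor, so the two expressions give the same result. A quick sketch:

    import torch
    from torch import tensor

    a = tensor([1, 2, 3])
    print(a + tensor([1, 1, 1]))  # tensor([2, 3, 4]), elementwise addition
    print(a + tensor(1))          # tensor([2, 3, 4]), the scalar is broadcast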
@@ -956,7 +956,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"corrects = (preds>0.0).float() == train_y\n",
+"corrects = (preds>0.5).float() == train_y\n",
 "corrects"
 ]
 },
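The threshold here changes from 0.0 to 0.5, presumably because `preds` at that point in the notebook are sigmoid outputs in (0, 1) rather than raw activations centered on zero; since `sigmoid(0) == 0.5`, both tests classify the same examples. A small sketch with assumed activation values:

    import torch

    acts = torch.tensor([-2.0, -0.5, 0.5, 2.0])  # assumed raw activations
    probs = acts.sigmoid()                       # squashed into (0, 1)

    print(acts > 0.0)    # tensor([False, False,  True,  True])
    print(probs > 0.5)   # tensor([False, False,  True,  True]), same decisions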
@@ -1643,7 +1643,7 @@
 "split_at_heading": true
 },
 "kernelspec": {
-"display_name": "Python 3",
+"display_name": "Python 3 (ipykernel)",
 "language": "python",
 "name": "python3"
 }
@@ -123,6 +123,7 @@
 "dblock1 = DataBlock(blocks=(ImageBlock(), CategoryBlock()),\n",
 " get_y=parent_label,\n",
 " item_tfms=Resize(460))\n",
+"# Place an image in the 'images/grizzly.jpg' subfolder where this notebook is located before running this\n",
 "dls1 = dblock1.dataloaders([(Path.cwd()/'images'/'grizzly.jpg')]*100, bs=8)\n",
 "dls1.train.get_idxs = lambda: Inf.ones\n",
 "x,y = dls1.valid.one_batch()\n",
@@ -341,7 +342,7 @@
 "df = pd.DataFrame(sm_acts, columns=[\"3\",\"7\"])\n",
 "df['targ'] = targ\n",
 "df['idx'] = idx\n",
-"df['loss'] = sm_acts[range(6), targ]\n",
+"df['result'] = sm_acts[range(6), targ]\n",
 "t = df.style.hide_index()\n",
 "#To have html code compatible with our script\n",
 "html = t._repr_html_().split('</style>')[1]\n",
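The renamed `result` column relies on PyTorch integer-array indexing: `sm_acts[range(6), targ]` selects, for each row `i`, the column given by `targ[i]`, i.e. the probability the model assigned to the correct class. A minimal sketch with assumed values:

    import torch

    sm_acts = torch.tensor([[0.6, 0.4],
                            [0.1, 0.9],
                            [0.8, 0.2]])  # assumed softmax outputs for classes "3" and "7"
    targ = torch.tensor([0, 1, 0])        # assumed target class for each row

    print(sm_acts[range(3), targ])        # tensor([0.6000, 0.9000, 0.8000])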
@@ -371,7 +372,9 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Taking the Log"
+"#### Taking the Log\n",
+"\n",
+"Recall that cross entropy loss may involve the multiplication of many numbers. Multiplying lots of small numbers together (probabilities all lie between 0 and 1) can cause problems like [numerical underflow](https://en.wikipedia.org/wiki/Arithmetic_underflow) in computers. Therefore, we want to transform these probabilities onto a larger scale so we can perform mathematical operations on them reliably. There is a mathematical function that does exactly this: the *logarithm* (available as `torch.log`). It is not defined for numbers less than 0, and looks like this between 0 and 1:"
 ]
 },
 {
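The underflow problem the new paragraph describes is easy to reproduce, and the logarithm sidesteps it by turning a long product into a sum, since log(a*b) = log(a) + log(b); the probabilities below are assumptions:

    import torch

    probs = torch.full((1000,), 0.5)  # a thousand assumed probabilities of 0.5
    print(probs.prod())               # tensor(0.): 0.5**1000 underflows float32
    print(probs.log().sum())          # tensor(-693.1472): log(0.5**1000), representable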
@@ -380,7 +383,38 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"plot_function(torch.log, min=0,max=4)"
+"plot_function(torch.log, min=0,max=1, ty='log(x)', tx='x')"
 ]
 },
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"plot_function(lambda x: -1*torch.log(x), min=0,max=1, tx='x', ty='- log(x)', title = 'Log Loss when true label = 1')"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"from IPython.display import HTML\n",
+"df['loss'] = -torch.log(tensor(df['result']))\n",
+"t = df.style.hide_index()\n",
+"#To have html code compatible with our script\n",
+"html = t._repr_html_().split('</style>')[1]\n",
+"html = re.sub(r'<table id=\"([^\"]+)\"\\s*>', r'<table >', html)\n",
+"display(HTML(html))"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"### Negative Log Likelihood"
+]
+},
 {
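Tying the new cells together: the `loss` column is just the negative log of the probability assigned to the correct class, so it is near zero when the model is confidently right and grows without bound as the model becomes confidently wrong. A hedged sketch with assumed values:

    import torch

    result = torch.tensor([0.95, 0.60, 0.05])  # assumed probabilities of the correct class
    loss = -torch.log(result)                  # negative log likelihood per example
    print(loss)                                # tensor([0.0513, 0.5108, 2.9957])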
@@ -476,7 +510,7 @@
 "outputs": [],
 "source": [
 "learn = cnn_learner(dls, resnet34, metrics=error_rate)\n",
-"lr_min,lr_steep = learn.lr_find()"
+"lr_min,lr_steep = learn.lr_find(suggest_funcs=(minimum, steep))"
 ]
 },
 {
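This is the "fix lr syntax" item from the commit message: in recent fastai releases `lr_find` returns suggestions only for the functions you request, so the two values are asked for explicitly. A sketch of typical usage, assuming `dls` is already built and that `minimum` and `steep` are brought in by the star import:

    from fastai.vision.all import *

    # assumes `dls` (a DataLoaders) was created earlier in the notebook
    learn = cnn_learner(dls, resnet34, metrics=error_rate)
    lr_min, lr_steep = learn.lr_find(suggest_funcs=(minimum, steep))
    print(f"suggested lrs: minimum={lr_min:.2e}, steep={lr_steep:.2e}")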
@@ -675,11 +709,11 @@
 "split_at_heading": true
 },
 "kernelspec": {
-"display_name": "Python 3",
+"display_name": "Python 3 (ipykernel)",
 "language": "python",
 "name": "python3"
 }
 },
 "nbformat": 4,
-"nbformat_minor": 2
+"nbformat_minor": 4
 }