add back stuff

hamelsmu 2022-03-05 17:55:03 -08:00
parent 8bf193ab41
commit ee0ddf4cdb


@@ -1277,6 +1277,13 @@
"plot_function(torch.log, min=0,max=1, ty='log(x)', tx='x')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Additionally, we want to ensure our model is able to detect differences between small numbers. For example, consider the probabilities of .01 and .001. Indeed, those numbers are very close together—but in another sense, 0.01 is 10 times more confident than 0.001. By taking the log of our probabilities, we prevent these important differences from being ignored."
]
},
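As a rough sketch of what the log buys us (using torch.log, which the notebook already plots above; the tensor values here are purely illustrative):

```python
import torch

# Probabilities that look almost identical on the raw scale...
probs = torch.tensor([0.01, 0.001])

# ...are clearly separated once we take the log:
print(torch.log(probs))  # tensor([-4.6052, -6.9078])

# The raw values differ by only 0.009, but their logs differ by
# log(10) ≈ 2.30, preserving the "10 times more confident" distinction.
```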
{
"cell_type": "markdown",
"metadata": {},
@@ -1435,6 +1442,13 @@
"> s: There are other loss functions such as [focal loss](https://arxiv.org/pdf/1708.02002.pdf) that allow you control this penalty with a parameter. We do not discuss that loss function in this book."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We're calculating the loss from the column containing the correct label. Because there is only one \"right\" answer per example, we don't need to consider the other columns, because by the definition of softmax, they add up to 1 minus the activation corresponding to the correct label. As long as the activation columns sum to 1 (as they will, if we use softmax), then we'll have a loss function that shows how well we're predicting each digit. Therefore, making the activation for the correct label as high as possible must mean we're also decreasing the activations of the remaining columns. "
]
},
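A minimal sketch of the idea in that paragraph, assuming hypothetical names `sm_acts` (softmax activations, one row per example) and `targets` (the integer label for each example):

```python
import torch

# Softmax activations: each row sums to 1.
sm_acts = torch.tensor([[0.60, 0.30, 0.10],
                        [0.20, 0.70, 0.10]])
targets = torch.tensor([0, 1])

# Pick out, for each row, the activation in the correct-label column.
idx = range(len(targets))
correct = sm_acts[idx, targets]   # tensor([0.6000, 0.7000])

# Negative log likelihood of the correct labels: pushing these activations
# up necessarily pushes the other columns down, since each row sums to 1.
loss = -torch.log(correct).mean()
print(loss)                       # tensor(0.4338)
```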
{
"cell_type": "markdown",
"metadata": {},