From 398288594b0aeb6388dedc199482645590618be1 Mon Sep 17 00:00:00 2001
From: cjon256 <3659487+cjon256@users.noreply.github.com>
Date: Tue, 15 Aug 2023 16:42:33 -0400
Subject: [PATCH 1/2] Update 10_nlp.ipynb to say 'spaces'

In the videos it is quite clear 'spaces' is what is meant here.
---
 10_nlp.ipynb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/10_nlp.ipynb b/10_nlp.ipynb
index 2fddb5f..94d109b 100644
--- a/10_nlp.ipynb
+++ b/10_nlp.ipynb
@@ -132,7 +132,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "When we said \"convert the text into a list of words,\" we left out a lot of details. For instance, what do we do with punctuation? How do we deal with a word like \"don't\"? Is it one word, or two? What about long medical or chemical words? Should they be split into their separate pieces of meaning? How about hyphenated words? What about languages like German and Polish where we can create really long words from many, many pieces? What about languages like Japanese and Chinese that don't use bases at all, and don't really have a well-defined idea of *word*?\n",
+    "When we said \"convert the text into a list of words,\" we left out a lot of details. For instance, what do we do with punctuation? How do we deal with a word like \"don't\"? Is it one word, or two? What about long medical or chemical words? Should they be split into their separate pieces of meaning? How about hyphenated words? What about languages like German and Polish where we can create really long words from many, many pieces? What about languages like Japanese and Chinese that don't use spaces at all, and don't really have a well-defined idea of *word*?\n",
     "\n",
     "Because there is no one correct answer to these questions, there is no one approach to tokenization. There are three main approaches:\n",
     "\n",

From e77377191fc6344d307146e74dade8aeb3789369 Mon Sep 17 00:00:00 2001
From: cjon256 <3659487+cjon256@users.noreply.github.com>
Date: Thu, 24 Aug 2023 18:23:28 -0400
Subject: [PATCH 2/2] 04_mnist_basics.ipynb mean() argument

When mean() takes a tuple argument it treats it as a list of axes to
reduce, not as a range of axes.
---
 04_mnist_basics.ipynb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/04_mnist_basics.ipynb b/04_mnist_basics.ipynb
index 675bb5b..846743f 100644
--- a/04_mnist_basics.ipynb
+++ b/04_mnist_basics.ipynb
@@ -2239,7 +2239,7 @@
     "\n",
     "Next in `mnist_distance` we see `abs`. You might be able to guess now what this does when applied to a tensor. It applies the method to each individual element in the tensor, and returns a tensor of the results (that is, it applies the method \"elementwise\"). So in this case, we'll get back 1,010 matrices of absolute values.\n",
     "\n",
-    "Finally, our function calls `mean((-1,-2))`. The tuple `(-1,-2)` represents a range of axes. In Python, `-1` refers to the last element, and `-2` refers to the second-to-last. So in this case, this tells PyTorch that we want to take the mean ranging over the values indexed by the last two axes of the tensor. The last two axes are the horizontal and vertical dimensions of an image. After taking the mean over the last two axes, we are left with just the first tensor axis, which indexes over our images, which is why our final size was `(1010)`. In other words, for every image, we averaged the intensity of all the pixels in that image.\n",
+    "Finally, our function calls `mean((-1,-2))`. The tuple `(-1,-2)` represents a list of axes. In Python, `-1` refers to the last element, and `-2` refers to the second-to-last. So in this case, this tells PyTorch that we want to take the mean ranging over the values indexed by the last two axes of the tensor. The last two axes are the horizontal and vertical dimensions of an image. After taking the mean over the last two axes, we are left with just the first tensor axis, which indexes over our images, which is why our final size was `(1010)`. In other words, for every image, we averaged the intensity of all the pixels in that image.\n",
     "\n",
     "We'll be learning lots more about broadcasting throughout this book, especially in <>, and will be practicing it regularly too.\n",
     "\n",
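---

For illustration, a minimal sketch of the behavior the second commit message
describes (assuming a recent PyTorch, where Tensor.mean accepts a tuple of
dims): the tuple names the axes to reduce, and those axes need not be
contiguous.

    import torch

    # A stack of three tiny 2x2 "images": shape (3, 2, 2).
    imgs = torch.tensor([[[0., 2.], [4., 6.]],
                         [[1., 1.], [1., 1.]],
                         [[0., 10.], [20., 30.]]])

    # mean((-1,-2)) averages over the listed axes -- the last two,
    # i.e. the height and width of each image -- leaving one mean
    # per image: shape (3,).
    print(imgs.mean((-1, -2)))   # tensor([ 3.,  1., 15.])

    # The tuple is a list of axes, not a range: non-adjacent axes
    # also work, which a range interpretation could not express.
    x = torch.ones(2, 3, 4)
    print(x.mean((0, 2)).shape)  # torch.Size([3])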