commit 9ff17f393b
parent 57c5dad7c0

    update
@ -69,14 +69,14 @@
|
||||
"\n",
|
||||
"If you take this approach, then you will be on your third iteration of learning and improving whilst the perfectionists are still in the planning stages!\n",
|
||||
"\n",
|
||||
"We also suggest that you iterate from end to end in your project; that is, don't spend months fine tuning your model, or polishing the perfect GUI, or labelling the perfect dataset… Instead, complete every step as well as you can in a reasonable amount of time, all the way to the end. For instance, if your final goal is an application that runs on a mobile phone, then that should be what you have after each iteration. But perhaps in the early iterations you take some shortcuts, for instance by doing all of the processing on a remote server, and using a simple responsive web application. By completing the project and to end, you will see where the most tricky bits are, and which bits make the biggest difference to the final result."
|
||||
"We also suggest that you iterate from end to end in your project; that is, don't spend months fine tuning your model, or polishing the perfect GUI, or labelling the perfect dataset… Instead, complete every step as well as you can in a reasonable amount of time, all the way to the end. For instance, if your final goal is an application that runs on a mobile phone, then that should be what you have after each iteration. But perhaps in the early iterations you take some shortcuts, for instance by doing all of the processing on a remote server, and using a simple responsive web application. By completing the project end to end, you will see where the most tricky bits are, and which bits make the biggest difference to the final result."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As you work through this book, we suggest that you both complete lots of small experiments, by running and adjusting the notebooks we provide, at the same time that you gradually develop your own projects. That way, you will be getting experience with all of the tools and techniques that were explaining, as we discuss them.\n",
|
||||
"As you work through this book, we suggest that you both complete lots of small experiments, by running and adjusting the notebooks we provide, at the same time that you gradually develop your own projects. That way, you will be getting experience with all of the tools and techniques that we're explaining, as we discuss them.\n",
|
||||
"\n",
|
||||
"> s: To make the most of this book, take the time to experiment between each chapter, be it on your own project or exploring the notebooks we provide. Then try re-writing those notebooks from scratch on a new dataset. It's only by practicing (and failing) a lot that you will get an intuition on how to train a model. \n",
|
||||
"\n",
|
||||
@ -91,9 +91,9 @@
|
||||
"source": [
|
||||
"Since it is easiest to get started on a project where you already have data available, that means it's probably easiest to get started on a project related to something you are already doing, because you already have data about things that you are doing. For instance, if you work in the music business, you may have access to many recordings. If you work as a radiologist, you probably have access to lots of medical images. If you are interested in wildlife preservation, you may have access to lots of images of wildlife.\n",
|
||||
"\n",
|
||||
"Sometimes, you have to get a bit creative. Maybe you can find some previous machine learning project, such as a Kaggle competition, that is related to your field of interest. Sometimes, you have to compromize. Maybe you can't find the exact data you need for the precise project you have in mind; but you might be able to find something from a similar domain, or measured in a different way, tackling a slightly different problem. Working on these kinds of similar projects will still give you a good understanding of the overall process, and may help you identify other shortcuts, data sources, and so forth.\n",
|
||||
"Sometimes, you have to get a bit creative. Maybe you can find some previous machine learning project, such as a Kaggle competition, that is related to your field of interest. Sometimes, you have to compromise. Maybe you can't find the exact data you need for the precise project you have in mind; but you might be able to find something from a similar domain, or measured in a different way, tackling a slightly different problem. Working on these kinds of similar projects will still give you a good understanding of the overall process, and may help you identify other shortcuts, data sources, and so forth.\n",
|
||||
"\n",
|
||||
"Especially when you are just starting out with deep learning it's not a good idea to branch out into very different areas to places that deep learning has not been applied to before. That's because if your model does not work at first, you will not know whether it is because you have made a mistake, or if the very problem you are trying to solve is simply not solvable with deep learning. And you won't know where to look to get help. Therefore, it is best at first to start with something where you can find an example online of somebody who has had good results with something that is at least somewhat similar to what you are trying to achieve, or where you can convert your data into a format similar what someone else has used before (such as creating an image from your data). Let's have a look at the state of deep learning, jsut so you know what kinds of things deep learning is good at right now."
|
||||
"Especially when you are just starting out with deep learning it's not a good idea to branch out into very different areas to places that deep learning has not been applied to before. That's because if your model does not work at first, you will not know whether it is because you have made a mistake, or if the very problem you are trying to solve is simply not solvable with deep learning. And you won't know where to look to get help. Therefore, it is best at first to start with something where you can find an example online of somebody who has had good results with something that is at least somewhat similar to what you are trying to achieve, or where you can convert your data into a format similar what someone else has used before (such as creating an image from your data). Let's have a look at the state of deep learning, just so you know what kinds of things deep learning is good at right now."
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -107,7 +107,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's start by considering whether deep learning can be any good at the problem you are looking to work on. In general, here is a summary of the state of deep learning is at the start of 2020. However, things move very fast, and by the time you read this some of these constraints may no longer exist. We will try to keep the book website up-to-date; in addition, a Google search for \"what can AI do now\" there is likely to provide some up-to-date information."
|
||||
"Let's start by considering whether deep learning can be any good at the problem you are looking to work on. In general, here is a summary of the state of deep learning at the start of 2020. However, things move very fast, and by the time you read this some of these constraints may no longer exist. We will try to keep the book website up-to-date; in addition, a Google search for \"what can AI do now\" there is likely to provide some up-to-date information."
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -159,7 +159,7 @@
|
||||
"source": [
|
||||
"The ability of deep learning to combine text and images into a single model is, generally, far better than most people intuitively expect. For example, a deep learning model can be trained on input images, and output captions written in English, and can learn to generate surprisingly appropriate captions automatically for new images! But again, we have the same warning that we discussed in the previous section: there is no guarantee that these captions will actually be correct.\n",
|
||||
"\n",
|
||||
"Because of this serious issue we generally recommend that deep learning be used not as an entirely automated process, but as part of a process in which the model and a human user interact closely. This can potentially make humans orders of magnitude more productive than they would be with entirely manual methods, and actually result in more accurate processes than using a human alone. For instance, an automatic system can be used to identify potential strokes directly from CT scans, send a high priority alert to have potential/scans looked at quickly. There is only a three-hour window to treat strokes, so this fast feedback loop could save lives. At the same time, however, all scans could continue to be sent to radiologists in the usual way, so there would be no reduction in human input. Other deep learning models could automatically measure items seen on the scan, and insert those measurements into reports, warning the radiologist about findings that they may have missed, and tell the radiologist about other cases which might be relevant."
|
||||
"Because of this serious issue we generally recommend that deep learning be used not as an entirely automated process, but as part of a process in which the model and a human user interact closely. This can potentially make humans orders of magnitude more productive than they would be with entirely manual methods, and actually result in more accurate processes than using a human alone. For instance, an automatic system can be used to identify potential strokes directly from CT scans, and send a high priority alert to have those scans looked at quickly. There is only a three-hour window to treat strokes, so this fast feedback loop could save lives. At the same time, however, all scans could continue to be sent to radiologists in the usual way, so there would be no reduction in human input. Other deep learning models could automatically measure items seen on the scan, and insert those measurements into reports, warning the radiologist about findings that they may have missed, and tell the radiologist about other cases which might be relevant."
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -173,7 +173,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"For analysing timeseries and tabular data, deep learning has recently been making great strides. However, deep learning is generally used as part of an ensemble of multiple types of model. If you already have a system that is using random forests or gradient boosting machines (popular tabular modelling tools that we will learn about soon) then switching to, or adding, deep learning may not result in any dramatic improvement. Deep learning does greatly increase the variety of columns that you can include, for example columns containing natural language (e.g. book titles, reviews, etc), and *high cardinality categorical* columns (i.e. something that contains a large number of discrete choices, such as zip code or product id). On the downside, deep learning models generally take longer to train than random forests or gradient boosting machines, although this is changing thanks to libraries such as [RAPIDS](https://rapids.ai/), which provides GPU acceleration for the whole modeling pipeline. We cover the pros and cons of all these methods in detail in <<chapter_tabular>> in this book."
|
||||
"For analysing timeseries and tabular data, deep learning has recently been making great strides. However, deep learning is generally used as part of an ensemble of multiple types of model. If you already have a system that is using random forests or gradient boosting machines (popular tabular modelling tools that we will learn about soon) then switching to, or adding, deep learning may not result in any dramatic improvement. Deep learning does greatly increase the variety of columns that you can include, for example columns containing natural language (e.g. book titles, reviews, etc.), and *high cardinality categorical* columns (i.e. something that contains a large number of discrete choices, such as zip code or product id). On the downside, deep learning models generally take longer to train than random forests or gradient boosting machines, although this is changing thanks to libraries such as [RAPIDS](https://rapids.ai/), which provides GPU acceleration for the whole modeling pipeline. We cover the pros and cons of all these methods in detail in <<chapter_tabular>> in this book."
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -246,7 +246,7 @@
|
||||
"\n",
|
||||
"Finally, you could build two **models** for purchase probabilities, conditional on seeing or not seeing a recommendation. The difference between these two probabilities is a utility function for a given recommendation to a customer. It will be low in cases where the algorithm recommends a familiar book that the customer has already rejected (both components are small) or a book that he or she would have bought even without the recommendation (both components are large and cancel each other out).\n",
|
||||
"\n",
|
||||
"As you can see, in practice often the practical implementation of your model will require a lot more than just training a model! You'll often need to run experiments to collect more data, and consider how to incorporate your models into the overall system you're developing. Speaking of data, let's now focus on how to find find data for your project."
|
||||
"As you can see, in practice often the practical implementation of your model will require a lot more than just training a model! You'll often need to run experiments to collect more data, and consider how to incorporate your models into the overall system you're developing. Speaking of data, let's now focus on how to find data for your project."
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -260,7 +260,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"For many types of projects, you may be able to find all the data you need online. The project we'll be completing in this chapter is a *bear detector*. It will discriminate between three types of bear: grizzly, black, and teddy bear. There are many images on the Internet of each type of bear we can use. We just need a way to find them and download them. We've provided a tool you can use for this purpose, so you can follow along with this chapter, creating your own image recognition application for whatever kinds of object you're interested in. In the fast.ai course, thousands of students have presented their work on the course forums, displaying everything from Trinidad hummingbird varieties, to Panama bus types, and even an application that helped one student let his fiancee recognize his sixteen cousins during Christmas vacation!"
|
||||
"For many types of projects, you may be able to find all the data you need online. The project we'll be completing in this chapter is a *bear detector*. It will discriminate between three types of bear: grizzly, black, and teddy bear. There are many images on the Internet of each type of bear we can use. We just need a way to find them and download them. We've provided a tool you can use for this purpose, so you can follow along with this chapter, creating your own image recognition application for whatever kinds of object you're interested in. In the fast.ai course, thousands of students have presented their work on the course forums, displaying everything from Trinidad hummingbird varieties, to Panama bus types, and even an application that helped one student let his fiancée recognize his sixteen cousins during Christmas vacation!"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -550,7 +550,7 @@
|
||||
"File: ~/git/fastai/fastai/vision/utils.py\n",
|
||||
"Type: function\n",
|
||||
"```\n",
|
||||
"It tells us what argument the function accepts (`fns`) then shows us the source code and the file it comes from. Looking at that source code, we can see it applies the function `verify_image` in parallel and only keep the ones for which the result of that function is `False`, which is consistent with the doc string: it finds the images in `fns` that can't be opened.\n",
|
||||
"It tells us what argument the function accepts (`fns`) then shows us the source code and the file it comes from. Looking at that source code, we can see it applies the function `verify_image` in parallel and only keeps the ones for which the result of that function is `False`, which is consistent with the doc string: it finds the images in `fns` that can't be opened.\n",
|
||||
"\n",
|
||||
"Here are the commands that are very useful in Jupyter notebooks:\n",
|
||||
"\n",
|
||||
@ -558,7 +558,7 @@
|
||||
"- when inside the parenthesis of a function, pressing \"shift\" and \"tab\" simultaneously will display a window with the signature of the function and a short documentation. Pressing it twice will expand the documentation and pressing it three times will open a full window with the same information at the bottom of your screen.\n",
|
||||
"- in a cell, typing `?func_name` and executing will open a window with the signature of the function and a short documentation.\n",
|
||||
"- in a cell, typing `??func_name` and executing will open a window with the signature of the function, a short documentation and the source code.\n",
|
||||
"- if you are using the fasti library, we added a `doc` function for you, executing `doc(func_name)` in a cell will open a window with the signature of the function, a short documentation and links to the source code on GitHub and the full documentation of the funciton in the [documentation of the library](https://docs.fast.ai).\n",
|
||||
"- if you are using the fastai library, we added a `doc` function for you, executing `doc(func_name)` in a cell will open a window with the signature of the function, a short documentation and links to the source code on GitHub and the full documentation of the function in the [documentation of the library](https://docs.fast.ai).\n",
|
||||
"- unrelated to the documentation but still very useful to get help, at any point, if you get an error, type `%debug` in the next cell and execute to open the [python debugger](https://docs.python.org/3/library/pdb.html) that will let you inspect the content of every variable."
|
||||
]
|
||||
},
|
||||
@ -587,7 +587,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"So with this as your training data, you would end up not with a healthy skin detector, but a *young white woman touching her face* detector! Be sure to think carefully about the types of data that you might expect to see in practice in your application, and check carefully to ensure that all these types are reflected in your model's source data.footnote:[Thanks to Deb Raji, who came up with the *healthy skin* example. See her paper *Actionable Auditing: Investigating the Impact of Publicly Naming Biased Performance Results of Commercial AI Products* for more fascinating insights into model bias.]"
|
||||
"So with this as your training data, you would end up not with a healthy skin detector, but a *young white woman touching her face* detector! Be sure to think carefully about the types of data that you might expect to see in practice in your application, and check carefully to ensure that all these types are reflected in your model's source data. footnote:[Thanks to Deb Raji, who came up with the *healthy skin* example. See her paper *Actionable Auditing: Investigating the Impact of Publicly Naming Biased Performance Results of Commercial AI Products* for more fascinating insights into model bias.]"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -800,7 +800,7 @@
|
||||
"\n",
|
||||
"Instead, what we normally do in practice is to randomly select part of the image, and crop to just that part. On each epoch (which is one complete pass through all of our images in the dataset) we randomly select a different part of each image. This means that our model can learn to focus on, and recognize, different features in our images. It also reflects how images work in the real world; different photos of the same thing may be framed in slightly different ways.\n",
|
||||
"\n",
|
||||
"Here is a another copy of the previous examples, but this time we are replacing `Resize` with `RandomResizedCrop`, which is the transform that provides the behaviour described above.The most important parameter to pass in is the `min_scale` parameter, which determines how much of the image to select at minimum each time."
|
||||
"Here is another copy of the previous examples, but this time we are replacing `Resize` with `RandomResizedCrop`, which is the transform that provides the behaviour described above. The most important parameter to pass in is the `min_scale` parameter, which determines how much of the image to select at minimum each time."
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -855,7 +855,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Data augmentation refers to creating random variations of our input data, such that they appear different, but are not expected to change the meaning of the data. Examples of common data augmentation for images are rotation, flipping, perspective warping, brightness changes, contrast changes, and much more. For natural photo images such as the ones we are using here, there is a standard set of augmentations which we have found work pretty well, and are provided with the `aug_transforms` function. Because the images are now all the same size, we can apply these augmentations to an entire batch of them using the GPU, which will save a lot of time. To tell fastai we want to use these transforms to a batch, we use the `batch_tfms` parameter. (Note that's we're not using `RandomResizedCrop` in this example, so you can see the differences more clearly; we're also using double the amount of augmentation compared to the default, for the same reason)."
|
||||
"Data augmentation refers to creating random variations of our input data, such that they appear different, but are not expected to change the meaning of the data. Examples of common data augmentation for images are rotation, flipping, perspective warping, brightness changes, contrast changes, and much more. For natural photo images such as the ones we are using here, there is a standard set of augmentations which we have found work pretty well, and are provided with the `aug_transforms` function. Because the images are now all the same size, we can apply these augmentations to an entire batch of them using the GPU, which will save a lot of time. To tell fastai we want to use these transforms to a batch, we use the `batch_tfms` parameter. (Note that we're not using `RandomResizedCrop` in this example, so you can see the differences more clearly; we're also using double the amount of augmentation compared to the default, for the same reason)."
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -901,7 +901,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Time to use the same lined of codes as in <<chapter_intro>> to train our bear classifier.\n",
|
||||
"Time to use the same lines of code as in <<chapter_intro>> to train our bear classifier.\n",
|
||||
"\n",
|
||||
"We don't have a lot of data for our problem (150 pictures of each sort of bear at most), so to train our model, we'll use `RandomResizedCrop` and default `aug_transforms` for our model, on an image size of 224px, which is fairly standard for image classification."
|
||||
]
|
||||
@ -1065,7 +1065,7 @@
|
||||
"source": [
|
||||
"Each row here represents all the black, grizzly, and teddy bears in our dataset, respectively. Each column represents the images which the model predicted as black, grizzly, and teddy bears, respectively. Therefore, the diagonal of the matrix shows the images which were classified correctly, and the other, off diagonal, cells represent those which were classified incorrectly. This is called a *confusion matrix* and is one of the many ways that fastai allows you to view the results of your model. It is (of course!) calculated using the validation set. With the color coding, the goal is to have white everywhere, except the diagonal where we want dark blue. Our bear classifier isn't making many mistakes!\n",
|
||||
"\n",
|
||||
"It's helpful to see where exactly our errors are occuring, to see whether it's due to a dataset problem (e.g. images that aren't bears at all, or are labelled incorrectly, etc), or a model problem (e.g. perhaps it isn't handling images taken with unusual lighting, or from a different angle, etc). To do this, we can sort out images by their *loss*.\n",
|
||||
"It's helpful to see where exactly our errors are occurring, to see whether it's due to a dataset problem (e.g. images that aren't bears at all, or are labelled incorrectly, etc.), or a model problem (e.g. perhaps it isn't handling images taken with unusual lighting, or from a different angle, etc.). To do this, we can sort out images by their *loss*.\n",
|
||||
"\n",
|
||||
"The *loss* is a number that is higher if the model is incorrect (and especially if it's also confident of its incorrect answer), or if it's correct, but not confident of its correct answer. In a couple chapters we'll learn in depth how loss is calculated and used in training process. For now, `plot_top_losses` shows us the images with the highest loss in our dataset. As the title of the output says, each image is labeled with four things: prediction, actual (target label), loss, and probability. The *probability* here is the confidence level, from zero to one, that the model has assigned to its prediction."
|
||||
]
|
||||
@ -1172,7 +1172,7 @@
|
||||
"for idx,cat in cleaner.change(): shutil.move(cleaner.fns[idx], path/cat)\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"> s: Cleaning the data or getting it ready for your model are two of the biggest challenges for data scientists, one they say take 90% of their time. The fastai library aims at providing tools to make it as easy as possible.\n",
|
||||
"> s: Cleaning the data or getting it ready for your model are two of the biggest challenges for data scientists; they say it takes 90% of their time. The fastai library aims at providing tools to make it as easy as possible.\n",
|
||||
"\n",
|
||||
"We'll be seeing more examples of model-driven data cleaning throughout this book. Once we've cleaned up our data, we can retrain our model. Try it yourself, and see if your accuracy improves!"
|
||||
]
|
||||
@ -1354,7 +1354,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We know how to make predictions from our saved model, so we have everything we need to start building our app. We can do it directly in a Jupyter Notenook."
|
||||
"We know how to make predictions from our saved model, so we have everything we need to start building our app. We can do it directly in a Jupyter Notebook."
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -1377,7 +1377,7 @@
|
||||
"\n",
|
||||
"*IPython widgets* are GUI components that bring together JavaScript and Python functionality in a web browser, and can be created and used within a Jupyter notebook. For instance, the image cleaner that we saw earlier in this chapter is entirely written with IPython widgets. However, we don't want to require users of our application to have to run Jupyter themselves.\n",
|
||||
"\n",
|
||||
"That is why *Voilà* exists. It is a system for making applications consisting of IPython widgets available to end-users, without them having to use Jupyter at all. Voila is taking advantage of the fact that a notebook _already is_ a kind of web application, just a rather complex one that depends on another web application Jupyter itself. Essentially, it helps us automatically convert the complex web application which we've already implicitly made (the notebook) into a simpler, easier-to-deploy web application, which functions like a normal web application rather than like a notebook.\n",
|
||||
"That is why *Voilà* exists. It is a system for making applications consisting of IPython widgets available to end-users, without them having to use Jupyter at all. Voilà is taking advantage of the fact that a notebook _already is_ a kind of web application, just a rather complex one that depends on another web application, Jupyter itself. Essentially, it helps us automatically convert the complex web application which we've already implicitly made (the notebook) into a simpler, easier-to-deploy web application, which functions like a normal web application rather than like a notebook.\n",
|
||||
"\n",
|
||||
"But we still have the advantage of developing in a notebook. So with ipywidgets, we can build up our GUI step by step. We will use this approach to create a simple image classifier. First, we need a file upload widget:"
|
||||
]
|
||||
@ -1644,7 +1644,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We have written all the code necessary for our app. The next step is to convert it in something we can deploy."
|
||||
"We have written all the code necessary for our app. The next step is to convert it into something we can deploy."
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -1660,16 +1660,16 @@
|
||||
"source": [
|
||||
"Now that we have everything working in this Jupyter notebook, we can create our application. To do this, create a notebook which contains only the code needed to create and show the widgets that you need, and markdown for any text that you want to appear. Have a look at the *bear_classifier* notebook in the book repo to see the simple notebook application we created.\n",
|
||||
"\n",
|
||||
"Next, install Voila if you have not already, by copying these lines into a Notebook cell, and executing it (if you're comfortable using the command line, you can also execute these two lines in your terminal, without the `!` prefix):\n",
|
||||
"Next, install Voilà if you have not already, by copying these lines into a Notebook cell, and executing it (if you're comfortable using the command line, you can also execute these two lines in your terminal, without the `!` prefix):\n",
|
||||
"\n",
|
||||
" !pip install voila\n",
|
||||
" !jupyter serverextension enable voila --sys-prefix\n",
|
||||
"\n",
|
||||
"Cells which begin with a `!` do not contain Python code, but instead contain code which is passed to your shell, such as bash, power shell in windows, or so forth. If you are comfortable using the command line (which we'll be learning about later in this book), you can of course simply type these two lines (without the `!` prefix) directly into your terminal. In this case, the first line installs the voila library and application, and the second connects it to your existing Jupyter notebook.\n",
|
||||
"\n",
|
||||
"Voila runs Jupyter notebooks, just like the Jupyter notebook server you are using now does, except that it does something very important: it removes all of the cell inputs, and only shows output (including ipywidgets), along with your markdown cells. So what's left is a web application! To view your notebook as a voila web application replace the word \"notebooks\" in your browser's URL with: \"voila/render\". You will see the same content as your notebook, but without any of the code cells.\n",
|
||||
"Voilà runs Jupyter notebooks, just like the Jupyter notebook server you are using now does, except that it does something very important: it removes all of the cell inputs, and only shows output (including ipywidgets), along with your markdown cells. So what's left is a web application! To view your notebook as a voila web application replace the word \"notebooks\" in your browser's URL with: \"voila/render\". You will see the same content as your notebook, but without any of the code cells.\n",
|
||||
"\n",
|
||||
"Of course, you don't need to use Voila or ipywidgets. Your model is just a function you can call: `pred,pred_idx,probs = learn.predict(img)` . So you can use it with any framework, hosted on any platform. And you can take something you've prototyped in ipywidgets and Voila and later convert it into a regular web application. We're showing you this approach in the book because we think it's a great way for data scientists and other folks that aren't web development experts to create applications from their models.\n",
|
||||
"Of course, you don't need to use Voilà or ipywidgets. Your model is just a function you can call: `pred,pred_idx,probs = learn.predict(img)` . So you can use it with any framework, hosted on any platform. And you can take something you've prototyped in ipywidgets and Voilà and later convert it into a regular web application. We're showing you this approach in the book because we think it's a great way for data scientists and other folks that aren't web development experts to create applications from their models.\n",
|
||||
"\n",
|
||||
"We have our app, now let's deploy it!"
|
||||
]
|
||||
@ -1692,7 +1692,7 @@
|
||||
"- The complexities of dealing with GPU inference are significant. In particular, the GPU's memory will need careful manual management, and you'll need some careful queueing system to ensure you only do one batch at a time\n",
|
||||
"- There's a lot more market competition in CPU servers than GPU, as a result of which there's much cheaper options available for CPU servers.\n",
|
||||
"\n",
|
||||
"Because of the complexity of GPU serving, many systems have sprung up to try to automate this. However, managing and running these systems is themselves complex, and generally requires compiling your model into a different form that's specialized for that system. It doesn't make sense to deal with this complexity until/unless your app gets popular enough that it makes clear financial sense for you to do so."
|
||||
"Because of the complexity of GPU serving, many systems have sprung up to try to automate this. However, managing and running these systems is also complex, and generally requires compiling your model into a different form that's specialized for that system. It doesn't make sense to deal with this complexity until/unless your app gets popular enough that it makes clear financial sense for you to do so."
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -1720,7 +1720,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The first time you do this Binder will take around 5 minutes to build your site. In other words, is it finding a virtual machine which can run your app, allocating storage, collecting the files needed for Jupyter, for your notebook, and for presenting your notebook as a web application. It's doing all of this behind the scenes.\n",
|
||||
"The first time you do this Binder will take around 5 minutes to build your site. In other words, it is finding a virtual machine which can run your app, allocating storage, collecting the files needed for Jupyter, for your notebook, and for presenting your notebook as a web application. It's doing all of this behind the scenes.\n",
|
||||
"\n",
|
||||
"Finally, once it has started the app running, it will navigate your browser to your new web app. You can share the URL you copied to allow others to access your app as well.\n",
|
||||
"\n",
|
||||
@ -1731,11 +1731,11 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You may well want to deploy your application onto mobile devices, or edge devices such as a Raspberry Pi. There are a lot of libraries and frameworks to allow you to integrate a model directly into a mobile application. However these approaches tend to require a lot of extra steps and boilerplate, and do not always support all the PyTorch and fastai layers that your model might use. In addition, the work you do will depend on what kind of mobile devices you are targeting for deployment. So you might need to do some work to run on iOS devices, different work to run on newer Android devices, different work for older Android devices, etc.. Instead, we recommend wherever possible that you deploy the model itself to a server, and have your mobile or edge application connect to it as a web service.\n",
|
||||
"You may well want to deploy your application onto mobile devices, or edge devices such as a Raspberry Pi. There are a lot of libraries and frameworks to allow you to integrate a model directly into a mobile application. However these approaches tend to require a lot of extra steps and boilerplate, and do not always support all the PyTorch and fastai layers that your model might use. In addition, the work you do will depend on what kind of mobile devices you are targeting for deployment. So you might need to do some work to run on iOS devices, different work to run on newer Android devices, different work for older Android devices, etc. Instead, we recommend wherever possible that you deploy the model itself to a server, and have your mobile or edge application connect to it as a web service.\n",
|
||||
"\n",
|
||||
"There is quite a few upsides to this approach. The initial installation is easier, because you only have to deploy a small GUI application, which connects to the server to do all the heavy lifting. More importantly perhaps, upgrades of that core logic can happen on your server, rather than needing to be distributed to all of your users. Your server can have a lot more memory and processing capacity than most edge devices, and it is far easier to scale those resources if your model becomes more demanding. The hardware that you will have on a server is going to be more standard and more easily supported by fastai and PyTorch, so you don't have to compile your model into a different form.\n",
|
||||
"\n",
|
||||
"There are downsides too, of course. Your application will require a network connection, and there will be some latency each time the model is called. It takes a while for a neural network model to run anyway, so this additional network latency may not make a big difference to your users in practice. In fact, since you can use better hardware on the server, the overall latency may even be less! If your application uses sensitive data then your users may be concerned about an approach which sends that data to a remote server, so sometimes privacy considerations will mean that you need to run the model on the edge device. Sometimes this can be avoided by having an *on premise* server, such as inside a company's firewall. Managing the complexity and scaling the server can create additional overhead, whereas if your model runs on the edge devices then each user is bringing their own compute resources, which leads to easier scaling with an increasing number of users (also known as _horizontal scaling_)."
|
||||
"There are downsides too, of course. Your application will require a network connection, and there will be some latency each time the model is called. It takes a while for a neural network model to run anyway, so this additional network latency may not make a big difference to your users in practice. In fact, since you can use better hardware on the server, the overall latency may even be less! If your application uses sensitive data then your users may be concerned about an approach which sends that data to a remote server, so sometimes privacy considerations will mean that you need to run the model on the edge device. Sometimes this can be avoided by having an *on premise* server, such as inside a company's firewall. Managing the complexity and scaling the server can create additional overhead, whereas if your model runs on the edge devices then each user is bringing their own compute resources, which leads to easier scaling with an increasing number of users (also known as *horizontal scaling*)."
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -1746,7 +1746,7 @@
|
||||
"\n",
|
||||
"Overall, we'd recommend using a simple CPU-based server approach where possible, for as long as you can get away with it. If you're lucky enough to have a very successful application, then you'll be able to justify the investment in more complex deployment approaches at that time.\n",
|
||||
"\n",
|
||||
"Congratulations, you have succesfully built a deep learning model and deployed it! Now is a good time to take a pause and think about what could go wrong."
|
||||
"Congratulations, you have successfully built a deep learning model and deployed it! Now is a good time to take a pause and think about what could go wrong."
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -1762,7 +1762,7 @@
|
||||
"source": [
|
||||
"In practice, a deep learning model will be just one piece of a much bigger system. As we discussed at the start of this chapter, a *data product* requires thinking about the entire end to end process within which our model lives. In this book, we can't hope to cover all the complexity of managing deployed data products, such as managing multiple versions of models, A/B testing, canarying, refreshing the data (should we just grow and grow our datasets all the time, or should we regularly remove some of the old data), handling data labelling, monitoring all this, detecting model rot, and so forth. However, there is an excellent book that covers many deployment issues, which is [Building Machine Learning Powered Applications](https://www.amazon.com/Building-Machine-Learning-Powered-Applications/dp/149204511X), by Emmanuel Ameisen. In this section, we will give an overview of some of the most important issues to consider.\n",
|
||||
"\n",
|
||||
"One of the biggest issues with this is that understanding and testing the behavior of a deep learning model is much more difficult than most code that you would write. With normal software development you can analyse the exact steps that the software is taking, and carefully study with of these steps match the desired behaviour that you are trying to create. But with a neural network the behavior emerges from the models attempt to match the training data, rather than being exactly defined.\n",
|
||||
"One of the biggest issues with this is that understanding and testing the behavior of a deep learning model is much more difficult than most code that you would write. With normal software development you can analyse the exact steps that the software is taking, and carefully study which of these steps match the desired behaviour that you are trying to create. But with a neural network the behavior emerges from the models attempt to match the training data, rather than being exactly defined.\n",
|
||||
"\n",
|
||||
"This can result in disaster! For instance, let's say you really were rolling out a bear detection system which will be attached to video cameras around the campsite, and will warn campers of incoming bears. If we used a model trained with the dataset we downloaded, there are going to be all kinds of problems in practice, such as:\n",
|
||||
"\n",
|
||||
@ -1808,7 +1808,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"> j: I started a company 20 years ago called _Optimal Decisions_ which used machine learning and optimisation to help giant insurance companies set their pricing, impacting tens of billions of dollars of risks. We used the approaches described above to manage the potential downsides of something that might go wrong. Also, before we worked with our clients to put anything in production, we tried to simulate the impact by testing the end to end system on their previous year's data. It was always quite a nerve-wracking process, putting these new algorithms in production, but every rollout was successful."
|
||||
"> J: I started a company 20 years ago called _Optimal Decisions_ which used machine learning and optimisation to help giant insurance companies set their pricing, impacting tens of billions of dollars of risks. We used the approaches described above to manage the potential downsides of something that might go wrong. Also, before we worked with our clients to put anything in production, we tried to simulate the impact by testing the end to end system on their previous year's data. It was always quite a nerve-wracking process, putting these new algorithms in production, but every rollout was successful."
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -1822,7 +1822,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"One of the biggest challenges in rolling out a model is that your model may change the behaviour of the system it is a part of. For instance, consider a \"predictive policing\" algorithm that predicts more crime in certain neighborhoods, causing more police officers to be sent to those neighborhoods, which can result in more crime being recorded in those neighborhoods, and so on. In the Royal Statiscal Society paper [To predict and serve](https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1740-9713.2016.00960.x), Kristian Lum and William Isaac write: \"predictive policing is aptly named: it is predicting future policing, not future crime\".\n",
|
||||
"One of the biggest challenges in rolling out a model is that your model may change the behaviour of the system it is a part of. For instance, consider a \"predictive policing\" algorithm that predicts more crime in certain neighborhoods, causing more police officers to be sent to those neighborhoods, which can result in more crime being recorded in those neighborhoods, and so on. In the Royal Statistical Society paper [To predict and serve](https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1740-9713.2016.00960.x), Kristian Lum and William Isaac write: \"predictive policing is aptly named: it is predicting future policing, not future crime\".\n",
|
||||
"\n",
|
||||
"Part of the issue in this case is that in the presence of *bias* (which we'll discuss in depth in the next chapter), feedback loops can result in negative implications of that bias getting worse and worse. For instance, there are concerns that this is already happening in the US, where there is significant bias in arrest rates on racial grounds. [According to the ACLU](https://www.aclu.org/issues/smart-justice/sentencing-reform/war-marijuana-black-and-white), \"despite roughly equal usage rates, Blacks are 3.73 times more likely than whites to be arrested for marijuana\". The impact of this bias, along with the roll-out of predictive policing algorithms in many parts of the US, led Bärí Williams to [write in the NY Times](https://www.nytimes.com/2017/12/02/opinion/sunday/intelligent-policing-and-my-innocent-children.html): \"The same technology that’s the source of so much excitement in my career is being used in law enforcement in ways that could mean that in the coming years, my son, who is 7 now, is more likely to be profiled or arrested — or worse — for no reason other than his race and where we live.\"\n",
|
||||
"\n",
|
||||
@ -1874,7 +1874,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"1. Provide an example of where the bear classification model might work poorly, due to structural or style differences to the training data\n",
|
||||
"1. Provide an example of where the bear classification model might work poorly, due to structural or style differences to the training data.\n",
|
||||
"1. Where do text models currently have a major deficiency?\n",
|
||||
"1. What are possible negative societal implications of text generation models?\n",
|
||||
"1. In situations where a model might make mistakes, and those mistakes could be harmful, what is a good alternative to automating a process?\n",
|
||||
@ -1929,9 +1929,6 @@
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"jupytext": {
|
||||
"split_at_heading": true
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
|
@ -50,7 +50,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"> j: At university, philosophy of ethics was my main thing (it would have been the topic of my thesis, if I'd finished it, instead of dropping out to join the real-world). Based on the years I spent studying ethics, I can tell you this: no one really agrees on what right and wrong are, whether they exist, how to spot them, which people are good, and which bad, or pretty much anything else. So don't expect too much from the theory! We're going to focus on examples and thought starters here, not theory."
|
||||
"> J: At university, philosophy of ethics was my main thing (it would have been the topic of my thesis, if I'd finished it, instead of dropping out to join the real-world). Based on the years I spent studying ethics, I can tell you this: no one really agrees on what right and wrong are, whether they exist, how to spot them, which people are good, and which bad, or pretty much anything else. So don't expect too much from the theory! We're going to focus on examples and thought starters here, not theory."
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -62,7 +62,7 @@
|
||||
"- Well-founded standards of right and wrong that prescribe what humans ought to do, and\n",
|
||||
"- The study and development of one's ethical standards.\n",
|
||||
"\n",
|
||||
"There is no list of right answers for ethics. There is no list of dos and don'ts. Ethics is complicated, and context-dependent. It involves the perspectives of many stakeholders. Ethics is a muscle that you have to develop and practice. In this chapter, our goal is to provide some signposts to help you on that journey.\n",
|
||||
"There is no list of right answers for ethics. There is no list of do's and dont's. Ethics is complicated, and context-dependent. It involves the perspectives of many stakeholders. Ethics is a muscle that you have to develop and practice. In this chapter, our goal is to provide some signposts to help you on that journey.\n",
|
||||
"\n",
|
||||
"Spotting ethical issues is best to do as part of a collaborative team. This is the only way you can really incorporate different perspectives. Different people's backgrounds will help them to see things which may not be obvious to you. Working with a team is helpful for many \"muscle building\" activities, including this one.\n",
|
||||
"\n",
|
||||
@ -84,7 +84,7 @@
|
||||
"\n",
|
||||
"1. **Recourse processes**: Arkansas's buggy healthcare algorithms left patients stranded\n",
|
||||
"2. **Feedback loops**: YouTube's recommendation system helped unleash a conspiracy theory boom\n",
|
||||
"3. **Bias**: When a traditionally African-American name is searched for on Google, it displays ads for criminal background checks.\n",
|
||||
"3. **Bias**: When a traditionally African-American name is searched for on Google, it displays ads for criminal background checks\n",
|
||||
"\n",
|
||||
"In fact, for every concept that we introduce in this chapter, we are going to provide at least one specific example. For each one, have a think about what you could have done in this situation, and think about what kinds of obstructions there might have been to you getting that done. How would you deal with them? What would you look out for?"
|
||||
]
|
||||
@ -242,7 +242,7 @@
|
||||
"\n",
|
||||
"The modern workplace is a very specialised place. Everybody tends to have very well-defined jobs to perform. Especially in large companies, it can be very hard to know what all the pieces of the puzzle are. Sometimes companies even intentionally obscure the overall project goals that are being worked on, if they know that their employees are not going to like the answers. This is sometimes done by compartmentalising pieces as much as possible\n",
|
||||
"\n",
|
||||
"In other words, we're not saying that any of this is easy. It's hard. It's really hard. We all have to do our best. And we have often seen that the people who do get involved in the higher-level context of these projects, and attempt to develop cross disciplinary capabilities and teams, become some of the most important and well rewarded parts of their organisations. It's the kind of work that tends to be highly appreciated by senior executives, even if it is considered, sometimes, rather uncomfortable by middle management."
|
||||
"In other words, we're not saying that any of this is easy. It's hard. It's really hard. We all have to do our best. And we have often seen that the people who do get involved in the higher-level context of these projects, and attempt to develop cross-disciplinary capabilities and teams, become some of the most important and well rewarded members of their organisations. It's the kind of work that tends to be highly appreciated by senior executives, even if it is considered, sometimes, rather uncomfortable by middle management."
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -284,7 +284,7 @@
|
||||
"source": [
|
||||
"In a complex system, it is easy for no one person to feel responsible for outcomes. While this is understandable, it does not lead to good results. In the earlier example of the Arkansas healthcare system in which a bug led to people with cerebral palsy losing access to needed care, the creator of the algorithm blamed government officials, and government officials could blame those who implemented the software. NYU professor Danah Boyd described this phenomenon: \"bureaucracy has often been used to evade responsibility, and today's algorithmic systems are extending bureaucracy.\"\n",
|
||||
"\n",
|
||||
"An additional reason why recourse is so necessary, is because data often contains errors. Mechanisms for audits and error-correction are crucial. A database of suspected gang members maintained by California law enforcement officials was found to be full of errors, including 42 babies who had been added to the database when they were less than 1 year old (28 of whom were marked as “admitting to being gang members”). In this case, there was no process in place for correcting mistakes or removing people once they’ve been added. Another example is the US credit report system; in a large-scale study of credit reports by the FTC (Federal Trade Commission) in 2012, it was found that 26% of consumers had at least one mistake in their files, and 5% had errors that could be devastating. Yet, the process of getting such errors corrected is incredibly slow and opaque. When public-radio reporter Bobby Allyn discovered that he was erroneously listed as having a firearms conviction, it took him \"more than a dozen phone calls, the handiwork of a county court clerk and six weeks to solve the problem. And that was only after I contacted the company’s communications department as a journalist.\" (as covered in the article [How the careless errors of credit reporting agencies are ruining people’s lives](https://www.washingtonpost.com/posteverything/wp/2016/09/08/how-the-careless-errors-of-credit-reporting-agencies-are-ruining-peoples-lives/))\n",
|
||||
"An additional reason why recourse is so necessary is because data often contains errors. Mechanisms for audits and error-correction are crucial. A database of suspected gang members maintained by California law enforcement officials was found to be full of errors, including 42 babies who had been added to the database when they were less than 1 year old (28 of whom were marked as “admitting to being gang members”). In this case, there was no process in place for correcting mistakes or removing people once they’d been added. Another example is the US credit report system: in a large-scale study of credit reports by the FTC (Federal Trade Commission) in 2012, it was found that 26% of consumers had at least one mistake in their files, and 5% had errors that could be devastating. Yet, the process of getting such errors corrected is incredibly slow and opaque. When public-radio reporter Bobby Allyn discovered that he was erroneously listed as having a firearms conviction, it took him \"more than a dozen phone calls, the handiwork of a county court clerk and six weeks to solve the problem. And that was only after I contacted the company’s communications department as a journalist.\" (as covered in the article [How the careless errors of credit reporting agencies are ruining people’s lives](https://www.washingtonpost.com/posteverything/wp/2016/09/08/how-the-careless-errors-of-credit-reporting-agencies-are-ruining-peoples-lives/))\n",
|
||||
"\n",
|
||||
"As machine learning practitioners, we do not always think of it as our responsibility to understand how our algorithms end up being implemented in practice. But we need to."
|
||||
]
|
||||
@ -347,7 +347,7 @@
|
||||
"\n",
|
||||
"> : \"One important signal to classify the main topic of a video is the channel it comes from. For example, a video uploaded to a cooking channel is very likely to be a cooking video. But how do we know what topic a channel is about? Well… in part by looking at the topics of the videos it contains! Do you see the loop? For example, many videos have a description which indicates what camera was used to shoot the video. As a result, some of these videos might get classified as videos about “photography”. If a channel has such as misclassified video, it might be classified as a “photography” channel, making it even more likely for future videos on this channel to be wrongly classified as “photography”. This could even lead to runaway virus-like classifications! One way to break this feedback loop is to classify videos with and without the channel signal. Then when classifying the channels, you can only use the classes obtained without the channel signal. This way, the feedback loop is broken.\"\n",
|
||||
"\n",
|
||||
"There are positive examples of people and organizations attempting to combat these problems. Evan Estola, lead machine learning engineer at Meetup, [discussed the example](https://www.youtube.com/watch?v=MqoRzNhrTnQ) of men expressing more interest than women in tech meetups. Meetup’s algorithm could recommend fewer tech meetups to women, and as a result, fewer women would find out about and attend tech meetups, which could cause the algorithm to suggest even fewer tech meetups to women, and so on in a self-reinforcing feedback loop. Evan and his team made the ethical decision for their recommendation algorithm to not create such a feedback loop, but explicitly not using gender for that part of their model. It is encouraging to see a company not just unthinkingly optimize a metric, but to consider their impact. \"You need to decide which feature not to use in your algorithm… the most optimal algorithm is perhaps not the best one to launch into production\", he said.\n",
|
||||
"There are positive examples of people and organizations attempting to combat these problems. Evan Estola, lead machine learning engineer at Meetup, [discussed the example](https://www.youtube.com/watch?v=MqoRzNhrTnQ) of men expressing more interest than women in tech meetups. Meetup’s algorithm could recommend fewer tech meetups to women, and as a result, fewer women would find out about and attend tech meetups, which could cause the algorithm to suggest even fewer tech meetups to women, and so on in a self-reinforcing feedback loop. Evan and his team made the ethical decision for their recommendation algorithm to not create such a feedback loop, by explicitly not using gender for that part of their model. It is encouraging to see a company not just unthinkingly optimize a metric, but to consider their impact. \"You need to decide which feature not to use in your algorithm… the most optimal algorithm is perhaps not the best one to launch into production\", he said.\n",
|
||||
"\n",
|
||||
"While Meetup chose to avoid such an outcome, Facebook provides an example of allowing a runaway feedback loop to run wild. Facebook radicalizes users interested in one conspiracy theory by introducing them to more. As [Renee DiResta, a researcher on proliferation of disinformation, writes](https://www.fastcompany.com/3059742/social-network-algorithms-are-distorting-reality-by-boosting-conspiracy-theories):"
|
||||
]
|
||||
@ -414,7 +414,7 @@
|
||||
" - When doctors were shown identical files, they were much less likely to recommend cardiac catheterization (a helpful procedure) to Black patients\n",
|
||||
" - When bargaining for a used car, Black people were offered initial prices $700 higher and received far smaller concessions\n",
|
||||
" - Responding to apartment-rental ads on Craigslist with a Black name elicited fewer responses than with a white name\n",
|
||||
" - An all-white jury was 16 percentage points more likely to convict a Black defendant than a white one, but when a jury had 1 Black member, it convicted both at same rate.\n",
|
||||
" - An all-white jury was 16 percentage points more likely to convict a Black defendant than a white one, but when a jury had 1 Black member, it convicted both at the same rate\n",
|
||||
"\n",
|
||||
"The COMPAS algorithm, widely used for sentencing and bail decisions in the US, is an example of an important algorithm which, when tested by ProPublica, showed clear racial bias in practice:"
|
||||
]
|
||||
@ -635,7 +635,7 @@
|
||||
" - People are more likely to assume algorithms are objective or error-free (even if they’re given the option of a human override)\n",
|
||||
" - Algorithms are more likely to be implemented with no appeals process in place\n",
|
||||
" - Algorithms are often used at scale\n",
|
||||
" - Algorithmic systems are cheap.\n",
|
||||
" - Algorithmic systems are cheap\n",
|
||||
"\n",
|
||||
"Even in the absence of bias, algorithms (and deep learning especially, since it is such an effective and scalable algorithm) can lead to negative societal problems, such as when used for *disinformation*."
|
||||
]
|
||||
@ -714,7 +714,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"It's easy to miss important issues when considered ethical implications of your work. One thing that helps enormously is simply asking the right questions. Rachel Thomas recommends considering the following questions throughout the development of a data project:\n",
|
||||
"It's easy to miss important issues when considering ethical implications of your work. One thing that helps enormously is simply asking the right questions. Rachel Thomas recommends considering the following questions throughout the development of a data project:\n",
|
||||
"\n",
|
||||
" - Should we even be doing this?\n",
|
||||
" - What bias is in the data?\n",
|
||||
@ -775,7 +775,7 @@
|
||||
" - Will the effects in aggregate likely create more good than harm, and what types of good and harm?\n",
|
||||
" - Are we thinking about all relevant types of harm/benefit (psychological, political, environmental, moral, cognitive, emotional, institutional, cultural)?\n",
|
||||
" - How might future generations be affected by this project?\n",
|
||||
" - Do the risks of harm from this project fall disproportionately on the least powerful in society? Will the benefits go disproportionately the well-off?\n",
|
||||
" - Do the risks of harm from this project fall disproportionately on the least powerful in society? Will the benefits go disproportionately to the well-off?\n",
|
||||
" - Have we adequately considered ‘dual-use?\n",
|
||||
"\n",
|
||||
"The alternative lens to this is the *deontological* perspective, which focuses on basic *right* and *wrong*:\n",
|
||||
@ -814,7 +814,7 @@
|
||||
"\n",
|
||||
"Receiving mentorship has been statistically shown to help men advance, but not women. The reason behind this is that when women receive mentorship, it’s advice on how they should change and gain more self-knowledge. When men receive mentorship, it’s public endorsement of their authority. Guess which is more useful in getting promoted?\n",
|
||||
"\n",
|
||||
"As long as qualified women keep dropping out of tech, teaching more girls to code will not solve the diversity issues plaguing the field. Diversity initiatives often end up focusing primarily on white women, even although women of colour face many additional barriers. In interviews with 60 women of color who work in STEM research, 100% had experienced discrimination."
|
||||
"As long as qualified women keep dropping out of tech, teaching more girls to code will not solve the diversity issues plaguing the field. Diversity initiatives often end up focusing primarily on white women, even though women of colour face many additional barriers. In interviews with 60 women of color who work in STEM research, 100% had experienced discrimination."
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -843,7 +843,7 @@
|
||||
"\n",
|
||||
"FAccT is another lens that you may find useful in considering ethical issues. One useful resource for this is the free online book [Fairness and machine learning; Limitations and Opportunities](https://fairmlbook.org/), which \"gives a perspective on machine learning that treats fairness as a central concern rather than an afterthought.\" It also warns, however, that it \"is intentionally narrow in scope... A narrow framing of machine learning ethics might be tempting to technologists and businesses as a way to focus on technical interventions while sidestepping deeper questions about power and accountability. We caution against this temptation.\" Rather than provide an overview of the FAccT approach to ethics (which is better done in books such as the one linked above), our focus here will be on the limitations of this kind of narrow framing.\n",
|
||||
"\n",
|
||||
"One great way to consider whether an ethical lens is complete, is to try to come up with an example where the lens and our own ethical intuitions give diverging results. Os Keyes explored this in a graphic way in their paper [A Mulching Proposal\n",
|
||||
"One great way to consider whether an ethical lens is complete, is to try to come up with an example where the lens and our own ethical intuitions give diverging results. Os Keyes et al. explored this in a graphic way in their paper [A Mulching Proposal\n",
|
||||
"Analysing and Improving an Algorithmic System for Turning the Elderly into High-Nutrient Slurry](https://arxiv.org/abs/1908.06166). The paper's abstract says:"
|
||||
]
|
||||
},
|
||||
@ -860,7 +860,7 @@
|
||||
"source": [
|
||||
"In this paper, the rather controversial proposal (\"Turning the Elderly into High-Nutrient Slurry\") and the results (\"drastically increase the algorithm's adherence to the FAT framework, resulting in a more ethical and beneficent system\") are at odds... to say the least!\n",
|
||||
"\n",
|
||||
"In philosophy, and especially philosophy of ethics, this is one of the most effective tools: first, come up with a process, definition, set of questions, etc, which is designed to resolve some problem. Then try to come up with an example where that apparent solution results in a proposal that no-one would consider acceptable. This can then lead to a further refinement of the solution.\n",
|
||||
"In philosophy, and especially philosophy of ethics, this is one of the most effective tools: first, come up with a process, definition, set of questions, etc., which is designed to resolve some problem. Then try to come up with an example where that apparent solution results in a proposal that no-one would consider acceptable. This can then lead to a further refinement of the solution.\n",
|
||||
"\n",
|
||||
"So far, we've focused on things that you and your organization can do. But sometimes individual or organizational action is not enough. Sometimes, governments also need to consider policy implications."
|
||||
]
|
||||
@ -876,7 +876,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We often talk to people who are eager for technical or design fixes to be full solution to the kinds of problems that we've been discussing; for instance, a technical approach to debias data, or design guidelines for making technology less addictive. While such measures can be useful, they will not be sufficient to address the underlying problems that have led to our current state. For example, as long as it is incredibly profitable to create addictive technology, companies will continue to do so, regardless of whether this has the side effect of promoting conspiracy theories and polluting our information ecosystem. While individual designers may try to tweak product designs, we will not see substantial changes until the underlying profit incentives changes."
|
||||
"We often talk to people who are eager for technical or design fixes to be a full solution to the kinds of problems that we've been discussing; for instance, a technical approach to debias data, or design guidelines for making technology less addictive. While such measures can be useful, they will not be sufficient to address the underlying problems that have led to our current state. For example, as long as it is incredibly profitable to create addictive technology, companies will continue to do so, regardless of whether this has the side effect of promoting conspiracy theories and polluting our information ecosystem. While individual designers may try to tweak product designs, we will not see substantial changes until the underlying profit incentives changes."
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -908,9 +908,9 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Clean air and clean drinking water are public goods which are nearly impossible to protect through individual market decisions, but rather require coordinated regulatory action. Similarly, many of the harms resulting from unintended consequences of misuses of technology involve public goods, such as a polluted information environment or deteriorated ambient privacy. Too often privacy is framed as an individual right, yet there are societal impacts to widespread surveillance (which would still be the case even if it was possible for a few individuals to opt out)\n",
|
||||
"Clean air and clean drinking water are public goods which are nearly impossible to protect through individual market decisions, but rather require coordinated regulatory action. Similarly, many of the harms resulting from unintended consequences of misuses of technology involve public goods, such as a polluted information environment or deteriorated ambient privacy. Too often privacy is framed as an individual right, yet there are societal impacts to widespread surveillance (which would still be the case even if it was possible for a few individuals to opt out).\n",
|
||||
"\n",
|
||||
"Many of the issues we are seeing in tech are actually human rights issues, such as when a biased algorithm recommends that Black defendants to have longer prison sentences, when particular job ads are only shown to young people, or when police use facial recognition to identify protesters. The appropriate venue to address human rights issues is typically through the law.\n",
|
||||
"Many of the issues we are seeing in tech are actually human rights issues, such as when a biased algorithm recommends that Black defendants have longer prison sentences, when particular job ads are only shown to young people, or when police use facial recognition to identify protesters. The appropriate venue to address human rights issues is typically through the law.\n",
|
||||
"\n",
|
||||
"We need both regulatory and legal changes, *and* the ethical behavior of individuals. Individual behavior change can’t address misaligned profit incentives, externalities (where corporations reap large profits while off-loading their costs & harms to the broader society), or systemic failures. However, the law will never cover all edge cases, and it is important that individual software developers and data scientists are equipped to make ethical decisions in practice."
|
||||
]
|
||||
@ -940,7 +940,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Coming from a background of working with binary logic, the lack of clear answers in ethics can be frustrating at first. Yet, the implications of how our work impacts the world, including unintended consequences and the work becoming weaponization by bad actors, are some of the most important questions we can (and should!) consider. Even though there aren't any easy answers, there are definite pitfalls to avoid and practices to move towards more ethical behavior.\n",
|
||||
"Coming from a background of working with binary logic, the lack of clear answers in ethics can be frustrating at first. Yet, the implications of how our work impacts the world, including unintended consequences and the work becoming weaponized by bad actors, are some of the most important questions we can (and should!) consider. Even though there aren't any easy answers, there are definite pitfalls to avoid and practices to move towards more ethical behavior.\n",
|
||||
"\n",
|
||||
"Many people (including us!) are looking for more satisfying, solid answers of how to address harmful impacts of technology. However, given the complex, far-reaching, and interdisciplinary nature of the problems we are facing, there are no simple solutions. Julia Angwin, former senior reporter at ProPublica who focuses on issues of algorithmic bias and surveillance (and one of the 2016 investigators of the COMPAS recidivism algorithm that helped spark the field of Fairness Accountability and Transparency) said in [a 2019 interview](https://www.fastcompany.com/90337954/who-cares-about-liberty-julia-angwin-and-trevor-paglen-on-privacy-surveillance-and-the-mess-were-in), “I strongly believe that in order to solve a problem, you have to diagnose it, and that we’re still in the diagnosis phase of this. If you think about the turn of the century and industrialization, we had, I don’t know, 30 years of child labor, unlimited work hours, terrible working conditions, and it took a lot of journalist muckraking and advocacy to diagnose the problem and have some understanding of what it was, and then the activism to get laws changed. I feel like we’re in a second industrialization of data information... I see my role as trying to make as clear as possible what the downsides are, and diagnosing them really accurately so that they can be solvable. That’s hard work, and lots more people need to be doing it.” It's reassuring that Angwin thinks we are largely still in the diagnosis phase: if your understanding of these problems feels incomplete, that is normal and natural. Nobody has a “cure” yet, although it is vital that we continue working to better understand and address the problems we are facing.\n",
|
||||
"\n",
|
||||
@ -1034,41 +1034,10 @@
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"jupytext": {
|
||||
"split_at_heading": true
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.5"
|
||||
},
|
||||
"toc": {
|
||||
"base_numbering": 1,
|
||||
"nav_menu": {
|
||||
"height": "600px",
|
||||
"width": "365px"
|
||||
},
|
||||
"number_sections": false,
|
||||
"sideBar": true,
|
||||
"skip_h1_title": true,
|
||||
"title_cell": "Table of Contents",
|
||||
"title_sidebar": "Contents",
|
||||
"toc_cell": false,
|
||||
"toc_position": {},
|
||||
"toc_section_display": true,
|
||||
"toc_window_display": false
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@ -33,7 +33,7 @@
|
||||
"source": [
|
||||
"Having seen what it looks like to actually train a variety of models in chapter 2, let’s now look under the hood and see exactly what is going on. We’ll start with computer vision, and will use that to introduce fundamental tools and concepts of deep learning.\n",
|
||||
"\n",
|
||||
"To be exact, we'll discuss the role of arrays and tensors, and of brodcasting, a powerful technique for using them expressively. We'll explain stochastic gradient descent (SGD), the mechanism for learning by updating weights automatically. We'll discuss the choice of loss function for our basic classification task, and the role of mini-batches. We'll also finally describe the math that a basic neural network is actually doing. Finally, we'll put all these pieces together to see them working together.\n",
|
||||
"To be exact, we'll discuss the role of arrays and tensors, and of broadcasting, a powerful technique for using them expressively. We'll explain stochastic gradient descent (SGD), the mechanism for learning by updating weights automatically. We'll discuss the choice of a loss function for our basic classification task, and the role of mini-batches. We'll also describe the math that a basic neural network is actually doing. Finally, we'll put all these pieces together to see them at work.\n",
|
||||
"\n",
|
||||
"In future chapters we’ll do deep dives into other applications as well, and see how these concepts and tools generalize. But this chapter is about laying foundation stones. To be frank, that also makes this one of the harder chapters, because of how these concepts all depend on each other. Like an arch, all the stones need to be in place for the structure to stay up. Also like an arch, once that happens, it's a powerful structure that can support other things. But it requires some patience to assemble.\n",
|
||||
"\n",
|
||||
@ -71,7 +71,7 @@
|
||||
"\n",
|
||||
"Geoff Hinton has told of how even academic papers showing dramatically better results than anything previously published would be rejected from top journals and conferences, just because they used a neural network. Yann Lecun's work on convolutional neural networks, which we will study in the next section, showed that these models could read hand-written text--something that had never been achieved before. However his breakthrough was ignored by most researchers, even as it was used commercially to read 10% of the checks in the US!\n",
|
||||
"\n",
|
||||
"In addition to these three Turing Award winners, there are many other researchers who have battled to get us to where we are today. For instance, Jurgen Schmidhuber (who many believe should have shared in the Turing Award) pioneered many important ideas, including working on the *LSTM* architecture with his student Sepp Hochreiter (widely used for speech recognition and other text modeling tasks, and used in the IMDb example in <<chapter_intro>>). Perhaps most important of all, Paul Werbos in 1974 invented back-propagation for neural networks, the technique shown in this chapter and used universally for training neural networks ([Werbos 1994](https://books.google.com/books/about/The_Roots_of_Backpropagation.html?id=WdR3OOM2gBwC)). His development was almost entirely ignored for decades, but today it is the most important foundation of modern AI.\n",
|
||||
"In addition to these three Turing Award winners, there are many other researchers who have battled to get us to where we are today. For instance, Jurgen Schmidhuber (who many believe should have shared in the Turing Award) pioneered many important ideas, including working on the *LSTM* architecture with his student Sepp Hochreiter (widely used for speech recognition and other text modeling tasks, and used in the IMDB example in <<chapter_intro>>). Perhaps most important of all, Paul Werbos in 1974 invented back-propagation for neural networks, the technique shown in this chapter and used universally for training neural networks ([Werbos 1994](https://books.google.com/books/about/The_Roots_of_Backpropagation.html?id=WdR3OOM2gBwC)). His development was almost entirely ignored for decades, but today it is the most important foundation of modern AI.\n",
|
||||
"\n",
|
||||
"There is a lesson here for all of us! On your deep learning journey you will face many obstacles, both technical, and (even more difficult) people around you who don't believe you'll be successful. There's one *guaranteed* way to fail, and that's to stop trying. We've seen that the only consistent trait amongst every fast.ai student that's gone on to be a world-class practitioner is that they are all very tenacious."
|
||||
]
|
||||
@ -113,7 +113,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can see what's in this directory by using `ls()`, a method added by fastai. This method returns an object of a special fastai class called `L`, which has all the same functionality of Python's builtin `list`, plus a lot more. One of its handy features is that, when printed, it displays the count of items, before listing the items themselves (if there's more than 10 items, it just shows the first few)."
|
||||
"We can see what's in this directory by using `ls()`, a method added by fastai. This method returns an object of a special fastai class called `L`, which has all the same functionality of Python's built-in `list`, plus a lot more. One of its handy features is that, when printed, it displays the count of items, before listing the items themselves (if there's more than 10 items, it just shows the first few)."
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -1462,7 +1462,7 @@
|
||||
"\n",
|
||||
"Some operations in PyTorch, such as taking a mean, require us to cast our integer types to float types. Since we'll be needing this later, we'll also cast our stacked tensor to `float` now. Casting in PyTorch is as simple as typing the name of the type you wish to cast to, and treating it as a method.\n",
|
||||
"\n",
|
||||
"Generally when images are floats, the pixels are expected to be be zero and one, so we will also divide by 255 here."
|
||||
"Generally when images are floats, the pixels are expected to be between zero and one, so we will also divide by 255 here."
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -1726,7 +1726,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"> s: Intuitively, the difference between L1 norm and mean squared error (_MSE_) is that the latter will penalize bigger mistakes more heavily than the former (and be more lenient with small mistakes)."
|
||||
"> S: Intuitively, the difference between L1 norm and mean squared error (*MSE*) is that the latter will penalize bigger mistakes more heavily than the former (and be more lenient with small mistakes)."
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -1760,14 +1760,14 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"> j: When I first came across this \"L1\" thingie, I looked it up to see what on Earth it meant, found on Google that it is a _vector norm_ using _absolute value_, so looked up _vector norm_ and started reading: _Given a vector space V over a field F of the real or complex numbers, a norm on V is a nonnegative-valued any function p: V → \\[0,+∞) with the following properties: For all a ∈ F and all u, v ∈ V, p(u + v) ≤ p(u) + p(v)..._ Then I stopped reading. \"Ugh, I'll never understand math!\" I thought, for the thousandth time. Since then I've learned that every time these complex mathy bits of jargon come up in practice, it turns out I can replace them with a tiny bit of code! Like the _L1 loss_ is just equal to `(a-b).abs().mean()`, where `a` and `b` are tensors. I guess mathy folks just think differently to me... I'll make sure, in this book, every time some mathy jargon comes up, I'll give you the little bit of code it's equal to as well, and explain in common sense terms what's going on."
|
||||
"> J: When I first came across this \"L1\" thingie, I looked it up to see what on Earth it meant, found on Google that it is a *vector norm* using *absolute value*, so looked up *vector norm* and started reading: *Given a vector space V over a field F of the real or complex numbers, a norm on V is a nonnegative-valued any function p: V → \\[0,+∞) with the following properties: For all a ∈ F and all u, v ∈ V, p(u + v) ≤ p(u) + p(v)...* Then I stopped reading. \"Ugh, I'll never understand math!\" I thought, for the thousandth time. Since then I've learned that every time these complex mathy bits of jargon come up in practice, it turns out I can replace them with a tiny bit of code! Like the _L1 loss_ is just equal to `(a-b).abs().mean()`, where `a` and `b` are tensors. I guess mathy folks just think differently to me... I'll make sure, in this book, every time some mathy jargon comes up, I'll give you the little bit of code it's equal to as well, and explain in common sense terms what's going on."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In the above code we completed various mathematical operations on *PyTorch tensors*. If you've done some numeric programming in Pytorch before, you may recognize these as being similar to *Numpy arrays*. Let's have a look at those two very important classes."
|
||||
"In the above code we completed various mathematical operations on *PyTorch tensors*. If you've done some numeric programming in PyTorch before, you may recognize these as being similar to *Numpy arrays*. Let's have a look at those two very important classes."
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -1796,11 +1796,11 @@
|
||||
"\n",
|
||||
"In fact, **arrays and tensors can finish computations many thousands of times faster than using pure Python.**\n",
|
||||
"\n",
|
||||
"A PyTorch tensor is nearly the same thing as a numpy array, but with an additional restriction which unlocks some additional capabilities. It's the same in that it, too, is a multidimensional table of data, with all items of the same type. However, the restriction is that a tensor cannot use just any old type — it has to use a single basic numeric type for all componentss. As a result, a tensor is not as flexible as a genuine array of arrays, which allows jagged arrays, where the inner arrays could have different sizes. So a PyTorch tensor cannot be jagged. It is always a regularly shaped multidimensional rectangular structure.\n",
|
||||
"A PyTorch tensor is nearly the same thing as a numpy array, but with an additional restriction which unlocks some additional capabilities. It's the same in that it, too, is a multidimensional table of data, with all items of the same type. However, the restriction is that a tensor cannot use just any old type — it has to use a single basic numeric type for all components. As a result, a tensor is not as flexible as a genuine array of arrays, which allows jagged arrays, where the inner arrays could have different sizes. So a PyTorch tensor cannot be jagged. It is always a regularly shaped multidimensional rectangular structure.\n",
|
||||
"\n",
|
||||
"The vast majority of methods and operators supported by numpy on these structures are also supported by PyTorch. But PyTorch tensors have additional capabilities. One major capability is that these structures can live on the GPU, in which case their computation will be optimised for the GPU, and can run much faster. In addition, PyTorch can automatically calculate derivatives of these operations, including combinations of operations. As you'll see, it would be impossible to do deep learning in practice without this capability.\n",
|
||||
"\n",
|
||||
"> s: If you don't know what C is, do not worry as you won't need it at all. In a nutshell, it's a low-level (low-level means more similar to the language that computers use internally) language that is very fast compared to Python. To take advantage of its speed while programming in Python, try to avoid as much as possible writing loops and replace them by commands that work directly on arrays or tensors.\n",
|
||||
"> S: If you don't know what C is, do not worry as you won't need it at all. In a nutshell, it's a low-level (low-level means more similar to the language that computers use internally) language that is very fast compared to Python. To take advantage of its speed while programming in Python, try to avoid as much as possible writing loops and replace them by commands that work directly on arrays or tensors.\n",
|
||||
"\n",
|
||||
"Perhaps the most important new coding skill for a Python programmer to learn is how to effectively use the array/tensor APIs. We will be showing lots more tricks later in this book, but here's a summary of the key things you need to know for now."
|
||||
]
|
||||
@ -1869,7 +1869,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"All the operations below are shown on tensors - the syntax and results for NumPy arrays is idential.\n",
|
||||
"All the operations below are shown on tensors - the syntax and results for NumPy arrays is identical.\n",
|
||||
"\n",
|
||||
"You can select a row:"
|
||||
]
|
||||
@ -2339,7 +2339,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Stochastic Gradient descent (SGD)"
|
||||
"## Stochastic Gradient Descent (SGD)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -2348,11 +2348,11 @@
|
||||
"source": [
|
||||
"Do you remember the way that Arthur Samuel described machine learning, which we quoted in <<chapter_intro>>:\n",
|
||||
"\n",
|
||||
"> : _Suppose we arrange for some automatic means of testing the effectiveness of any current weight assignment in terms of actual performance and provide a mechanism for altering the weight assignment so as to maximize the performance. We need not go into the details of such a procedure to see that it could be made entirely automatic and to see that a machine so programed would \"learn\" from its experience._\n",
|
||||
"> : _Suppose we arrange for some automatic means of testing the effectiveness of any current weight assignment in terms of actual performance and provide a mechanism for altering the weight assignment so as to maximize the performance. We need not go into the details of such a procedure to see that it could be made entirely automatic and to see that a machine so programmed would \"learn\" from its experience._\n",
|
||||
"\n",
|
||||
"As we discussed, this is the key to allowing us to have something which can get better and better — to learn. But our pixel similarity approach does not really do this. We do not have any kind of weight assignment, or any way of improving based on testing the effectiveness of a weight assignment. In other words, we can't really improve our pixel similarity approach by modifying a set of parameters (which will be the SGD part, as we will see). In order to take advantage of the power of deep learning, we will first have to represent our task in the way that Arthur Samuel described it.\n",
|
||||
"\n",
|
||||
"Instead of trying to find the similarity between an image and a \"ideal image\" we could instead look at each individual pixel, and come up with a set of weights for each pixel, such that the highest weights are associated with those pixels most likely to be black for a particular category. For instance, pixels towards the bottom right are not very likely to be activated for a seven, so they should have a low weight for a seven, but are more likely to be activated for an eight, so they should have a high weight for an eight. This can be represented as a function for each possible category, for instance the probability of being the number eight:\n",
|
||||
"Instead of trying to find the similarity between an image and an \"ideal image\" we could instead look at each individual pixel, and come up with a set of weights for each pixel, such that the highest weights are associated with those pixels most likely to be black for a particular category. For instance, pixels towards the bottom right are not very likely to be activated for a seven, so they should have a low weight for a seven, but are more likely to be activated for an eight, so they should have a high weight for an eight. This can be represented as a function for each possible category, for instance the probability of being the number eight:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"def pr_eight(x,w) = (x*w).sum()\n",
|
||||
@ -2498,12 +2498,12 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"These seven steps, illustrated in <<gradient_descent>> are the key to the training of all deep learning models, and we'll be using the seven terms in the above diagram throughout this book. That deep learning turns out to rely entirely on these steps is extremely surprising and counter-intuitive. It's amazing that this process can solve such complex problems. But, as you'll see, it really does!\n",
|
||||
"These seven steps, illustrated in <<gradient_descent>> are the key to the training of all deep learning models. That deep learning turns out to rely entirely on these steps is extremely surprising and counter-intuitive. It's amazing that this process can solve such complex problems. But, as you'll see, it really does!\n",
|
||||
"\n",
|
||||
"There are many different ways to do each of these seven steps, and we will be learning about them throughout the rest of this book. These are the details which make a big difference for deep learning practitioners. But it turns out that the general approach to each one generally follows some basic principles:\n",
|
||||
"\n",
|
||||
"- **Initialize**:: we initialise the parameters to random values. This may sound surprising. There are certainly other choices we could make, such as initialising them to the percentage of times that that pixel is activated for that category. But since we already know that we have a routine to improve these weights, it turns out that just starting with random weights works perfectly well\n",
|
||||
"- **Loss**:: This is the thing Arthur Samuel refered to: \"*testing the effectiveness of any current weight assignment in terms of actual performance*\". We need some function that will return a number that is small if the performance of the model is good (the standard approach is to treat a small loss as good, and a large loss as bad, although this is just a convention)\n",
|
||||
"- **Initialize**:: we initialize the parameters to random values. This may sound surprising. There are certainly other choices we could make, such as initialising them to the percentage of times that that pixel is activated for that category. But since we already know that we have a routine to improve these weights, it turns out that just starting with random weights works perfectly well\n",
|
||||
"- **Loss**:: This is the thing Arthur Samuel referred to: \"*testing the effectiveness of any current weight assignment in terms of actual performance*\". We need some function that will return a number that is small if the performance of the model is good (the standard approach is to treat a small loss as good, and a large loss as bad, although this is just a convention)\n",
|
||||
"- **Step**:: A simple way to figure out whether a weight should be increased a bit, or decreased a bit, would be just to try it. Increase the weight by a small amount, and see if the loss goes up or down. Once you find the correct direction, you could then change that amount by a bit more, and a bit less, until you find an amount which works well. However, this is slow! As we will see, the magic of calculus allows us to directly figure out which direction, and roughly how much, to change each weight, without having to try all these small changes, by calculating *gradients*. This is just a performance optimisation, we would get exactly the same results by using the slower manual process as well\n",
|
||||
"- **Stop**:: We have already discussed how to choose how many epochs to train a model for. This is where that decision is applied. For our digit classifier, we would keep training until the accuracy of the model started getting worse, or we ran out of time."
|
||||
]
|
||||
@ -2662,7 +2662,7 @@
|
||||
"source": [
|
||||
"Notice the special method `requires_grad_`? That's the magical incantation we use to tell PyTorch that we want to calculate gradients with respect to that variable at that value. It is essentially tagging the variable, so PyTorch will remember to keep track of how to compute gradients of the other, direct calculations on it which you will ask for.\n",
|
||||
"\n",
|
||||
"> a: This API might throw you if you're coming from math or physics. In those contexts the \"gradient\" of a function is just another function (i.e., its derivative), so you might expect gradient-related API to give you a new function. But in deep learning, \"gradients\" usually means the _value_ of a function's derivative at a particular argument value. PyTorch API also puts the focus on that argument, not the function you're actually computing gradients of. It may feel backwards at first but it's just a different perspective.\n",
|
||||
"> A: This API might throw you if you're coming from math or physics. In those contexts the \"gradient\" of a function is just another function (i.e., its derivative), so you might expect gradient-related APIs to give you a new function. But in deep learning, \"gradients\" usually means the _value_ of a function's derivative at a particular argument value. PyTorch API also puts the focus on that argument, not the function you're actually computing gradients of. It may feel backwards at first but it's just a different perspective.\n",
|
||||
"\n",
|
||||
"Now we calculate our function with that value. Notice how PyTorch prints not just the value calculated, but also a note that it has a gradient function it'll be using to calculate our gradient when needed:"
|
||||
]
|
||||
@ -3381,7 +3381,7 @@
|
||||
"source": [
|
||||
"To summarize, at the beginning, the weights of our model can be random (training *from scratch*) or come from a pretrained model (*transfer learning*). In the first case, the output we will get from our inputs won't have anything to do with what we want, and even in the second case, it's very likely the pretrained model won't be very good at the specific task we are targeting. So the model will need to *learn* better weights.\n",
|
||||
"\n",
|
||||
"To do this, we will compare the outputs the model gives us with our targets (we have labelled data, so we know what result the model should give) using a *loss function*, which returns a number that needs to be as low as possible. Our weights need to be improved. To do this, we take a few data items (such as images) that we feed to our model. After going through our model, we compare to the corresponding targets using our loss function. The score we get tells us how wrong our predictions were, and we will change the weights a little bit to make it slightly better.\n",
|
||||
"To do this, we will compare the outputs the model gives us with our targets (we have labelled data, so we know what result the model should give) using a *loss function*, which returns a number that needs to be as low as possible. Our weights need to be improved. To do this, we take a few data items (such as images) that we feed to our model. After going through our model, we compare the corresponding targets using our loss function. The score we get tells us how wrong our predictions were, and we will change the weights a little bit to make it slightly better.\n",
|
||||
"\n",
|
||||
"To find how to change the weights to make the loss a bit better, we use calculus to calculate the *gradient*. (Actually, we let PyTorch do it for us!) Let's imagine you are lost in the mountains with your car parked at the lowest point. To find your way, you might wander in a random direction but that probably won't help much. Since you know your vehicle is at the lowest point, you would be better to go downhill. By always taking a step in the direction of the steepest downward slope, you should eventually arrive at your destination. We use the magnitude of the gradient (i.e., the steepness of the slope) to tell us how big a step to take; specifically, we multiply the gradient by a number we choose called the *learning rate* to decide on the step size."
|
||||
]
|
||||
@ -3405,7 +3405,7 @@
|
||||
"\n",
|
||||
"As a result, a very small change in the value of a weight will often not actually change the accuracy at all. This means it is not useful to use accuracy as a loss function. When we use accuracy as a loss function, most of the time our gradients will actually be zero, and the model will not be able to learn from that number. That is not much use at all!\n",
|
||||
"\n",
|
||||
"> s: In mathematical terms, accuracy is a function that is constant almost everywhere (except at the threshold, 0.5) so its derivative is nil almost everywhere (and infinity at the threshold). This then gives gradients that are zero or infinite, so useless to do an update of gradient descent.\n",
|
||||
"> S: In mathematical terms, accuracy is a function that is constant almost everywhere (except at the threshold, 0.5) so its derivative is nil almost everywhere (and infinity at the threshold). This then gives gradients that are zero or infinite, so, useless to do an update of gradient descent.\n",
|
||||
"\n",
|
||||
"Instead, we need a loss function which, when our weights result in slightly better predictions, gives us a slightly better loss. So what does a \"slightly better prediction\" look like, exactly? Well, in this case, it means that, if the correct answer is a 3, then the score is a little higher, or if the correct answer is a 7, then the score is a little lower.\n",
|
||||
"\n",
|
||||
@ -3415,7 +3415,7 @@
|
||||
"\n",
|
||||
"The purpose of the loss function is to measure the difference between predicted values and the true values -- that is, the targets (aka, the labels). So let's make another argument `targets`, a vector (i.e., another rank-1 tensor), indexed over the images, with a value of 0 or 1 which tells whether that image actually is a 3.\n",
|
||||
"\n",
|
||||
"So, for instance, suppose we had three images which we knew were a 3, a 7, and a 3. And suppose our model predicted with high confidence that the first was a 3, with slight confidence that the second was a 7, and with fair confidence (and incorrectly!) that the last was a 7. This would mean our loss function would take receive values as its inputs:"
|
||||
"So, for instance, suppose we had three images which we knew were a 3, a 7, and a 3. And suppose our model predicted with high confidence that the first was a 3, with slight confidence that the second was a 7, and with fair confidence (and incorrectly!) that the last was a 7. This would mean our loss function would take values as its inputs:"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -3449,7 +3449,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We're using a new function, `torch.where(a,b,c)`. This the same as running the list comprehension `[b[i] if a[i] else c[i] for i in range(len(a))]`, except it works on tensors, at C/CUDA speed. In plain English, this function will measure how distant each prediction is from 1 if it should be 1, and how distant it is from from 0 if it should be 0, and then it will take the mean of all those distances.\n",
|
||||
"We're using a new function, `torch.where(a,b,c)`. This is the same as running the list comprehension `[b[i] if a[i] else c[i] for i in range(len(a))]`, except it works on tensors, at C/CUDA speed. In plain English, this function will measure how distant each prediction is from 1 if it should be 1, and how distant it is from from 0 if it should be 0, and then it will take the mean of all those distances.\n",
|
||||
"\n",
|
||||
"> note: It's important to learn about PyTorch functions like this, because looping over tensors in Python performs at Python speed, not C/CUDA speed!\n",
|
||||
"\n",
|
||||
@ -3617,7 +3617,7 @@
|
||||
"\n",
|
||||
"Having defined a loss function, now is a good moment to recapitulate why we did this. After all, we already had a *metric*, which was overall accuracy. So why did we define a *loss*?\n",
|
||||
"\n",
|
||||
"The key difference is that the metric is to drive human understanding and the loss is to drive automated learning. To drive automted learning, the loss must be a function which has a meaningful derivative. It can't have big flat sections, and large jumps, but instead must be reasonably smooth. This is why we designed a loss function that would respond to small changes in confidence level. The requirements on loss sometimes it does not really reflect exactly what we are trying to achieve, but is something that is a compromise between our real goal, and a function that can be optimised using its gradient. The loss function is calculated for each item in our dataset, and then at the end of an epoch these are all averaged, and the overall mean is reported for the epoch.\n",
|
||||
"The key difference is that the metric is to drive human understanding and the loss is to drive automated learning. To drive automated learning, the loss must be a function which has a meaningful derivative. It can't have big flat sections, and large jumps, but instead must be reasonably smooth. This is why we designed a loss function that would respond to small changes in confidence level. The requirements on loss sometimes do not really reflect exactly what we are trying to achieve, but are rather a compromise between our real goal, and a function that can be optimised using its gradient. The loss function is calculated for each item in our dataset, and then at the end of an epoch these are all averaged, and the overall mean is reported for the epoch.\n",
|
||||
"\n",
|
||||
"Metrics, on the other hand, are the numbers that we really care about. These are the things which are printed at the end of each epoch, and tell us how our model is really doing. It is important that we learn to focus on these metrics, rather than the loss, when judging the performance of a model."
|
||||
]
|
||||
@ -3639,7 +3639,7 @@
|
||||
"\n",
|
||||
"So instead we take a compromise between the two: we calculate the average loss for a few data items at a time. This is called a *mini-batch*. The number of data items in the mini batch is called the *batch size*. A larger batch size means that you will get a more accurate and stable estimate of your datasets gradient on the loss function, but it will take longer, and you will get less mini-batches per epoch. Choosing a good batch size is one of the decisions you need to make as a deep learning practitioner to train your model quickly and accurately. We will talk about how to make this choice throughout this book.\n",
|
||||
"\n",
|
||||
"Another good reason for using mini-batches rather than calculating the gradient on individual data items is that, in practice, we nearly always do our training on an accelerator such as a GPU. These accelerators only perform well if they have lots of work to do at a time. So it is helpful if we can give them lots of data items to work on at a time. Using mini-batches is one of the best ways to do this. However, if you give them too much data to work on at once, they run out of memory--making GPUs happy is also tricky!.\n",
|
||||
"Another good reason for using mini-batches rather than calculating the gradient on individual data items is that, in practice, we nearly always do our training on an accelerator such as a GPU. These accelerators only perform well if they have lots of work to do at a time. So it is helpful if we can give them lots of data items to work on at a time. Using mini-batches is one of the best ways to do this. However, if you give them too much data to work on at once, they run out of memory--making GPUs happy is also tricky!\n",
|
||||
"\n",
|
||||
"As we've seen, in the discussion of data augmentation, we get better generalisation if we can vary things during training. A simple and effective thing we can vary during training is what data items we put in each mini batch. Rather than simply enumerating our data set in order for every epoch, instead what we normally do in practice is to randomly shuffle it on every epoch, before we create mini batches. PyTorch and fastai provide a class that will do the shuffling and mini batch collation for you, called `DataLoader`.\n",
|
||||
"\n",
|
||||
@ -3702,7 +3702,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"When we pass a Dataset to a DataLoader we will get back many batches which are themselves tuples of independent and dependent variable many batches:"
|
||||
"When we pass a Dataset to a DataLoader we will get back many batches which are themselves tuples of independent and dependent variables:"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -3991,7 +3991,7 @@
|
||||
"source": [
|
||||
"Whilst we could use a python for loop to calculate the prediction for each image, that would be very slow. Because Python loops don't run on the GPU, and because Python is a slow language for loops in general, we need to represent as much of the computation in a model as possible using higher-level functions.\n",
|
||||
"\n",
|
||||
"In this case, there's an extremely convenient mathematical operation that calculates `w*x` for every row of a matrix--it's called *matrix multiplication*. <<matmul>> show what matrix multiplication looks like (diagram from Wikipedia)."
|
||||
"In this case, there's an extremely convenient mathematical operation that calculates `w*x` for every row of a matrix--it's called *matrix multiplication*. <<matmul>> shows what matrix multiplication looks like (diagram from Wikipedia)."
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -4829,7 +4829,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"> j: There is an enormous amount of jargon in deep learning, such as: _rectified linear unit_. The vast vast majority of this jargon is no more complicated than can be implemented in a short line of code and Python, as we saw in this example. The reality is that for academics to get their papers published they need to make them sound as impressive and sophisticated as possible. One of the ways that they do that is to introduce jargon. Unfortunately, this has the result that the field ends up becoming far more intimidating and difficult to get into than it should be. You do have to learn the jargon, because otherwise papers and tutorials are not going to mean much to you. But that doesn't mean you have to find the jargon intimidating. Just remember, when you come across a word or phrase that you haven't seen before, it will almost certainly turn out that it is a very simple concept that it is referring to."
|
||||
"> J: There is an enormous amount of jargon in deep learning, such as: _rectified linear unit_. The vast vast majority of this jargon is no more complicated than can be implemented in a short line of code and Python, as we saw in this example. The reality is that for academics to get their papers published they need to make them sound as impressive and sophisticated as possible. One of the ways that they do that is to introduce jargon. Unfortunately, this has the result that the field ends up becoming far more intimidating and difficult to get into than it should be. You do have to learn the jargon, because otherwise papers and tutorials are not going to mean much to you. But that doesn't mean you have to find the jargon intimidating. Just remember, when you come across a word or phrase that you haven't seen before, it will almost certainly turn out that it is a very simple concept that it is referring to."
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -4845,7 +4845,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"> s: Mathematically, we say the composition of two linear functions is another linear function. So we can stack as many linear classifiers on top or each other, without non-linear functions between them, it will jsut be the same as one linear classifier."
|
||||
"> S: Mathematically, we say the composition of two linear functions is another linear function. So we can stack as many linear classifiers on top of each other, without non-linear functions between them, it will just be the same as one linear classifier."
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -5444,7 +5444,7 @@
|
||||
"1. What is \"ReLU\"? Draw a plot of it for values from `-2` to `+2`.\n",
|
||||
"1. What is an \"activation function\"?\n",
|
||||
"1. What's the difference between `F.relu` and `nn.ReLU`?\n",
|
||||
"1. The universal approximation theorem shows that any function can be approximately as closely as needed using just one nonlinearity. So why do we normally use more?"
|
||||
"1. The universal approximation theorem shows that any function can be approximated as closely as needed using just one nonlinearity. So why do we normally use more?"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -5471,9 +5471,6 @@
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"jupytext": {
|
||||
"split_at_heading": true
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
|
@ -2525,38 +2525,10 @@
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"jupytext": {
|
||||
"split_at_heading": true
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.5"
|
||||
},
|
||||
"toc": {
|
||||
"base_numbering": 1,
|
||||
"nav_menu": {},
|
||||
"number_sections": false,
|
||||
"sideBar": true,
|
||||
"skip_h1_title": true,
|
||||
"title_cell": "Table of Contents",
|
||||
"title_sidebar": "Contents",
|
||||
"toc_cell": false,
|
||||
"toc_position": {},
|
||||
"toc_section_display": true,
|
||||
"toc_window_display": false
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@ -9733,38 +9733,10 @@
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"jupytext": {
|
||||
"split_at_heading": true
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.5"
|
||||
},
|
||||
"toc": {
|
||||
"base_numbering": 1,
|
||||
"nav_menu": {},
|
||||
"number_sections": false,
|
||||
"sideBar": true,
|
||||
"skip_h1_title": true,
|
||||
"title_cell": "Table of Contents",
|
||||
"title_sidebar": "Contents",
|
||||
"toc_cell": false,
|
||||
"toc_position": {},
|
||||
"toc_section_display": true,
|
||||
"toc_window_display": false
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@ -1257,41 +1257,10 @@
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"jupytext": {
|
||||
"split_at_heading": true
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.5"
|
||||
},
|
||||
"toc": {
|
||||
"base_numbering": 1,
|
||||
"nav_menu": {
|
||||
"height": "223px",
|
||||
"width": "308px"
|
||||
},
|
||||
"number_sections": false,
|
||||
"sideBar": true,
|
||||
"skip_h1_title": true,
|
||||
"title_cell": "Table of Contents",
|
||||
"title_sidebar": "Contents",
|
||||
"toc_cell": false,
|
||||
"toc_position": {},
|
||||
"toc_section_display": true,
|
||||
"toc_window_display": false
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@ -526,38 +526,10 @@
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"jupytext": {
|
||||
"split_at_heading": true
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.5"
|
||||
},
|
||||
"toc": {
|
||||
"base_numbering": 1,
|
||||
"nav_menu": {},
|
||||
"number_sections": false,
|
||||
"sideBar": true,
|
||||
"skip_h1_title": true,
|
||||
"title_cell": "Table of Contents",
|
||||
"title_sidebar": "Contents",
|
||||
"toc_cell": false,
|
||||
"toc_position": {},
|
||||
"toc_section_display": true,
|
||||
"toc_window_display": false
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@ -2,7 +2,7 @@
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@ -89,7 +89,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@ -178,7 +178,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@ -187,7 +187,7 @@
|
||||
"2"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@ -198,7 +198,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@ -208,7 +208,7 @@
|
||||
"<PIL.Image.Image image mode=RGB size=151x192 at 0x7F67678F7E90>"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@ -227,7 +227,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@ -252,7 +252,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"hide_input": true
|
||||
},
|
||||
@ -265,7 +265,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@ -1485,18 +1485,6 @@
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@ -1003,25 +1003,10 @@
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"jupytext": {
|
||||
"split_at_heading": true
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@ -247,9 +247,6 @@
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"jupytext": {
|
||||
"split_at_heading": true
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
|
@ -1865,7 +1865,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Stochastic Gradient descent (SGD)"
"## Stochastic Gradient Descent (SGD)"
]
},
{
@ -3952,9 +3952,6 @@
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"jupytext": {
|
||||
"split_at_heading": true
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
|
@ -1706,25 +1706,10 @@
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"jupytext": {
|
||||
"split_at_heading": true
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@ -1313,18 +1313,6 @@
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@ -635,18 +635,6 @@
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@ -1666,18 +1666,6 @@
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@ -8287,9 +8287,6 @@
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"jupytext": {
|
||||
"split_at_heading": true
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
|
@ -1512,18 +1512,6 @@
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@ -185,8 +185,8 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"((#374) ['xxbos','xxmaj','well',',','\"','cube','\"','(','1997',')'...],\n",
|
||||
" (#207) ['xxbos','xxmaj','conrad','xxmaj','hall','went','out','with','a','bang'...])"
|
||||
"((#228) ['xxbos','xxmaj','this','movie',',','which','i','just','discovered','at'...],\n",
|
||||
" (#238) ['xxbos','i','stopped','watching','this','film','half','way','through','.'...])"
|
||||
]
|
||||
},
|
||||
"execution_count": null,
|
||||
@ -213,7 +213,7 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"3"
|
||||
"(3, 2.0)"
|
||||
]
|
||||
},
|
||||
"execution_count": null,
|
||||
@ -222,9 +222,31 @@
}
],
"source": [
"def f(x): return x+1\n",
"def f(x:int): return x+1\n",
"tfm = Transform(f)\n",
"tfm(2)"
"tfm(2),tfm(2.0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(3, 2.0)"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"@Transform\n",
"def f(x:int): return x+1\n",
"f(2),f(2.0)"
]
},
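The change above (annotating the parameter as `x:int` and calling the transform on both `2` and `2.0`) shows fastcore's type dispatch: a `Transform` built from a type-annotated function only applies itself to values of that type and passes everything else through unchanged, which is why the result becomes `(3, 2.0)`. Here is a minimal sketch of the same behaviour with a different annotation (assumes fastcore is installed; the function name `shout` is just for illustration):

```python
from fastcore.transform import Transform

@Transform
def shout(s:str): return s.upper()

# The str annotation restricts the transform to strings; other types pass through untouched.
shout("hello"), shout(42)   # ('HELLO', 42)
```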
{
@ -814,25 +836,10 @@
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"jupytext": {
|
||||
"split_at_heading": true
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@ -1571,18 +1571,6 @@
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@ -1815,7 +1815,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### A note about twitter"
"### A note about Twitter"
]
},
{
@ -2607,20 +2607,8 @@
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
|
@ -830,9 +830,7 @@
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"s"
|
||||
]
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
@ -843,20 +841,8 @@
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
|
@ -102,7 +102,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@ -195,7 +195,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@ -205,7 +205,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@ -215,7 +215,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 23,
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@ -226,10 +226,8 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 24,
|
||||
"metadata": {
|
||||
"scrolled": true
|
||||
},
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
@ -290,7 +288,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 26,
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@ -402,18 +400,6 @@
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@ -700,18 +700,6 @@
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@ -1546,18 +1546,6 @@
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@ -443,18 +443,6 @@
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@ -1300,18 +1300,6 @@
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@ -23,18 +23,6 @@
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@ -76,18 +76,6 @@
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@ -259,9 +259,6 @@
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"jupytext": {
|
||||
"split_at_heading": true
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",