Minor Typos Chp2

Jeff Kriske 2020-03-15 18:12:21 -04:00
parent 57c5dad7c0
commit 3bd6e95c5e


@ -69,14 +69,14 @@
"\n",
"If you take this approach, then you will be on your third iteration of learning and improving whilst the perfectionists are still in the planning stages!\n",
"\n",
"We also suggest that you iterate from end to end in your project; that is, don't spend months fine tuning your model, or polishing the perfect GUI, or labelling the perfect dataset… Instead, complete every step as well as you can in a reasonable amount of time, all the way to the end. For instance, if your final goal is an application that runs on a mobile phone, then that should be what you have after each iteration. But perhaps in the early iterations you take some shortcuts, for instance by doing all of the processing on a remote server, and using a simple responsive web application. By completing the project and to end, you will see where the most tricky bits are, and which bits make the biggest difference to the final result."
"We also suggest that you iterate from end to end in your project; that is, don't spend months fine tuning your model, or polishing the perfect GUI, or labelling the perfect dataset… Instead, complete every step as well as you can in a reasonable amount of time, all the way to the end. For instance, if your final goal is an application that runs on a mobile phone, then that should be what you have after each iteration. But perhaps in the early iterations you take some shortcuts, for instance by doing all of the processing on a remote server, and using a simple responsive web application. By completing the project end to end, you will see where the most tricky bits are, and which bits make the biggest difference to the final result."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you work through this book, we suggest that you both complete lots of small experiments, by running and adjusting the notebooks we provide, at the same time that you gradually develop your own projects. That way, you will be getting experience with all of the tools and techniques that were explaining, as we discuss them.\n",
"As you work through this book, we suggest that you both complete lots of small experiments, by running and adjusting the notebooks we provide, at the same time that you gradually develop your own projects. That way, you will be getting experience with all of the tools and techniques that we're explaining, as we discuss them.\n",
"\n",
"> s: To make the most of this book, take the time to experiment between each chapter, be it on your own project or exploring the notebooks we provide. Then try re-writing those notebooks from scratch on a new dataset. It's only by practicing (and failing) a lot that you will get an intuition on how to train a model. \n",
"\n",
@ -91,9 +91,9 @@
"source": [
"Since it is easiest to get started on a project where you already have data available, that means it's probably easiest to get started on a project related to something you are already doing, because you already have data about things that you are doing. For instance, if you work in the music business, you may have access to many recordings. If you work as a radiologist, you probably have access to lots of medical images. If you are interested in wildlife preservation, you may have access to lots of images of wildlife.\n",
"\n",
"Sometimes, you have to get a bit creative. Maybe you can find some previous machine learning project, such as a Kaggle competition, that is related to your field of interest. Sometimes, you have to compromize. Maybe you can't find the exact data you need for the precise project you have in mind; but you might be able to find something from a similar domain, or measured in a different way, tackling a slightly different problem. Working on these kinds of similar projects will still give you a good understanding of the overall process, and may help you identify other shortcuts, data sources, and so forth.\n",
"Sometimes, you have to get a bit creative. Maybe you can find some previous machine learning project, such as a Kaggle competition, that is related to your field of interest. Sometimes, you have to compromise. Maybe you can't find the exact data you need for the precise project you have in mind; but you might be able to find something from a similar domain, or measured in a different way, tackling a slightly different problem. Working on these kinds of similar projects will still give you a good understanding of the overall process, and may help you identify other shortcuts, data sources, and so forth.\n",
"\n",
"Especially when you are just starting out with deep learning it's not a good idea to branch out into very different areas to places that deep learning has not been applied to before. That's because if your model does not work at first, you will not know whether it is because you have made a mistake, or if the very problem you are trying to solve is simply not solvable with deep learning. And you won't know where to look to get help. Therefore, it is best at first to start with something where you can find an example online of somebody who has had good results with something that is at least somewhat similar to what you are trying to achieve, or where you can convert your data into a format similar what someone else has used before (such as creating an image from your data). Let's have a look at the state of deep learning, jsut so you know what kinds of things deep learning is good at right now."
"Especially when you are just starting out with deep learning it's not a good idea to branch out into very different areas to places that deep learning has not been applied to before. That's because if your model does not work at first, you will not know whether it is because you have made a mistake, or if the very problem you are trying to solve is simply not solvable with deep learning. And you won't know where to look to get help. Therefore, it is best at first to start with something where you can find an example online of somebody who has had good results with something that is at least somewhat similar to what you are trying to achieve, or where you can convert your data into a format similar what someone else has used before (such as creating an image from your data). Let's have a look at the state of deep learning, just so you know what kinds of things deep learning is good at right now."
]
},
{
@ -159,7 +159,7 @@
"source": [
"The ability of deep learning to combine text and images into a single model is, generally, far better than most people intuitively expect. For example, a deep learning model can be trained on input images, and output captions written in English, and can learn to generate surprisingly appropriate captions automatically for new images! But again, we have the same warning that we discussed in the previous section: there is no guarantee that these captions will actually be correct.\n",
"\n",
"Because of this serious issue we generally recommend that deep learning be used not as an entirely automated process, but as part of a process in which the model and a human user interact closely. This can potentially make humans orders of magnitude more productive than they would be with entirely manual methods, and actually result in more accurate processes than using a human alone. For instance, an automatic system can be used to identify potential strokes directly from CT scans, send a high priority alert to have potential/scans looked at quickly. There is only a three-hour window to treat strokes, so this fast feedback loop could save lives. At the same time, however, all scans could continue to be sent to radiologists in the usual way, so there would be no reduction in human input. Other deep learning models could automatically measure items seen on the scan, and insert those measurements into reports, warning the radiologist about findings that they may have missed, and tell the radiologist about other cases which might be relevant."
"Because of this serious issue we generally recommend that deep learning be used not as an entirely automated process, but as part of a process in which the model and a human user interact closely. This can potentially make humans orders of magnitude more productive than they would be with entirely manual methods, and actually result in more accurate processes than using a human alone. For instance, an automatic system can be used to identify potential strokes directly from CT scans and send a high priority alert to have potential/scans looked at quickly. There is only a three-hour window to treat strokes, so this fast feedback loop could save lives. At the same time, however, all scans could continue to be sent to radiologists in the usual way, so there would be no reduction in human input. Other deep learning models could automatically measure items seen on the scan, and insert those measurements into reports, warning the radiologist about findings that they may have missed, and tell the radiologist about other cases which might be relevant."
]
},
{
@ -173,7 +173,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"For analysing timeseries and tabular data, deep learning has recently been making great strides. However, deep learning is generally used as part of an ensemble of multiple types of model. If you already have a system that is using random forests or gradient boosting machines (popular tabular modelling tools that we will learn about soon) then switching to, or adding, deep learning may not result in any dramatic improvement. Deep learning does greatly increase the variety of columns that you can include, for example columns containing natural language (e.g. book titles, reviews, etc), and *high cardinality categorical* columns (i.e. something that contains a large number of discrete choices, such as zip code or product id). On the downside, deep learning models generally take longer to train than random forests or gradient boosting machines, although this is changing thanks to libraries such as [RAPIDS](https://rapids.ai/), which provides GPU acceleration for the whole modeling pipeline. We cover the pros and cons of all these methods in detail in <<chapter_tabular>> in this book."
"For analysing timeseries and tabular data, deep learning has recently been making great strides. However, deep learning is generally used as part of an ensemble of multiple types of model. If you already have a system that is using random forests or gradient boosting machines (popular tabular modelling tools that we will learn about soon) then switching to, or adding, deep learning may not result in any dramatic improvement. Deep learning does greatly increase the variety of columns that you can include, for example columns containing natural language (e.g. book titles, reviews, etc.), and *high cardinality categorical* columns (i.e. something that contains a large number of discrete choices, such as zip code or product id). On the downside, deep learning models generally take longer to train than random forests or gradient boosting machines, although this is changing thanks to libraries such as [RAPIDS](https://rapids.ai/), which provides GPU acceleration for the whole modeling pipeline. We cover the pros and cons of all these methods in detail in <<chapter_tabular>> in this book."
]
},
{
@ -246,7 +246,7 @@
"\n",
"Finally, you could build two **models** for purchase probabilities, conditional on seeing or not seeing a recommendation. The difference between these two probabilities is a utility function for a given recommendation to a customer. It will be low in cases where the algorithm recommends a familiar book that the customer has already rejected (both components are small) or a book that he or she would have bought even without the recommendation (both components are large and cancel each other out).\n",
"\n",
"As you can see, in practice often the practical implementation of your model will require a lot more than just training a model! You'll often need to run experiments to collect more data, and consider how to incorporate your models into the overall system you're developing. Speaking of data, let's now focus on how to find find data for your project."
"As you can see, in practice often the practical implementation of your model will require a lot more than just training a model! You'll often need to run experiments to collect more data, and consider how to incorporate your models into the overall system you're developing. Speaking of data, let's now focus on how to find data for your project."
]
},
{
@ -260,7 +260,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"For many types of projects, you may be able to find all the data you need online. The project we'll be completing in this chapter is a *bear detector*. It will discriminate between three types of bear: grizzly, black, and teddy bear. There are many images on the Internet of each type of bear we can use. We just need a way to find them and download them. We've provided a tool you can use for this purpose, so you can follow along with this chapter, creating your own image recognition application for whatever kinds of object you're interested in. In the fast.ai course, thousands of students have presented their work on the course forums, displaying everything from Trinidad hummingbird varieties, to Panama bus types, and even an application that helped one student let his fiancee recognize his sixteen cousins during Christmas vacation!"
"For many types of projects, you may be able to find all the data you need online. The project we'll be completing in this chapter is a *bear detector*. It will discriminate between three types of bear: grizzly, black, and teddy bear. There are many images on the Internet of each type of bear we can use. We just need a way to find them and download them. We've provided a tool you can use for this purpose, so you can follow along with this chapter, creating your own image recognition application for whatever kinds of object you're interested in. In the fast.ai course, thousands of students have presented their work on the course forums, displaying everything from Trinidad hummingbird varieties, to Panama bus types, and even an application that helped one student let his fiancée recognize his sixteen cousins during Christmas vacation!"
]
},
{
@ -550,7 +550,7 @@
"File: ~/git/fastai/fastai/vision/utils.py\n",
"Type: function\n",
"```\n",
"It tells us what argument the function accepts (`fns`) then shows us the source code and the file it comes from. Looking at that source code, we can see it applies the function `verify_image` in parallel and only keep the ones for which the result of that function is `False`, which is consistent with the doc string: it finds the images in `fns` that can't be opened.\n",
"It tells us what argument the function accepts (`fns`) then shows us the source code and the file it comes from. Looking at that source code, we can see it applies the function `verify_image` in parallel and only keeps the ones for which the result of that function is `False`, which is consistent with the doc string: it finds the images in `fns` that can't be opened.\n",
"\n",
"Here are the commands that are very useful in Jupyter notebooks:\n",
"\n",
@ -558,7 +558,7 @@
"- when inside the parenthesis of a function, pressing \"shift\" and \"tab\" simultaneously will display a window with the signature of the function and a short documentation. Pressing it twice will expand the documentation and pressing it three times will open a full window with the same information at the bottom of your screen.\n",
"- in a cell, typing `?func_name` and executing will open a window with the signature of the function and a short documentation.\n",
"- in a cell, typing `??func_name` and executing will open a window with the signature of the function, a short documentation and the source code.\n",
"- if you are using the fasti library, we added a `doc` function for you, executing `doc(func_name)` in a cell will open a window with the signature of the function, a short documentation and links to the source code on GitHub and the full documentation of the funciton in the [documentation of the library](https://docs.fast.ai).\n",
"- if you are using the fastai library, we added a `doc` function for you, executing `doc(func_name)` in a cell will open a window with the signature of the function, a short documentation and links to the source code on GitHub and the full documentation of the function in the [documentation of the library](https://docs.fast.ai).\n",
"- unrelated to the documentation but still very useful to get help, at any point, if you get an error, type `%debug` in the next cell and execute to open the [python debugger](https://docs.python.org/3/library/pdb.html) that will let you inspect the content of every variable."
]
},
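{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make these concrete, here is a minimal sketch, using `verify_images` as the example function (this assumes a notebook where `from fastai.vision.all import *` has already been run):\n",
"\n",
"```python\n",
"?verify_images       # window with the signature and short documentation\n",
"??verify_images      # the same, plus the source code\n",
"doc(verify_images)   # fastai's doc(): signature plus links to GitHub and docs.fast.ai\n",
"```"
]
},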
@ -587,7 +587,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"So with this as your training data, you would end up not with a healthy skin detector, but a *young white woman touching her face* detector! Be sure to think carefully about the types of data that you might expect to see in practice in your application, and check carefully to ensure that all these types are reflected in your model's source data.footnote:[Thanks to Deb Raji, who came up with the *healthy skin* example. See her paper *Actionable Auditing: Investigating the Impact of Publicly Naming Biased Performance Results of Commercial AI Products* for more fascinating insights into model bias.]"
"So with this as your training data, you would end up not with a healthy skin detector, but a *young white woman touching her face* detector! Be sure to think carefully about the types of data that you might expect to see in practice in your application, and check carefully to ensure that all these types are reflected in your model's source data. footnote:[Thanks to Deb Raji, who came up with the *healthy skin* example. See her paper *Actionable Auditing: Investigating the Impact of Publicly Naming Biased Performance Results of Commercial AI Products* for more fascinating insights into model bias.]"
]
},
{
@ -800,7 +800,7 @@
"\n",
"Instead, what we normally do in practice is to randomly select part of the image, and crop to just that part. On each epoch (which is one complete pass through all of our images in the dataset) we randomly select a different part of each image. This means that our model can learn to focus on, and recognize, different features in our images. It also reflects how images work in the real world; different photos of the same thing may be framed in slightly different ways.\n",
"\n",
"Here is a another copy of the previous examples, but this time we are replacing `Resize` with `RandomResizedCrop`, which is the transform that provides the behaviour described above.The most important parameter to pass in is the `min_scale` parameter, which determines how much of the image to select at minimum each time."
"Here is another copy of the previous examples, but this time we are replacing `Resize` with `RandomResizedCrop`, which is the transform that provides the behaviour described above. The most important parameter to pass in is the `min_scale` parameter, which determines how much of the image to select at minimum each time."
]
},
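{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of that change (this assumes the `bears` DataBlock and `path` from earlier in the chapter; the notebook's own cell may differ slightly):\n",
"\n",
"```python\n",
"bears = bears.new(item_tfms=RandomResizedCrop(128, min_scale=0.3))\n",
"dls = bears.dataloaders(path)\n",
"dls.train.show_batch(max_n=4, nrows=1, unique=True)  # same image, different random crops\n",
"```"
]
},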
{
@ -901,7 +901,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Time to use the same lined of codes as in <<chapter_intro>> to train our bear classifier.\n",
"Time to use the same lines of code as in <<chapter_intro>> to train our bear classifier.\n",
"\n",
"We don't have a lot of data for our problem (150 pictures of each sort of bear at most), so to train our model, we'll use `RandomResizedCrop` and default `aug_transforms` for our model, on an image size of 224px, which is fairly standard for image classification."
]
@ -1065,7 +1065,7 @@
"source": [
"Each row here represents all the black, grizzly, and teddy bears in our dataset, respectively. Each column represents the images which the model predicted as black, grizzly, and teddy bears, respectively. Therefore, the diagonal of the matrix shows the images which were classified correctly, and the other, off diagonal, cells represent those which were classified incorrectly. This is called a *confusion matrix* and is one of the many ways that fastai allows you to view the results of your model. It is (of course!) calculated using the validation set. With the color coding, the goal is to have white everywhere, except the diagonal where we want dark blue. Our bear classifier isn't making many mistakes!\n",
"\n",
"It's helpful to see where exactly our errors are occuring, to see whether it's due to a dataset problem (e.g. images that aren't bears at all, or are labelled incorrectly, etc), or a model problem (e.g. perhaps it isn't handling images taken with unusual lighting, or from a different angle, etc). To do this, we can sort out images by their *loss*.\n",
"It's helpful to see where exactly our errors are occurring, to see whether it's due to a dataset problem (e.g. images that aren't bears at all, or are labelled incorrectly, etc.), or a model problem (e.g. perhaps it isn't handling images taken with unusual lighting, or from a different angle, etc.). To do this, we can sort out images by their *loss*.\n",
"\n",
"The *loss* is a number that is higher if the model is incorrect (and especially if it's also confident of its incorrect answer), or if it's correct, but not confident of its correct answer. In a couple chapters we'll learn in depth how loss is calculated and used in training process. For now, `plot_top_losses` shows us the images with the highest loss in our dataset. As the title of the output says, each image is labeled with four things: prediction, actual (target label), loss, and probability. The *probability* here is the confidence level, from zero to one, that the model has assigned to its prediction."
]
@ -1172,7 +1172,7 @@
"for idx,cat in cleaner.change(): shutil.move(cleaner.fns[idx], path/cat)\n",
"```\n",
"\n",
"> s: Cleaning the data or getting it ready for your model are two of the biggest challenges for data scientists, one they say take 90% of their time. The fastai library aims at providing tools to make it as easy as possible.\n",
"> s: Cleaning the data or getting it ready for your model are two of the biggest challenges for data scientists, one they say takes 90% of their time. The fastai library aims at providing tools to make it as easy as possible.\n",
"\n",
"We'll be seeing more examples of model-driven data cleaning throughout this book. Once we've cleaned up our data, we can retrain our model. Try it yourself, and see if your accuracy improves!"
]
@ -1354,7 +1354,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We know how to make predictions from our saved model, so we have everything we need to start building our app. We can do it directly in a Jupyter Notenook."
"We know how to make predictions from our saved model, so we have everything we need to start building our app. We can do it directly in a Jupyter Notebook."
]
},
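{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of that inference step (this assumes the model was exported earlier with `learn.export()`, that `path` contains the resulting `export.pkl`, and that the image path is just an example):\n",
"\n",
"```python\n",
"learn_inf = load_learner(path/'export.pkl')\n",
"pred, pred_idx, probs = learn_inf.predict('images/grizzly.jpg')\n",
"```"
]
},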
{
@ -1377,7 +1377,7 @@
"\n",
"*IPython widgets* are GUI components that bring together JavaScript and Python functionality in a web browser, and can be created and used within a Jupyter notebook. For instance, the image cleaner that we saw earlier in this chapter is entirely written with IPython widgets. However, we don't want to require users of our application to have to run Jupyter themselves.\n",
"\n",
"That is why *Voilà* exists. It is a system for making applications consisting of IPython widgets available to end-users, without them having to use Jupyter at all. Voila is taking advantage of the fact that a notebook _already is_ a kind of web application, just a rather complex one that depends on another web application Jupyter itself. Essentially, it helps us automatically convert the complex web application which we've already implicitly made (the notebook) into a simpler, easier-to-deploy web application, which functions like a normal web application rather than like a notebook.\n",
"That is why *Voilà* exists. It is a system for making applications consisting of IPython widgets available to end-users, without them having to use Jupyter at all. Voilà is taking advantage of the fact that a notebook _already is_ a kind of web application, just a rather complex one that depends on another web application Jupyter itself. Essentially, it helps us automatically convert the complex web application which we've already implicitly made (the notebook) into a simpler, easier-to-deploy web application, which functions like a normal web application rather than like a notebook.\n",
"\n",
"But we still have the advantage of developing in a notebook. So with ipywidgets, we can build up our GUI step by step. We will use this approach to create a simple image classifier. First, we need a file upload widget:"
]
@ -1644,7 +1644,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We have written all the code necessary for our app. The next step is to convert it in something we can deploy."
"We have written all the code necessary for our app. The next step is to convert it into something we can deploy."
]
},
{
@ -1660,16 +1660,16 @@
"source": [
"Now that we have everything working in this Jupyter notebook, we can create our application. To do this, create a notebook which contains only the code needed to create and show the widgets that you need, and markdown for any text that you want to appear. Have a look at the *bear_classifier* notebook in the book repo to see the simple notebook application we created.\n",
"\n",
"Next, install Voila if you have not already, by copying these lines into a Notebook cell, and executing it (if you're comfortable using the command line, you can also execute these two lines in your terminal, without the `!` prefix):\n",
"Next, install Voilà if you have not already, by copying these lines into a Notebook cell, and executing it (if you're comfortable using the command line, you can also execute these two lines in your terminal, without the `!` prefix):\n",
"\n",
" !pip install voila\n",
" !jupyter serverextension enable voila --sys-prefix\n",
"\n",
"Cells which begin with a `!` do not contain Python code, but instead contain code which is passed to your shell, such as bash, power shell in windows, or so forth. If you are comfortable using the command line (which we'll be learning about later in this book), you can of course simply type these two lines (without the `!` prefix) directly into your terminal. In this case, the first line installs the voila library and application, and the second connects it to your existing Jupyter notebook.\n",
"\n",
"Voila runs Jupyter notebooks, just like the Jupyter notebook server you are using now does, except that it does something very important: it removes all of the cell inputs, and only shows output (including ipywidgets), along with your markdown cells. So what's left is a web application! To view your notebook as a voila web application replace the word \"notebooks\" in your browser's URL with: \"voila/render\". You will see the same content as your notebook, but without any of the code cells.\n",
"Voilà runs Jupyter notebooks, just like the Jupyter notebook server you are using now does, except that it does something very important: it removes all of the cell inputs, and only shows output (including ipywidgets), along with your markdown cells. So what's left is a web application! To view your notebook as a voila web application replace the word \"notebooks\" in your browser's URL with: \"voila/render\". You will see the same content as your notebook, but without any of the code cells.\n",
"\n",
"Of course, you don't need to use Voila or ipywidgets. Your model is just a function you can call: `pred,pred_idx,probs = learn.predict(img)` . So you can use it with any framework, hosted on any platform. And you can take something you've prototyped in ipywidgets and Voila and later convert it into a regular web application. We're showing you this approach in the book because we think it's a great way for data scientists and other folks that aren't web development experts to create applications from their models.\n",
"Of course, you don't need to use Voilà or ipywidgets. Your model is just a function you can call: `pred,pred_idx,probs = learn.predict(img)` . So you can use it with any framework, hosted on any platform. And you can take something you've prototyped in ipywidgets and Voilà and later convert it into a regular web application. We're showing you this approach in the book because we think it's a great way for data scientists and other folks that aren't web development experts to create applications from their models.\n",
"\n",
"We have our app, now let's deploy it!"
]
@ -1720,7 +1720,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The first time you do this Binder will take around 5 minutes to build your site. In other words, is it finding a virtual machine which can run your app, allocating storage, collecting the files needed for Jupyter, for your notebook, and for presenting your notebook as a web application. It's doing all of this behind the scenes.\n",
"The first time you do this Binder will take around 5 minutes to build your site. In other words, it is finding a virtual machine which can run your app, allocating storage, collecting the files needed for Jupyter, for your notebook, and for presenting your notebook as a web application. It's doing all of this behind the scenes.\n",
"\n",
"Finally, once it has started the app running, it will navigate your browser to your new web app. You can share the URL you copied to allow others to access your app as well.\n",
"\n",
@ -1731,7 +1731,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"You may well want to deploy your application onto mobile devices, or edge devices such as a Raspberry Pi. There are a lot of libraries and frameworks to allow you to integrate a model directly into a mobile application. However these approaches tend to require a lot of extra steps and boilerplate, and do not always support all the PyTorch and fastai layers that your model might use. In addition, the work you do will depend on what kind of mobile devices you are targeting for deployment. So you might need to do some work to run on iOS devices, different work to run on newer Android devices, different work for older Android devices, etc.. Instead, we recommend wherever possible that you deploy the model itself to a server, and have your mobile or edge application connect to it as a web service.\n",
"You may well want to deploy your application onto mobile devices, or edge devices such as a Raspberry Pi. There are a lot of libraries and frameworks to allow you to integrate a model directly into a mobile application. However these approaches tend to require a lot of extra steps and boilerplate, and do not always support all the PyTorch and fastai layers that your model might use. In addition, the work you do will depend on what kind of mobile devices you are targeting for deployment. So you might need to do some work to run on iOS devices, different work to run on newer Android devices, different work for older Android devices, etc. Instead, we recommend wherever possible that you deploy the model itself to a server, and have your mobile or edge application connect to it as a web service.\n",
"\n",
"There is quite a few upsides to this approach. The initial installation is easier, because you only have to deploy a small GUI application, which connects to the server to do all the heavy lifting. More importantly perhaps, upgrades of that core logic can happen on your server, rather than needing to be distributed to all of your users. Your server can have a lot more memory and processing capacity than most edge devices, and it is far easier to scale those resources if your model becomes more demanding. The hardware that you will have on a server is going to be more standard and more easily supported by fastai and PyTorch, so you don't have to compile your model into a different form.\n",
"\n",
@ -1746,7 +1746,7 @@
"\n",
"Overall, we'd recommend using a simple CPU-based server approach where possible, for as long as you can get away with it. If you're lucky enough to have a very successful application, then you'll be able to justify the investment in more complex deployment approaches at that time.\n",
"\n",
"Congratulations, you have succesfully built a deep learning model and deployed it! Now is a good time to take a pause and think about what could go wrong."
"Congratulations, you have successfully built a deep learning model and deployed it! Now is a good time to take a pause and think about what could go wrong."
]
},
{
@ -1762,7 +1762,7 @@
"source": [
"In practice, a deep learning model will be just one piece of a much bigger system. As we discussed at the start of this chapter, a *data product* requires thinking about the entire end to end process within which our model lives. In this book, we can't hope to cover all the complexity of managing deployed data products, such as managing multiple versions of models, A/B testing, canarying, refreshing the data (should we just grow and grow our datasets all the time, or should we regularly remove some of the old data), handling data labelling, monitoring all this, detecting model rot, and so forth. However, there is an excellent book that covers many deployment issues, which is [Building Machine Learning Powered Applications](https://www.amazon.com/Building-Machine-Learning-Powered-Applications/dp/149204511X), by Emmanuel Ameisen. In this section, we will give an overview of some of the most important issues to consider.\n",
"\n",
"One of the biggest issues with this is that understanding and testing the behavior of a deep learning model is much more difficult than most code that you would write. With normal software development you can analyse the exact steps that the software is taking, and carefully study with of these steps match the desired behaviour that you are trying to create. But with a neural network the behavior emerges from the models attempt to match the training data, rather than being exactly defined.\n",
"One of the biggest issues with this is that understanding and testing the behavior of a deep learning model is much more difficult than most code that you would write. With normal software development you can analyse the exact steps that the software is taking, and carefully study which of these steps match the desired behaviour that you are trying to create. But with a neural network the behavior emerges from the models attempt to match the training data, rather than being exactly defined.\n",
"\n",
"This can result in disaster! For instance, let's say you really were rolling out a bear detection system which will be attached to video cameras around the campsite, and will warn campers of incoming bears. If we used a model trained with the dataset we downloaded, there are going to be all kinds of problems in practice, such as:\n",
"\n",
@ -1822,7 +1822,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"One of the biggest challenges in rolling out a model is that your model may change the behaviour of the system it is a part of. For instance, consider a \"predictive policing\" algorithm that predicts more crime in certain neighborhoods, causing more police officers to be sent to those neighborhoods, which can result in more crime being recorded in those neighborhoods, and so on. In the Royal Statiscal Society paper [To predict and serve](https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1740-9713.2016.00960.x), Kristian Lum and William Isaac write: \"predictive policing is aptly named: it is predicting future policing, not future crime\".\n",
"One of the biggest challenges in rolling out a model is that your model may change the behaviour of the system it is a part of. For instance, consider a \"predictive policing\" algorithm that predicts more crime in certain neighborhoods, causing more police officers to be sent to those neighborhoods, which can result in more crime being recorded in those neighborhoods, and so on. In the Royal Statistical Society paper [To predict and serve](https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1740-9713.2016.00960.x), Kristian Lum and William Isaac write: \"predictive policing is aptly named: it is predicting future policing, not future crime\".\n",
"\n",
"Part of the issue in this case is that in the presence of *bias* (which we'll discuss in depth in the next chapter), feedback loops can result in negative implications of that bias getting worse and worse. For instance, there are concerns that this is already happening in the US, where there is significant bias in arrest rates on racial grounds. [According to the ACLU](https://www.aclu.org/issues/smart-justice/sentencing-reform/war-marijuana-black-and-white), \"despite roughly equal usage rates, Blacks are 3.73 times more likely than whites to be arrested for marijuana\". The impact of this bias, along with the roll-out of predictive policing algorithms in many parts of the US, led Bärí Williams to [write in the NY Times](https://www.nytimes.com/2017/12/02/opinion/sunday/intelligent-policing-and-my-innocent-children.html): \"The same technology thats the source of so much excitement in my career is being used in law enforcement in ways that could mean that in the coming years, my son, who is 7 now, is more likely to be profiled or arrested — or worse — for no reason other than his race and where we live.\"\n",
"\n",
@ -1874,7 +1874,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Provide an example of where the bear classification model might work poorly, due to structural or style differences to the training data\n",
"1. Provide an example of where the bear classification model might work poorly, due to structural or style differences to the training data.\n",
"1. Where do text models currently have a major deficiency?\n",
"1. What are possible negative societal implications of text generation models?\n",
"1. In situations where a model might make mistakes, and those mistakes could be harmful, what is a good alternative to automating a process?\n",