First batch of edits

This commit is contained in:
Sylvain Gugger 2020-05-14 05:18:31 -07:00
parent 5b70a64d66
commit 7abc2c3979
42 changed files with 2036 additions and 2441 deletions

@@ -18,14 +18,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sidebar: Acknowledgement: Dr Rachel Thomas"
"### Sidebar: Acknowledgement: Dr. Rachel Thomas"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This chapter was co-authored by Dr Rachel Thomas, the co-founder of fast.ai, and founding director of the Center for Applied Data Ethics at the University of San Francisco. It largely follows a subset of the syllabus she developed for the [Introduction to Data Ethics](https://ethics.fast.ai) course."
"This chapter was co-authored by Dr. Rachel Thomas, the cofounder of fast.ai, and founding director of the Center for Applied Data Ethics at the University of San Francisco. It largely follows a subset of the syllabus she developed for the [Introduction to Data Ethics](https://ethics.fast.ai) course."
]
},
{
@@ -39,9 +39,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"As we discussed in Chapters 1 and 2, sometimes, machine learning models can go wrong. They can have bugs. They can be presented with data that they haven't seen before, and behave in ways we don't expect. Or, they could work exactly as designed, but be used for something that you would much prefer they were never ever used for.\n",
"As we discussed in Chapters 1 and 2, sometimes machine learning models can go wrong. They can have bugs. They can be presented with data that they haven't seen before, and behave in ways we don't expect. Or they could work exactly as designed, but be used for something that we would much prefer they were never, ever used for.\n",
"\n",
"Because deep learning is such a powerful tool and can be used for so many things, it becomes particularly important that we consider the consequences of our choices. The philosophical study of *ethics* is the study of right and wrong, including how we can define those terms, recognise right and wrong actions, and understand the connection between actions and consequences. The field of *data ethics* has been around for a long time, and there are many academics focused on this field. It is being used to help define policy in many jurisdictions; it is being used in companies big and small to consider how best to ensure good societal outcomes from product development; and it is being used by researchers who want to make sure that the work they are doing is used for good, and not for bad.\n",
"Because deep learning is such a powerful tool and can be used for so many things, it becomes particularly important that we consider the consequences of our choices. The philosophical study of *ethics* is the study of right and wrong, including how we can define those terms, recognize right and wrong actions, and understand the connection between actions and consequences. The field of *data ethics* has been around for a long time, and there are many academics focused on this field. It is being used to help define policy in many jurisdictions; it is being used in companies big and small to consider how best to ensure good societal outcomes from product development; and it is being used by researchers who want to make sure that the work they are doing is used for good, and not for bad.\n",
"\n",
"As a deep learning practitioner, therefore, it is likely that at some point you are going to be put in a situation where you need to consider data ethics. So what is data ethics? It's a subfield of ethics, so let's start there."
]
@@ -50,30 +50,30 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"> J: At university, philosophy of ethics was my main thing (it would have been the topic of my thesis, if I'd finished it, instead of dropping out to join the real-world). Based on the years I spent studying ethics, I can tell you this: no one really agrees on what right and wrong are, whether they exist, how to spot them, which people are good, and which bad, or pretty much anything else. So don't expect too much from the theory! We're going to focus on examples and thought starters here, not theory."
"> J: At university, philosophy of ethics was my main thing (it would have been the topic of my thesis, if I'd finished it, instead of dropping out to join the real world). Based on the years I spent studying ethics, I can tell you this: no one really agrees on what right and wrong are, whether they exist, how to spot them, which people are good, and which bad, or pretty much anything else. So don't expect too much from the theory! We're going to focus on examples and thought starters here, not theory."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In answering the question [What is Ethics](https://www.scu.edu/ethics/ethics-resources/ethical-decision-making/what-is-ethics/), The Markkula Center for Applied Ethics says that *ethics* refers to:\n",
"In answering the question [\"What Is Ethics\"](https://www.scu.edu/ethics/ethics-resources/ethical-decision-making/what-is-ethics/), the Markkula Center for Applied Ethics says that the term refers to:\n",
"\n",
"- Well-founded standards of right and wrong that prescribe what humans ought to do, and\n",
"- Well-founded standards of right and wrong that prescribe what humans ought to do\n",
"- The study and development of one's ethical standards.\n",
"\n",
"There is no list of right answers for ethics. There is no list of do's and dont's. Ethics is complicated, and context-dependent. It involves the perspectives of many stakeholders. Ethics is a muscle that you have to develop and practice. In this chapter, our goal is to provide some signposts to help you on that journey.\n",
"There is no list of right answers. There is no list of dos and don'ts. Ethics is complicated, and context-dependent. It involves the perspectives of many stakeholders. Ethics is a muscle that you have to develop and practice. In this chapter, our goal is to provide some signposts to help you on that journey.\n",
"\n",
"Spotting ethical issues is best to do as part of a collaborative team. This is the only way you can really incorporate different perspectives. Different people's backgrounds will help them to see things which may not be obvious to you. Working with a team is helpful for many \"muscle building\" activities, including this one.\n",
"Spotting ethical issues is best to do as part of a collaborative team. This is the only way you can really incorporate different perspectives. Different people's backgrounds will help them to see things which may not be obvious to you. Working with a team is helpful for many \"muscle-building\" activities, including this one.\n",
"\n",
"This chapter is certainly not the only part of the book where we talk about data ethics, but it's good to have a place where we focus on it for a while. To get oriented, it's perhaps easiest to look at a few examples. So we picked out three that we think illustrate effectively some of the key topics."
"This chapter is certainly not the only part of the book where we talk about data ethics, but it's good to have a place where we focus on it for a while. To get oriented, it's perhaps easiest to look at a few examples. So, we picked out three that we think effectively illustrate some of the key topics."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Key examples for data ethics"
"## Key Examples for Data Ethics"
]
},
{
@@ -82,32 +82,32 @@
"source": [
"We are going to start with three specific examples that illustrate three common ethical issues in tech:\n",
"\n",
"1. **Recourse processes**: Arkansas's buggy healthcare algorithms left patients stranded\n",
"2. **Feedback loops**: YouTube's recommendation system helped unleash a conspiracy theory boom\n",
"3. **Bias**: When a traditionally African-American name is searched for on Google, it displays ads for criminal background checks\n",
"1. *Recourse processes*--Arkansas's buggy healthcare algorithms left patients stranded.\n",
"2. *Feedback loops*--YouTube's recommendation system helped unleash a conspiracy theory boom.\n",
"3. *Bias*--When a traditionally African-American name is searched for on Google, it displays ads for criminal background checks.\n",
"\n",
"In fact, for every concept that we introduce in this chapter, we are going to provide at least one specific example. For each one, have a think about what you could have done in this situation, and think about what kinds of obstructions there might have been to you getting that done. How would you deal with them? What would you look out for?"
"In fact, for every concept that we introduce in this chapter, we are going to provide at least one specific example. For each one, think about what you could have done in this situation, and what kinds of obstructions there might have been to you getting that done. How would you deal with them? What would you look out for?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Bugs and recourse: Buggy algorithm used for healthcare benefits"
"### Bugs and Recourse: Buggy Algorithm Used for Healthcare Benefits"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Verge investigated software used in over half of the U.S. states to determine how much healthcare people receive, and documented their findings in an article [What Happens When an Algorithm Cuts Your Healthcare](https://www.theverge.com/2018/3/21/17144260/healthcare-medicaid-algorithm-arkansas-cerebral-palsy). After implementation of the algorithm in Arkansas, people (many with severe disabilities) drastically had their healthcare cut. For instance, Tammy Dobbs, a woman with cerebral palsy who needs an aid to help her to get out of bed, to go to the bathroom, to get food, and more, had her hours of help suddenly reduced by 20 hours a week. She couldn't get any explanation for why her healthcare was cut. Eventually, a court case revealed that there were mistakes in the software implementation of the algorithm, negatively impacting people with diabetes or cerebral palsy. However, Dobbs and many other people reliant on these health care benefits live in fear that their benefits could again be cut suddenly and inexplicably."
"The Verge investigated software used in over half of the US states to determine how much healthcare people receive, and documented their findings in the article [\"What Happens When an Algorithm Cuts Your Healthcare\"](https://www.theverge.com/2018/3/21/17144260/healthcare-medicaid-algorithm-arkansas-cerebral-palsy). After implementation of the algorithm in Arkansas, hundreds of people (many with severe disabilities) had their healthcare drastically cut. For instance, Tammy Dobbs, a woman with cerebral palsy who needs an aide to help her to get out of bed, to go to the bathroom, to get food, and more, had her hours of help suddenly reduced by 20 hours a week. She couldn't get any explanation for why her healthcare was cut. Eventually, a court case revealed that there were mistakes in the software implementation of the algorithm, negatively impacting people with diabetes or cerebral palsy. However, Dobbs and many other people reliant on these healthcare benefits live in fear that their benefits could again be cut suddenly and inexplicably."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Feedback loops: YouTube's recommendation system"
"### Feedback Loops: YouTube's Recommendation System"
]
},
{
@@ -116,44 +116,44 @@
"source": [
"Feedback loops can occur when your model is controlling the next round of data you get. The data that is returned quickly becomes flawed by the software itself.\n",
"\n",
"For instance, in <<chapter_production>> we briefly mentioned the reinforcement learning algorithm which Google introduced for YouTube's recommendation system. YouTube has 1.9bn users, who watch over 1 billion hours of YouTube videos a day. Their algorithm, which was designed to optimise watch time, is responsible for around 70% of the content that is watched. It led to out-of-control feedback loops, leading the New York Times to run the headline \"YouTube Unleashed a Conspiracy Theory Boom. Can It Be Contained?\". Ostensibly recommendation systems are predicting what content people will like, but they also have a lot of power in determining what content people even see."
"For instance, YouTube has 1.9 billion users, who watch over 1 billion hours of YouTube videos a day. Its recommendation algorithm (built by Google), which was designed to optimize watch time, is responsible for around 70% of the content that is watched. But there was a problem: it led to out-of-control feedback loops, leading the *New York Times* to run the headline [\"YouTube Unleashed a Conspiracy Theory Boom. Can It Be Contained?\"](https://www.nytimes.com/2019/02/19/technology/youtube-conspiracy-stars.html). Ostensibly, recommendation systems are predicting what content people will like, but they also have a lot of power in determining what content people even see."
]
},
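The feedback-loop mechanism described in this cell can be sketched as a toy simulation (all the numbers and the "model" here are invented for illustration; this is not YouTube's actual system). A recommender that promotes whatever has the most recorded clicks ends up feeding on its own recommendations, so one arbitrary item dominates even though every item is equally appealing:

```python
# Toy simulation of a recommendation feedback loop (illustrative only;
# the parameters and model are made up, not any real system's).
import random

random.seed(0)

n_items = 5
true_appeal = [0.5] * n_items   # every item is equally appealing to users
clicks = [1] * n_items          # click counts the "model" learns from

for step in range(10_000):
    # The model recommends the item with the most recorded clicks.
    recommended = max(range(n_items), key=lambda i: clicks[i])
    # Users mostly watch what is recommended, occasionally something else.
    item = recommended if random.random() < 0.9 else random.randrange(n_items)
    if random.random() < true_appeal[item]:
        clicks[item] += 1       # the model's own choice shapes its next data

share = [c / sum(clicks) for c in clicks]
print([round(s, 2) for s in share])  # one item dominates despite equal appeal
```

Because the model controls which items users even see, the click data it trains on stops reflecting underlying preferences, which is part of why feedback loops are hard to spot from the model's own metrics.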
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Bias: Professor Lantanya Sweeney \"arrested\""
"### Bias: Professor Lantanya Sweeney \"Arrested\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Dr. Latanya Sweeney is a professor at Harvard and director of their data privacy lab. In the paper [Discrimination in Online Ad Delivery](https://arxiv.org/abs/1301.6822) (see <<lantanya_arrested>>) she describes her discovery that googling her name resulted in advertisements saying \"Latanya Sweeney arrested\" even though she is the only Latanya Sweeney and has never been arrested. However when she googled other names, such as Kirsten Lindquist, she got more neutral ads, even though Kirsten Lindquist has been arrested three times."
"Dr. Latanya Sweeney is a professor at Harvard and director of the university's data privacy lab. In the paper [\"Discrimination in Online Ad Delivery\"](https://arxiv.org/abs/1301.6822) (see <<lantanya_arrested>>) she describes her discovery that Googling her name resulted in advertisements saying \"Latanya Sweeney, arrested?\" even though she is the only known Latanya Sweeney and has never been arrested. However, when she Googled other names, such as \"Kirsten Lindquist,\" she got more neutral ads, even though Kirsten Lindquist has been arrested three times."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"images/ethics/image1.png\" id=\"lantanya_arrested\" caption=\"Google search showing Professor Lantanya Sweeney 'arrested'\" alt=\"Screenshot of google search showing Professor Lantanya Sweeney 'arrested'\" width=\"400\">"
"<img src=\"images/ethics/image1.png\" id=\"lantanya_arrested\" caption=\"Google search showing ads about Professor Lantanya Sweeney's arrest record\" alt=\"Screenshot of google search showing ads about Professor Lantanya Sweeney's arrest record\" width=\"400\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Being a computer scientist, she studied this systematically, and looked at over 2000 names. She found that this pattern held where historically black names received advertisements suggesting that the person had a criminal record. Whereas, white names had more neutral advertisements.\n",
"Being a computer scientist, she studied this systematically, and looked at over 2,000 names. She found a clear pattern where historically Black names received advertisements suggesting that the person had a criminal record, whereas white names had more neutral advertisements.\n",
"\n",
"This is an example of bias. It can make a big difference to people's lives — for instance, if a job applicant is googled then it may appear that they have a criminal record when they do not."
"This is an example of bias. It can make a big difference to people's lives—for instance, if a job applicant is Googled it may appear that they have a criminal record when they do not."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Why does this matter?"
"### Why Does This Matter?"
]
},
{
@@ -162,11 +162,11 @@
"source": [
"One very natural reaction to considering these issues is: \"So what? What's that got to do with me? I'm a data scientist, not a politician. I'm not one of the senior executives at my company who make the decisions about what we do. I'm just trying to build the most predictive model I can.\"\n",
"\n",
"These are very reasonable questions. But we're going to try to convince you that the answer is: everybody who is training models absolutely needs to consider how their model will be used. And to consider how to best ensure that it is used as positively as possible. There are things you can do. And if you don't do these things, then things can go pretty badly.\n",
"These are very reasonable questions. But we're going to try to convince you that the answer is that everybody who is training models absolutely needs to consider how their models will be used, and consider how to best ensure that they are used as positively as possible. There are things you can do. And if you don't do them, then things can go pretty badly.\n",
"\n",
"One particularly hideous example of what happens when technologists focus on technology at all costs is the story of IBM and Nazi Germany. A Swiss judge ruled \"It does not thus seem unreasonable to deduce that IBM's technical assistance facilitated the tasks of the Nazis in the commission of their crimes against humanity, acts also involving accountancy and classification by IBM machines and utilized in the concentration camps themselves.\"\n",
"One particularly hideous example of what happens when technologists focus on technology at all costs is the story of IBM and Nazi Germany. In 2001, a Swiss judge ruled that it was not unreasonable \"to deduce that IBM's technical assistance facilitated the tasks of the Nazis in the commission of their crimes against humanity, acts also involving accountancy and classification by IBM machines and utilized in the concentration camps themselves.\"\n",
"\n",
"IBM, you see, supplied the Nazis with data tabulation products necessary to track the extermination of Jews and other groups on a massive scale. This was driven from the top of the company, with marketing to Hitler and his leadership team. Company President Thomas Watson personally approved the 1939 release of special IBM alphabetizing machines to help organize the deportation of Polish Jews. Pictured here is Adolf Hitler (far left) meeting with IBM CEO Tom Watson Sr. (2nd from left), shortly before Hitler awarded Watson a special “Service to the Reich” medal in 1937:"
"IBM, you see, supplied the Nazis with data tabulation products necessary to track the extermination of Jews and other groups on a massive scale. This was driven from the top of the company, with marketing to Hitler and his leadership team. Company President Thomas Watson personally approved the 1939 release of special IBM alphabetizing machines to help organize the deportation of Polish Jews. Pictured in <<meeting>> is Adolf Hitler (far left) meeting with IBM CEO Tom Watson Sr. (second from left), shortly before Hitler awarded Watson a special “Service to the Reich” medal in 1937."
]
},
{
@@ -180,7 +180,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"But it also happened throughout the organization. IBM and its subsidiaries provided regular training and maintenance on-site at the concentration camps: printing off cards, configuring machines, and repairing them as they broke frequently. IBM set up categorizations on their punch card system for the way that each person was killed, which group they were assigned to, and the logistical information necessary to track them through the vast Holocaust system. IBM's code for Jews in the concentration camps was 8, where around 6,000,000 were killed. Its code for Romanis was 12 (they were labeled by the Nazis as \"asocials\", with over 300,000 killed in the *Zigeunerlager*, or “Gypsy camp”). General executions were coded as 4, death in the gas chambers as 6."
"But this was not an isolated incident--the organization's involvement was extensive. IBM and its subsidiaries provided regular training and maintenance onsite at the concentration camps: printing off cards, configuring machines, and repairing them as they broke frequently. IBM set up categorizations on its punch card system for the way that each person was killed, which group they were assigned to, and the logistical information necessary to track them through the vast Holocaust system. IBM's code for Jews in the concentration camps was 8: some 6,000,000 were killed. Its code for Romanis was 12 (they were labeled by the Nazis as \"asocials,\" with over 300,000 killed in the *Zigeunerlager*, or “Gypsy camp”). General executions were coded as 4, death in the gas chambers as 6."
]
},
{
@@ -194,26 +194,26 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Of course, the project managers and engineers and technicians involved were just living their ordinary lives. Caring for their families, going to the church on Sunday, doing their jobs the best they could. Following orders. The marketers were just doing what they could to meet their business development goals. Edwin Black, author of \"IBM and the Holocaust\", said: \"To the blind technocrat, the means were more important than the ends. The destruction of the Jewish people became even less important because the invigorating nature of IBM's technical achievement was only heightened by the fantastical profits to be made at a time when bread lines stretched across the world.\"\n",
"Of course, the project managers and engineers and technicians involved were just living their ordinary lives. Caring for their families, going to the church on Sunday, doing their jobs the best they could. Following orders. The marketers were just doing what they could to meet their business development goals. As Edwin Black, author of *IBM and the Holocaust* (Dialog Press) observed: \"To the blind technocrat, the means were more important than the ends. The destruction of the Jewish people became even less important because the invigorating nature of IBM's technical achievement was only heightened by the fantastical profits to be made at a time when bread lines stretched across the world.\"\n",
"\n",
"Step back for a moment and consider: how would you feel if you discovered that you had been part of a system that ended up hurting society? Would you even know? Would you be open to finding out? How can you help make sure this doesn't happen? We have described the most extreme situation here in Nazi Germany, but there are many negative societal consequences happening due to AI and machine learning right now, some of which we'll describe in this chapter.\n",
"Step back for a moment and consider: How would you feel if you discovered that you had been part of a system that ended up hurting society? Would you be open to finding out? How can you help make sure this doesn't happen? We have described the most extreme situation here, but there are many negative societal consequences linked to AI and machine learning being observed today, some of which we'll describe in this chapter.\n",
"\n",
"It's not just a moral burden either. Sometimes, technologists pay very directly for their actions. For instance, the first person who was jailed as a result of the Volkswagen scandal, where the car company cheated on their diesel emissions tests, was not the manager that oversaw the project, or an executive at the helm of the company. It was one of the engineers, James Liang, who just did what he was told.\n",
"It's not just a moral burden, either. Sometimes technologists pay very directly for their actions. For instance, the first person who was jailed as a result of the Volkswagen scandal, where the car company was revealed to have cheated on its diesel emissions tests, was not the manager that oversaw the project, or an executive at the helm of the company. It was one of the engineers, James Liang, who just did what he was told.\n",
"\n",
"On the other hand, if a project you are involved in turns out to make a huge positive impact on even one person, this is going to make you feel pretty great!\n",
"Of course, it's not all bad--if a project you are involved in turns out to make a huge positive impact on even one person, this is going to make you feel pretty great!\n",
"\n",
"Okay, so hopefully we have convinced you that you ought to care. But what should you do? As data scientists, we're naturally inclined to focus on making our model better at optimizing some metric. But optimizing that metric may not actually lead to better outcomes. And even if optimizing that metric *does* help create better outcomes, it almost certainly won't be the only thing that matters. Consider the pipeline of steps that occurs between the development of a model or an algorithm by a researcher or practitioner, and the point at which this work is actually used to make some decision. This entire pipeline needs to be considered *as a whole* if we're to have a hope of getting the kinds of outcomes we want.\n",
"Okay, so hopefully we have convinced you that you ought to care. But what should you do? As data scientists, we're naturally inclined to focus on making our models better by optimizing some metric or other. But optimizing that metric may not actually lead to better outcomes. And even if it *does* help create better outcomes, it almost certainly won't be the only thing that matters. Consider the pipeline of steps that occurs between the development of a model or an algorithm by a researcher or practitioner, and the point at which this work is actually used to make some decision. This entire pipeline needs to be considered *as a whole* if we're to have a hope of getting the kinds of outcomes we want.\n",
"\n",
"Normally there is a very long chain from one end to the other. This is especially true if you are a researcher where you don't even know if your research will ever get used for anything, or if you're involved in data collection, which is even earlier in the pipeline. But no-one is better placed to inform everyone involved in this chain about the capabilities, constraints, and details of your work than you are. Although there's no \"silver bullet\" that can ensure your work is used the right way, by getting involved in the process, and asking the right questions, you can at the very least ensure that the right issues are being considered.\n",
"Normally there is a very long chain from one end to the other. This is especially true if you are a researcher, where you might not even know if your research will ever get used for anything, or if you're involved in data collection, which is even earlier in the pipeline. But no one is better placed to inform everyone involved in this chain about the capabilities, constraints, and details of your work than you are. Although there's no \"silver bullet\" that can ensure your work is used the right way, by getting involved in the process, and asking the right questions, you can at the very least ensure that the right issues are being considered.\n",
"\n",
"Sometimes, the right response to being asked to do a piece of work is to just say \"no\". Often, however, the response we hear is \"if I don't do it, someone else will\". But consider this: if you've been picked for the job, you're the best person they've found; so if you don't do it, the best person isn't working on that project. If the first 5 they ask all say no too, then even better!"
"Sometimes, the right response to being asked to do a piece of work is to just say \"no.\" Often, however, the response we hear is, \"If I don't do it, someone else will.\" But consider this: if you've been picked for the job, you're the best person they've found to do it--so if you don't do it, the best person isn't working on that project. If the first five people they ask all say no too, even better!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Integrating machine learning with product design"
"## Integrating Machine Learning with Product Design"
]
},
{
@@ -224,25 +224,25 @@
"\n",
"These are not just algorithm questions. They are data product design questions. But the product managers, executives, judges, journalists, doctors… whoever ends up developing and using the system of which your model is a part will not be well-placed to understand the decisions that you made, let alone change them.\n",
"\n",
"For instance, two studies found that Amazon's facial recognition software produced [inaccurate](https://www.nytimes.com/2018/07/26/technology/amazon-aclu-facial-recognition-congress.html) and [racially biased results](https://www.theverge.com/2019/1/25/18197137/amazon-rekognition-facial-recognition-bias-race-gender). Amazon claimed that the researchers should have changed the default parameters; they did not explain how it would change the racially biased results. Furthermore, it turned out that [Amazon was not instructing police departments](https://gizmodo.com/defense-of-amazons-face-recognition-tool-undermined-by-1832238149) that used its software to do this either. There was, presumably, a big distance between the researchers that developed these algorithms, and the Amazon documentation staff that wrote the guidelines provided to the police. A lack of tight integration led to serious problems for society, the police, and Amazon themselves. It turned out that their system erroneously *matched* 28 members of Congress to criminal mugshots! (And these members of Congress wrongly matched to criminal mugshots disproportionately included people of color as seen in <<congressmen>>.)"
"For instance, two studies found that Amazon's facial recognition software produced [inaccurate](https://www.nytimes.com/2018/07/26/technology/amazon-aclu-facial-recognition-congress.html) and [racially biased](https://www.theverge.com/2019/1/25/18197137/amazon-rekognition-facial-recognition-bias-race-gender) results. Amazon claimed that the researchers should have changed the default parameters, without explaining how this would have changed the biased results. Furthermore, it turned out that [Amazon was not instructing police departments](https://gizmodo.com/defense-of-amazons-face-recognition-tool-undermined-by-1832238149) that used its software to do this either. There was, presumably, a big distance between the researchers that developed these algorithms and the Amazon documentation staff that wrote the guidelines provided to the police. A lack of tight integration led to serious problems for society at large, the police, and Amazon themselves. It turned out that their system erroneously matched 28 members of Congress to criminal mugshots! (And the Congresspeople wrongly matched to criminal mugshots were disproportionately people of color, as seen in <<congressmen>>.)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"images/ethics/image4.png\" id=\"congressmen\" caption=\"Congressmen matched to criminal mugshots by Amazon software\" alt=\"Picture of the congressmen matched to criminal mugshots by Amazon software, they are disproportionatedly people of color\" width=\"500\">"
"<img src=\"images/ethics/image4.png\" id=\"congressmen\" caption=\"Congresspeople matched to criminal mugshots by Amazon software\" alt=\"Picture of the congresspeople matched to criminal mugshots by Amazon software; they are disproportionately people of color\" width=\"500\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Data scientists need to be part of a cross-disciplinary team. And researchers need to work closely with the kinds of people who will end up using their research. Better still is if the domain experts themselves have learnt enough to be able to train and debug some models themselves--hopefully there are a few of you reading this book right now!\n",
"Data scientists need to be part of a cross-disciplinary team. And researchers need to work closely with the kinds of people who will end up using their research. Better still is if the domain experts themselves have learned enough to be able to train and debug some models themselves—hopefully there are a few of you reading this book right now!\n",
"\n",
"The modern workplace is a very specialized place. Everybody tends to have well-defined jobs to perform. Especially in large companies, it can be hard to know what all the pieces of the puzzle are. Sometimes companies even intentionally obscure the overall project goals that are being worked on, if they know that their employees are not going to like the answers. This is sometimes done by compartmentalizing pieces as much as possible.\n",
"\n",
"In other words, we're not saying that any of this is easy. It's hard. It's really hard. We all have to do our best. And we have often seen that the people who do get involved in the higher-level context of these projects, and attempt to develop cross-disciplinary capabilities and teams, become some of the most important and well rewarded members of their organizations. It's the kind of work that tends to be highly appreciated by senior executives, even if it is sometimes considered rather uncomfortable by middle management."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Data ethics is a big field, and we can't cover everything. Instead, we're going to pick a few topics that we think are particularly relevant:\n",
"\n",
"- The need for recourse and accountability\n",
"- Feedback loops\n",
"- Bias\n",
"- Disinformation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Recourse and Accountability"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In a complex system, it is easy for no one person to feel responsible for outcomes. While this is understandable, it does not lead to good results. In the earlier example of the Arkansas healthcare system in which a bug led to people with cerebral palsy losing access to needed care, the creator of the algorithm blamed government officials, and government officials blamed those who implemented the software. NYU professor [Danah Boyd](https://www.youtube.com/watch?v=NTl0yyPqf3E) described this phenomenon: \"Bureaucracy has often been used to shift or evade responsibility... Today's algorithmic systems are extending bureaucracy.\"\n",
"\n",
"An additional reason recourse is so necessary is that data often contains errors. Mechanisms for audits and error correction are crucial. A database of suspected gang members maintained by California law enforcement officials was found to be full of errors, including 42 babies who had been added to the database when they were less than 1 year old (28 of whom were marked as “admitting to being gang members”). In this case, there was no process in place for correcting mistakes or removing people once theyd been added. Another example is the US credit report system: in a large-scale study of credit reports by the Federal Trade Commission (FTC) in 2012, it was found that 26% of consumers had at least one mistake in their files, and 5% had errors that could be devastating. Yet, the process of getting such errors corrected is incredibly slow and opaque. When public radio reporter [Bobby Allyn](https://www.washingtonpost.com/posteverything/wp/2016/09/08/how-the-careless-errors-of-credit-reporting-agencies-are-ruining-peoples-lives/) discovered that he was erroneously listed as having a firearms conviction, it took him \"more than a dozen phone calls, the handiwork of a county court clerk and six weeks to solve the problem. And that was only after I contacted the companys communications department as a journalist.\"\n",
"\n",
"As machine learning practitioners, we do not always think of it as our responsibility to understand how our algorithms end up being implemented in practice. But we need to."
]
"cell_type": "markdown",
"metadata": {},
"source": [
"### Feedback Loops"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We explained in <<chapter_intro>> how an algorithm can interact with its environment to create a feedback loop, making predictions that reinforce actions taken in the real world, which lead to predictions even more pronounced in the same direction. \n",
"As an example, let's again consider YouTube's recommendation system. A couple of years ago the Google team talked about how they had introduced reinforcement learning (closely related to deep learning, but where your loss function represents a result potentially a long time after an action occurs) to improve YouTube's recommendation system. They described how they used an algorithm that made recommendations such that watch time would be optimized.\n",
"\n",
"However, human beings tend to be drawn to controversial content. This meant that videos about things like conspiracy theories started to get recommended more and more by the recommendation system. Furthermore, it turns out that the kinds of people that are interested in conspiracy theories are also people that watch a lot of online videos! So, they started to get drawn more and more toward YouTube. The increasing number of conspiracy theorists watching videos on YouTube resulted in the algorithm recommending more and more conspiracy theory and other extremist content, which resulted in more extremists watching videos on YouTube, and more people watching YouTube developing extremist views, which led to the algorithm recommending more extremist content... The system was spiraling out of control.\n",
"\n",
"And this phenomenon was not contained to this particular type of content. In June 2019 the *New York Times* published an article on YouTube's recommendation system, titled [\"On YouTubes Digital Playground, an Open Gate for Pedophiles\"](https://www.nytimes.com/2019/06/03/world/americas/youtube-pedophiles.html). The article started with this chilling story:"
]
},
{
"\n",
"No one at Google planned to create a system that turned family videos into porn for pedophiles. So what happened?\n",
"\n",
"Part of the problem here is the centrality of metrics in driving a financially important system. When an algorithm has a metric to optimize, as you have seen, it will do everything it can to optimize that number. This tends to lead to all kinds of edge cases, and humans interacting with a system will search for, find, and exploit these edge cases and feedback loops for their advantage.\n",
"\n",
"There are signs that this is exactly what has happened with YouTube's recommendation system. *The Guardian* ran an article called [\"How an ex-YouTube Insider Investigated its Secret Algorithm\"](https://www.theguardian.com/technology/2018/feb/02/youtube-algorithm-election-clinton-trump-guillaume-chaslot) about Guillaume Chaslot, an ex-YouTube engineer who created AlgoTransparency, which tracks these issues. Chaslot published the chart in <<ethics_yt_rt>>, following the release of Robert Mueller's \"Report on the Investigation Into Russian Interference in the 2016 Presidential Election.\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Russia Today's coverage of the Mueller report was an extreme outlier in terms of how many channels were recommending it. This suggests the possibility that Russia Today, a state-owned Russian media outlet, has been successful in gaming YouTube's recommendation algorithm. Unfortunately, the lack of transparency of systems like this makes it hard to uncover the kinds of problems that we're discussing.\n",
"\n",
"One of our reviewers for this book, Aurélien Géron, led YouTube's video classification team from 2013 to 2016 (well before the events discussed here). He pointed out that it's not just feedback loops involving humans that are a problem. There can also be feedback loops without humans! He told us about an example from YouTube:\n",
"\n",
"> : One important signal to classify the main topic of a video is the channel it comes from. For example, a video uploaded to a cooking channel is very likely to be a cooking video. But how do we know what topic a channel is about? Well… in part by looking at the topics of the videos it contains! Do you see the loop? For example, many videos have a description which indicates what camera was used to shoot the video. As a result, some of these videos might get classified as videos about “photography. If a channel has such a misclassified video, it might be classified as a “photography” channel, making it even more likely for future videos on this channel to be wrongly classified as “photography. This could even lead to runaway virus-like classifications! One way to break this feedback loop is to classify videos with and without the channel signal. Then when classifying the channels, you can only use the classes obtained without the channel signal. This way, the feedback loop is broken.\n",
"\n",
"There are positive examples of people and organizations attempting to combat these problems. Evan Estola, lead machine learning engineer at Meetup, [discussed the example](https://www.youtube.com/watch?v=MqoRzNhrTnQ) of men expressing more interest than women in tech meetups. Taking gender into account could therefore cause Meetups algorithm to recommend fewer tech meetups to women, and as a result, fewer women would find out about and attend tech meetups, which could cause the algorithm to suggest even fewer tech meetups to women, and so on in a self-reinforcing feedback loop. So, Evan and his team made the ethical decision for their recommendation algorithm to not create such a feedback loop, by explicitly not using gender for that part of their model. It is encouraging to see a company not just unthinkingly optimize a metric, but consider its impact. According to Evan, \"You need to decide which feature not to use in your algorithm... the most optimal algorithm is perhaps not the best one to launch into production.\"\n",
"\n",
"While Meetup chose to avoid such an outcome, Facebook provides an example of allowing a runaway feedback loop to run wild. Like YouTube, it tends to radicalize users interested in one conspiracy theory by introducing them to more. As Renee DiResta, a researcher on the proliferation of disinformation, [writes](https://www.fastcompany.com/3059742/social-network-algorithms-are-distorting-reality-by-boosting-conspiracy-theories):"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> : Once people join a single conspiracy-minded [Facebook] group, they are algorithmically routed to a plethora of others. Join an anti-vaccine group, and your suggestions will include anti-GMO, chemtrail watch, flat Earther (yes, really), and “curing cancer naturally” groups. Rather than pulling a user out of the rabbit hole, the recommendation engine pushes them further in."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is extremely important to keep in mind that this kind of behavior can happen, and to either anticipate a feedback loop or take positive action to break it when you see the first signs of it in your own projects. Another thing to keep in mind is *bias*, which, as we discussed briefly in the previous chapter, can interact with feedback loops in very troublesome ways."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Discussions of bias online tend to get pretty confusing pretty fast. The word \"bias\" means so many different things. Statisticians often assume that when data ethicists talk about bias, they mean the statistical definition of the term. But that's not what they mean. And they're certainly not talking about the biases that appear in the weights and biases that are the parameters of your model!\n",
"\n",
"What they're talking about is the social science concept of bias. In [\"A Framework for Understanding Unintended Consequences of Machine Learning\"](https://arxiv.org/abs/1901.10002) MIT's Harini Suresh and John Guttag describe six types of bias in machine learning, summarized in <<bias>> from their paper."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"images/ethics/pipeline_diagram.svg\" id=\"bias\" caption=\"Bias in machine learning can come from multiple sources (courtesy of Harini Suresh and John V. Guttag)\" alt=\"A diagram showing all sources where bias can appear in machine learning\" width=\"700\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Historical bias* comes from the fact that people are biased, processes are biased, and society is biased. Suresh and Guttag say: \"Historical bias is a fundamental, structural issue with the first step of the data generation process and can exist even given perfect sampling and feature selection.\"\n",
"\n",
"For instance, here are a few examples of historical *race bias* in the US, from the *New York Times* article [\"Racial Bias, Even When We Have Good Intentions\"](https://www.nytimes.com/2015/01/04/upshot/the-measuring-sticks-of-racial-bias-.html) by the University of Chicago's Sendhil Mullainathan:\n",
"\n",
" - When doctors were shown identical files, they were much less likely to recommend cardiac catheterization (a helpful procedure) to Black patients.\n",
" - When bargaining for a used car, Black people were offered initial prices $700 higher and received far smaller concessions.\n",
" - Responding to apartment rental ads on Craigslist with a Black name elicited fewer responses than with a white name.\n",
" - An all-white jury was 16 percentage points more likely to convict a Black defendant than a white one, but when a jury had one Black member it convicted both at the same rate.\n",
"\n",
"The COMPAS algorithm, widely used for sentencing and bail decisions in the US, is an example of an important algorithm that, when tested by [ProPublica](https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing), showed clear racial bias in practice (<<bail_algorithm>>)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Any dataset involving humans can have this kind of bias: medical data, sales data, housing data, political data, and so on. Because underlying bias is so pervasive, bias in datasets is very pervasive. Racial bias even turns up in computer vision, as shown in the example of autocategorized photos shared on Twitter by a Google Photos user shown in <<google_photos>>."
]
},
{
"source": [
"Yes, that is showing what you think it is: Google Photos classified a Black user's photo with their friend as \"gorillas\"! This algorithmic misstep got a lot of attention in the media. “Were appalled and genuinely sorry that this happened,” a company spokeswoman said. “There is still clearly a lot of work to do with automatic image labeling, and were looking at how we can prevent these types of mistakes from happening in the future.”\n",
"\n",
"Unfortunately, fixing problems in machine learning systems when the input data has problems is hard. Google's first attempt didn't inspire confidence, as coverage by *The Guardian* suggested (<<gorilla-ban>>)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"images/ethics/image8.png\" id=\"gorilla-ban\" caption=\"Google's first response to the problem\" alt=\"Picture of a headline from The Guardian, showing Google removed gorillas and other monkeys from the possible labels of its algorithm\" width=\"500\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These kinds of problems are certainly not limited to just Google. MIT researchers studied the most popular online computer vision APIs to see how accurate they were. But they didn't just calculate a single accuracy number—instead, they looked at the accuracy across four different groups, as illustrated in <<face_recognition>>."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"IBM's system, for instance, had a 34.7% error rate for darker females, versus 0.3% for lighter males—over 100 times more errors! Some people incorrectly reacted to these experiments by claiming that the difference was simply because darker skin is harder for computers to recognize. However, what actually happened was that, after the negative publicity that this result created, all of the companies in question dramatically improved their models for darker skin, such that one year later they were nearly as good as for lighter skin. So what this actually showed is that the developers failed to utilize datasets containing enough darker faces, or test their product with darker faces.\n",
"\n",
"One of the MIT researchers, Joy Buolamwini, warned: \"We have entered the age of automation overconfident yet underprepared. If we fail to make ethical and inclusive artificial intelligence, we risk losing gains made in civil rights and gender equity under the guise of machine neutrality.\"\n",
"\n",
"Part of the issue appears to be a systematic imbalance in the make up of popular datasets used for training models. The abstract to the paper [No Classification without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World](https://arxiv.org/abs/1711.08536) states, \"We analyze two large, publicly available image data sets to assess geo-diversity and find that these data sets appear to exhibit an observable amerocentric and eurocentric representation bias. Further, we analyze classifiers trained on these data sets to assess the impact of these training distributions and find strong differences in the relative performance on images from different locales\". <<image_provenance>> shows one of the charts from the paper, showing the geographic make up of what was, at the time (and still, as this book is being written), the two most important image datasets for training models."
"Part of the issue appears to be a systematic imbalance in the makeup of popular datasets used for training models. The abstract to the paper [\"No Classification Without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World\"](https://arxiv.org/abs/1711.08536) by Shreya Shankar et al. states, \"We analyze two large, publicly available image data sets to assess geo-diversity and find that these data sets appear to exhibit an observable amerocentric and eurocentric representation bias. Further, we analyze classifiers trained on these data sets to assess the impact of these training distributions and find strong differences in the relative performance on images from different locales.\" <<image_provenance>> shows one of the charts from the paper, showing the geographic makeup of what were, at the time (and still are, as this book is being written), the two most important image datasets for training models."
]
},
{
@@ -492,7 +492,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The vast majority of the images are from the United States and other Western countries, leading to models trained on ImageNet performing worse on scenes from other countries and cultures. For instance, [research](https://arxiv.org/pdf/1906.02659.pdf) found that such models are worse at identifying household items (such as soap, spices, sofas, or beds) from lower-income countries. <<object_detect>> shows an image from the paper, [Does Object Recognition Work for Everyone?](https://arxiv.org/pdf/1906.02659.pdf)."
"The vast majority of the images are from the United States and other Western countries, leading to models trained on ImageNet performing worse on scenes from other countries and cultures. For instance, research found that such models are worse at identifying household items (such as soap, spices, sofas, or beds) from lower-income countries. <<object_detect>> shows an image from the paper, [\"Does Object Recognition Work for Everyone?\"](https://arxiv.org/pdf/1906.02659.pdf) by Terrance DeVries et al. of Facebook AI Research that illustrates this point."
]
},
{
@@ -510,7 +510,7 @@
"\n",
"As we will discuss shortly, in addition, the vast majority of AI researchers and developers are young white men. Most projects that we have seen do most user testing using friends and families of the immediate product development group. Given this, the kinds of problems we just discussed should not be surprising.\n",
"\n",
"Similar historical bias is found in the texts used as data for natural language processing models. This crops up in downstream machine learning tasks in many ways. For instance, it [was widely reported](https://nypost.com/2017/11/30/google-translates-algorithm-has-a-gender-bias/) that until last year Google Translate showed systematic bias in how it translated the Turkish gender-neutral pronoun \"o\" into English. For instance, when applied to jobs which are often associated with males, it used \"he\", and when applied to jobs which are often associated with females, it used \"she\":"
"Similar historical bias is found in the texts used as data for natural language processing models. This crops up in downstream machine learning tasks in many ways. For instance, it [was widely reported](https://nypost.com/2017/11/30/google-translates-algorithm-has-a-gender-bias/) that until last year Google Translate showed systematic bias in how it translated the Turkish gender-neutral pronoun \"o\" into English: when applied to jobs which are often associated with males it used \"he,\" and when applied to jobs which are often associated with females it used \"she\" (<<turkish_gender>>)."
]
},
{
@ -524,7 +524,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We also see this kind of bias in online advertisements. For instance, a study in 2019 found that even when the person placing the ad does not intentionally discriminate, Facebook will show the ad to very different audiences based on race and gender. Housing ads with the same text, but changing the picture between a white or black family, were shown to racially different audiences."
"We also see this kind of bias in online advertisements. For instance, a [study](https://arxiv.org/abs/1904.02095) in 2019 by Muhammad Ali et al. found that even when the person placing the ad does not intentionally discriminate, Facebook will show ads to very different audiences based on race and gender. Housing ads with the same text, but picturing either a white or a Black family, were shown to racially different audiences."
]
},
{
@@ -538,48 +538,48 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In the paper [Does Machine Learning Automate Moral Hazard and Error](https://scholar.harvard.edu/files/sendhil/files/aer.p20171084.pdf) in *American Economic Review*, the authors look at a model that tries to answer the question: using historical electronic health record (EHR) data, what factors are most predictive of stroke? These are the top predictors from the model:\n",
"In the paper [\"Does Machine Learning Automate Moral Hazard and Error\"](https://scholar.harvard.edu/files/sendhil/files/aer.p20171084.pdf) in *American Economic Review*, Sendhil Mullainathan and Ziad Obermeyer look at a model that tries to answer the question: using historical electronic health record (EHR) data, what factors are most predictive of stroke? These are the top predictors from the model:\n",
"\n",
" - Prior Stroke\n",
" - Prior stroke\n",
" - Cardiovascular disease\n",
" - Accidental injury\n",
" - Benign breast lump\n",
" - Colonoscopy\n",
" - Sinusitis\n",
"\n",
"However, only the top two have anything to do with a stroke! Based on what we've studied so far, you can probably guess why. We havent really measured *stroke*, which occurs when a region of the brain is denied oxygen due to an interruption in the blood supply. What weve measured is who: had symptoms, went to a doctor, got the appropriate tests, AND received a diagnosis of stroke. Actually having a stroke is not the only thing correlated with this complete list it's also correlated with being the kind of person who actually goes to the doctor (which is influenced by who has access to healthcare, can afford their co-pay, doesn't experience racial or gender-based medical discrimination, and more)! If you are likely to go to the doctor for an *accidental injury*, then you are likely to also go the doctor when you are having a stroke.\n",
"However, only the top two have anything to do with a stroke! Based on what we've studied so far, you can probably guess why. We haven't really measured *stroke*, which occurs when a region of the brain is denied oxygen due to an interruption in the blood supply. What we've measured is who had symptoms, went to a doctor, got the appropriate tests, *and* received a diagnosis of stroke. Actually having a stroke is not the only thing correlated with this complete list—it's also correlated with being the kind of person who actually goes to the doctor (which is influenced by who has access to healthcare, can afford their co-pay, doesn't experience racial or gender-based medical discrimination, and more)! If you are likely to go to the doctor for an *accidental injury*, then you are likely to also go to the doctor when you are having a stroke.\n",
"\n",
"This is an example of *measurement bias*. It occurs when our models make mistakes because we are measuring the wrong thing, or measuring it in the wrong way, or incorporating that measurement into our model inappropriately."
"This is an example of *measurement bias*. It occurs when our models make mistakes because we are measuring the wrong thing, or measuring it in the wrong way, or incorporating that measurement into the model inappropriately."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Aggregation Bias"
"#### Aggregation bias"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Aggregation bias* occurs when models do not aggregate data in a way that incorporates all of the appropriate factors, or when a model does not include the necessary interaction terms, nonlinearities, or so forth. This can particularly occur in medical settings. For instance, the way diabetes is treated is often based on simple univariate statistics and studies involving small groups of heterogeneous people. Analysis of results is often done in a way that does not take account of different ethnicities or genders. However, it turns out that diabetes patients have [different complications across ethnicities](https://www.ncbi.nlm.nih.gov/pubmed/24037313), and HbA1c levels (widely used to diagnose and monitor diabetes) [differ in complex ways across ethnicities and genders](https://www.ncbi.nlm.nih.gov/pubmed/22238408). This can result in people being misdiagnosed or incorrectly treated because medical decisions are based on a model which does not include these important variables and interactions."
"*Aggregation bias* occurs when models do not aggregate data in a way that incorporates all of the appropriate factors, or when a model does not include the necessary interaction terms, nonlinearities, or so forth. This can particularly occur in medical settings. For instance, the way diabetes is treated is often based on simple univariate statistics and studies involving small groups of heterogeneous people. Analysis of results is often done in a way that does not take account of different ethnicities or genders. However, it turns out that diabetes patients have [different complications across ethnicities](https://www.ncbi.nlm.nih.gov/pubmed/24037313), and HbA1c levels (widely used to diagnose and monitor diabetes) [differ in complex ways across ethnicities and genders](https://www.ncbi.nlm.nih.gov/pubmed/22238408). This can result in people being misdiagnosed or incorrectly treated because medical decisions are based on a model that does not include these important variables and interactions."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Representation Bias"
"#### Representation bias"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The abstract of the paper [Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting](https://arxiv.org/abs/1901.09451) notes that there is gender imbalance in occupations (e.g. females are more likely to be nurses, and males are more likely to be pastors), and says that: \"differences in true positive rates between genders are correlated with existing gender imbalances in occupations, which may compound these imbalances\".\n",
"The abstract of the paper [\"Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting\"](https://arxiv.org/abs/1901.09451) by Maria De-Arteaga et al. notes that there is gender imbalance in occupations (e.g., females are more likely to be nurses, and males are more likely to be pastors), and says that: \"differences in true positive rates between genders are correlated with existing gender imbalances in occupations, which may compound these imbalances.\"\n",
"\n",
"What this is saying is that the researchers noticed that models predicting occupation did not only reflect the actual gender imbalance in the underlying population, but actually amplified it! This is quite common, particularly for simple models. When there is some clear, easy-to-see underlying relationship, a simple model will often simply assume that this relationship holds all the time. As <<representation_bias>> from the paper shows, for occupations which had a higher percentage of females, the model tended to overestimate the prevalence of that occupation."
"In other words, the researchers noticed that models predicting occupation did not only *reflect* the actual gender imbalance in the underlying population, but actually *amplified* it! This type of *representation bias* is quite common, particularly for simple models. When there is some clear, easy-to-see underlying relationship, a simple model will often simply assume that this relationship holds all the time. As <<representation_bias>> from the paper shows, for occupations that had a higher percentage of females, the model tended to overestimate the prevalence of that occupation."
]
},
{
@@ -593,7 +593,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, in the training dataset, 14.6% of surgeons were women, yet in the model predictions, only 11.6% of the true positives were women. The model is thus amplifying the bias existing in the training set.\n",
"For example, in the training dataset 14.6% of surgeons were women, yet in the model predictions only 11.6% of the true positives were women. The model is thus amplifying the bias existing in the training set.\n",
"\n",
"Now that we've seen that those biases exist, what can we do to mitigate them?"
]
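The amplification effect in these numbers can be made concrete with a small simulation. The sketch below is purely hypothetical (the score distributions are invented for illustration and are not from the paper): it assumes a classifier that scores biographies of women surgeons slightly lower on average, and shows how a fixed decision threshold then shrinks women's share of the positive predictions below their 14.6% share of the data.

```python
import random

random.seed(0)

# Hypothetical illustration only: the 14.6% figure matches the dataset's
# share of women among surgeons, but the score model below is invented.
surgeons = ["F"] * 146 + ["M"] * 854

# Suppose the classifier scores bios of women surgeons slightly lower on
# average, because the training text associates the occupation with men.
def score(gender):
    return random.gauss(0.48 if gender == "F" else 0.55, 0.10)

# True positives at a fixed decision threshold of 0.5
predicted = [g for g in surgeons if score(g) > 0.5]

share_women = predicted.count("F") / len(predicted)
print("Women among surgeons in the data: 14.6%")
print(f"Women among the model's positive predictions: {share_women:.1%}")
```

The score gap here is small, but thresholding filters out proportionally more of the lower-scoring group, so the output imbalance is worse than the input imbalance: exactly the compounding effect the paper warns about.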
@@ -602,40 +602,33 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Addressing different types of bias"
"### Addressing different types of bias"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Different types of bias require different approaches for mitigation. While gathering a more diverse dataset can address representation bias, this would not help with historical bias or measurement bias. All datasets contain bias. There is no such thing as a completely de-biased dataset. Many researchers in the field have been converging on a set of proposals towards better documenting the decisions, context, and specifics about how and why a particular dataset was created, what scenarios it is appropriate to use in, and what the limitations are. This way, those using the dataset will not be caught off-guard by its biases and limitations."
"Different types of bias require different approaches for mitigation. While gathering a more diverse dataset can address representation bias, this would not help with historical bias or measurement bias. All datasets contain bias. There is no such thing as a completely debiased dataset. Many researchers in the field have been converging on a set of proposals to enable better documentation of the decisions, context, and specifics about how and why a particular dataset was created, what scenarios it is appropriate to use in, and what the limitations are. This way, those using a particular dataset will not be caught off guard by its biases and limitations."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Humans are biased, so does algorithmic bias matter?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We often hear this question — \"humans are biased, so does algorithmic bias even matter?\" This comes up so often, there must be some reasoning that makes sense to the people that ask it, but it doesn't seem very logically sound to us! Independently of whether this is logically sound, it's important to realise that algorithms and people are different. Machine learning, particularly so. Consider these points about machine learning algorithms:\n",
"We often hear the question—\"Humans are biased, so does algorithmic bias even matter?\" This comes up so often that there must be some reasoning that makes sense to the people who ask it, but it doesn't seem very logically sound to us! Independently of whether this is logically sound, it's important to realize that algorithms (particularly machine learning algorithms!) and people are different. Consider these points about machine learning algorithms:\n",
"\n",
" - _Machine learning can create feedback loops_:: small amounts of bias can very rapidly, exponentially increase due to feedback loops\n",
" - _Machine learning can amplify bias_:: human bias can lead to larger amounts of machine learning bias\n",
" - _Algorithms & humans are used differently_:: human decision makers and algorithmic decision makers are not used in a plug-and-play interchangeable way in practice. For instance, algorithmic decisions are more likely to be implemented at scale and without a process for recourse. Furthermore, people are more likely to mistakenly believe that the result of an algorithm is objective and error-free.\n",
" - _Machine learning can create feedback loops_:: Small amounts of bias can rapidly increase exponentially due to feedback loops.\n",
" - _Machine learning can amplify bias_:: Human bias can lead to larger amounts of machine learning bias.\n",
" - _Algorithms & humans are used differently_:: Human decision makers and algorithmic decision makers are not used in a plug-and-play interchangeable way in practice.\n",
" - _Technology is power_:: And with that comes responsibility.\n",
"\n",
"As the Arkansas healthcare example showed, machine learning is often implemented in practice not because it leads to better outcomes, but because it is cheaper and more efficient. Cathy O'Neill, in her book *Weapons of Math Destruction*, described the pattern of how the privileged are processed by people, whereas the poor are processed by algorithms. This is just one of a number of ways that algorithms are used differently than human decision makers. Others include:\n",
"As the Arkansas healthcare example showed, machine learning is often implemented in practice not because it leads to better outcomes, but because it is cheaper and more efficient. Cathy O'Neil, in her book *Weapons of Math Destruction* (Crown), described the pattern of how the privileged are processed by people, whereas the poor are processed by algorithms. This is just one of a number of ways that algorithms are used differently than human decision makers. Others include:\n",
"\n",
" - People are more likely to assume algorithms are objective or error-free (even if theyre given the option of a human override)\n",
" - Algorithms are more likely to be implemented with no appeals process in place\n",
" - Algorithms are often used at scale\n",
" - Algorithmic systems are cheap\n",
 - People are more likely to assume algorithms are objective or error-free (even if they're given the option of a human override).\n",
" - Algorithms are more likely to be implemented with no appeals process in place.\n",
" - Algorithms are often used at scale.\n",
" - Algorithmic systems are cheap.\n",
"\n",
"Even in the absence of bias, algorithms (and deep learning especially, since it is such an effective and scalable algorithm) can lead to negative societal problems, such as when used for *disinformation*."
]
@@ -644,47 +644,47 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Disinformation"
"### Disinformation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Disinformation* has a history stretching back hundreds or even thousands of years. It is not necessarily about getting someone to believe something false, but rather, often to sow disharmony and uncertainty, and to get people to give up on seeking the truth. Receiving conflicting accounts can lead people to assume that they can never know what to trust.\n",
"*Disinformation* has a history stretching back hundreds or even thousands of years. It is not necessarily about getting someone to believe something false, but rather is often used to sow disharmony and uncertainty, and to get people to give up on seeking the truth. Receiving conflicting accounts can lead people to assume that they can never know whom or what to trust.\n",
"\n",
"Some people think disinformation is primarily about false information or *fake news*, but in reality, disinformation can often contain seeds of truth, or involve half-truths taken out of context. Ladislav Bittman was an intelligence officer in the USSR who later defected to the United States and wrote some books in the 1970s and 1980s on the role of disinformation in Soviet propaganda operations. He said, \"Most campaigns are a carefully designed mixture of facts, half-truths, exaggerations, & deliberate lies.\"\n",
"Some people think disinformation is primarily about false information or *fake news*, but in reality, disinformation can often contain seeds of truth, or half-truths taken out of context. Ladislav Bittman was an intelligence officer in the USSR who later defected to the US and wrote some books in the 1970s and 1980s on the role of disinformation in Soviet propaganda operations. In *The KGB and Soviet Disinformation* (Pergamon) he wrote, \"Most campaigns are a carefully designed mixture of facts, half-truths, exaggerations, and deliberate lies.\"\n",
"\n",
"In the United States this has hit close to home in recent years, with the FBI detailing a massive disinformation campaign linked to Russia in the 2016 US election. Understanding the disinformation that was used in this campaign is very educational. For instance, the FBI found that the Russian disinformation campaign often organized two separate fake *grass roots* protests, one for each side of an issue, and got them to protest at the same time! The Houston Chronicle reported on one of these odd events:\n",
"In the US this has hit close to home in recent years, with the FBI detailing a massive disinformation campaign linked to Russia in the 2016 election. Understanding the disinformation that was used in this campaign is very educational. For instance, the FBI found that the Russian disinformation campaign often organized two separate fake \"grass roots\" protests, one for each side of an issue, and got them to protest at the same time! The [*Houston Chronicle*](https://www.houstonchronicle.com/local/gray-matters/article/A-Houston-protest-organized-by-Russian-trolls-12625481.php) reported on one of these odd events (<<texas>>).\n",
"\n",
"> : A group that called itself the \"Heart of Texas\" had organized it on social media a protest, they said, against the \"Islamization\" of Texas. On one side of Travis Street, I found about 10 protesters. On the other side, I found around 50 counterprotesters. But I couldn't find the rally organizers. No \"Heart of Texas.\" I thought that was odd, and mentioned it in the article: What kind of group is a no-show at its own event? Now I know why. Apparently, the rally's organizers were in Saint Petersburg, Russia, at the time. \"Heart of Texas\" is one of the internet troll groups cited in Special Prosecutor Robert Mueller's recent indictment of Russians attempting to tamper with the U.S. presidential election."
"> : A group that called itself the \"Heart of Texas\" had organized it on social media—a protest, they said, against the \"Islamization\" of Texas. On one side of Travis Street, I found about 10 protesters. On the other side, I found around 50 counterprotesters. But I couldn't find the rally organizers. No \"Heart of Texas.\" I thought that was odd, and mentioned it in the article: What kind of group is a no-show at its own event? Now I know why. Apparently, the rally's organizers were in Saint Petersburg, Russia, at the time. \"Heart of Texas\" is one of the internet troll groups cited in Special Prosecutor Robert Mueller's recent indictment of Russians attempting to tamper with the U.S. presidential election."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"images/ethics/image13.png\" id=\"teax\" caption=\"Event organized by the group Heart of Texas\" alt=\"Screenshot of an event organized by the group Heart of Texas\" width=\"300\">"
"<img src=\"images/ethics/image13.png\" id=\"texas\" caption=\"Event organized by the group Heart of Texas\" alt=\"Screenshot of an event organized by the group Heart of Texas\" width=\"300\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Disinformation often involves coordinated campaigns of inauthentic behavior. For instance, fraudulent accounts may try to make it seem like many people hold a particular viewpoint. While most of us like to think of ourselves as independent-minded, in reality we evolved to be influenced by others in our in-group, and in opposition to those in our out-group. Online discussions can influence our viewpoints, or alter the range of what we consider acceptable viewpoints. Humans are social animals, and as social animals we are extremely influenced by the people around us. Increasingly, radicalisation occurs in online environments. So influence is coming from people in the virtual space of online forums and social networks.\n",
"Disinformation often involves coordinated campaigns of inauthentic behavior. For instance, fraudulent accounts may try to make it seem like many people hold a particular viewpoint. While most of us like to think of ourselves as independent-minded, in reality we evolved to be influenced by others in our in-group, and in opposition to those in our out-group. Online discussions can influence our viewpoints, or alter the range of what we consider acceptable viewpoints. Humans are social animals, and as social animals we are extremely influenced by the people around us. Increasingly, radicalization occurs in online environments; influence is coming from people in the virtual space of online forums and social networks.\n",
"\n",
"Disinformation through auto-generated text is a particularly significant issue, due to the greatly increased capability provided by deep learning. We discuss this issue in depth when we learn to create language models, in <<chapter_nlp>>.\n",
"Disinformation through autogenerated text is a particularly significant issue, due to the greatly increased capability provided by deep learning. We discuss this issue in depth when we delve into creating language models, in <<chapter_nlp>>.\n",
"\n",
"One proposed approach is to develop some form of digital signature, to implement it in a seamless way, and to create norms that we should only trust content which has been verified. The head of the Allen Institute on AI, Oren Etzioni, wrote such a proposal in an article titled [How Will We Prevent AI-Based Forgery?](https://hbr.org/2019/03/how-will-we-prevent-ai-based-forgery): \"AI is poised to make high-fidelity forgery inexpensive and automated, leading to potentially disastrous consequences for democracy, security, and society. The specter of AI forgery means that we need to act to make digital signatures de rigueur as a means of authentication of digital content.\"\n",
"One proposed approach is to develop some form of digital signature, to implement it in a seamless way, and to create norms that we should only trust content that has been verified. The head of the Allen Institute for AI, Oren Etzioni, wrote such a proposal in an article titled [\"How Will We Prevent AI-Based Forgery?\"](https://hbr.org/2019/03/how-will-we-prevent-ai-based-forgery): \"AI is poised to make high-fidelity forgery inexpensive and automated, leading to potentially disastrous consequences for democracy, security, and society. The specter of AI forgery means that we need to act to make digital signatures de rigueur as a means of authentication of digital content.\"\n",
"\n",
"Whilst we can't hope to discuss all the ethical issues that deep learning, and algorithms more generally, bring up, hopefully this brief introduction has been a useful starting point you can build on. We'll now move on to the questions of how to identify ethical issues, and what to do about them."
"Whilst we can't hope to discuss all the ethical issues that deep learning, and algorithms more generally, brings up, hopefully this brief introduction has been a useful starting point you can build on. We'll now move on to the questions of how to identify ethical issues, and what to do about them."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Identifying and addressing ethical issues"
"## Identifying and Addressing Ethical Issues"
]
},
{
@@ -695,19 +695,19 @@
"\n",
"So what can we do? This is a big topic, but a few steps towards addressing ethical issues are:\n",
"\n",
"- analyze a project you are working on\n",
"- implement processes at your company to find and address ethical risks\n",
"- support good policy\n",
"- increase diversity\n",
"- Analyze a project you are working on.\n",
"- Implement processes at your company to find and address ethical risks.\n",
"- Support good policy.\n",
"- Increase diversity.\n",
"\n",
"Let's walk through each step next, starting with analyzing a project you are working on."
"Let's walk through each of these steps, starting with analyzing a project you are working on."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Analyze a project you are working on"
"### Analyze a Project You Are Working On"
]
},
{
@@ -726,25 +726,25 @@
"\n",
"These questions may be able to help you identify outstanding issues, and possible alternatives that are easier to understand and control. In addition to asking the right questions, it's also important to consider practices and processes to implement.\n",
"\n",
"One thing to consider at this stage is what data you are collecting and storing. Data often ends up being used for different purposes than what it was originally collected for. For instance, IBM began selling to Nazi Germany well before the Holocaust, including helping with Germanys 1933 census conducted by Adolf Hitler, which was effective at identifying far more Jewish people than had previously been recognized in Germany. US census data was used to round up Japanese-Americans (who were US citizens) for internment during World War II. It is important to recognize how data and images collected can be weaponized later. Columbia professor [Tim Wu wrote](https://www.nytimes.com/2019/04/10/opinion/sunday/privacy-capitalism.html) that “You must assume that any personal data that Facebook or Android keeps are data that governments around the world will try to get or that thieves will try to steal.”"
"One thing to consider at this stage is what data you are collecting and storing. Data often ends up being used for different purposes than what it was originally collected for. For instance, IBM began selling to Nazi Germany well before the Holocaust, including helping with Germany's 1933 census conducted by Adolf Hitler, which was effective at identifying far more Jewish people than had previously been recognized in Germany. Similarly, US census data was used to round up Japanese-Americans (who were US citizens) for internment during World War II. It is important to recognize how data and images collected can be weaponized later. Columbia professor [Tim Wu wrote](https://www.nytimes.com/2019/04/10/opinion/sunday/privacy-capitalism.html) that “You must assume that any personal data that Facebook or Android keeps are data that governments around the world will try to get or that thieves will try to steal.”"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Processes to implement"
"### Processes to Implement"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Markkula Center has released [An Ethical Toolkit for Engineering/Design Practice](https://www.scu.edu/ethics-in-technology-practice/ethical-toolkit/), which includes some concrete practices to implement at your company, including regularly scheduled ethical risk sweeps to proactively search for ethical risks (in a manner similar to cybersecurity penetration testing), expanding the ethical circle to include the perspectives of a variety of stakeholders, and considering the terrible people (how could bad actors abuse, steal, misinterpret, hack, destroy, or weaponize what you are building?). \n",
"The Markkula Center has released [An Ethical Toolkit for Engineering/Design Practice](https://www.scu.edu/ethics-in-technology-practice/ethical-toolkit/) that includes some concrete practices to implement at your company, including regularly scheduled sweeps to proactively search for ethical risks (in a manner similar to cybersecurity penetration testing), expanding the ethical circle to include the perspectives of a variety of stakeholders, and considering the terrible people (how could bad actors abuse, steal, misinterpret, hack, destroy, or weaponize what you are building?). \n",
"\n",
"Even if you don't have a diverse team, you can still try to proactively include the perspectives of a wider group, considering questions such as these (provided by the Markkula Center):\n",
"\n",
" - Whose interests, desires, skills, experiences and values have we simply assumed, rather than actually consulted?\n",
" - Whose interests, desires, skills, experiences, and values have we simply assumed, rather than actually consulted?\n",
" - Who are all the stakeholders who will be directly affected by our product? How have their interests been protected? How do we know what their interests really are—have we asked?\n",
" - Who/which groups and individuals will be indirectly affected in significant ways?\n",
" - Who might use this product that we didnt expect to use it, or for purposes we didnt initially intend?"
@ -754,35 +747,35 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Ethical Lenses"
"#### Ethical lenses"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another useful resource from the Markkula Center is [Conceptual Frameworks in Technology and Engineering Practice](https://www.scu.edu/ethics-in-technology-practice/conceptual-frameworks/). This considers how different foundational ethical lenses can help identify concrete issues, and lays out the following approaches and key questions:\n",
"Another useful resource from the Markkula Center is its [Conceptual Frameworks in Technology and Engineering Practice](https://www.scu.edu/ethics-in-technology-practice/conceptual-frameworks/). This considers how different foundational ethical lenses can help identify concrete issues, and lays out the following approaches and key questions:\n",
"\n",
" - The Rights Approach:: Which option best respects the rights of all who have a stake?\n",
" - The Justice Approach:: Which option treats people equally or proportionately?\n",
" - The Utilitarian Approach:: Which option will produce the most good and do the least harm?\n",
" - The Common Good Approach:: Which option best serves the community as a whole, not just some members?\n",
" - The Virtue Approach:: Which option leads me to act as the sort of person I want to be?\n",
" - The rights approach:: Which option best respects the rights of all who have a stake?\n",
" - The justice approach:: Which option treats people equally or proportionately?\n",
" - The utilitarian approach:: Which option will produce the most good and do the least harm?\n",
" - The common good approach:: Which option best serves the community as a whole, not just some members?\n",
" - The virtue approach:: Which option leads me to act as the sort of person I want to be?\n",
"\n",
"Markkula's recommendations include a deeper dive into each of these perspectives, including looking at a project based on a focus on its *consequences*:\n",
"Markkula's recommendations include a deeper dive into each of these perspectives, including looking at a project through the lenses of its *consequences*:\n",
"\n",
" - Who will be directly affected by this project? Who will be indirectly affected?\n",
" - Will the effects in aggregate likely create more good than harm, and what types of good and harm?\n",
" - Are we thinking about all relevant types of harm/benefit (psychological, political, environmental, moral, cognitive, emotional, institutional, cultural)?\n",
" - How might future generations be affected by this project?\n",
" - Do the risks of harm from this project fall disproportionately on the least powerful in society? Will the benefits go disproportionately to the well-off?\n",
" - Have we adequately considered dual-use?\n",
" - Have we adequately considered \"dual-use\"?\n",
"\n",
"The alternative lens to this is the *deontological* perspective, which focuses on basic *right* and *wrong*:\n",
"The alternative lens to this is the *deontological* perspective, which focuses on basic concepts of *right* and *wrong*:\n",
"\n",
" - What rights of others & duties to others must we respect?\n",
" - How might the dignity & autonomy of each stakeholder be impacted by this project?\n",
" - What considerations of trust & of justice are relevant to this design/project?\n",
" - What rights of others and duties to others must we respect?\n",
" - How might the dignity and autonomy of each stakeholder be impacted by this project?\n",
" - What considerations of trust and of justice are relevant to this design/project?\n",
" - Does this project involve any conflicting moral duties to others, or conflicting stakeholder rights? How can we prioritize these?\n",
"\n",
"One of the best ways to help come up with complete and thoughtful answers to questions like these is to ensure that the people asking the questions are *diverse*."
@ -792,36 +785,36 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### The power of diversity"
"### The Power of Diversity"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Currently, less than 12% of AI researchers are women, according to a study from Element AI. The statistics are similarly dire when it comes to race and age. When everybody on a team has similar backgrounds, they are likely to have similar blindspots around ethical risks. The Harvard Business Review (HBR) has published a number of studies showing many benefits of diverse teams, including:\n",
"Currently, less than 12% of AI researchers are women, according to [a study from Element AI](https://medium.com/element-ai-research-lab/estimating-the-gender-ratio-of-ai-researchers-around-the-world-81d2b8dbe9c3). The statistics are similarly dire when it comes to race and age. When everybody on a team has similar backgrounds, they are likely to have similar blindspots around ethical risks. The *Harvard Business Review* (HBR) has published a number of studies showing many benefits of diverse teams, including:\n",
"\n",
"- [How Diversity Can Drive Innovation](https://hbr.org/2013/12/how-diversity-can-drive-innovation)\n",
"- [Teams Solve Problems Faster When Theyre More Cognitively Diverse](https://hbr.org/2017/03/teams-solve-problems-faster-when-theyre-more-cognitively-diverse)\n",
"- [Why Diverse Teams Are Smarter](https://hbr.org/2016/11/why-diverse-teams-are-smarter), and\n",
"- [What Makes a Team Smarter? More Women](https://hbr.org/2011/06/defend-your-research-what-makes-a-team-smarter-more-women).\n",
"- [\"How Diversity Can Drive Innovation\"](https://hbr.org/2013/12/how-diversity-can-drive-innovation)\n",
"- [\"Teams Solve Problems Faster When Theyre More Cognitively Diverse\"](https://hbr.org/2017/03/teams-solve-problems-faster-when-theyre-more-cognitively-diverse)\n",
"- [\"Why Diverse Teams Are Smarter\"](https://hbr.org/2016/11/why-diverse-teams-are-smarter), and\n",
"- [\"Defend Your Research: What Makes a Team Smarter? More Women\"](https://hbr.org/2011/06/defend-your-research-what-makes-a-team-smarter-more-women)\n",
"\n",
"Diversity can lead to problems being identified earlier, and a wider range of solutions being considered. For instance, Tracy Chou was an early engineer at Quora. She [wrote of her experiences](https://qz.com/1016900/tracy-chou-leading-silicon-valley-engineer-explains-why-every-tech-worker-needs-a-humanities-education/), describing how she advocated internally for adding a feature that would allow trolls and other bad actors to be blocked. Chou recounts, “I was eager to work on the feature because I personally felt antagonized and abused on the site (gender isnt an unlikely reason as to why)... But if I hadnt had that personal perspective, its possible that the Quora team wouldnt have prioritized building a block button so early in its existence.” Harassment often drives people from marginalised groups off online platforms, so this functionality has been important for maintaining the health of Quora's community.\n",
"Diversity can lead to problems being identified earlier, and a wider range of solutions being considered. For instance, Tracy Chou was an early engineer at Quora. She [wrote of her experiences](https://qz.com/1016900/tracy-chou-leading-silicon-valley-engineer-explains-why-every-tech-worker-needs-a-humanities-education/), describing how she advocated internally for adding a feature that would allow trolls and other bad actors to be blocked. Chou recounts, “I was eager to work on the feature because I personally felt antagonized and abused on the site (gender isnt an unlikely reason as to why)... But if I hadnt had that personal perspective, its possible that the Quora team wouldnt have prioritized building a block button so early in its existence.” Harassment often drives people from marginalized groups off online platforms, so this functionality has been important for maintaining the health of Quora's community.\n",
"\n",
"A crucial aspect to understand is that women leave the tech industry at over twice the rate that men do, according to the Harvard business review (41% of women working in tech leave, compared to 17% of men). An analysis of over 200 books, white papers, and articles found that the reason they leave is that “theyre treated unfairly; underpaid, less likely to be fast-tracked than their male colleagues, and unable to advance.” \n",
"A crucial aspect to understand is that women leave the tech industry at over twice the rate that men do, according to the [*Harvard Business Review*](https://www.researchgate.net/publication/268325574_By_RESEARCH_REPORT_The_Athena_Factor_Reversing_the_Brain_Drain_in_Science_Engineering_and_Technology) (41% of women working in tech leave, compared to 17% of men). An analysis of over 200 books, white papers, and articles found that the reason they leave is that “theyre treated unfairly; underpaid, less likely to be fast-tracked than their male colleagues, and unable to advance.” \n",
"\n",
"Studies have confirmed a number of the factors that make it harder for women to advance in the workplace. Women receive more vague feedback and personality criticism in performance evaluations, whereas men receive actionable advice tied to business outcomes (which is more useful). Women frequently experience being excluded from more creative and innovative roles, and not receiving high visibility “stretch” assignments that are helpful in getting promoted. One study found that mens voices are perceived as more persuasive, fact-based, and logical than womens voices, even when reading identical scripts.\n",
"Studies have confirmed a number of the factors that make it harder for women to advance in the workplace. Women receive more vague feedback and personality criticism in performance evaluations, whereas men receive actionable advice tied to business outcomes (which is more useful). Women frequently experience being excluded from more creative and innovative roles, and not receiving high-visibility “stretch” assignments that are helpful in getting promoted. One study found that mens voices are perceived as more persuasive, fact-based, and logical than womens voices, even when reading identical scripts.\n",
"\n",
"Receiving mentorship has been statistically shown to help men advance, but not women. The reason behind this is that when women receive mentorship, its advice on how they should change and gain more self-knowledge. When men receive mentorship, its public endorsement of their authority. Guess which is more useful in getting promoted?\n",
"\n",
"As long as qualified women keep dropping out of tech, teaching more girls to code will not solve the diversity issues plaguing the field. Diversity initiatives often end up focusing primarily on white women, even though women of colour face many additional barriers. In interviews with 60 women of color who work in STEM research, 100% had experienced discrimination."
"As long as qualified women keep dropping out of tech, teaching more girls to code will not solve the diversity issues plaguing the field. Diversity initiatives often end up focusing primarily on white women, even though women of color face many additional barriers. In [interviews](https://worklifelaw.org/publications/Double-Jeopardy-Report_v6_full_web-sm.pdf) with 60 women of color who work in STEM research, 100% had experienced discrimination."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The hiring process is particularly broken in tech. One study indicative of the disfunction comes from Triplebyte, a company that helps place software engineers in companies. They conduct a standardised technical interview as part of this process. They have a fascinating dataset: the results of how over 300 engineers did on their exam, and then the results of how those engineers did during the interview process for a variety of companies. The number one finding from [Triplebytes research](https://triplebyte.com/blog/who-y-combinator-companies-want) is that “the types of programmers that each company looks for often have little to do with what the company needs or does. Rather, they reflect company culture and the backgrounds of the founders.”\n",
"The hiring process is particularly broken in tech. One study indicative of the disfunction comes from Triplebyte, a company that helps place software engineers in companies, conducting a standardized technical interview as part of this process. They have a fascinating dataset: the results of how over 300 engineers did on their exam, coupled with the results of how those engineers did during the interview process for a variety of companies. The number one finding from [Triplebytes research](https://triplebyte.com/blog/who-y-combinator-companies-want) is that “the types of programmers that each company looks for often have little to do with what the company needs or does. Rather, they reflect company culture and the backgrounds of the founders.”\n",
"\n",
"This is a challenge for those trying to break into the world of deep learning, since most companies' deep learning groups today were founded by academics. These groups tend to look for people \"like them\"--that is, people that can solve complex math problems and understand dense jargon. They don't always know how to spot people who are actually good at solving real problems using deep learning.\n",
"\n",
@ -832,19 +825,19 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Fairness, accountability, and transparency"
"### Fairness, Accountability, and Transparency"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The professional society for computer scientists, the ACM, runs a conference on data ethics called the \"Conference on Fairness, Accountability, and Transparency\". \"Fairness, Accountability, and Transparency\" sometimes goes under the acronym *FAT*, although nowadays it's changing to *FAccT*. Microsoft has a group focused on \"Fairness, Accountability, Transparency, and Ethics\" (FATE). The various versions of this lens have resulted in the acronym \"FAT\" seeing wide usage. In this section, we'll use \"FAccT\" to refer to the concepts of *Fairness, Accountability, and Transparency*.\n",
"The professional society for computer scientists, the ACM, runs a data ethics conference called the Conference on Fairness, Accountability, and Transparency. \"Fairness, Accountability, and Transparency\" which used to go under the acronym *FAT* but now uses to the less objectionable *FAccT*. Microsoft has a group focused on \"Fairness, Accountability, Transparency, and Ethics\" (FATE). In this section, we'll use \"FAccT\" to refer to the concepts of *Fairness, Accountability, and Transparency*.\n",
"\n",
"FAccT is another lens that you may find useful in considering ethical issues. One useful resource for this is the free online book [Fairness and machine learning; Limitations and Opportunities](https://fairmlbook.org/), which \"gives a perspective on machine learning that treats fairness as a central concern rather than an afterthought.\" It also warns, however, that it \"is intentionally narrow in scope... A narrow framing of machine learning ethics might be tempting to technologists and businesses as a way to focus on technical interventions while sidestepping deeper questions about power and accountability. We caution against this temptation.\" Rather than provide an overview of the FAccT approach to ethics (which is better done in books such as the one linked above), our focus here will be on the limitations of this kind of narrow framing.\n",
"FAccT is another lens that you may find useful in considering ethical issues. One useful resource for this is the free online book [*Fairness and Machine Learning: Limitations and Opportunities*](https://fairmlbook.org/) by Solon Barocas, Moritz Hardt, and Arvind Narayanan, which \"gives a perspective on machine learning that treats fairness as a central concern rather than an afterthought.\" It also warns, however, that it \"is intentionally narrow in scope... A narrow framing of machine learning ethics might be tempting to technologists and businesses as a way to focus on technical interventions while sidestepping deeper questions about power and accountability. We caution against this temptation.\" Rather than provide an overview of the FAccT approach to ethics (which is better done in books such as that one), our focus here will be on the limitations of this kind of narrow framing.\n",
"\n",
"One great way to consider whether an ethical lens is complete, is to try to come up with an example where the lens and our own ethical intuitions give diverging results. Os Keyes et al. explored this in a graphic way in their paper [A Mulching Proposal\n",
"Analysing and Improving an Algorithmic System for Turning the Elderly into High-Nutrient Slurry](https://arxiv.org/abs/1908.06166). The paper's abstract says:"
"One great way to consider whether an ethical lens is complete is to try to come up with an example where the lens and our own ethical intuitions give diverging results. Os Keyes, Jevan Hutson, and Meredith Durbin explored this in a graphic way in their paper [\"A Mulching Proposal:\n",
"Analysing and Improving an Algorithmic System for Turning the Elderly into High-Nutrient Slurry\"](https://arxiv.org/abs/1908.06166). The paper's abstract says:"
]
},
{
@ -860,7 +853,7 @@
"source": [
"In this paper, the rather controversial proposal (\"Turning the Elderly into High-Nutrient Slurry\") and the results (\"drastically increase the algorithm's adherence to the FAT framework, resulting in a more ethical and beneficent system\") are at odds... to say the least!\n",
"\n",
"In philosophy, and especially philosophy of ethics, this is one of the most effective tools: first, come up with a process, definition, set of questions, etc., which is designed to resolve some problem. Then try to come up with an example where that apparent solution results in a proposal that no-one would consider acceptable. This can then lead to a further refinement of the solution.\n",
"In philosophy, and especially philosophy of ethics, this is one of the most effective tools: first, come up with a process, definition, set of questions, etc., which is designed to resolve some problem. Then try to come up with an example where that apparent solution results in a proposal that no one would consider acceptable. This can then lead to a further refinement of the solution.\n",
"\n",
"So far, we've focused on things that you and your organization can do. But sometimes individual or organizational action is not enough. Sometimes, governments also need to consider policy implications."
]
@ -883,25 +876,27 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### The effectiveness of regulation"
"### The Effectiveness of Regulation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To look at what can cause companies to take concrete action, consider the following two examples of how Facebook has behaved. In 2018, a UN investigation found that Facebook had played a “determining role” in the ongoing genocide of the Rohingya, an ethnic minority in Mynamar that was described by UN Secretary-General Antonio Guterres as \"one of, if not the, most discriminated people in the world\". Local activists had been warning Facebook executives that their platform was being used to spread hate speech and incite violence since as early as 2013. In 2015, they were warned that Facebook could play the same role in Myanmar that the radio broadcasts played during the Rwandan genocide (where a million people were killed). Yet, by the end of 2015, Facebook only employed 4 contractors that spoke Burmese. As one person close to the matter said, \"Thats not 20/20 hindsight. The scale of this problem was significant and it was already apparent.\" Zuckerberg promised during the congressional hearings to hire \"dozens\" to address the genocide in Myanmar (in 2018, years after the genocide had begun, including the destruction by fire of at least 288 villages in northern Rakhine state after August 2017).\n",
"To look at what can cause companies to take concrete action, consider the following two examples of how Facebook has behaved. In 2018, a UN investigation found that Facebook had played a “determining role” in the ongoing genocide of the Rohingya, an ethnic minority in Mynamar described by UN Secretary-General Antonio Guterres as \"one of, if not the, most discriminated people in the world.\" Local activists had been warning Facebook executives that their platform was being used to spread hate speech and incite violence since as early as 2013. In 2015, they were warned that Facebook could play the same role in Myanmar that the radio broadcasts played during the Rwandan genocide (where a million people were killed). Yet, by the end of 2015, Facebook only employed four contractors that spoke Burmese. As one person close to the matter said, \"Thats not 20/20 hindsight. The scale of this problem was significant and it was already apparent.\" Zuckerberg promised during the congressional hearings to hire \"dozens\" to address the genocide in Myanmar (in 2018, years after the genocide had begun, including the destruction by fire of at least 288 villages in northern Rakhine state after August 2017).\n",
"\n",
"This stands in stark contrast to Facebook quickly [hiring 1,200 people in Germany](http://thehill.com/policy/technology/361722-facebook-opens-second-german-office-to-comply-with-hate-speech-law) to try to avoid expensive penalties (of up to 50 million euros) under a new German law against hate speech. Clearly, in this case, Facebook was more reactive to the threat of a financial penalty than to the systematic destruction of an ethnic minority.\n",
"\n",
"In an [article on privacy issues](https://idlewords.com/2019/06/the_new_wilderness.htm), Maciej Ceglowski draws parallels with the environmental movement… \"This regulatory project has been so successful in the First World that we risk forgetting what life was like before it. Choking smog of the kind that today kills thousands in Jakarta and Delhi was [once emblematic of London](https://en.wikipedia.org/wiki/Pea_soup_fog). The Cuyahoga River in Ohio used to [reliably catch fire](http://www.ohiohistorycentral.org/w/Cuyahoga_River_Fire). In a particularly horrific example of unforeseen consequences, tetraethyl lead added to gasoline [raised violent crime rates](https://en.wikipedia.org/wiki/Lead%E2%80%93crime_hypothesis) worldwide for fifty years. None of these harms could have been fixed by telling people to vote with their wallet, or carefully review the environmental policies of every company they gave their business to, or to stop using the technologies in question. It took coordinated, and sometimes highly technical, regulation across jurisdictional boundaries to fix them. In some cases, like the [ban on commercial refrigerants](https://en.wikipedia.org/wiki/Montreal_Protocol) that depleted the ozone layer, that regulation required a worldwide consensus. Were at the point where we need a similar shift in perspective in our privacy law.\""
"In an [article on privacy issues](https://idlewords.com/2019/06/the_new_wilderness.htm), Maciej Ceglowski draws parallels with the environmental movement: \n",
"\n",
"> : This regulatory project has been so successful in the First World that we risk forgetting what life was like before it. Choking smog of the kind that today kills thousands in Jakarta and Delhi was https://en.wikipedia.org/wiki/Pea_soup_fog[once emblematic of London]. The Cuyahoga River in Ohio used to http://www.ohiohistorycentral.org/w/Cuyahoga_River_Fire[reliably catch fire]. In a particularly horrific example of unforeseen consequences, tetraethyl lead added to gasoline https://en.wikipedia.org/wiki/Lead%E2%80%93crime_hypothesis[raised violent crime rates] worldwide for fifty years. None of these harms could have been fixed by telling people to vote with their wallet, or carefully review the environmental policies of every company they gave their business to, or to stop using the technologies in question. It took coordinated, and sometimes highly technical, regulation across jurisdictional boundaries to fix them. In some cases, like the https://en.wikipedia.org/wiki/Montreal_Protocol[ban on commercial refrigerants] that depleted the ozone layer, that regulation required a worldwide consensus. Were at the point where we need a similar shift in perspective in our privacy law."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Rights and policy"
"### Rights and Policy"
]
},
{
@ -912,21 +907,23 @@
"\n",
"Many of the issues we are seeing in tech are actually human rights issues, such as when a biased algorithm recommends that Black defendants have longer prison sentences, when particular job ads are only shown to young people, or when police use facial recognition to identify protesters. The appropriate venue to address human rights issues is typically through the law.\n",
"\n",
"We need both regulatory and legal changes, *and* the ethical behavior of individuals. Individual behavior change cant address misaligned profit incentives, externalities (where corporations reap large profits while off-loading their costs & harms to the broader society), or systemic failures. However, the law will never cover all edge cases, and it is important that individual software developers and data scientists are equipped to make ethical decisions in practice."
"We need both regulatory and legal changes, *and* the ethical behavior of individuals. Individual behavior change cant address misaligned profit incentives, externalities (where corporations reap large profits while offloading their costs and harms to the broader society), or systemic failures. However, the law will never cover all edge cases, and it is important that individual software developers and data scientists are equipped to make ethical decisions in practice."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Cars: a historical precedent"
"### Cars: A Historical Precedent"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The problems we are facing are complex and there are no simple solutions. This can be discouraging, but we find hope in considering other large challenges that people have tackled throughout history. One example is the movement to increase car safety, covered as a case study in [Datasheets for Datasets](https://arxiv.org/abs/1803.09010) and in the design podcast [99% Invisible](https://99percentinvisible.org/episode/nut-behind-wheel/). Early cars had no seatbelts, metal knobs on the dashboard that could lodge in peoples skulls during a crash, regular plate glass windows that shattered in dangerous ways, and non-collapsible steering columns that impaled drivers. However, car companies were incredibly resistant to even discussing the idea of safety as something they could help address, and the widespread belief was that cars are just the way they are, and that it was the people using them who caused problems. It took consumer safety activists and advocates decades of work to even change the national conversation to consider that perhaps car companies had some responsibility which should be addressed through regulation. When the collapsible steering column was invented, it was not implemented for several years as there was no financial incentive to do so. Major car company General Motors hired private detectives to try to dig up dirt on consumer safety advocate Ralph Nader. The requirement of seatbelts, crash test dummies, and collapsible steering columns were major victories. It was only in 2011 that car companies were required to start using crash test dummies that would represent the average women, and not just average mens bodies; prior to this, women were 40% more likely to be injured in a car crash of the same impact compared to a man. This is a vivid example of the ways that bias, policy, and technology have important consequences."
"The problems we are facing are complex, and there are no simple solutions. This can be discouraging, but we find hope in considering other large challenges that people have tackled throughout history. One example is the movement to increase car safety, covered as a case study in [\"Datasheets for Datasets\"](https://arxiv.org/abs/1803.09010) by Timnit Gebru et al. and in the design podcast [99% Invisible](https://99percentinvisible.org/episode/nut-behind-wheel/). Early cars had no seatbelts, metal knobs on the dashboard that could lodge in peoples skulls during a crash, regular plate glass windows that shattered in dangerous ways, and non-collapsible steering columns that impaled drivers. However, car companies were incredibly resistant to even discussing the idea of safety as something they could help address, and the widespread belief was that cars are just the way they are, and that it was the people using them who caused problems.\n",
"\n",
"It took consumer safety activists and advocates decades of work to even change the national conversation to consider that perhaps car companies had some responsibility which should be addressed through regulation. When the collapsible steering column was invented, it was not implemented for several years as there was no financial incentive to do so. Major car company General Motors hired private detectives to try to dig up dirt on consumer safety advocate Ralph Nader. The requirement of seatbelts, crash test dummies, and collapsible steering columns were major victories. It was only in 2011 that car companies were required to start using crash test dummies that would represent the average woman, and not just average mens bodies; prior to this, women were 40% more likely to be injured in a car crash of the same impact compared to a man. This is a vivid example of the ways that bias, policy, and technology have important consequences."
]
},
{
@ -940,11 +937,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Coming from a background of working with binary logic, the lack of clear answers in ethics can be frustrating at first. Yet, the implications of how our work impacts the world, including unintended consequences and the work becoming weaponized by bad actors, are some of the most important questions we can (and should!) consider. Even though there aren't any easy answers, there are definite pitfalls to avoid and practices to move towards more ethical behavior.\n",
"Coming from a background of working with binary logic, the lack of clear answers in ethics can be frustrating at first. Yet, the implications of how our work impacts the world, including unintended consequences and the work becoming weaponized by bad actors, are some of the most important questions we can (and should!) consider. Even though there aren't any easy answers, there are definite pitfalls to avoid and practices to follow to move toward more ethical behavior.\n",
"\n",
"Many people (including us!) are looking for more satisfying, solid answers of how to address harmful impacts of technology. However, given the complex, far-reaching, and interdisciplinary nature of the problems we are facing, there are no simple solutions. Julia Angwin, former senior reporter at ProPublica who focuses on issues of algorithmic bias and surveillance (and one of the 2016 investigators of the COMPAS recidivism algorithm that helped spark the field of Fairness Accountability and Transparency) said in [a 2019 interview](https://www.fastcompany.com/90337954/who-cares-about-liberty-julia-angwin-and-trevor-paglen-on-privacy-surveillance-and-the-mess-were-in), “I strongly believe that in order to solve a problem, you have to diagnose it, and that were still in the diagnosis phase of this. If you think about the turn of the century and industrialization, we had, I dont know, 30 years of child labor, unlimited work hours, terrible working conditions, and it took a lot of journalist muckraking and advocacy to diagnose the problem and have some understanding of what it was, and then the activism to get laws changed. I feel like were in a second industrialization of data information... I see my role as trying to make as clear as possible what the downsides are, and diagnosing them really accurately so that they can be solvable. Thats hard work, and lots more people need to be doing it.” It's reassuring that Angwin thinks we are largely still in the diagnosis phase: if your understanding of these problems feels incomplete, that is normal and natural. Nobody has a “cure” yet, although it is vital that we continue working to better understand and address the problems we are facing.\n",
"Many people (including us!) are looking for more satisfying, solid answers about how to address harmful impacts of technology. However, given the complex, far-reaching, and interdisciplinary nature of the problems we are facing, there are no simple solutions. Julia Angwin, former senior reporter at ProPublica who focuses on issues of algorithmic bias and surveillance (and one of the 2016 investigators of the COMPAS recidivism algorithm that helped spark the field of FAccT) said in [a 2019 interview](https://www.fastcompany.com/90337954/who-cares-about-liberty-julia-angwin-and-trevor-paglen-on-privacy-surveillance-and-the-mess-were-in):\n",
"\n",
"One of our reviewers for this book, Fred Monroe, used to work in hedge fund trading. He told us, after reading this chapter, that many of the issues discussed here (distribution of data being dramatically different than what was trained on, impact of model and feedback loops once deployed and at scale, and so forth) were also key issues for building profitable trading models. The kinds of things you need to do to consider societal consequences are going to have a lot of overlap with things you need to do to consider organizational, market, and customer consequences too--so thinking carefully about ethics can also help you think carefully about how to make your data product successful more generally!"
"> : I strongly believe that in order to solve a problem, you have to diagnose it, and that were still in the diagnosis phase of this. If you think about the turn of the century and industrialization, we had, I dont know, 30 years of child labor, unlimited work hours, terrible working conditions, and it took a lot of journalist muckraking and advocacy to diagnose the problem and have some understanding of what it was, and then the activism to get laws changed. I feel like were in a second industrialization of data information... I see my role as trying to make as clear as possible what the downsides are, and diagnosing them really accurately so that they can be solvable. Thats hard work, and lots more people need to be doing it. \n",
"\n",
"It's reassuring that Angwin thinks we are largely still in the diagnosis phase: if your understanding of these problems feels incomplete, that is normal and natural. Nobody has a “cure” yet, although it is vital that we continue working to better understand and address the problems we are facing.\n",
"\n",
"One of our reviewers for this book, Fred Monroe, used to work in hedge fund trading. He told us, after reading this chapter, that many of the issues discussed here (distribution of data being dramatically different than what a model was trained on, the impact feedback loops on a model once deployed and at scale, and so forth) were also key issues for building profitable trading models. The kinds of things you need to do to consider societal consequences are going to have a lot of overlap with things you need to do to consider organizational, market, and customer consequences--so thinking carefully about ethics can also help you think carefully about how to make your data product successful more generally!"
]
},
{
@ -960,16 +961,16 @@
"source": [
"1. Does ethics provide a list of \"right answers\"?\n",
"1. How can working with people of different backgrounds help when considering ethical questions?\n",
"1. What was the role of IBM in Nazi Germany? Why did the company participate as they did? Why did the workers participate?\n",
"1. What was the role of the first person jailed in the VW diesel scandal?\n",
"1. What was the role of IBM in Nazi Germany? Why did the company participate as it did? Why did the workers participate?\n",
"1. What was the role of the first person jailed in the Volkswagen diesel scandal?\n",
"1. What was the problem with a database of suspected gang members maintained by California law enforcement officials?\n",
"1. Why did YouTube's recommendation algorithm recommend videos of partially clothed children to pedophiles, even though no employee at Google programmed this feature?\n",
"1. Why did YouTube's recommendation algorithm recommend videos of partially clothed children to pedophiles, even though no employee at Google had programmed this feature?\n",
"1. What are the problems with the centrality of metrics?\n",
"1. Why did Meetup.com not include gender in their recommendation system for tech meetups?\n",
"1. Why did Meetup.com not include gender in its recommendation system for tech meetups?\n",
"1. What are the six types of bias in machine learning, according to Suresh and Guttag?\n",
"1. Give two examples of historical race bias in the US.\n",
"1. Where are most images in Imagenet from?\n",
"1. In the paper \"Does Machine Learning Automate Moral Hazard and Error\" why is sinusitis found to be predictive of a stroke?\n",
"1. Where are most images in ImageNet from?\n",
"1. In the paper [\"Does Machine Learning Automate Moral Hazard and Error\"](https://scholar.harvard.edu/files/sendhil/files/aer.p20171084.pdf) why is sinusitis found to be predictive of a stroke?\n",
"1. What is representation bias?\n",
"1. How are machines and people different, in terms of their use for making decisions?\n",
"1. Is disinformation the same as \"fake news\"?\n",
@ -982,7 +983,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research:"
"### Further Research:"
]
},
{
@ -990,12 +991,12 @@
"metadata": {},
"source": [
"1. Read the article \"What Happens When an Algorithm Cuts Your Healthcare\". How could problems like this be avoided in the future?\n",
"1. Research to find out more about YouTube's recommendation system and its societal impacts. Do you think recommendation systems must always have feedback loops with negative results? What approaches could Google take? What about the government?\n",
"1. Read the paper \"Discrimination in Online Ad Delivery\". Do you think Google should be considered responsible for what happened to Dr Sweeney? What would be an appropriate response?\n",
"1. Research to find out more about YouTube's recommendation system and its societal impacts. Do you think recommendation systems must always have feedback loops with negative results? What approaches could Google take to avoid them? What about the government?\n",
"1. Read the paper [\"Discrimination in Online Ad Delivery\"](https://arxiv.org/abs/1301.6822). Do you think Google should be considered responsible for what happened to Dr. Sweeney? What would be an appropriate response?\n",
"1. How can a cross-disciplinary team help avoid negative consequences?\n",
"1. Read the paper \"Does Machine Learning Automate Moral Hazard and Error\" in American Economic Review. What actions do you think should be taken to deal with the issues identified in this paper?\n",
"1. Read the paper \"Does Machine Learning Automate Moral Hazard and Error\". What actions do you think should be taken to deal with the issues identified in this paper?\n",
"1. Read the article \"How Will We Prevent AI-Based Forgery?\" Do you think Etzioni's proposed approach could work? Why?\n",
"1. Complete the section \"Analyze a project you are working on\" in this chapter.\n",
"1. Complete the section \"Analyze a Project You Are Working On\" in this chapter.\n",
"1. Consider whether your team could be more diverse. If so, what approaches might help?"
]
},
@ -1003,26 +1004,26 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Section 1: that's a wrap!"
"## Section 1: That's a Wrap!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Congratulations! You've made it to the end of the first section of the book. In this section we've tried to show you what deep learning can do, and how you can use it to create real applications and products. At this point, you will get a lot more out of the book if you spend some time trying out what you've learnt. Perhaps you have already been doing this as you go along — in which case, great! But if not, that's no problem either… Now is a great time to start experimenting yourself.\n",
"Congratulations! You've made it to the end of the first section of the book. In this section we've tried to show you what deep learning can do, and how you can use it to create real applications and products. At this point, you will get a lot more out of the book if you spend some time trying out what you've learned. Perhaps you have already been doing this as you go along—in which case, great! If not, that's no problem either... Now is a great time to start experimenting yourself.\n",
"\n",
"If you haven't been to the book website yet, head over there now. Remember, you can find it here: [book.fast.ai](https://book.fast.ai). It's really important that you have got yourself set up to run the notebooks. Becoming an effective deep learning practitioner is all about practice. So you need to be training models. So please go get the notebooks running now if you haven't already! And also have a look on the website for any important updates or notices; deep learning changes fast, and we can't change the words that are printed in this book, so the website is where you need to look to ensure you have the most up-to-date information.\n",
"If you haven't been to the [book's website](https://book.fast.ai) yet, head over there now. It's really important that you get yourself set up to run the notebooks. Becoming an effective deep learning practitioner is all about practice, so you need to be training models. So, please go get the notebooks running now if you haven't already! And also have a look on the website for any important updates or notices; deep learning changes fast, and we can't change the words that are printed in this book, so the website is where you need to look to ensure you have the most up-to-date information.\n",
"\n",
"Make sure that you have completed the following steps:\n",
"\n",
"- Connected to one of the GPU Jupyter servers recommended on the book website\n",
"- Run the first notebook yourself\n",
"- Uploaded an image that you find in the first notebook; then try a few different images of different kinds to see what happens\n",
"- Run the second notebook, collecting your own dataset based on image search queries that you come up with\n",
"- Thought about how you can use deep learning to help you with your own projects, including what kinds of data you could use, what kinds of problems may come up, and how you might be able to mitigate these issues in practice.\n",
"- Connect to one of the GPU Jupyter servers recommended on the book's website.\n",
"- Run the first notebook yourself.\n",
"- Upload an image that you find in the first notebook; then try a few different images of different kinds to see what happens.\n",
"- Run the second notebook, collecting your own dataset based on image search queries that you come up with.\n",
"- Think about how you can use deep learning to help you with your own projects, including what kinds of data you could use, what kinds of problems may come up, and how you might be able to mitigate these issues in practice.\n",
"\n",
"In the next section of the book we will learn about how and why deep learning works, instead of just seeing how we can use it in practice. Understanding the how and why is important for both practitioners and researchers, because in this fairly new field nearly every project requires some level of customisation and debugging. The better you understand the foundations of deep learning, the better your models will be. These foundations are less important for executives, product managers, and so forth (although still useful, so feel free to keep reading!), but they are critical for anybody who is actually training and deploying models themselves."
"In the next section of the book you will learn about how and why deep learning works, instead of just seeing how you can use it in practice. Understanding the how and why is important for both practitioners and researchers, because in this fairly new field nearly every project requires some level of customization and debugging. The better you understand the foundations of deep learning, the better your models will be. These foundations are less important for executives, product managers, and so forth (although still useful, so feel free to keep reading!), but they are critical for anybody who is actually training and deploying models themselves."
]
},
{

File diff suppressed because it is too large


@ -21,41 +21,41 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Image classification"
"# Image Classification"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we understand what deep learning is, what it's for, and how to create and deploy a model, it's time for us to go deeper! In an ideal world deep learning practitioners wouldn't have to know every detail of how things work under the hood… But as yet, we don't live in an ideal world. The truth is, to make your model really work, and work reliably, there's a lot of details you have to get right. And a lot of details that you have to check. This process requires being able to look inside your neural network as it trains, and as it makes predictions, find possible problems, and know how to fix them.\n",
"Now that you understand what deep learning is, what it's for, and how to create and deploy a model, it's time for us to go deeper! In an ideal world deep learning practitioners wouldn't have to know every detail of how things work under the hood… But as yet, we don't live in an ideal world. The truth is, to make your model really work, and work reliably, there are a lot of details you have to get right, and a lot of details that you have to check. This process requires being able to look inside your neural network as it trains, and as it makes predictions, find possible problems, and know how to fix them.\n",
"\n",
"So, from here on in the book we are going to do a deep dive into the mechanics of deep learning. What is the architecture of a computer vision model, an NLP model, a tabular model, and so on. How do you create an architecture which matches the needs of your particular domain? How do you get the best possible results from the training process? How do you make things faster? What do you have to change as your datasets change?\n",
"So, from here on in the book we are going to do a deep dive into the mechanics of deep learning. What is the architecture of a computer vision model, an NLP model, a tabular model, and so on? How do you create an architecture that matches the needs of your particular domain? How do you get the best possible results from the training process? How do you make things faster? What do you have to change as your datasets change?\n",
"\n",
"We will start by repeating the same basic applications that we looked at in the first chapter, but we are going to do two things:\n",
"\n",
"- make them better;\n",
"- apply them to a wider variety of types of data.\n",
"- Make them better.\n",
"- Apply them to a wider variety of types of data.\n",
"\n",
"In order to do these two things, we will have to learn all of the pieces of the deep learning puzzle. This includes: different types of layers, regularisation methods, optimisers, putting layers together into architectures, labelling techniques, and much more. We are not just going to dump all of these things out, but we will introduce them progressively as needed, to solve an actual problem related to the project we are working on."
"In order to do these two things, we will have to learn all of the pieces of the deep learning puzzle. This includes different types of layers, regularization methods, optimizers, how to put layers together into architectures, labeling techniques, and much more. We are not just going to dump all of these things on you, though; we will introduce them progressively as needed, to solve actual problems related to the projects we are working on."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## From dogs and cats, to pet breeds"
"## From Dogs and Cats to Pet Breeds"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In our very first model we learnt how to classify dogs versus cats. Just a few years ago this was considered a very challenging task. But today, it is far too easy! We will not be able to show you the nuances of training models with this problem, because we get the nearly perfect result without worrying about any of the details. But it turns out that the same dataset also allows us to work on a much more challenging problem: figuring out what breed of pet is shown in each image.\n",
"In our very first model we learned how to classify dogs versus cats. Just a few years ago this was considered a very challenging task--but today, it's far too easy! We will not be able to show you the nuances of training models with this problem, because we get a nearly perfect result without worrying about any of the details. But it turns out that the same dataset also allows us to work on a much more challenging problem: figuring out what breed of pet is shown in each image.\n",
"\n",
"In the first chapter we presented the applications as already solved problems. But this is not how things work in real life. We start with some dataset which we know nothing about. We have to understand how it is put together, how to extract the data we need from it, and what that data looks like. For the rest of this book we will be showing you how to solve these problems in practice, including all of these intermediate steps necessary to understand the data that we are working with and test our modelling as we go.\n",
"In <<chapter_intro>> we presented the applications as already-solved problems. But this is not how things work in real life. We start with some dataset that we know nothing about. We then have to figure out how it is put together, how to extract the data we need from it, and what that data looks like. For the rest of this book we will be showing you how to solve these problems in practice, including all of the intermediate steps necessary to understand the data that you are working with and test your modeling as you go.\n",
"\n",
"We have already downloaded the pets dataset. We can get a path to this dataset using the same code we saw in <<chapter_intro>>:"
"We already downloaded the Pet dataset, and we can get a path to this dataset using the same code as in <<chapter_intro>>:"
]
},
{
@ -74,12 +74,12 @@
"source": [
"Now if we are going to understand how to extract the breed of each pet from each image we're going to need to understand how this data is laid out. Such details of data layout are a vital piece of the deep learning puzzle. Data is usually provided in one of these two ways:\n",
"\n",
"- Individual files representing items of data, such as text documents or images, possibly organised into folders or with filenames representing information about those items, or\n",
"- A table of data, such as in CSV format, where each row is an item, each row which may include filenames providing a connection between the data in the table and data in other formats such as text documents and images.\n",
"- Individual files representing items of data, such as text documents or images, possibly organized into folders or with filenames representing information about those items\n",
"- A table of data, such as in CSV format, where each row is an item which may include filenames providing a connection between the data in the table and data in other formats, such as text documents and images\n",
"\n",
"There are exceptions to these rules, particularly in domains such as genomics, where there can be binary database formats or even network streams, but overall the vast majority of the datasets you work with use some combination of the above two formats.\n",
"There are exceptions to these rules--particularly in domains such as genomics, where there can be binary database formats or even network streams--but overall the vast majority of the datasets you'll work with will use some combination of these two formats.\n",
"\n",
"To see what is in our dataset we can use the ls method:"
"To see what is in our dataset we can use the `ls` method:"
]
},
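For readers following along outside fastai, the `ls` call can be approximated with plain `pathlib`. Below is a minimal sketch; the directory names are assumed from the Pets dataset layout, and fastai's actual `ls` returns an `L` collection with extra conveniences:

```python
from pathlib import Path
import tempfile

def ls(path):
    """Roughly approximate fastai's `Path.ls`: list a directory's contents."""
    return sorted(Path(path).iterdir())

# Stand-in for the dataset root; the real Pets dataset has these two directories.
root = Path(tempfile.mkdtemp())
(root / "images").mkdir()
(root / "annotations").mkdir()

print([p.name for p in ls(root)])  # ['annotations', 'images']
```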
{
@ -116,7 +116,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that this dataset provides us with \"images\" and \"annotations\" directories. The website for this dataset tells us that the annotations directory contains information about where the pets are rather than what they are. In this chapter, we will be doing classification, not localization, which is to say that we care about what the pets are not where they are. Therefore we will ignore the annotations directory for now. So let's have a look inside the images directory:"
"We can see that this dataset provides us with *images* and *annotations* directories. The [website](https://www.robots.ox.ac.uk/~vgg/data/pets/) for the dataset tells us that the *annotations* directory contains information about where the pets are rather than what they are. In this chapter, we will be doing classification, not localization, which is to say that we care about what the pets are, not where they are. Therefore, we will ignore the *annotations* directory for now. So, let's have a look inside the *images* directory:"
]
},
{
@ -143,9 +143,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Most functions and methods in fastai which return a collection use a class called `L`. `L` can be thought of as an enhanced version of the ordinary Python `list` type, with added conveniences for common operations. For instance, when we display an object of this class in a notebook it appears in the format you see above. The first thing that is shown is the number of items in the collection, prefixed with a `#`. You'll also see in the above output that the list is suffixed with a \"…\". This means that only the first few items are displayed which is a good thing, because we would not want more than 7000 filenames on our screen!\n",
"Most functions and methods in fastai that return a collection use a class called `L`. `L` can be thought of as an enhanced version of the ordinary Python `list` type, with added conveniences for common operations. For instance, when we display an object of this class in a notebook it appears in the format shown there. The first thing that is shown is the number of items in the collection, prefixed with a `#`. You'll also see in the preceding output that the list is suffixed with an ellipsis. This means that only the first few items are displayed—which is a good thing, because we would not want more than 7,000 filenames on our screen!\n",
"\n",
"By examining these filenames, we see how they appear to be structured. Each file name contains the pet breed, and then an _ character, a number, and finally the file extension. We need to create a piece of code that extracts the breed from a single `Path`. Jupyter notebook makes this easy, because we can gradually build up something that works, and then use it for the entire dataset. We do have to be careful to not make too many assumptions at this point. For instance, if you look carefully you may notice that some of the pet breeds contain multiple words, so we cannot simply break at the first `_` character that we find. To allow us to test our code, let's pick out one of these filenames:"
"By examining these filenames, we can see how they appear to be structured. Each filename contains the pet breed, and then an underscore (`_`), a number, and finally the file extension. We need to create a piece of code that extracts the breed from a single `Path`. Jupyter notebooks make this easy, because we can gradually build up something that works, and then use it for the entire dataset. We do have to be careful to not make too many assumptions at this point. For instance, if you look carefully you may notice that some of the pet breeds contain multiple words, so we cannot simply break at the first `_` character that we find. To allow us to test our code, let's pick out one of these filenames:"
]
},
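To make that display format concrete, here is a minimal sketch of how a `list` subclass could produce the truncated, `#`-prefixed display described above. This is an illustration only, not fastai's actual `L` implementation (the real `L` in fastcore has many more conveniences):

```python
class L(list):
    """Toy sketch of a collection that truncates its display, like fastai's `L`."""
    _max_show = 10  # number of items to show before truncating with "..."

    def __repr__(self):
        shown = ", ".join(repr(o) for o in self[:self._max_show])
        trailer = "..." if len(self) > self._max_show else ""
        return f"(#{len(self)}) [{shown}{trailer}]"

print(L(range(20)))   # (#20) [0, 1, 2, 3, 4, 5, 6, 7, 8, 9...]
print(L(["a", "b"]))  # (#2) ['a', 'b']
```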
{
@ -163,11 +163,11 @@
"source": [
"The most powerful and flexible way to extract information from strings like this is to use a *regular expression*, also known as a *regex*. A regular expression is a special string, written in the regular expression language, which specifies a general rule for deciding if another string passes a test (i.e., \"matches\" the regular expression), and also possibly for plucking a particular part or parts out of that other string. \n",
"\n",
"In this case, we need a regular expression that extracts the pet breed from the file name.\n",
"In this case, we need a regular expression that extracts the pet breed from the filename.\n",
"\n",
"We do not have the space to give you a complete regular expression tutorial here, particularly because there are so many excellent ones online. And we know that many of you will already be familiar with this wonderful tool. If you're not, that is totally fine this is a great opportunity for you to rectify that! We find that regular expressions are one of the most useful tools in our programming toolkit, and many of our students tell us that it is one of the things they are most excited to learn about. So head over to Google and search for *regular expressions tutorial* now, and then come back here after you've had a good look around. The book website also provides a list of our favorites.\n",
"We do not have the space to give you a complete regular expression tutorial here,but there are many excellent ones online and we know that many of you will already be familiar with this wonderful tool. If you're not, that is totally fine—this is a great opportunity for you to rectify that! We find that regular expressions are one of the most useful tools in our programming toolkit, and many of our students tell us that this is one of the things they are most excited to learn about. So head over to Google and search for \"regular expressions tutorial\" now, and then come back here after you've had a good look around. The [book's website](https://book.fast.ai/) also provides a list of our favorites.\n",
"\n",
"> a: Not only are regular expressions dead handy, they also have interesting roots. They are \"regular\" because they were originally examples of a \"regular\" language, the lowest rung within the \"Chomsky hierarchy\", a grammar classification due to the same linguist Noam Chomsky who wrote _Syntactic Structures_, the pioneering work searching for the formal grammar underlying human language. This is one of the charms of computing: it may be that the hammer you reach for every day in fact came from a space ship.\n",
"> a: Not only are regular expressions dead handy, but they also have interesting roots. They are \"regular\" because they were originally examples of a \"regular\" language, the lowest rung within the Chomsky hierarchy, a grammar classification developed by linguist Noam Chomsky, who also wrote _Syntactic Structures_, the pioneering work searching for the formal grammar underlying human language. This is one of the charms of computing: it may be that the hammer you reach for every day in fact came from a spaceship.\n",
"\n",
"When you are writing a regular expression, the best way to start is just to try it against one example at first. Let's use the `findall` method to try a regular expression against the filename of the `fname` object:"
]
@ -196,9 +196,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This regular expression plucks out all the characters leading up to the last underscore character, as long as the subsequence characters are numerical digits and then the jpeg file extension.\n",
"This regular expression plucks out all the characters leading up to the last underscore character, as long as the subsequence characters are numerical digits and then the JPEG file extension.\n",
"\n",
"Now that we confirmed the regular expression works for the example, let's use it to label the whole dataset. Fastai comes with many classes to help you with your labelling. For labelling with regular expressions, we can use the `RegexLabeller` class. We can use this in the data block API that we saw in <<chapter_production>> (in fact, we nearly always use the data block API--it's so much more flexible than the simple factory methods we saw in <<chapter_intro>>):"
"Now that we confirmed the regular expression works for the example, let's use it to label the whole dataset. Fastai comes with many classes to help with labeling. For labeling with regular expressions, we can use the `RegexLabeller` class. In this example we use the data block API we saw in <<chapter_production>> (in fact, we nearly always use the data block API--it's so much more flexible than the simple factory methods we saw in <<chapter_intro>>):"
]
},
{
@ -227,7 +227,7 @@
"batch_tfms=aug_transforms(size=224, min_scale=0.75)\n",
"```\n",
"\n",
"These lines implement a fastai data augmentation strategy which we call *presizing*. Presizing is a particular way to do image augmentation, which is designed to minimize data destruction while maintaining good performance."
"These lines implement a fastai data augmentation strategy which we call *presizing*. Presizing is a particular way to do image augmentation that is designed to minimize data destruction while maintaining good performance."
]
},
{
@ -241,14 +241,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We need our images to have the same dimensions, so that they can collate into tensors to be passed to the GPU. We also want to minimize the number of distinct augmentation computations we perform. So the performance requirement suggests that we should, where possible, compose our augmentation transforms into fewer transforms (to reduce the number of computations, and reduce the number of lossy operations) and transform the images into uniform sizes (to run compute efficiently on the GPU).\n",
"We need our images to have the same dimensions, so that they can collate into tensors to be passed to the GPU. We also want to minimize the number of distinct augmentation computations we perform. The performance requirement suggests that we should, where possible, compose our augmentation transforms into fewer transforms (to reduce the number of computations and the number of lossy operations) and transform the images into uniform sizes (for more efficient processing on the GPU).\n",
"\n",
"The challenge is that, if performed after resizing down to the augmented size, various common data augmentation transforms might introduce spurious empty zones, degrade data, or both. For instance, rotating an image by 45 degrees fills corner regions of the new bounds with emptyness, which will not teach the model anything. Many rotation and zooming operations will require interpolating to create pixels. These interpolated pixels are derived from the original image data but are still of lower quality.\n",
"\n",
"To workaround these challenges, presizing adopts two strategies that are shown in <<presizing>>:\n",
"To work around these challenges, presizing adopts two strategies that are shown in <<presizing>>:\n",
"\n",
"1. First, resizing images to relatively \"large dimensions\" that is, dimensions significantly larger than the target training dimensions. \n",
"1. Second, composing all of the common augmentation operations (including a resize to the final target size) into one, and performing the combined operation on the GPU only once at the end of processing, rather than performing them individually and interpolating multiple times.\n",
"1. Resize images to relatively \"large\" dimensions--that is, dimensions significantly larger than the target training dimensions. \n",
"1. Compose all of the common augmentation operations (including a resize to the final target size) into one, and perform the combined operation on the GPU only once at the end of processing, rather than performing the operations individually and interpolating multiple times.\n",
"\n",
"The first step, the resize, creates images large enough that they have spare margin to allow further augmentation transforms on their inner regions without creating empty zones. This transformation works by resizing to a square, using a large crop size. On the training set, the crop area is chosen randomly, and the size of the crop is selected to cover the entire width or height of the image, whichever is smaller.\n",
"\n",
"source": [
"This picture shows the two steps:\n",
"\n",
"1. *Crop full width or height*: This is in `item_tfms`, so it's applied to each individual image before it is copied to the GPU. It's used to ensure all images are the same size. On the training set, the crop area is chosen randomly. On the validation set, the center square of the image is always chosen\n",
"2. *Random crop and augment*: This is in `batch_tfms`, so it's applied to a batch all at once on the GPU, which means it's fast. On the validation set, only the resize to the final size needed for the model is done here. On the training set, the random crop and any other augmentation is done first.\n",
"1. *Crop full width or height*: This is in `item_tfms`, so it's applied to each individual image before it is copied to the GPU. It's used to ensure all images are the same size. On the training set, the crop area is chosen randomly. On the validation set, the center square of the image is always chosen.\n",
"2. *Random crop and augment*: This is in `batch_tfms`, so it's applied to a batch all at once on the GPU, which means it's fast. On the validation set, only the resize to the final size needed for the model is done here. On the training set, the random crop and any other augmentations are done first.\n",
"\n",
"To implement this process in fastai you use `Resize` as an item transform with a large size, and `RandomResizedCrop` as a batch transform with a smaller size. `RandomResizedCrop` will be added for you if you include the `min_scale` parameter in your `aug_transform` function, as you see in the `DataBlock` call above. Alternatively, you can use `pad` or `squish` instead of `crop` (the default) for the initial `Resize`.\n",
"To implement this process in fastai you use `Resize` as an item transform with a large size, and `RandomResizedCrop` as a batch transform with a smaller size. `RandomResizedCrop` will be added for you if you include the `min_scale` parameter in your `aug_transforms` function, as was done in the `DataBlock` call in the previous section. Alternatively, you can use `pad` or `squish` instead of `crop` (the default) for the initial `Resize`.\n",
"\n",
"You can see in this example the difference between an image which has been zoomed, interpolated, rotated, and then interpolated again on the right (which is the approach used by all other deep learning libraries), compared to an image which has been zoomed and rotated as one operation, and then interpolated just once on the left (the fastai approach):"
"<<interpolations>> shows the difference between an image that has been zoomed, interpolated, rotated, and then interpolated again (which is the approach used by all other deep learning libraries), shown here on the right, and an image that has been zoomed and rotated as one operation and then interpolated just once on the left (the fastai approach), shown here on the left."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide_input": true
"hide_input": false
},
"outputs": [
{
],
"source": [
"#hide_input\n",
"#id interpolations\n",
"#caption A comparison of fastai's data augmentation strategy (left) and the traditional approach (right).\n",
"dblock1 = DataBlock(blocks=(ImageBlock(), CategoryBlock()),\n",
" get_y=parent_label,\n",
" item_tfms=Resize(460))\n",
"cell_type": "markdown",
"metadata": {},
"source": [
"You can see here that the image on the right is less well defined, and has reflection padding artifacts in the bottom left, and the grass in the top left has disappeared entirely. We find that in practice using presizing significantly improves the accuracy of models, and often results in speedups too.\n",
"You can see that the image on the right is less well defined and has reflection padding artifacts in the bottom-left corner; also, the grass iat the top left has disappeared entirely. We find that in practice using presizing significantly improves the accuracy of models, and often results in speedups too.\n",
"\n",
"Checking your data looks right is extremely important before training a model. There are simple ways to do this (and debug if needed) in the fastai library, let's look at them now."
"The fastai library also provides simple ways to check your data looks right before training a model, which is an extremely important step. We'll look at those next."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Checking and debugging a DataBlock"
"### Checking and Debugging a DataBlock"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can never just assume that our code is working perfectly. Writing a `DataBlock` is just like writing a blueprint. You will get an error message if you have a syntax error somewhere in your code but you have no guaranty that your template is going to work on your source of data as you intend. The first thing to do before we trying to train a model is to use the `show_batch` method and have a look at your data:"
"We can never just assume that our code is working perfectly. Writing a `DataBlock` is just like writing a blueprint. You will get an error message if you have a syntax error somewhere in your code, but you have no guarantee that your template is going to work on your data source as you intend. So, before training a model you should always check your data. You can do this using the `show_batch` method:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Have a look at each image, and check that each one seems to have the correct label for that breed of pet. Often, data scientists work with data with which they are not as familiar as domain experts may be: for instance, I actually don't know what a lot of these pet breeds are. Since I am not an expert on pet breeds, I would use Google images at this point to search for a few of these breeds, and make sure the images looks similar to what I see in this output.\n",
"Take a look at each image, and check that each one seems to have the correct label for that breed of pet. Often, data scientists work with data with which they are not as familiar as domain experts may be: for instance, I actually don't know what a lot of these pet breeds are. Since I am not an expert on pet breeds, I would use Google images at this point to search for a few of these breeds, and make sure the images look similar to what I see in this output.\n",
"\n",
"If you made a mistake while building your `DataBlock` it is very likely you won't see it before this step. To debug this, we encourage you to use the `summary` method. It will attempt to create a batch from the source you give it, with a lot of details. Also, if it fails, you will see exactly at which point the error happens, and the library will try to give you some help. For instance, one common mistake is to forget to put a `Resize` transform, ending up with pictures of different sizes and not able to batch them. Here is what the summary would look like in that case (note that the exact text may have changed since the time of writing, but it will give you an idea):"
"If you made a mistake while building your `DataBlock`, it is very likely you won't see it before this step. To debug this, we encourage you to use the `summary` method. It will attempt to create a batch from the source you give it, with a lot of details. Also, if it fails, you will see exactly at which point the error happens, and the library will try to give you some help. For instance, one common mistake is to forget to use a `Resize` transform, so you en up with pictures of different sizes and are not able to batch them. Here is what the summary would look like in that case (note that the exact text may have changed since the time of writing, but it will give you an idea):"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can see exactly how we gathered the data and split it, how we went from a filename to a *sample* (the tuple image, category), then what item transforms were applied and how it failed to collate those samples in a batch (because of the different shapes). \n",
"You can see exactly how we gathered the data and split it, how we went from a filename to a *sample* (the tuple (image, category)), then what item transforms were applied and how it failed to collate those samples in a batch (because of the different shapes). \n",
"\n",
"Once you think your data looks right, we generally recommend the next step should be creating a simple model. We often see people procrastinate the training of an actual model for far too long. As a result, they don't actually get to find out what their baseline results look like. Perhaps it doesn't need lots of fancy domain specific engineering. Or perhaps the data doesn't seem to train it all. These are things that you want to know as soon as possible. So we will use the same simple model that we used in <<chapter_intro>>:"
"Once you think your data looks right, we generally recommend the next step should be using to train a simple model. We often see people put off the training of an actual model for far too long. As a result, they don't actually find out what their baseline results look like. Perhaps your probem doesn't need lots of fancy domain-specific engineering. Or perhaps the data doesn't seem to train the model all. These are things that you want to know as soon as possible. For this initial test, we'll use the same simple model that we used in <<chapter_intro>>:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we've briefly discussed before, the table shown when we fit a model shows us the results after each epoch of training. Remember, an epoch is one complete pass through all of the images in the data. The columns shown are the average loss over the items of the training set, the loss on the validation set, and any metrics that you requested — in this case, the error rate.\n",
"As we've briefly discussed before, the table shown when we fit a model shows us the results after each epoch of training. Remember, an epoch is one complete pass through all of the images in the data. The columns shown are the average loss over the items of the training set, the loss on the validation set, and any metrics that we requested—in this case, the error rate.\n",
"\n",
"Remember that *loss* is whatever function we've decided to use to optimise the parameters of our model. But we haven't actually told fastai what loss function we want to use. So what is it doing? Fastai will generally try to select an appropriate loss function based on what kind of data and model you are using. In this case you have image data, and a categorical outcome, so fastai will default to using *cross entropy loss*."
"Remember that *loss* is whatever function we've decided to use to optimize the parameters of our model. But we haven't actually told fastai what loss function we want to use. So what is it doing? Fastai will generally try to select an appropriate loss function based on what kind of data and model you are using. In this case we have image data and a categorical outcome, so fastai will default to using *cross-entropy loss*."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Cross entropy loss"
"## Cross-Entropy Loss"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Cross entropy loss* is a loss function which is similar to the loss function we used in the previous chapter, but (as we'll see) has two benefits:\n",
"*Cross-entropy loss* is a loss function that is similar to the one we used in the previous chapter, but (as we'll see) has two benefits:\n",
"\n",
"- It works even when our dependent variable has more than two categories\n",
"- It works even when our dependent variable has more than two categories.\n",
"- It results in faster and more reliable training.\n",
"\n",
"In order to understand how cross entropy loss works for dependent variables with more than two categories, we first have to understand what the actual data and activations that are seen by the loss function look like."
"In order to understand how cross-entropy loss works for dependent variables with more than two categories, we first have to understand what the actual data and activations that are seen by the loss function look like."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Viewing activations and labels"
"### Viewing Activations and Labels"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's have a look at the activations of our model. To actually get a batch of real data from our DataLoaders, we can use the `one_batch` method:"
"Let's take a look at the activations of our model. To actually get a batch of real data from our `DataLoaders`, we can use the `one_batch` method:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you see, this returns the dependent, and the independent variables, as a mini-batch. Let's see what is actually contained in our dependent variable:"
"As you see, this returns the dependent and independent variables, as a mini-batch. Let's see what is actually contained in our dependent variable:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Our batch size is 64, so we have 64 rows in this tensor. Each row is a single integer between zero and 36, representing our 37 possible pet breeds. We can view the predictions (that is, the activations of the final layer of our neural network) using `Learner.get_preds`. This function either takes a dataset index (0 for train and 1 for valid) or an iterator of batches. Thus, we can pass it a simple list with our batch to get our predictions. It returns predictions and targets by default, but since we already have the targets, we can effectively ignore them by assigning to the special variable `_`:"
"Our batch size is 64, so we have 64 rows in this tensor. Each row is a single integer between 0 and 36, representing our 37 possible pet breeds. We can view the predictions (that is, the activations of the final layer of our neural network) using `Learner.get_preds`. This function either takes a dataset index (0 for train and 1 for valid) or an iterator of batches. Thus, we can pass it a simple list with our batch to get our predictions. It returns predictions and targets by default, but since we already have the targets, we can effectively ignore them by assigning to the special variable `_`:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The actual predictions are 37 probabilities between zero and one, which add up to 1 in total."
"The actual predictions are 37 probabilities between 0 and 1, which add up to 1 in total:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To transform the activations of our model into predictions like this, we used something called the softmax activation function."
"To transform the activations of our model into predictions like this, we used something called the *softmax* activation function."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In our classification model, an activation function called *softmax* in the final layer is used to ensure that the activations are between zero and one, and that they sum to one.\n",
"In our classification model, we use the softmax activation function in the final layer to ensure that the activations are all between 0 and 1, and that they sum to 1.\n",
"\n",
"Softmax is similar to the sigmoid function, which we saw earlier; sigmoid looks like this:"
"Softmax is similar to the sigmoid function, which we saw earlier. As a reminder sigmoid looks like this:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can apply this function to a single column of activations from a neural network, and get back a column of numbers between zero and one. So it's a very useful activation function for our final layer.\n",
"We can apply this function to a single column of activations from a neural network, and get back a column of numbers between 0 and 1, so it's a very useful activation function for our final layer.\n",
"\n",
"Now think about what happens if we want to have more categories in our target (such as our 37 pet breeds). That means we'll need more activations than just a single column: we need an activation *per category*. We can create, for instance, a neural net that predicts \"3\"s and \"7\"s that returns two activations, one for each class--this will be a good first step towards creating the more general approach. Let's just use some random numbers with a standard deviation of 2 (so we multiply `randn` by 2) for this example, assuming we have six images and two possible categories (where the first columns represents \"3\"s and the second is \"7\"s):"
"Now think about what happens if we want to have more categories in our target (such as our 37 pet breeds). That means we'll need more activations than just a single column: we need an activation *per category*. We can create, for instance, a neural net that predicts 3s and 7s that returns two activations, one for each class--this will be a good first step toward creating the more general approach. Let's just use some random numbers with a standard deviation of 2 (so we multiply `randn` by 2) for this example, assuming we have 6 images and 2 possible categories (where the first column represents 3s and the second is 7s):"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can't just take the sigmoid of this directly, since we don't get rows that add to one (i.e we want the probability of being a \"3\" plus the probability of being a \"7\" to add to one):"
"We can't just take the sigmoid of this directly, since we don't get rows that add to 1 (i.e., we want the probability of being a 3 plus the probability of being a 7 to add up to 1):"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In <<chapter_mnist_basics>>, the neural net created a single activation per image, which we passed through the sigmoid function. That single activation represented the confidence that the input was a \"3\". Binary problems are a special case of classification problems, because the target can be treated as a single boolean value, as we did in `mnist_loss`. Binary problems can also be thought of as part of the more general group of classifiers with any number of categories--where in this case we happen to have 2 categories. As we saw in the bear classifier, our neural net will return one activation per category.\n",
"In <<chapter_mnist_basics>>, our neural net created a single activation per image, which we passed through the `sigmoid` function. That single activation represented the model's confidence that the input was a 3. Binary problems are a special case of classification problems, because the target can be treated as a single boolean value, as we did in `mnist_loss`. But binary problems can also be thought of in the context of the more general group of classifiers with any number of categories: in this case, we happen to have two categories. As we saw in the bear classifier, our neural net will return one activation per category.\n",
"\n",
"So in the binary case, what do those activations really indicate? A single pair of activations simply indicates the *relative* confidence of being a \"3\" versus being a \"7\". The overall values, whether they are both high, or both low, don't matter--all that matters is which is higher, and by how much.\n",
"So in the binary case, what do those activations really indicate? A single pair of activations simply indicates the *relative* confidence of the input being a 3 versus being a 7. The overall values, whether they are both high, or both low, don't matter--all that matters is which is higher, and by how much.\n",
"\n",
"We would expect that since this is just another way of representing the same problem (in the binary case) that we would be able to use sigmoid directly on the two-activation version of our neural net. And indeed we can! We can just take the *difference* between the neural net activations, because that reflects how much more sure we are of being a \"3\" vs a \"7\", and then take the sigmoid of that:"
"We would expect that since this is just another way of representing the same problem, that we would be able to use `sigmoid` directly on the two-activation version of our neural net. And indeed we can! We can just take the *difference* between the neural net activations, because that reflects how much more sure we are of the input being a 3 than a 7, and then take the sigmoid of that:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The second column (the probability of being a \"7\") will then just be that subtracted from one. We need a way to do all this that also works for more than two columns. It turns out that this function, called `softmax`, is exactly that:\n",
"The second column (the probability of it being a 7) will then just be that value subtracted from 1. Now, we need a way to do all this that also works for more than two columns. It turns out that this function, called `softmax`, is exactly that:\n",
"\n",
"``` python\n",
"def softmax(x): return exp(x) / exp(x).sum(dim=1, keepdim=True)\n",
"cell_type": "markdown",
"metadata": {},
"source": [
"> jargon: Exponential function (exp): Literally defined as `e**x`, where `e` is a special number approximately equal to 2.718. It is the inverse of the natural logarithm function. Note that `exp` is always positive, and it increases *very* rapidly!"
"> jargon: Exponential function (exp): Literally defined as `e**x`, where `e` is a special number approximately equal to 2.718. It is the inverse of the natural logarithm function. Note that `exp` is always positive, and it increases _very_ rapidly!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's check that `softmax` returns the same values as `sigmoid` for the first column, and that subtracted from one for the second column:"
"Let's check that `softmax` returns the same values as `sigmoid` for the first column, and those values subtracted from 1 for the second column:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Softmax is the multi-category equivalent of sigmoid--we have to use it any time we have more than two categories, and the probabilities of the categories must add to one. (We often use it even when there's just two categories, just to make things a bit more consistent.) We could create other functions that have the properties that all activations are between zero and one, and sum to one; however, no other function has the same relationship to the sigmoid function, which we've seen is smooth and symmetric. Also, we'll see shortly that the softmax function works well hand-in-hand with the loss function we will look at in the next section.\n",
"`softmax` is the multi-category equivalent of `sigmoid`--we have to use it any time we have more than two categories and the probabilities of the categories must add to 1, and we often use it even when there are just two categories, just to make things a bit more consistent. We could create other functions that have the properties that all activations are between 0 and 1, and sum to 1; however, no other function has the same relationship to the sigmoid function, which we've seen is smooth and symmetric. Also, we'll see shortly that the softmax function works well hand-in-hand with the loss function we will look at in the next section.\n",
"\n",
"If we have three output activations, such as in our bear classifier, calculating softmax for a single bear image would then look like something like this:"
"If we have three output activations, such as in our bear classifier, calculating softmax for a single bear image would then look like something like <<bear_softmax>>."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"Bear softmax example\" width=\"280\" id=\"bear_softmax\" src=\"images/att_00062.png\">"
"<img alt=\"Bear softmax example\" width=\"280\" id=\"bear_softmax\" caption=\"Example of softmax on the bear classifier\" src=\"images/att_00062.png\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What does this function do in practice? Taking the exponential ensures all our numbers are positive, and then dividing by the sum ensures we are going to have a bunch of numbers that add up to one. The exponential also has a nice property: if one of the numbers in our activations `x` is slightly bigger than the others, the exponential will amplify this (since it grows, well... exponentially) which means that in the softmax, that number will be closer to 1. \n",
"What does this function do in practice? Taking the exponential ensures all our numbers are positive, and then dividing by the sum ensures we are going to have a bunch of numbers that add up to 1. The exponential also has a nice property: if one of the numbers in our activations `x` is slightly bigger than the others, the exponential will amplify this (since it grows, well... exponentially), which means that in the softmax, that number will be closer to 1. \n",
"\n",
"Intuitively, the Softmax function *really* wants to pick one class among the others, so it's ideal for training a classifier when we know each picture has a definite label. (Note that it may be less ideal during inference, as you might want your model to sometimes tell you it doesn't recognize any of the classes that it has seen during training, and not pick a class because it has a slightly bigger activation score. In this case, it might be better to train a model using multiple binary output columns, each using a sigmoid activation.)\n",
"Intuitively, the softmax function *really* wants to pick one class among the others, so it's ideal for training a classifier when we know each picture has a definite label. (Note that it may be less ideal during inference, as you might want your model to sometimes tell you it doesn't recognize any of the classes that it has seen during training, and not pick a class because it has a slightly bigger activation score. In this case, it might be better to train a model using multiple binary output columns, each using a sigmoid activation.)\n",
"\n",
"Softmax is the first part of the cross entropy loss, the second part is log likeklihood. "
"Softmax is the first part of the cross-entropy loss--the second part is log likeklihood. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Log likelihood"
"### Log Likelihood"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When we calculated the loss for our MNIST example in the last chapter we used.\n",
"When we calculated the loss for our MNIST example in the last chapter we used:\n",
"\n",
"```python\n",
"def mnist_loss(inputs, targets):\n",
"    inputs = inputs.sigmoid()\n",
" return torch.where(targets==1, 1-inputs, inputs).mean()\n",
"```\n",
"\n",
"Just like we moved from sigmoid to softmax, we need to extend the loss function to work with more than just binary classification, to classifying any number of categories (in this case, we have 37 categories). Our activations, after softmax, are between zero and one, and sum to one for each row in the batch of predictions. Our targets are integers between 0 and 36.\n",
"Just as we moved from sigmoid to softmax, we need to extend the loss function to work with more than just binary classification--it needs to be able to classify any number of categories (in this case, we have 37 categories). Our activations, after softmax, are between 0 and 1, and sum to 1 for each row in the batch of predictions. Our targets are integers between 0 and 36.\n",
"\n",
"In the binary case, we used `torch.where` to select between `inputs` and `1-inputs`. When we treat a binary classification as a general classification problem with two categories, it actually becomes even easier, because (as we saw in the softmax section) we now have two columns, containing the equivalent of `inputs` and `1-inputs`. So all we need to do is select from the appropriate column. Let's try to implement this in PyTorch. For our synthetic \"3\"s and \"7\" example, let's say these are our labels:"
"In the binary case, we used `torch.where` to select between `inputs` and `1-inputs`. When we treat a binary classification as a general classification problem with two categories, it actually becomes even easier, because (as we saw in the previous section) we now have two columns, containing the equivalent of `inputs` and `1-inputs`. So, all we need to do is select from the appropriate column. Let's try to implement this in PyTorch. For our synthetic 3s and 7s example, let's say these are our labels:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"...and these are the softmax activations:"
"and these are the softmax activations:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then for each item of `targ` we can use that to select that column of `sm_acts` using tensor indexing, like so:"
"Then for each item of `targ` we can use that to select the appropriate column of `sm_acts` using tensor indexing, like so:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Looking at this table, you can see that the final column can be calculated by taking the `targ` and `idx` columns as indices into the 2-column matrix containing the `3` and `7` columns. That's what `sm_acts[idx, targ]` is actually doing.\n",
"Looking at this table, you can see that the final column can be calculated by taking the `targ` and `idx` columns as indices into the two-column matrix containing the `3` and `7` columns. That's what `sm_acts[idx, targ]` is actually doing.\n",
"\n",
"The really interesting thing here is that this actually works just as well with more than two columns. To see this, consider what would happen if we added an activation column above for every digit (zero through nine), and then `targ` contained a number from zero to nine. As long as the activation columns sum to one (as they will, if we use softmax), then we'll have a loss function that shows how well we're predicting each digit.\n",
"The really interesting thing here is that this actually works just as well with more than two columns. To see this, consider what would happen if we added an activation column for every digit (0 through 9), and then `targ` contained a number from 0 to 9. As long as the activation columns sum to 1 (as they will, if we use softmax), then we'll have a loss function that shows how well we're predicting each digit.\n",
"\n",
"We're only picking the loss from the column containing the correct label. We don't need to consider the other columns, because by the definition of softmax, they add up to one minus the activation corresponding to the correct label. Therefore, making the activation for the correct label as high as possible, must mean we're also decreasing the activations of the remaining columns.\n",
"We're only picking the loss from the column containing the correct label. We don't need to consider the other columns, because by the definition of softmax, they add up to 1 minus the activation corresponding to the correct label. Therefore, making the activation for the correct label as high as possible must mean we're also decreasing the activations of the remaining columns.\n",
"\n",
"PyTorch provides a function that does exactly the same thing as `sm_acts[range(n), targ]` (except it takes the negative, because when applying the log afterward, we will have negative numbers), called `nll_loss` (*NLL* stands for *negative log likelihood*):"
]
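The indexing trick just described can be sketched directly in PyTorch. The activations and targets below are made-up values for illustration, not the book's actual data:

```python
import torch

# Hypothetical softmax outputs for four images over two classes ("3" and "7").
sm_acts = torch.tensor([[0.60, 0.40],
                        [0.49, 0.51],
                        [0.13, 0.87],
                        [0.99, 0.01]])
targ = torch.tensor([0, 1, 0, 1])  # index of the correct class for each image
idx = range(len(targ))             # row indices 0, 1, 2, 3

# Integer-array indexing: for each row i, pick column targ[i].
print(sm_acts[idx, targ])  # tensor([0.6000, 0.5100, 0.1300, 0.0100])
```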
@ -1208,21 +1210,21 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Despite the name being negative log likelihood, this PyTorch function does not take the log (we will see why in the next section). First, let's see why taking the logarithm can be useful."
"Despite its name, this PyTorch function does not take the log (we'll see why shortly). But first, let's see why taking the logarithm can be useful."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Taking the `log`"
"### Taking the Log"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This does work quite well as a loss function, but we can make it a bit better. The problem is that we are using probabilities, and probabilities cannot be smaller than zero, or greater than one. But that means that our model will not care about whether it predicts 0.99 versus 0.999, because those numbers are so close together. But in another sense, 0.999 is 10 times more confident than 0.99. So we wish to transform our numbers between zero and one to instead be between negative infinity and infinity. There is a function available in maths which does exactly this: the logarithm (available as `torch.log`). It is not defined for numbers less than zero, and looks like this:"
"The function we saw in the previous section works quite well as a loss function, but we can make it a bit better. The problem is that we are using probabilities, and probabilities cannot be smaller than 0 or greater than 1. That means that our model will not care whether it predicts 0.99 or 0.999. Indeed, those numbers are so close together--but in another sense, 0.999 is 10 times more confident than 0.99. So, we want to transform our numbers between 0 and 1 to instead be between negative infinity and infinity. There is a mathematical function that does exactly this: the *logarithm* (available as `torch.log`). It is not defined for numbers less than 0, and looks like this:"
]
},
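To see concretely what the logarithm buys us here, a quick sketch in plain Python (no PyTorch needed):

```python
import math

# On a linear scale each of these probabilities looks like "nearly 1,"
# but each step represents a tenfold drop in the chance of being wrong,
# and the log reflects that:
for p in (0.9, 0.99, 0.999, 0.9999):
    print(f"p={p:<6}  log(p)={math.log(p):.6f}")
```

Each step toward 1 shrinks the magnitude of `log(p)` by roughly a factor of 10, so a model trained on log probabilities is still rewarded for pushing 0.99 up to 0.999, even though the linear difference is tiny.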
{
@ -1260,11 +1262,11 @@
"\n",
"In this case, we're assuming that `log(y,b)` returns *log y base b*. However, PyTorch actually doesn't define `log` this way: `log` in Python uses the special number `e` (2.718...) as the base.\n",
"\n",
"Perhaps a logarithm is something that you have not thought about for the last 20 years or so. But it's a mathematical idea which is going to be really critical for many things in deep learning, so now would be a great time to refresh your memory. The key thing to know about logarithms is this relationship:\n",
"Perhaps a logarithm is something that you have not thought about for the last 20 years or so. But it's a mathematical idea that is going to be really critical for many things in deep learning, so now would be a great time to refresh your memory. The key thing to know about logarithms is this relationship:\n",
"\n",
" log(a*b) = log(a)+log(b)\n",
"\n",
"When we see it in that format, it looks a bit boring; but have a think about what this really means. It means that logarithms increase linearly when the underlying signal increases exponentially or multiplicatively. This is used for instance in the Richter scale of earthquake severity, and the dB scale of noise levels. It's also often used on financial charts, where we want to show compound growth rates more clearly. Computer scientists love using logarithms, because it means that modification, which can create really really large and really really small numbers, can be replaced by addition, which is much less likely to result in scales which are difficult for our computer to handle."
"When we see it in that format, it looks a bit boring; but think about what this really means. It means that logarithms increase linearly when the underlying signal increases exponentially or multiplicatively. This is used, for instance, in the Richter scale of earthquake severity, and the dB scale of noise levels. It's also often used on financial charts, where we want to show compound growth rates more clearly. Computer scientists love using logarithms, because it means that multiplication, which can create really, really large and really, really small numbers, can be replaced by addition, which is much less likely to result in scales that are difficult for our computers to handle."
]
},
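The identity is easy to verify numerically, and its practical payoff, replacing an underflow-prone product with a sum, can be sketched like this:

```python
import math

# The key identity: the log of a product is the sum of the logs.
a, b = 3.0, 7.0
assert math.isclose(math.log(a * b), math.log(a) + math.log(b))

# The payoff: a product of many small probabilities underflows to 0.0,
# but the equivalent sum of logs stays perfectly representable.
probs = [0.01] * 200
product = math.prod(probs)                 # underflows to 0.0
log_sum = sum(math.log(p) for p in probs)  # about -921, no problem
print(product, log_sum)
```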
{
@ -1285,14 +1287,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"> warning: The \"NLL\" in \"nll_loss\" stands for \"negative log likelihood\", but it doesn't actually take the log at all! It assumes you have _already_ taken the log. PyTorch has a function called \"log_softmax\" which combines \"log\" and \"softmax\" in a fast and accurate way."
"> warning: Confusing Name, Beware: The nll in `nll_loss` stands for \"negative log likelihood,\" but it doesn't actually take the log at all! It assumes you have _already_ taken the log. PyTorch has a function called `log_softmax` that combines `log` and `softmax` in a fast and accurate way. `nll_loss` is designed to be used after `log_softmax`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When we first take the softmax, and then the log likelihood of that, that combination is called *cross entropy loss*. In PyTorch, this is available as `nn.CrossEntropyLoss` (which, in practice, actually does `log_softmax` and then `nll_loss`)."
"When we first take the softmax, and then the log likelihood of that, that combination is called *cross-entropy loss*. In PyTorch, this is available as `nn.CrossEntropyLoss` (which, in practice, actually does `log_softmax` and then `nll_loss`):"
]
},
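We can check this equivalence numerically with the functional forms; the activations and targets below are random, made-up values:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(42)
acts = torch.randn(3, 5)        # made-up activations: 3 items, 5 classes
targ = torch.tensor([0, 3, 4])  # made-up targets

# cross_entropy is log_softmax followed by nll_loss:
loss_combined = F.cross_entropy(acts, targ)
loss_two_step = F.nll_loss(F.log_softmax(acts, dim=1), targ)
print(torch.allclose(loss_combined, loss_two_step))  # True
```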
{
@ -1335,7 +1337,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"All PyTorch loss functions are provided in two forms: the class form seen above, and also a plain functional form, available in the `F` namespace:"
"All PyTorch loss functions are provided in two forms: the class just shown above, and also a plain functional form, available in the `F` namespace:"
]
},
{
@ -1362,7 +1364,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Either one works fine and can be used in any situation. We've noticed that most people tend to use the class version, and that's more often used in PyTorch official docs and examples, so we'll tend to use that too.\n",
"Either one works fine and can be used in any situation. We've noticed that most people tend to use the class version, and that's more often used in PyTorch's official docs and examples, so we'll tend to use that too.\n",
"\n",
"By default PyTorch loss functions take the mean of the loss of all items. You can use `reduction='none'` to disable that:"
]
@ -1391,14 +1393,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"> s: An interesting feature about cross entropy loss appears when we consider its gradient. The gradient of `cross_entropy(a,b)` is just `softmax(a)-b`. Since `softmax(a)` is just the final activation of the model, that means that the gradient is proportional to the difference between the prediction and the target. This is the same as mean squared error in regression (assuming there's no final activation function such as that added by `y_range`), since the gradient of `(a-b)**2` is `2*(a-b)`. Since the gradient is linear, that means that we won't see sudden jumps or exponential increases in gradients, which should lead to smoother training of models."
"> s: An interesting feature about cross-entropy loss appears when we consider its gradient. The gradient of `cross_entropy(a,b)` is just `softmax(a)-b`. Since `softmax(a)` is just the final activation of the model, that means that the gradient is proportional to the difference between the prediction and the target. This is the same as mean squared error in regression (assuming there's no final activation function such as that added by `y_range`), since the gradient of `(a-b)**2` is `2*(a-b)`. Because the gradient is linear, that means we won't see sudden jumps or exponential increases in gradients, which should lead to smoother training of models."
]
},
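This claim about the gradient can be verified with a few lines of autograd; everything here is a made-up example rather than book code:

```python
import torch
import torch.nn.functional as F

a = torch.randn(4, 3, requires_grad=True)  # activations: 4 items, 3 classes
targ = torch.tensor([0, 2, 1, 2])

F.cross_entropy(a, targ).backward()  # reduction='mean' by default

# The gradient should be softmax(a) minus the one-hot target, divided by
# the batch size (the division comes from the mean reduction):
onehot = F.one_hot(targ, num_classes=3).float()
expected = (F.softmax(a.detach(), dim=1) - onehot) / len(targ)
print(torch.allclose(a.grad, expected, atol=1e-6))  # True
```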
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have now seen all the pieces hidden behind our loss function. While it gives us a number on how well (or bad) our model is doing, it does nothing to help us know if it's actually any good. Let's now see some ways to interpret our model predictions."
"We have now seen all the pieces hidden behind our loss function. But while this puts a number on how well (or badly) our model is doing, it does nothing to help us know if it's actually any good. Let's now see some ways to interpret our model's predictions."
]
},
{
@ -1412,7 +1414,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"It's very hard to interpret loss functions directly, because they are designed to be things which computers can differentiate and optimise, not things that people can understand. That's why we have metrics. These are not used in the optimisation process, but just used to help us poor humans understand what's going on. In this case, our accuracy is looking pretty good already! So where are we making mistakes?\n",
"It's very hard to interpret loss functions directly, because they are designed to be things computers can differentiate and optimize, not things that people can understand. That's why we have metrics. These are not used in the optimization process, but just to help us poor humans understand what's going on. In this case, our accuracy is looking pretty good already! So where are we making mistakes?\n",
"\n",
"We saw in <<chapter_intro>> that we can use a confusion matrix to see where our model is doing well, and where it's doing badly:"
]
@ -1455,7 +1457,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Oh dear, in this case, a confusion matrix is very hard to read. We have 37 different breeds of pet, which means we have 37×37 entries in this giant matrix! Instead, we can use the `most_confused` method, which just shows us the cells of the confusion matrix with the most incorrect predictions (here with at least 5 or more):"
"Oh dear--in this case, a confusion matrix is very hard to read. We have 37 different breeds of pet, which means we have 37×37 entries in this giant matrix! Instead, we can use the `most_confused` method, which just shows us the cells of the confusion matrix with the most incorrect predictions (here, with at least 5 incorrect predictions):"
]
},
{
@ -1486,16 +1488,16 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Since we are not pet breed experts, it is hard for us to know whether these category errors reflect actual difficulties in recognising breeds. So again, we turn to Google. A little bit of googling tells us that the most common category errors shown here are actually breed differences which even expert breeders sometimes disagree about. So this gives us some comfort that we are on the right track.\n",
"Since we are not pet breed experts, it is hard for us to know whether these category errors reflect actual difficulties in recognizing breeds. So again, we turn to Google. A little bit of Googling tells us that the most common category errors shown here are actually breed differences that even expert breeders sometimes disagree about. So this gives us some comfort that we are on the right track.\n",
"\n",
"So we seem to have a good baseline. What can we do now to make it even better?"
"We seem to have a good baseline. What can we do now to make it even better?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Improving our model"
"## Improving Our Model"
]
},
{
@ -1504,21 +1506,21 @@
"source": [
"We will now look at a range of techniques to improve the training of our model and make it better. While doing so, we will explain a little bit more about transfer learning and how to fine-tune our pretrained model as best as possible, without breaking the pretrained weights.\n",
"\n",
"The first thing we need to set when training a model is the learning rate. We saw in the previous chapter that it needed to be just right to train as efficiently as possible, so how do we pick a good one? fastai provides something called the Learning rate finder for this."
"The first thing we need to set when training a model is the learning rate. We saw in the previous chapter that it needs to be just right to train as efficiently as possible, so how do we pick a good one? fastai provides a tool for this."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Learning rate finder"
"### The Learning Rate Finder"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One of the most important things we can do when training a model is to make sure that we have the right learning rate. If our learning rate is too low, it can take many many epochs. Not only does this waste time, but it also means that we may have problems with overfitting, because every time we do a complete pass through the data, we give our model a chance to memorise it.\n",
"One of the most important things we can do when training a model is to make sure that we have the right learning rate. If our learning rate is too low, it can take many, many epochs to train our model. Not only does this waste time, but it also means that we may have problems with overfitting, because every time we do a complete pass through the data, we give our model a chance to memorize it.\n",
"\n",
"So let's just make our learning rate really high, right? Sure, let's try that and see what happens:"
]
@ -1600,14 +1602,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"That did not look good. Here's what happened. The optimiser stepped in the correct direction, but it stepped so far that it totally overshot the minimum loss. Repeating that multiple times makes it get further and further away, not closer and closer!\n",
"That doesn't look good. Here's what happened. The optimizer stepped in the correct direction, but it stepped so far that it totally overshot the minimum loss. Repeating that multiple times makes it get further and further away, not closer and closer!\n",
"\n",
"What do we do to find the perfect learning rate, not too high, and not too low? In 2015 the researcher Leslie Smith came up with a brilliant idea, called the *learning rate finder*. His idea was to start with a very very small learning rate, something so small that we would never expect it to be too big to handle. We use that for one mini batch, find what the losses are afterwards, and then increase the learning rate by some percentage (e.g. doubling it each time). Then we do another mini batch, track the loss, and double the learning rate again. We keep doing this until the loss gets worse, instead of better. This is the point where we know we have gone too far. We then select a learning rate a bit lower than this point. Our advice is to pick either:\n",
"What do we do to find the perfect learning rate--not too high, and not too low? In 2015 the researcher Leslie Smith came up with a brilliant idea, called the *learning rate finder*. His idea was to start with a very, very small learning rate, something so small that we would never expect it to be too big to handle. We use that for one mini-batch, find what the losses are afterwards, and then increase the learning rate by some percentage (e.g., doubling it each time). Then we do another mini-batch, track the loss, and double the learning rate again. We keep doing this until the loss gets worse, instead of better. This is the point where we know we have gone too far. We then select a learning rate a bit lower than this point. Our advice is to pick either:\n",
"\n",
"- one order of magnitude less than where the minimum loss was achieved (i.e. the minimum divided by 10)\n",
"- the last point where the loss was clearly decreasing. \n",
"- One order of magnitude less than where the minimum loss was achieved (i.e., the minimum divided by 10)\n",
"- The last point where the loss was clearly decreasing \n",
"\n",
"The Learning Rate Finder computes those points on the curve to help you. Both these rules usually give around the same value. In the first chapter, we didn't specify a learning rate, using the default value from the fastai library (which is 1e-3)."
"The learning rate finder computes those points on the curve to help you. Both these rules usually give around the same value. In the first chapter, we didn't specify a learning rate, using the default value from the fastai library (which is 1e-3):"
]
},
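The procedure just described can be sketched in a few lines. This is a toy illustration of the idea, not fastai's actual implementation (`learn.lr_find()` does the real work for you):

```python
import math

def lr_find_sketch(loss_at_lr, start_lr=1e-7, mult=2.0, max_lr=10.0):
    "Grow the LR each mini-batch, record the loss, stop when it blows up."
    lr, history, best = start_lr, [], float("inf")
    while lr < max_lr:
        loss = loss_at_lr(lr)    # stand-in for training one mini-batch
        history.append((lr, loss))
        best = min(best, loss)
        if loss > 4 * best:      # loss clearly diverging: we've gone too far
            break
        lr *= mult
    lr_at_min = min(history, key=lambda t: t[1])[0]
    return lr_at_min / 10        # rule of thumb: minimum divided by 10

# A made-up loss curve whose minimum sits at lr=1e-2:
suggested = lr_find_sketch(lambda lr: (math.log10(lr) + 2) ** 2)
print(f"{suggested:.2g}")        # about 1.3e-3
```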
{
@ -1664,16 +1666,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see on this plot that in the range 1e-6 to 1e-3, nothing really happens and the model doesn't train. Then the loss starts to decrease until it reaches a minimum and then increases again. We don't want a learning rate greater than 1e-1 as it will give a training that diverges (you can try for yourself) but 1e-1 is already too high: at this stage we left the period where the loss was decreasing steadily.\n",
"We can see on this plot that in the range 1e-6 to 1e-3, nothing really happens and the model doesn't train. Then the loss starts to decrease until it reaches a minimum, and then increases again. We don't want a learning rate greater than 1e-1, as that will give us training that diverges like the one before (you can try for yourself), but 1e-1 is already too high: at this stage we've left the period where the loss was decreasing steadily.\n",
"\n",
"In this learning rate plot it appears that a learning rate around 3e-3 would be appropriate, so let's choose that."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> Note: The learning rate finder plot has a logarithmic scale, which is why the middle point between 1e-3 and 1e-2 is between 3e-3 and 4e-3. This is because we care mostly about the order of magnitude of the learning rate."
"In this learning rate plot it appears that a learning rate around 3e-3 would be appropriate, so let's choose that:"
]
},
{
@ -1760,38 +1755,45 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Something really interesting about the learning rate finder is that it was only discovered in 2015. Neural networks have been under development since the 1950s. Throughout that time finding a good learning rate has been, perhaps, the most important and challenging issue for practitioners. The idea does not require any advanced maths, giant computing resources, huge datasets, or anything else that would make it inaccessible to any curious researcher. Furthermore, Leslie Smith, was not part of some exclusive Silicon Valley lab, but was working as a naval researcher. All of this is to say: breakthrough work in deep learning absolutely does not require access to vast resources, elite teams, or advanced mathematical ideas. There is lots of work still to be done which requires just a bit of common sense, creativity, and tenacity."
"> Note: Logarithmic Scale: The learning rate finder plot has a logarithmic scale, which is why the middle point between 1e-3 and 1e-2 is between 3e-3 and 4e-3. This is because we care mostly about the order of magnitude of the learning rate."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have a good learning rate to train our model, let's look at how we can finetune the weights of a pretrained model."
"It's interesting that the learning rate finder was only discovered in 2015, while neural networks have been under development since the 1950s. Throughout that time finding a good learning rate has been, perhaps, the most important and challenging issue for practitioners. The solution does not require any advanced maths, giant computing resources, huge datasets, or anything else that would make it inaccessible to any curious researcher. Furthermore, Leslie Smith was not part of some exclusive Silicon Valley lab, but was working as a naval researcher. All of this is to say: breakthrough work in deep learning absolutely does not require access to vast resources, elite teams, or advanced mathematical ideas. There is lots of work still to be done that requires just a bit of common sense, creativity, and tenacity."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Unfreezing and transfer learning"
"Now that we have a good learning rate to train our model, let's look at how we can fine-tune the weights of a pretrained model."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We discussed briefly in <<chapter_intro>> how transfer learning works. We saw that the basic idea is that a pretrained model, trained potentially on millions of data points (such as ImageNet), is fine tuned for some other task. But what does this really mean?\n",
"### Unfreezing and Transfer Learning"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We discussed briefly in <<chapter_intro>> how transfer learning works. We saw that the basic idea is that a pretrained model, trained potentially on millions of data points (such as ImageNet), is fine-tuned for some other task. But what does this really mean?\n",
"\n",
"We now know that a convolutional neural network consists of many layers with a non-linear activation function between each and one or more final linear layers, with an activation function such as softmax at the very end. The final linear layer uses a matrix with enough columns such that the output size is the same as the number of classes in our model (assuming that we are doing classification).\n",
"We now know that a convolutional neural network consists of many linear layers with a nonlinear activation function between each pair, followed by one or more final linear layers with an activation function such as softmax at the very end. The final linear layer uses a matrix with enough columns such that the output size is the same as the number of classes in our model (assuming that we are doing classification).\n",
"\n",
"This final linear layer is unlikely to be of any use for us, when we are fine tuning in a transfer learning setting, because it is specifically designed to classify the categories in the original pretraining dataset. So when we do transfer learning we remove it, throw it away, and replace it with a new linear layer with the correct number of outputs for our desired task (in this case, there would be 37 activations).\n",
"This final linear layer is unlikely to be of any use for us when we are fine-tuning in a transfer learning setting, because it is specifically designed to classify the categories in the original pretraining dataset. So when we do transfer learning we remove it, throw it away, and replace it with a new linear layer with the correct number of outputs for our desired task (in this case, there would be 37 activations).\n",
"\n",
"This newly added linear layer will have entirely random weights. Therefore, our model prior to fine tuning has entirely random outputs. But that does not mean that it is an entirely random model! All of the layers prior to the last one have been carefully trained to be good at image classification tasks in general. As we saw in the images from the Zeiler and Fergus paper in <<chapter_intro>> (see <<img_layer1>> and followings), the first few layers encode very general concepts such as finding gradients and edges, and later layers encode concepts that are still very useful for us, such as finding eyeballs and fur.\n",
"This newly added linear layer will have entirely random weights. Therefore, our model prior to fine-tuning has entirely random outputs. But that does not mean that it is an entirely random model! All of the layers prior to the last one have been carefully trained to be good at image classification tasks in general. As we saw in the images from the [Zeiler and Fergus paper](https://arxiv.org/pdf/1311.2901.pdf) in <<chapter_intro>> (see <<img_layer1>> through <<img_layer4>>), the first few layers encode very general concepts, such as finding gradients and edges, and later layers encode concepts that are still very useful for us, such as finding eyeballs and fur.\n",
"\n",
"We want to train a model in such a way that we allow it to remember all of these generally useful ideas from the pretrained model, use them to solve our particular task (classify pet breeds), and only adjust them as required for the specifics of our particular task.\n",
"\n",
"Our challenge when fine tuning is to replace the random weights in our added linear layers with weights that correctly achieve our desired task (classifying pet breeds) without breaking the carefully pretrained weights and the other layers. There is actually a very simple trick to allow this to happen: tell the optimiser to only update the weights in those randomly added final layers. Don't change the weights in the rest of the neural network at all. This is called *freezing* those pretrained layers."
"Our challenge when fine-tuning is to replace the random weights in our added linear layers with weights that correctly achieve our desired task (classifying pet breeds) without breaking the carefully pretrained weights and the other layers. There is actually a very simple trick to allow this to happen: tell the optimizer to only update the weights in those randomly added final layers. Don't change the weights in the rest of the neural network at all. This is called *freezing* those pretrained layers."
]
},
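Under the hood, freezing is just a matter of switching off gradient tracking for the pretrained parameters. fastai handles this for you, but the underlying PyTorch mechanism can be sketched with a made-up model (the layer shapes here are arbitrary):

```python
import torch.nn as nn

# A stand-in "pretrained body" plus a freshly added random head (37 breeds).
body = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Flatten())
head = nn.Linear(8 * 26 * 26, 37)
model = nn.Sequential(body, head)

for p in body.parameters():
    p.requires_grad = False  # frozen: the optimizer will not update these

trainable = [p for p in model.parameters() if p.requires_grad]
print(len(trainable))        # 2 -- just the head's weight and bias
```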
{
@ -1800,10 +1802,10 @@
"source": [
"When we create a model from a pretrained network, fastai automatically freezes all of the pretrained layers for us. When we call the `fine_tune` method, fastai does two things:\n",
"\n",
"- train the randomly added layers for one epoch, with all other layers frozen ;\n",
"- unfreeze all of the layers, and train them all for the number of epochs requested.\n",
"- Trains the randomly added layers for one epoch, with all other layers frozen\n",
"- Unfreezes all of the layers, and trains them all for the number of epochs requested\n",
"\n",
"Although this is a reasonable default approach, it is likely that for your particular dataset you may get better results by doing things slightly differently. The `fine_tune` method has a number of parameters you can use to change its behaviour, but it might be easiest for you to just call the underlying methods directly if you want to get some custom behavior. Remember that you can see the source code for the method by using the following syntax:\n",
"Although this is a reasonable default approach, it is likely that for your particular dataset you may get better results by doing things slightly differently. The `fine_tune` method has a number of parameters you can use to change its behavior, but it might be easiest for you to just call the underlying methods directly if you want to get some custom behavior. Remember that you can see the source code for the method by using the following syntax:\n",
"\n",
" learn.fine_tune??\n",
"\n",
@ -1879,7 +1881,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"And then we will unfreeze the model:"
"Then we'll unfreeze the model:"
]
},
{
@ -1895,7 +1897,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"...and run `lr_find` again, because having more layers to train, and weights that have already been trained for 3 epochs, means our previously found learning rate isn't appropriate any more:"
"and run `lr_find` again, because having more layers to train, and weights that have already been trained for three epochs, means our previously found learning rate isn't appropriate any more:"
]
},
{
@ -1944,7 +1946,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that the graph is a little different from when we had random weights: we don't have that sharp descent that indicates the model is training. That's because our model has been trained already. Here we have a somewhat flat area before a sharp increase, and we should take a point well before that sharp increase, for instance 1e-5. The point with the maximum gradient isn't what we look for here and should be ignored.\n",
"Note that the graph is a little different from when we had random weights: we don't have that sharp descent that indicates the model is training. That's because our model has been trained already. Here we have a somewhat flat area before a sharp increase, and we should take a point well before that sharp increase--for instance, 1e-5. The point with the maximum gradient isn't what we look for here and should be ignored.\n",
"\n",
"Let's train at a suitable learning rate:"
]
@ -2029,39 +2031,39 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This has improved our model a bit, but there's more we can do. The deepest layers of our pretrained model might not need as high a learning rate as the last ones, so we should probably use different learning rates for those, something called discriminative learning rates."
"This has improved our model a bit, but there's more we can do. The deepest layers of our pretrained model might not need as high a learning rate as the last ones, so we should probably use different learning rates for those--this is known as using *discriminative learning rates*."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Discriminative learning rates"
"### Discriminative Learning Rates"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Even after we unfreeze, we still care a lot about the quality of those pretrained weights. We would not expect that the best learning rate for those pretrained parameters would be as high as the randomly added parameters — even after we have tuned those randomly added parameters for a few epochs. Remember, the pretrained weights have been trained for hundreds of epochs, on millions of images.\n",
"Even after we unfreeze, we still care a lot about the quality of those pretrained weights. We would not expect that the best learning rate for those pretrained parameters would be as high as for the randomly added parameters, even after we have tuned those randomly added parameters for a few epochs. Remember, the pretrained weights have been trained for hundreds of epochs, on millions of images.\n",
"\n",
"In addition, do you remember the images we saw in <<chapter_intro>>, showing what each layer learns? The first layer learns very simple foundations, like edge and gradient detectors; these are likely to be just as useful for nearly any task. The later layers learn much more complex concepts, like \"eye\" and \"sunset\", which might not be useful in your task at all (maybe you're classifying car models, for instance). So it makes sense to let the later layers fine-tune more quickly than earlier layers.\n",
"In addition, do you remember the images we saw in <<chapter_intro>>, showing what each layer learns? The first layer learns very simple foundations, like edge and gradient detectors; these are likely to be just as useful for nearly any task. The later layers learn much more complex concepts, like \"eye\" and \"sunset,\" which might not be useful in your task at all (maybe you're classifying car models, for instance). So it makes sense to let the later layers fine-tune more quickly than earlier layers.\n",
"\n",
"Therefore, fastai by default does something called *discriminative learning rates*. This was originally developed in the ULMFiT approach to NLP transfer learning that we introduced in <<chapter_intro>>. Like many good ideas in deep learning, it is extremely simple: use a lower learning rate for the early layers of the neural network, and a higher learning rate for the later layers (and especially the randomly added layers). The idea is based on insights developed by Jason Yosinski, who showed in 2014 that when transfer learning different layers of a neural network should train at different speeds, as seen in <<yosinski>>."
"Therefore, fastai's default approach is to use discriminative learning rates. This was originally developed in the ULMFiT approach to NLP transfer learning that we will introduce in <<chapter_nlp>>. Like many good ideas in deep learning, it is extremely simple: use a lower learning rate for the early layers of the neural network, and a higher learning rate for the later layers (and especially the randomly added layers). The idea is based on insights developed by [Jason Yosinski](https://arxiv.org/abs/1411.1792), who showed in 2014 that with transfer learning, different layers of a neural network should train at different speeds, as seen in <<yosinski>>."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"Impact of different layers and training methods on transfer learning (Yosinski)\" width=\"680\" caption=\"Impact of different layers and training methods on transfer learning (curtesy of Jason Yosinski)\" id=\"yosinski\" src=\"images/att_00039.png\">"
"<img alt=\"Impact of different layers and training methods on transfer learning (Yosinski)\" width=\"680\" caption=\"Impact of different layers and training methods on transfer learning (courtesy of Jason Yosinski et al.)\" id=\"yosinski\" src=\"images/att_00039.png\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Fastai lets you pass a Python *slice* object anywhere that a learning rate is expected. The first value past will be the learning rate in the earliest layer of the neural network, and the second value will be the learning rate in the final layer. The layers in between will have learning rates that are multiplicatively equidistant throughout that range. Let's use this approach to replicate the previous training, but this time we'll only set the *lowest* layer of our net to a learning rate of `1e-6`; the other layers will scale up to `1e-4`. Let's train for a while and see what happens."
"Fastai lets you pass a Python `slice` object anywhere that a learning rate is expected. The first value passed will be the learning rate in the earliest layer of the neural network, and the second value will be the learning rate in the final layer. The layers in between will have learning rates that are multiplicatively equidistant throughout that range. Let's use this approach to replicate the previous training, but this time we'll only set the *lowest* layer of our net to a learning rate of 1e-6; the other layers will scale up to 1e-4. Let's train for a while and see what happens:"
]
},
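To make "multiplicatively equidistant" concrete, here is a small plain-Python sketch of the spacing (our own illustration of the idea, not fastai's internal implementation; the function name `discriminative_lrs` is made up):

```python
def discriminative_lrs(lr_min, lr_max, n_groups):
    """Geometrically spaced learning rates, from the earliest layer
    group (lr_min) up to the final layer group (lr_max)."""
    # Each group's learning rate is a constant multiple of the previous one
    ratio = (lr_max / lr_min) ** (1 / (n_groups - 1))
    return [lr_min * ratio ** i for i in range(n_groups)]

# With slice(1e-6, 1e-4) and three layer groups, the middle group
# lands on the geometric midpoint, 1e-5
print(discriminative_lrs(1e-6, 1e-4, 3))
```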
{
@ -2234,7 +2236,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now the fine tuning is working great!\n",
"Now the fine-tuning is working great!\n",
"\n",
"Fastai can show us a graph of the training and validation loss:"
]
@ -2265,64 +2267,64 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, the training loss keeps getting better and better. But notice that eventually the validation loss improvement slows, and sometimes even gets worse! This is the point at which the model is starting to over fit. In particular, the model is becoming overconfident of its predictions. But this does *not* mean that it is getting less accurate, necessarily. Have a look at the table of training results per epoch, and you will often see that the accuracy continues improving, even as the validation loss gets worse. In the end what matters is your accuracy, or more generally your chosen metrics, not the loss. The loss is just the function we've given the computer to help us to optimise."
"As you can see, the training loss keeps getting better and better. But notice that eventually the validation loss improvement slows, and sometimes even gets worse! This is the point at which the model is starting to over fit. In particular, the model is becoming overconfident of its predictions. But this does *not* mean that it is getting less accurate, necessarily. Take a look at the table of training results per epoch, and you will often see that the accuracy continues improving, even as the validation loss gets worse. In the end what matters is your accuracy, or more generally your chosen metrics, not the loss. The loss is just the function we've given the computer to help us to optimize."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another decision you have to make when training the model is for how long."
"Another decision you have to make when training the model is for how long to train for. We'll consider that next."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Selecting the number of epochs"
"### Selecting the Number of Epochs"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Often you will find that you are limited by time, rather than generalisation and accuracy, when choosing how many epochs to train for. So your first approach to training should be to simply pick a number of epochs that will train in the amount of time that you are happy to wait for. Have a look at the training and validation loss plots, like showed above, and in particular your metrics, and if you see that they are still getting better even in your final epochs, then you know that you have not trained for too long.\n",
"Often you will find that you are limited by time, rather than generalization and accuracy, when choosing how many epochs to train for. So your first approach to training should be to simply pick a number of epochs that will train in the amount of time that you are happy to wait for. Then look at the training and validation loss plots, as shown above, and in particular your metrics, and if you see that they are still getting better even in your final epochs, then you know that you have not trained for too long.\n",
"\n",
"On the other hand, you may well see that the metrics you have chosen are really getting worse at the end of training. Remember, it's not just that we're looking for the validation loss to get worse, but your actual metrics. Your validation loss will first of all during training get worse because it gets overconfident, and only later will get worse because it is incorrectly memorising the data. We only care in practice about the latter issue. Our loss function is just something, remember, that we used to allow our optimiser to have something it could differentiate and optimise; it's not actually the thing we care about in practice.\n",
"On the other hand, you may well see that the metrics you have chosen are really getting worse at the end of training. Remember, it's not just that we're looking for the validation loss to get worse, but the actual metrics. Your validation loss will first get worse during training because the model gets overconfident, and only later will get worse because it is incorrectly memorizing the data. We only care in practice about the latter issue. Remember, our loss function is just something that we use to allow our optimizer to have something it can differentiate and optimize; it's not actually the thing we care about in practice.\n",
"\n",
"Before the days of 1cycle training it was very common to save the model at the end of each epoch, and then select whichever model had the best accuracy, out of all of the models saved in each epoch. This is known as *early stopping*. However, with one cycle training, it is very unlikely to give you the best answer, because those epochs in the middle occur before the learning rate has had a chance to reach the small values, where it can really find the best result. Therefore, if you find that you have overfit, what you should actually do is to retrain your model from scratch, and this time select a total number of epochs based on where your previous best results were found.\n",
"Before the days of 1cycle training it was very common to save the model at the end of each epoch, and then select whichever model had the best accuracy out of all of the models saved in each epoch. This is known as *early stopping*. However, this is very unlikely to give you the best answer, because those epochs in the middle occur before the learning rate has had a chance to reach the small values, where it can really find the best result. Therefore, if you find that you have overfit, what you should actually do is retrain your model from scratch, and this time select a total number of epochs based on where your previous best results were found.\n",
"\n",
"If we've got the time to train for more epochs, we may want to instead use that time to train more parameters, that is use a deeper architecture."
"If you have the time to train for more epochs, you may want to instead use that time to train more parameters--that is, use a deeper architecture."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deeper architectures"
"### Deeper Architectures"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In general, a model with more parameters can model your data more accurately. (There are lots and lots of caveats to this generalisation, and it depends on the specifics of the architectures you are using, but it is a reasonable rule of thumb for now.) For most of the architectures that we will be seeing in this book you can create larger versions of them by simply adding more layers. However, since we want to use pretrained models, we need to make sure that we choose a number of layers that has been already pretrained for us.\n",
"In general, a model with more parameters can model your data more accurately. (There are lots and lots of caveats to this generalization, and it depends on the specifics of the architectures you are using, but it is a reasonable rule of thumb for now.) For most of the architectures that we will be seeing in this book, you can create larger versions of them by simply adding more layers. However, since we want to use pretrained models, we need to make sure that we choose a number of layers that have already been pretrained for us.\n",
"\n",
"This is why, in practice, architectures tend to come in a small number of variants. For instance, the resnet architecture that we are using in this chapter comes in 18, 34, 50, 101, and 152 layer variants, pre-trained on ImageNet. A larger (more layers and parameters; sometimes described as the \"capacity\" of a model) version of a resnet will always be able to give us a better training loss, but it can suffer more from overfitting, because it has more parameters to over fit with.\n",
"This is why, in practice, architectures tend to come in a small number of variants. For instance, the ResNet architecture that we are using in this chapter comes in variants with 18, 34, 50, 101, and 152 layer, pretrained on ImageNet. A larger (more layers and parameters; sometimes described as the \"capacity\" of a model) version of a ResNet will always be able to give us a better training loss, but it can suffer more from overfitting, because it has more parameters to overfit with.\n",
"\n",
"In general, a bigger model has the ability to better capture the real underlying relationships in your data, and also to capture and memorise the specific details of your individual images.\n",
"In general, a bigger model has the ability to better capture the real underlying relationships in your data, and also to capture and memorize the specific details of your individual images.\n",
"\n",
"However, using a deeper model is going to require more GPU RAM, so we may need to lower the size of our batches to avoid *out-of-memory errors*. This happens when you try to fit too much inside your GPU and looks like:\n",
"However, using a deeper model is going to require more GPU RAM, so you may need to lower the size of your batches to avoid an *out-of-memory error*. This happens when you try to fit too much inside your GPU and looks like:\n",
"\n",
"```\n",
"Cuda runtime error: out of memory\n",
"```\n",
"\n",
"You may have to restart your notebook when this happens, and the way to solve it is to use a smaller *batch size*, which means we will pass smaller groups of images at any given time through our model. We can pass the batch size we want to the call creating our `DataLoaders` with `bs=`.\n",
"You may have to restart your notebook when this happens. The way to solve it is to use a smaller batch size, which means passing smaller groups of images at any given time through your model. You can pass the batch size you want to the call creating your `DataLoaders` with `bs=`.\n",
"\n",
"The other downside of deeper architectures is that they take quite a bit longer to train. One thing that can speed things up a lot is *mixed precision training*. This refers to using less precise numbers (*half precision floating point*, also called *fp16*) where possible during training. As we are writing these words (early 2020) nearly all current NVIDIA GPUs support a special feature called *tensor cores* which can dramatically (2x-3x) speed up neural network training. They also require a lot less GPU memory. To enable this feature in fastai, just add `to_fp16()` after your `Learner` creation (you also need to import the module).\n",
"The other downside of deeper architectures is that they take quite a bit longer to train. One technique that can speed things up a lot is *mixed-precision training*. This refers to using less-precise numbers (*half-precision floating point*, also called *fp16*) where possible during training. As we are writing these words in early 2020, nearly all current NVIDIA GPUs support a special feature called *tensor cores* that can dramatically speed up neural network training, by 2-3x. They also require a lot less GPU memory. To enable this feature in fastai, just add `to_fp16()` after your `Learner` creation (you also need to import the module).\n",
"\n",
"You can't really know ahead of time what the best architecture for your particular problem is, until you try training some. So let's try a resnet 50 now with mixed precision:"
"You can't really know ahead of time what the best architecture for your particular problem is--you need to try training some. So let's try a ResNet-50 now with mixed precision:"
]
},
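As a back-of-the-envelope illustration of the memory saving from half precision (not fastai code; the batch shape is just an example we made up), each float16 value takes 2 bytes where a float32 takes 4:

```python
import numpy as np

# A hypothetical batch of 64 RGB images at 224x224 resolution
batch_fp32 = np.zeros((64, 3, 224, 224), dtype=np.float32)
batch_fp16 = batch_fp32.astype(np.float16)  # half precision

print(batch_fp32.nbytes / 2**20, "MiB")  # float32: 4 bytes per value
print(batch_fp16.nbytes / 2**20, "MiB")  # float16: 2 bytes per value, half the memory
```

The 2-3x training speedup from tensor cores is a separate hardware effect, but this halving of activation and gradient storage is why larger batch sizes fit on the same GPU.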
{
@ -2461,20 +2463,20 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary"
"## Conclusion"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this chapter we learned some important practical tips, both for getting our image data ready for modeling (presizing; data block summary) and for fitting the model (learning rate finder, unfreezing, discriminative learning rates, setting the number of epochs, and using deeper architectures). Using these tools will help you to build more accurate image models, more quickly.\n",
"In this chapter you learned some important practical tips, both for getting your image data ready for modeling (presizing, data block summary) and for fitting the model (learning rate finder, unfreezing, discriminative learning rates, setting the number of epochs, and using deeper architectures). Using these tools will help you to build more accurate image models, more quickly.\n",
"\n",
"We also learned about cross entropy loss. This part of the book is worth spending plenty of time on. You aren't likely to need to actually implement cross entropy loss from scratch yourself in practice, but it's really important you understand the inputs to and output from that function, because it (or a variant of it, as we'll see in the next chapter) is used in nearly every classification model. So when you want to debug a model, or put a model in production, or improve the accuracy of a model, you're going to need to be able to look at its activations and loss, and understand what's going on, and why. You can't do that properly if you don't understand your loss function.\n",
"We also discussed cross-entropy loss. This part of the book is worth spending plenty of time on. You aren't likely to need to actually implement cross-entropy loss from scratch yourself in practice, but it's really important you understand the inputs to and output from that function, because it (or a variant of it, as we'll see in the next chapter) is used in nearly every classification model. So when you want to debug a model, or put a model in production, or improve the accuracy of a model, you're going to need to be able to look at its activations and loss, and understand what's going on, and why. You can't do that properly if you don't understand your loss function.\n",
"\n",
"If cross entropy loss hasn't \"clicked\" for you just yet, don't worry--you'll get there! First, go back to the last chapter and make sure you really understand `mnist_loss`. Then work gradually through the cells of the notebook for this chapter, where we step through each piece of cross entropy loss. Make sure you understand what each calculation is doing, and why. Try creating some small tensors yourself and pass them into the functions, to see what they return.\n",
"If cross-entropy loss hasn't \"clicked\" for you just yet, don't worry--you'll get there! First, go back to the last chapter and make sure you really understand `mnist_loss`. Then work gradually through the cells of the notebook for this chapter, where we step through each piece of cross-entropy loss. Make sure you understand what each calculation is doing, and why. Try creating some small tensors yourself and pass them into the functions, to see what they return.\n",
"\n",
"Remember: the choices made in cross entropy loss are not the only possible choices that could have been made. Just like when we looked at regression, we could choose between mean squared error and mean absolute difference (L1), we could change the details inside cross entropy loss too. If you have other ideas for possible functions that you think might work, feel free to give them a try in this chapter's notebook! (Fair warning though: you'll probably find that the model will be slower to train, and less accurate. That's because the gradient of cross entropy loss is proportional to the difference between the activation and the target, so SGD always gets a nicely scaled step for the weights.)"
"Remember: the choices made in the implementation of cross-entropy loss are not the only possible choices that could have been made. Just like when we looked at regression we could choose between mean squared error and mean absolute difference (L1). If you have other ideas for possible functions that you think might work, feel free to give them a try in this chapter's notebook! (Fair warning though: you'll probably find that the model will be slower to train, and less accurate. That's because the gradient of cross-entropy loss is proportional to the difference between the activation and the target, so SGD always gets a nicely scaled step for the weights.)"
]
},
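That last claim about the gradient is easy to check numerically. Here is a small NumPy sketch (our own illustration, not code from the book's notebook): for softmax activations with a negative log likelihood loss, the gradient with respect to the activations is just the softmax output minus the one-hot-encoded target.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

def cross_entropy_grad(acts, target):
    # Gradient of -log(softmax(acts)[target]) with respect to acts:
    # the softmax output, with 1 subtracted at the target index
    grad = softmax(acts)
    grad[target] -= 1
    return grad

acts = np.array([0.5, -1.2, 2.0])
grad = cross_entropy_grad(acts, target=2)
print(grad)  # negative for the target class, positive for the others; sums to zero
```

Because the gradient is exactly "activation probability minus target," a very wrong, confident prediction gets a proportionally large step, which is the nicely scaled behavior described above.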
{
@ -2489,35 +2491,35 @@
"metadata": {},
"source": [
"1. Why do we first resize to a large size on the CPU, and then to a smaller size on the GPU?\n",
"1. If you are not familiar with regular expressions, find a regular expression tutorial, and some problem sets, and complete them. Have a look on the book website for suggestions.\n",
"1. If you are not familiar with regular expressions, find a regular expression tutorial, and some problem sets, and complete them. Have a look on the book's website for suggestions.\n",
"1. What are the two ways in which data is most commonly provided, for most deep learning datasets?\n",
"1. Look up the documentation for `L` and try using a few of the new methods is that it adds.\n",
"1. Look up the documentation for the Python pathlib module and try using a few methods of the Path class.\n",
"1. Look up the documentation for the Python `pathlib` module and try using a few methods of the `Path` class.\n",
"1. Give two examples of ways that image transformations can degrade the quality of the data.\n",
"1. What method does fastai provide to view the data in a DataLoader?\n",
"1. What method does fastai provide to help you debug a DataBlock?\n",
"1. What method does fastai provide to view the data in a `DataLoaders`?\n",
"1. What method does fastai provide to help you debug a `DataBlock`?\n",
"1. Should you hold off on training a model until you have thoroughly cleaned your data?\n",
"1. What are the two pieces that are combined into cross entropy loss in PyTorch?\n",
"1. What are the two pieces that are combined into cross-entropy loss in PyTorch?\n",
"1. What are the two properties of activations that softmax ensures? Why is this important?\n",
"1. When might you want your activations to not have these two properties?\n",
"1. Calculate the \"exp\" and \"softmax\" columns of <<bear_softmax>> yourself (i.e. in a spreadsheet, with a calculator, or in a notebook).\n",
"1. Why can't we use torch.where to create a loss function for datasets where our label can have more than two categories?\n",
"1. Calculate the `exp` and `softmax` columns of <<bear_softmax>> yourself (i.e., in a spreadsheet, with a calculator, or in a notebook).\n",
"1. Why can't we use `torch.where` to create a loss function for datasets where our label can have more than two categories?\n",
"1. What is the value of log(-2)? Why?\n",
"1. What are two good rules of thumb for picking a learning rate from the learning rate finder?\n",
"1. What two steps does the fine_tune method do?\n",
"1. In Jupyter notebook, how do you get the source code for a method or function?\n",
"1. What two steps does the `fine_tune` method do?\n",
"1. In Jupyter Notebook, how do you get the source code for a method or function?\n",
"1. What are discriminative learning rates?\n",
"1. How is a Python slice object interpreted when passed as a learning rate to fastai?\n",
"1. Why is early stopping a poor choice when using one cycle training?\n",
"1. What is the difference between resnet 50 and resnet101?\n",
"1. What does to_fp16 do?"
"1. How is a Python `slice` object interpreted when passed as a learning rate to fastai?\n",
"1. Why is early stopping a poor choice when using 1cycle training?\n",
"1. What is the difference between `resnet50` and `resnet101`?\n",
"1. What does `to_fp16` do?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research"
"### Further Research"
]
},
{
@ -2525,7 +2527,7 @@
"metadata": {},
"source": [
"1. Find the paper by Leslie Smith that introduced the learning rate finder, and read it.\n",
"1. See if you can improve the accuracy of the classifier in this chapter. What's the best accuracy you can achieve? Have a look on the forums and book website to see what other students have achieved with this dataset, and how they did it."
"1. See if you can improve the accuracy of the classifier in this chapter. What's the best accuracy you can achieve? Look on the forums and the book's website to see what other students have achieved with this dataset, and how they did it."
]
},
{

File diff suppressed because one or more lines are too long


@ -21,18 +21,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Training a state-of-the-art model"
"# Training a State-of-the-Art Model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This chapter introduces more advanced techniques for training an image classification model and get state-of-the-art results. You can skip it if you want to learn more about other applications of deep learning and come back to it later--nothing in this chapter will be assumed in later chapters.\n",
"This chapter introduces more advanced techniques for training an image classification model and getting state-of-the-art results. You can skip it if you want to learn more about other applications of deep learning and come back to it later--knowledge of this material will not be assumed in later chapters.\n",
"\n",
"We will look at powerful data augmentation techniques, the *progressive resizing* approach and test time augmentation. To show all of this, we are going to train a model from scratch (not transfer learning) using a subset of ImageNet called [Imagenette](https://github.com/fastai/imagenette). It contains ten very different categories from the original ImageNet dataset, making for quicker training when we want to experiment.\n",
"We will look at what normalization is, a powerful data augmentation technique called mixup, the progressive resizing approach and test time augmentation. To show all of this, we are going to train a model from scratch (not using transfer learning) using a subset of ImageNet called [Imagenette](https://github.com/fastai/imagenette). It contains a subset of 10 very different categories from the original ImageNet dataset, making for quicker training when we want to experiment.\n",
"\n",
"This is going to be much harder to do well than our previous datasets because we're using full-size, full-color images, which are photos of objects of different sizes, in different orientations, in different lighting, and so forth... So in this chapter we're going to introduce some important techniques for getting the most out of your dataset, especially when you're training from scratch, or transfer learning to a very different kind of dataset to what the pretrained model used."
"This is going to be much harder to do well than with our previous datasets because we're using full-size, full-color images, which are photos of objects of different sizes, in different orientations, in different lighting, and so forth. So, in this chapter we're going to introduce some important techniques for getting the most out of your dataset, especially when you're training from scratch, or using transfer learning to train a model on a very different kind of dataset than the pretrained model used."
]
},
{
@ -48,17 +48,17 @@
"source": [
"When fast.ai first started there were three main datasets that people used for building and testing computer vision models:\n",
"\n",
"- *ImageNet*: 1.3 million images of various sizes around 500 pixels across, in 1000 categories, which took a few days to train\n",
"- *MNIST*: 50,000 28x28 pixel greyscale handwritten digits\n",
"- *CIFAR10*: 60,000 32x32 colour images in 10 classes\n",
"- ImageNet:: 1.3 million images of various sizes around 500 pixels across, in 1,000 categories, which took a few days to train\n",
"- MNIST:: 50,000 28\\*28-pixel grayscale handwritten digits\n",
"- CIFAR10:: 60,000 32\\*32-pixel color images in 10 classes\n",
"\n",
"The problem is that the small datasets didn't actually generalise effectively to the large ImageNet dataset. The approaches that worked well on ImageNet generally had to be developed and trained on ImageNet. This led to many people believing that only researchers with access to giant computing resources could effectively contribute to developing image classification algorithms.\n",
"The problem was that the smaller datasets didn't actually generalize effectively to the large ImageNet dataset. The approaches that worked well on ImageNet generally had to be developed and trained on ImageNet. This led to many people believing that only researchers with access to giant computing resources could effectively contribute to developing image classification algorithms.\n",
"\n",
"We thought that seemed very unlikely to be true. We had never actually seen a study that showed that ImageNet happen to be exactly the right size, and that other datasets could not be developed which would provide useful insights. So we thought we would try to create a new dataset which researchers could test their algorithms on quickly and cheaply, but which would also provide insights likely to work on the full ImageNet dataset.\n",
"We thought that seemed very unlikely to be true. We had never actually seen a study that showed that ImageNet happen to be exactly the right size, and that other datasets could not be developed which would provide useful insights. So we thought we would try to create a new dataset that researchers could test their algorithms on quickly and cheaply, but which would also provide insights likely to work on the full ImageNet dataset.\n",
"\n",
"About three hours later we had created Imagenette. We selected 10 classes from the full ImageNet which look very different to each other. We hope that it would be possible to create a classifier that worked to recognise these classes quickly and cheaply. When we tried it out, we discovered we were right. We then tried out a few algorithmic tweaks to see how they impacted Imagenette, found some which worked pretty well, and tested them on ImageNet as well — we were very pleased to find that our tweaks worked well on ImageNet too!\n",
"About three hours later we had created Imagenette. We selected 10 classes from the full ImageNet that looked very different from one another. As we had hopep, we were able to quickly and cheaply create a classifier capable of recognizing these classes. We then tried out a few algorithmic tweaks to see how they impacted Imagenette. We found some that worked pretty well, and tested them on ImageNet as well—and we were very pleased to find that our tweaks worked well on ImageNet too!\n",
"\n",
"There is an important message here: the dataset you get given is not necessarily the dataset you want; it's particularly unlikely to be the dataset that you want to do your development and prototyping in. You should aim to have an iteration speed of no more than a couple of minutes that is, when you come up with a new idea you want to try out, you should be able to train a model and see how it goes within a couple of minutes. If it's taking longer to do an experiment, think about how you could cut down your dataset, or simplify your model, to improve your experimentation speed. The more experiments you can do, the better!\n",
"There is an important message here: the dataset you get given is not necessarily the dataset you want. It's particularly unlikely to be the dataset that you want to do your development and prototyping in. You should aim to have an iteration speed of no more than a couple of minutes—that is, when you come up with a new idea you want to try out, you should be able to train a model and see how it goes within a couple of minutes. If it's taking longer to do an experiment, think about how you could cut down your dataset, or simplify your model, to improve your experimentation speed. The more experiments you can do, the better!\n",
"\n",
"Let's get started with this dataset:"
]
@ -77,7 +77,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"First we'll get our dataset into a `DataLoaders` object, using the *presizing* trick we saw in <<chapter_pet_breeds>>:"
"First we'll get our dataset into a `DataLoaders` object, using the *presizing* trick introduced in <<chapter_pet_breeds>>:"
]
},
{
@ -98,7 +98,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
" ...and do a training that will serve as a baseline:"
"and do a training run that will serve as a baseline:"
]
},
{
@ -176,7 +176,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"That's a good baseline, since we are not using a pretrained model, but we can do better. When working with models that are being trained from scratch, or fine-tuned to a very different dataset to that used for the pretraining, there are additional techniques that are really important. In the rest of the chapter we'll consider some of the key approaches you'll want to be familiar with. The first one is normalizing your data."
"That's a good baseline, since we are not using a pretrained model, but we can do better. When working with models that are being trained from scratch, or fine-tuned to a very different dataset than the one used for the pretraining, there are some additional techniques that are really important. In the rest of the chapter we'll consider some of the key approaches you'll want to be familiar with. The first one is *normalizing* your data."
]
},
{
@ -190,7 +190,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"When training a model, it helps if your input data is normalized, that is, as a mean of 0 and a standard deviation of 1. But most images and computer vision libraries will use values between 0 and 255 for pixels, or between 0 and 1; in either case, your data is not going to have a mean of zero and a standard deviation of one.\n",
"When training a model, it helps if your input data is normalized--that is, has a mean of 0 and a standard deviation of 1. But most images and computer vision libraries use values between 0 and 255 for pixels, or between 0 and 1; in either case, your data is not going to have a mean of 0 and a standard deviation of 1.\n",
"\n",
"Let's grab a batch of our data and look at those values, by averaging over all axes except for the channel axis, which is axis 1:"
]
@ -221,9 +221,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"As we expected, its mean and standard deviation is not very close to the desired values of zero and one. This is easy to do in fastai by adding the `Normalize` transform. This acts on a whole mini batch at once, so you can add it to the `batch_tfms` section of your data block. You need to pass to this transform the mean and standard deviation that you want to use; fastai comes with the standard ImageNet mean and standard deviation already defined. (If you do not pass any statistics to the Normalize transform, fastai will automatically calculate them from a single batch of your data.)\n",
"As we expected, the mean and standard deviation are not very close to the desired values. Fortunately, normalizing the data is easy to do in fastai by adding the `Normalize` transform. This acts on a whole mini-batch at once, so you can add it to the `batch_tfms` section of your data block. You need to pass to this transform the mean and standard deviation that you want to use; fastai comes with the standard ImageNet mean and standard deviation already defined. (If you do not pass any statistics to the `Normalize` transform, fastai will automatically calculate them from a single batch of your data.)\n",
"\n",
"Let's add this transform (using `imagenet_stats` as Imagenette is a subset of ImageNet) and have a look at one batch now:"
"Let's add this transform (using `imagenet_stats` as Imagenette is a subset of ImageNet) and take a look at one batch now:"
]
},
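The arithmetic behind the `Normalize` transform can be sketched in plain NumPy; the batch here is just random data standing in for real images:

```python
import numpy as np

# Hypothetical mini-batch: 2 images, 3 channels, 4x4 pixels, values in [0, 255]
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(2, 3, 4, 4)).astype(np.float64)

# Channel-wise statistics: average over all axes except the channel axis (axis 1)
mean = batch.mean(axis=(0, 2, 3), keepdims=True)
std = batch.std(axis=(0, 2, 3), keepdims=True)

# Normalizing subtracts the mean and divides by the standard deviation,
# so each channel ends up with mean 0 and standard deviation 1
normalized = (batch - mean) / std
```

In practice you would use the statistics of the pretraining dataset (such as `imagenet_stats`) rather than those of a single batch, for the reasons discussed below.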
{
@ -277,7 +277,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's check how normalization helps training our model here:"
"Let's check what effect this had on training our model:"
]
},
{
@ -355,27 +355,27 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Although it only helped a little here, normalization becomes especially important when using pretrained models. The pretrained model only knows how to work with data of the type that it has seen before. If the average pixel was zero in the data it was trained with, but your data has zero as the minimum possible value of a pixel, then the model is going to be seeing something very different to what is intended! \n",
"Although it only helped a little here, normalization becomes especially important when using pretrained models. The pretrained model only knows how to work with data of the type that it has seen before. If the average pixel value was 0 in the data it was trained with, but your data has 0 as the minimum possible value of a pixel, then the model is going to be seeing something very different from what is intended! \n",
"\n",
"This means that when you distribute a model, you need to also distribute the statistics used for normalization, since anyone using it for inference, or transfer learning, will need to use the same statistics. By the same token, if you're using a model that someone else has trained, make sure you find out what normalization statistics they used, and match them.\n",
"\n",
"We didn't have to handle normalization in previous chapters because when using a pretrained model through `cnn_learner`, the fastai library automatically adds the proper `Normalize` transform; the model has been pretrained with certain statistics in `Normalize` (usually coming from the ImageNet dataset), so the library can fill those for you. Note that this only applies with pretrained models, which is why we need to add it manually here, when training from scratch.\n",
"We didn't have to handle normalization in previous chapters because when using a pretrained model through `cnn_learner`, the fastai library automatically adds the proper `Normalize` transform; the model has been pretrained with certain statistics in `Normalize` (usually coming from the ImageNet dataset), so the library can fill those in for you. Note that this only applies with pretrained models, which is why we need to add this information manually here, when training from scratch.\n",
"\n",
"All our training up until now have been done at size 224. We could have begun training at a smaller size before going to that. This is called *progressive resizing*."
"All our training up until now has been done at size 224. We could have begun training at a smaller size before going to that. This is called *progressive resizing*."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Progressive resizing"
"## Progressive Resizing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When fast.ai and its team of students [won the DAWNBench competition](https://www.theverge.com/2018/5/7/17316010/fast-ai-speed-test-stanford-dawnbench-google-intel), one of the most important innovations was something very simple: start training using small images, and end training using large images. By spending most of the epochs training with small images, training completed much faster. By completing training using large images, the final accuracy was much higher. We call this approach *progressive resizing*."
"When fast.ai and its team of students [won the DAWNBench competition](https://www.theverge.com/2018/5/7/17316010/fast-ai-speed-test-stanford-dawnbench-google-intel) in 2018, one of the most important innovations was something very simple: start training using small images, and end training using large images. Spending most of the epochs training with small images helps training complete much faster. Completing training using large images makes the final accuracy much higher. We call this approach *progressive resizing*."
]
},
{
@ -389,15 +389,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"As we have seen, the kinds of features that are learned by convolutional neural networks are not in any way specific to the size of the image early layers find things like edges and gradients, and later layers may find things like noses and sunsets. So, when we change image size in the middle of training, it doesn't mean that we have to find totally different parameters for our model.\n",
"As we have seen, the kinds of features that are learned by convolutional neural networks are not in any way specific to the size of the image—early layers find things like edges and gradients, and later layers may find things like noses and sunsets. So, when we change image size in the middle of training, it doesn't mean that we have to find totally different parameters for our model.\n",
"\n",
"But clearly there are some differences between small images and big ones, so we shouldn't expect our model to continue working exactly as well, with no changes at all. Does this remind you of something? When we developed this idea, it reminded us of transfer learning! We are trying to get our model to learn to do something a little bit different to what it has learned to do before. Therefore, we should be able to use the `fine_tune` method after we resize our images.\n",
"But clearly there are some differences between small images and big ones, so we shouldn't expect our model to continue working exactly as well, with no changes at all. Does this remind you of something? When we developed this idea, it reminded us of transfer learning! We are trying to get our model to learn to do something a little bit different from what it has learned to do before. Therefore, we should be able to use the `fine_tune` method after we resize our images.\n",
"\n",
"There is an additional benefit to progressive resizing: it is another form of data augmentation. Therefore, you should expect to see better generalisation of your models that are trained with progressive resizing.\n",
"There is an additional benefit to progressive resizing: it is another form of data augmentation. Therefore, you should expect to see better generalization of your models that are trained with progressive resizing.\n",
"\n",
"To implement progressive resizing it is most convenient if you first create a `get_dls` function which takes an image size and a batch size, and returns your `DataLoaders`:\n",
"To implement progressive resizing it is most convenient if you first create a `get_dls` function which takes an image size and a batch size as we did in the section before, and returns your `DataLoaders`:\n",
"\n",
"Now you can create your `DataLoaders` with a small size, and `fit_one_cycle` in the usual way, for a few less epochs than you might otherwise do:"
"Now you can create your `DataLoaders` with a small size and use `fit_one_cycle` in the usual way, training for fewer epochs than you might otherwise do:"
]
},
{
@ -469,7 +469,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Then you can replace the DataLoaders inside the Learner, and `fine_tune`:"
"Then you can replace the `DataLoaders` inside the `Learner`, and fine-tune:"
]
},
{
@ -581,47 +581,47 @@
"\n",
"You can repeat the process of increasing size and training more epochs as many times as you like, for as big an image as you wish--but of course, you will not get any benefit by using an image size larger than the size of your images on disk.\n",
"\n",
"Note that for transfer learning, progressive resizing may actually hurt performance. This would happen if your pretrained model was quite similar to your transfer learning task and dataset, and was trained on similar sized images, so the weights don't need to be changed much. In that case, training on smaller images may damage the pretrained weights.\n",
"Note that for transfer learning, progressive resizing may actually hurt performance. This is most likely to happen if your pretrained model was quite similar to your transfer learning task and dataset and was trained on similar-sized images, so the weights don't need to be changed much. In that case, training on smaller images may damage the pretrained weights.\n",
"\n",
"On the other hand, if the transfer learning task is going to be on images that are of different sizes, shapes, or style to those used in the pretraining tasks, progressive resizing will probably help. As always, the answer to \"does it help?\" is \"try it!\".\n",
"On the other hand, if the transfer learning task is going to use images that are of different sizes, shapes, or styles than those used in the pretraining task, progressive resizing will probably help. As always, the answer to \"Will it help?\" is \"Try it!\"\n",
"\n",
"Another thing we could try is applying data augmentation to the validation set: up until now, we have only applied it on the training set and the validation set always gets the same images. But maybe we could try to make predictions for a few augmented versions of the validation set and average them. This is called *test time augmentation*."
"Another thing we could try is applying data augmentation to the validation set. Up until now, we have only applied it on the training set; the validation set always gets the same images. But maybe we could try to make predictions for a few augmented versions of the validation set and average them. We'll consider this approach next."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test time augmentation"
"## Test Time Augmentation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have been using random cropping as a way to get some useful data augmentation, which leads to better generalisation, and results in a need for less training data. When we use random cropping, fastai will automatically use centre-cropping for the validation set — that is, it will select the largest square area it can in the centre of the image, such that it does not go past the image edges.\n",
"We have been using random cropping as a way to get some useful data augmentation, which leads to better generalization, and results in a need for less training data. When we use random cropping, fastai will automatically use center cropping for the validation set—that is, it will select the largest square area it can in the center of the image, without going past the image's edges.\n",
"\n",
"This can often be problematic. For instance, in a multi-label dataset sometimes there are small objects towards the edges of an image; these could be entirely cropped out by the centre cropping. Even for datasets such as the pet breed classification data we're working on now, it's possible that some critical feature necessary for identifying the correct breed, such as the colour of the nose, could be cropped out.\n",
"This can often be problematic. For instance, in a multi-label dataset sometimes there are small objects toward the edges of an image; these could be entirely cropped out by center cropping. Even for problems such as our pet breed classification example, it's possible that some critical feature necessary for identifying the correct breed, such as the color of the nose, could be cropped out.\n",
"\n",
"One solution to this is to avoid random cropping entirely. Instead, we could simply squish or stretch the rectangular images to fit into a square space. But then we miss out on a very useful data augmentation, and we also make the image recognition more difficult for our model, because it has to learn how to recognise squished and squeezed images, rather than just correctly proportioned images.\n",
"One solution to this problem is to avoid random cropping entirely. Instead, we could simply squish or stretch the rectangular images to fit into a square space. But then we miss out on a very useful data augmentation, and we also make the image recognition more difficult for our model, because it has to learn how to recognize squished and squeezed images, rather than just correctly proportioned images.\n",
"\n",
"Another solution is to not just centre crop for validation, but instead to select a number of areas to crop from the original rectangular image, pass each of them through our model, and take the maximum or average of the predictions. In fact, we could do this not just for different crops, but for different values across all of our test time augmentation parameters. This is known as *test time augmentation* (TTA)."
"Another solution is to not just center crop for validation, but instead to select a number of areas to crop from the original rectangular image, pass each of them through our model, and take the maximum or average of the predictions. In fact, we could do this not just for different crops, but for different values across all of our test time augmentation parameters. This is known as *test time augmentation* (TTA)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> jargon: test time augmentation (TTA): during inference or validation, creating multiple versions of each image, using data augmentation, and then taking the average or maximum of the predictions for each augmented version of the image"
"> jargon: test time augmentation (TTA): During inference or validation, creating multiple versions of each image, using data augmentation, and then taking the average or maximum of the predictions for each augmented version of the image."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Depending on the dataset, test time augmentation can result in dramatic improvements in accuracy. It does not change the time required to train at all, but will increase the amount of time for validation or inference by the number of test time augmented images requested. By default, fastai will use the unaugmented centre crop image, plus four randomly augmented images.\n",
"Depending on the dataset, test time augmentation can result in dramatic improvements in accuracy. It does not change the time required to train at all, but will increase the amount of time required for validation or inference by the number of test-time-augmented images requested. By default, fastai will use the unaugmented center crop image plus four randomly augmented images.\n",
"\n",
"You can pass any DataLoader to fastai's `tta` method; by default, it will use your validation set:"
"You can pass any `DataLoader` to fastai's `tta` method; by default, it will use your validation set:"
]
},
{
@ -699,9 +699,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"As we can see, using TTA gives us good a boost of performance, with no additional training required. However, it does make inference slower--if you're averaging 5 images for TTA, inference will be 5x slower.\n",
"As we can see, using TTA gives us a good boost in performance, with no additional training required. However, it does make inference slower--if you're averaging five images for TTA, inference will be five times slower.\n",
"\n",
"Data augmentation helps train better models as we saw. Let's now focus on a new data augmentation technique called *Mixup*."
"We've seen examples of how data augmentation helps train better models. Let's now focus on a new data augmentation technique called *Mixup*."
]
},
{
@ -715,16 +715,16 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Mixup, introduced in the 2017 paper [mixup: Beyond Empirical Risk Minimization](https://arxiv.org/abs/1710.09412), is a very powerful data augmentation technique which can provide dramatically higher accuracy, especially when you don't have much data, and don't have a pretrained model that was trained on data similar to your dataset. The paper explains: \"While data augmentation consistently leads to improved generalization, the procedure is dataset-dependent, and thus requires the use of expert knowledge.\" For instance, it's common to flip images as part of data augmentation, but should you flip only horizontally, or also vertically? The answer is that it depends on your dataset. In addition, if flipping (for instance) doesn't provide enough data augmentation for you, you can't \"flip more\". It's helpful to have data augmentation techniques where you can \"dial up\" or \"dial down\" the amount of data augmentation, to see what works best for you.\n",
"Mixup, introduced in the 2017 paper [\"*mixup*: Beyond Empirical Risk Minimization\"](https://arxiv.org/abs/1710.09412) by Hongyi Zhang et al., is a very powerful data augmentation technique that can provide dramatically higher accuracy, especially when you don't have much data and don't have a pretrained model that was trained on data similar to your dataset. The paper explains: \"While data augmentation consistently leads to improved generalization, the procedure is dataset-dependent, and thus requires the use of expert knowledge.\" For instance, it's common to flip images as part of data augmentation, but should you flip only horizontally, or also vertically? The answer is that it depends on your dataset. In addition, if flipping (for instance) doesn't provide enough data augmentation for you, you can't \"flip more.\" It's helpful to have data augmentation techniques where you can \"dial up\" or \"dial down\" the amount of change, to see what works best for you.\n",
"\n",
"Mixup works as follows, for each image:\n",
"\n",
"1. Select another image from your dataset at random\n",
"1. Pick a weight at random\n",
"1. Take a weighted average (using the weight from step 2) of the selected image with your image; this will be your independent variable\n",
"1. Take a weighted average (with the same weight) of this image's labels with your image's labels; this will be your dependent variable\n",
"1. Select another image from your dataset at random.\n",
"1. Pick a weight at random.\n",
"1. Take a weighted average (using the weight from step 2) of the selected image with your image; this will be your independent variable.\n",
"1. Take a weighted average (with the same weight) of this image's labels with your image's labels; this will be your dependent variable.\n",
"\n",
"In pseudo-code, we're doing (where `t` is the weight for our weighted average):\n",
"In pseudocode, we're doing this (where `t` is the weight for our weighted average):\n",
"\n",
"```\n",
"image2,target2 = dataset[randint(0,len(dataset))]\n",
@ -733,7 +733,7 @@
"new_target = t * target1 + (1-t) * target2\n",
"```\n",
"\n",
"For this to work, our targets need to be one-hot encoded. The paper describes this using these equations (where $\\lambda$ is the same as `t` in our code above):"
"For this to work, our targets need to be one-hot encoded. The paper describes this using the equations shown in <<mixup>> where $\\lambda$ is the same as `t` in our pseudocode:"
]
},
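The pseudocode above can be made concrete with random arrays standing in for images; the class indices 2 and 7 are the hypothetical "church" and "gas station" examples, and drawing `t` from a Beta distribution follows the Mixup paper:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical flattened "images" and their one-hot targets over 10 classes
image1, target1 = rng.random(16), np.eye(10)[2]  # e.g. "church" at index 2
image2, target2 = rng.random(16), np.eye(10)[7]  # e.g. "gas station" at index 7

# Steps 1-2: pick a random partner image (done above) and a random weight t
t = rng.beta(0.4, 0.4)  # the Mixup paper draws t from a Beta distribution

# Steps 3-4: weighted average of both the inputs and the targets
new_image = t * image1 + (1 - t) * image2
new_target = t * target1 + (1 - t) * target2
```

Note that the mixed target still sums to 1, so it remains a valid distribution over the classes.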
{
@ -747,16 +747,16 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sidebar: Papers and math"
"### Sidebar: Papers and Math"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We're going to be looking at more and more research papers from here on in the book. Now that you have the basic jargon, you might be surprised to discover how much of them you can understand, with a little practice! One issue you'll notice is that greek letters, such as $\\lambda$, appear in most papers. It's a very good idea to learn the names of all the greek letters, since otherwise it's very hard to read the papers to yourself, and remember them, and it's also hard to read code based on them (since code often uses the name of the greek letter spelled out, such as `lambda`).\n",
"We're going to be looking at more and more research papers from here on in the book. Now that you have the basic jargon, you might be surprised to discover how much of them you can understand, with a little practice! One issue you'll notice is that Greek letters, such as $\\lambda$, appear in most papers. It's a very good idea to learn the names of all the Greek letters, since otherwise it's very hard to read the papers to yourself, and remember them (or to read code based on them, since code often uses the names of the Greek letters spelled out, such as `lambda`).\n",
"\n",
"The bigger issue with papers is that they use math, instead of code, to explain what's going on. If you don't have much of a math background, this will likely be intimidating and confusing at first. But remember: what is being shown in the math, is something that will be implemented in code. It's just another way of talking about the same thing! After reading a few papers, you'll pick up more and more of the notation. If you don't know what a symbol is, try looking it up on Wikipedia's [list of mathematical symbols](https://en.wikipedia.org/wiki/List_of_mathematical_symbols) or draw it on [Detexify](http://detexify.kirelabs.org/classify.html) which (using machine learning!) will find the name of your hard-drawn symbol. Then you can search online for that name to find out what it's for."
"The bigger issue with papers is that they use math, instead of code, to explain what's going on. If you don't have much of a math background, this will likely be intimidating and confusing at first. But remember: what is being shown in the math, is something that will be implemented in code. It's just another way of talking about the same thing! After reading a few papers, you'll pick up more and more of the notation. If you don't know what a symbol is, try looking it up in Wikipedia's [list of mathematical symbols](https://en.wikipedia.org/wiki/List_of_mathematical_symbols) or drawing it in [Detexify](http://detexify.kirelabs.org/classify.html), which (using machine learning!) will find the name of your hand-drawn symbol. Then you can search online for that name to find out what it's for."
]
},
{
@ -770,7 +770,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's what it looks like when we take a *linear combination* of images, as done in Mixup:"
"<<mixup_example>> shows what it looks like when we take a *linear combination* of images, as done in Mixup."
]
},
{
@ -815,11 +815,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The third image is built by adding 0.3 times the first one and 0.7 times the second. In this example, should the model predict church? gas station? The right answer is 30% church and 70% gas station since that's what we'll get if we take the linear combination of the one-hot encoded targets. For instance, if *church* has for index 2 and *gas station* as for index 7, the one-hot-encoded representations are\n",
"The third image is built by adding 0.3 times the first one and 0.7 times the second. In this example, should the model predict \"church\" or \"gas station\"? The right answer is 30% church and 70% gas station, since that's what we'll get if we take the linear combination of the one-hot-encoded targets. For instance, suppose we have 10 classes, where \"church\" is represented by the index 2 and \"gas station\" by the index 7; then the one-hot-encoded representations are:\n",
"```\n",
"[0, 0, 1, 0, 0, 0, 0, 0, 0, 0] and [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]\n",
"```\n",
"(since we have ten classes in total) so our final target is\n",
"so our final target is:\n",
"```\n",
"[0, 0, 0.3, 0, 0, 0, 0, 0.7, 0, 0]\n",
"```"
@ -829,9 +829,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This all done for us inside fastai by adding a `Callback` to our `Learner`. `Callback`s are what is used inside fastai to inject custom behavior in the training loop (like a learning rate schedule, or training in mixed precision). We'll be learning all about callbacks, including how to make your own, in <<chapter_accel_sgd>>. For now, all you need to know is that you use the `cbs` parameter to `Learner` to pass callbacks.\n",
"This is all done for us inside fastai by adding a *callback* to our `Learner`. `Callback`s are what is used inside fastai to inject custom behavior in the training loop (like a learning rate schedule, or training in mixed precision). We'll be learning all about callbacks, including how to make your own, in <<chapter_accel_sgd>>. For now, all you need to know is that you use the `cbs` parameter to `Learner` to pass callbacks.\n",
"\n",
"Here is how you train a model with Mixup:\n",
"Here is how we train a model with Mixup:\n",
"\n",
"```python\n",
"model = xresnet50()\n",
@ -845,70 +845,70 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"So what happens if we train a model where our data is \"mixed up\" in this way? Clearly, it's going to be harder to train, because it's harder to see what's in each image. And the model has to predict two labels per image, rather than just one, as well as figuring out how much each one is weighted. Overfitting seems less likely to be a problem, because we're not showing the same image each epoch, but are instead showing a random combination of two images.\n",
"What happens when we train a model with data that's \"mixed up\" in this way? Clearly, it's going to be harder to train, because it's harder to see what's in each image. And the model has to predict two labels per image, rather than just one, as well as figuring out how much each one is weighted. Overfitting seems less likely to be a problem, however, because we're not showing the same image in each epoch, but are instead showing a random combination of two images.\n",
"\n",
"Mixup requires far more epochs to train to a better accuracy, compared to other augmentation approaches we've seen. You can try training Imagenette with and without Mixup by using the `examples/train_imagenette.py` script in the fastai repo. At the time of writing, the leaderboard in the [Imagenette repo](https://github.com/fastai/imagenette/) is showing that mixup is used for all leading results for trainings of >80 epochs, and for few epochs Mixup is not being used. This is inline with our experience of using Mixup too.\n",
"Mixup requires far more epochs to train to get better accuracy, compared to other augmentation approaches we've seen. You can try training Imagenette with and without Mixup by using the *examples/train_imagenette.py* script in the [fastai repo](https://github.com/fastai/fastai). At the time of writing, the leaderboard in the [Imagenette repo](https://github.com/fastai/imagenette/) is showing that Mixup is used for all leading results for trainings of >80 epochs, and for fewer epochs Mixup is not being used. This is in line with our experience of using Mixup too.\n",
"\n",
"One of the reasons that mixup is so exciting is that it can be applied to types of data other than photos. In fact, some people have even shown good results by using mixup on activations *inside* their model, not just on inputs--these allows Mixup to be used for NLP and other data types too.\n",
"One of the reasons that Mixup is so exciting is that it can be applied to types of data other than photos. In fact, some people have even shown good results by using Mixup on activations *inside* their models, not just on inputs--this allows Mixup to be used for NLP and other data types too.\n",
"\n",
"There's another subtle issue that Mixup deals with for us, which is that it's not actually possible with the models we've seen before for our loss to ever be perfect. The problem is that our labels are ones and zeros, but softmax and sigmoid *never* can equal one or zero. So when we train our model, it causes it to push our activations ever closer to zero and one, such that the more epochs we do, the more extreme our activations become.\n",
"There's another subtle issue that Mixup deals with for us, which is that it's not actually possible with the models we've seen before for our loss to ever be perfect. The problem is that our labels are 1s and 0s, but the outputs of softmax and sigmoid can never equal 1 or 0. This means training our model pushes our activations ever closer to those values, such that the more epochs we do, the more extreme our activations become.\n",
"\n",
"With Mixup, we no longer have that problem, because our labels will only be exactly one or zero if we happen to \"mix\" with another image of the same class. The rest of the time, our labels will be a linear combination, such as the 0.7 and 0.3 we got in the church and gas station example above.\n",
"With Mixup we no longer have that problem, because our labels will only be exactly 1 or 0 if we happen to \"mix\" with another image of the same class. The rest of the time our labels will be a linear combination, such as the 0.7 and 0.3 we got in the church and gas station example earlier.\n",
"\n",
"One issue with this, however, is that Mixup is \"accidentally\" making the labels bigger than zero, or smaller than one. That is to say, we're not *explicitly* telling our model that we want to change the labels in this way. So if we want to change to make the labels closer, or further away, from zero and one, we have to change the amount of Mixup--which also changes the amount of data augmentation, which might not be what we want. There is, however, a way to handle this more directly, which is to use *label smoothing*."
"One issue with this, however, is that Mixup is \"accidentally\" making the labels bigger than 0, or smaller than 1. That is to say, we're not *explicitly* telling our model that we want to change the labels in this way. So, if we want to make the labels closer to, or further away from, 0 and 1, we have to change the amount of Mixup--which also changes the amount of data augmentation, which might not be what we want. There is, however, a way to handle this more directly, which is to use *label smoothing*."
]
},
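The claim that softmax can never output an exact 1 or 0 is easy to check numerically with a small toy activation vector:

```python
import numpy as np

def softmax(z):
    exp = np.exp(z - z.max())  # subtract the max for numerical stability
    return exp / exp.sum()

# Even a strongly confident activation never produces an exact 1 or 0;
# the other entries stay small but strictly positive
probs = softmax(np.array([5.0, 0.0, 0.0]))
```

Pushing the winning entry closer to 1 requires ever larger activations, which is exactly the pressure on the model described above.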
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Label smoothing"
"## Label Smoothing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the theoretical expression of the loss, in classification problems, our targets are one-hot encoded (in practice we tend to avoid doing it to save memory, but what we compute is the same loss as if we had used one-hot encoding). That means the model is trained to return 0 for all categories but one, for which it is trained to return 1. Even 0.999 is not *good enough*, the model will get gradients and learn to predict activations that are even more confident. This encourages overfitting and gives you at inference time a model that is not going to give meaningful probabilities: it will always say 1 for the predicted category even if it's not too sure, just because it was trained this way.\n",
"In the theoretical expression of loss, in classification problems, our targets are one-hot encoded (in practice we tend to avoid doing this to save memory, but what we compute is the same loss as if we had used one-hot encoding). That means the model is trained to return 0 for all categories but one, for which it is trained to return 1. Even 0.999 is not \"good enough\"; the model will get gradients and learn to predict activations with even higher confidence. This encourages overfitting, and at inference time gives you a model that is not going to give meaningful probabilities: it will always say 1 for the predicted category even if it's not too sure, just because it was trained this way.\n",
"\n",
"It can become very harmful if your data is not perfectly labeled. In the bear classifier we studied in <<chapter_production>>, we saw that some of the images were mislabeled, or contained two different kinds of bears. In general, your data will never be perfect. Even if the labels were manually produced by humans, they could make mistakes, or have differences of opinions on images harder to label.\n",
"This can become very harmful if your data is not perfectly labeled. In the bear classifier we studied in <<chapter_production>>, we saw that some of the images were mislabeled, or contained two different kinds of bears. In general, your data will never be perfect. Even if the labels were manually produced by humans, they could make mistakes, or have differences of opinions on images that are harder to label.\n",
"\n",
"Instead, we could replace all our `1`s by a number a bit less than `1`, and our `0`s by a number a bit more than `0`, and then train. This is called *label smoothing*. By encouraging your model to be less confident, label smoothing will make your training more robust, even if there is mislabeled data, and will produce a model that generalizes better at inference.\n",
"Instead, we could replace all our 1s with a number a bit less than 1, and our 0s by a number a bit more than 0, and then train. This is called *label smoothing*. By encouraging your model to be less confident, label smoothing will make your training more robust, even if there is mislabeled data. The result will be a model that generalizes better.\n",
"\n",
"This is how label smoothing works in practice: we start with one-hot encoded labels, then replace all zeros by $\\frac{\\epsilon}{N}$ (that's the greek letter *epsilon*, which is what was used in the [paper which introduced label smoothing](https://arxiv.org/abs/1512.00567), and is used in the fastai code) where $N$ is the number of classes and $\\epsilon$ is a parameter (usually 0.1, which would mean we are 10% unsure of our labels). Since you want the labels to add up to 1, replace the 1 by $1-\\epsilon + \\frac{\\epsilon}{N}$. This way, we don't encourage the model to predict something overconfident: in our Imagenette example where we have 10 classes, the targets become something like:\n",
"This is how label smoothing works in practice: we start with one-hot-encoded labels, then replace all 0s with $\\frac{\\epsilon}{N}$ (that's the Greek letter *epsilon*, which is what was used in the [paper that introduced label smoothing](https://arxiv.org/abs/1512.00567) and is used in the fastai code), where $N$ is the number of classes and $\\epsilon$ is a parameter (usually 0.1, which would mean we are 10% unsure of our labels). Since we want the labels to add up to 1, we replace the 1 with $1-\\epsilon + \\frac{\\epsilon}{N}$. This way, we don't encourage the model to predict something overconfidently. In our Imagenette example where we have 10 classes, the targets become something like (here for a target that corresponds to the index 3):\n",
"```\n",
"[0.01, 0.01, 0.01, 0.91, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]\n",
"```\n",
"(here for a target that corresponds to the index 3). In practice, we don't want to one-hot encode the labels, and fortunately we won't need too (the one-hot encoding is just good to explain what label smoothing is and visualize it)."
"In practice, we don't want to one-hot encode the labels, and fortunately we won't need to (the one-hot encoding is just good to explain what label smoothing is and visualize it)."
]
},
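As a sanity check on the formula above, here is a minimal sketch in plain Python (`smooth_labels` is a made-up helper for illustration, not fastai's implementation) that produces the smoothed targets for the 10-class Imagenette example:

```python
def smooth_labels(target_idx, n_classes, eps=0.1):
    # Every class gets eps/N; the target class additionally gets 1 - eps,
    # so the entries still sum to 1
    smoothed = [eps / n_classes] * n_classes
    smoothed[target_idx] += 1 - eps
    return smoothed

# Target at index 3, 10 classes, epsilon=0.1
print([round(x, 2) for x in smooth_labels(3, 10)])
# [0.01, 0.01, 0.01, 0.91, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]
```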
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sidebar: Label smoothing, the paper"
"### Sidebar: Label Smoothing, the Paper"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is how the reasoning behind label smoothing was explained in the paper:\n",
"Here is how the reasoning behind label smoothing was explained in the paper by Christian Szegedy et al.:\n",
"\n",
"\"This maximum is not achievable for finite $z_k$ but is approached if $z_y\\gg z_k$ for all $k\\neq y$ -- that is, if the logit corresponding to the ground-truth label is much great than all other logits. This, however, can cause two problems. First, it may result in over-fitting: if the model learns to assign full probability to the ground-truth label for each training example, it is not guaranteed to generalize. Second, it encourages the differences between the largest logit and all others to become large, and this, combined with the bounded gradient $\\frac{\\partial\\ell}{\\partial z_k}$, reduces the ability of the model to adapt. Intuitively, this happens because the model becomes too confident about its predictions.\""
"> : This maximum is not achievable for finite $z_k$ but is approached if $z_y\\gg z_k$ for all $k\\neq y$--that is, if the logit corresponding to the ground-truth label is much greater than all other logits. This, however, can cause two problems. First, it may result in over-fitting: if the model learns to assign full probability to the ground-truth label for each training example, it is not guaranteed to generalize. Second, it encourages the differences between the largest logit and all others to become large, and this, combined with the bounded gradient $\\frac{\\partial\\ell}{\\partial z_k}$, reduces the ability of the model to adapt. Intuitively, this happens because the model becomes too confident about its predictions."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's practice our paper reading skills to try to interpret this. \"This maximum\" is refering to the previous section of the paper, which talked about the fact that `1` is the value of the label for the positive class. So any value (except infinity) can't result in `1` after sigmoid or softmax. In a paper, you won't normally see \"any value\" written, but instead it would get a symbol; in this case, it's $z_k$. This is helpful in a paper, because it can be refered to again later, and the reader knows what value is being discussed.\n",
"Let's practice our paper-reading skills to try to interpret this. \"This maximum\" is referring to the previous part of the paragraph, which talked about the fact that 1 is the value of the label for the positive class. So it's not possible for any value (except infinity) to result in 1 after sigmoid or softmax. In a paper, you won't normally see \"any value\" written; instead it will get a symbol, which in this case is $z_k$. This shorthand is helpful in a paper, because it can be referred to again later and the reader will know what value is being discussed.\n",
"\n",
"Then it says: $z_y\\gg z_k$ for all $k\\neq y$. In this case, the paper immediately follows with \"that is...\", which is handy, because you can just read the English instead of the math. In the math, the $y$ is refering to the target ($y$ is defined earlier in the paper; sometimes it's hard to find where symbols are defined, but nearly all papers will define all their symbols somewhere), and $z_y$ is the activation corresponding to the target. So to get close to `1`, this activation needs to be much higher than all the others for that prediction.\n",
"Then it says \"if $z_y\\gg z_k$ for all $k\\neq y$.\" In this case, the paper immediately follows the math with an English description, which is handy because you can just read that. In the math, the $y$ is referring to the target ($y$ is defined earlier in the paper; sometimes it's hard to find where symbols are defined, but nearly all papers will define all their symbols somewhere), and $z_y$ is the activation corresponding to the target. So to get close to 1, this activation needs to be much higher than all the others for that prediction.\n",
"\n",
"Next up is \"if the model learns to assign full probability to the ground-truth label for each training example, it is not guaranteed to generalize\". This is saying that making $z_y$ really big means we'll need large weights and large activations throughout our model. Large weights lead to \"bumpy\" functions, where a small change in input results in a big change to predictions. This is really bad for generalization, because it means just one pixel changing a bit could change our prediction entirely!\n",
"Next, consider the statement \"if the model learns to assign full probability to the ground-truth label for each training example, it is not guaranteed to generalize.\" This is saying that making $z_y$ really big means we'll need large weights and large activations throughout our model. Large weights lead to \"bumpy\" functions, where a small change in input results in a big change to predictions. This is really bad for generalization, because it means just one pixel changing a bit could change our prediction entirely!\n",
"\n",
"Finally, we have \"it encourages the differences between the largest logit and all others to become large, and this, combined with the bounded gradient $\\frac{\\partial\\ell}{\\partial z_k}$, reduces the ability of the model to adapt\". The gradient of cross entropy, remember, is basically `output-target`, and both `output` and `target` are between zero and one. So the difference is between `-1` and `1`, which is why the paper says the gradient is \"bounded\" (it can't be infinite). Therefore our SGD steps are bounded too. \"Reduces the ability of the model to adapt\" means that it is hard for it to be updated in a transfer learning setting. This follows because the difference in loss due to incorrect predictions is unbounded, but we can only take a limited step each time."
"Finally, we have \"it encourages the differences between the largest logit and all others to become large, and this, combined with the bounded gradient $\\frac{\\partial\\ell}{\\partial z_k}$, reduces the ability of the model to adapt.\" The gradient of cross-entropy, remember, is basically `output - target`. Both `output` and `target` are between 0 and 1, so the difference is between `-1` and `1`, which is why the paper says the gradient is \"bounded\" (it can't be infinite). Therefore our SGD steps are bounded too. \"Reduces the ability of the model to adapt\" means that it is hard for it to be updated in a transfer learning setting. This follows because the difference in loss due to incorrect predictions is unbounded, but we can only take a limited step each time."
]
},
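To see concretely why that gradient is bounded, here is a small sketch in plain Python (the function names are invented for illustration) computing the softmax cross-entropy gradient `output - target` for some extreme logits:

```python
import math

def softmax(zs):
    exps = [math.exp(z - max(zs)) for z in zs]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def ce_grad(zs, target_idx):
    # Gradient of cross-entropy loss w.r.t. the logits: softmax(z) - one_hot(target)
    probs = softmax(zs)
    return [p - (1.0 if i == target_idx else 0.0) for i, p in enumerate(probs)]

# Even with a wildly wrong, huge logit, every gradient component stays in [-1, 1]
g = ce_grad([100.0, 0.0, 0.0], target_idx=1)
assert all(-1.0 <= x <= 1.0 for x in g)
```

However large the loss gets, each SGD step on these logits is therefore limited in size, which is the "bounded gradient" the paper refers to.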
{
@ -922,7 +922,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"To use it in practice, we just have to change the loss function in our call to `Learner`:\n",
"To use this in practice, we just have to change the loss function in our call to `Learner`:\n",
"\n",
"```python\n",
"model = xresnet50()\n",
@ -931,7 +931,7 @@
"learn.fit_one_cycle(5, 3e-3)\n",
"```\n",
"\n",
"Like Mixup, you won't generally see significant improvements from label smoothing until you train more epochs. Try it yourself and see: how many epochs do you have to train before label smoothing shows an improvement?"
"As with Mixup, you won't generally see significant improvements from label smoothing until you train more epochs. Try it yourself and see: how many epochs do you have to train before label smoothing shows an improvement?"
]
},
{
@ -949,7 +949,7 @@
"\n",
"Most importantly, remember that if your dataset is big, there is no point prototyping on the whole thing. Find a small subset that is representative of the whole, like we did with Imagenette, and experiment on it.\n",
"\n",
"In the next three chapters, we will look at the other applications directly supported by fastai: collaborative filtering, tabular and text. We will go back to computer vision in the next section of the book, with a deep dive in convolutional neural networks in <<chapter_convolutions>>. "
"In the next three chapters, we will look at the other applications directly supported by fastai: collaborative filtering, tabular modeling, and working with text. We will go back to computer vision in the next section of the book, with a deep dive into convolutional neural networks in <<chapter_convolutions>>. "
]
},
{
@ -972,23 +972,23 @@
"1. Is using TTA at inference slower or faster than regular inference? Why?\n",
"1. What is Mixup? How do you use it in fastai?\n",
"1. Why does Mixup prevent the model from being too confident?\n",
"1. Why does a training with Mixup for 5 epochs end up worse than a training without Mixup?\n",
"1. Why does training with Mixup for five epochs end up worse than training without Mixup?\n",
"1. What is the idea behind label smoothing?\n",
"1. What problems in your data can label smoothing help with?\n",
"1. When using label smoothing with 5 categories, what is the target associated with the index 1?\n",
"1. What is the first step to take when you want to prototype quick experiments on a new dataset."
"1. When using label smoothing with five categories, what is the target associated with the index 1?\n",
"1. What is the first step to take when you want to prototype quick experiments on a new dataset?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research\n",
"### Further Research\n",
"\n",
"1. Use the fastai documentation to build a function that crops an image to a square in the four corners, then implement a TTA method that averages the predictions on a center crop and those four crops. Did it help? Is it better than the TTA method of fastai?\n",
"1. Find the Mixup paper on arxiv and read it. Pick one or two more recent articles introducing variants of Mixup and read them, then try to implement them on your problem.\n",
"1. Find the script training Imagenette using Mixup and use it as an example to build a script for a long training on your own project. Execute it and see if it helped.\n",
"1. Read the sidebar on the math of label smoothing, and look at the relevant section of the original paper, and see if you can follow it. Don't be afraid to ask for help!"
"1. Use the fastai documentation to build a function that crops an image to a square in each of the four corners, then implement a TTA method that averages the predictions on a center crop and those four crops. Did it help? Is it better than the TTA method of fastai?\n",
"1. Find the Mixup paper on arXiv and read it. Pick one or two more recent articles introducing variants of Mixup and read them, then try to implement them on your problem.\n",
"1. Find the script training Imagenette using Mixup and use it as an example to build a script for a long training on your own project. Execute it and see if it helps.\n",
"1. Read the sidebar \"Label Smoothing, the Paper\", look at the relevant section of the original paper and see if you can follow it. Don't be afraid to ask for help!"
]
},
{

View File

@ -21,7 +21,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Collaborative filtering deep dive"
"# Collaborative Filtering Deep Dive"
]
},
{
@ -48,7 +48,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## A first look at the data"
"## A First Look at the Data"
]
},
{
@ -318,7 +318,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Learning the latent factors"
"## Learning the Latent Factors"
]
},
{
@ -816,7 +816,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Collaborative filtering from scratch"
"## Collaborative Filtering from Scratch"
]
},
{
@ -1214,7 +1214,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Weight decay"
"### Weight Decay"
]
},
{
@ -1272,7 +1272,7 @@
"In practice, though, it would be very inefficient (and maybe numerically unstable) to compute that big sum and add it to the loss. If you remember a little bit of high school math, you might recall that the derivative of `p**2` with respect to `p` is `2*p`, so adding that big sum to our loss is exactly the same as doing:\n",
"\n",
"``` python\n",
"weight.grad += wd * 2 * weight\n",
"parameters.grad += wd * 2 * parameters\n",
"```\n",
"\n",
"In practice, since `wd` is a parameter that we choose, we can just make it twice as big, so we don't even need the `*2` in the above equation. To use weight decay in fastai, just pass `wd` in your call to fit:"
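We can convince ourselves of that equivalence numerically with a small sketch in plain Python (the helper names are made up for illustration), comparing a finite-difference gradient of the penalty `wd * sum(p**2)` with the analytic `2 * wd * p`:

```python
wd = 0.1  # weight decay coefficient

def sq_penalty(params):
    # The big sum added to the loss: wd * sum of squared parameters
    return wd * sum(p * p for p in params)

def grad_penalty(params, i, eps=1e-6):
    # Central finite-difference gradient of the penalty w.r.t. params[i]
    up = params.copy(); up[i] += eps
    down = params.copy(); down[i] -= eps
    return (sq_penalty(up) - sq_penalty(down)) / (2 * eps)

params = [0.5, -2.0, 3.0]
for i, p in enumerate(params):
    # Matches the analytic derivative 2 * wd * p
    assert abs(grad_penalty(params, i) - 2 * wd * p) < 1e-6
```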
@ -1354,7 +1354,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating our own Embedding module"
"### Creating Our Own Embedding Module"
]
},
{
@ -1601,7 +1601,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Interpreting embeddings and biases"
"## Interpreting Embeddings and Biases"
]
},
{
@ -1903,7 +1903,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Embedding distance"
"### Embedding Distance"
]
},
{
@ -1950,7 +1950,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Boot strapping a collaborative filtering model"
"## Bootstrapping a Collaborative Filtering Model"
]
},
{
@ -1986,7 +1986,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deep learning for collaborative filtering"
"## Deep Learning for Collaborative Filtering"
]
},
{
@ -2238,7 +2238,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sidebar: kwargs and delegates"
"### Sidebar: Kwargs and Delegates"
]
},
{
@ -2330,7 +2330,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research\n",
"### Further Research\n",
"\n",
"1. Take a look at all the differences between the `Embedding` version of `DotProductBias` and the `create_params` version, and try to understand why each of those changes is required. If you're not sure, try reverting each change, to see what happens. (NB: even the type of brackets used in `forward` has changed!)\n",
"1. Find three other areas where collaborative filtering is being used, and find out what the pros and cons of this approach are in those areas.\n",

View File

@ -4,7 +4,7 @@
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide_input": true
"hide_input": false
},
"outputs": [
{
@ -41,7 +41,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tabular modelling deep dive"
"# Tabular Modeling Deep Dive"
]
},
{
@ -57,7 +57,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Categorical embeddings"
"## Categorical Embeddings"
]
},
{
@ -187,7 +187,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Beyond deep learning"
"## Beyond Deep Learning"
]
},
{
@ -239,7 +239,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## The dataset"
"## The Dataset"
]
},
{
@ -390,7 +390,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Look at the data"
"### Look at the Data"
]
},
{
@ -542,7 +542,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Decision trees"
"## Decision Trees"
]
},
{
@ -591,7 +591,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Handling dates"
"### Handling Dates"
]
},
{
@ -1405,7 +1405,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating the decision tree"
"### Creating the Decision Tree"
]
},
{
@ -7418,7 +7418,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Categorical variables"
"### Categorical Variables"
]
},
{
@ -7454,7 +7454,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Random forests"
"## Random Forests"
]
},
{
@ -7493,7 +7493,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating a random forest"
"### Creating a Random Forest"
]
},
{
@ -7662,7 +7662,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Out-of-bag error"
"### Out-of-Bag Error"
]
},
{
@ -7721,7 +7721,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Model interpretation"
"## Model Interpretation"
]
},
{
@ -7743,7 +7743,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Tree variance for prediction confidence"
"### Tree Variance for Prediction Confidence"
]
},
{
@ -7840,7 +7840,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Feature importance"
"### Feature Importance"
]
},
{
@ -8020,7 +8020,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Removing low-importance variables"
"### Removing Low-Importance Variables"
]
},
{
@ -8173,7 +8173,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Removing redundant features"
"### Removing Redundant Features"
]
},
{
@ -8396,14 +8396,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Tk add transition"
"Now that we know which variables influence our predictions the most, we can have a look at how they affect the results using partial dependence plots."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Partial dependence"
"### Partial Dependence"
]
},
{
@ -8526,7 +8526,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Data leakage"
"### Data Leakage"
]
},
{
@ -8570,7 +8570,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Tree interpreter"
"### Tree Interpreter"
]
},
{
@ -8716,7 +8716,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Extrapolation and neural networks"
"## Extrapolation and Neural Networks"
]
},
{
@ -8730,7 +8730,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### The extrapolation problem"
"### The Extrapolation Problem"
]
},
{
@ -8890,7 +8890,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Finding out of domain data"
"### Finding Out-of-Domain Data"
]
},
{
@ -9139,7 +9139,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using a neural network"
"### Using a Neural Network"
]
},
{
@ -9560,7 +9560,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sidebar: fastai's Tabular classes"
"### Sidebar: fastai's Tabular Classes"
]
},
{
@ -9700,7 +9700,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Combining embeddings with other methods"
"### Combining Embeddings with Other Methods"
]
},
{
@ -9732,7 +9732,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion: our advice for tabular modeling"
"## Conclusion: Our Advice for Tabular Modeling"
]
},
{
@ -9802,7 +9802,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research"
"### Further Research"
]
},
{

View File

@ -22,7 +22,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# NLP deep dive: RNNs"
"# NLP Deep Dive: RNNs"
]
},
{
@ -74,7 +74,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Text preprocessing"
"## Text Preprocessing"
]
},
{
@ -142,7 +142,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Word tokenization with fastai"
"### Word Tokenization with fastai"
]
},
{
@ -395,7 +395,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Subword tokenization"
"### Subword Tokenization"
]
},
{
@ -732,7 +732,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Putting our texts into batches for a language model"
"### Putting Our Texts Into Batches for a Language Model"
]
},
{
@ -1271,7 +1271,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training a text classifier"
"## Training a Text Classifier"
]
},
{
@ -1287,7 +1287,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Language model using DataBlock"
"### Language Model Using DataBlock"
]
},
{
@ -1380,7 +1380,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Fine tuning the language model"
"### Fine-Tuning the Language Model"
]
},
{
@ -1478,7 +1478,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Saving and loading models"
"### Saving and Loading Models"
]
},
{
@ -1670,7 +1670,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Text generation"
"### Text Generation"
]
},
{
@ -1745,7 +1745,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating the classifier DataLoaders"
"### Creating the Classifier DataLoaders"
]
},
{
@ -1918,7 +1918,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Fine tuning the classifier"
"### Fine-Tuning the Classifier"
]
},
{
@ -2136,7 +2136,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Disinformation and language models"
"## Disinformation and Language Models"
]
},
{
@ -2249,7 +2249,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research"
"### Further Research"
]
},
{

View File

@ -22,7 +22,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Data munging with fastai's mid-level API"
"# Data Munging with fastai's Mid-Level API"
]
},
{
@ -36,7 +36,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Going deeper into fastai's layered API"
"## Going Deeper into fastai's Layered API"
]
},
{
@ -273,7 +273,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Writing your own Transform"
"### Writing Your Own Transform"
]
},
{
@ -480,7 +480,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## TfmdLists and Datasets: Transformed collections"
"## TfmdLists and Datasets: Transformed Collections"
]
},
{
@ -909,7 +909,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Applying the mid-tier data API: SiamesePair"
"## Applying the Mid-Tier Data API: SiamesePair"
]
},
{
@ -1246,7 +1246,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research"
"### Further Research"
]
},
{
@ -1261,7 +1261,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Becoming a deep learning practitioner"
"## Becoming a Deep Learning Practitioner"
]
},
{

View File

@ -21,7 +21,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# A language model from scratch"
"# A Language Model from Scratch"
]
},
{
@ -37,7 +37,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## The data"
"## The Data"
]
},
{
@ -255,7 +255,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Our first language model from scratch"
"## Our First Language Model from Scratch"
]
},
{
@ -350,7 +350,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Our language model in PyTorch"
"### Our Language Model in PyTorch"
]
},
{
@ -559,7 +559,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Our first recurrent neural network"
"### Our First Recurrent Neural Network"
]
},
{
@ -726,7 +726,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Maintaining the state of an RNN"
"### Maintaining the State of an RNN"
]
},
{
@ -990,7 +990,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating more signal"
"### Creating More Signal"
]
},
{
@ -1309,7 +1309,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## The model"
"## The Model"
]
},
{
@ -1493,7 +1493,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exploding or disappearing activations"
"### Exploding or Disappearing Activations"
]
},
{
@ -1552,7 +1552,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Building an LSTM from scratch"
"### Building an LSTM from Scratch"
]
},
{
@ -1702,7 +1702,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Training a language model using LSTMs"
"### Training a Language Model Using LSTMs"
]
},
{
@ -1977,7 +1977,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### AR and TAR regularization"
"### AR and TAR Regularization"
]
},
{
@ -2010,7 +2010,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Training a weight-tied regularized LSTM"
"### Training a Weight-Tied Regularized LSTM"
]
},
{
@ -2318,7 +2318,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research"
"### Further Research"
]
},
{

View File

@ -24,7 +24,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Convolutional neural networks"
"# Convolutional Neural Networks"
]
},
{
@ -40,7 +40,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## The magic of convolutions"
"## The Magic of Convolutions"
]
},
{
@ -1378,7 +1378,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Mapping a convolution kernel"
"### Mapping a Convolution Kernel"
]
},
{
@ -1743,7 +1743,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Strides and padding"
"### Strides and Padding"
]
},
{
@ -1808,7 +1808,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Understanding the convolution equations"
"### Understanding the Convolution Equations"
]
},
{
@ -1929,7 +1929,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Our first convolutional neural network"
"## Our First Convolutional Neural Network"
]
},
{
@ -2292,7 +2292,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Understanding convolution arithmetic"
"### Understanding Convolution Arithmetic"
]
},
{
@ -2402,7 +2402,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Receptive fields"
"### Receptive Fields"
]
},
{
@ -2453,7 +2453,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### A note about Twitter"
"### A Note about Twitter"
]
},
{
@ -2536,7 +2536,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Colour images"
"## Color Images"
]
},
{
@ -2687,7 +2687,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Improving training stability"
"## Improving Training Stability"
]
},
{
@ -2801,7 +2801,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### A simple baseline"
"### A Simple Baseline"
]
},
{
@ -3003,7 +3003,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Increase batch size"
"### Increase Batch Size"
]
},
{
@ -3103,7 +3103,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1cycle training"
"### 1cycle Training"
]
},
{
@ -3362,7 +3362,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Batch normalization"
"### Batch Normalization"
]
},
{
@ -3723,7 +3723,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research"
"### Further Research"
]
},
{

View File

@ -39,7 +39,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Going back to Imagenette"
"## Going Back to Imagenette"
]
},
{
@ -319,7 +319,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Building a modern CNN: ResNet"
"## Building a Modern CNN: ResNet"
]
},
{
@ -333,7 +333,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Skip-connections"
"### Skip-Connections"
]
},
{
@ -689,7 +689,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### A state-of-the-art ResNet"
"### A State-of-the-Art ResNet"
]
},
{
@ -935,7 +935,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Bottleneck layers"
"### Bottleneck Layers"
]
},
{
@ -1244,7 +1244,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research"
"### Further Research"
]
},
{

View File

@ -21,7 +21,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Application architectures deep dive"
"# Application Architectures Deep Dive"
]
},
{
@ -39,7 +39,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Computer vision"
"## Computer Vision"
]
},
{
@ -242,7 +242,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### A Siamese network"
"### A Siamese Network"
]
},
{
@ -579,7 +579,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Natural language processing"
"## Natural Language Processing"
]
},
{
@ -707,7 +707,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Wrapping up architectures"
"## Wrapping Up Architectures"
]
},
{
@ -778,7 +778,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research"
"### Further Research"
]
},
{

View File

@ -23,7 +23,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# The training process"
"# The Training Process"
]
},
{
@ -47,7 +47,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Let's start with SGD"
"## Let's Start with SGD"
]
},
{
@ -305,7 +305,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## A generic optimizer"
"## A Generic Optimizer"
]
},
{
@ -871,7 +871,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Decoupled weight_decay"
"## Decoupled Weight Decay"
]
},
{
@ -1010,7 +1010,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating a callback"
"### Creating a Callback"
]
},
{
@ -1146,7 +1146,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Callback ordering and exceptions"
"### Callback Ordering and Exceptions"
]
},
{
@ -1269,7 +1269,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research"
"### Further Research"
]
},
{

View File

@ -23,7 +23,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# A neural net from the foundations"
"# A Neural Net from the Foundations"
]
},
{
@ -39,7 +39,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## A neural net layer from scratch"
"## A Neural Net Layer from Scratch"
]
},
{
@ -53,7 +53,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Modeling a neuron"
"### Modeling a Neuron"
]
},
{
@ -117,7 +117,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Matrix multiplication from scratch"
"### Matrix Multiplication from Scratch"
]
},
{
@ -242,8 +242,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"### Elementwise arithmetic"
"### Elementwise Arithmetic"
]
},
{
@ -1091,7 +1090,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Einstein summation"
"### Einstein Summation"
]
},
{
@ -1182,7 +1181,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## The forward and backward passes"
"## The Forward and Backward Passes"
]
},
{
@ -1196,7 +1195,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Defining and initializing a layer"
"### Defining and Initializing a Layer"
]
},
{
@ -1766,7 +1765,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Gradients and backward pass"
"### Gradients and Backward Pass"
]
},
{
@ -1971,7 +1970,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Refactor the model"
"### Refactor the Model"
]
},
{
@ -2421,7 +2420,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research"
"### Further Research"
]
},
{

View File

@ -23,7 +23,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# CNN interpretation with CAM"
"# CNN Interpretation with CAM"
]
},
{
@ -39,7 +39,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## CAM and hooks"
"## CAM and Hooks"
]
},
{
@ -633,7 +633,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research"
"### Further Research"
]
},
{

View File

@ -14,7 +14,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# fastai Learner from scratch"
"# fastai Learner from Scratch"
]
},
{
@ -1548,7 +1548,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Scheduling the learning rate"
"### Scheduling the Learning Rate"
]
},
{
@ -1861,7 +1861,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research"
"### Further Research"
]
},
{

View File

@ -11,7 +11,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Concluding thoughts"
"# Concluding Thoughts"
]
},
{

View File

@ -4,6 +4,7 @@
"cell_type": "raw",
"metadata": {},
"source": [
"[[appendix_blog]]\n",
"[appendix]\n",
"[role=\"Creating a blog\"]"
]
@ -23,7 +24,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Creating a blog"
"# Creating a Blog"
]
},
{
@ -48,7 +49,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating the repository"
"### Creating the Repository"
]
},
{
@ -72,7 +73,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Setting up your homepage"
"### Setting Up Your Homepage"
]
},
{
@ -136,7 +137,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating posts"
"### Creating Posts"
]
},
{
@ -226,7 +227,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Synchronizing GitHub and your computer"
"### Synchronizing GitHub and Your Computer"
]
},
{
@ -262,7 +263,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Jupyter for blogging"
"### Jupyter for Blogging"
]
},
{

View File

@ -14,70 +14,70 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Your deep learning journey"
"# Your Deep Learning Journey"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deep learning is for everyone"
"## Deep Learning Is for Everyone"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Neural networks: a brief history"
"## Neural Networks: A Brief History"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Who we are"
"## Who We Are"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How to learn deep learning"
"## How to Learn Deep Learning"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Your projects and your mindset"
"### Your Projects and Your Mindset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The software: PyTorch, fastai, and Jupyter"
"## The Software: PyTorch, fastai, and Jupyter"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Your first model"
"## Your First Model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Getting a GPU deep learning server"
"### Getting a GPU Deep Learning Server"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Running your first notebook"
"### Running Your First Notebook"
]
},
{
@ -166,7 +166,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sidebar: This book was written in Jupyter Notebooks"
"### Sidebar: This Book Was Written in Jupyter Notebooks"
]
},
{
@ -291,7 +291,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### What is machine learning?"
"### What Is Machine Learning?"
]
},
{
@ -627,14 +627,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### What is a neural network?"
"### What Is a Neural Network?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### A bit of deep learning jargon"
"### A Bit of Deep Learning Jargon"
]
},
{
@ -757,53 +757,53 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Limitations inherent to machine learning\n",
"### Limitations Inherent to Machine Learning\n",
"\n",
"From this picture we can now see some fundamental things about training a deep learning model:\n",
"\n",
"- A model cannot be created without data ;\n",
"- A model can only learn to operate on the patterns seen in the input data used to train it ;\n",
"- This learning approach only creates *predictions*, not recommended *actions* ;\n",
"- It's not enough to just have examples of input data; we need *labels* for that data too (e.g. pictures of dogs and cats aren't enough to train a model; we need a label for each one, saying which ones are dogs, and which are cats).\n",
"- A model cannot be created without data.\n",
"- A model can only learn to operate on the patterns seen in the input data used to train it.\n",
"- This learning approach only creates *predictions*, not recommended *actions*.\n",
"- It's not enough to just have examples of input data; we need *labels* for that data too (e.g., pictures of dogs and cats aren't enough to train a model; we need a label for each one, saying which ones are dogs, and which are cats).\n",
"\n",
"Generally speaking, we've seen that most organizations that think they don't have enough data, actually mean they don't have enough *labeled* data. If any organization is interested in doing something in practice with a model, then presumably they have some inputs they plan to run their model against. And presumably they've been doing that some other way for a while (e.g. manually, or with some heuristic program), so they have data from those processes! For instance, a radiology practice will almost certainly have an archive of medical scans (since they need to be able to check how their patients are progressing over time), but those scans may not have structured labels containing a list of diagnoses or interventions (since radiologists generally create free text natural language reports, not structured data). We'll be discussing labeling approaches a lot in this book, since it's such an important issue in practice.\n",
"Generally speaking, we've seen that most organizations that say they don't have enough data actually mean they don't have enough *labeled* data. If any organization is interested in doing something in practice with a model, then presumably they have some inputs they plan to run their model against. And presumably they've been doing that some other way for a while (e.g., manually, or with some heuristic program), so they have data from those processes! For instance, a radiology practice will almost certainly have an archive of medical scans (since they need to be able to check how their patients are progressing over time), but those scans may not have structured labels containing a list of diagnoses or interventions (since radiologists generally create free-text natural language reports, not structured data). We'll be discussing labeling approaches a lot in this book, because it's such an important issue in practice.\n",
"\n",
"Since these kinds of machine learning models can only make *predictions* (i.e. attempt to replicate labels), this can result in a significant gap between organizational goals and model capabilities. For instance, in this book you'll learn how to create a *recommendation system* that can predict what products a user might purchase. This is often used in e-commerce, such as to customize products shown on a home page, by showing the highest-ranked items. But such a model is generally created by looking at a user and their buying history (*inputs*) and what they went on to buy or look at (*labels*), which means that the model is likely to tell you about products they already have, or already know about, rather than new products that they are most likely to be interested in hearing about. That's very different to what, say, an expert at your local bookseller might do, where they ask questions to figure out your taste, and then tell you about authors or series that you've never heard of before."
"Since these kinds of machine learning models can only make *predictions* (i.e., attempt to replicate labels), this can result in a significant gap between organizational goals and model capabilities. For instance, in this book you'll learn how to create a *recommendation system* that can predict what products a user might purchase. This is often used in e-commerce, such as to customize products shown on a home page by showing the highest-ranked items. But such a model is generally created by looking at a user and their buying history (*inputs*) and what they went on to buy or look at (*labels*), which means that the model is likely to tell you about products the user already has or already knows about, rather than new products that they are most likely to be interested in hearing about. That's very different to what, say, an expert at your local bookseller might do, where they ask questions to figure out your taste, and then tell you about authors or series that you've never heard of before."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### How our image recognizer works"
"### How Our Image Recognizer Works"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### What our image recognizer learned"
"### What Our Image Recognizer Learned"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Image recognizers can tackle non-image tasks"
"### Image Recognizers Can Tackle Non-Image Tasks"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Jargon recap"
"### Jargon Recap"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deep learning is not just for image classification"
"## Deep Learning Is Not Just for Image Classification"
]
},
{
@ -1114,7 +1114,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sidebar: The order matters"
"### Sidebar: The Order Matters"
]
},
{
@ -1441,7 +1441,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sidebar: Datasets: food for models"
"### Sidebar: Datasets: Food for Models"
]
},
{
@ -1455,14 +1455,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Validation sets and test sets"
"## Validation Sets and Test Sets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Use judgment in defining test sets"
"### Use Judgment in Defining Test Sets"
]
},
{
@ -1483,7 +1483,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"It can be hard to know in pages and pages of prose what are the key things you really need to focus on and remember. So we've prepared a list of questions and suggested steps to complete at the end of each chapter. All the answers are in the text of the chapter, so if you're not sure about anything here, re-read that part of the text and make sure you understand it. Answers to all these questions are also available on the [book website](https://book.fast.ai). You can also visit [the forums](https://forums.fast.ai) if you get stuck to get help from other folks studying this material."
"It can be hard to know in pages and pages of prose what the key things are that you really need to focus on and remember. So, we've prepared a list of questions and suggested steps to complete at the end of each chapter. All the answers are in the text of the chapter, so if you're not sure about anything here, reread that part of the text and make sure you understand it. Answers to all these questions are also available on the [book's website](https://book.fast.ai). You can also visit [the forums](https://forums.fast.ai) if you get stuck to get help from other folks studying this material."
]
},
{
@ -1491,33 +1491,35 @@
"metadata": {},
"source": [
"1. Do you need these for deep learning?\n",
"\n",
" - Lots of math T / F\n",
" - Lots of data T / F\n",
" - Lots of expensive computers T / F\n",
" - A PhD T / F\n",
" \n",
"1. Name five areas where deep learning is now the best in the world.\n",
"1. What was the name of the first device that was based on the principle of the artificial neuron?\n",
"1. Based on the book of the same name, what are the requirements for \"Parallel Distributed Processing\"?\n",
"1. Based on the book of the same name, what are the requirements for parallel distributed processing (PDP)?\n",
"1. What were the two theoretical misunderstandings that held back the field of neural networks?\n",
"1. What is a GPU?\n",
"1. Open a notebook and execute a cell containing: `1+1`. What happens?\n",
"1. Follow through each cell of the stripped version of the notebook for this chapter. Before executing each cell, guess what will happen.\n",
"1. Complete the Jupyter Notebook online appendix.\n",
"1. Why is it hard to use a traditional computer program to recognize images in a photo?\n",
"1. What did Samuel mean by \"Weight Assignment\"?\n",
"1. What term do we normally use in deep learning for what Samuel called \"Weights\"?\n",
"1. Draw a picture that summarizes Arthur Samuel's view of a machine learning model\n",
"1. What did Samuel mean by \"weight assignment\"?\n",
"1. What term do we normally use in deep learning for what Samuel called \"weights\"?\n",
"1. Draw a picture that summarizes Samuel's view of a machine learning model.\n",
"1. Why is it hard to understand why a deep learning model makes a particular prediction?\n",
"1. What is the name of the theorem that a neural network can solve any mathematical problem to any level of accuracy?\n",
"1. What is the name of the theorem that shows that a neural network can solve any mathematical problem to any level of accuracy?\n",
"1. What do you need in order to train a model?\n",
"1. How could a feedback loop impact the rollout of a predictive policing model?\n",
"1. Do we always have to use 224x224 pixel images with the cat recognition model?\n",
"1. Do we always have to use 224\\*224-pixel images with the cat recognition model?\n",
"1. What is the difference between classification and regression?\n",
"1. What is a validation set? What is a test set? Why do we need them?\n",
"1. What will fastai do if you don't provide a validation set?\n",
"1. Can we always use a random sample for a validation set? Why or why not?\n",
"1. What is overfitting? Provide an example.\n",
"1. What is a metric? How does it differ to \"loss\"?\n",
"1. What is a metric? How does it differ from \"loss\"?\n",
"1. How can pretrained models help?\n",
"1. What is the \"head\" of a model?\n",
"1. What kinds of features do the early layers of a CNN find? How about the later layers?\n",
@ -1533,14 +1535,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research"
"### Further Research"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each chapter also has a \"further research\" with questions that aren't fully answered in the text, or include more advanced assignments. Answers to these questions aren't on the book website--you'll need to do your own research!"
"Each chapter also has a \"Further Research\" section that poses questions that aren't fully answered in the text, or gives more advanced assignments. Answers to these questions aren't on the book's website; you'll need to do your own research!"
]
},
{
@ -1548,8 +1550,15 @@
"metadata": {},
"source": [
"1. Why is a GPU useful for deep learning? How is a CPU different, and why is it less effective for deep learning?\n",
"1. Try to think of three areas where feedback loops might impact use of machine learning. See if you can find documented examples of that happening in practice."
"1. Try to think of three areas where feedback loops might impact the use of machine learning. See if you can find documented examples of that happening in practice."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {

View File

@ -15,28 +15,28 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# From model to production"
"# From Model to Production"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The practice of deep learning"
"## The Practice of Deep Learning"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Starting your project"
"### Starting Your Project"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The state of deep learning"
"### The State of Deep Learning"
]
},
{
@ -78,21 +78,28 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### The Drivetrain approach"
"#### Other data types"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Gathering data"
"### The Drivetrain Approach"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To download images with Bing Image Search, you should sign up at Microsoft for *Bing Image Search*. You will be given a key, which you can either paste here, replacing \"XXX\":"
"## Gathering Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To download images with Bing Image Search, sign up at Microsoft for a free account. You will be given a key, which you can copy and enter in a cell as follows (replacing 'XXX' with your key and executing it):"
]
},
{
@ -280,7 +287,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sidebar: Getting help in Jupyter notebooks"
"### Sidebar: Getting Help in Jupyter Notebooks"
]
},
{
@ -294,7 +301,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## From data to DataLoaders"
"## From Data to DataLoaders"
]
},
{
@ -306,7 +313,7 @@
"bears = DataBlock(\n",
" blocks=(ImageBlock, CategoryBlock), \n",
" get_items=get_image_files, \n",
" splitter=RandomSplitter(valid_pct=0.3, seed=42),\n",
" splitter=RandomSplitter(valid_pct=0.2, seed=42),\n",
" get_y=parent_label,\n",
" item_tfms=Resize(128))"
]
@ -418,7 +425,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Data augmentation"
"### Data Augmentation"
]
},
{
@ -449,7 +456,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training your model, and using it to clean your data"
"## Training Your Model, and Using It to Clean Your Data"
]
},
{
@ -673,14 +680,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Turning your model into an online application"
"## Turning Your Model into an Online Application"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using the model for inference"
"### Using the Model for Inference"
]
},
{
@ -776,7 +783,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating a Notebook app from the model"
"### Creating a Notebook App from the Model"
]
},
{
@ -965,7 +972,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Turning your notebook into a real app"
"### Turning Your Notebook into a Real App"
]
},
{
@ -990,21 +997,21 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## How to avoid disaster"
"## How to Avoid Disaster"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Unforeseen consequences and feedback loops"
"### Unforeseen Consequences and Feedback Loops"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get writing!"
"## Get Writing!"
]
},
{
@ -1018,21 +1025,21 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Provide an example of where the bear classification model might work poorly, due to structural or style differences to the training data.\n",
"1. Provide an example of where the bear classification model might work poorly in production, due to structural or style differences in the training data.\n",
"1. Where do text models currently have a major deficiency?\n",
"1. What are possible negative societal implications of text generation models?\n",
"1. In situations where a model might make mistakes, and those mistakes could be harmful, what is a good alternative to automating a process?\n",
"1. What kind of tabular data is deep learning particularly good at?\n",
"1. What's a key downside of directly using a deep learning model for recommendation systems?\n",
"1. What are the steps of the Drivetrain approach?\n",
"1. How do the steps of the Drivetrain approach map to a recommendation system?\n",
"1. What are the steps of the Drivetrain Approach?\n",
"1. How do the steps of the Drivetrain Approach map to a recommendation system?\n",
"1. Create an image recognition model using data you curate, and deploy it on the web.\n",
"1. What is `DataLoaders`?\n",
"1. What four things do we need to tell fastai to create `DataLoaders`?\n",
"1. What does the `splitter` parameter to `DataBlock` do?\n",
"1. How do we ensure a random split always gives the same validation set?\n",
"1. What letters are often used to signify the independent and dependent variables?\n",
"1. What's the difference between crop, pad, and squish resize approaches? When might you choose one over the other?\n",
"1. What's the difference between the crop, pad, and squish resize approaches? When might you choose one over the others?\n",
"1. What is data augmentation? Why is it needed?\n",
"1. What is the difference between `item_tfms` and `batch_tfms`?\n",
"1. What is a confusion matrix?\n",
@ -1041,29 +1048,29 @@
"1. What are IPython widgets?\n",
"1. When might you want to use CPU for deployment? When might GPU be better?\n",
"1. What are the downsides of deploying your app to a server, instead of to a client (or edge) device such as a phone or PC?\n",
"1. What are 3 examples of problems that could occur when rolling out a bear warning system in practice?\n",
"1. What is \"out of domain data\"?\n",
"1. What are three examples of problems that could occur when rolling out a bear warning system in practice?\n",
"1. What is \"out-of-domain data\"?\n",
"1. What is \"domain shift\"?\n",
"1. What are the 3 steps in the deployment process?\n",
"1. For a project you're interested in applying deep learning to, consider the thought experiment \"what would happen if it went really, really well?\"\n",
"1. What are the three steps in the deployment process?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further Research"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Consider how the Drivetrain Approach maps to a project or problem you're interested in.\n",
"1. When might it be best to avoid certain types of data augmentation?\n",
"1. For a project you're interested in applying deep learning to, consider the thought experiment \"What would happen if it went really, really well?\"\n",
"1. Start a blog, and write your first blog post. For instance, write about what you think deep learning might be useful for in a domain you're interested in."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Consider how the Drivetrain approach maps to a project or problem you're interested in.\n",
"1. When might it be best to avoid certain types of data augmentation?"
]
},
{
"cell_type": "code",
"execution_count": null,

View File

@ -11,7 +11,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sidebar: Acknowledgement: Dr Rachel Thomas"
"### Sidebar: Acknowledgement: Dr. Rachel Thomas"
]
},
{
@ -25,42 +25,42 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Key examples for data ethics"
"## Key Examples for Data Ethics"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Bugs and recourse: Buggy algorithm used for healthcare benefits"
"### Bugs and Recourse: Buggy Algorithm Used for Healthcare Benefits"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Feedback loops: YouTube's recommendation system"
"### Feedback Loops: YouTube's Recommendation System"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Bias: Professor Lantanya Sweeney \"arrested\""
"### Bias: Professor Latanya Sweeney \"Arrested\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Why does this matter?"
"### Why Does This Matter?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Integrating machine learning with product design"
"## Integrating Machine Learning with Product Design"
]
},
{
@ -74,14 +74,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Recourse and accountability"
"### Recourse and Accountability"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Feedback loops"
"### Feedback Loops"
]
},
{
@ -109,77 +109,70 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Aggregation Bias"
"#### Aggregation bias"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Representation Bias"
"#### Representation bias"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Addressing different types of bias"
"### Addressing Different Types of Bias"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Humans are biased, so does algorithmic bias matter?"
"### Disinformation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Disinformation"
"## Identifying and Addressing Ethical Issues"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Identifying and addressing ethical issues"
"### Analyze a Project You Are Working On"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Analyze a project you are working on"
"### Processes to Implement"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Processes to implement"
"#### Ethical lenses"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Ethical Lenses"
"### The Power of Diversity"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The power of diversity"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Fairness, accountability, and transparency"
"### Fairness, Accountability, and Transparency"
]
},
{
@ -193,21 +186,21 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### The effectiveness of regulation"
"### The Effectiveness of Regulation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Rights and policy"
"### Rights and Policy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Cars: a historical precedent"
"### Cars: A Historical Precedent"
]
},
{
@ -230,16 +223,16 @@
"source": [
"1. Does ethics provide a list of \"right answers\"?\n",
"1. How can working with people of different backgrounds help when considering ethical questions?\n",
"1. What was the role of IBM in Nazi Germany? Why did the company participate as they did? Why did the workers participate?\n",
"1. What was the role of the first person jailed in the VW diesel scandal?\n",
"1. What was the role of IBM in Nazi Germany? Why did the company participate as it did? Why did the workers participate?\n",
"1. What was the role of the first person jailed in the Volkswagen diesel scandal?\n",
"1. What was the problem with a database of suspected gang members maintained by California law enforcement officials?\n",
"1. Why did YouTube's recommendation algorithm recommend videos of partially clothed children to pedophiles, even though no employee at Google programmed this feature?\n",
"1. Why did YouTube's recommendation algorithm recommend videos of partially clothed children to pedophiles, even though no employee at Google had programmed this feature?\n",
"1. What are the problems with the centrality of metrics?\n",
"1. Why did Meetup.com not include gender in their recommendation system for tech meetups?\n",
"1. Why did Meetup.com not include gender in its recommendation system for tech meetups?\n",
"1. What are the six types of bias in machine learning, according to Suresh and Guttag?\n",
"1. Give two examples of historical race bias in the US.\n",
"1. Where are most images in Imagenet from?\n",
"1. In the paper \"Does Machine Learning Automate Moral Hazard and Error\" why is sinusitis found to be predictive of a stroke?\n",
"1. Where are most images in ImageNet from?\n",
"1. In the paper [\"Does Machine Learning Automate Moral Hazard and Error\"](https://scholar.harvard.edu/files/sendhil/files/aer.p20171084.pdf) why is sinusitis found to be predictive of a stroke?\n",
"1. What is representation bias?\n",
"1. How are machines and people different, in terms of their use for making decisions?\n",
"1. Is disinformation the same as \"fake news\"?\n",
@ -252,7 +245,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research:"
"### Further Research"
]
},
{
@ -260,12 +253,12 @@
"metadata": {},
"source": [
"1. Read the article \"What Happens When an Algorithm Cuts Your Healthcare\". How could problems like this be avoided in the future?\n",
"1. Research to find out more about YouTube's recommendation system and its societal impacts. Do you think recommendation systems must always have feedback loops with negative results? What approaches could Google take? What about the government?\n",
"1. Read the paper \"Discrimination in Online Ad Delivery\". Do you think Google should be considered responsible for what happened to Dr Sweeney? What would be an appropriate response?\n",
"1. Research to find out more about YouTube's recommendation system and its societal impacts. Do you think recommendation systems must always have feedback loops with negative results? What approaches could Google take to avoid them? What about the government?\n",
"1. Read the paper [\"Discrimination in Online Ad Delivery\"](https://arxiv.org/abs/1301.6822). Do you think Google should be considered responsible for what happened to Dr. Sweeney? What would be an appropriate response?\n",
"1. How can a cross-disciplinary team help avoid negative consequences?\n",
"1. Read the paper \"Does Machine Learning Automate Moral Hazard and Error\" in American Economic Review. What actions do you think should be taken to deal with the issues identified in this paper?\n",
"1. Read the paper \"Does Machine Learning Automate Moral Hazard and Error\". What actions do you think should be taken to deal with the issues identified in this paper?\n",
"1. Read the article \"How Will We Prevent AI-Based Forgery?\" Do you think Etzioni's proposed approach could work? Why?\n",
"1. Complete the section \"Analyze a project you are working on\" in this chapter.\n",
"1. Complete the section \"Analyze a Project You Are Working On\" in this chapter.\n",
"1. Consider whether your team could be more diverse. If so, what approaches might help?"
]
},
@ -273,26 +266,26 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Section 1: that's a wrap!"
"## Section 1: That's a Wrap!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Congratulations! You've made it to the end of the first section of the book. In this section we've tried to show you what deep learning can do, and how you can use it to create real applications and products. At this point, you will get a lot more out of the book if you spend some time trying out what you've learnt. Perhaps you have already been doing this as you go along — in which case, great! But if not, that's no problem either… Now is a great time to start experimenting yourself.\n",
"Congratulations! You've made it to the end of the first section of the book. In this section we've tried to show you what deep learning can do, and how you can use it to create real applications and products. At this point, you will get a lot more out of the book if you spend some time trying out what you've learned. Perhaps you have already been doing this as you go along—in which case, great! If not, that's no problem either... Now is a great time to start experimenting yourself.\n",
"\n",
"If you haven't been to the book website yet, head over there now. Remember, you can find it here: [book.fast.ai](https://book.fast.ai). It's really important that you have got yourself set up to run the notebooks. Becoming an effective deep learning practitioner is all about practice. So you need to be training models. So please go get the notebooks running now if you haven't already! And also have a look on the website for any important updates or notices; deep learning changes fast, and we can't change the words that are printed in this book, so the website is where you need to look to ensure you have the most up-to-date information.\n",
"If you haven't been to the [book's website](https://book.fast.ai) yet, head over there now. It's really important that you get yourself set up to run the notebooks. Becoming an effective deep learning practitioner is all about practice, so you need to be training models. So, please go get the notebooks running now if you haven't already! And also have a look on the website for any important updates or notices; deep learning changes fast, and we can't change the words that are printed in this book, so the website is where you need to look to ensure you have the most up-to-date information.\n",
"\n",
"Make sure that you have completed the following steps:\n",
"\n",
"- Connected to one of the GPU Jupyter servers recommended on the book website\n",
"- Run the first notebook yourself\n",
"- Uploaded an image that you find in the first notebook; then try a few different images of different kinds to see what happens\n",
"- Run the second notebook, collecting your own dataset based on image search queries that you come up with\n",
"- Thought about how you can use deep learning to help you with your own projects, including what kinds of data you could use, what kinds of problems may come up, and how you might be able to mitigate these issues in practice.\n",
"- Connect to one of the GPU Jupyter servers recommended on the book's website.\n",
"- Run the first notebook yourself.\n",
"- Upload an image that you find in the first notebook; then try a few different images of different kinds to see what happens.\n",
"- Run the second notebook, collecting your own dataset based on image search queries that you come up with.\n",
"- Think about how you can use deep learning to help you with your own projects, including what kinds of data you could use, what kinds of problems may come up, and how you might be able to mitigate these issues in practice.\n",
"\n",
"In the next section of the book we will learn about how and why deep learning works, instead of just seeing how we can use it in practice. Understanding the how and why is important for both practitioners and researchers, because in this fairly new field nearly every project requires some level of customisation and debugging. The better you understand the foundations of deep learning, the better your models will be. These foundations are less important for executives, product managers, and so forth (although still useful, so feel free to keep reading!), but they are critical for anybody who is actually training and deploying models themselves."
"In the next section of the book you will learn about how and why deep learning works, instead of just seeing how you can use it in practice. Understanding the how and why is important for both practitioners and researchers, because in this fairly new field nearly every project requires some level of customization and debugging. The better you understand the foundations of deep learning, the better your models will be. These foundations are less important for executives, product managers, and so forth (although still useful, so feel free to keep reading!), but they are critical for anybody who is actually training and deploying models themselves."
]
},
{

View File

@ -17,21 +17,21 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Under the hood: training a digit classifier"
"# Under the Hood: Training a Digit Classifier"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pixels: the foundations of computer vision"
"## Pixels: The Foundations of Computer Vision"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Sidebar: Tenacity and deep learning"
"## Sidebar: Tenacity and Deep Learning"
]
},
{
@ -1249,7 +1249,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## First try: pixel similarity"
"## First Try: Pixel Similarity"
]
},
{
@ -1495,7 +1495,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### NumPy arrays and PyTorch tensors"
"### NumPy Arrays and PyTorch Tensors"
]
},
{
@ -1677,7 +1677,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Computing metrics using broadcasting"
"## Computing Metrics Using Broadcasting"
]
},
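Broadcasting itself can be sketched in a few lines. This is a made-up NumPy illustration (PyTorch follows the same broadcasting rules), not the notebook's actual MNIST code:

```python
import numpy as np

# A "batch" of 3 items, each a vector of 4 pixel values
batch = np.array([[1., 2., 3., 4.],
                  [5., 6., 7., 8.],
                  [9., 10., 11., 12.]])
ideal = np.array([1., 1., 1., 1.])  # shape (4,): one "ideal" vector

# Broadcasting stretches `ideal` across the batch dimension, so we
# never write a Python loop over the batch:
dist = np.abs(batch - ideal).mean(axis=1)  # shape (3,)
```

The smaller array is conceptually repeated along the missing leading dimension; no copy is actually made, which is why this is so much faster than looping.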
{
@ -2039,7 +2039,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### The gradient"
"### Calculating Gradients"
]
},
{
@ -2170,14 +2170,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Stepping with a learning rate"
"### Stepping With a Learning Rate"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### An end-to-end SGD example"
"### An End-to-End SGD Example"
]
},
{
@ -2243,6 +2243,13 @@
"def mse(preds, targets): return ((preds-targets)**2).mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Step 1: Initialize the parameters"
]
},
{
"cell_type": "code",
"execution_count": null,
@ -2262,6 +2269,13 @@
"orig_params = params.clone()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Step 2: Calculate the predictions"
]
},
{
"cell_type": "code",
"execution_count": null,
@ -2306,6 +2320,13 @@
"show_preds(preds)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Step 3: Calculate the loss"
]
},
{
"cell_type": "code",
"execution_count": null,
@ -2327,6 +2348,13 @@
"loss"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Step 4: Calculate the gradients"
]
},
{
"cell_type": "code",
"execution_count": null,
@ -2388,6 +2416,13 @@
"params"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Step 5: Step the weights. "
]
},
{
"cell_type": "code",
"execution_count": null,
@ -2458,6 +2493,13 @@
" return preds"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Step 6: Repeat the process "
]
},
{
"cell_type": "code",
"execution_count": null,
@ -2522,7 +2564,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Summarizing gradient descent"
"#### Step 7: stop"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Summarizing Gradient Descent"
]
},
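The seven steps can be condensed into one minimal loop. This is a self-contained sketch on a hypothetical quadratic (finite differences stand in for PyTorch's autograd), not the chapter's notebook code:

```python
import numpy as np

def f(t, params):
    a, b, c = params
    return a * t**2 + b * t + c

def mse(preds, targets):
    return ((preds - targets) ** 2).mean()

t = np.arange(20.)
targets = 3 * t**2 + 2 * t + 1            # "true" quadratic to recover

params = np.array([0.5, -0.5, 0.5])       # Step 1: initialize
init_loss = mse(f(t, params), targets)
lr, eps = 1e-5, 1e-6
for _ in range(100):                      # Step 6: repeat
    preds = f(t, params)                  # Step 2: predict
    loss = mse(preds, targets)            # Step 3: loss
    grads = np.zeros(3)                   # Step 4: gradients
    for i in range(3):                    # (finite differences stand
        p2 = params.copy()                #  in for autograd here)
        p2[i] += eps
        grads[i] = (mse(f(t, p2), targets) - loss) / eps
    params -= lr * grads                  # Step 5: step the weights
final_loss = mse(f(t, params), targets)   # Step 7: stop (fixed count)
```

After 100 iterations the loss drops sharply and the quadratic coefficient lands close to the true value of 3.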
{
@ -2642,7 +2691,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## MNIST loss function"
"## The MNIST Loss Function"
]
},
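As a rough preview of the loss this section builds (the notebook itself uses PyTorch tensors and `torch.where`; this is only a NumPy sketch of the same idea):

```python
import numpy as np

def sigmoid(x):
    # Squashes any number into (0, 1)
    return 1 / (1 + np.exp(-x))

def mnist_loss(preds, targets):
    # Measure how far each (sigmoided) prediction is from its label:
    # 1 - p where the target is 1, p where the target is 0
    p = sigmoid(preds)
    return np.where(targets == 1, 1 - p, p).mean()

preds = np.array([2.0, -3.0])    # raw model outputs
targets = np.array([1, 0])       # is it a 3? (1 = yes)
loss = mnist_loss(preds, targets)
```

Both predictions here are "correct and confident," so the loss is small; a confident wrong prediction would push it toward 1.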
{
@ -2993,7 +3042,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### SGD and mini-batches"
"### SGD and Mini-Batches"
]
},
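A mini-batch loader is conceptually tiny. This hypothetical `simple_dataloader` is only a sketch of what fastai's `DataLoader` does for you (shuffling each epoch, yielding fixed-size batches, with a smaller final batch):

```python
import random

def simple_dataloader(items, batch_size, shuffle=True, seed=None):
    # Yield mini-batches; reshuffling each epoch gives SGD the
    # batch-to-batch variation that helps training
    idxs = list(range(len(items)))
    if shuffle:
        random.Random(seed).shuffle(idxs)
    for i in range(0, len(idxs), batch_size):
        yield [items[j] for j in idxs[i:i + batch_size]]

batches = list(simple_dataloader(range(10), 4, shuffle=False))
# → [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```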
{
@ -3070,7 +3119,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Putting it all together"
"## Putting It All Together"
]
},
{
@ -3411,7 +3460,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating an optimizer"
"### Creating an Optimizer"
]
},
{
@ -3677,7 +3726,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Adding a non-linearity"
"## Adding a Nonlinearity"
]
},
{
@ -4106,6 +4155,13 @@
"learn.recorder.values[-1][2]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Going Deeper"
]
},
{
"cell_type": "code",
"execution_count": null,
@ -4154,14 +4210,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Jargon recap"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### _Choose Your Own Adventure_ reminder"
"## Jargon Recap"
]
},
{
@ -4175,20 +4224,20 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"1. How is a greyscale image represented on a computer? How about a color image?\n",
"1. How is a grayscale image represented on a computer? How about a color image?\n",
"1. How are the files and folders in the `MNIST_SAMPLE` dataset structured? Why?\n",
"1. Explain how the \"pixel similarity\" approach to classifying digits works.\n",
"1. What is a list comprehension? Create one now that selects odd numbers from a list and doubles them.\n",
"1. What is a \"rank 3 tensor\"?\n",
"1. What is a \"rank-3 tensor\"?\n",
"1. What is the difference between tensor rank and shape? How do you get the rank from the shape?\n",
"1. What are RMSE and L1 norm?\n",
"1. How can you apply a calculation on thousands of numbers at once, many thousands of times faster than a Python loop?\n",
"1. Create a 3x3 tensor or array containing the numbers from 1 to 9. Double it. Select the bottom right 4 numbers.\n",
"1. Create a 3\\*3 tensor or array containing the numbers from 1 to 9. Double it. Select the bottom-right four numbers.\n",
"1. What is broadcasting?\n",
"1. Are metrics generally calculated using the training set, or the validation set? Why?\n",
"1. What is SGD?\n",
"1. Why does SGD use mini batches?\n",
"1. What are the 7 steps in SGD for machine learning?\n",
"1. Why does SGD use mini-batches?\n",
"1. What are the seven steps in SGD for machine learning?\n",
"1. How do we initialize the weights in a model?\n",
"1. What is \"loss\"?\n",
"1. Why can't we always use a high learning rate?\n",
@ -4196,18 +4245,18 @@
"1. Do you need to know how to calculate gradients yourself?\n",
"1. Why can't we use accuracy as a loss function?\n",
"1. Draw the sigmoid function. What is special about its shape?\n",
"1. What is the difference between loss and metric?\n",
"1. What is the difference between a loss function and a metric?\n",
"1. What is the function to calculate new weights using a learning rate?\n",
"1. What does the `DataLoader` class do?\n",
"1. Write pseudo-code showing the basic steps taken each epoch for SGD.\n",
"1. Create a function which, if passed two arguments `[1,2,3,4]` and `'abcd'`, returns `[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]`. What is special about that output data structure?\n",
"1. Write pseudocode showing the basic steps taken in each epoch for SGD.\n",
"1. Create a function that, if passed two arguments `[1,2,3,4]` and `'abcd'`, returns `[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]`. What is special about that output data structure?\n",
"1. What does `view` do in PyTorch?\n",
"1. What are the \"bias\" parameters in a neural network? Why do we need them?\n",
"1. What does the `@` operator do in python?\n",
"1. What does the `@` operator do in Python?\n",
"1. What does the `backward` method do?\n",
"1. Why do we have to zero the gradients?\n",
"1. What information do we have to pass to `Learner`?\n",
"1. Show python or pseudo-code for the basic steps of a training loop.\n",
"1. Show Python or pseudocode for the basic steps of a training loop.\n",
"1. What is \"ReLU\"? Draw a plot of it for values from `-2` to `+2`.\n",
"1. What is an \"activation function\"?\n",
"1. What's the difference between `F.relu` and `nn.ReLU`?\n",
@ -4218,7 +4267,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research"
"### Further Research"
]
},
{
@ -4226,7 +4275,7 @@
"metadata": {},
"source": [
"1. Create your own implementation of `Learner` from scratch, based on the training loop shown in this chapter.\n",
"1. Complete all the steps in this chapter using the full MNIST datasets (that is, for all digits, not just threes and sevens). This is a significant project and will take you quite a bit of time to complete! You'll need to do some of your own research to figure out how to overcome some obstacles you'll meet on the way."
"1. Complete all the steps in this chapter using the full MNIST datasets (that is, for all digits, not just 3s and 7s). This is a significant project and will take you quite a bit of time to complete! You'll need to do some of your own research to figure out how to overcome some obstacles you'll meet on the way."
]
},
{

View File

@ -14,14 +14,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Image classification"
"# Image Classification"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## From dogs and cats, to pet breeds"
"## From Dogs and Cats to Pet Breeds"
]
},
{
@ -139,7 +139,7 @@
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide_input": true
"hide_input": false
},
"outputs": [
{
@ -182,7 +182,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Checking and debugging a DataBlock"
"### Checking and Debugging a DataBlock"
]
},
{
@ -373,14 +373,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Cross entropy loss"
"## Cross-Entropy Loss"
]
},
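The two ingredients of cross-entropy loss (softmax, then negative log likelihood) can be written out directly. A NumPy sketch for intuition; PyTorch's `nn.CrossEntropyLoss` does this (more efficiently, via log-softmax) for you:

```python
import numpy as np

def softmax(acts):
    e = np.exp(acts - acts.max(axis=1, keepdims=True))  # subtract max for numerical stability
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(acts, targets):
    # Softmax turns activations into probabilities summing to 1;
    # the loss is the negative log of the probability assigned to
    # the correct class, averaged over the batch
    probs = softmax(acts)
    return -np.log(probs[np.arange(len(targets)), targets]).mean()

acts = np.array([[2.0, 0.5, -1.0],
                 [0.1, 0.1, 3.0]])   # made-up activations, 2 items x 3 classes
targets = np.array([0, 2])
loss = cross_entropy(acts, targets)
```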
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Viewing activations and labels"
"### Viewing Activations and Labels"
]
},
{
@ -606,7 +606,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Log likelihood"
"### Log Likelihood"
]
},
{
@ -782,7 +782,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Taking the `log`"
"### Taking the Log"
]
},
{
@ -944,14 +944,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Improving our model"
"## Improving Our Model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Learning rate finder"
"### The Learning Rate Finder"
]
},
{
@ -1161,7 +1161,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Unfreezing and transfer learning"
"### Unfreezing and Transfer Learning"
]
},
{
@ -1360,7 +1360,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Discriminative learning rates"
"### Discriminative Learning Rates"
]
},
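The idea behind a slice of learning rates can be sketched with a hypothetical helper: spread rates log-uniformly from the earliest layer group (smallest steps, since pretrained weights are already useful) to the head (largest steps). This is only an illustration of the concept, not fastai's implementation:

```python
import numpy as np

def discriminative_lrs(lowest, highest, n_groups):
    # Log-uniform spread of learning rates across parameter groups
    return np.geomspace(lowest, highest, n_groups)

lrs = discriminative_lrs(1e-6, 1e-4, 3)
# → lrs[0] for the earliest group, lrs[-1] for the final group
```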
{
@ -1555,14 +1555,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Selecting the number of epochs"
"### Selecting the Number of Epochs"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deeper architectures"
"### Deeper Architectures"
]
},
{
@ -1692,7 +1692,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary"
"## Conclusion"
]
},
{
@ -1707,35 +1707,35 @@
"metadata": {},
"source": [
"1. Why do we first resize to a large size on the CPU, and then to a smaller size on the GPU?\n",
"1. If you are not familiar with regular expressions, find a regular expression tutorial, and some problem sets, and complete them. Have a look on the book website for suggestions.\n",
"1. If you are not familiar with regular expressions, find a regular expression tutorial, and some problem sets, and complete them. Have a look on the book's website for suggestions.\n",
"1. What are the two ways in which data is most commonly provided, for most deep learning datasets?\n",
"1. Look up the documentation for `L` and try using a few of the new methods is that it adds.\n",
"1. Look up the documentation for the Python pathlib module and try using a few methods of the Path class.\n",
"1. Look up the documentation for the Python `pathlib` module and try using a few methods of the `Path` class.\n",
"1. Give two examples of ways that image transformations can degrade the quality of the data.\n",
"1. What method does fastai provide to view the data in a DataLoader?\n",
"1. What method does fastai provide to help you debug a DataBlock?\n",
"1. What method does fastai provide to view the data in a `DataLoaders`?\n",
"1. What method does fastai provide to help you debug a `DataBlock`?\n",
"1. Should you hold off on training a model until you have thoroughly cleaned your data?\n",
"1. What are the two pieces that are combined into cross entropy loss in PyTorch?\n",
"1. What are the two pieces that are combined into cross-entropy loss in PyTorch?\n",
"1. What are the two properties of activations that softmax ensures? Why is this important?\n",
"1. When might you want your activations to not have these two properties?\n",
"1. Calculate the \"exp\" and \"softmax\" columns of <<bear_softmax>> yourself (i.e. in a spreadsheet, with a calculator, or in a notebook).\n",
"1. Why can't we use torch.where to create a loss function for datasets where our label can have more than two categories?\n",
"1. Calculate the `exp` and `softmax` columns of <<bear_softmax>> yourself (i.e., in a spreadsheet, with a calculator, or in a notebook).\n",
"1. Why can't we use `torch.where` to create a loss function for datasets where our label can have more than two categories?\n",
"1. What is the value of log(-2)? Why?\n",
"1. What are two good rules of thumb for picking a learning rate from the learning rate finder?\n",
"1. What two steps does the fine_tune method do?\n",
"1. In Jupyter notebook, how do you get the source code for a method or function?\n",
"1. What two steps does the `fine_tune` method do?\n",
"1. In Jupyter Notebook, how do you get the source code for a method or function?\n",
"1. What are discriminative learning rates?\n",
"1. How is a Python slice object interpreted when passed as a learning rate to fastai?\n",
"1. Why is early stopping a poor choice when using one cycle training?\n",
"1. What is the difference between resnet 50 and resnet101?\n",
"1. What does to_fp16 do?"
"1. How is a Python `slice` object interpreted when passed as a learning rate to fastai?\n",
"1. Why is early stopping a poor choice when using 1cycle training?\n",
"1. What is the difference between `resnet50` and `resnet101`?\n",
"1. What does `to_fp16` do?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research"
"### Further Research"
]
},
{
@ -1743,7 +1743,7 @@
"metadata": {},
"source": [
"1. Find the paper by Leslie Smith that introduced the learning rate finder, and read it.\n",
"1. See if you can improve the accuracy of the classifier in this chapter. What's the best accuracy you can achieve? Have a look on the forums and book website to see what other students have achieved with this dataset, and how they did it."
"1. See if you can improve the accuracy of the classifier in this chapter. What's the best accuracy you can achieve? Look on the forums and the book's website to see what other students have achieved with this dataset, and how they did it."
]
},
{

File diff suppressed because one or more lines are too long

View File

@ -14,7 +14,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Training a state-of-the-art model"
"# Training a State-of-the-Art Model"
]
},
{
@ -270,7 +270,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Progressive resizing"
"## Progressive Resizing"
]
},
{
@ -443,7 +443,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test time augmentation"
"## Test Time Augmentation"
]
},
{
@ -528,7 +528,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sidebar: Papers and math"
"### Sidebar: Papers and Math"
]
},
{
@ -576,14 +576,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Label smoothing"
"## Label Smoothing"
]
},
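The target transformation at the heart of label smoothing is simple to write down. A minimal sketch of the standard formulation (as in the original Inception paper); fastai's `LabelSmoothingCrossEntropy` handles this inside the loss for you:

```python
import numpy as np

def smooth_targets(target_idx, n_classes, eps=0.1):
    # Soften the one-hot target: eps/N everywhere, plus 1 - eps on
    # the correct class, so the model is never pushed toward
    # infinitely confident activations
    t = np.full(n_classes, eps / n_classes)
    t[target_idx] += 1 - eps
    return t

t = smooth_targets(1, 5)
# index 1 gets 1 - 0.1 + 0.1/5 = 0.92; every other class gets 0.02
```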
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sidebar: Label smoothing, the paper"
"### Sidebar: Label Smoothing, the Paper"
]
},
{
@ -620,23 +620,23 @@
"1. Is using TTA at inference slower or faster than regular inference? Why?\n",
"1. What is Mixup? How do you use it in fastai?\n",
"1. Why does Mixup prevent the model from being too confident?\n",
"1. Why does a training with Mixup for 5 epochs end up worse than a training without Mixup?\n",
"1. Why does training with Mixup for five epochs end up worse than training without Mixup?\n",
"1. What is the idea behind label smoothing?\n",
"1. What problems in your data can label smoothing help with?\n",
"1. When using label smoothing with 5 categories, what is the target associated with the index 1?\n",
"1. What is the first step to take when you want to prototype quick experiments on a new dataset."
"1. When using label smoothing with five categories, what is the target associated with the index 1?\n",
"1. What is the first step to take when you want to prototype quick experiments on a new dataset?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research\n",
"### Further Research\n",
"\n",
"1. Use the fastai documentation to build a function that crops an image to a square in the four corners, then implement a TTA method that averages the predictions on a center crop and those four crops. Did it help? Is it better than the TTA method of fastai?\n",
"1. Find the Mixup paper on arxiv and read it. Pick one or two more recent articles introducing variants of Mixup and read them, then try to implement them on your problem.\n",
"1. Find the script training Imagenette using Mixup and use it as an example to build a script for a long training on your own project. Execute it and see if it helped.\n",
"1. Read the sidebar on the math of label smoothing, and look at the relevant section of the original paper, and see if you can follow it. Don't be afraid to ask for help!"
"1. Use the fastai documentation to build a function that crops an image to a square in each of the four corners, then implement a TTA method that averages the predictions on a center crop and those four crops. Did it help? Is it better than the TTA method of fastai?\n",
"1. Find the Mixup paper on arXiv and read it. Pick one or two more recent articles introducing variants of Mixup and read them, then try to implement them on your problem.\n",
"1. Find the script training Imagenette using Mixup and use it as an example to build a script for a long training on your own project. Execute it and see if it helps.\n",
"1. Read the sidebar \"Label Smoothing, the Paper\", look at the relevant section of the original paper and see if you can follow it. Don't be afraid to ask for help!"
]
},
{

View File

@ -14,14 +14,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Collaborative filtering deep dive"
"# Collaborative Filtering Deep Dive"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A first look at the data"
"## A First Look at the Data"
]
},
{
@ -198,7 +198,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Learning the latent factors"
"## Learning the Latent Factors"
]
},
{
@ -587,7 +587,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Collaborative filtering from scratch"
"## Collaborative Filtering from Scratch"
]
},
{
@ -907,7 +907,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Weight decay"
"### Weight Decay"
]
},
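One SGD update with weight decay fits in a line. A hypothetical sketch, not the notebook's training loop:

```python
import numpy as np

def sgd_step_with_wd(params, grads, lr, wd):
    # Adding wd * params to the gradient is equivalent to adding
    # wd/2 * (params**2).sum() to the loss: large weights get pulled
    # toward zero, discouraging overly sharp fits
    return params - lr * (grads + wd * params)

p = np.array([1.0, -2.0])
g = np.array([0.5, 0.5])
p_new = sgd_step_with_wd(p, g, lr=0.1, wd=0.01)
# → [0.949, -2.048]
```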
{
@ -1009,7 +1009,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating our own Embedding module"
"### Creating Our Own Embedding Module"
]
},
{
@ -1207,7 +1207,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Interpreting embeddings and biases"
"## Interpreting Embeddings and Biases"
]
},
{
@ -1433,7 +1433,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Embedding distance"
"### Embedding Distance"
]
},
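"Distance" between embeddings is usually measured with cosine similarity. A sketch with made-up latent factors (the chapter uses the trained movie embeddings instead):

```python
import numpy as np

def cosine_similarity(a, b):
    # 1.0 means "pointing the same way" in embedding space,
    # 0.0 means unrelated, -1.0 means opposite
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

movie_a = np.array([0.9, 0.1, -0.3])   # hypothetical latent factors
movie_b = np.array([0.8, 0.2, -0.2])
sim = cosine_similarity(movie_a, movie_b)
```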
{
@ -1464,14 +1464,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Boot strapping a collaborative filtering model"
"## Boot Strapping a Collaborative Filtering Model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deep learning for collaborative filtering"
"## Deep Learning for Collaborative Filtering"
]
},
{
@ -1670,7 +1670,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sidebar: kwargs and delegates"
"### Sidebar: Kwargs and Delegates"
]
},
{
@ -1735,7 +1735,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research\n",
"### Further Research\n",
"\n",
"1. Take a look at all the differences between the `Embedding` version of `DotProductBias` and the `create_params` version, and try to understand why each of those changes is required. If you're not sure, try reverting each change, to see what happens. (NB: even the type of brackets used in `forward` has changed!)\n",
"1. Find three other areas where collaborative filtering is being used, and find out what pros and cons of this approach in those areas.\n",

View File

@ -4,7 +4,7 @@
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide_input": true
"hide_input": false
},
"outputs": [
{
@ -34,28 +34,28 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tabular modelling deep dive"
"# Tabular Modeling Deep Dive"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Categorical embeddings"
"## Categorical Embeddings"
]
},
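The key fact about categorical embeddings, that indexing into an embedding table is just a faster way of multiplying by a one-hot-encoded matrix, can be verified directly. A NumPy sketch with a made-up table:

```python
import numpy as np

rng = np.random.default_rng(0)
n_categories, n_factors = 5, 3
emb = rng.normal(size=(n_categories, n_factors))  # the learnable table

idx = np.array([0, 2, 2, 4])        # a batch of category codes
vecs = emb[idx]                     # fast lookup: shape (4, 3)

# The lookup is mathematically a one-hot matrix multiply:
onehot = np.eye(n_categories)[idx]  # shape (4, 5)
same = onehot @ emb                 # shape (4, 3)
```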
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Beyond deep learning"
"## Beyond Deep Learning"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The dataset"
"## The Dataset"
]
},
{
@ -147,7 +147,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Look at the data"
"### Look at the Data"
]
},
{
@ -253,14 +253,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Decision trees"
"## Decision Trees"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Handling dates"
"### Handling Dates"
]
},
{
@ -945,7 +945,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating the decision tree"
"### Creating the Decision Tree"
]
},
{
@ -6841,14 +6841,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Categorical variables"
"### Categorical Variables"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Random forests"
"## Random Forests"
]
},
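The bagging idea behind a random forest can be shown with one-split "stump" trees on toy data. A from-scratch sketch for intuition only; the chapter uses scikit-learn's full implementation:

```python
import numpy as np

def fit_stump(x, y):
    # A one-split "tree": pick the threshold minimizing squared error
    best = (np.inf, None, 0.0, 0.0)
    for thr in np.unique(x):
        lo, hi = y[x <= thr], y[x > thr]
        if len(lo) == 0 or len(hi) == 0:
            continue
        err = ((lo - lo.mean()) ** 2).sum() + ((hi - hi.mean()) ** 2).sum()
        if err < best[0]:
            best = (err, thr, lo.mean(), hi.mean())
    if best[1] is None:                # degenerate sample: constant x
        return (x[0], y.mean(), y.mean())
    return best[1:]

def fit_forest(x, y, n_trees=20, seed=0):
    # Bagging: each tree sees a different bootstrap sample of the rows
    rng = np.random.default_rng(seed)
    return [fit_stump(*map(lambda a: a[rng.integers(0, len(x), len(x))], (x, y)))
            if False else fit_stump(x[(i := rng.integers(0, len(x), len(x)))], y[i])
            for _ in range(n_trees)]

def forest_predict(stumps, x):
    preds = np.array([[lo if xi <= thr else hi for xi in x]
                      for thr, lo, hi in stumps])
    return preds.mean(axis=0)          # average over the ensemble

x = np.arange(10.0)
y = (x > 4).astype(float)              # a step function to learn
stumps = fit_forest(x, y)
preds = forest_predict(stumps, x)
```

Each stump is noisy on its own, but averaging the bootstrapped ensemble recovers the step cleanly at both ends.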
{
@ -6865,7 +6865,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating a random forest"
"### Creating a Random Forest"
]
},
{
@ -6965,7 +6965,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Out-of-bag error"
"### Out-of-Bag Error"
]
},
{
@ -6992,14 +6992,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Model interpretation"
"## Model Interpretation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Tree variance for prediction confidence"
"### Tree Variance for Prediction Confidence"
]
},
{
@ -7064,7 +7064,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Feature importance"
"### Feature Importance"
]
},
{
@ -7216,7 +7216,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Removing low-importance variables"
"### Removing Low-Importance Variables"
]
},
{
@ -7325,7 +7325,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Removing redundant features"
"### Removing Redundant Features"
]
},
{
@ -7490,7 +7490,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Partial dependence"
"### Partial Dependence"
]
},
{
@ -7569,14 +7569,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Data leakage"
"### Data Leakage"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Tree interpreter"
"### Tree Interpreter"
]
},
{
@ -7658,14 +7658,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Extrapolation and neural networks"
"## Extrapolation and Neural Networks"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The extrapolation problem"
"### The Extrapolation Problem"
]
},
{
@ -7779,7 +7779,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Finding out of domain data"
"### Finding out of Domain Data"
]
},
{
@ -7978,7 +7978,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using a neural network"
"### Using a Neural Network"
]
},
{
@ -8297,7 +8297,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sidebar: fastai's Tabular classes"
"### Sidebar: fastai's Tabular Classes"
]
},
{
@ -8355,14 +8355,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Combining embeddings with other methods"
"### Combining Embeddings with Other Methods"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion: our advice for tabular modeling"
"## Conclusion: Our Advice for Tabular Modeling"
]
},
{
@ -8415,7 +8415,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research"
"### Further Research"
]
},
{

View File

@ -15,14 +15,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# NLP deep dive: RNNs"
"# NLP Deep Dive: RNNs"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Text preprocessing"
"## Text Preprocessing"
]
},
{
@ -36,7 +36,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Word tokenization with fastai"
"### Word Tokenization with fastai"
]
},
{
@ -186,7 +186,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Subword tokenization"
"### Subword Tokenization"
]
},
{
@ -412,7 +412,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Putting our texts into batches for a language model"
"### Putting Our Texts Into Batches for a Language Model"
]
},
{
@ -849,14 +849,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training a text classifier"
"## Training a Text Classifier"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Language model using DataBlock"
"### Language Model Using DataBlock"
]
},
{
@ -919,7 +919,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Fine tuning the language model"
"### Fine Tuning the Language Model"
]
},
{
@ -980,7 +980,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Saving and loading models"
"### Saving and Loading Models"
]
},
{
@ -1130,7 +1130,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Text generation"
"### Text Generation"
]
},
{
@ -1189,7 +1189,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating the classifier DataLoaders"
"### Creating the Classifier DataLoaders"
]
},
{
@ -1305,7 +1305,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Fine tuning the classifier"
"### Fine Tuning the Classifier"
]
},
{
@ -1486,7 +1486,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Disinformation and language models"
"## Disinformation and Language Models"
]
},
{
@ -1535,7 +1535,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research"
"### Further Research"
]
},
{

View File

@ -15,14 +15,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Data munging with fastai's mid-level API"
"# Data Munging With fastai's mid-Level API"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Going deeper into fastai's layered API"
"## Going Deeper into fastai's Layered API"
]
},
{
@ -179,7 +179,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Writing your own Transform"
"### Writing Your Own Transform"
]
},
{
@ -315,7 +315,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## TfmdLists and Datasets: Transformed collections"
"## TfmdLists and Datasets: Transformed Collections"
]
},
{
@ -599,7 +599,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Applying the mid-tier data API: SiamesePair"
"## Applying the Mid-Tier Data API: SiamesePair"
]
},
{
@ -836,7 +836,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research"
"### Further Research"
]
},
{
@ -851,7 +851,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Becoming a deep learning practitioner"
"## Becoming a Deep Learning Practitioner"
]
},
{

View File

@ -14,14 +14,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# A language model from scratch"
"# A Language Model from Scratch"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The data"
"## The Data"
]
},
{
@ -176,7 +176,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Our first language model from scratch"
"## Our First Language Model from Scratch"
]
},
{
@ -235,7 +235,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Our language model in PyTorch"
"### Our Language Model in PyTorch"
]
},
{
@ -352,7 +352,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Our first recurrent neural network"
"### Our First Recurrent Neural Network"
]
},
{
@ -450,7 +450,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Maintaining the state of an RNN"
"### Maintaining the State of an RNN"
]
},
{
@ -634,7 +634,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating more signal"
"### Creating More Signal"
]
},
{
@ -860,7 +860,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## The model"
"## The Model"
]
},
{
@ -1030,7 +1030,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exploding or disappearing activations"
"### Exploding or Disappearing Activations"
]
},
{
@ -1044,7 +1044,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Building an LSTM from scratch"
"### Building an LSTM from Scratch"
]
},
{
@ -1140,7 +1140,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Training a language model using LSTMs"
"### Training a Language Model Using LSTMs"
]
},
{
@ -1339,14 +1339,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### AR and TAR regularization"
"### AR and TAR Regularization"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Training a weight-tied regularized LSTM"
"### Training a Weight-Tied Regularized LSTM"
]
},
{
@ -1597,7 +1597,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research"
"### Further Research"
]
},
{

View File

@ -17,14 +17,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Convolutional neural networks"
"# Convolutional Neural Networks"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The magic of convolutions"
"## The Magic of Convolutions"
]
},
{
@ -1253,7 +1253,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Mapping a convolution kernel"
"### Mapping a Convolution Kernel"
]
},
{
@ -1479,21 +1479,21 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Strides and padding"
"### Strides and Padding"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Understanding the convolution equations"
"### Understanding the Convolution Equations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Our first convolutional neural network"
"## Our First Convolutional Neural Network"
]
},
{
@ -1737,7 +1737,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Understanding convolution arithmetic"
"### Understanding Convolution Arithmetic"
]
},
{
@ -1808,21 +1808,21 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Receptive fields"
"### Receptive Fields"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### A note about Twitter"
"### A Note about Twitter"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Colour images"
"## Colour Images"
]
},
{
@ -1896,7 +1896,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Improving training stability"
"## Improving Training Stability"
]
},
{
@ -1982,7 +1982,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### A simple baseline"
"### A Simple Baseline"
]
},
{
@ -2125,7 +2125,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Increase batch size"
"### Increase Batch Size"
]
},
{
@ -2204,7 +2204,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1cycle training"
"### 1cycle Training"
]
},
{
@ -2353,7 +2353,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Batch normalization"
"### Batch Normalization"
]
},
{
@ -2634,7 +2634,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research"
"### Further Research"
]
},
{

View File

@ -23,7 +23,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Going back to Imagenette"
"## Going Back to Imagenette"
]
},
{
@ -230,14 +230,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Building a modern CNN: ResNet"
"## Building a Modern CNN: ResNet"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Skip-connections"
"### Skip-Connections"
]
},
{
@ -446,7 +446,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### A state-of-the-art ResNet"
"### A State-of-the-Art ResNet"
]
},
{
@ -602,7 +602,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Bottleneck layers"
"### Bottleneck Layers"
]
},
{
@ -856,7 +856,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research"
"### Further Research"
]
},
{

View File

@ -14,14 +14,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Application architectures deep dive"
"# Application Architectures Deep Dive"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Computer vision"
"## Computer Vision"
]
},
{
@ -97,7 +97,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### A Siamese network"
"### A Siamese Network"
]
},
{
@ -353,7 +353,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Natural language processing"
"## Natural Language Processing"
]
},
{
@ -367,7 +367,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Wrapping up architectures"
"## Wrapping Up Architectures"
]
},
{
@ -405,7 +405,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research"
"### Further Research"
]
},
{

View File

@ -16,14 +16,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# The training process"
"# The Training Process"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Let's start with SGD"
"## Let's Start with SGD"
]
},
{
@ -229,7 +229,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## A generic optimizer"
"## A Generic Optimizer"
]
},
{
@ -591,7 +591,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Decoupled weight_decay"
"## Decoupled Weight Decay"
]
},
{
@ -605,7 +605,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating a callback"
"### Creating a Callback"
]
},
{
@ -647,7 +647,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Callback ordering and exceptions"
"### Callback Ordering and Exceptions"
]
},
{
@ -714,7 +714,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research"
"### Further Research"
]
},
{

View File

@ -16,28 +16,28 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# A neural net from the foundations"
"# A Neural Net from the Foundations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A neural net layer from scratch"
"## A Neural Net Layer from Scratch"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Modeling a neuron"
"### Modeling a Neuron"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Matrix multiplication from scratch"
"### Matrix Multiplication from Scratch"
]
},
{
@ -112,6 +112,13 @@
"%timeit -n 20 t2=m1@m2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Elementwise Arithmetic"
]
},
{
"cell_type": "code",
"execution_count": null,
@ -710,7 +717,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Einstein summation"
"### Einstein Summation"
]
},
{
@ -743,14 +750,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## The forward and backward passes"
"## The Forward and Backward Passes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Defining and initializing a layer"
"### Defining and Initializing a Layer"
]
},
{
@ -1149,7 +1156,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Gradients and backward pass"
"### Gradients and Backward Pass"
]
},
{
@ -1251,7 +1258,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Refactor the model"
"### Refactor the Model"
]
},
{
@ -1573,7 +1580,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research"
"### Further Research"
]
},
{

View File

@ -16,14 +16,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# CNN interpretation with CAM"
"# CNN Interpretation with CAM"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## CAM and hooks"
"## CAM and Hooks"
]
},
{
@ -450,7 +450,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research"
"### Further Research"
]
},
{

View File

@ -14,7 +14,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# fastai Learner from scratch"
"# fastai Learner from Scratch"
]
},
{
@ -1079,7 +1079,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Scheduling the learning rate"
"### Scheduling the Learning Rate"
]
},
{
@ -1335,7 +1335,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research"
"### Further Research"
]
},
{

View File

@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Concluding thoughts"
"# Concluding Thoughts"
]
},
{

View File

@ -15,7 +15,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Creating a blog"
"# Creating a Blog"
]
},
{
@ -29,35 +29,35 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating the repository"
"### Creating the Repository"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Setting up your homepage"
"### Setting Up Your Homepage"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating posts"
"### Creating Posts"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Synchronizing GitHub and your computer"
"### Synchronizing GitHub and Your Computer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Jupyter for blogging"
"### Jupyter for Blogging"
]
},
{