diff --git a/clean/01_intro.ipynb b/clean/01_intro.ipynb index b3d3a39..9bf7c8d 100644 --- a/clean/01_intro.ipynb +++ b/clean/01_intro.ipynb @@ -1471,12 +1471,77 @@ "## Questionnaire" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It can be hard to know in pages and pages of prose what are the key things you really need to focus on and remember. So we've prepared a list of questions and suggested steps to complete at the end of each chapter. All the answers are in the text of the chapter, so if you're not sure about anything here, re-read that part of the text and make sure you understand it. Answers to all these questions are also available on the [book website](https://book.fast.ai). You can also visit [the forums](https://forums.fast.ai) if you get stuck to get help from other folks studying this material." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. Do you need these for deep learning?\n", + " - Lots of math T / F\n", + " - Lots of data T / F\n", + " - Lots of expensive computers T / F\n", + " - A PhD T / F\n", + "1. Name five areas where deep learning is now the best in the world.\n", + "1. What was the name of the first device that was based on the principle of the artificial neuron?\n", + "1. Based on the book of the same name, what are the requirements for \"Parallel Distributed Processing\"?\n", + "1. What were the two theoretical misunderstandings that held back the field of neural networks?\n", + "1. What is a GPU?\n", + "1. Open a notebook and execute a cell containing: `1+1`. What happens?\n", + "1. Follow through each cell of the stripped version of the notebook for this chapter. Before executing each cell, guess what will happen.\n", + "1. Complete the Jupyter Notebook online appendix.\n", + "1. Why is it hard to use a traditional computer program to recognize images in a photo?\n", + "1. What did Samuel mean by \"Weight Assignment\"?\n", + "1. 
What term do we normally use in deep learning for what Samuel called \"Weights\"?\n", + "1. Draw a picture that summarizes Arthur Samuel's view of a machine learning model.\n", + "1. Why is it hard to understand why a deep learning model makes a particular prediction?\n", + "1. What is the name of the theorem which shows that a neural network can solve any mathematical problem to any level of accuracy?\n", + "1. What do you need in order to train a model?\n", + "1. How could a feedback loop impact the rollout of a predictive policing model?\n", + "1. Do we always have to use 224x224 pixel images with the cat recognition model?\n", + "1. What is the difference between classification and regression?\n", + "1. What is a validation set? What is a test set? Why do we need them?\n", + "1. What will fastai do if you don't provide a validation set?\n", + "1. Can we always use a random sample for a validation set? Why or why not?\n", + "1. What is overfitting? Provide an example.\n", + "1. What is a metric? How does it differ from \"loss\"?\n", + "1. How can pretrained models help?\n", + "1. What is the \"head\" of a model?\n", + "1. What kinds of features do the early layers of a CNN find? How about the later layers?\n", + "1. Are image models only useful for photos?\n", + "1. What is an \"architecture\"?\n", + "1. What is segmentation?\n", + "1. What is `y_range` used for? When do we need it?\n", + "1. What are \"hyperparameters\"?\n", + "1. What's the best way to avoid failures when using AI in an organization?" + ] + }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Further research" ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Each chapter also has a \"further research\" section with questions that aren't fully answered in the text, or that include more advanced assignments. Answers to these questions aren't on the book website--you'll need to do your own research!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. 
Why is a GPU useful for deep learning? How is a CPU different, and why is it less effective for deep learning?\n", + "1. Try to think of three areas where feedback loops might impact use of machine learning. See if you can find documented examples of that happening in practice." + ] } ], "metadata": { diff --git a/clean/02_production.ipynb b/clean/02_production.ipynb index 553b1fa..0f6da4d 100644 --- a/clean/02_production.ipynb +++ b/clean/02_production.ipynb @@ -987,6 +987,41 @@ "## Questionnaire" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. Provide an example of where the bear classification model might work poorly, due to structural or style differences to the training data.\n", + "1. Where do text models currently have a major deficiency?\n", + "1. What are possible negative societal implications of text generation models?\n", + "1. In situations where a model might make mistakes, and those mistakes could be harmful, what is a good alternative to automating a process?\n", + "1. What kind of tabular data is deep learning particularly good at?\n", + "1. What's a key downside of directly using a deep learning model for recommendation systems?\n", + "1. What are the steps of the Drivetrain approach?\n", + "1. How do the steps of the Drivetrain approach map to a recommendation system?\n", + "1. Create an image recognition model using data you curate, and deploy it on the web.\n", + "1. What is `DataLoaders`?\n", + "1. What four things do we need to tell fastai to create `DataLoaders`?\n", + "1. What does the `splitter` parameter to `DataBlock` do?\n", + "1. How do we ensure a random split always gives the same validation set?\n", + "1. What letters are often used to signify the independent and dependent variables?\n", + "1. What's the difference between crop, pad, and squish resize approaches? When might you choose one over the other?\n", + "1. What is data augmentation? Why is it needed?\n", + "1. 
What is the difference between `item_tfms` and `batch_tfms`?\n", + "1. What is a confusion matrix?\n", + "1. What does `export` save?\n", + "1. What is it called when we use a model for getting predictions, instead of training?\n", + "1. What are IPython widgets?\n", + "1. When might you want to use CPU for deployment? When might GPU be better?\n", + "1. What are the downsides of deploying your app to a server, instead of to a client (or edge) device such as a phone or PC?\n", + "1. What are 3 examples of problems that could occur when rolling out a bear warning system in practice?\n", + "1. What is \"out of domain data\"?\n", + "1. What is \"domain shift\"?\n", + "1. What are the 3 steps in the deployment process?\n", + "1. For a project you're interested in applying deep learning to, consider the thought experiment \"what would happen if it went really, really well?\"\n", + "1. Start a blog, and write your first blog post. For instance, write about what you think deep learning might be useful for in a domain you're interested in." + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -994,6 +1029,14 @@ "### Further research" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. Consider how the Drivetrain approach maps to a project or problem you're interested in.\n", + "1. When might it be best to avoid certain types of data augmentation?" + ] + }, { "cell_type": "code", "execution_count": null, diff --git a/clean/03_ethics.ipynb b/clean/03_ethics.ipynb index 5da9fb4..10571fa 100644 --- a/clean/03_ethics.ipynb +++ b/clean/03_ethics.ipynb @@ -224,6 +224,30 @@ "## Questionnaire" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. Does ethics provide a list of \"right answers\"?\n", + "1. How can working with people of different backgrounds help when considering ethical questions?\n", + "1. What was the role of IBM in Nazi Germany? Why did the company participate as they did? Why did the workers participate?\n", + "1. 
What was the role of the first person jailed in the VW diesel scandal?\n", + "1. What was the problem with a database of suspected gang members maintained by California law enforcement officials?\n", + "1. Why did YouTube's recommendation algorithm recommend videos of partially clothed children to pedophiles, even though no employee at Google programmed this feature?\n", + "1. What are the problems with the centrality of metrics?\n", + "1. Why did Meetup.com not include gender in their recommendation system for tech meetups?\n", + "1. What are the six types of bias in machine learning, according to Suresh and Guttag?\n", + "1. Give two examples of historical race bias in the US.\n", + "1. Where are most images in ImageNet from?\n", + "1. In the paper \"Does Machine Learning Automate Moral Hazard and Error\", why is sinusitis found to be predictive of a stroke?\n", + "1. What is representation bias?\n", + "1. How are machines and people different, in terms of their use for making decisions?\n", + "1. Is disinformation the same as \"fake news\"?\n", + "1. Why is disinformation through auto-generated text a particularly significant issue?\n", + "1. What are the five ethical lenses described by the Markkula Center?\n", + "1. Where is policy an appropriate tool for addressing data ethics issues?" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -231,6 +255,20 @@ "### Further research:" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. Read the article \"What Happens When an Algorithm Cuts Your Healthcare\". How could problems like this be avoided in the future?\n", + "1. Research to find out more about YouTube's recommendation system and its societal impacts. Do you think recommendation systems must always have feedback loops with negative results? What approaches could Google take? What about the government?\n", + "1. Read the paper \"Discrimination in Online Ad Delivery\". 
Do you think Google should be considered responsible for what happened to Dr Sweeney? What would be an appropriate response?\n", + "1. How can a cross-disciplinary team help avoid negative consequences?\n", + "1. Read the paper \"Does Machine Learning Automate Moral Hazard and Error\" in American Economic Review. What actions do you think should be taken to deal with the issues identified in this paper?\n", + "1. Read the article \"How Will We Prevent AI-Based Forgery?\" Do you think Etzioni's proposed approach could work? Why?\n", + "1. Complete the section \"Analyze a project you are working on\" in this chapter.\n", + "1. Consider whether your team could be more diverse. If so, what approaches might help?" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -238,6 +276,25 @@ "## Section 1: that's a wrap!" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Congratulations! You've made it to the end of the first section of the book. In this section we've tried to show you what deep learning can do, and how you can use it to create real applications and products. At this point, you will get a lot more out of the book if you spend some time trying out what you've learnt. Perhaps you have already been doing this as you go along — in which case, great! But if not, that's no problem either… Now is a great time to start experimenting yourself.\n", + "\n", + "If you haven't been to the book website yet, head over there now. Remember, you can find it here: [book.fast.ai](https://book.fast.ai). It's really important that you have got yourself set up to run the notebooks. Becoming an effective deep learning practitioner is all about practice. So you need to be training models. So please go get the notebooks running now if you haven't already! 
And also have a look on the website for any important updates or notices; deep learning changes fast, and we can't change the words that are printed in this book, so the website is where you need to look to ensure you have the most up-to-date information.\n", + "\n", + "Make sure that you have completed the following steps:\n", + "\n", + "- Connected to one of the GPU Jupyter servers recommended on the book website\n", + "- Run the first notebook yourself\n", + "- Uploaded an image that you find in the first notebook; then try a few different images of different kinds to see what happens\n", + "- Run the second notebook, collecting your own dataset based on image search queries that you come up with\n", + "- Thought about how you can use deep learning to help you with your own projects, including what kinds of data you could use, what kinds of problems may come up, and how you might be able to mitigate these issues in practice.\n", + "\n", + "In the next section of the book we will learn about how and why deep learning works, instead of just seeing how we can use it in practice. Understanding the how and why is important for both practitioners and researchers, because in this fairly new field nearly every project requires some level of customisation and debugging. The better you understand the foundations of deep learning, the better your models will be. These foundations are less important for executives, product managers, and so forth (although still useful, so feel free to keep reading!), but they are critical for anybody who is actually training and deploying models themselves." + ] + }, { "cell_type": "code", "execution_count": null, diff --git a/clean/04_mnist_basics.ipynb b/clean/04_mnist_basics.ipynb index 9b38374..d8c8ef7 100644 --- a/clean/04_mnist_basics.ipynb +++ b/clean/04_mnist_basics.ipynb @@ -3936,6 +3936,49 @@ "## Questionnaire" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. 
How is a greyscale image represented on a computer? How about a color image?\n", + "1. How are the files and folders in the `MNIST_SAMPLE` dataset structured? Why?\n", + "1. Explain how the \"pixel similarity\" approach to classifying digits works.\n", + "1. What is a list comprehension? Create one now that selects odd numbers from a list and doubles them.\n", + "1. What is a \"rank 3 tensor\"?\n", + "1. What is the difference between tensor rank and shape? How do you get the rank from the shape?\n", + "1. What are RMSE and L1 norm?\n", + "1. How can you apply a calculation on thousands of numbers at once, many thousands of times faster than a Python loop?\n", + "1. Create a 3x3 tensor or array containing the numbers from 1 to 9. Double it. Select the bottom right 4 numbers.\n", + "1. What is broadcasting?\n", + "1. Are metrics generally calculated using the training set, or the validation set? Why?\n", + "1. What is SGD?\n", + "1. Why does SGD use mini batches?\n", + "1. What are the 7 steps in SGD for machine learning?\n", + "1. How do we initialize the weights in a model?\n", + "1. What is \"loss\"?\n", + "1. Why can't we always use a high learning rate?\n", + "1. What is a \"gradient\"?\n", + "1. Do you need to know how to calculate gradients yourself?\n", + "1. Why can't we use accuracy as a loss function?\n", + "1. Draw the sigmoid function. What is special about its shape?\n", + "1. What is the difference between loss and metric?\n", + "1. What is the function to calculate new weights using a learning rate?\n", + "1. What does the `DataLoader` class do?\n", + "1. Write pseudo-code showing the basic steps taken each epoch for SGD.\n", + "1. Create a function which, if passed two arguments `[1,2,3,4]` and `'abcd'`, returns `[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]`. What is special about that output data structure?\n", + "1. What does `view` do in PyTorch?\n", + "1. What are the \"bias\" parameters in a neural network? Why do we need them?\n", + "1. 
What does the `@` operator do in Python?\n", + "1. What does the `backward` method do?\n", + "1. Why do we have to zero the gradients?\n", + "1. What information do we have to pass to `Learner`?\n", + "1. Show Python or pseudo-code for the basic steps of a training loop.\n", + "1. What is \"ReLU\"? Draw a plot of it for values from `-2` to `+2`.\n", + "1. What is an \"activation function\"?\n", + "1. What's the difference between `F.relu` and `nn.ReLU`?\n", + "1. The universal approximation theorem shows that any function can be approximated as closely as needed using just one nonlinearity. So why do we normally use more?" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -3943,6 +3986,14 @@ "### Further research" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. Create your own implementation of `Learner` from scratch, based on the training loop shown in this chapter.\n", + "1. Complete all the steps in this chapter using the full MNIST dataset (that is, for all digits, not just threes and sevens). This is a significant project and will take you quite a bit of time to complete! You'll need to do some of your own research to figure out how to overcome some obstacles you'll meet on the way." + ] + }, { "cell_type": "code", "execution_count": null, diff --git a/clean/05_pet_breeds.ipynb b/clean/05_pet_breeds.ipynb index 6ba19fc..4eae173 100644 --- a/clean/05_pet_breeds.ipynb +++ b/clean/05_pet_breeds.ipynb @@ -1690,6 +1690,35 @@ "## Questionnaire" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. Why do we first resize to a large size on the CPU, and then to a smaller size on the GPU?\n", + "1. If you are not familiar with regular expressions, find a regular expression tutorial, and some problem sets, and complete them. Have a look on the book website for suggestions.\n", + "1. What are the two ways in which data is most commonly provided, for most deep learning datasets?\n", + "1. 
Look up the documentation for `L` and try using a few of the new methods that it adds.\n", + "1. Look up the documentation for the Python `pathlib` module and try using a few methods of the `Path` class.\n", + "1. Give two examples of ways that image transformations can degrade the quality of the data.\n", + "1. What method does fastai provide to view the data in a `DataLoader`?\n", + "1. What method does fastai provide to help you debug a `DataBlock`?\n", + "1. Should you hold off on training a model until you have thoroughly cleaned your data?\n", + "1. What are the two pieces that are combined into cross entropy loss in PyTorch?\n", + "1. What are the two properties of activations that softmax ensures? Why is this important?\n", + "1. When might you want your activations to not have these two properties?\n", + "1. Calculate the \"exp\" and \"softmax\" columns of <> yourself (i.e. in a spreadsheet, with a calculator, or in a notebook).\n", + "1. Why can't we use `torch.where` to create a loss function for datasets where our label can have more than two categories?\n", + "1. What is the value of log(-2)? Why?\n", + "1. What are two good rules of thumb for picking a learning rate from the learning rate finder?\n", + "1. What two steps does the `fine_tune` method do?\n", + "1. In a Jupyter notebook, how do you get the source code for a method or function?\n", + "1. What are discriminative learning rates?\n", + "1. How is a Python `slice` object interpreted when passed as a learning rate to fastai?\n", + "1. Why is early stopping a poor choice when using one cycle training?\n", + "1. What is the difference between resnet50 and resnet101?\n", + "1. What does `to_fp16` do?" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -1697,6 +1726,14 @@ "### Further research" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. Find the paper by Leslie Smith that introduced the learning rate finder, and read it.\n", + "1. 
See if you can improve the accuracy of the classifier in this chapter. What's the best accuracy you can achieve? Have a look on the forums and book website to see what other students have achieved with this dataset, and how they did it." + ] + }, { "cell_type": "code", "execution_count": null, diff --git a/clean/06_multicat.ipynb b/clean/06_multicat.ipynb index 6325e8d..48a6079 100644 --- a/clean/06_multicat.ipynb +++ b/clean/06_multicat.ipynb @@ -1290,6 +1290,29 @@ "## Questionnaire" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. How could multi-label classification improve the usability of the bear classifier?\n", + "1. How do we encode the dependent variable in a multi-label classification problem?\n", + "1. How do you access the rows and columns of a DataFrame as if it were a matrix?\n", + "1. How do you get a column by name from a DataFrame?\n", + "1. What is the difference between a `Dataset` and a `DataLoader`?\n", + "1. What does a `Datasets` object normally contain?\n", + "1. What does a `DataLoaders` object normally contain?\n", + "1. What does `lambda` do in Python?\n", + "1. What are the methods to customise how the independent and dependent variables are created with the data block API?\n", + "1. Why is softmax not an appropriate output activation function when using a one-hot encoded target?\n", + "1. Why is `nll_loss` not an appropriate loss function when using a one-hot encoded target?\n", + "1. What is the difference between `nn.BCELoss` and `nn.BCEWithLogitsLoss`?\n", + "1. Why can't we use regular accuracy in a multi-label problem?\n", + "1. When is it okay to tune a hyperparameter on the validation set?\n", + "1. How is `y_range` implemented in fastai? (See if you can implement it yourself and test it without peeking!)\n", + "1. What is a regression problem? What loss function should you use for such a problem?\n", + "1. 
What do you need to do to make sure the fastai library applies the same data augmentation to your input images and your target point coordinates?" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -1297,6 +1320,14 @@ "### Further research" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. Read a tutorial about pandas DataFrames and experiment with a few methods that look interesting to you. Have a look at the book website for recommended tutorials.\n", + "1. Retrain the bear classifier using multi-label classification. See if you can make it work effectively with images that don't contain any bears, including showing that information in the web application. Try an image with two different kinds of bears. Check whether the accuracy on the single-label dataset is impacted by using multi-label classification." + ] + }, { "cell_type": "code", "execution_count": null, diff --git a/clean/07_sizing_and_tta.ipynb b/clean/07_sizing_and_tta.ipynb index c6fddb0..553a6e0 100644 --- a/clean/07_sizing_and_tta.ipynb +++ b/clean/07_sizing_and_tta.ipynb @@ -607,6 +607,26 @@ "## Questionnaire" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. What is the difference between ImageNet and Imagenette? When is it better to experiment on one versus the other?\n", + "1. What is normalization?\n", + "1. Why didn't we have to care about normalization when using a pretrained model?\n", + "1. What is progressive resizing?\n", + "1. Implement progressive resizing in your own project. Did it help?\n", + "1. What is test time augmentation? How do you use it in fastai?\n", + "1. Is using TTA at inference slower or faster than regular inference? Why?\n", + "1. What is Mixup? How do you use it in fastai?\n", + "1. Why does Mixup prevent the model from being too confident?\n", + "1. Why does training with Mixup for 5 epochs end up worse than training without Mixup?\n", + "1. What is the idea behind label smoothing?\n", + "1. 
What problems in your data can label smoothing help with?\n", + "1. When using label smoothing with 5 categories, what is the target associated with the index 1?\n", + "1. What is the first step to take when you want to prototype quick experiments on a new dataset?" + ] + }, { "cell_type": "markdown", "metadata": {}, diff --git a/clean/08_collab.ipynb b/clean/08_collab.ipynb index 0387131..fba96fa 100644 --- a/clean/08_collab.ipynb +++ b/clean/08_collab.ipynb @@ -1638,6 +1638,43 @@ "## Questionnaire" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. What problem does collaborative filtering solve?\n", + "1. How does it solve it?\n", + "1. Why might a collaborative filtering predictive model fail to be a very useful recommendation system?\n", + "1. What does a crosstab representation of collaborative filtering data look like?\n", + "1. Write the code to create a crosstab representation of the MovieLens data (you might need to do some web searching!).\n", + "1. What is a latent factor? Why is it \"latent\"?\n", + "1. What is a dot product? Calculate a dot product manually using pure Python with lists.\n", + "1. What does `pandas.DataFrame.merge` do?\n", + "1. What is an embedding matrix?\n", + "1. What is the relationship between an embedding and a matrix of one-hot encoded vectors?\n", + "1. Why do we need `Embedding` if we could use one-hot encoded vectors for the same thing?\n", + "1. What does an embedding contain before we start training (assuming we're not using a pretrained model)?\n", + "1. Create a class (without peeking, if possible!) and use it.\n", + "1. What does `x[:,0]` return?\n", + "1. Rewrite the `DotProduct` class (without peeking, if possible!) and train a model with it.\n", + "1. What is a good loss function to use for MovieLens? Why?\n", + "1. What would happen if we used `CrossEntropy` loss with MovieLens? How would we need to change the model?\n", + "1. What is the use of bias in a dot product model?\n", + "1. 
What is another name for weight decay?\n", + "1. Write the equation for weight decay (without peeking!).\n", + "1. Write the equation for the gradient of weight decay. Why does it help reduce weights?\n", + "1. Why does reducing weights lead to better generalization?\n", + "1. What does `argsort` do in PyTorch?\n", + "1. Does sorting the movie biases give the same result as averaging overall movie ratings by movie? Why / why not?\n", + "1. How do you print the names and details of the layers in a model?\n", + "1. What is the \"bootstrapping problem\" in collaborative filtering?\n", + "1. How could you deal with the bootstrapping problem for new users? For new movies?\n", + "1. How can feedback loops impact collaborative filtering systems?\n", + "1. When using a neural network in collaborative filtering, why can we have different numbers of factors for movies and users?\n", + "1. Why is there a `nn.Sequential` in the `CollabNN` model?\n", + "1. What kind of model should we use if we want to add metadata about users and items, or information such as date and time, to a collaborative filtering model?" + ] + }, { "cell_type": "markdown", "metadata": {}, diff --git a/clean/09_tabular.ipynb b/clean/09_tabular.ipynb index 908862b..7825e04 100644 --- a/clean/09_tabular.ipynb +++ b/clean/09_tabular.ipynb @@ -8271,6 +8271,45 @@ "## Questionnaire" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. What is a continuous variable?\n", + "1. What is a categorical variable?\n", + "1. Provide 2 of the words that are used for the possible values of a categorical variable.\n", + "1. What is a \"dense layer\"?\n", + "1. How do entity embeddings reduce memory usage and speed up neural networks?\n", + "1. What kind of datasets are entity embeddings especially useful for?\n", + "1. What are the two main families of machine learning algorithms?\n", + "1. Why do some categorical columns need a special ordering in their classes? How do you do this in pandas?\n", + "1. 
Summarize what a decision tree algorithm does.\n", + "1. Why is a date different from a regular categorical or continuous variable, and how can you preprocess it to allow it to be used in a model?\n", + "1. Should you pick a random validation set in the bulldozer competition? If not, what kind of validation set should you pick?\n", + "1. What is pickle and what is it useful for?\n", + "1. How are `mse`, `samples`, and `values` calculated in the decision tree drawn in this chapter?\n", + "1. How do we deal with outliers, before building a decision tree?\n", + "1. How do we handle categorical variables in a decision tree?\n", + "1. What is bagging?\n", + "1. What is the difference between `max_samples` and `max_features` when creating a random forest?\n", + "1. If you increase `n_estimators` to a very high value, can that lead to overfitting? Why or why not?\n", + "1. What is *out of bag error*?\n", + "1. Make a list of reasons why a model's validation set error might be worse than the OOB error. How could you test your hypotheses?\n", + "1. How can you answer each of these things with a random forest? How do they work?:\n", + " - How confident are we in our projections using a particular row of data?\n", + " - For predicting with a particular row of data, what were the most important factors, and how did they influence that prediction?\n", + " - Which columns are the strongest predictors?\n", + " - How do predictions vary, as we vary these columns?\n", + "1. What's the purpose of removing unimportant variables?\n", + "1. What's a good type of plot for showing tree interpreter results?\n", + "1. What is the *extrapolation problem*?\n", + "1. How can you tell if your test or validation set is distributed in a different way to your training set?\n", + "1. Why do we make `saleElapsed` a continuous variable, even though it has fewer than 9000 distinct values?\n", + "1. What is boosting?\n", + "1. How could we use embeddings with a random forest? 
Would we expect this to help?\n", + "1. Why might we not always use a neural net for tabular modeling?" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -8278,6 +8317,16 @@ "### Further research" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. Pick a competition on Kaggle with tabular data (current or past) and try to adapt the techniques seen in this chapter to get the best possible results. Compare yourself to the private leaderboard.\n", + "1. Implement the decision tree algorithm in this chapter from scratch yourself, and try it on this dataset.\n", + "1. Use the embeddings from the neural net in this chapter in a random forest, and see if you can improve on the random forest results we saw.\n", + "1. Explain what each line of the source of `TabularModel` does (with the exception of the `BatchNorm1d` and `Dropout` layers)." + ] + }, { "cell_type": "code", "execution_count": null, diff --git a/clean/10_nlp.ipynb b/clean/10_nlp.ipynb index 8250c10..cd60fb0 100644 --- a/clean/10_nlp.ipynb +++ b/clean/10_nlp.ipynb @@ -1496,12 +1496,48 @@ "## Questionnaire" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. What is self-supervised learning?\n", + "1. What is a language model?\n", + "1. Why is a language model considered self-supervised learning?\n", + "1. What are self-supervised models usually used for?\n", + "1. Why do we fine-tune language models?\n", + "1. What are the three steps to create a state-of-the-art text classifier?\n", + "1. How do the 50,000 unlabeled movie reviews help create a better text classifier for the IMDb dataset?\n", + "1. What are the three steps to prepare your data for a language model?\n", + "1. What is tokenization? Why do we need it?\n", + "1. Name three different approaches to tokenization.\n", + "1. What is 'xxbos'?\n", + "1. List 4 rules that fastai applies to text during tokenization.\n", + "1. 
Why are repeated characters replaced with a token showing the number of repetitions, and the character that's repeated?\n", + "1. What is numericalization?\n", + "1. Why might there be words that are replaced with the \"unknown word\" token?\n", + "1. With a batch size of 64, the first row of the tensor representing the first batch contains the first 64 tokens for the dataset. What does the second row of that tensor contain? What does the first row of the second batch contain? (Careful—students often get this one wrong! Be sure to check your answer against the book website.)\n", + "1. Why do we need padding for text classification? Why don't we need it for language modeling?\n", + "1. What does an embedding matrix for NLP contain? What is its shape?\n", + "1. What is perplexity?\n", + "1. Why do we have to pass the vocabulary of the language model to the classifier data block?\n", + "1. What is gradual unfreezing?\n", + "1. Why is text generation always likely to be ahead of automatic identification of machine generated texts?" + ] + }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Further research" ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. See what you can learn about language models and disinformation. What are the best language models today? Have a look at some of their outputs. Do you find them convincing? How could a bad actor best use this to create conflict and uncertainty?\n", + "1. Given the limitation that models are unlikely to be able to consistently recognise machine generated texts, what other approaches may be needed to handle large-scale disinformation campaigns that leveraged deep learning?" + ] } ], "metadata": { diff --git a/clean/12_nlp_dive.ipynb b/clean/12_nlp_dive.ipynb index ecd4052..5eb2470 100644 --- a/clean/12_nlp_dive.ipynb +++ b/clean/12_nlp_dive.ipynb @@ -1548,6 +1548,51 @@ "## Questionnaire" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. 
If the dataset for your project is so big and complicated that working with it takes a significant amount of time, what should you do?\n", + "1. Why do we concatenate the documents in our dataset before creating a language model?\n", + "1. To use a standard fully connected network to predict the fourth word given the previous three words, what two tweaks do we need to make?\n", + "1. How can we share a weight matrix across multiple layers in PyTorch?\n", + "1. Write a module which predicts the third word given the previous two words of a sentence, without peeking.\n", + "1. What is a recurrent neural network?\n", + "1. What is hidden state?\n", + "1. What is the equivalent of hidden state in `LMModel1`?\n", + "1. To maintain the state in an RNN why is it important to pass the text to the model in order?\n", + "1. What is an unrolled representation of an RNN?\n", + "1. Why can maintaining the hidden state in an RNN lead to memory and performance problems? How do we fix this problem?\n", + "1. What is BPTT?\n", + "1. Write code to print out the first few batches of the validation set, including converting the token IDs back into English strings, as we showed for batches of IMDb data in <>.\n", + "1. What does the `ModelResetter` callback do? Why do we need it?\n", + "1. What are the downsides of predicting just one output word for each three input words?\n", + "1. Why do we need a custom loss function for `LMModel4`?\n", + "1. Why is the training of `LMModel4` unstable?\n", + "1. In the unrolled representation, we can see that a recurrent neural network actually has many layers. So why do we need to stack RNNs to get better results?\n", + "1. Draw a representation of a stacked (multilayer) RNN.\n", + "1. Why should we get better results in an RNN if we call `detach` less often? Why might this not happen in practice with a simple RNN?\n", + "1. Why can a deep network result in very large or very small activations? Why does this matter?\n", + "1. 
In a computer's floating point representation of numbers, which numbers are the most precise?\n", + "1. Why do vanishing gradients prevent training?\n", + "1. Why does it help to have two hidden states in the LSTM architecture? What is the purpose of each one?\n", + "1. What are these two states called in an LSTM?\n", + "1. What is tanh, and how is it related to sigmoid?\n", + "1. What is the purpose of this code in `LSTMCell`?: `h = torch.stack([h, input], dim=1)`\n", + "1. What does `chunk` do in PyTorch?\n", + "1. Study the refactored version of `LSTMCell` carefully to ensure you understand how and why it does the same thing as the non-refactored version.\n", + "1. Why can we use a higher learning rate for `LMModel6`?\n", + "1. What are the three regularisation techniques used in an AWD-LSTM model?\n", + "1. What is dropout?\n", + "1. Why do we scale the weights with dropout? Is this applied during training, inference, or both?\n", + "1. What is the purpose of this line from `Dropout`?: `if not self.training: return x`\n", + "1. Experiment with `bernoulli_` to understand how it works.\n", + "1. How do you set your model in training mode in PyTorch? In evaluation mode?\n", + "1. Write the equation for activation regularization (in maths or code, as you prefer). How is it different to weight decay?\n", + "1. Write the equation for temporal activation regularization (in maths or code, as you prefer). Why wouldn't we use this for computer vision problems?\n", + "1. What is \"weight tying\" in a language model?" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -1555,6 +1600,16 @@ "### Further research" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. In `LMModel2` why can `forward` start with `h=0`? Why don't we need to say `h=torch.zeros(…)`?\n", + "1. Write the code for an LSTM from scratch (but you may refer to <>).\n", + "1. Search on the Internet for the GRU architecture and implement it from scratch, and try training a model. 
See if you can get similar results to those we saw in this chapter. Compare it to the results of PyTorch's built-in GRU module.\n", + "1. Have a look at the source code for AWD-LSTM in fastai, and try to map each of the lines of code to the concepts shown in this chapter." + ] + }, { "cell_type": "code", "execution_count": null, diff --git a/clean/13_convolutions.ipynb b/clean/13_convolutions.ipynb index d591fe1..e554616 100644 --- a/clean/13_convolutions.ipynb +++ b/clean/13_convolutions.ipynb @@ -2584,6 +2584,52 @@ "## Questionnaire" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. What is a \"feature\"?\n", + "1. Write out the convolutional kernel matrix for a top edge detector.\n", + "1. Write out the mathematical operation applied by a 3 x 3 kernel to a single pixel in an image.\n", + "1. What is the value of a convolutional kernel applied to a 3 x 3 matrix of zeros?\n", + "1. What is padding?\n", + "1. What is stride?\n", + "1. Create a nested list comprehension to complete any task that you choose.\n", + "1. What are the shapes of the input and weight parameters to PyTorch's 2D convolution?\n", + "1. What is a channel?\n", + "1. What is the relationship between a convolution and a matrix multiplication?\n", + "1. What is a convolutional neural network?\n", + "1. What is the benefit of refactoring parts of your neural network definition?\n", + "1. What is `Flatten`? Where does it need to be included in the MNIST CNN? Why?\n", + "1. What does \"NCHW\" mean?\n", + "1. Why does the third layer of the MNIST CNN have `7*7*(1168-16)` multiplications?\n", + "1. What is a receptive field?\n", + "1. What is the size of the receptive field of an activation after two stride 2 convolutions? Why?\n", + "1. Run conv-example.xlsx yourself and experiment with \"trace precedents\".\n", + "1. Have a look at Jeremy or Sylvain's list of recent Twitter \"like\"s, and see if you find any interesting resources or ideas there.\n", + "1. 
How is a color image represented as a tensor?\n", + "1. How does a convolution work with a color input?\n", + "1. What method can we use to see that data in DataLoaders?\n", + "1. Why do we double the number of filters after each stride 2 conv?\n", + "1. Why do we use a larger kernel in the first conv with MNIST (with `simple_cnn`)?\n", + "1. What information does `ActivationStats` save for each layer?\n", + "1. How can we access a learner's callback after training?\n", + "1. What are the three statistics plotted by `plot_layer_stats`? What does the x-axis represent?\n", + "1. Why are activations near zero problematic?\n", + "1. What are the upsides and downsides of training with a larger batch size?\n", + "1. Why should we avoid using a high learning rate at the start of training?\n", + "1. What is 1cycle training?\n", + "1. What are the benefits of training with a high learning rate?\n", + "1. Why do we want to use a low learning rate at the end of training?\n", + "1. What is cyclical momentum?\n", + "1. What callback tracks hyperparameter values during training (along with other information)?\n", + "1. What does one column of pixels in the `color_dim` plot represent?\n", + "1. What does \"bad training\" look like in `color_dim`? Why?\n", + "1. What trainable parameters does a batch normalization layer contain?\n", + "1. What statistics are used to normalize in batch normalization during training? How about during validation?\n", + "1. Why do models with batch normalization layers generalize better?" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -2591,6 +2637,16 @@ "### Further research" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. What features other than edge detectors have been used in computer vision (especially before deep learning became popular)?\n", + "1. There are other normalization layers available in PyTorch. Try them out and see what works best. 
Learn about why other normalization layers have been developed, and how they differ from batch normalization.\n", + "1. Try moving the activation function after the batch normalization layer in `conv`. Does it make a difference? See what you can find out about what order is recommended, and why.\n", + "1. Batch normalization isn't defined for a batch size of one, since the standard deviation isn't defined for a single item. " + ] + }, { "cell_type": "code", "execution_count": null, diff --git a/clean/14_resnet.ipynb b/clean/14_resnet.ipynb index afbf0d1..70214eb 100644 --- a/clean/14_resnet.ipynb +++ b/clean/14_resnet.ipynb @@ -818,6 +818,33 @@ "## Questionnaire" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. How did we get to a single vector of activations in the convnets used for MNIST in previous chapters? Why isn't that suitable for Imagenette?\n", + "1. What do we do for Imagenette instead?\n", + "1. What is adaptive pooling?\n", + "1. What is average pooling?\n", + "1. Why do we need `Flatten` after an adaptive average pooling layer?\n", + "1. What is a skip connection?\n", + "1. Why do skip connections allow us to train deeper models?\n", + "1. What does <> show? How did that lead to the idea of skip connections?\n", + "1. What is an identity mapping?\n", + "1. What is the basic equation for a ResNet block (ignoring batchnorm and relu layers)?\n", + "1. What do ResNets have to do with \"residuals\"?\n", + "1. How do we deal with the skip connection when there is a stride 2 convolution? How about when the number of filters changes?\n", + "1. How can we express a 1x1 convolution in terms of a vector dot product?\n", + "1. What does the `noop` function return?\n", + "1. Explain what is shown in <>.\n", + "1. When is top-5 accuracy a better metric than top-1 accuracy?\n", + "1. What is the stem of a CNN?\n", + "1. Why use plain convs in the CNN stem, instead of ResNet blocks?\n", + "1. 
How does a bottleneck block differ from a plain ResNet block?\n", + "1. Why is a bottleneck block faster?\n", + "1. How do fully convolutional nets (and nets with adaptive pooling in general) allow for progressive resizing?" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -825,6 +852,16 @@ "### Further research" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. Try creating a fully convolutional net with adaptive average pooling for MNIST (note that you'll need fewer stride 2 layers). How does it compare to a network without such a pooling layer?\n", + "1. In <> we introduce *Einstein summation notation*. Skip ahead to see how this works, and then write an implementation of the 1x1 convolution operation using `torch.einsum`. Compare it to the same operation using `torch.conv2d`.\n", + "1. Write a \"top 5 accuracy\" function using plain PyTorch or plain Python.\n", + "1. Train a model on Imagenette for more epochs, with and without label smoothing. Take a look at the Imagenette leaderboards and see how close you can get to the best results shown. Read the linked pages describing the leading approaches." + ] + }, { "cell_type": "code", "execution_count": null, diff --git a/clean/15_arch_details.ipynb b/clean/15_arch_details.ipynb index d06cf59..7fab841 100644 --- a/clean/15_arch_details.ipynb +++ b/clean/15_arch_details.ipynb @@ -379,6 +379,30 @@ "## Questionnaire" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. What is the head of a neural net?\n", + "1. What is the body of a neural net?\n", + "1. What is \"cutting\" a neural net? Why do we need to do this for transfer learning?\n", + "1. What is \"model_meta\"? Try printing it to see what's inside.\n", + "1. Read the source code for `create_head` and make sure you understand what each line does.\n", + "1. Look at the output of create_head and make sure you understand why each layer is there, and how the create_head source created it.\n", + "1. 
Figure out how to change the dropout, layer size, and number of layers created by create_cnn, and see if you can find values that result in better accuracy from the pet recognizer.\n", + "1. What does AdaptiveConcatPool2d do?\n", + "1. What is nearest neighbor interpolation? How can it be used to upsample convolutional activations?\n", + "1. What is a transposed convolution? What is another name for it?\n", + "1. Create a conv layer with `transpose=True` and apply it to an image. Check the output shape.\n", + "1. Draw the u-net architecture.\n", + "1. What is BPTT for Text Classification (BPT3C)?\n", + "1. How do we handle different length sequences in BPT3C?\n", + "1. Try to run each line of `TabularModel.forward` separately, one line per cell, in a notebook, and look at the input and output shapes at each step.\n", + "1. How is `self.layers` defined in `TabularModel`?\n", + "1. What are the five steps for preventing over-fitting?\n", + "1. Why don't we reduce architecture complexity before trying other approaches to preventing over-fitting?" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -386,6 +410,17 @@ "### Further research" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. Write your own custom head and try training the pet recognizer with it. See if you can get a better result than fastai's default.\n", + "1. Try switching between AdaptiveConcatPool2d and AdaptiveAvgPool2d in a CNN head and see what difference it makes.\n", + "1. Write your own custom splitter to create a separate parameter group for every resnet block, and a separate group for the stem. Try training with it, and see if it improves the pet recognizer.\n", + "1. Read the online chapter about generative image models, and create your own colorizer, super resolution model, or style transfer model.\n", + "1. Create a custom head using nearest neighbor interpolation and use it to do segmentation on Camvid." 
+ ] + }, { "cell_type": "code", "execution_count": null, diff --git a/clean/16_accel_sgd.ipynb b/clean/16_accel_sgd.ipynb index 4155e7b..e8217f8 100644 --- a/clean/16_accel_sgd.ipynb +++ b/clean/16_accel_sgd.ipynb @@ -670,6 +670,39 @@ "## Questionnaire" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. What is the equation for a step of SGD, in math or code (as you prefer)?\n", + "1. What do we pass to `cnn_learner` to use a non-default optimizer?\n", + "1. What are optimizer callbacks?\n", + "1. What does `zero_grad` do in an optimizer?\n", + "1. What does `step` do in an optimizer? How is it implemented in the general optimizer?\n", + "1. Rewrite `sgd_cb` to use the `+=` operator, instead of `add_`.\n", + "1. What is momentum? Write out the equation.\n", + "1. What's a physical analogy for momentum? How does it apply in our model training settings?\n", + "1. What does a bigger value for momentum do to the gradients?\n", + "1. What are the default values of momentum for 1cycle training?\n", + "1. What is RMSProp? Write out the equation.\n", + "1. What do the squared values of the gradients indicate?\n", + "1. How does Adam differ from momentum and RMSProp?\n", + "1. Write out the equation for Adam.\n", + "1. Calculate the value of `unbias_avg` and `w.avg` for a few batches of dummy values.\n", + "1. What's the impact of having a high eps in Adam?\n", + "1. Read through the optimizer notebook in fastai's repo, and execute it.\n", + "1. In what situations do dynamic learning rate methods like Adam change the behaviour of weight decay?\n", + "1. What are the four steps of a training loop?\n", + "1. Why is the use of callbacks better than writing a new training loop for each tweak you want to add?\n", + "1. What are the necessary points in the design of the fastai's callback system that make it as flexible as copying and pasting bits of code?\n", + "1. How can you get the list of events available to you when writing a callback?\n", + "1. 
Write the `ModelResetter` callback (without peeking).\n", + "1. How can you access the necessary attributes of the training loop inside a callback? When can you use or not use the shortcut that goes with it?\n", + "1. How can a callback influence the control flow of the training loop?\n", + "1. Write the `TerminateOnNaN` callback (without peeking if possible).\n", + "1. How do you make sure your callback runs after or before another callback?" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -677,6 +710,16 @@ "### Further research" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. Look up the \"rectified Adam\" paper and implement it using the general optimizer framework, and try it out. Search for other recent optimizers that work well in practice, and pick one to implement.\n", + "1. Look at the mixed precision callback with the documentation. Try to understand what each event and line of code does.\n", + "1. Implement your own version of the learning rate finder from scratch. Compare it with fastai's version.\n", + "1. Look at the source code of the callbacks that ship with fastai. See if you can find one that's similar to what you're looking to do, to get some inspiration." + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -684,6 +727,17 @@ "## Foundations of Deep Learning: Wrap up" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Congratulations, you have made it to the end of the \"foundations of deep learning\" section. You now understand how all of fastai's applications and most important architectures are built, and the recommended ways to train them, and have all the information you need to build these from scratch. 
Whilst you probably won't need to create your own training loop, or batchnorm layer, for instance, knowing what is going on behind the scenes is very helpful for debugging, profiling, and deploying your solutions.\n", + "\n", + "Since you understand all of the foundations of fastai's applications now, be sure to spend some time digging through fastai's source notebooks, and running and experimenting with parts of them, since you can see exactly how everything in fastai is developed.\n", + "\n", + "In the next section, we will be looking even further under the covers, to see how the actual forward and backward passes of a neural network are done, and we will see what tools are at our disposal to get better performance. We will then finish up with a project that brings together everything we have learned throughout the book, which we will use to build a method for interpreting convolutional neural networks." + ] + }, { "cell_type": "code", "execution_count": null, diff --git a/clean/17_foundations.ipynb b/clean/17_foundations.ipynb index 88072b7..4230adc 100644 --- a/clean/17_foundations.ipynb +++ b/clean/17_foundations.ipynb @@ -1523,6 +1523,52 @@ "## Questionnaire" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. Write the Python code to implement a single neuron.\n", + "1. Write the Python code to implement ReLU.\n", + "1. Write the Python code for a dense layer in terms of matrix multiplication.\n", + "1. Write the Python code for a dense layer in plain Python (that is with list comprehensions and functionality built into Python).\n", + "1. What is the hidden size of a layer?\n", + "1. What does the `t` method do in PyTorch?\n", + "1. Why is matrix multiplication written in plain Python very slow?\n", + "1. In matmul, why is `ac==br`?\n", + "1. In Jupyter notebook, how do you measure the time taken for a single cell to execute?\n", + "1. What is elementwise arithmetic?\n", + "1. 
Write the PyTorch code to test whether every element of `a` is greater than the corresponding element of `b`.\n", + "1. What is a rank-0 tensor? How do you convert it to a plain Python data type?\n", + "1. What does this return, and why?: `tensor([1,2]) + tensor([1])`\n", + "1. What does this return, and why?: `tensor([1,2]) + tensor([1,2,3])`\n", + "1. How does elementwise arithmetic help us speed up matmul?\n", + "1. What are the broadcasting rules?\n", + "1. What is `expand_as`? Show an example of how it can be used to match the results of broadcasting.\n", + "1. How does `unsqueeze` help us to solve certain broadcasting problems?\n", + "1. How can you use indexing to do the same operation as `unsqueeze`?\n", + "1. How do we show the actual contents of the memory used for a tensor?\n", + "1. When adding a vector of size 3 to a matrix of size 3 x 3, are the elements of the vector added to each row, or each column of the matrix? (Be sure to check your answer by running this code in a notebook.)\n", + "1. Do broadcasting and `expand_as` result in increased memory use? Why or why not?\n", + "1. Implement matmul using Einstein summation.\n", + "1. What does a repeated index letter represent on the left-hand side of einsum?\n", + "1. What are the three rules of Einstein summation notation? Why?\n", + "1. What is the forward pass, and the backward pass, of a neural network?\n", + "1. Why do we need to store some of the activations calculated for intermediate layers in the forward pass?\n", + "1. What is the downside of having activations with a standard deviation too far away from one?\n", + "1. How can weight initialisation help avoid this problem?\n", + "1. What is the formula to initialise weights such that we get a standard deviation of one, for a plain linear layer; for a linear layer followed by ReLU?\n", + "1. Why do we sometimes have to use the `squeeze` method in loss functions?\n", + "1. What does the argument to the squeeze method do? 
Why might it be important to include this argument, even though PyTorch does not require it?\n", + "1. What is the chain rule? Show the equation in either of the two forms shown in this chapter.\n", + "1. Show how to calculate the gradients of `mse(lin(l2, w2, b2), y)` using the chain rule.\n", + "1. What is the gradient of relu? Show in math or code. (You shouldn't need to commit this to memory; try to figure it out using your knowledge of the shape of the function.)\n", + "1. In what order do we need to call the `*_grad` functions in the backward pass? Why?\n", + "1. What is `__call__`?\n", + "1. What methods do we need to implement when writing a `torch.autograd.Function`?\n", + "1. Write `nn.Linear` from scratch, and test it works.\n", + "1. What is the difference between `nn.Module` and fastai's `Module`?" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -1530,6 +1576,16 @@ "### Further research" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. Implement relu as a `torch.autograd.Function` and train a model with it.\n", + "1. If you are mathematically inclined, find out what the gradients of a linear layer are in maths notation. Map that to the implementation we saw in this chapter.\n", + "1. Learn about the `unfold` method in PyTorch, and use it along with matrix multiplication to implement your own 2d convolution function, and train a CNN that uses it.\n", + "1. Implement everything in this chapter using numpy instead of PyTorch. " + ] + }, { "cell_type": "code", "execution_count": null, diff --git a/clean/18_CAM.ipynb b/clean/18_CAM.ipynb index ea7fc1f..925b961 100644 --- a/clean/18_CAM.ipynb +++ b/clean/18_CAM.ipynb @@ -420,6 +420,25 @@ "## Questionnaire" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. What is a hook in PyTorch?\n", + "1. Which layer does CAM use the outputs of?\n", + "1. Why does CAM require a hook?\n", + "1. 
Look at the source code of the `ActivationStats` class and see how it uses hooks.\n", + "1. Write a hook that stores the activation of a given layer in a model (without peeking, if possible).\n", + "1. Why do we call `eval` before getting the activations? Why do we use `no_grad`?\n", + "1. Use `torch.einsum` to compute the \"dog\" or \"cat\" score of each of the locations in the last activation of the body of the model.\n", + "1. How do you check which order the categories are in (i.e. the correspondence of index->category)?\n", + "1. Why are we using `decode` when displaying the input image?\n", + "1. What is a \"context manager\"? What special methods need to be defined to create one?\n", + "1. Why can't we use plain CAM for the inner layers of a network?\n", + "1. Why do we need to hook the backward pass in order to do GradCAM?\n", + "1. Why can't we call `output.backward()` when `output` is a rank-2 tensor of output activations per image per class?" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -427,6 +446,14 @@ "### Further research" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. Try removing `keepdim` and see what happens. Look up this parameter in the PyTorch docs. Why do we need it in this notebook?\n", + "1. Create a notebook like this one, but for NLP, and use it to find which words in a movie review are most significant in assessing sentiment of a particular movie review." + ] + }, { "cell_type": "code", "execution_count": null, diff --git a/clean/19_learner.ipynb b/clean/19_learner.ipynb index 5a6c5d6..fdaff28 100644 --- a/clean/19_learner.ipynb +++ b/clean/19_learner.ipynb @@ -1277,6 +1277,53 @@ "## Questionnaire" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For the questions here that ask you to explain what some function or class is, you should also complete your own code experiments." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. What is glob?\n", + "1. 
How do you open an image with the Python imaging library?\n", + "1. What does L.map do?\n", + "1. What does Self do?\n", + "1. What is L.val2idx?\n", + "1. What methods do you need to implement to create your own Dataset?\n", + "1. Why do we call `convert` when we open an image from Imagenette?\n", + "1. What does `~` do? How is it useful for splitting training and validation sets?\n", + "1. Which of these classes does `~` work with: `L`, `Tensor`, numpy array, Python `list`, pandas `DataFrame`?\n", + "1. What is ProcessPoolExecutor?\n", + "1. How does `L.range(self.ds)` work?\n", + "1. What is `__iter__`?\n", + "1. What is `first`?\n", + "1. What is `permute`? Why is it needed?\n", + "1. What is a recursive function? How does it help us define the `parameters` method?\n", + "1. Write a recursive function which returns the first 20 items of the Fibonacci sequence.\n", + "1. What is `super`?\n", + "1. Why do subclasses of Module need to override `forward` instead of defining `__call__`?\n", + "1. In `ConvLayer` why does `init` depend on `act`?\n", + "1. Why does `Sequential` need to call `register_modules`?\n", + "1. Write a hook that prints the shape of every layer's activations.\n", + "1. What is LogSumExp?\n", + "1. Why is log_softmax useful?\n", + "1. What is GetAttr? How is it helpful for callbacks?\n", + "1. Reimplement one of the callbacks in this chapter without inheriting from `Callback` or `GetAttr`.\n", + "1. What does `Learner.__call__` do?\n", + "1. What is `getattr`? (Note the case difference to `GetAttr`!)\n", + "1. Why is there a `try` block in `fit`?\n", + "1. Why do we check for `model.training` in `one_batch`?\n", + "1. What is `store_attr`?\n", + "1. What is the purpose of `TrackResults.before_epoch`?\n", + "1. What does `model.cuda()` do? How does it work?\n", + "1. Why do we need to check `model.training` in `LRFinder` and `OneCycle`?\n", + "1. Use cosine annealing in `OneCycle`." 
+ ] + }, { "cell_type": "markdown", "metadata": {}, @@ -1284,6 +1331,21 @@ "### Further research" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. Write `resnet18` from scratch (refer to <> as needed), and train it with the Learner in this chapter.\n", + "1. Implement a batchnorm layer from scratch and use it in your resnet18.\n", + "1. Write a mixup callback for use in this chapter.\n", + "1. Add momentum to `SGD`.\n", + "1. Pick a few features that you're interested in from fastai (or any other library) and implement them in this chapter.\n", + "1. Pick a research paper that's not yet implemented in fastai or PyTorch and implement it in this chapter.\n", + " - Port it over to fastai.\n", + " - Submit a PR to fastai, or create your own extension module and release it. \n", + " - Hint: you may find it helpful to use [nbdev](https://nbdev.fast.ai/) to create and deploy your package." + ] + }, { "cell_type": "code", "execution_count": null,