{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#hide\n",
"from utils import *"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"[[chapter_multicat]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Other computer vision problems"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the previous chapter we learned some important practical techniques for training models. Choices such as the learning rate and the number of epochs are very important to getting good results.\n",
"\n",
"In this chapter we are going to look at two other types of computer vision problems: multi-label classification and regression. In the process, we will study the output activations, targets, and loss functions of deep learning models more deeply."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Multi-label classification"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Multi-label classification refers to the problem of identifying the categories of objects in an image, where you may not have exactly one type of object in the image. There may be more than one kind of object, or there may be no objects at all in the classes that you are looking for.\n",
"\n",
"For instance, this would have been a great approach for our bear classifier. One problem with the bear classifier that we rolled out before is that if a user uploaded something that wasn't any kind of bear, the model would still say it was either a grizzly, black, or teddy bear; it had no ability to predict \"not a bear at all\". In fact, after we have completed this chapter, it would be a great exercise for you to go back to your image classifier application and try to retrain it using the multi-label technique. Then test it by passing in an image that is not of any of your recognised classes.\n",
"\n",
"In practice, we have not seen many examples of people training multi-label classifiers for this purpose, but we very often see both users and developers complaining about this problem. It appears that this simple solution is not at all widely understood or appreciated. Because real-world images probably have zero matches or more than one match more often than exactly one, we should probably expect multi-label classifiers to be more widely applicable than single-label classifiers."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For our example we are going to use the *Pascal* dataset, which can have more than one kind of classified object per image.\n",
"\n",
"We begin by downloading and extracting the dataset as per usual:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from fastai2.vision.all import *\n",
"path = untar_data(URLs.PASCAL_2007)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This dataset is different from the ones we have seen before, in that it is not structured by file name or folder, but instead comes with a CSV (comma-separated values) file telling us what labels to use for each image. We can have a look at the CSV file by reading it into a Pandas DataFrame:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
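"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>fname</th>\n",
"      <th>labels</th>\n",
"      <th>is_valid</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>000005.jpg</td>\n",
"      <td>chair</td>\n",
"      <td>True</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>1</th>\n",
"      <td>000007.jpg</td>\n",
"      <td>car</td>\n",
"      <td>True</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>2</th>\n",
"      <td>000009.jpg</td>\n",
"      <td>horse person</td>\n",
"      <td>True</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>3</th>\n",
"      <td>000012.jpg</td>\n",
"      <td>car</td>\n",
"      <td>False</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>4</th>\n",
"      <td>000016.jpg</td>\n",
"      <td>bicycle</td>\n",
"      <td>True</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"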
],
"text/plain": [
" fname labels is_valid\n",
"0 000005.jpg chair True\n",
"1 000007.jpg car True\n",
"2 000009.jpg horse person True\n",
"3 000012.jpg car False\n",
"4 000016.jpg bicycle True"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_csv(path/'train.csv')\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that the list of categories in each image is shown as a space-delimited string."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sidebar: Pandas and DataFrames"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"No, it’s not actually a panda! *Pandas* is a Python library that is used to manipulate and analyze tabular and time series data. The main class is `DataFrame`, which represents a table of rows and columns. You can get a DataFrame from a CSV file, a database table, Python dictionaries, and many other sources. In Jupyter, a DataFrame is output as a formatted table, as you see above.\n",
"\n",
"You can access rows and columns of a DataFrame with the `iloc` property, which lets you access rows and columns as if it were a matrix:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"fname 000005.jpg\n",
"labels chair\n",
"is_valid True\n",
"Name: 0, dtype: object"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.iloc[:,0]\n",
"df.iloc[0,:]\n",
"# Trailing ‘:’s are always optional (in numpy, PyTorch, pandas, etc),\n",
"# so this is equivalent:\n",
"df.iloc[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also grab a column by name by indexing into a DataFrame directly:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 000005.jpg\n",
"1 000007.jpg\n",
"2 000009.jpg\n",
"3 000012.jpg\n",
"4 000016.jpg\n",
" ... \n",
"5006 009954.jpg\n",
"5007 009955.jpg\n",
"5008 009958.jpg\n",
"5009 009959.jpg\n",
"5010 009961.jpg\n",
"Name: fname, Length: 5011, dtype: object"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['fname']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can create new columns and do calculations using columns:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"TK"
]
},
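{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small illustration of creating a column from existing ones, here is a minimal sketch on a made-up table. The DataFrame `tmp_df` and its column names `a`, `b`, and `c` are hypothetical and are not part of the Pascal dataset:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# A tiny hypothetical table; the column names are made up for this sketch\n",
"tmp_df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [10, 20, 30, 40]})\n",
"\n",
"# Assigning to a new column name creates the column; arithmetic is elementwise\n",
"tmp_df['c'] = tmp_df['a'] + tmp_df['b']\n",
"print(tmp_df['c'].tolist())\n",
"```\n",
"\n",
"Each row of `c` is computed from the same row of `a` and `b`, without writing a loop."
]
},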
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Pandas is a fast and flexible library, and is an important part of every data scientist’s Python toolbox. Unfortunately, its API can be rather confusing and surprising, so it takes a while to get familiar with it. If you haven’t used Pandas before, we’d suggest going through a tutorial; we are particularly fond of the book “*Python for Data Analysis*” by Wes McKinney, the creator of Pandas. It also covers other important libraries like matplotlib and numpy. We will try to briefly describe Pandas functionality we use as we come across it, but will not go into the level of detail of McKinney’s book."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### End sidebar"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Constructing a data block"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"How do we convert from a `DataFrame` object to a `DataLoaders` object? We generally suggest using the data block API for creating a `DataLoaders` object, where possible, since it provides a good mix of flexibility and simplicity. Here we will show you the steps that we take to use the data blocks API to construct a `DataLoaders` object in practice, using this dataset as an example.\n",
"\n",
"As we have seen, PyTorch and fastai have two main classes for representing and accessing a training set or validation set:\n",
"\n",
"- `Dataset`: a collection which returns a tuple of your independent and dependent variable for a single item\n",
"- `DataLoader`: an iterator which provides a stream of mini batches, where each mini batch is a pair of a batch of independent variables and a batch of dependent variables\n",
"\n",
"On top of these, fastai provides two classes for bringing your training and validation sets together:\n",
"\n",
"- `Datasets`: an object which contains a training `Dataset` and a validation `Dataset`\n",
"- `DataLoaders`: an object which contains a training `DataLoader` and a validation `DataLoader`\n",
"\n",
"Since a `DataLoader` builds on top of a `Dataset`, and adds additional functionality to it (collating multiple items into a mini batch), it’s often easiest to start by creating and testing `Datasets`, and then look at `DataLoaders` after that’s working.\n",
"\n",
"When we create a `DataBlock`, we build up gradually, step-by-step, and use the notebook to check our data along the way. This is a great way to make sure that you maintain momentum as you are coding, and that you keep an eye out for any problems. It’s easy to debug, because you know that if there is a problem, it is in the line of code you just typed!\n",
"\n",
"Let’s start with the simplest case, which is a data block created with no parameters:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dblock = DataBlock()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can create a `Datasets` object from this. The only thing needed is a source, in this case, our dataframe:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dsets = dblock.datasets(df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This contains a `train` and a `valid` dataset, which we can index into:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(fname 008663.jpg\n",
" labels car person\n",
" is_valid False\n",
" Name: 4346, dtype: object, fname 008663.jpg\n",
" labels car person\n",
" is_valid False\n",
" Name: 4346, dtype: object)"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dsets.train[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, this simply returns a row of the dataframe, twice. This is because by default, the datablock assumes we have two things: input and target. We are going to need to grab the appropriate fields from the DataFrame, which we can do by passing `get_x` and `get_y` functions:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"('005620.jpg', 'aeroplane')"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dblock = DataBlock(get_x = lambda r: r['fname'], get_y = lambda r: r['labels'])\n",
"dsets = dblock.datasets(df)\n",
"dsets.train[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, rather than defining a function in the usual way, we are using Python’s *lambda* keyword. This is just a shortcut for defining and then referring to a function. The above is identical to the following more verbose approach:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"('002549.jpg', 'tvmonitor')"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def get_x(r): return r['fname']\n",
"def get_y(r): return r['labels']\n",
"dblock = DataBlock(get_x = get_x, get_y = get_y)\n",
"dsets = dblock.datasets(df)\n",
"dsets.train[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Lambda functions are great for quickly iterating; however, they are not compatible with serialization, so we advise you to use the more verbose approach if you want to export your `Learner` after training (lambdas are fine if you are just experimenting)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that the independent variable will need to be converted into a complete path, so that we can open it as an image, and the dependent variable will need to be split on the space character (which is the default for Python’s split function) so that it becomes a list:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#hide\n",
"Path.BASE_PATH = path"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(Path('train/002844.jpg'), ['train'])"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def get_x(r): return path/'train'/r['fname']\n",
"def get_y(r): return r['labels'].split(' ')\n",
"dblock = DataBlock(get_x = get_x, get_y = get_y)\n",
"dsets = dblock.datasets(df)\n",
"dsets.train[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To actually open the image and do the conversion to tensors, we will need to use a set of transforms; block types will provide us with those. We can use the same block types that we have used previously, with one exception. The `ImageBlock` will work fine again, because we have a path which points to a valid image, but the `CategoryBlock` is not going to work. The problem is: that block returns a single integer. But we need to be able to have multiple labels for each item. To solve this, we use a `MultiCategoryBlock`. This type of block expects to receive a list of strings, as we have in this case, so let’s test it out:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(PILImage mode=RGB size=500x375,\n",
" TensorMultiCategory([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.]))"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),\n",
" get_x = get_x, get_y = get_y)\n",
"dsets = dblock.datasets(df)\n",
"dsets.train[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, our list of categories is not encoded in the same way that it was for the regular `CategoryBlock`. In that case, we had a single integer, representing which category was present, based on its location in our vocab. In this case, however, we instead have a list of zeros, with a one in any position where that category is present. For example, if there is a one in the second and fourth positions, then that means that vocab items two and four are present in this image. This is known as *one hot encoding*. The reason we can’t easily just use a list of category indices is that each list would be a different length, and PyTorch requires tensors, where everything has to be the same length."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> jargon: One hot encoding: using a vector of zeros, with a one in each location that is represented in the data, to encode a list of integers."
]
},
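{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the encoding concrete, here is a minimal sketch of turning a list of category indices into a fixed-length vector and back again. The helper `multi_hot` and the four-item `vocab` are hypothetical, chosen only for illustration; fastai's own transforms do this work for you:\n",
"\n",
"```python\n",
"import torch\n",
"\n",
"def multi_hot(idxs, n_classes):\n",
"    'Encode a list of category indices as a fixed-length vector of zeros and ones'\n",
"    vec = torch.zeros(n_classes)\n",
"    # Indexing with a tensor of positions sets every listed position at once\n",
"    vec[torch.tensor(idxs, dtype=torch.long)] = 1.\n",
"    return vec\n",
"\n",
"vocab = ['bicycle', 'car', 'horse', 'person']   # hypothetical vocab\n",
"enc = multi_hot([2, 3], len(vocab))             # an image labelled 'horse person'\n",
"\n",
"# Decode by finding the positions that hold a one\n",
"decoded = [vocab[i] for i in torch.where(enc == 1.)[0].tolist()]\n",
"print(enc, decoded)\n",
"```\n",
"\n",
"However many labels an image has, the encoded vector always has `len(vocab)` elements, which is what allows items to be stacked into a batch."
]
},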
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let’s check what the categories represent for this example (we are using the convenient `torch.where` function, which tells us all of the indices where our condition is true or false):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(#1) ['dog']"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"idxs = torch.where(dsets.train[0][1]==1.)[0]\n",
"dsets.train.vocab[idxs]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With numpy arrays, PyTorch tensors, and fastai’s `L` class, you can index directly using a list or vector, which makes a lot of code (such as this example) much clearer and more concise.\n",
"\n",
"We have ignored the column `is_valid` up until now, which means that `DataBlock` has been using a random split by default. To explicitly choose the elements of our validation set, we need to write a function and pass it to `splitter` (or use one of fastai's predefined functions or classes). It will take the items (here our whole dataframe) and must return two (or more) lists of integers."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(PILImage mode=RGB size=500x333,\n",
" TensorMultiCategory([0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]))"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def splitter(df):\n",
" train = df.index[~df['is_valid']].tolist()\n",
" valid = df.index[df['is_valid']].tolist()\n",
" return train,valid\n",
"\n",
"dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),\n",
" splitter=splitter,\n",
" get_x=get_x, \n",
" get_y=get_y)\n",
"\n",
"dsets = dblock.datasets(df)\n",
"dsets.train[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we have discussed, a `DataLoader` collates the items from a `Dataset` into a mini batch. This is a tuple of tensors, where each tensor simply stacks the items from that location in the `Dataset` item. Now that we have confirmed that the individual items look okay, there's one more step before we can create our `DataLoaders`: ensuring that every item is of the same size. To do this, we can use `RandomResizedCrop`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),\n",
" splitter=splitter,\n",
" get_x=get_x, \n",
" get_y=get_y,\n",
" item_tfms = RandomResizedCrop(128, min_scale=0.35))\n",
"dls = dblock.dataloaders(df)"
]
},
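{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see why same-sized items matter, here is a rough sketch of what collation amounts to, using plain `torch.stack` on made-up tensors. This is a simplification for illustration, not fastai's actual collation code:\n",
"\n",
"```python\n",
"import torch\n",
"\n",
"# Two made-up dataset items: (image tensor, one-hot target), all the same size\n",
"items = [(torch.zeros(3, 128, 128), torch.tensor([0., 1.])),\n",
"         (torch.ones(3, 128, 128),  torch.tensor([1., 1.]))]\n",
"\n",
"# Collating stacks the items position by position into one tensor each\n",
"xb = torch.stack([x for x, _ in items])\n",
"yb = torch.stack([y for _, y in items])\n",
"print(xb.shape, yb.shape)\n",
"\n",
"# Images of different sizes cannot be stacked, hence the resize step\n",
"try:\n",
"    torch.stack([torch.zeros(3, 100, 128), torch.zeros(3, 128, 128)])\n",
"    stacked_ok = True\n",
"except RuntimeError:\n",
"    stacked_ok = False\n",
"print(stacked_ok)\n",
"```\n",
"\n",
"The first dimension of `xb` and `yb` is the batch size; the remaining dimensions are those of a single item."
]
},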
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And now we can display a sample of our data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure: sample batch of 3 training images>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"dls.show_batch(rows=1, cols=3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And remember that if anything goes wrong when you create your `DataLoaders` from your `DataBlock`, or if you want to view exactly what happens with your `DataBlock`, you can use the `summary` method we presented in the last chapter."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Binary cross entropy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we'll create our `Learner`. We saw in <> that a `Learner` object contains four main things: the model, a `DataLoaders` object, an `Optimizer`, and the loss function to use. We already have our `DataLoaders`, we can leverage fastai's `resnet` models (which we'll learn how to create from scratch later), and we know how to create an `SGD` optimizer. So let's focus on ensuring we have a suitable loss function. To do this, let's use `cnn_learner` to create a `Learner`, so we can look at its activations:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"learn = cnn_learner(dls, resnet18)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We also saw that the model in a `Learner` is generally an object of a class inheriting from `nn.Module`, and that you can call it using parentheses and it will return the activations of a model. You should pass it your independent variable, as a mini batch. We can try it out by grabbing a mini batch from our `DataLoader`, and then passing it to the model:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"torch.Size([64, 20])"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x,y = dls.train.one_batch()\n",
"activs = learn.model(x)\n",
"activs.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Have a think about why `activs` has this shape… We have a batch size of 64. And we need to calculate the probability of each of 20 categories. Here’s what one of those activations looks like:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([-1.0028, 0.3400, -0.5906, 0.7806, 3.1160, -0.1994, 1.3180, 1.6361, -1.7553, 0.2217, 2.8052, 1.3229, 0.9369, -1.4760, -0.3204, -2.3116, -3.8615, -1.5931, 0.0745, -3.6006],\n",
" device='cuda:5', grad_fn=)"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"activs[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> note: Knowing how to manually get a mini batch and pass it into a model, and look at the activations and loss, is really important for debugging your model. It is also very helpful for learning, so that you can see exactly what is going on."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These activations aren’t yet scaled between zero and one. We learned in <> how to scale activations to be between zero and one: the `sigmoid` function. We also saw how to calculate a loss based on this: it is our loss function from <>, with the addition of `log` as discussed in the last chapter:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
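"def binary_cross_entropy(inputs, targets):\n",
"    inputs = inputs.sigmoid()\n",
"    # Take the predicted probability of the correct answer for each label, then the negative mean log\n",
"    return -torch.where(targets==1, inputs, 1-inputs).log().mean()"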
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that because we have a one-hot encoded dependent variable, we can't directly use `nll_loss` or `softmax` (and therefore we can't use `cross_entropy`):\n",
"\n",
"- **softmax**, as we saw, requires that all predictions sum to one, and tends to push one activation to be much larger than the others (due to the use of `exp`); however, we may well have multiple objects that we're confident appear in an image, so restricting the maximum sum of activations to one is not a good idea. By the same reasoning, we may want the sum to be *less* than one, if we don't think *any* of the categories appear in an image.\n",
"- **nll_loss**, as we saw, returns the value of just one activation: the single activation corresponding with the single label for an item. This doesn't make sense when we have multiple labels.\n",
"\n",
"On the other hand, the `binary_cross_entropy` function, which is just `mnist_loss` along with `log`, provides just what we need, thanks to the magic of PyTorch's elementwise operations. Each activation will be compared to each target for each column, so we don't have to do anything to make this function work for multiple columns."
]
},
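{
"cell_type": "markdown",
"metadata": {},
"source": [
"The difference between the two activation functions is easy to check on a handful of made-up activations. The values below are arbitrary and only meant as a sketch:\n",
"\n",
"```python\n",
"import torch\n",
"\n",
"acts = torch.tensor([3.0, 2.5, -4.0])   # made-up activations for three categories\n",
"\n",
"sm = acts.softmax(dim=0)   # forced to sum to one across the categories\n",
"sg = acts.sigmoid()        # each activation is squashed independently\n",
"print(sm.sum(), sg.sum())\n",
"\n",
"# With sigmoid, every category can be unlikely at the same time\n",
"none_present = torch.tensor([-4.0, -3.0, -5.0]).sigmoid()\n",
"print(none_present.sum())\n",
"```\n",
"\n",
"With `sigmoid`, the first two categories can both receive a probability above 0.9, and in the second example all three probabilities are close to zero. Neither outcome is possible with `softmax`."
]
},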
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> j: One of the things I really like about working with libraries like PyTorch, with broadcasting and elementwise operations, is that quite frequently I find I can write code that works equally well for a single item, or a batch of items, without changes. `binary_cross_entropy` is a great example of this. By using these operations, we don't have to write loops ourselves, and can rely on PyTorch to do the looping we need as appropriate for the rank of the tensors we're working with."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"PyTorch already provides this function for us. In fact, it provides a number of versions, with rather confusing names!\n",
"\n",
"`F.binary_cross_entropy`, and its module equivalent `nn.BCELoss`, calculate cross entropy on a one-hot encoded target, but do not include the initial `sigmoid`. Normally for one-hot encoded targets you'll want `F.binary_cross_entropy_with_logits` (or `nn.BCEWithLogitsLoss`), which do both sigmoid and binary cross entropy in a single function, as in our example above.\n",
"\n",
"The equivalent for single-label datasets (like MNIST or Pets), where the target is encoded as a single integer, is `F.nll_loss` or `nn.NLLLoss` for the version without the initial softmax, and `F.cross_entropy` or `nn.CrossEntropyLoss` for the version with the initial softmax.\n",
"\n",
"Since we have a one-hot encoded target, we will use `BCEWithLogitsLoss`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor(1.0082, device='cuda:5', grad_fn=)"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"loss_func = nn.BCEWithLogitsLoss()\n",
"loss = loss_func(activs, y)\n",
"loss"
]
},
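{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sanity check on how these versions relate, here is a minimal sketch with made-up activations and targets. It compares a hand-written sigmoid followed by the negative mean log of the probability assigned to the correct answer against the two PyTorch functions:\n",
"\n",
"```python\n",
"import torch\n",
"import torch.nn.functional as F\n",
"\n",
"torch.manual_seed(0)\n",
"acts = torch.randn(4, 3)                   # made-up raw activations (logits)\n",
"targs = torch.tensor([[1., 0., 0.],\n",
"                      [0., 1., 1.],\n",
"                      [0., 0., 0.],\n",
"                      [1., 1., 0.]])       # made-up one-hot encoded targets\n",
"\n",
"# By hand: sigmoid, then the negative mean log of the probability of the right answer\n",
"probs = acts.sigmoid()\n",
"manual = -torch.where(targs == 1, probs, 1 - probs).log().mean()\n",
"\n",
"with_logits = F.binary_cross_entropy_with_logits(acts, targs)   # applies sigmoid itself\n",
"without_logits = F.binary_cross_entropy(probs, targs)           # expects probabilities\n",
"print(manual, with_logits, without_logits)\n",
"```\n",
"\n",
"All three values should agree up to floating point error. The `with_logits` version is generally preferred because it combines the two steps in a numerically more stable way."
]
},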
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We don't actually need to tell fastai to use this loss function (although we can if we want) since it will be automatically chosen for us. fastai knows that the `DataLoaders` have multiple category labels, so it will use `nn.BCEWithLogitsLoss` by default.\n",
"\n",
"One change compared to the last chapter is the metric we use: since this is a multi-label problem, we can't use the accuracy function. Why is that? Well, accuracy was comparing our outputs to our targets like so:\n",
"\n",
"```python\n",
"def accuracy(inp, targ, axis=-1):\n",
" \"Compute accuracy with `targ` when `pred` is bs * n_classes\"\n",
" pred = inp.argmax(dim=axis)\n",
" return (pred == targ).float().mean()\n",
"```\n",
"\n",
"The class predicted was the one with the highest activation (this is what `argmax` does). Here it doesn't work because we could have more than one prediction on a single image. After applying the sigmoid to our activations (to make them between 0 and 1), we need to decide which ones are 0s and which ones are 1s by picking a *threshold*. Each value above the threshold will be considered as a 1, and each value lower than the threshold will be considered a 0:\n",
"\n",
"```python\n",
"def accuracy_multi(inp, targ, thresh=0.5, sigmoid=True):\n",
" \"Compute accuracy when `inp` and `targ` are the same size.\"\n",
" if sigmoid: inp = inp.sigmoid()\n",
" return ((inp>thresh)==targ.bool()).float().mean()\n",
"```"
]
},
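{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see how the threshold changes the result, here is a small worked example on made-up activations and targets. The function is repeated so that the snippet runs on its own:\n",
"\n",
"```python\n",
"import torch\n",
"\n",
"def accuracy_multi(inp, targ, thresh=0.5, sigmoid=True):\n",
"    'Same definition as in the text, repeated so this snippet is self-contained'\n",
"    if sigmoid: inp = inp.sigmoid()\n",
"    return ((inp > thresh) == targ.bool()).float().mean()\n",
"\n",
"acts = torch.tensor([[ 2.0, -1.0, -3.0],\n",
"                     [-2.0,  0.5,  1.5]])   # made-up activations\n",
"targs = torch.tensor([[1., 1., 0.],\n",
"                      [0., 1., 1.]])        # made-up targets\n",
"\n",
"acc_default = accuracy_multi(acts, targs)              # thresh=0.5\n",
"acc_low = accuracy_multi(acts, targs, thresh=0.2)\n",
"print(acc_default, acc_low)\n",
"```\n",
"\n",
"The activation of -1.0 becomes a probability of about 0.27 after the sigmoid. At a threshold of 0.5 it counts as a miss for a label that is actually present, so five of the six predictions are correct; at a threshold of 0.2 it counts as a hit and all six are correct."
]
},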
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we pass `accuracy_multi` directly as a metric, it will use the default value for `thresh`, which is 0.5. We might want to adjust that default and create a new version of `accuracy_multi` that has a different default. To help with this, there is a function in Python called `partial`. It allows us to *bind* a function with some arguments or keyword arguments, making a new version of that function that, whenever it is called, always includes those arguments. For instance, here is a simple function taking two arguments:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"('Hello Jeremy.', 'Ahoy! Jeremy.')"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def say_hello(name, say_what=\"Hello\"): return f\"{say_what} {name}.\"\n",
"say_hello('Jeremy'),say_hello('Jeremy', 'Ahoy!')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can switch to a French version of that function by using `partial`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"('Bonjour Jeremy.', 'Bonjour Sylvain.')"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"f = partial(say_hello, say_what=\"Bonjour\")\n",
"f(\"Jeremy\"),f(\"Sylvain\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now train our model. Let's try setting the accuracy threshold to 0.2 for our metric:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"