mirror of
https://github.com/fastai/fastbook.git
synced 2025-04-04 01:40:44 +00:00
704 lines
16 KiB
Plaintext
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#hide\n",
    "# !pip install -Uqq fastbook\n",
    "import fastbook\n",
    "fastbook.setup_book()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#hide\n",
    "from fastbook import *\n",
    "from fastai.vision.widgets import *"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# From Model to Production"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The Practice of Deep Learning"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Starting Your Project"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The State of Deep Learning"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Computer vision"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Text (natural language processing)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Combining text and images"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Tabular data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Recommendation systems"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Other data types"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The Drivetrain Approach"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Gathering Data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# clean\n",
    "To download images with Bing Image Search, sign up at [Microsoft Azure](https://azure.microsoft.com/en-us/services/cognitive-services/bing-web-search-api/) for a free account. You will be given a key, which you can paste into a cell as follows (replace 'XXX' with your key, then run the cell), or supply through the `AZURE_SEARCH_KEY` environment variable:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "key = os.environ.get('AZURE_SEARCH_KEY', 'XXX')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "search_images_bing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "results = search_images_bing(key, 'grizzly bear')\n",
    "ims = results.attrgot('contentUrl')\n",
    "len(ims)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#hide\n",
    "ims = ['http://3.bp.blogspot.com/-S1scRCkI3vY/UHzV2kucsPI/AAAAAAAAA-k/YQ5UzHEm9Ss/s1600/Grizzly%2BBear%2BWildlife.jpg']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dest = 'images/grizzly.jpg'\n",
    "download_url(ims[0], dest)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "im = Image.open(dest)\n",
    "im.to_thumb(128,128)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bear_types = 'grizzly','black','teddy'\n",
    "path = Path('bears')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if not path.exists():\n",
    "    path.mkdir()\n",
    "    for o in bear_types:\n",
    "        dest = (path/o)\n",
    "        dest.mkdir(exist_ok=True)\n",
    "        results = search_images_bing(key, f'{o} bear')\n",
    "        download_images(dest, urls=results.attrgot('contentUrl'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fns = get_image_files(path)\n",
    "fns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "failed = verify_images(fns)\n",
    "failed"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "failed.map(Path.unlink);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Sidebar: Getting Help in Jupyter Notebooks"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### End sidebar"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## From Data to DataLoaders"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bears = DataBlock(\n",
    "    blocks=(ImageBlock, CategoryBlock), \n",
    "    get_items=get_image_files, \n",
    "    splitter=RandomSplitter(valid_pct=0.2, seed=42),\n",
    "    get_y=parent_label,\n",
    "    item_tfms=Resize(128))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dls = bears.dataloaders(path)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dls.valid.show_batch(max_n=4, nrows=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bears = bears.new(item_tfms=Resize(128, ResizeMethod.Squish))\n",
    "dls = bears.dataloaders(path)\n",
    "dls.valid.show_batch(max_n=4, nrows=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bears = bears.new(item_tfms=Resize(128, ResizeMethod.Pad, pad_mode='zeros'))\n",
    "dls = bears.dataloaders(path)\n",
    "dls.valid.show_batch(max_n=4, nrows=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bears = bears.new(item_tfms=RandomResizedCrop(128, min_scale=0.3))\n",
    "dls = bears.dataloaders(path)\n",
    "dls.train.show_batch(max_n=4, nrows=1, unique=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Data Augmentation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bears = bears.new(item_tfms=Resize(128), batch_tfms=aug_transforms(mult=2))\n",
    "dls = bears.dataloaders(path)\n",
    "dls.train.show_batch(max_n=8, nrows=2, unique=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Training Your Model, and Using It to Clean Your Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bears = bears.new(\n",
    "    item_tfms=RandomResizedCrop(224, min_scale=0.5),\n",
    "    batch_tfms=aug_transforms())\n",
    "dls = bears.dataloaders(path)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "learn = cnn_learner(dls, resnet18, metrics=error_rate)\n",
    "learn.fine_tune(4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "interp = ClassificationInterpretation.from_learner(learn)\n",
    "interp.plot_confusion_matrix()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "interp.plot_top_losses(5, nrows=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cleaner = ImageClassifierCleaner(learn)\n",
    "cleaner"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#hide\n",
    "# for idx in cleaner.delete(): cleaner.fns[idx].unlink()\n",
    "# for idx,cat in cleaner.change(): shutil.move(str(cleaner.fns[idx]), path/cat)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Turning Your Model into an Online Application"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using the Model for Inference"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "learn.export()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "path = Path()\n",
    "path.ls(file_exts='.pkl')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "learn_inf = load_learner(path/'export.pkl')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "learn_inf.predict('images/grizzly.jpg')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "learn_inf.dls.vocab"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Creating a Notebook App from the Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "btn_upload = widgets.FileUpload()\n",
    "btn_upload"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#hide\n",
    "# For the book, we can't actually click an upload button, so we fake it\n",
    "btn_upload = SimpleNamespace(data = ['images/grizzly.jpg'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "img = PILImage.create(btn_upload.data[-1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "out_pl = widgets.Output()\n",
    "out_pl.clear_output()\n",
    "with out_pl: display(img.to_thumb(128,128))\n",
    "out_pl"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pred,pred_idx,probs = learn_inf.predict(img)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "lbl_pred = widgets.Label()\n",
    "lbl_pred.value = f'Prediction: {pred}; Probability: {probs[pred_idx]:.04f}'\n",
    "lbl_pred"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "btn_run = widgets.Button(description='Classify')\n",
    "btn_run"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def on_click_classify(change):\n",
    "    img = PILImage.create(btn_upload.data[-1])\n",
    "    out_pl.clear_output()\n",
    "    with out_pl: display(img.to_thumb(128,128))\n",
    "    pred,pred_idx,probs = learn_inf.predict(img)\n",
    "    lbl_pred.value = f'Prediction: {pred}; Probability: {probs[pred_idx]:.04f}'\n",
    "\n",
    "btn_run.on_click(on_click_classify)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#hide\n",
    "#Putting back btn_upload to a widget for next cell\n",
    "btn_upload = widgets.FileUpload()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "VBox([widgets.Label('Select your bear!'), \n",
    "      btn_upload, btn_run, out_pl, lbl_pred])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Turning Your Notebook into a Real App"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#hide\n",
    "# !pip install voila\n",
    "# !jupyter serverextension enable --sys-prefix voila "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Deploying Your App"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## How to Avoid Disaster"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Unforeseen Consequences and Feedback Loops"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Get Writing!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Questionnaire"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. Provide an example of where the bear classification model might work poorly in production, due to structural or style differences in the training data.\n",
    "1. Where do text models currently have a major deficiency?\n",
    "1. What are possible negative societal implications of text generation models?\n",
    "1. In situations where a model might make mistakes, and those mistakes could be harmful, what is a good alternative to automating a process?\n",
    "1. What kind of tabular data is deep learning particularly good at?\n",
    "1. What's a key downside of directly using a deep learning model for recommendation systems?\n",
    "1. What are the steps of the Drivetrain Approach?\n",
    "1. How do the steps of the Drivetrain Approach map to a recommendation system?\n",
    "1. Create an image recognition model using data you curate, and deploy it on the web.\n",
    "1. What is `DataLoaders`?\n",
    "1. What four things do we need to tell fastai to create `DataLoaders`?\n",
    "1. What does the `splitter` parameter to `DataBlock` do?\n",
    "1. How do we ensure a random split always gives the same validation set?\n",
    "1. What letters are often used to signify the independent and dependent variables?\n",
    "1. What's the difference between the crop, pad, and squish resize approaches? When might you choose one over the others?\n",
    "1. What is data augmentation? Why is it needed?\n",
    "1. What is the difference between `item_tfms` and `batch_tfms`?\n",
    "1. What is a confusion matrix?\n",
    "1. What does `export` save?\n",
    "1. What is it called when we use a model for getting predictions, instead of training?\n",
    "1. What are IPython widgets?\n",
    "1. When might you want to use CPU for deployment? When might GPU be better?\n",
    "1. What are the downsides of deploying your app to a server, instead of to a client (or edge) device such as a phone or PC?\n",
    "1. What are three examples of problems that could occur when rolling out a bear warning system in practice?\n",
    "1. What is \"out-of-domain data\"?\n",
    "1. What is \"domain shift\"?\n",
    "1. What are the three steps in the deployment process?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Further Research"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. Consider how the Drivetrain Approach maps to a project or problem you're interested in.\n",
    "1. When might it be best to avoid certain types of data augmentation?\n",
    "1. For a project you're interested in applying deep learning to, consider the thought experiment \"What would happen if it went really, really well?\"\n",
    "1. Start a blog, and write your first blog post. For instance, write about what you think deep learning might be useful for in a domain you're interested in."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "jupytext": {
   "split_at_heading": true
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}