fastbook/clean/11_midlevel_data.ipynb

{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#hide\n",
"from utils import *\n",
"from IPython.display import display,HTML"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Data Munging with fastai's Mid-Level API"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Going Deeper into fastai's Layered API"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from fastai2.text.all import *\n",
"\n",
"dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"path = untar_data(URLs.IMDB)\n",
"dls = DataBlock(\n",
"    blocks=(TextBlock.from_folder(path),CategoryBlock),\n",
"    get_y = parent_label,\n",
"    get_items=partial(get_text_files, folders=['train', 'test']),\n",
"    splitter=GrandparentSplitter(valid_name='test')\n",
").dataloaders(path)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Transforms"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"files = get_text_files(path, folders = ['train', 'test'])\n",
"txts = L(o.open().read() for o in files[:2000])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(#374) ['xxbos','xxmaj','well',',','\"','cube','\"','(','1997',')'...]"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tok = Tokenizer.from_folder(path)\n",
"tok.setup(txts)\n",
"toks = txts.map(tok)\n",
"toks[0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([ 2, 8, 76, 10, 23, 3112, 23, 34, 3113, 33])"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"num = Numericalize()\n",
"num.setup(toks)\n",
"nums = toks.map(num)\n",
"nums[0][:10]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(#10) ['xxbos','xxmaj','well',',','\"','cube','\"','(','1997',')']"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nums_dec = num.decode(nums[0][:10]); nums_dec"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'xxbos xxmaj well , \" cube \" ( 1997 )'"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tok.decode(nums_dec)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"((#374) ['xxbos','xxmaj','well',',','\"','cube','\"','(','1997',')'...],\n",
" (#207) ['xxbos','xxmaj','conrad','xxmaj','hall','went','out','with','a','bang'...])"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tok((txts[0], txts[1]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Writing Your Own Transform"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(3, 2.0)"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def f(x:int): return x+1\n",
"tfm = Transform(f)\n",
"tfm(2),tfm(2.0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(3, 2.0)"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"@Transform\n",
"def f(x:int): return x+1\n",
"f(2),f(2.0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
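"class NormalizeMean(Transform):\n",
"    # setups runs once when setup() is called and stores the mean of the items\n",
"    def setups(self, items): self.mean = sum(items)/len(items)\n",
"    # encodes subtracts the stored mean; decodes adds it back to undo the encoding\n",
"    def encodes(self, x): return x-self.mean\n",
"    def decodes(self, x): return x+self.mean"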
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(3.0, -1.0, 2.0)"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tfm = NormalizeMean()\n",
"tfm.setup([1,2,3,4,5])\n",
"start = 2\n",
"y = tfm(start)\n",
"z = tfm.decode(y)\n",
"tfm.mean,y,z"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Pipeline"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([ 2, 8, 76, 10, 23, 3112, 23, 34, 3113, 33, 10, 8, 4477, 22, 88, 32, 10, 27, 42, 14])"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tfms = Pipeline([tok, num])\n",
"t = tfms(txts[0]); t[:20]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'xxbos xxmaj well , \" cube \" ( 1997 ) , xxmaj vincenzo \\'s first movie , was one of the most interesti'"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tfms.decode(t)[:100]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## TfmdLists and Datasets: Transformed Collections"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### TfmdLists"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tls = TfmdLists(files, [Tokenizer.from_folder(path), Numericalize])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([ 2, 8, 91, 11, 22, 5793, 22, 37, 4910, 34, 11, 8, 13042, 23, 107, 30, 11, 25, 44, 14])"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"t = tls[0]; t[:20]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'xxbos xxmaj well , \" cube \" ( 1997 ) , xxmaj vincenzo \\'s first movie , was one of the most interesti'"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tls.decode(t)[:100]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"xxbos xxmaj well , \" cube \" ( 1997 ) , xxmaj vincenzo 's first movie , was one of the most interesting and tricky ideas that xxmaj i 've ever seen when talking about movies . xxmaj they had just one scenery , a bunch of actors and a plot . xxmaj so , what made it so special were all the effective direction , great dialogs and a bizarre condition that characters had to deal like rats in a labyrinth . xxmaj his second movie , \" cypher \" ( 2002 ) , was all about its story , but it was n't so good as \" cube \" but here are the characters being tested like rats again . \n",
"\n",
" \" nothing \" is something very interesting and gets xxmaj vincenzo coming back to his ' cube days ' , locking the characters once again in a very different space with no time once more playing with the characters like playing with rats in an experience room . xxmaj but instead of a thriller sci - fi ( even some of the promotional teasers and trailers erroneous seemed like that ) , \" nothing \" is a loose and light comedy that for sure can be called a modern satire about our society and also about the intolerant world we 're living . xxmaj once again xxmaj xxunk amaze us with a great idea into a so small kind of thing . 2 actors and a blinding white scenario , that 's all you got most part of time and you do n't need more than that . xxmaj while \" cube \" is a claustrophobic experience and \" cypher \" confusing , \" nothing \" is completely the opposite but at the same time also desperate . \n",
"\n",
" xxmaj this movie proves once again that a smart idea means much more than just a millionaire budget . xxmaj of course that the movie fails sometimes , but its prime idea means a lot and offsets any flaws . xxmaj there 's nothing more to be said about this movie because everything is a brilliant surprise and a totally different experience that i had in movies since \" cube \" .\n"
]
}
],
"source": [
"tls.show(t)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cut = int(len(files)*0.8)\n",
"splits = [list(range(cut)), list(range(cut,len(files)))]\n",
"tls = TfmdLists(files, [Tokenizer.from_folder(path), Numericalize], \n",
"                splits=splits)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([ 2, 8, 20, 30, 87, 510, 1570, 12, 408, 379, 4196, 10, 8, 20, 30, 16, 13, 12216, 202, 509])"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tls.valid[0][:20]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(#50000) ['pos','pos','pos','pos','pos','pos','pos','pos','pos','pos'...]"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"lbls = files.map(parent_label)\n",
"lbls"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"((#2) ['neg','pos'], TensorCategory(1))"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cat = Categorize()\n",
"cat.setup(lbls)\n",
"cat.vocab, cat(lbls[0])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"TensorCategory(1)"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tls_y = TfmdLists(files, [parent_label, Categorize()])\n",
"tls_y[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Datasets"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x_tfms = [Tokenizer.from_folder(path), Numericalize]\n",
"y_tfms = [parent_label, Categorize()]\n",
"dsets = Datasets(files, [x_tfms, y_tfms])\n",
"x,y = dsets[0]\n",
"x[:20],y"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(tensor([ 2, 8, 20, 30, 87, 510, 1570, 12, 408, 379, 4196, 10, 8, 20, 30, 16, 13, 12216, 202, 509]),\n",
" TensorCategory(0))"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x_tfms = [Tokenizer.from_folder(path), Numericalize]\n",
"y_tfms = [parent_label, Categorize()]\n",
"dsets = Datasets(files, [x_tfms, y_tfms], splits=splits)\n",
"x,y = dsets.valid[0]\n",
"x[:20],y"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"('xxbos xxmaj this movie had horrible lighting and terrible camera movements . xxmaj this movie is a jumpy horror flick with no meaning at all . xxmaj the slashes are totally fake looking . xxmaj it looks like some 17 year - old idiot wrote this movie and a 10 year old kid shot it . xxmaj with the worst acting you can ever find . xxmaj people are tired of knives . xxmaj at least move on to guns or fire . xxmaj it has almost exact lines from \" when a xxmaj stranger xxmaj calls \" . xxmaj with gruesome killings , only crazy people would enjoy this movie . xxmaj it is obvious the writer does n\\'t have kids or even care for them . i mean at show some mercy . xxmaj just to sum it up , this movie is a \" b \" movie and it sucked . xxmaj just for your own sake , do n\\'t even think about wasting your time watching this crappy movie .',\n",
" 'neg')"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"t = dsets.valid[0]\n",
"dsets.decode(t)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dls = dsets.dataloaders(bs=64, before_batch=pad_input)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tfms = [[Tokenizer.from_folder(path), Numericalize], [parent_label, Categorize]]\n",
"files = get_text_files(path, folders = ['train', 'test'])\n",
"splits = GrandparentSplitter(valid_name='test')(files)\n",
"dsets = Datasets(files, tfms, splits=splits)\n",
"dls = dsets.dataloaders(dl_type=SortedDL, before_batch=pad_input)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"path = untar_data(URLs.IMDB)\n",
"dls = DataBlock(\n",
"    blocks=(TextBlock.from_folder(path),CategoryBlock),\n",
"    get_y = parent_label,\n",
"    get_items=partial(get_text_files, folders=['train', 'test']),\n",
"    splitter=GrandparentSplitter(valid_name='test')\n",
").dataloaders(path)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Applying the Mid-Level Data API: SiamesePair"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from fastai2.vision.all import *\n",
"path = untar_data(URLs.PETS)\n",
"files = get_image_files(path/\"images\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class SiameseImage(Tuple):\n",
"    def show(self, ctx=None, **kwargs): \n",
"        img1,img2,same_breed = self\n",
"        if not isinstance(img1, Tensor):\n",
"            # PIL images: match sizes, then convert to channel-first tensors\n",
"            if img2.size != img1.size: img2 = img2.resize(img1.size)\n",
"            t1,t2 = tensor(img1),tensor(img2)\n",
"            t1,t2 = t1.permute(2,0,1),t2.permute(2,0,1)\n",
"        else: t1,t2 = img1,img2\n",
"        # 10-pixel-wide strip of zeros that separates the two images\n",
"        line = t1.new_zeros(t1.shape[0], t1.shape[1], 10)\n",
"        return show_image(torch.cat([t1,line,t2], dim=2), \n",
"                          title=same_breed, ctx=ctx)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 360x360 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"img = PILImage.create(files[0])\n",
"s = SiameseImage(img, img, True)\n",
"s.show();"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 360x360 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"img1 = PILImage.create(files[1])\n",
"s1 = SiameseImage(img, img1, False)\n",
"s1.show();"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 360x360 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"s2 = Resize(224)(s1)\n",
"s2.show();"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def label_func(fname):\n",
"    return re.match(r'^(.*)_\\d+\\.jpg$', fname.name).groups()[0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class SiameseTransform(Transform):\n",
"    def __init__(self, files, label_func, splits):\n",
"        self.labels = files.map(label_func).unique()\n",
"        # Map each label to the files that carry it\n",
"        self.lbl2files = {l: L(f for f in files if label_func(f) == l) \n",
"                          for l in self.labels}\n",
"        self.label_func = label_func\n",
"        # Draw the validation pairs once so they stay fixed across epochs\n",
"        self.valid = {f: self._draw(f) for f in files[splits[1]]}\n",
"        \n",
"    def encodes(self, f):\n",
"        # Validation files reuse their fixed pair; other files get a fresh draw\n",
"        f2,t = self.valid.get(f, self._draw(f))\n",
"        img1,img2 = PILImage.create(f),PILImage.create(f2)\n",
"        return SiameseImage(img1, img2, t)\n",
"    \n",
"    def _draw(self, f):\n",
"        # Half the time pick a partner of the same class, otherwise a different class\n",
"        same = random.random() < 0.5\n",
"        cls = self.label_func(f)\n",
"        if not same: \n",
"            cls = random.choice(L(l for l in self.labels if l != cls)) \n",
"        return random.choice(self.lbl2files[cls]),same"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 360x360 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"splits = RandomSplitter()(files)\n",
"tfm = SiameseTransform(files, label_func, splits)\n",
"tfm(files[0]).show();"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 360x360 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"tls = TfmdLists(files, tfm, splits=splits)\n",
"show_at(tls.valid, 0);"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dls = tls.dataloaders(after_item=[Resize(224), ToTensor], \n",
" after_batch=[IntToFloatTensor, Normalize.from_stats(*imagenet_stats)])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Questionnaire"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Why do we say that fastai has a \"layered\" API? What does it mean?\n",
"1. Why does a `Transform` have a `decode` method? What does it do?\n",
"1. Why does a `Transform` have a `setup` method? What does it do?\n",
"1. How does a `Transform` work when called on a tuple?\n",
"1. Which methods do you need to implement when writing your own `Transform`?\n",
"1. Write a `Normalize` transform that fully normalizes items (subtract the mean and divide by the standard deviation of the dataset), and that can decode that behavior. Try not to peek!\n",
"1. Write a `Transform` that does the numericalization of tokenized texts (it should set its vocab automatically from the dataset seen and have a `decode` method). Look at the source code of fastai if you need help.\n",
"1. What is a `Pipeline`?\n",
"1. What is a `TfmdLists`? \n",
"1. What is a `Datasets`? How is it different from a `TfmdLists`?\n",
"1. Why are `TfmdLists` and `Datasets` named with an \"s\"?\n",
"1. How can you build a `DataLoaders` from a `TfmdLists` or a `Datasets`?\n",
"1. How do you pass `item_tfms` and `batch_tfms` when building a `DataLoaders` from a `TfmdLists` or a `Datasets`?\n",
"1. What do you need to do when you want to have your custom items work with methods like `show_batch` or `show_results`?\n",
"1. Why can we easily apply fastai data augmentation transforms to the `SiamesePair` we built?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further Research"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Use the mid-level API to prepare the data in `DataLoaders` on your own datasets. Try this with the Pet dataset and the Adult dataset from Chapter 1.\n",
"1. Look at the Siamese tutorial in the fastai documentation to learn how to customize the behavior of `show_batch` and `show_results` for new types of items. Implement it in your own project."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Becoming a Deep Learning Practitioner"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Congratulations—you've completed all of the chapters in this book that cover the key practical parts of training models and using deep learning! You know how to use all of fastai's built-in applications, and how to customize them using the data block API and loss functions. You even know how to create a neural network from scratch, and train it! (And hopefully you now know some of the questions to ask to make sure your creations help improve society too.)\n",
"\n",
"The knowledge you already have is enough to create full working prototypes of many types of neural network application. More importantly, it will help you understand the capabilities and limitations of deep learning models, and how to design a system that's well adapted to them.\n",
"\n",
"In the rest of this book we will be pulling apart those applications, piece by piece, to understand the foundations they are built on. This is important knowledge for a deep learning practitioner, because it is what allows you to inspect and debug the models you build, and to create new applications customized for your particular projects."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"jupytext": {
"split_at_heading": true
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}