mirror of https://github.com/fastai/fastbook.git synced 2025-04-09 04:10:44 +00:00

09_tabular: Add note about bug when splitting data

As reported in , there's a mistake in the predicate that splits the data into the training and validation sets. Jeremy commented that it won't be fixed in this edition: https://github.com/fastai/fastbook/pull/337#issuecomment-735401046

This PR adds an errata comment to the notebook so that readers are aware of the mistake. This can save them time if they run into the bug on their own.
Adam Comella 2023-06-21 21:50:31 -07:00
parent 823b69e00a
commit 847cf17367


@@ -726,6 +726,13 @@
"metadata": {},
"outputs": [],
"source": [
"# Errata:\n",
"# This line should have been:\n",
"# cond = (df.saleYear<2011) | ((df.saleYear==2011) & (df.saleMonth<10))\n",
"#\n",
"# Correcting this line is postponed to a future edition of the book because\n",
"# it requires a re-analysis of the data. For discussion see:\n",
"# https://github.com/fastai/fastbook/issues/325.\n",
"cond = (df.saleYear<2011) | (df.saleMonth<10)\n",
"train_idx = np.where( cond)[0]\n",
"valid_idx = np.where(~cond)[0]\n",
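The effect of the missing `saleYear==2011` term shows up on a handful of rows. This is a minimal sketch: the five-row `df` is hypothetical and keeps only the `saleYear` and `saleMonth` columns the predicate reads. The names `buggy_cond`, `buggy_train_idx` and `month_key` are illustrative and are not part of the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows; only the two columns the predicate reads.
df = pd.DataFrame({
    "saleYear":  [2010, 2011, 2011, 2012, 2012],
    "saleMonth": [12, 9, 10, 3, 11],
})

# Predicate as committed: the month test is not restricted to 2011,
# so January-September sales from later years also go to training.
buggy_cond = (df.saleYear<2011) | (df.saleMonth<10)

# Predicate from the errata comment: "before October" only applies to 2011.
cond = (df.saleYear<2011) | ((df.saleYear==2011) & (df.saleMonth<10))

train_idx = np.where( cond)[0]
valid_idx = np.where(~cond)[0]
buggy_train_idx = np.where(buggy_cond)[0]

# Hypothetical comparison key: months since year 0, used only to order sales.
month_key = (df.saleYear * 12 + df.saleMonth).to_numpy()

print(buggy_train_idx)       # [0 1 3]  (March 2012 leaks into training)
print(train_idx, valid_idx)  # [0 1] [2 3 4]

# With the errata predicate, every training sale precedes every validation sale.
print(month_key[train_idx].max() < month_key[valid_idx].min())  # True
```

With the committed predicate, row 3 (March 2012) is placed in the training set while row 2 (October 2011) is in validation, so the split no longer separates earlier sales from later ones. With the errata predicate, the latest training sale comes before the earliest validation sale.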