Update 09_tabular to fastai v2.2.7 (#413)

saleElapsed is now detected as a continuous variable right away.
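
The check this change relies on can be illustrated with a minimal sketch. It assumes the split is driven by value type and cardinality (floats are continuous, integers are continuous only above a cardinality threshold). The helper `split_cont_cat` and the sample columns are hypothetical stand-ins written in plain Python; this is not fastai's `cont_cat_split` implementation.

```python
# Hypothetical, simplified stand-in for a continuous/categorical split.
# Assumption: floats are continuous; integers are continuous only when
# they have more than `max_card` distinct values; everything else is
# categorical. This is NOT fastai's implementation.

def split_cont_cat(columns, max_card=20):
    """Return (continuous, categorical) column names.

    `columns` maps a column name to a list of values.
    """
    cont, cat = [], []
    for name, values in columns.items():
        all_float = all(isinstance(v, float) for v in values)
        all_int = all(isinstance(v, int) and not isinstance(v, bool) for v in values)
        if all_float or (all_int and len(set(values)) > max_card):
            cont.append(name)
        else:
            cat.append(name)
    return cont, cat


# Made-up sample data, for illustration only.
columns = {
    "saleElapsed": [1.1e9, 1.2e9, 1.3e9],        # float timestamps
    "ModelID": [3157, 77, 3157],                 # low-cardinality integers
    "ProductSize": ["Small", "Large", "Small"],  # strings
}
cont_names, cat_names = split_cont_cat(columns, max_card=2)
print(cont_names)
print(cat_names)
```

Under these assumptions a float-typed `saleElapsed` lands in the continuous list with no manual `append`/`remove` step, which is why the notebook cell can simply display `cont_nn` to verify the result.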
Armin Berres 2021-02-22 23:06:26 +01:00 committed by GitHub
parent c3ceea7996
commit 8be580737e
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 17 additions and 33 deletions

@ -9366,33 +9366,27 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In this case, however, there's one variable that we absolutely do not want to treat as categorical: the `saleElapsed` variable. A categorical variable cannot, by definition, extrapolate outside the range of values that it has seen, but we want to be able to predict auction sale prices in the future. Therefore, we need to make this a continuous variable:"
"In this case, there's one variable that we absolutely do not want to treat as categorical: the `saleElapsed` variable. A categorical variable cannot, by definition, extrapolate outside the range of values that it has seen, but we want to be able to predict auction sale prices in the future. Let's verify that `cont_cat_split` did the correct thing."
]
},
{
"cell_type": "code",
"execution_count": 98,
"metadata": {},
"outputs": [],
"source": [
"cont_nn.append('saleElapsed')\n",
"cat_nn.remove('saleElapsed')"
"outputs": [
{
"data": {
"text/plain": [
"['saleElapsed']"
]
},
{
"cell_type": "markdown",
"execution_count": 98,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Also, to use this as a continuous variable, we have to ensure it's of a numeric type:"
]
},
{
"cell_type": "code",
"execution_count": 106,
"metadata": {},
"outputs": [],
"source": [
"df_nn['saleElapsed'] = df_nn['saleElapsed'].astype(int)"
"cont_nn"
]
},
{
@ -9975,7 +9969,7 @@
"1. What's a good type of plot for showing tree interpreter results?\n",
"1. What is the \"extrapolation problem\"?\n",
"1. How can you tell if your test or validation set is distributed in a different way than your training set?\n",
"1. Why do we make `saleElapsed` a continuous variable, even although it has less than 9,000 distinct values?\n",
"1. Why do we ensure `saleElapsed` is a continuous variable, even although it has less than 9,000 distinct values?\n",
"1. What is \"boosting\"?\n",
"1. How could we use embeddings with a random forest? Would we expect this to help?\n",
"1. Why might we not always use a neural net for tabular modeling?"

@ -1153,17 +1153,7 @@
"metadata": {},
"outputs": [],
"source": [
"cont_nn.append('saleElapsed')\n",
"cat_nn.remove('saleElapsed')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df_nn['saleElapsed'] = df_nn['saleElapsed'].astype(int)"
"cont_nn"
]
},
{
@ -1375,7 +1365,7 @@
"1. What's a good type of plot for showing tree interpreter results?\n",
"1. What is the \"extrapolation problem\"?\n",
"1. How can you tell if your test or validation set is distributed in a different way than your training set?\n",
"1. Why do we make `saleElapsed` a continuous variable, even although it has less than 9,000 distinct values?\n",
"1. Why do we ensure `saleElapsed` is a continuous variable, even although it has less than 9,000 distinct values?\n",
"1. What is \"boosting\"?\n",
"1. How could we use embeddings with a random forest? Would we expect this to help?\n",
"1. Why might we not always use a neural net for tabular modeling?"