mirror of
https://github.com/fastai/fastbook.git
synced 2025-04-04 18:00:48 +00:00
Fix tabular example
This commit is contained in:
parent
c640215af0
commit
5a2aafdfa2
@ -2,7 +2,7 @@
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@ -2188,11 +2188,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"name": "#%% md\n"
|
||||
}
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This model is using the IMDb dataset from the paper [Learning Word Vectors for Sentiment Analysis]((https://ai.stanford.edu/~amaas/data/sentiment/)). It works well with movie reviews of many thousands of words. But let's test it out on a very short one, to see it does its thing:"
|
||||
]
|
||||
@ -2200,11 +2196,7 @@
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"name": "#%%\n"
|
||||
}
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
@ -2302,29 +2294,21 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"name": "#%% md\n"
|
||||
}
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"> jargon: Tabular: Data that is in the form of a table, such as from a spreadsheet, database, or CSV file. A tabular model is a model which tries to predict one column of a table based on information in other columns of a table."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"name": "#%%\n"
|
||||
}
|
||||
},
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from fastai2.tabular.all import *\n",
|
||||
"path = untar_data(URLs.ADULT_SAMPLE)\n",
|
||||
"\n",
|
||||
"dls = TabularDataLoaders.from_csv(path/'adult.csv', path, y_names=\"salary\",\n",
|
||||
"dls = TabularDataLoaders.from_csv(path/'adult.csv', path=path, y_names=\"salary\",\n",
|
||||
" cat_names = ['workclass', 'education', 'marital-status', 'occupation',\n",
|
||||
" 'relationship', 'race'],\n",
|
||||
" cont_names = ['age', 'fnlwgt', 'education-num'],\n",
|
||||
@ -2335,11 +2319,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"name": "#%% md\n"
|
||||
}
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As you see, we had to tell fastai which columns are *categorical* (that is, they contain values that are one of a discrete set of choices, such as `occupation`), versus *continuous* (that is, they contain a number that represents a quantity, such as `age`).\n",
|
||||
"\n",
|
||||
@ -2349,11 +2329,7 @@
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"name": "#%%\n"
|
||||
}
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
@ -2407,11 +2383,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"name": "#%% md\n"
|
||||
}
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This model is using the *adult* dataset, from the paper [Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid](https://archive.ics.uci.edu/ml/datasets/adult), which contains some data regarding individuals (like their education, marital status, race, sex, etc.) and whether or not they have an annual income greater than \\$50k. The model is over 80\\% accurate, and took around 30 seconds to train.\n",
|
||||
"\n",
|
||||
@ -2421,11 +2393,7 @@
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"name": "#%%\n"
|
||||
}
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
@ -2550,22 +2518,13 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This model is predicting movie ratings on a scale of 0.5 to 5.0 to within around 0.6 average error. Since we're predicting a continuous number, rather than a category, we have to tell fastai what range our target has, using the `y_range` parameter.\n",
|
||||
"\n",
|
||||
"Although we're not actually using a pretrained model (for the same reason that we didn't for the tabular model), this example shows that fastai lets us use `fine_tune` even in this case (we'll learn how and why this works later in <<chapter_pet_breeds>>). Sometimes it's best to experiment with `fine_tune` versus `fit_one_cycle` to see which works best for your dataset.\n",
|
||||
"\n",
|
||||
"We can use the same `show_results` call we saw earlier to view a few examples of user and movie IDs, actual ratings, and predictions:"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"learn.show_results()"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -2969,17 +2928,8 @@
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.4"
|
||||
},
|
||||
"pycharm": {
|
||||
"stem_cell": {
|
||||
"cell_type": "raw",
|
||||
"source": [],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
}
|
||||
|
@ -2,7 +2,7 @@
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@ -1135,14 +1135,14 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from fastai2.tabular.all import *\n",
|
||||
"path = untar_data(URLs.ADULT_SAMPLE)\n",
|
||||
"\n",
|
||||
"dls = TabularDataLoaders.from_csv(path/'adult.csv', path, y_names=\"salary\",\n",
|
||||
"dls = TabularDataLoaders.from_csv(path/'adult.csv', path=path, y_names=\"salary\",\n",
|
||||
" cat_names = ['workclass', 'education', 'marital-status', 'occupation',\n",
|
||||
" 'relationship', 'race'],\n",
|
||||
" cont_names = ['age', 'fnlwgt', 'education-num'],\n",
|
||||
|
Loading…
Reference in New Issue
Block a user