|
238 | 238 | "* *F1-Score*: The harmonic mean of precision and recall, combining both into a single metric.\n", |
239 | 239 | "* *Support*: How many instances of this class are there in the test dataset?\n", |
240 | 240 | "\n", |
241 | | - "The classification report also icludes averages for these metrics, including a weighted average that allows for the imbalance in the number of cases of each class.\n", |
| 241 | + "The classification report also includes averages for these metrics, including a weighted average that allows for the imbalance in the number of cases of each class.\n", |
242 | 242 | "\n", |
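As a minimal sketch of how these metrics are produced, `classification_report` from scikit-learn prints precision, recall, F1-score, and support per class, along with the averages described above (the labels here are made up for illustration, not drawn from this notebook's dataset):

```python
from sklearn.metrics import classification_report

# Hypothetical true and predicted labels for a binary classifier
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 0, 1, 0]

# One row per class, plus macro and weighted averages
print(classification_report(y_true, y_pred))
```

For these labels the positive class has 3 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 all come out to 0.75.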
243 | 243 | "Because this is a *binary* classification problem, the ***1*** class is considered *positive* and its precision and recall are particularly interesting - these in effect answer the questions:\n", |
244 | 244 | "\n", |
|
394 | 394 | "\n", |
395 | 395 | "In this case, the ROC curve and its AUC indicate that the model performs better than a random guess, which is not bad considering we performed very little preprocessing of the data.\n", |
396 | 396 | "\n", |
397 | | - "In practice, it's common to perform some preprocessing of the data to make it easier for the algorithm to fit a model to it. There's a huge range of preprocessing trasformations you can perform to get your data ready for modeling, but we'll limit ourselves to a few common techniques:\n", |
| 397 | + "In practice, it's common to perform some preprocessing of the data to make it easier for the algorithm to fit a model to it. There's a huge range of preprocessing transformations you can perform to get your data ready for modeling, but we'll limit ourselves to a few common techniques:\n", |
398 | 398 | "\n", |
399 | 399 | "- Scaling numeric features so they're on the same scale. This prevents features with large values from producing coefficients that disproportionately affect the predictions.\n", |
400 | 400 | "- Encoding categorical variables. For example, by using a *one hot encoding* technique you can create individual binary (true/false) features for each possible category value.\n", |
401 | 401 | "\n", |
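The two techniques above can be sketched with scikit-learn's `StandardScaler` and `OneHotEncoder` (the feature values here are invented for illustration):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical numeric feature with large values
amounts = np.array([[100.0], [250.0], [400.0]])
scaled = StandardScaler().fit_transform(amounts)
print(scaled.ravel())  # centered on 0 with unit variance

# Hypothetical categorical feature
colors = np.array([["red"], ["green"], ["red"]])
onehot = OneHotEncoder().fit_transform(colors).toarray()
print(onehot)  # one binary (0/1) column per category value
```

After scaling, the numeric column has mean 0 and standard deviation 1; after encoding, each row has exactly one `1` across the category columns.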
402 | 402 | "To apply these preprocessing transformations, we'll make use of a Scikit-Learn feature named *pipelines*. These enable us to define a set of preprocessing steps that end with an algorithm. You can then fit the entire pipeline to the data, so that the model encapsulates all of the preprocessing steps as well as the regression algorithm. This is useful because when we want to use the model to predict values from new data, we need to apply the same transformations (based on the same statistical distributions and category encodings used with the training data).\n", |
403 | 403 | "\n", |
404 | | - ">**Note**: The term *pipeline* is used extensively in machine learning, often to mean very different things! In this context, we're using it to refer to pipeline objects in Scikit-Learn, but you may see it used elsewhere to mean someting else.\n" |
| 404 | + ">**Note**: The term *pipeline* is used extensively in machine learning, often to mean very different things! In this context, we're using it to refer to pipeline objects in Scikit-Learn, but you may see it used elsewhere to mean something else.\n" |
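A minimal sketch of such a pipeline, using a `ColumnTransformer` to route columns to the right preprocessing step and tiny invented data (the column layout, values, and step names here are assumptions for illustration):

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical training data: one numeric column, one categorical column
X = np.array([[1.0, "a"], [2.0, "b"], [3.0, "a"], [4.0, "b"]], dtype=object)
y = np.array([0, 0, 1, 1])

# Route each column to the appropriate preprocessing transformer
preprocess = ColumnTransformer([
    ("num", StandardScaler(), [0]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), [1]),
])

# The pipeline ends with the algorithm; fitting it fits every step
pipeline = Pipeline([
    ("prep", preprocess),
    ("model", LogisticRegression()),
])
pipeline.fit(X, y)

# New data goes through exactly the same transformations automatically
print(pipeline.predict(X))
```

Because the fitted pipeline stores the scaler's statistics and the encoder's category mapping, calling `predict` on new data applies the same transformations learned from the training data.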
405 | 405 | ] |
406 | 406 | }, |
407 | 407 | { |
|