Revise Machine Learning Tutorial and Scaling Workflow #3

Open
amandalin047 wants to merge 6 commits into school-brainhack:main from amandalin047:main

@amandalin047

This PR revises the machine learning tutorial notebook to improve conceptual clarity for beginners.
Main changes:

  • Clarified the explanation of linear SVC.
  • Reframed standard scaling as preprocessing rather than "model tweaking."
  • Added explanation of why scaling should be performed inside each CV fold using a Pipeline.
  • Clarified that matching predicted labels after scaling does not imply identical model behavior, since the decision-function values may still differ.
  • Added explanation of the standard workflow: cross-validate on the training set, choose a final pipeline, refit it on the full training set, and evaluate once on the held-out test set.
  • Revised comments around final pipeline refitting and test-set evaluation.
  • Clarified that linear SVC coefficients / weights should not be interpreted as straightforward feature importance. They define the decision boundary in the transformed feature space, and their interpretation depends on scaling, correlations among features, preprocessing choices, regularization, and model geometry.
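
For reviewers, the workflow described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the synthetic dataset from `make_classification`, the split sizes, `cv=5`, and `max_iter=10000` are all assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in data (assumption: the tutorial's real dataset is not used here).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Hold out a test set once, before any model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling lives inside the pipeline, so each CV fold fits the scaler
# on that fold's training portion only.
pipe = make_pipeline(StandardScaler(), LinearSVC(max_iter=10000, random_state=0))

# Cross-validate on the training set only.
cv_scores = cross_val_score(pipe, X_train, y_train, cv=5)
print("CV accuracy per fold:", cv_scores)

# Refit the chosen pipeline on the full training set, then evaluate once.
pipe.fit(X_train, y_train)
test_accuracy = pipe.score(X_test, y_test)
print("Held-out test accuracy:", test_accuracy)
```

Because the scaler is a pipeline step, `cross_val_score` clones and refits it per fold, which is what keeps statistics from the validation portion of each fold out of the preprocessing.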

No hyperparameter tuning was added in this revision.
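
The points in the list above about label agreement and coefficient interpretation can be checked with a small comparison like the one below. It is a hedged sketch on synthetic data, not taken from the notebook: the dataset, the factor of 10 applied to one feature, and the variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in data (assumption), with one feature put on a larger scale.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X = X.copy()
X[:, 0] *= 10.0

# Same classifier, fitted with and without standard scaling.
raw_model = LinearSVC(max_iter=10000, random_state=0).fit(X, y)
scaled_model = make_pipeline(
    StandardScaler(), LinearSVC(max_iter=10000, random_state=0)
).fit(X, y)

# Fraction of samples on which the two models predict the same label.
label_agreement = np.mean(raw_model.predict(X) == scaled_model.predict(X))

# Signed decision-function values can differ even where labels agree.
raw_margins = raw_model.decision_function(X)
scaled_margins = scaled_model.decision_function(X)
max_margin_gap = np.max(np.abs(raw_margins - scaled_margins))

# Coefficients live in different feature spaces (raw vs. standardized),
# so they are not directly comparable as "importance".
raw_coef = raw_model.coef_.ravel()
scaled_coef = scaled_model.named_steps["linearsvc"].coef_.ravel()

print("Label agreement:", label_agreement)
print("Largest decision-value gap:", max_margin_gap)
print("Raw-space coefficients:", raw_coef)
print("Scaled-space coefficients:", scaled_coef)
```

Comparing `label_agreement` with `max_margin_gap` separates "same predictions" from "same model", and the two coefficient vectors show how the weights depend on the feature space in which the model was fitted.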

  • 𝒜𝓂𝒶𝓃𝒹𝒶 ℒ𝒾𝓃
