This repository contains comprehensive notes on machine learning, covering topics from basic principles to advanced algorithms.
- Hypothesis Space
- Bayes Classifier
- Linear Regression
- Generalized Linear Regression
- Non-parametric Density Estimation
- Parzen Window Estimate
- K-Nearest Neighbour (KNN)
- Linear Discriminant Analysis (LDA)
- Support Vector Machine (SVM)
- Neural Networks
- Backpropagation
- Decision Trees
- Ensemble Learning
- Bagging and Random Forest
- Boosting
- XGBoost
- Principal Component Analysis (PCA)
- K-means Clustering
- Expectation Maximization (EM) Algorithm
- Miscellaneous Machine Learning Terms
The hypothesis space is defined as the set of all possible hypothesis functions that map feature vectors to labels. It's represented as:
H = {h : X → Y}
The Bayes classifier is defined as:
h*(x) = argmax_{y ∈ Y} P(Y = y | X = x)
It can be shown to be optimal under the 0-1 loss function, i.e., it minimizes the probability of misclassification.
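For concreteness, here is a minimal sketch (not from the notes) that evaluates the Bayes classifier on a made-up discrete joint distribution:

```python
import numpy as np

# Hypothetical joint distribution P(X = x, Y = y) over 3 feature values and 2
# labels. Rows index x in {0, 1, 2}; columns index y in {0, 1}. Values are
# purely illustrative.
joint = np.array([[0.10, 0.20],
                  [0.25, 0.05],
                  [0.15, 0.25]])

def bayes_classifier(x):
    """Return argmax_y P(Y = y | X = x).

    Since P(Y = y | X = x) = P(X = x, Y = y) / P(X = x) and the denominator
    does not depend on y, maximizing the joint probability suffices.
    """
    return int(np.argmax(joint[x]))

for x in range(3):
    print(f"h*({x}) = {bayes_classifier(x)}")
```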
Linear regression models the relationship between input features and output as a linear function. The notes cover the formulation, ideal regressor derivation, and the closed-form solution.
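As an illustration, a minimal NumPy sketch of the closed-form (normal-equation) solution on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))           # 100 samples, 2 features
y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Prepend a column of ones so the first weight acts as the intercept.
X_design = np.hstack([np.ones((100, 1)), X])

# Closed-form solution w = (X^T X)^(-1) X^T y; solve() is used instead of an
# explicit matrix inverse for numerical stability.
w = np.linalg.solve(X_design.T @ X_design, X_design.T @ y)
print(w)   # approximately [3.0, 1.5, -2.0]
```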
This extends linear regression by mapping the inputs into a higher-dimensional feature space before fitting a linear model, allowing more complex relationships to be captured.
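A sketch of this idea with a polynomial basis expansion (the degree and data below are arbitrary choices for illustration):

```python
import numpy as np

def polynomial_features(x, degree):
    """Map a 1-D input to the basis [1, x, x^2, ..., x^degree]."""
    return np.vander(x, degree + 1, increasing=True)

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=50)
y = np.sin(np.pi * x) + rng.normal(scale=0.1, size=50)

# Ordinary linear regression in the expanded feature space fits a
# non-linear curve in the original input x.
Phi = polynomial_features(x, degree=5)
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
print(w)
```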
Non-parametric density estimation techniques estimate the probability density function directly from the data without assuming a specific functional form.
Also known as kernel density estimation, this method places a window (kernel) function at each data point and averages them to estimate the probability density function.
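A minimal sketch of a 1-D Gaussian Parzen window estimate (the bandwidth `h` is an assumed hyperparameter):

```python
import numpy as np

def parzen_density(x, samples, h):
    """Estimate p(x) as the average of Gaussian kernels centered at the samples:
    p_hat(x) = (1/n) * sum_i N(x; x_i, h^2).
    """
    z = (x - samples[:, None]) / h                       # shape (n, len(x))
    kernels = np.exp(-0.5 * z**2) / (h * np.sqrt(2 * np.pi))
    return kernels.mean(axis=0)

rng = np.random.default_rng(2)
samples = rng.normal(loc=0.0, scale=1.0, size=200)
grid = np.linspace(-4, 4, 9)
print(parzen_density(grid, samples, h=0.5))
```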
KNN is a non-parametric method used for classification and regression. The algorithm and its formulation are explained in detail.
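A minimal sketch of KNN classification with Euclidean distance and majority voting (`k` and the toy data are placeholders):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify x_query by majority vote among its k nearest training points."""
    distances = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(distances)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.8, 0.9])))   # -> 1
```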
LDA is explained from a Bayesian perspective, including the derivation of the decision boundary.
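As a rough sketch of where that derivation lands: with Gaussian class conditionals sharing one covariance matrix, the decision boundary is linear, `w^T x + b = 0`. The pooled-covariance estimate below is one common choice, not necessarily the exact formulation used in the notes:

```python
import numpy as np

def lda_boundary(X0, X1):
    """Two-class LDA from the Bayesian view: shared-covariance Gaussians
    yield w = Sigma^(-1)(mu1 - mu0) and a bias from the quadratic terms
    plus the log prior ratio."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    n0, n1 = len(X0), len(X1)
    # Pooled (shared) covariance estimate.
    cov = (np.cov(X0, rowvar=False) * (n0 - 1)
           + np.cov(X1, rowvar=False) * (n1 - 1)) / (n0 + n1 - 2)
    cov_inv = np.linalg.inv(cov)
    w = cov_inv @ (mu1 - mu0)
    b = -0.5 * (mu1 @ cov_inv @ mu1 - mu0 @ cov_inv @ mu0) + np.log(n1 / n0)
    return w, b   # predict class 1 when w @ x + b > 0

rng = np.random.default_rng(6)
X0 = rng.normal([-1, -1], 0.5, size=(100, 2))
X1 = rng.normal([1, 1], 0.5, size=(100, 2))
w, b = lda_boundary(X0, X1)
print(w @ np.array([1.0, 1.0]) + b > 0)   # True: classified as class 1
```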
The notes cover SVM for both linearly separable and non-linearly separable data, as well as the kernel trick for handling non-linear decision boundaries.
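To illustrate the kernel trick, a sketch of the RBF (Gaussian) kernel matrix, which computes feature-space inner products without ever constructing the feature map (`gamma` is an assumed hyperparameter):

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """K[i, j] = exp(-gamma * ||x_i - z_j||^2), the Gaussian/RBF kernel.

    This equals an inner product <phi(x_i), phi(z_j)> in an
    infinite-dimensional feature space that is never built explicitly.
    """
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Z**2, axis=1)[None, :]
                - 2 * X @ Z.T)
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(rbf_kernel(X, X))
```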
The notes provide a mathematical formulation of neural networks and explain the importance of non-linear activation functions.
A detailed derivation of the backpropagation algorithm used for training neural networks is provided.
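A compact sketch of backpropagation for a one-hidden-layer network trained with squared error (the architecture, data, and learning rate are illustrative, not taken from the notes):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(16, 2))                 # batch of 16 inputs
y = (X[:, :1] * X[:, 1:] > 0).astype(float)  # toy XOR-like targets, shape (16, 1)

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
lr = 0.1

for _ in range(1000):
    # Forward pass with tanh hidden activations.
    h = np.tanh(X @ W1 + b1)
    y_hat = h @ W2 + b2
    # Backward pass: apply the chain rule layer by layer.
    d_out = 2 * (y_hat - y) / len(X)         # dL/dy_hat for mean squared error
    dW2, db2 = h.T @ d_out, d_out.sum(axis=0)
    d_h = (d_out @ W2.T) * (1 - h**2)        # tanh'(z) = 1 - tanh(z)^2
    dW1, db1 = X.T @ d_h, d_h.sum(axis=0)
    # Gradient-descent update.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(float(np.mean((y_hat - y) ** 2)))      # final training MSE
```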
The notes cover how decision trees work, including the growing and pruning processes, and metrics like Gini Impurity and Mean Squared Error.
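A sketch of the Gini impurity computation used when scoring candidate splits (integer class labels are assumed):

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity: 1 - sum_k p_k^2, where p_k is the fraction of class k.
    It is 0 for a pure node and maximal for a uniform class mix."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p**2)

print(gini_impurity(np.array([0, 0, 0, 0])))   # 0.0 (pure node)
print(gini_impurity(np.array([0, 1, 0, 1])))   # 0.5 (maximally mixed, 2 classes)
```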
An introduction to ensemble learning techniques, which combine multiple models to improve overall performance.
Bagging (Bootstrap Aggregating) and Random Forest, which applies bagging to decision trees and additionally samples a random subset of features at each split, are explained.
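A minimal bagging sketch using scikit-learn decision trees as the base learner (the estimator count and vote aggregation are illustrative choices; Random Forest would add the per-split feature subsampling on top of this):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # base learner; any model works

def bagging_predict(X_train, y_train, X_test, n_estimators=25, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement),
    then aggregate test predictions by majority vote."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    votes = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)          # bootstrap sample indices
        tree = DecisionTreeClassifier().fit(X_train[idx], y_train[idx])
        votes.append(tree.predict(X_test))
    votes = np.stack(votes)                       # (n_estimators, n_test)
    # Majority vote across estimators for each test point (integer labels).
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
print(bagging_predict(X, y, np.array([[0.5], [2.5]])))   # majority-vote labels
```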
The notes provide a mathematical formulation of boosting, explaining how it sequentially trains models to correct errors of previous ones.
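As one concrete instance of that formulation, a sketch of AdaBoost with decision stumps; labels are assumed to be in {-1, +1}, and the stump base learner is an illustrative choice:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=10):
    """Each round reweights the data so the next weak learner focuses on
    the points the ensemble currently misclassifies."""
    n = len(X)
    w = np.full(n, 1.0 / n)                 # uniform initial sample weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        w *= np.exp(-alpha * y * pred)      # up-weight mistakes
        w /= w.sum()
        stumps.append(stump); alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    """Weighted vote of the weak learners."""
    scores = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(scores)

X = np.linspace(-1, 1, 20).reshape(-1, 1)
y = np.where(X.ravel() > 0, 1, -1)
stumps, alphas = adaboost_fit(X, y)
print(adaboost_predict(stumps, alphas, X))
```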
XGBoost, a specific implementation of gradient boosting, is explained in detail.
PCA, a dimensionality reduction technique, is explained step-by-step, including the intuition behind it.
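A step-by-step PCA sketch via eigendecomposition of the covariance matrix (an SVD of the centered data is an equally valid route):

```python
import numpy as np

def pca(X, n_components):
    """PCA: project the data onto the directions of maximal variance."""
    X_centered = X - X.mean(axis=0)            # 1. center the data
    cov = np.cov(X_centered, rowvar=False)     # 2. covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # 3. eigendecomposition (symmetric)
    order = np.argsort(eigvals)[::-1]          # 4. sort by decreasing variance
    components = eigvecs[:, order[:n_components]]
    return X_centered @ components, components

rng = np.random.default_rng(4)
# Synthetic data with most variance along the first axis.
X = rng.normal(size=(100, 3)) @ np.diag([3.0, 1.0, 0.1])
X_proj, components = pca(X, n_components=2)
print(X_proj.shape)   # (100, 2)
```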
K-means clustering, an unsupervised learning algorithm, is explained along with its connection to the Expectation-Maximization algorithm.
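A sketch of Lloyd's algorithm for K-means; the assignment and update steps mirror the E- and M-steps of EM (the sketch assumes no cluster becomes empty):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step (analogous to the E-step of EM).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step (analogous to the M-step).
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
labels, centroids = kmeans(X, k=2)
print(centroids)   # near (0, 0) and (3, 3)
```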
The EM algorithm, used for finding maximum likelihood estimates of parameters in statistical models with latent variables, is derived and explained.
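A sketch of EM for a two-component 1-D Gaussian mixture (the initialization and fixed iteration count are arbitrary choices for illustration):

```python
import numpy as np

def em_gmm_1d(x, n_iters=50):
    """E-step: responsibilities r_ik = P(component k | x_i).
    M-step: re-estimate weights, means, and std devs from the r_ik."""
    pi = np.array([0.5, 0.5])
    mu = np.array([x.min(), x.max()])          # break symmetry at init
    sigma = np.array([1.0, 1.0])
    for _ in range(n_iters):
        # E-step: posterior responsibility of each component for each point.
        z = (x[:, None] - mu) / sigma
        dens = pi * np.exp(-0.5 * z**2) / (sigma * np.sqrt(2 * np.pi))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted maximum-likelihood updates.
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return pi, mu, sigma

rng = np.random.default_rng(5)
x = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(2, 1.0, 300)])
print(em_gmm_1d(x))   # weights near [0.5, 0.5], means near [-2, 2]
```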
Various important machine learning terms and concepts are explained, including epochs, batch size, gradient descent variants, batch normalization, layer normalization, dropout, and N-fold cross-validation.
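For one of these, a sketch of how N-fold cross-validation index splits can be generated (the shuffling and fold count are illustrative):

```python
import numpy as np

def n_fold_indices(n_samples, n_folds, seed=0):
    """Yield (train_idx, val_idx) pairs: the data is shuffled once and split
    into N disjoint validation folds; each fold is held out exactly once."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    folds = np.array_split(perm, n_folds)
    for i in range(n_folds):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train_idx, val_idx

for train_idx, val_idx in n_fold_indices(10, n_folds=5):
    print(len(train_idx), len(val_idx))   # 8 2 for each of the 5 folds
```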
Contributions to improve or expand these notes are welcome. Please feel free to submit a pull request or open an issue for discussion.
This project is licensed under the MIT License - see the LICENSE.md file for details.