Machine-Learning-Guide

Machine Learning: A Comprehensive Guide

By: Jacob Johnson

Basics

What is Machine Learning?

Machine learning is a branch of artificial intelligence that focuses on building systems that learn from data and improve their accuracy over time without being explicitly programmed.

[Image: What is machine learning — overview]

Source: https://www.ibm.com/cloud/learn/machine-learning


Sources/Reading Materials:

https://www.geeksforgeeks.org/difference-between-machine-learning-and-artificial-intelligence/

https://www.techtarget.com/searchenterpriseai/definition/machine-learning-ML

https://www.ibm.com/cloud/learn/machine-learning

https://www.sas.com/en_us/insights/analytics/machine-learning.html


Supervised vs Unsupervised Learning

[Image: Supervised vs unsupervised learning]

Source: https://medium.com/quick-code/regression-versus-classification-machine-learning-whats-the-difference-345c56dd15f7

Sources/Reading Materials:


Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow 2nd Edition by Aurelien Geron (Chapter 1: The Machine Learning Landscape)

https://www.geeksforgeeks.org/supervised-unsupervised-learning/

https://www.ibm.com/cloud/blog/supervised-vs-unsupervised-learning


Regression vs Classification

Regression

\[Y = a + bX\]
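As a quick illustration of this equation, here is a minimal sketch (the synthetic data and variable names are assumptions, not part of the guide's examples) that fits the intercept $a$ and slope $b$ of a simple linear model with NumPy:

import numpy as np

# Hypothetical data: y depends roughly linearly on x
rng = np.random.default_rng(42)
X = 3 * rng.random(100)
y = 4 + 2 * X + rng.standard_normal(100)

# np.polyfit returns the slope (b) and intercept (a) of the least-squares line
b, a = np.polyfit(X, y, deg=1)
print(f"Y = {a:.2f} + {b:.2f}X")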

Types of Regression

[Images: Types of regression]

Source: https://corporatefinanceinstitute.com/resources/knowledge/finance/regression-analysis/

Popular Regression Problems


Gradient Descent

[Image: Gradient descent]

Source: https://gdsc.community.dev/events/details/developer-student-clubs-sir-syed-university-of-engineering-and-technology-presents-gradient-descent/

One of the most fundamental concepts of machine learning is gradient descent.

**Gradient Descent:** Gradient descent is an optimization algorithm used to train many machine learning models; it iteratively adjusts the model's parameters to minimize the cost.

What is a Gradient?

In mathematics, the gradient is a vector operator applied to a scalar function. It produces a vector whose magnitude is the maximum rate of change of the function at a given point and whose direction points toward that maximum rate of change.

[Image: Stochastic gradient descent]

Source: https://deepai.org/machine-learning-glossary-and-terms/stochastic-gradient-descent
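As a quick numerical illustration of this definition, here is a minimal sketch (the function and step size are assumptions) that approximates the gradient of a scalar function of two variables with central differences:

import numpy as np

def f(x, y):
    # Hypothetical scalar function: a simple bowl-shaped surface
    return x**2 + 3 * y**2

def numerical_gradient(x, y, h=1e-5):
    # Central-difference approximation of the two partial derivatives
    df_dx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    df_dy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return np.array([df_dx, df_dy])

# The gradient at (1, 2) points in the direction of steepest ascent
print(numerical_gradient(1.0, 2.0))  # approximately [2., 12.]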

The Cost Function

The cost function, or loss, is the function to be minimized by gradient descent. Its value tells us how far the model's predicted values are from the actual values. Under the hood, the gradient descent algorithm optimizes the cost function by adjusting the model's parameters to minimize the difference between the actual and predicted values.

Cost Function Formula: Squared error cost function

\[J(w,b) = \frac{1}{2m} \sum\limits_{i = 1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right)^2\]

Visualizing the cost function

$\text{Model}: f_{w,b}(x) = wx + b$

$\text{Parameters}: w, b$

$\text{Cost function}: J(w,b) = \frac{1}{2m} \sum\limits_{i = 1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)^2$

$\text{Objective}: \min\limits_{w,b} J(w,b)$
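To see what this cost surface looks like, here is a minimal sketch (the synthetic data and grid ranges are assumptions) that evaluates $J(w,b)$ over a grid of parameter values and draws the contour plot whose minimum sits at the best-fitting $(w, b)$:

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical training data: y is roughly 2x + 4 plus noise
rng = np.random.default_rng(0)
x = 3 * rng.random(100)
y = 4 + 2 * x + rng.standard_normal(100)

def cost(w, b):
    # Squared error cost J(w, b) = (1 / 2m) * sum((w*x + b - y)^2)
    return np.mean((w * x + b - y) ** 2) / 2

# Evaluate the cost over a grid of (w, b) values
w_grid, b_grid = np.meshgrid(np.linspace(-2, 6, 100), np.linspace(0, 8, 100))
J = np.vectorize(cost)(w_grid, b_grid)

# The bowl-shaped surface has its minimum near the true parameters (w = 2, b = 4)
plt.contourf(w_grid, b_grid, J, levels=30)
plt.xlabel('w')
plt.ylabel('b')
plt.title('Cost surface J(w, b)')
plt.show()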

Train the model with gradient descent

Gradient descent algorithm

\[w = w - \alpha \frac{\partial }{\partial w} J(w,b)\]

Here $\alpha$ is referred to as the learning rate. It controls how large each descent step is.

\[b = b - \alpha \frac{\partial }{\partial b} J(w,b)\]

Update w and b simultaneously, and repeat until convergence.

Near a local minimum the gradient becomes smaller, so the update steps shrink on their own; gradient descent can therefore reach the minimum without decreasing the learning rate $\alpha$.

Code Example: Gradient Descent

# Import Libraries
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('ggplot')
# Define target and feature variables
X = 3 * np.random.rand(100, 1)
y = 4 + 2 * X + np.random.randn(100, 1)

theta = np.random.randn(2, 1)  # model parameters: [bias, weight]
# Plot relation between variables
def plot_reg():
    sns.regplot(x=X, y=y, color="g")
    plt.xlabel('x')
    plt.ylabel('y', rotation='horizontal')
    plt.title("Relationship between x and y")
    plt.savefig("Images/regression_plot.png")
    plt.show()

plot_reg()

[Image: Relationship between x and y]

Source: Original Image

# Define function to calculate the cost
def compute_cost(X, y, theta):
    """
    compute cost
    Args:
        X (ndarray (m,n)): Data, m examples with n features
        y (ndarray (m,)): target values
        theta (ndarray (n,)): model parameters
    Returns
        cost (scalar): cost
    """
    m = X.shape[0]
    predictions = X.dot(theta)
    cost = (1 / (2 * m)) * np.sum(np.square(predictions - y))

    return cost
# Create function to perform gradient descent algorithm
def gradient_descent(X, y, theta, learning_rate, iterations):
    """
    Performs batch gradient descent for linear regression
    Args:
        X (ndarray (m,n)): Data, m examples with n features (including bias column)
        y (ndarray (m,1)): target values
        theta (ndarray (n,1)): model parameters
        learning_rate (scalar): learning rate of our model
        iterations (scalar): number of iterations
    Returns:
        Array of cost history, array of weight history, and the final theta vector
    """
    m = X.shape[0]

    loss_history = np.zeros(iterations)
    weight_history = np.zeros((iterations, 2))

    for i in range(iterations):
        prediction = np.dot(X, theta)
        theta = theta - (1 / m) * learning_rate * (X.T.dot((prediction - y)))
        weight_history[i, :] = theta.T
        loss_history[i] = compute_cost(X, y, theta)

    return loss_history, weight_history, theta
# Add a bias column, set hyperparameters, and run gradient descent
lr = 0.1        # learning rate (illustrative value)
n_iter = 1000   # number of iterations (illustrative value)
X_b = np.c_[np.ones((len(X), 1)), X]
cost_history, theta_history, theta = gradient_descent(X_b, y, theta, lr, n_iter)
# Plot learning curve
def plot_loss():
    fig, ax = plt.subplots(figsize=(7, 5))
    ax.set_ylabel('Cost')
    ax.set_xlabel('Iterations')
    ax.plot(range(n_iter), cost_history, 'b.')
    ax.title.set_text('Learning Curve')
    plt.show()

plot_loss()

[Image: Learning curve]

Source: Original Image

Click link for full code example: https://github.com/JacobJ215/Machine-Learning-Guide/blob/main/Regression%20Code%20Examples/gradient_descent.py

[Image]

Source: https://www.kaggle.com/getting-started/193766


**Sources/Reading Materials**

https://www.geeksforgeeks.org/gradient-descent-algorithm-and-its-variants/

https://www.ibm.com/cloud/learn/gradient-descent

https://realpython.com/gradient-descent-algorithm-python/

https://towardsdatascience.com/gradient-descent-algorithm-a-deep-dive-cf04e8115f21

https://medium.com/@lachlanmiller_52885/machine-learning-week-1-cost-function-gradient-descent-and-univariate-linear-regression-8f5fe69815fd


Regression Algorithms

Stochastic Gradient Descent Regressor

SGD stands for Stochastic Gradient Descent: the gradient of the loss is estimated one sample at a time, and the model is updated along the way with a decreasing strength schedule (aka learning rate).

https://ogrisel.github.io/scikit-learn.org/sklearn-tutorial/modules/generated/sklearn.linear_model.SGDRegressor.html

The word 'stochastic' refers to a process linked with random probability. Hence, in Stochastic Gradient Descent a few samples are selected randomly instead of the whole data set for each iteration. In SGD we compute the gradient of the cost function for a single example at each iteration, instead of the sum of the gradients of the cost function over all examples.
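Before turning to scikit-learn's SGDRegressor below, here is a minimal from-scratch sketch of this idea (the synthetic data, learning rate, and epoch count are assumptions): at each step a single randomly chosen example is used to estimate the gradient, and the parameters are updated immediately.

import numpy as np

# Hypothetical data: y is roughly 2x + 4 plus noise
rng = np.random.default_rng(0)
X = 3 * rng.random((100, 1))
y = 4 + 2 * X + rng.standard_normal((100, 1))

X_b = np.c_[np.ones((len(X), 1)), X]   # add a bias column
theta = rng.standard_normal((2, 1))    # parameters: [bias, weight]
lr, n_epochs = 0.05, 50

for epoch in range(n_epochs):
    for _ in range(len(X_b)):
        i = rng.integers(len(X_b))                  # pick one random example
        xi, yi = X_b[i:i + 1], y[i:i + 1]
        gradient = xi.T.dot(xi.dot(theta) - yi)     # gradient for that single example
        theta -= lr * gradient                      # update the parameters right away

print(theta.ravel())  # roughly [4, 2]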

Code Example: SGDRegressor

# Import libraries
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error

# Generate our target and features using the make_regression function
X, y = make_regression(n_samples=100, n_features=25)

# Split X and y into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=0.2,
                                                    random_state=42)

# Scale data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Instantiate model
model = SGDRegressor()

# Train the model
model.fit(X_train, y_train)

# Evaluate how well our model fits to the training set
score = model.score(X_train, y_train)
print(f"R-Squared: {score}")

# Predict on testing set
y_pred = model.predict(X_test)

# Calculate error and print results
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
print(f"RMSE: {rmse}")
# Print intercept and coefficient
print(f"Intercept: {model.intercept_}")
print(f"Coefficient: {model.coef_}")

Click link for full code example: https://github.com/JacobJ215/Machine-Learning-Guide/blob/main/Regression%20Code%20Examples/sgd_regressor.py


**Sources/Reading Materials**

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDRegressor.html#sklearn.linear_model.SGDRegressor

https://www.geeksforgeeks.org/ml-stochastic-gradient-descent-sgd/

https://towardsdatascience.com/stochastic-gradient-descent-clearly-explained-53d239905d31

https://medium.com/@nikhilparmar9/simple-sgd-implementation-in-python-for-linear-regression-on-boston-housing-data-f63fcaaecfb1

https://ogrisel.github.io/scikit-learn.org/sklearn-tutorial/modules/generated/sklearn.linear_model.SGDRegressor.html


Linear Regression

[Image: Linear regression]

Source: https://h1ros.github.io/posts/linear-regression/

The Linear Regression model is used to find the relationship between one or more features and the continuous target variable.

The objective of the Linear Regression algorithm is to obtain a line of best fit, which is done by minimizing a loss function, typically the mean squared error (MSE):

\[MSE = \frac{1}{n} \sum\limits_{i = 1}^{n} (y_i - \hat{y}_i)^2 = \frac{1}{n} \sum\limits_{i = 1}^{n} \left( y_i - (wx_i + b) \right)^2\]

**Linear Regression Assumptions:**

For an in-depth explanation of the linear regression assumptions, please see: https://medium.com/analytics-vidhya/understanding-linear-regression-in-depth-intuition-6c9f3b1cbb51

https://dataaspirant.com/assumptions-of-linear-regression-algorithm/

https://www.kdnuggets.com/2017/11/first-linear-regression-assumptions.html


**Code Example: Linear Regression**

# Import libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# Define X and y variables
X = 5 * np.random.rand(100, 1)
y = 4 - 2 * X + np.random.randn(100, 1)

# Split X and y into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=0.2,
                                                    random_state=42)

# Scale data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Instantiate model
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)

# Evaluate how well our model fits to the training set
score = model.score(X_train, y_train)
print(f"R-Squared: {score}")

# Predict on testing set
y_pred = model.predict(X_test)

# Calculate error and print results
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
print(f"RMSE: {rmse}")
# Plot residuals
def plot_residuals():
    plt.scatter(model.predict(X_train), model.predict(X_train) - y_train, color="green", s=10, label='Train data')
    plt.scatter(y_pred, y_pred - y_test, color="blue", s=10, label='Test data')
    plt.hlines(y=0, xmin=-20, xmax=20, linewidth=2)
    plt.legend(loc='upper right')
    plt.title("Residual errors")
    plt.savefig('Images/residual_plot.png')
    plt.show()

plot_residuals()

[Image: Residual errors plot]

Source: Original Image

Click link for full code example: https://github.com/JacobJ215/Machine-Learning-Guide/blob/main/Regression%20Code%20Examples/linear_regression.py


**Sources/Reading Materials**

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression

https://www.kdnuggets.com/2019/03/beginners-guide-linear-regression-python-scikit-learn.html

https://jakevdp.github.io/PythonDataScienceHandbook/05.06-linear-regression.html

https://medium.com/analytics-vidhya/understanding-linear-regression-in-depth-intuition-6c9f3b1cbb51

https://towardsdatascience.com/linear-regression-detailed-view-ea73175f6e86

https://www.geeksforgeeks.org/linear-regression-python-implementation/


Regularization

[Images: Overfitting vs underfitting]

Source: https://medium.com/mlearning-ai/understanding-overfitting-and-underfitting-in-layman-terms-e4c82a28e2d2

Source: kdnuggets.com

Regularization is a model-constraining technique used to reduce the complexity of a model, which in turn prevents the model from overfitting.

A model is said to be overfit when it performs well on the training set but poorly, or not at all well, on the testing set.

Ridge, Lasso, and ElasticNet regression are regularization techniques commonly used to reduce and prevent the overfitting that may result from linear regression.

**Lasso Regression:**

\[Lasso = {\sum\limits_{i = 1}^{n} (y_i - x_i\hat\beta)^2} + \lambda {\sum\limits_{j = 1}^{n} |\hat\beta_j|}\]

To reduce model complexity, **Lasso (least absolute shrinkage and selection operator), or L1 regularization,** adds a regularization term to the linear regression cost function. The regularization term is the L1 norm of the weight vector.

An important characteristic of Lasso Regression is that it tends to completely eliminate the weights of the least important features (i.e., set them to zero).

Lasso Regression automatically performs feature selection and outputs a sparse model (i.e., with few nonzero feature weights).
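A minimal sketch of this sparsity effect (the synthetic data and alpha value are assumptions): only a few of the generated features actually influence the target, and Lasso drives most of the remaining coefficients to exactly zero.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data: 20 features, but only 5 are informative
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10, random_state=42)

lasso = Lasso(alpha=1.0)
lasso.fit(X, y)

# Most of the uninformative coefficients are set to exactly zero (a sparse model)
n_zero = np.sum(lasso.coef_ == 0)
print(f"Zeroed coefficients: {n_zero} of {lasso.coef_.size}")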

**Ridge Regression:**

\[Ridge = {\sum\limits_{i = 1}^{n} (y_i - x_i\hat\beta)^2} + \lambda {\sum\limits_{j = 1}^{n} \hat\beta_j^2}\]

Like Lasso, Ridge, or L2 regularization, adds a regularization penalty term to the cost function, equivalent to the square of the magnitude of the coefficients. Here the regularization term is the squared L2 norm of the weight vector.

Instead of eliminating the least important features, Ridge Regression shrinks the weights assigned to each feature toward zero without setting them exactly to zero (a quick sketch follows the figure below).

[Image: Ridge regression]

Source: https://online.stat.psu.edu/stat508/book/export/html/749
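A minimal sketch of that shrinkage behavior (the synthetic data and alpha values are assumptions): as the regularization strength alpha grows, the Ridge coefficients shrink toward zero but are not eliminated.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic data with a handful of informative features
X, y = make_regression(n_samples=200, n_features=10, n_informative=5,
                       noise=10, random_state=42)

# Fit Ridge for increasing regularization strengths and compare coefficient magnitudes
for alpha in [0.1, 10, 1000]:
    ridge = Ridge(alpha=alpha)
    ridge.fit(X, y)
    print(f"alpha={alpha:>6}: mean |coef| = {np.mean(np.abs(ridge.coef_)):.2f}")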

**Elastic Net Regression:**

**Elastic Net Regression** applies both the L1 and L2 regularization terms simultaneously to the cost function of linear regression. The hyperparameter l1_ratio defines how we mix the L1 and L2 regularization.

\[ElasticNet = {\sum\limits_{i = 1}^{n} (y_i - x_i\hat\beta)^2} + \lambda_1 {\sum\limits_{j = 1}^{n} |\hat\beta_j|} + \lambda_2 {\sum\limits_{j = 1}^{n} \hat\beta_j^2}\]

**Code Example: Regularization**

# Import Libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error

plt.style.use('ggplot')

# Read in data
df = pd.read_csv('Data/advertising_and_sales_clean.csv')
# Select Target and Features
X = df.iloc[: , 1: 3].values
y = df["sales"].values

# Split data into training and testing
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=0.3,
                                                    random_state=42)

# Scale the data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

**Lasso Regularization**

# Instantiate and Fit Lasso regression model 
lasso = Lasso(alpha = 10, random_state=42)
lasso.fit(X_train,y_train)

# Compute R-Squared for the Lasso Model
print("R-Squared: {:.3f}".format(lasso.score(X_train, y_train)))

# R-Squared 0.7517

# Make prediction
y_pred = lasso.predict(X_test)

# Compute mean_squared_error
mse = mean_squared_error(y_test, y_pred)

print("RMSE: ", np.sqrt(mse)) # RMSE:  45881.400342063855

**Ridge Regularization**

# Instantiate and Fit Ridge regression model 
ridge = Ridge(alpha = 10, random_state=42)
ridge.fit(X_train,y_train)

# Compute R-Squared for the Ridge Model
print("R-Squared: {:.3f}".format(ridge.score(X_train, y_train))) 

# R-Squared: 0.752

# Make prediction
y_pred = ridge.predict(X_test) 

# Compute mean_squared_error
mse = mean_squared_error(y_test, y_pred)

print("RMSE: ", np.sqrt(mse)) # RMSE:  45887.96626202806

**ElasticNet**

# Instantiate and Fit ElasticNet Model
elastic = ElasticNet(l1_ratio=0.5, alpha=1, random_state=42)
elastic.fit(X_train, y_train)

# Compute R-Squared for the ElasticNet Model
print("R-Squared: {:.3f}".format(elastic.score(X_train, y_train)))

# R-Squared: 0.672

# Make prediction
y_pred = elastic.predict(X_test)

# Compute mean_squared_error
mse = mean_squared_error(y_test, y_pred)

print("RMSE: ", np.sqrt(mse)) # RMSE:  53566.47742727666

Click link for full code example: https://github.com/JacobJ215/Machine-Learning-Guide/blob/main/Regression%20Code%20Examples/regularization.ipynb

Support Vector Regressor

K-Nearest Neighbors

Decision Trees

Random Forest Regressor

Machine Learning Project: 1

Linear Regression Project: Melbourne housing clearance data from Kaggle

GitHub - JacobJ215/Melbourne-Housing-Price-Prediction


Classification Algorithms

**Naïve Bayes**

Logistic Regression

Support Vector Classifier

K-Nearest Neighbors

Decision Trees

Random Forest Classifier

Machine Learning Project: 2


Unsupervised Learning Algorithms


Model Evaluation


Model Deployment


Artificial Neural Networks


Convolutional Neural Networks


Recurrent Neural Networks


Time Series


Natural Language Processing