This course introduces you to one of the main families of supervised Machine Learning models: Regression. You will learn how to train regression models to predict continuous outcomes and how to use error metrics to compare different models. This course also walks you through best practices, including train and test splits, and regularization techniques.
By the end of this course you should be able to:
Differentiate uses and applications of classification and regression in the context of supervised machine learning
Describe and use linear regression models
Use a variety of error metrics to compare and select a linear regression model that best suits your data
Articulate why regularization may help prevent overfitting
Use regularized regression models: Ridge, LASSO, and Elastic Net
Who should take this course?
This course targets aspiring data scientists interested in acquiring hands-on experience with Supervised Machine Learning Regression techniques in a business setting.
What skills should you have?
To make the most of this course, you should be familiar with programming in a Python development environment and have a fundamental understanding of Data Cleaning, Exploratory Data Analysis, Calculus, Linear Algebra, Probability, and Statistics.
What you will learn
Introduction to Supervised Machine Learning and Linear Regression
This module gives a brief overview of supervised machine learning and its two main applications: classification and regression. After introducing the concept of regression, you will learn best practices for applying it, as well as how to measure error and select the regression model that best suits your data.
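As a rough illustration of measuring error, the sketch below computes two common regression metrics, mean squared error and R squared, for a fitted straight line and for a baseline that always predicts the mean. The data values and the helper names `mse` and `r_squared` are made up for this example and are not taken from the course material.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals (lower is better)."""
    return float(np.mean((y_true - y_pred) ** 2))


def r_squared(y_true, y_pred):
    """R squared: 1 minus residual sum of squares over total sum of squares."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Small made-up dataset with a roughly linear relationship.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Candidate 1: least-squares straight line y = slope * x + intercept.
slope, intercept = np.polyfit(x, y, deg=1)
y_line = slope * x + intercept

# Candidate 2: baseline that always predicts the mean of y.
y_mean = np.full_like(y, y.mean())

mse_line, mse_mean = mse(y, y_line), mse(y, y_mean)
r2_line, r2_mean = r_squared(y, y_line), r_squared(y, y_mean)

print(f"line: MSE={mse_line:.3f}, R^2={r2_line:.3f}")
print(f"mean: MSE={mse_mean:.3f}, R^2={r2_mean:.3f}")
```

The candidate with the lower MSE (equivalently, the higher R squared on the same data) fits these points better. Computing the metrics on the same data used for fitting, as done here for brevity, tends to overstate how well a model will do on new data.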
Data Splits and Polynomial Regression
There are a few best practices to detect and avoid overfitting of your regression models. One of these is splitting your data into training and test sets, so that the model is evaluated on data it has not seen. An alternative is to use cross validation. This module also introduces polynomial features, which let a linear model capture nonlinear relationships but make overfitting more likely as the degree grows. The module walks you through the theoretical framework and a few hands-on examples of these practices.
There is a trade-off between the size of your training set and the size of your test set. If you use most of your data for training, you will have fewer samples to validate your model. Conversely, if you use more samples for testing, you will have fewer samples to train your model. Cross validation eases this trade-off by rotating which part of the data is held out, so that every sample is used for training in some folds and for testing in another.
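The ideas in this module can be sketched with scikit-learn as follows. This is a minimal example on synthetic data, not the course's own notebook: the quadratic data-generating function, the 80/20 split, the 5 folds, and the polynomial degrees compared are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (assumed for illustration): a quadratic signal plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + rng.normal(scale=0.5, size=200)

# Hold-out split: 80% of the samples for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
linear = LinearRegression().fit(X_train, y_train)
holdout_r2 = linear.score(X_test, y_test)  # R^2 on unseen samples only
print(f"hold-out R^2 of a straight line: {holdout_r2:.3f}")

# 5-fold cross validation: each sample is held out exactly once.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_mse = {}
for degree in (1, 2):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    # scikit-learn reports negated MSE so that higher scores are better.
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    cv_mse[degree] = -scores.mean()
    print(f"degree {degree}: cross-validated MSE = {cv_mse[degree]:.3f}")
```

Because the synthetic signal is quadratic, the degree-2 pipeline should show a clearly lower cross-validated error than the straight line. On real data the same loop can be used to compare candidate degrees without touching the final test set.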
Bias-Variance Trade-off and Regularization Techniques: Ridge, LASSO, and Elastic Net
This module walks you through the theory and a few hands-on examples of regularized regression, including ridge, LASSO, and elastic net. You will learn the main pros and cons of these techniques, as well as their differences and similarities.
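To make the comparison concrete, here is a small sketch that fits ordinary least squares, ridge, LASSO, and elastic net to the same synthetic data using scikit-learn. The data, the `alpha` values, and the `l1_ratio` are arbitrary assumptions for illustration, not recommended settings; in practice these hyperparameters are usually tuned with cross validation.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumed): only the first 2 of 10 features affect the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
true_coef = np.array([3.0, -2.0, 0, 0, 0, 0, 0, 0, 0, 0])
y = X @ true_coef + rng.normal(scale=0.5, size=100)

# Penalties depend on feature scale, so standardize the features first.
X_scaled = StandardScaler().fit_transform(X)

models = {
    "ols": LinearRegression(),                            # no penalty
    "ridge": Ridge(alpha=10.0),                           # L2 penalty
    "lasso": Lasso(alpha=0.1),                            # L1 penalty
    "elastic_net": ElasticNet(alpha=0.1, l1_ratio=0.5),   # mix of L1 and L2
}

coefs = {name: m.fit(X_scaled, y).coef_ for name, m in models.items()}
n_zero = {name: int(np.sum(c == 0.0)) for name, c in coefs.items()}

for name, c in coefs.items():
    print(f"{name:12s} L2 norm={np.linalg.norm(c):.3f}  zero coefs={n_zero[name]}")
```

Ridge shrinks all coefficients toward zero without eliminating any, while the L1 penalty in LASSO can set some coefficients exactly to zero, which acts as a form of feature selection. Elastic net sits between the two, with `l1_ratio` controlling the mix.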