Create Your Own Datasets

with Google Colab and sklearn

In this course the student will learn how to use Google Colab and Python’s machine learning library, sklearn, to create datasets and use them in machine learning projects.

What you’ll learn

  • Students will learn how to create their own datasets using Google Colab and sklearn.
  • Students will be given an introduction to Google Colab, which they will use to write their own programs.
  • Students will be given an introduction to Python’s machine learning library, sklearn, which they will need in order to create their own datasets.
  • Students will learn how to create the twenty datasets that are included in sklearn.

Course Content

  • Introduction –> 3 lectures • 32min.
  • Projects –> 22 lectures • 4hr 2min.

Requirements

  • The equipment needed for this course is a computer with an internet connection.
  • A prerequisite to this course is the course I have created, “Use Google Colab to learn Python programming”.

The datasets will be created in sklearn. There are twenty in total, comprising classification and regression datasets.
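As a rough illustration of what creating a dataset in sklearn can look like, the sketch below uses two of sklearn's generator functions, `make_classification` and `make_regression`. The sample counts, feature counts and other parameter values are assumptions chosen for the example, not values taken from the course.

```python
from sklearn.datasets import make_classification, make_regression

# Classification dataset: 200 samples, 4 features, 2 classes (the default).
# All parameter values here are illustrative assumptions.
X_clf, y_clf = make_classification(
    n_samples=200,
    n_features=4,
    n_informative=2,
    n_redundant=0,
    random_state=0,
)

# Regression dataset: 200 samples, 1 feature, with Gaussian noise on the label.
X_reg, y_reg = make_regression(
    n_samples=200,
    n_features=1,
    noise=10.0,
    random_state=0,
)

print(X_clf.shape, y_clf.shape)
print(X_reg.shape, y_reg.shape)
```

Both functions return a feature matrix and a label array, so the same downstream steps (splitting, fitting, predicting) apply to either kind of dataset.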

When the datasets have been created, machine learning techniques will be employed to make predictions on the labels. In addition, the concepts of supervised and unsupervised learning will be discussed. Although most of the examples will be of supervised learning, clustering will also be touched upon.
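To make the unsupervised case concrete, here is a minimal clustering sketch. It assumes `make_blobs` as the dataset generator and `KMeans` as the clustering model; the centre coordinates and other values are hypothetical, and the course may use different tools.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Three well-separated blobs; the centre coordinates are illustrative assumptions.
X, true_groups = make_blobs(
    n_samples=300,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=1.0,
    random_state=0,
)

# Unsupervised: KMeans only sees X, never the true group labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

# The adjusted Rand index compares the found clusters with the true groups
# (1.0 means identical groupings, regardless of how clusters are numbered).
ari = adjusted_rand_score(true_groups, cluster_ids)
print(f"Adjusted Rand index: {ari:.3f}")
```

The true group labels are used here only to score the result afterwards, which is the key difference from the supervised examples.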

Some of the datasets introduce noise, which decreases the accuracy of predictions. The student will be shown how to tune the parameters of the appropriate datasets to reduce noise and thereby improve the accuracy of the predictions. This illustrates the inverse relationship between noise and the accuracy of the model.
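The effect of noise can be sketched with the `noise` parameter of `make_regression`. The example below is an assumption about how such a comparison might be set up: it fits a `LinearRegression` model to the same dataset generated with and without noise and compares the R² score on a held-out validation set. The helper name `r2_for_noise` is hypothetical.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def r2_for_noise(noise):
    """Return the validation R^2 of a linear model for a given noise level.

    Hypothetical helper for this sketch; parameter values are illustrative.
    """
    X, y = make_regression(
        n_samples=500, n_features=1, noise=noise, random_state=0
    )
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, random_state=0
    )
    model = LinearRegression().fit(X_train, y_train)
    return r2_score(y_val, model.predict(X_val))


r2_clean = r2_for_noise(0.0)   # no noise added to the label
r2_noisy = r2_for_noise(50.0)  # standard deviation of the added noise is 50

print(f"R^2 without noise: {r2_clean:.3f}")
print(f"R^2 with noise:    {r2_noisy:.3f}")
```

Because the random seed is fixed, the two datasets differ only in the noise added to the label, so any drop in R² can be attributed to the noise.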

Some of the datasets will have outliers, so methods for removing outliers from the dataset will also be discussed. When the outliers are removed, the accuracy of the predictions is also likely to increase. This illustrates that outliers tend to have an inverse relationship with the accuracy of the model.
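The course description does not say which outlier method is used. As one common possibility, the sketch below applies the interquartile range (IQR) rule to the label. The outliers are injected by hand purely for demonstration, and the 1.5 × IQR threshold is a conventional choice, not something taken from the course.

```python
import numpy as np
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=1, noise=5.0, random_state=0)

# Inject five artificial outliers into the label (an assumption for this demo).
rng = np.random.default_rng(0)
outlier_idx = rng.choice(len(y), size=5, replace=False)
y[outlier_idx] += 1000.0

# IQR rule: keep labels within 1.5 * IQR of the first and third quartiles.
q1, q3 = np.percentile(y, [25, 75])
iqr = q3 - q1
mask = (y >= q1 - 1.5 * iqr) & (y <= q3 + 1.5 * iqr)

# Apply the same mask to features and label so the rows stay aligned.
X_clean, y_clean = X[mask], y[mask]

print(f"Rows before: {len(y)}, rows after: {len(y_clean)}")
```

The rule may also drop a few genuine points at the extremes of the distribution, so it is worth checking how many rows are removed before training on the cleaned data.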

The student will be taken through the following steps to create a dataset and write a program to make predictions on the labels:

1. Import libraries.

2. Create dataset.

3. Plot a graph of the dataset so it can be inspected visually.

4. Analyse the label.

5. Remove outliers if necessary.

6. Normalise or standardise the independent variables if necessary.

7. Split the dataframe into training and validation sets.

8. Select the model.

9. Fit the model to the training set.

10. Make predictions on the validation set.

11. Check accuracy and/or error of the predictions.

12. Compare the predictions against the actual values.

13. Plot the predictions on a graph.
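The steps above can be sketched end to end for a small classification problem. This is a minimal illustration under assumptions: `make_classification`, `StandardScaler` and `LogisticRegression` are choices made for this example, the parameter values are arbitrary, and `plot_points` is a hypothetical helper. The scaler is fitted on the training split only, so steps 6 and 7 are interleaved here.

```python
# 1. Import libraries.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def plot_points(X, labels, title):
    """Steps 3 and 13: scatter the two features, coloured by label.

    Hypothetical helper; matplotlib is imported here so the rest of the
    sketch also runs where no plotting backend is available.
    """
    import matplotlib.pyplot as plt

    plt.scatter(X[:, 0], X[:, 1], c=labels, s=15)
    plt.title(title)
    plt.xlabel("feature 0")
    plt.ylabel("feature 1")
    plt.show()


# 2. Create dataset (parameter values are illustrative assumptions).
X, y = make_classification(
    n_samples=300,
    n_features=2,
    n_informative=2,
    n_redundant=0,
    n_clusters_per_class=1,
    class_sep=2.0,
    random_state=0,
)

# 4. Analyse the label: count the samples in each class.
class_counts = np.bincount(y)
print("Samples per class:", class_counts)

# 5. Remove outliers if necessary (skipped: none are injected in this sketch).

# 7. Split into training and validation sets.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 6. Standardise the independent variables, fitting on the training set only.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_val_std = scaler.transform(X_val)

# 8. Select the model.
model = LogisticRegression()

# 9. Fit the model to the training set.
model.fit(X_train_std, y_train)

# 10. Make predictions on the validation set.
y_pred = model.predict(X_val_std)

# 11. Check accuracy of the predictions.
accuracy = accuracy_score(y_val, y_pred)
print(f"Validation accuracy: {accuracy:.3f}")

# 12. Compare predictions against actual values (first ten rows, side by side).
comparison = np.column_stack([y_val[:10], y_pred[:10]])
print("actual, predicted")
print(comparison)
```

In a Colab cell, calling `plot_points(X, y, "Dataset")` after step 2 and `plot_points(X_val, y_pred, "Predictions")` after step 10 would draw the graphs for steps 3 and 13. For a regression dataset, the accuracy score in step 11 would be replaced by an error metric such as mean squared error.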
