- Machine Learning Introduction
- Problems in Machine Learning
- Machine Learning Workflow
- Types of Machine Learning
- Machine Learning Use Cases
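There are several types of machine learning. When choosing which type of model to use, it’s important to understand whether the quantity you want to predict is discrete or continuous, and what categorical data is. Discrete quantities take countable values, continuous quantities can take any value within a range, and categorical data takes its values from a fixed set of named groups.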
- Supervised Learning (labeled data)
- Deep Learning
- Reinforcement Learning/Weak Learning
- Unsupervised Learning (unlabeled data)
Each scenario below is labeled as one of three problem types: Unsupervised Clustering, Supervised Classification, or Supervised Continuous (regression).
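As a rough rule of thumb, the choice between these three types comes down to whether a label exists and, if it does, whether it is categorical or continuous. The sketch below is only an illustration: `suggest_problem_type` is a hypothetical helper, not part of any library, and it assumes that categorical labels are stored as strings or booleans. Integer-coded classes such as 0/1 would need extra handling.

```python
from typing import Optional, Sequence


def suggest_problem_type(target: Optional[Sequence[object]]) -> str:
    """Hypothetical helper: map a target column to one of the three problem types."""
    if target is None:
        # No labels at all: the model has to find structure on its own.
        return "Unsupervised Clustering"
    if all(isinstance(value, (str, bool)) for value in target):
        # Named groups or yes/no outcomes are categorical labels.
        return "Supervised Classification"
    # Assumption: any other labeled target is a numeric, continuous quantity.
    return "Supervised Continuous"


print(suggest_problem_type(None))
print(suggest_problem_type([True, False, True]))
print(suggest_problem_type([250000.0, 310500.0]))
```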
You work for a bank and would like to identify current customers who are at risk of taking their business elsewhere. You have six years of customer data that includes tenure, transaction amounts, account balances, and whether the person is still a customer. Supervised Classification, because whether the person is still a customer is a known, binary label.
You want to predict the sale price of houses. You have a dataset that includes square footage, number of bedrooms, number of bathrooms, number of floors, and the sale price of many houses that have recently sold in the area. Supervised Continuous, because the sale price is a known label and money is generally treated as a continuous quantity.
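To make the house price scenario concrete, here is a minimal sketch of a supervised continuous model: ordinary least squares with a single feature, written in plain Python. The square footage and price values are invented for illustration and lie exactly on a straight line; real data would be noisy and would use more than one feature.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one feature)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data: square footage (feature) and sale price (label).
square_feet = [1000, 1500, 2000, 2500]
sale_price = [150000, 200000, 250000, 300000]

slope, intercept = fit_line(square_feet, sale_price)

# Predict the price of an unseen 1800 square foot house.
predicted_price = slope * 1800 + intercept
print(predicted_price)
```

The label (sale price) is what makes this supervised, and the fact that it can take any value in a range is what makes it continuous.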
You’re a data analyst working on a congressperson’s campaign, and they would like you to predict how much money a person will donate. You are provided a dataset containing each person’s name, birth date, height, weight, and eye color. Not enough information: the dataset has no record of past donation amounts to learn from, and none of these features plausibly relates to donating. We’d need something like income, past donation history, or political affiliation.
You’re a marketing manager. You manage an ad campaign and would like to develop unique ads to appeal to different segments (groups) within your target market. You analyze customer surveys that include food preferences, education level, number of children, postal code, and political affiliation. Unsupervised Clustering, because you don’t yet know how to segment your customers.
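The segmentation scenario can be sketched with k-means, one common clustering algorithm (the notes do not name a specific one, so this choice is an assumption). The example is a simplified, plain-Python version: it uses two invented numeric survey features (number of children, years of education) and seeds the centers with the first `k` points to stay deterministic.

```python
import math


def kmeans(points, k, iterations=10):
    """Simplified k-means: returns a cluster label per point and the final centers."""
    # Simplification: seed the centers with the first k points instead of random picks.
    centers = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Update step: move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centers


# Made-up survey rows: (number of children, years of education). No labels are given.
survey = [(0, 12), (1, 12), (0, 13), (4, 18), (5, 17), (4, 17)]

labels, centers = kmeans(survey, k=2)
print(labels)
```

No segment names go into the algorithm. It only groups similar respondents, and interpreting what each group means is left to the analyst.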
You would like to predict how many kilograms of corn you can expect to grow based on different combinations of water and fertilizer. You have corn yield data from the past 25 years, including the amounts of fertilizer and water used. Supervised Continuous.
You want to predict whether a customer’s future card transaction is legitimate or fraudulent. You have five years of transaction history, including fraudulent charges, with the time, date, amount, and place of each transaction. Supervised Classification, because we need to predict a binary event and the examples of fraudulent and legitimate transactions serve as labeled data.
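As an illustration of supervised classification, the sketch below labels a new transaction by copying the label of its nearest labeled example (a 1-nearest-neighbor classifier). This is one simple technique chosen for brevity, not a recommended fraud model. The transactions are invented, and the features (amount, hour of day) are left unscaled, which a real system would not do.

```python
import math


def predict_nearest(train_features, train_labels, new_point):
    """Return the label of the training example closest to new_point (1-NN)."""
    nearest = min(
        range(len(train_features)),
        key=lambda i: math.dist(train_features[i], new_point),
    )
    return train_labels[nearest]


# Made-up labeled history: (amount, hour of day) with a known outcome for each row.
transactions = [(12.5, 14), (40.0, 10), (25.0, 18), (950.0, 3), (1200.0, 2)]
outcomes = ["legitimate", "legitimate", "legitimate", "fraud", "fraud"]

large_night_purchase = predict_nearest(transactions, outcomes, (1000.0, 4))
small_day_purchase = predict_nearest(transactions, outcomes, (30.0, 12))
print(large_night_purchase, small_day_purchase)
```

The known outcomes are the labels, and because each outcome is one of two categories the task is classification and not a continuous prediction.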
You are a librarian, and your task is to organize books in designated sections based on their genre. You have a spreadsheet of book titles, authors, page counts, and a summary of each book. Unsupervised Clustering, because there are no genre labels; the only topical information is each book’s summary. Book genres are categorical.
Your medical diagnostics company is looking to build a model that uses blood test analysis to identify whether a person has cancer. The data provided to train your model has only the following measurements for each person: BMI, number of past doctor visits, oxygen levels, and pulse rate. Not enough information: there is no label indicating whether each person has cancer, so there is nothing for a supervised model to learn from.