Machine learning problems generally fall into two groups: supervised learning and unsupervised learning.
Supervised Learning
In supervised learning, you have input data and target labels. The goal is to learn the relationship between them.
Two common tasks:
- Classification: predicting a category. Example: deciding whether a fruit is an apple or a banana based on colour and shape.
- Regression: predicting a number. Example: estimating house prices based on features such as square footage.
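To make the two tasks concrete, here is a minimal sketch in plain Python. The fruit measurements, the house prices, and the helper names `classify` and `fit_line` are all invented for illustration. Nearest-neighbour matching and a least-squares line are just two simple choices among many possible models.

```python
# Hypothetical toy data: (colour score, elongation) -> fruit label.
# Colour score: 0.0 = red, 1.0 = yellow. Elongation: length / width.
fruits = [
    ((0.1, 1.0), "apple"),
    ((0.2, 1.1), "apple"),
    ((0.9, 3.5), "banana"),
    ((0.8, 4.0), "banana"),
]


def classify(features, examples):
    """Classification: return the label of the closest labelled example."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(examples, key=lambda ex: squared_distance(ex[0], features))
    return label


def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical houses: square footage -> sale price.
square_feet = [1000, 1500, 2000, 2500]
prices = [200_000, 300_000, 400_000, 500_000]
slope, intercept = fit_line(square_feet, prices)

predicted_label = classify((0.85, 3.8), fruits)   # a category
predicted_price = slope * 1800 + intercept        # a number
print(predicted_label, round(predicted_price))
```

In both cases the model learns from labelled examples. The only difference is the kind of output: a category for classification, a number for regression.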
Unsupervised Learning
Here, the data has no labels. The aim is to find useful structure.
Common tasks include:
- Clustering: grouping similar items, e.g. dividing customers into segments.
- Anomaly detection: spotting unusual behaviour or errors.
- Dimensionality reduction: reducing the number of features while keeping the important patterns. Principal component analysis (PCA) is a common technique that finds the main directions of variation in the data and uses them to create a smaller set of features that retains most of that variation.
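As an illustration of dimensionality reduction, the sketch below finds the first principal component of a small, made-up 2-D dataset and projects each point onto it, turning two features into one. It uses power iteration on the covariance matrix, which is a simple stand-in chosen for readability and not how libraries typically implement PCA. The data and the function names `first_principal_component` and `project` are assumptions for this example.

```python
import math


def first_principal_component(points, iterations=100):
    """Return (mean, direction): direction is the unit vector of greatest variance."""
    n = len(points)
    dims = len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dims)]
    centred = [[p[d] - mean[d] for d in range(dims)] for p in points]

    # Sample covariance matrix of the centred data.
    cov = [
        [sum(row[i] * row[j] for row in centred) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]

    # Power iteration: repeatedly multiply by the covariance matrix and
    # renormalise, which converges to the dominant eigenvector.
    v = [1.0] * dims
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def project(points, mean, direction):
    """Replace each point by its single coordinate along the direction."""
    return [
        sum((p[d] - mean[d]) * direction[d] for d in range(len(mean)))
        for p in points
    ]


# Hypothetical unlabelled data lying close to the line y = 2x.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
mean, direction = first_principal_component(points)
reduced = project(points, mean, direction)
print(direction, reduced)
```

No labels are involved at any step: the direction is derived purely from how the points are spread. Because the points sit almost on a line, one coordinate along that line captures nearly all of the variation in the original two features.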