LECTURE 1 TERM 2:

MSIN0097

Predictive Analytics

A P MOORE

INTRODUCTION TO AI

Why do they call it intelligence?

MACHINE LEARNING

Data + model → prediction

MACHINE LEARNING DATA DRIVEN AI

Assume there is enough data to find statistical associations to solve specific tasks

Data + model → prediction

Define how well the model solves the task and adapt the parameters to maximize performance

LEARNING A FUNCTION

𝑥→𝑦

𝑥 →𝑓(𝑥)→𝑦

LEARNING A FUNCTION

𝑥→𝑦

𝑥 →𝑓(𝑥)→𝑦

x: true initial value (world state) → x': measured data (features) → f(x') = y': inferred/predicted/estimated value → y: true target value (world state)

f: the learned/fitted function, from n observations


input x → f(x) → y output

MACHINE LEARNING DATA DRIVEN AI

Source: https://twitter.com/Kpaxs/status/1163058544402411520

MACHINE LEARNING DATA DRIVEN AI

x → x' → f(x') = y' → y

[Figure: a fitted function f(x) passing through labelled points (x_i, y_i)]

{x_i, y_i}: labelled training data

Source: https://twitter.com/Kpaxs/status/1163058544402411520
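The idea above can be sketched in a few lines: given labelled pairs {x_i, y_i}, fit the parameters of f to the observations. A minimal sketch, assuming an illustrative linear relationship with noise (the data here is synthetic, not from the lecture):

```python
# Learning f from n labelled observations {x_i, y_i} (synthetic data).
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)              # measured features x_i
y = 3.0 * x + 2.0 + rng.normal(0, 1.0, 50)   # noisy targets y_i

# Fit f(x) = a*x + b by least squares over the n = 50 observations.
a, b = np.polyfit(x, y, deg=1)
y_hat = a * x + b                            # y' = f(x'), estimated values
```

The fitted (a, b) approximate the true generating parameters; more observations and less noise tighten the estimate.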

INTRODUCTION TO AI

Learning the rules

MATURITY OF APPROACHES ML

“Classical” Machine learning

“Modern” Machine learning

Source: hazyresearch.github.io/snorkel/blog/snorkel_programming_training_data.html

PARADIGMS IN ML

Source: https://twitter.com/IntuitMachine/status/1200796873495318528/photo/1

TASKS IN MACHINE LEARNING

MACHINE LEARNING BRANCHES

We know what the right answer is

We don’t know what the right answer is – but we can recognize a good answer if we find it

We have a way to measure how good our current best answer is, and a method to improve it

Source: Introduction to Reinforcement Learning, David Silver

BUILDING BLOCKS OF ML

A – B – C – D

A TAXONOMY OF PROBLEMS

A. Classification B. Regression

C. Clustering D. Decomposition

A – B – C – D ALGORITHMIC APPROACHES

A. Classification

– Support vector machines

– Neural networks

– Random Forests

– Maximum entropy classifiers

– …

C. Clustering

– K-means

– KD Trees

– Spectral clustering

– Density estimation

– …

B. Regression

– Logistic regression

– Support vector regression

– SGD regressor

– …

D. Decomposition

– PCA

– LDA

– t-SNE

– UMAP

– VAE

– …
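The four problem types can each be exercised in one line with scikit-learn. A minimal sketch on synthetic data, using one representative algorithm per branch (choices here are illustrative, not a recommendation):

```python
# One tiny example per problem type A-D (synthetic data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier   # A. classification
from sklearn.linear_model import LinearRegression     # B. regression
from sklearn.cluster import KMeans                    # C. clustering
from sklearn.decomposition import PCA                 # D. decomposition

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y_class = (X[:, 0] > 0).astype(int)   # categorical target
y_real = X @ np.ones(5)               # real-valued target

labels = RandomForestClassifier(random_state=0).fit(X, y_class).predict(X)
preds = LinearRegression().fit(X, y_real).predict(X)
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
X_2d = PCA(n_components=2).fit_transform(X)           # 5 dims -> 2 dims
```

Note the split: A and B use a target (y), C and D use only X.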


A – B – C – D ALGORITHMIC APPROACHES

A. Classification

B. Regression

Supervised: we know what the right answer is

C. Clustering

D. Decomposition

Unsupervised: we don’t know what the right answer is – but we can recognize a good answer if we find it

Reinforcement Learning: we have a way to measure how good our current best answer is, and a method to improve it

MACHINE LEARNING

B. Regression

B. REGRESSION REAL VALUED VARIABLE


LINEAR REGRESSION
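Linear regression has a closed-form solution: the least-squares parameters satisfy the normal equations w = (XᵀX)⁻¹Xᵀy. A minimal sketch on synthetic data (the true parameters below are chosen for illustration):

```python
# Linear regression via the normal equations (synthetic data).
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(30), rng.uniform(0, 5, 30)])  # bias + feature
w_true = np.array([1.0, 2.0])                              # intercept, slope
y = X @ w_true + rng.normal(0, 0.1, 30)                    # noisy targets

# Solve (X^T X) w = X^T y rather than inverting explicitly.
w = np.linalg.solve(X.T @ X, X.T @ y)
```

Solving the linear system is preferred over computing the inverse for numerical stability.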

REGRESSION BY MODELING PROBABILITIES


MULTIPLE DIMENSIONS

DEVELOPING MORE COMPLEX ALGORITHMS

MACHINE LEARNING

A. Classification

A. CLASSIFICATION CATEGORICAL VARIABLE


LOGISTIC REGRESSION
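Logistic regression turns a linear score into a class probability via the sigmoid, so it predicts a categorical variable despite the name. A minimal sketch on 1-D synthetic data with a noisy threshold at zero:

```python
# Logistic regression: P(class=1 | x) = sigmoid(w*x + b) (synthetic data).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(200, 1))
y = (x[:, 0] + rng.normal(0, 0.5, 200) > 0).astype(int)  # noisy threshold

clf = LogisticRegression().fit(x, y)
p = clf.predict_proba([[2.0]])[0, 1]   # P(class=1 | x=2)
```

Far from the decision boundary the predicted probability saturates toward 0 or 1.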

DEVELOPING MORE COMPLEX ALGORITHMS

CONFUSION MATRIX BINARY FORCED CHOICE

A. CLASSIFICATION CATEGORICAL VARIABLE

[Figure: confusion matrices for Model 1 and Model 2 – rows: actual, columns: predicted]

CLASSIFICATION MNIST DATASET

CONFUSION MATRIX
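For a binary forced choice, the confusion matrix tabulates actual against predicted labels; its four cells are the true negatives, false positives, false negatives, and true positives. A minimal sketch with hand-picked labels (illustrative only):

```python
# Confusion matrix for a binary forced choice.
from sklearn.metrics import confusion_matrix

actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(actual, predicted)   # rows: actual, cols: predicted
tn, fp, fn, tp = cm.ravel()
accuracy = (tp + tn) / cm.sum()
```

Metrics such as precision (tp / (tp + fp)) and recall (tp / (tp + fn)) are read directly off these cells.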

MACHINE LEARNING

C. Clustering

CLASSIFICATION VS CLUSTERING CATEGORICAL VARIABLE

CLASSIFICATION VS CLUSTERING

C. CLUSTERING


C. CLUSTERING 1. AGGLOMERATIVE


Dendrogram
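Agglomerative (bottom-up) clustering merges the closest pair of clusters repeatedly; the sequence of merges is exactly what the dendrogram draws. A minimal sketch with SciPy on four toy points forming two tight pairs:

```python
# Agglomerative clustering; the linkage matrix encodes the dendrogram.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
Z = linkage(X, method="ward")                     # successive pairwise merges
labels = fcluster(Z, t=2, criterion="maxclust")   # cut the dendrogram at 2 clusters
```

`scipy.cluster.hierarchy.dendrogram(Z)` would plot the tree itself.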

C. CLUSTERING 2. DIVISIVE


C. CLUSTERING 3. PARTITIONAL

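Partitional methods assign every point to exactly one of k flat clusters. K-means does this by alternating two steps: assign each point to its nearest centroid, then recompute each centroid as the mean of its points. A minimal sketch on two obvious groups:

```python
# Partitional clustering with k-means (toy data).
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[0, 0], [0, 1], [10, 10], [10, 11]], dtype=float)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_            # one flat cluster id per point
centers = km.cluster_centers_  # the two fitted centroids
```

Unlike hierarchical methods, k must be chosen up front.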

EXPECTATION MAXIMISATION
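Expectation maximisation generalises the k-means alternation to soft assignments: the E-step computes each point's responsibility under each component, the M-step re-estimates component means, covariances, and weights. A minimal sketch fitting a two-component Gaussian mixture to data drawn from two well-separated Gaussians (synthetic, illustrative):

```python
# EM via a Gaussian mixture model (synthetic 1-D data).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (100, 1)),   # component around 0
               rng.normal(5, 0.5, (100, 1))])  # component around 5

gm = GaussianMixture(n_components=2, random_state=0).fit(X)  # runs EM
means = sorted(gm.means_.ravel())   # should recover roughly 0 and 5
```

The fitted means land near the true generating means because the components barely overlap.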

MACHINE LEARNING

D. Decomposition

D. DECOMPOSITION 1. PROJECTION METHODS

Dimensionality reduction
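PCA is the canonical projection method: it projects the data onto the orthogonal directions of maximal variance. A minimal sketch on synthetic 3-D data that is almost one-dimensional, so the first component should capture nearly all the variance:

```python
# Dimensionality reduction by projection: PCA (synthetic data).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = np.hstack([t, 2 * t, -t]) + rng.normal(0, 0.01, (200, 3))  # ~1-D in 3-D

pca = PCA(n_components=2).fit(X)
X_low = pca.transform(X)                       # projected coordinates
var_first = pca.explained_variance_ratio_[0]   # share of variance on PC1
```

Inspecting `explained_variance_ratio_` is the usual way to choose how many components to keep.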

D. DECOMPOSITION 2. KERNEL METHODS

D. DECOMPOSITION 3. MANIFOLD LEARNING

A – B – C – D ALGORITHMIC APPROACHES

A. Classification B. Regression

C. Clustering D. Decomposition

TAXONOMY

A.

B.

C.

D.

A – B – C – D ALGORITHMIC APPROACHES

A. Classification

B. Regression

Source: Computer Vision: Models, Learning, and Inference

DISCRIMINATIVE VS GENERATIVE A SIMPLE EXAMPLE

PARAMETRIC VS NON-PARAMETRIC

— With data gathered from uncontrolled observations on complex systems involving unknown [physical, chemical, biological, social, economic] mechanisms, the a priori assumption that nature would generate the data through a parametric model selected by the statistician can result in questionable conclusions that cannot be substantiated by appeal to goodness-of-fit tests and residual analysis.

— Usually, simple parametric models imposed on data generated by complex systems (for example, medical data, financial data) result in a loss of accuracy and information as compared to algorithmic models.

Source: Statistical Science 2001, Vol. 16, No. 3, 199–231 Statistical Modeling: The Two Cultures Leo Breiman

REGULARIZATION IMPOSING ADDITIONAL CONSTRAINTS

ASSESSING GOODNESS OF FIT
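Goodness of fit should be assessed on data the model was not fitted to, otherwise the score rewards memorisation. A minimal sketch holding out a test set and reporting R² on it (synthetic data, illustrative):

```python
# Goodness of fit on held-out data (synthetic regression problem).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, (200, 1))
y = 2.0 * X[:, 0] + rng.normal(0, 0.5, 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)   # fit on training split only
r2_test = model.score(X_te, y_te)            # R^2 on unseen data
```

For regression, residual plots on the test split complement the single R² number.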

ML PIPELINES

Source: https://epistasislab.github.io/tpot/

ML PIPELINES

FEATURE SELECTION AND AUTOMATION

Source: https://epistasislab.github.io/tpot/

HOMEWORK

Hands-on Machine Learning

Chapter 2: End-to-End Machine Learning Project

Try reading the chapter from start to finish. We will work through the problem in class, but please come prepared to discuss the case study.

It is easier to understand the different stages of a ML project if you follow one from start to finish.

END TO END

TESTING AND VALIDATION

— Generalization of data

— Generalization of feature representation

— Generalization of the ML model
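Cross-validation is the standard way to estimate generalization: the model is repeatedly fit on part of the data and scored on the held-out fold. A minimal sketch with 5-fold cross-validation (synthetic data, illustrative):

```python
# Estimating generalization with k-fold cross-validation (synthetic data).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, 100)

scores = cross_val_score(LinearRegression(), X, y, cv=5)  # 5 held-out folds
mean_r2 = scores.mean()
```

The spread of the fold scores, not just the mean, signals how stable the model is across data splits.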

TOY VS REAL DATA

— Toy data is useful for exploring the behaviour of algorithms

— Demonstrating the advantages and disadvantages of an algorithm

— However, it is best not to use just toy datasets

— Use real datasets

Source: http://www.r2d3.us/visual-intro-to-machine-learning-part-1/

BOOKS

THINKING ABOUT BUSINESS

WORKING WITH DATA

DESIGNING PREDICTIVE MODELS

PYTHON PROGRAMMING

A – B – C – D

A TAXONOMY OF PROBLEMS

A. Classification, B. Regression

Week 2 – Classification and Regression

Week 3 – Trees and Ensembles

C. Clustering, D. Decomposition

Week 4 – Kernel spaces and Decomposition

Week 5 – Clustering
