# LECTURE 5 TERM 2:

MSIN0097

Predictive Analytics

A P MOORE

Individual coursework

The Individual Coursework assignment has been extended by one week to Friday 5th March 2021 at 10:00 am.

USING OTHER PEOPLE’S CODE

— Wojciech Zaremba (@woj_zaremba) February 4, 2021

MACHINE LEARNING JARGON

— Model

— Interpolating / Extrapolating

— Data Bias

— Noise / Outliers

— Learning algorithm

— Inference algorithm

— Supervised learning

— Unsupervised learning

— Classification

— Regression

— Clustering

— Decomposition

— Parameters

— Optimisation

— Training data

— Testing data

— Error metric

— Linear model

— Parametric model

— Model variance

— Model bias

— Model generalization

— Overfitting

— Goodness-of-fit

— Hyper-parameters

— Failure modes

— Confusion matrix

— True Positive

— False Negative

— Partition

— Margin

— Data density

— Hidden parameter

— High dimensional space

— Low dimensional space

— Separable data

— Manifold / Decision surface

— Hyper cube / volume / plane

A – B – C – D ALGORITHMIC APPROACHES

A. Classification

C. Clustering

Hidden variables

Density estimation

Manifolds

B. Regression

Supervised

D. Decomposition

Subspaces

Unsupervised

QUESTIONS

— How would I know whether my data would benefit from a transformation to a higher- or lower-dimensional space?

CURSE OF DIMENSIONALITY

https://www.nature.com/articles/s41592-018-0019-x
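
One symptom of the curse of dimensionality can be sketched numerically: as the number of dimensions grows, the nearest and farthest neighbours of a query point end up at almost the same distance. The sketch below is illustrative only; the uniform random data, the sample size and the helper name `relative_contrast` are assumptions, not taken from the linked article.

```python
import numpy as np

rng = np.random.default_rng(0)


def relative_contrast(n_points: int, n_dims: int) -> float:
    """Return (max - min) / min of the distances from one random query
    point to n_points random points in the unit hypercube."""
    points = rng.uniform(size=(n_points, n_dims))
    query = rng.uniform(size=n_dims)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()


# Hypothetical experiment: same number of points, increasing dimension.
contrast = {d: relative_contrast(1000, d) for d in (2, 10, 100, 1000)}
for d, c in contrast.items():
    print(f"{d:>4} dims: relative contrast = {c:.2f}")
```

A small relative contrast means that distance-based notions such as "nearest neighbour" carry less information, which is one reason to consider reducing dimension before modelling.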

QUESTIONS

— Would I always have to visualize the data in 2D or 3D to judge whether it becomes more separable? (That would seem to defeat the purpose of moving to a higher-dimensional space, which cannot be visualized.)

SUMMARY STATISTICS

Anscombe’s quartet
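
A quick way to see why summary statistics alone can mislead is to compute them for the four Anscombe data sets. The sketch below hard-codes the published quartet values and uses only NumPy; the variable names (`quartet`, `summary`) are illustrative choices.

```python
import numpy as np

# Anscombe's quartet: data sets I-III share the same x values.
x123 = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
x4 = np.array([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8], dtype=float)
quartet = {
    "I": (x123, np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                          7.24, 4.26, 10.84, 4.82, 5.68])),
    "II": (x123, np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                           6.13, 3.10, 9.13, 7.26, 4.74])),
    "III": (x123, np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                            6.08, 5.39, 8.15, 6.42, 5.73])),
    "IV": (x4, np.array([6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
                         5.25, 12.50, 5.56, 7.91, 6.89])),
}

summary = {}
for name, (x, y) in quartet.items():
    slope, intercept = np.polyfit(x, y, deg=1)  # least-squares line
    summary[name] = {
        "mean_x": x.mean(),
        "var_x": x.var(ddof=1),  # sample variance
        "mean_y": y.mean(),
        "corr": np.corrcoef(x, y)[0, 1],
        "slope": slope,
        "intercept": intercept,
    }

for name, stats in summary.items():
    print(name, {k: round(float(v), 2) for k, v in stats.items()})
```

The four rows agree to about two decimal places even though scatter plots of the four data sets look completely different, which is the argument for plotting before modelling.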

SUMMARY STATISTICS

https://seaborn.pydata.org/examples/scatterplot_matrix.html

QUESTIONS?

— Do I have to go all the way through modelling (e.g. classification), evaluate a metric such as the Gini coefficient, and then go back and compare the Gini scores obtained after adding extra dimensions?

QUESTIONS?

— Am I right that in some cases it is better to go up a dimension and in others better to go down a dimension?

MULTIPLE MODELS

K-means

K-MEANS LLOYD–FORGY ALGORITHM
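
The two alternating steps of the algorithm (assign each point to its nearest centroid, then move each centroid to the mean of its points) can be sketched in a few lines of NumPy. This is a minimal illustration, not the slide's own pseudocode: the function name `lloyd_kmeans`, the Forgy-style initialisation from randomly chosen data points, and the synthetic two-blob data are all assumptions.

```python
import numpy as np


def lloyd_kmeans(X, k, n_iters=100, seed=0):
    """Minimal K-means sketch: Forgy initialisation + Lloyd iterations."""
    rng = np.random.default_rng(seed)
    # Forgy initialisation: use k distinct data points as starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: label each point with its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels


# Hypothetical data: two well-separated blobs.
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[5, 5], scale=0.5, size=(100, 2)),
])
centroids, labels = lloyd_kmeans(X, k=2)
print(centroids)
```

Because each point is assigned by Euclidean distance to a centroid, this procedure implicitly prefers compact, roughly spherical clusters.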

K-MEANS

— Advantages

— Disadvantages

ELLIPSOIDAL DISTRIBUTED DATA

Gaussian mixtures

PARTITIONAL

MIXTURE OF GAUSSIANS (1D)

HIDDEN (LATENT) VARIABLES

MIXTURE OF GAUSSIANS (2D)

GRAPHICAL MODELS GAUSSIAN MIXTURES

PLATE NOTATION

— model parameters (squares, solid circles, bullets)

— random variables (circles)

— conditional dependencies (solid arrows)

FAMILIES OF MODELS

Gaussian mixture

t-distribution mixture

Factor Analysis

TWO STEP – EM ALGORITHM

EM ALGORITHM

EXPECTATION MAXIMIZATION

MIXTURE OF GAUSSIANS AS MARGINALIZATION

E-STEP

M-STEP
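
The E-step and M-step can be sketched for a one-dimensional mixture of Gaussians as follows. This is a minimal NumPy illustration under stated assumptions: the function names, the initialisation (means drawn from the data, equal weights, a shared starting variance), the fixed iteration count and the synthetic data are all choices made for the example.

```python
import numpy as np


def normal_pdf(x, mean, var):
    """Univariate normal density."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)


def em_gmm_1d(x, k, n_iters=500, seed=0):
    """Fit a k-component 1D Gaussian mixture with plain EM."""
    rng = np.random.default_rng(seed)
    weights = np.full(k, 1.0 / k)
    means = rng.choice(x, size=k, replace=False)
    variances = np.full(k, x.var())
    for _ in range(n_iters):
        # E-step: responsibilities resp[i, j] = Pr(h_i = j | x_i),
        # the posterior over the hidden variable for each data point.
        joint = weights * normal_pdf(x[:, None], means, variances)
        resp = joint / joint.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from responsibility-weighted data.
        nk = resp.sum(axis=0)
        weights = nk / len(x)
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
    mixture = (weights * normal_pdf(x[:, None], means, variances)).sum(axis=1)
    return weights, means, variances, np.log(mixture).sum()


# Hypothetical data: 30% from N(-2, 0.5^2) and 70% from N(3, 1^2).
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 700)])
weights, means, variances, log_lik = em_gmm_1d(x, k=2)
print(weights, means, variances, log_lik)
```

A production implementation would also monitor the log likelihood for convergence and guard against a component collapsing onto a single point; both are omitted here for brevity.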

MANIPULATING THE LOWER BOUND

LOCAL MAXIMA

Repeatedly fitting a mixture of Gaussians model from different starting points yields different models, because each fit converges to a different local maximum.

The log likelihoods are (a) 98.76, (b) 96.97 and (c) 94.35, indicating that (a) is the best fit.
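
The same effect can be reproduced with scikit-learn by fitting from several random starting points and comparing the final log likelihoods. This is a sketch on synthetic data (an assumption; it is not the data behind the figures quoted above), using `init_params="random"` and `n_init=1` so that each fit depends on its seed.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Hypothetical data: four overlapping blobs.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=1.5, random_state=42)

total_log_lik = {}
for seed in range(5):
    gm = GaussianMixture(
        n_components=4,
        init_params="random",  # random start instead of the k-means default
        n_init=1,              # a single run per seed
        max_iter=500,
        random_state=seed,
    ).fit(X)
    # score() returns the mean log likelihood per sample; scale to a total.
    total_log_lik[seed] = gm.score(X) * len(X)

best_seed = max(total_log_lik, key=total_log_lik.get)
for seed, ll in total_log_lik.items():
    print(f"seed {seed}: log likelihood = {ll:.2f}")
print("best starting point:", best_seed)
```

In practice the `n_init` parameter automates this: scikit-learn runs that many initialisations and keeps the fit with the best lower bound.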

COVARIANCE COMPONENTS

a) Full covariances.

b) Diagonal covariances.

c) Identical diagonal covariances.
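
As a rough guide to how such constraints look in code, scikit-learn's `GaussianMixture` exposes a `covariance_type` option. The mapping to the three cases above is an assumption: `"full"` corresponds to (a) and `"diag"` to (b), while (c) has no exact equivalent (`"tied"` shares one full covariance across components and `"spherical"` gives each component a single variance). The data are synthetic.

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Hypothetical 2D data with three clusters.
X, _ = make_blobs(n_samples=300, centers=3, n_features=2, random_state=0)

shapes = {}
for cov_type in ("full", "diag", "spherical", "tied"):
    gm = GaussianMixture(
        n_components=3, covariance_type=cov_type, random_state=0
    ).fit(X)
    # The shape of covariances_ shows how many free parameters are stored.
    shapes[cov_type] = gm.covariances_.shape
    print(f"{cov_type:>9}: covariances_ shape = {shapes[cov_type]}")
```

Fewer covariance parameters means a less flexible density but a lower risk of overfitting when data are scarce or high-dimensional.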

LEARNING GMM PSEUDO CODE

ANOMALY DETECTION
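
One common way to use a fitted mixture for anomaly detection is to flag points that lie in low-density regions. The sketch below assumes synthetic data and an arbitrary threshold at the 4th percentile of the log density; both are illustrative choices that would need tuning for a real problem.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Hypothetical data: three clusters in 2D.
X, _ = make_blobs(n_samples=1000, centers=3, random_state=0)

gm = GaussianMixture(n_components=3, n_init=5, random_state=0).fit(X)

# score_samples() returns the log of the estimated density at each point.
log_density = gm.score_samples(X)

# Treat the lowest-density 4% of points as anomalies (assumed threshold).
threshold = np.percentile(log_density, 4)
anomalies = X[log_density < threshold]
print(f"{len(anomalies)} of {len(X)} points flagged as anomalies")
```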

BIC AND AIC
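
Both criteria trade goodness-of-fit against the number of parameters, so they can be used to choose the number of mixture components. The following sketch fits models with 1 to 6 components to synthetic data containing three clusters (an assumption for the example) and picks the component count with the lowest BIC.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Hypothetical data: three well-separated clusters.
X, _ = make_blobs(
    n_samples=600,
    centers=[[0, 0], [8, 8], [-8, 8]],
    cluster_std=1.0,
    random_state=0,
)

candidates = range(1, 7)
models = [
    GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
    for k in candidates
]
bic = [m.bic(X) for m in models]  # lower is better
aic = [m.aic(X) for m in models]  # lower is better

best_k_bic = candidates[int(np.argmin(bic))]
best_k_aic = candidates[int(np.argmin(aic))]
print("components chosen by BIC:", best_k_bic)
print("components chosen by AIC:", best_k_aic)
```

BIC penalises extra parameters more heavily than AIC once the sample is moderately large, so the two criteria can disagree; when they do, BIC tends to select the simpler model.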

GAUSSIAN MIXTURES

BAYESIAN GMMS

CONCENTRATION PRIORS

The more data we have, the less the priors matter. In fact, to produce plots with such large differences, you must use very strong priors and little data.
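
The effect of the concentration prior can be explored with scikit-learn's `BayesianGaussianMixture`, which takes a `weight_concentration_prior` argument. The sketch below is an assumption-laden illustration: the data are synthetic and deliberately small, the two prior values (0.01 and 10000) are arbitrary extremes, and the 0.01 weight cut-off used to count "effective" components is a convention chosen for this example.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import BayesianGaussianMixture

# Hypothetical small data set: 60 points in three clusters.
X, _ = make_blobs(
    n_samples=60, centers=[[0, 0], [6, 6], [-6, 6]], random_state=0
)

weights = {}
effective = {}
for prior in (0.01, 10000):
    bgm = BayesianGaussianMixture(
        n_components=10,                   # deliberately too many
        weight_concentration_prior=prior,  # low: few clusters expected
        max_iter=1000,
        random_state=0,
    ).fit(X)
    weights[prior] = bgm.weights_
    # Count components that keep a non-negligible share of the data.
    effective[prior] = int((bgm.weights_ > 0.01).sum())
    print(f"prior={prior}: weights={np.round(bgm.weights_, 2)}, "
          f"effective components={effective[prior]}")
```

Increasing `n_samples` in this sketch is a simple way to check the claim that the prior matters less as the amount of data grows.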

TWO MOONS DATA

PROBLEMS WITH MULTI-VARIATE NORMAL DENSITY

Types of models

GENERATIVE VS DISCRIMINATIVE

CLASSIFICATION (DISCRIMINATIVE)

LOGISTIC REGRESSION REVISITED

MODEL CONTINGENCY OF THE WORLD ON DATA

World state: linear model, Bernoulli distribution

Probability / Decision surface
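
In the discriminative view, the probability of the world state w is modelled directly as a function of the data x: a linear function of x is passed through a sigmoid to give the Bernoulli parameter. The sketch below checks this against scikit-learn's `LogisticRegression` on synthetic two-class data; the data and variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: two Gaussian classes in 2D, labelled w = 0 and w = 1.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(-2, 1, size=(200, 2)),
    rng.normal(2, 1, size=(200, 2)),
])
w = np.repeat([0, 1], 200)

clf = LogisticRegression().fit(X, w)

# Linear model: activation a = phi_0 + phi^T x.
activation = X @ clf.coef_.ravel() + clf.intercept_[0]
# Bernoulli parameter: Pr(w = 1 | x) = sigmoid(a).
prob_manual = 1.0 / (1.0 + np.exp(-activation))
prob_sklearn = clf.predict_proba(X)[:, 1]

print("max difference:", np.abs(prob_manual - prob_sklearn).max())
print("training accuracy:", clf.score(X, w))
```

The decision surface is the set of points where the activation is zero, that is, where Pr(w = 1 | x) = 0.5; for this model it is a straight line in the 2D data space.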

CLASSIFICATION (GENERATIVE)

GAUSSIAN MIXTURE

MODEL CONTINGENCY OF DATA ON THE WORLD
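
In the generative view, each class gets its own density model Pr(x | w), and the posterior Pr(w | x) is then obtained with Bayes' rule. The sketch below uses a single Gaussian per class rather than a full mixture to keep it short; that simplification, the helper names and the synthetic data are assumptions for illustration.

```python
import numpy as np


def fit_gaussian(X):
    """Maximum-likelihood-style mean and (sample) covariance of one class."""
    return X.mean(axis=0), np.cov(X, rowvar=False)


def log_gaussian(X, mean, cov):
    """Log density of a multivariate normal evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (maha + logdet + d * np.log(2 * np.pi))


# Hypothetical data: two Gaussian classes in 2D, labelled w = 0 and w = 1.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(-2, 1, size=(200, 2)),
    rng.normal(2, 1, size=(200, 2)),
])
w = np.repeat([0, 1], 200)

# Learning: model Pr(x | w) for each class, plus the prior Pr(w).
params = {c: fit_gaussian(X[w == c]) for c in (0, 1)}
prior = {c: np.mean(w == c) for c in (0, 1)}

# Inference: Bayes' rule, Pr(w | x) proportional to Pr(x | w) Pr(w).
log_joint = np.column_stack([
    np.log(prior[c]) + log_gaussian(X, *params[c]) for c in (0, 1)
])
log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
posterior = np.exp(log_joint)
posterior /= posterior.sum(axis=1, keepdims=True)

accuracy = np.mean(posterior.argmax(axis=1) == w)
print("training accuracy:", accuracy)
```

Replacing `fit_gaussian` with a mixture of Gaussians per class would give a more flexible class-conditional density at the cost of a more expensive learning step.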

WHAT SORT OF MODEL SHOULD WE USE?

WHAT SORT OF MODEL SHOULD WE USE? TL;DR NO DEFINITIVE ANSWER

— Inference is generally simpler with discriminative models.

— Generative models calculate the posterior probability of the world state via Bayes’ rule, and sometimes this requires a computationally expensive algorithm.

— Generative models might waste modelling power.

The data are generally of much higher dimension than the world state, and modelling them is costly. Moreover, there may be many aspects of the data that do not influence the state.

— Using discriminative approaches, it is harder to exploit prior knowledge about how the data were generated: essentially we have to re-learn these phenomena from the data.

— Sometimes parts of the training or test data vector x may be missing. Here, generative models are preferred.

— It is harder to impose prior knowledge in a principled way in discriminative models.

SUMMARY OF APPROACHES

Best practice…

BEST PRACTICE…

Source: https://www.marekrei.com/blog/ml-and-nlp-publications-in-2019/

Percentage of papers mentioning GitHub (indicating that the code is made available):

ACL 70%, EMNLP 69%, NAACL 68%, ICLR 56%, NeurIPS 46%, ICML 45%, AAAI 31%.

It seems the NLP papers are releasing their code much more freely.

PAPERS WITH CODE

https://paperswithcode.com/

PERCEPTIONS OF PROBABILITY

DEPLOYMENT

@SOCIAL

@chipro @random_forests @zacharylipton @yudapearl @svpino @jackclarkSF

TEACHING TEAM

Dr Alastair Moore Senior Teaching Fellow

a.p.moore@ucl.ac.uk

@latticecut

Kamil Tylinski Teaching Assistant

kamil.tylinski.16@ucl.ac.uk

Jiangbo Shangguan Teaching Assistant

j.shangguan.17@ucl.ac.uk

Individual Coursework workshop

to Thursday 11th Feb 2021 at 12:00 am
