
LECTURE 1 TERM 2:
MSIN0097
Predictive Analytics
A P MOORE

INTRODUCTION TO AI
Why do they call it intelligence?

MACHINE LEARNING
Data + model → prediction

MACHINE LEARNING DATA DRIVEN AI
Assume there is enough data to find statistical associations to solve specific tasks
Data + model → prediction
Define how well the model solves the task and adapt the parameters to maximize performance
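A minimal sketch of this loop (not from the slides; plain numpy on synthetic data): define a model, measure how well it solves the task with a loss, and adapt the parameters to improve that measure.

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 3.0 * x + 2.0 + rng.normal(0, 1, 200)      # unknown "true" relationship plus noise

w, b = 0.0, 0.0                                 # model parameters
lr = 0.01
for _ in range(2000):
    y_hat = w * x + b                           # data + model -> prediction
    error = y_hat - y
    mse = np.mean(error ** 2)                   # how well the model solves the task
    w -= lr * 2 * np.mean(error * x)            # adapt the parameters...
    b -= lr * 2 * np.mean(error)                # ...to reduce the error
print(f"w = {w:.2f}, b = {b:.2f}, MSE = {mse:.2f}")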

LEARNING A FUNCTION
x → y
x → f(x) → y

LEARNING A FUNCTION
x → y
x → f(x) → y
True initial value x (world state) → measured data / features x' → learned/fitted function f(x') = y' → inferred/predicted/estimated value y', compared against the true target value y (world state); the function is fitted from n observations.

LEARNING A FUNCTION
x → y
x → f(x) → y
True initial value x (world state) → measured data / features x' → learned/fitted function f(x') = y' → inferred/predicted/estimated value y', compared against the true target value y (world state); the function is fitted from n observations.
input x → f(x) → y output

MACHINE LEARNING DATA DRIVEN AI
Source: https://twitter.com/Kpaxs/status/1163058544402411520

MACHINE LEARNING DATA DRIVEN AI
x → x' → f(x') = y' → y
{x_i, y_i} labelled training data
Source: https://twitter.com/Kpaxs/status/1163058544402411520

INTRODUCTION TO AI
Learning the rules

MATURITY OF APPROACHES ML
“Classical” Machine learning
“Modern” Machine learning
Source: hazyresearch.github.io/snorkel/blog/snorkel_programming_training_data.html

PARADIGMS IN ML
Source: https://twitter.com/IntuitMachine/status/1200796873495318528/photo/1

TASKS IN MACHINE LEARNING

MACHINE LEARNING BRANCHES
We know what the right answer is
We don’t know what the right answer is – but we can recognize a good answer if we find it
We have a way to measure how good our current best answer is, and a method to improve it
Source: Introduction to Reinforcement Learning, David Silver

BUILDING BLOCKS OF ML

A – B – C – D
A TAXONOMY OF PROBLEMS
A. Classification
B. Regression
C. Clustering
D. Decomposition

A – B – C – D ALGORITHMIC APPROACHES
A. Classification
– Support vector machines
– Neural networks
– Random forests
– Maximum entropy classifiers
– …
B. Regression
– Logistic regression
– Support vector regression
– SGD regressor
– …
C. Clustering
– K-means
– KD trees
– Spectral clustering
– Density estimation
– …
D. Decomposition
– PCA
– LDA
– t-SNE
– UMAP
– VAE
– …
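One way to make the taxonomy concrete is to pick one off-the-shelf estimator per problem type; the sketch below assumes scikit-learn and the small iris dataset, and is only illustrative.

from sklearn.datasets import load_iris
from sklearn.svm import SVC, SVR
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)

clf = SVC().fit(X, y)                              # A. Classification: predict a category
reg = SVR().fit(X[:, 1:], X[:, 0])                 # B. Regression: predict a real value
km = KMeans(n_clusters=3, n_init=10).fit(X)        # C. Clustering: group samples without labels
pca = PCA(n_components=2).fit(X)                   # D. Decomposition: compress to 2 dimensions

print(clf.predict(X[:2]), reg.predict(X[:2, 1:]))
print(km.labels_[:5], pca.transform(X[:2]).shape)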

A – B – C – D ALGORITHMIC APPROACHES
Supervised:
A. Classification
– Support vector machines
– Neural networks
– Random forests
– Maximum entropy classifiers
– …
B. Regression
– Logistic regression
– Support vector regression
– SGD regressor
– …
Unsupervised:
C. Clustering
– K-means
– KD trees
– Spectral clustering
– Density estimation
– …
D. Decomposition
– PCA
– LDA
– t-SNE
– UMAP
– VAE
– …

A – B – C – D ALGORITHMIC APPROACHES
A. Classification
B. Regression
Supervised
C. Clustering
D. Decomposition
Unsupervised

A – B – C – D ALGORITHMIC APPROACHES
A. Classification
B. Regression
Supervised – we know what the right answer is
C. Clustering
D. Decomposition
Unsupervised

A – B – C – D ALGORITHMIC APPROACHES
A. Classification
B. Regression
C. Clustering
D. Decomposition
Supervised
Unsupervised – we don't know what the right answer is, but we can recognize a good answer if we find it

A – B – C – D ALGORITHMIC APPROACHES
A. Classification
B. Regression
Supervised
C. Clustering
D. Decomposition
Unsupervised
Reinforcement Learning – we have a way to measure how good our current best answer is, and a method to improve it

MACHINE LEARNING
B. Regression

B. REGRESSION REAL VALUED VARIABLE

B. REGRESSION REAL VALUED VARIABLE

B. REGRESSION REAL VALUED VARIABLE

LINEAR REGRESSION

REGRESSION BY MODELING PROBABILITIES

B. REGRESSION REAL VALUED VARIABLE

B. REGRESSION REAL VALUED VARIABLE

MULTIPLE DIMENSIONS

DEVELOPING MORE COMPLEX ALGORITHMS

MACHINE LEARNING
A. Classification

A. CLASSIFICATION CATEGORICAL VARIABLE

A. CLASSIFICATION CATEGORICAL VARIABLE

A. CLASSIFICATION CATEGORICAL VARIABLE

A. CLASSIFICATION CATEGORICAL VARIABLE

LOGISTIC REGRESSION

DEVELOPING MORE COMPLEX ALGORITHMS

CONFUSION MATRIX BINARY FORCED CHOICE

A. CLASSIFICATION CATEGORICAL VARIABLE
Confusion matrices for Model 1 and Model 2 (rows: actual class; columns: predicted class).

CLASSIFICATION MNIST DATASET

CONFUSION MATRIX
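A minimal sketch (assuming scikit-learn) of a binary forced-choice confusion matrix; the labels here are made up purely to show the layout.

from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

y_actual    = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]       # true labels
y_predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]       # model output

cm = confusion_matrix(y_actual, y_predicted)
print(cm)        # rows = actual, columns = predicted:
                 # [[TN FP]
                 #  [FN TP]] for labels ordered [0, 1]
ConfusionMatrixDisplay(cm).plot()                  # optional plot (needs matplotlib)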

MACHINE LEARNING
C. Clustering

CLASSIFICATION VS CLUSTERING CATEGORICAL VARIABLE

CLASSIFICATION VS CLUSTERING

C. CLUSTERING

C. CLUSTERING

C. CLUSTERING

C. CLUSTERING 1. AGGLOMERATIVE

C. CLUSTERING 1. AGGLOMERATIVE

C. CLUSTERING 1. AGGLOMERATIVE

C. CLUSTERING 1. AGGLOMERATIVE

C. CLUSTERING 1. AGGLOMERATIVE
Dendrogram
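A minimal sketch of agglomerative (bottom-up) clustering and its dendrogram, assuming scipy and matplotlib and using synthetic blobs.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(1)
X = np.vstack([rng.normal((0, 0), 0.5, (20, 2)),   # three synthetic blobs
               rng.normal((3, 0), 0.5, (20, 2)),
               rng.normal((0, 3), 0.5, (20, 2))])

Z = linkage(X, method="ward")                      # repeatedly merge the closest clusters
labels = fcluster(Z, t=3, criterion="maxclust")    # cut the tree into 3 clusters
dendrogram(Z)                                      # the merge history as a dendrogram
plt.show()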

C. CLUSTERING 2. DIVISIVE

C. CLUSTERING 2. DIVISIVE

C. CLUSTERING 2. DIVISIVE

C. CLUSTERING 2. DIVISIVE

C. CLUSTERING 3. PARTITIONAL

C. CLUSTERING 3. PARTITIONAL

C. CLUSTERING 3. PARTITIONAL

EXPECTATION MAXIMISATION

MACHINE LEARNING
D. Decomposition

D. DECOMPOSITION 1. PROJECTION METHODS
Dimensionality reduction

D. DECOMPOSITION 2. KERNEL METHODS

D. DECOMPOSITION 3. MANIFOLD LEARNING

A – B – C- D ALGORITHMIC APPROACHES
A. ClAssification B. Regression
C. Clustering D. Decomposition

TAXONOMY
A. Classification
B. Regression
C. Clustering
D. Decomposition

A – B – C- D ALGORITHMIC APPROACHES
A. Classification
B. Regression
Source: Computer Vision: Models, Learning, and Inference

DISCRIMINATIVE VS GENERATIVE A SIMPLE EXAMPLE

PARAMETRIC VS NON-PARAMETRIC
— With data gathered from uncontrolled observations on complex systems involving unknown [physical, chemical, biological, social, economic] mechanisms, the a priori assumption that nature would generate the data through a parametric model selected by the statistician can result in questionable conclusions that cannot be substantiated by appeal to goodness-of-fit tests and residual analysis.
— Usually, simple parametric models imposed on data generated by complex systems, for example, medical data, financial data, result in a loss of accuracy and information as compared to algorithmic models
Source: Statistical Science 2001, Vol. 16, No. 3, 199–231 Statistical Modeling: The Two Cultures Leo Breiman

REGULARIZATION IMPOSING ADDITIONAL CONSTRAINTS

ASSESSING GOODNESS OF FIT

ML PIPELINES
Source: https://epistasislab.github.io/tpot/

ML PIPELINES
FEATURE SELECTION AND AUTOMATION
Source: https://epistasislab.github.io/tpot/
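TPOT searches over pipelines like the hand-built one sketched below (assuming scikit-learn): scaling, then feature selection, then a classifier, evaluated end-to-end by cross-validation.

from sklearn.datasets import load_breast_cancer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),                   # preprocessing
    ("select", SelectKBest(f_classif, k=10)),      # feature selection
    ("clf", LogisticRegression(max_iter=1000)),    # final model
])
print(cross_val_score(pipe, X, y, cv=5).mean())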

HOMEWORK
Hands-on Machine Learning
Chapter 2: End-to-End Machine Learning Project
Try reading the chapter from start to finish. We will work through the problem in class, but please come prepared to discuss the case study.
It is easier to understand the different stages of a ML project if you follow one from start to finish.

END TO END

TESTING AND VALIDATION
— Generalization of data
— Generalization of feature representation
— Generalization of the ML model

TOY VS REAL DATA
— Toy data is useful for exploring behaviour of algorithms
— Demonstrating the advantages and disadvantages of an algorithm
— However, it is best not to use only toy datasets
— Use real datasets

Source: http://www.r2d3.us/visual-intro-to-machine-learning-part-1/

BOOKS

THINKING ABOUT BUSINESS

WORKING WITH DATA

DESIGNING PREDICTIVE MODELS

PYTHON PROGRAMMING

A – B – C – D
A TAXONOMY OF PROBLEMS
A. Classification, B. Regression — Week 2: Classification and Regression; Week 3: Trees and Ensembles
C. Clustering, D. Decomposition — Week 4: Kernel spaces and Decomposition; Week 5: Clustering

LECTURE 1 TERM 2:
MSIN0097
Predictive Analytics
A P MOORE

LECTURE 5 TERM 2:
MSIN0097
Predictive Analytics
A P MOORE

MSIN0097
Individual coursework

MSIN0097
Individual Coursework assignment has been extended by one week
to Friday 5th March 2021 at 10:00 am

USING OTHER PEOPLE’S CODE

MACHINE LEARNING JARGON
— Model
— Interpolating / Extrapolating
— Data Bias
— Noise / Outliers
— Learning algorithm
— Inference algorithm
— Supervised learning
— Unsupervised learning
— Classification
— Regression
— Clustering
— Decomposition
— Parameters
— Optimisation
— Training data
— Testing data
— Error metric
— Linear model
— Parametric model
— Model variance
— Model bias
— Model generalization
— Overfitting
— Goodness-of-fit
— Hyper-parameters
— Failure modes
— Confusion matrix
— True Positive
— False Negative
— Partition
— Margin
— Data density
— Hidden parameter
— High dimensional space
— Low dimensional space
— Separable data
— Manifold / Decision surface
— Hyper cube / volume / plane

A – B – C – D ALGORITHMIC APPROACHES
Supervised:
A. Classification
B. Regression
Unsupervised:
C. Clustering — hidden variables, density estimation
D. Decomposition — manifolds, subspaces

QUESTIONS
— How would I know whether my data would benefit from a transformation to a higher- or lower-dimensional space?

CURSE OF DIMENSIONALITY
https://www.nature.com/articles/s41592-018-0019-x

QUESTIONS
— Would I always have to visualize the data in 2D or 3D to see whether it becomes more separable? (But that would defeat the point of moving to a higher-dimensional space, which can't be visualized.)

SUMMARY STATISTICS
Anscombe’s quartet

SUMMARY STATISTICS
https://seaborn.pydata.org/examples/scatterplot_matrix.html
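A short illustration (assuming seaborn, which ships both datasets used here): near-identical summary statistics can hide very different structure, so plot the raw data, for example as a scatterplot matrix.

import seaborn as sns

anscombe = sns.load_dataset("anscombe")
print(anscombe.groupby("dataset")[["x", "y"]].agg(["mean", "std"]))   # almost identical stats

iris = sns.load_dataset("iris")
sns.pairplot(iris, hue="species")                  # scatterplot matrix of all feature pairs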

QUESTIONS?
— Should I have to go all the way through modelling (e.g. classification), evaluate a metric such as the Gini coefficient, and then go back and compare Gini scores with and without the extra dimensions?

QUESTIONS?
— I understand that in some cases it might be better to go up a dimension and in other cases better to go down a dimension?

MULTIPLE MODELS

MSIN0097
K-means

K-MEANS LLOYD–FORGY ALGORITHM
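A minimal numpy sketch of the Lloyd–Forgy iteration (illustrative only; it assumes no cluster ever becomes empty): assign each point to its nearest centroid, then move each centroid to the mean of its points, and repeat until nothing changes.

import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]       # Forgy init: k random points
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)                          # assignment step
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):              # converged
            break
        centroids = new_centroids                              # update step
    return centroids, labels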

K-MEANS
— Advantages
— Disadvantages

ELLIPSOIDAL DISTRIBUTED DATA

MSIN0097
Gaussian mixtures

PARTITIONAL

MIXTURE OF GAUSSIANS (1D)

HIDDEN (LATENT) VARIABLES

MIXTURE OF GAUSSIANS (2D)

GRAPHICAL MODELS GAUSSIAN MIXTURES

PLATE NOTATION
— parameters (squares, solid circles, bullets)
— random variables (circles)
— conditional dependencies (solid arrows)

FAMILIES OF MODELS
Gaussian mixture
t-distribution mixture
Factor analysis

TWO STEP – EM ALGORITHM

EM ALGORITHM

EXPECTATION MAXIMIZATION

MIXTURE OF GAUSSIANS AS MARGINALIZATION

E-STEP

M-STEP

EM ALGORITHM

EXPECTATION MAXIMIZATION

MANIPULATING THE LOWER BOUND

LOCAL MAXIMA
Repeated fitting of mixture of Gaussians model with different starting points results in different models as the fit converges to different local maxima.
Log likelihoods are a) 98.76 b) 96.97 c) 94.35, respectively, indicating that (a) is the best fit.

COVARIANCE COMPONENTS
a) Full covariances.
b) Diagonal covariances.
c) Identical diagonal covariances.

LEARNING GMM PSEUDO CODE
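The pseudo code from this slide is not reproduced here; the following is a minimal numpy/scipy sketch of the same two-step EM loop for a Gaussian mixture (not an optimised or numerically hardened implementation).

import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, n_iter=50, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(k, 1.0 / k)                                   # mixing weights
    mu = X[rng.choice(n, k, replace=False)]                    # initial means: random points
    sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(k)])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        r = np.column_stack([pi[j] * multivariate_normal.pdf(X, mu[j], sigma[j])
                             for j in range(k)])
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and covariances from the responsibilities
        nk = r.sum(axis=0)
        pi = nk / n
        mu = (r.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - mu[j]
            sigma[j] = (r[:, j, None] * diff).T @ diff / nk[j] + 1e-6 * np.eye(d)
    return pi, mu, sigma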

ANOMALY DETECTION

BIC AND AIC
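A minimal sketch (assuming scikit-learn) of using BIC and AIC to choose the number of mixture components; lower is better, and both should bottom out near the true number of clusters in this synthetic example.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)),                    # two well-separated blobs
               rng.normal(5, 1, (200, 2))])

for k in range(1, 6):
    gm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
    print(k, round(gm.bic(X)), round(gm.aic(X)))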

GAUSSIAN MIXTURES

BAYESIAN GMMS

CONCENTRATION PRIORS
The more data we have, however, the less the priors matter. In fact, to plot diagrams with such large differences, you must use very strong priors and little data.

TWO MOONS DATA

PROBLEMS WITH MULTI-VARIATE NORMAL DENSITY

MSIN0097
Types of models

GENERATIVE VS DISCRIMINATIVE

CLASSIFICATION (DISCRIMINATIVE)
LOGISTIC REGRESSION REVISITED
MODEL CONTINGENCY OF THE WORLD ON DATA
World state: Linear model Bernoulli distribution
Probability / Decision surface

CLASSIFICATION (GENERATIVE)
GAUSSIAN MIXTURE
MODEL CONTINGENCY OF DATA ON THE WORLD

WHAT SORT OF MODEL SHOULD WE USE?

WHAT SORT OF MODEL SHOULD WE USE? TL;DR NO DEFINITIVE ANSWER
— Inference is generally simpler with discriminative models.
— Generative models calculate this probability via Bayes’ rule, and sometimes this requires a computationally expensive algorithm.
— Generative models might waste modelling power.
The data are generally of much higher dimension than the world, and modelling it is costly. Moreover, there may be many aspects of the data which do not influence the state;
— Using discriminative approaches, it is harder to exploit this knowledge: essentially we have to re-learn these phenomena from the data.
— Sometimes parts of the training or test data vector x may be missing. Here, generative models are preferred.
— It is harder to impose prior knowledge in a principled way in discriminative models.

SUMMARY OF APPROACHES

MSIN0097
Best practice…

BEST PRACTICE…

BEST PRACTICE…

BEST PRACTICE…

BEST PRACTICE…
Source: https://www.marekrei.com/blog/ml-and-nlp-publications-in-2019/
Percentage of papers mentioning GitHub (indicating that the code is made available):
ACL 70%, EMNLP 69%, NAACL 68%; ICLR 56%, NeurIPS 46%, ICML 45%, AAAI 31%.
It seems the NLP papers are releasing their code much more freely.

PAPERS WITH CODE
https://paperswithcode.com/

PERCEPTIONS OF PROBABILITY

DEPLOYMENT

@SOCIAL
@chipro @random_forests @zachar ylipton @yudapearl @svpino @jackclarkSF

TEACHING TEAM
Dr Alastair Moore Senior Teaching Fellow
a.p.moore@ucl.ac.uk
@latticecut
Kamil Tylinski Teaching Assistant
kamil.tylinski.16@ucl.ac.uk
Jiangbo Shangguan Teaching Assistant
j.shangguan.17@ucl.ac.uk
Individual Coursework workshop: Thursday 11th Feb 2021 at 12:00 am

LECTURE 3 TERM 2:
MSIN0097
Predictive Analytics
A P MOORE

LECTURE 4 TERM 2:
MSIN0097
Predictive Analytics
A P MOORE

SYSTEMS DESIGN
Original problem

DEALING WITH DIFFICULT PROBLEMS
— Improving bad solutions
– Start with a bad solution (weak learner) and improve it
– Build up a better solution by thinking about how partial solutions can support/correct each other's mistakes

DEALING WITH DIFFICULT PROBLEMS
— Improving bad solutions
– Start with a bad solution (weak learner) and improve it
– Build up a better solution by thinking about how partial solutions can support/correct each other's mistakes
— Make the problem simpler
– Divide and conquer
– Problem decomposition

DEALING WITH DIFFICULT PROBLEMS
— Improving bad solutions
– Start with a bad solution (weak learner) and improve it
– Build up a better solution by thinking about how partial solutions can support/correct each other's mistakes
— Make the problem simpler
– Divide and conquer
– Problem decomposition
— Building much better solutions
– Deep models

ENSEMBLES
IMPROVING BAD SOLUTIONS
— Start with a bad solution (weak learner) and improve it
— Build up a better solution by thinking about how partial solutions can support/correct each other's mistakes

ENSEMBLES
IMPROVING BAD SOLUTIONS
— Voting
– Majority voting
— Bagging and Pasting
– Out-of-bag evaluation
— Boosting
– Adaptive Boosting (AdaBoost)
– Gradient Boosting
– XGBoost
— Stacking
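A minimal sketch (assuming scikit-learn) putting these four ideas side by side on a toy dataset; the estimators and hyper-parameters are arbitrary choices for illustration.

from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import (VotingClassifier, BaggingClassifier,
                              AdaBoostClassifier, StackingClassifier)
from sklearn.model_selection import cross_val_score

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)

ensembles = {
    "voting":   VotingClassifier([("lr", LogisticRegression()),
                                  ("tree", DecisionTreeClassifier()),
                                  ("svc", SVC())], voting="hard"),
    "bagging":  BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                                  oob_score=True),             # out-of-bag evaluation
    "boosting": AdaBoostClassifier(n_estimators=100),           # adaptive boosting
    "stacking": StackingClassifier([("tree", DecisionTreeClassifier()),
                                    ("svc", SVC())],
                                   final_estimator=LogisticRegression()),
}
for name, model in ensembles.items():
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))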

MAJORITY VOTING

BAGGING

GRADIENT BOOSTING FITTING RESIDUAL ERRORS
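A minimal sketch of the residual-fitting idea behind gradient boosting for regression (assuming scikit-learn trees): each new small tree is fitted to what the current ensemble still gets wrong.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 200)

learning_rate = 0.1
prediction = np.zeros_like(y)
trees = []
for _ in range(100):
    residual = y - prediction                                  # current errors
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual) # weak learner on residuals
    trees.append(tree)
    prediction += learning_rate * tree.predict(X)              # correct a fraction of the error

print("final training MSE:", np.mean((y - prediction) ** 2).round(4))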

DECOMPOSITION
STARTING WITH EASIER PROBLEMS
— Start with a hard problem
— Break the problem into a lot of easier sub-tasks
— Solve each sub-task so that the analysis in subsequent tasks becomes easier

A – B – C – D ALGORITHMIC APPROACHES
A. Classification
B. Regression
Supervised
C. Clustering
D. Decomposition
Unsupervised

A – B – C – D ALGORITHMIC APPROACHES
A. Classification
B. Regression
Supervised – we know what the right answer is
C. Clustering
D. Decomposition
Unsupervised

A – B – C – D ALGORITHMIC APPROACHES
A. Classification
B. Regression
Supervised
C. Clustering
D. Decomposition
Unsupervised – we don't know what the right answer is, but we can recognize a good answer if we find it

A – B – C – D ALGORITHMIC APPROACHES
A. Classification
B. Regression
Supervised
C. Clustering
D. Decomposition
Unsupervised – we don't know what the right answer is, but we can recognize a good answer if we find it

MOTIVATING DECOMPOSITION

COMPRESSION

D. DECOMPOSITION 1. PROJECTION METHODS
Dimensionality reduction

D. DECOMPOSITION 2. KERNEL METHODS

D. DECOMPOSITION 3. MANIFOLD LEARNING

CURSE OF DIMENSIONALITY

SUBSPACES

MOTIVATING DECOMPOSITION

LOW DIMENSIONAL SUBSPACES

DECOMPOSITION
THREE APPROACHES
— Dimensionality Reduction / Projection
— Kernel Methods
— Manifold Learning

B. REGRESSION REAL VALUED VARIABLE

MOTIVATING PROJECTION INSTABILITY

FINDING THE RIGHT DIMENSION

SUBSPACES

PROJECTION IN MULTIPLE DIMENSIONS

REDUCTION TO A SINGLE DIMENSION

COMPRESSION
MNIST 95% VARIANCE PRESERVED
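A minimal sketch (assuming scikit-learn, and using the small 8×8 digits dataset rather than MNIST): ask PCA to keep just enough components to preserve 95% of the variance.

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)                # 64-dimensional images
pca = PCA(n_components=0.95)                       # a fraction: choose the dimension automatically
X_reduced = pca.fit_transform(X)
X_recovered = pca.inverse_transform(X_reduced)     # lossy "decompression" back to 64 dimensions

print(X.shape[1], "->", X_reduced.shape[1], "dimensions")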

PROBLEMS WITH PROJECTION

PROBLEMS WITH PROJECTION

KERNEL METHODS
Kernel spaces

KERNEL PCA
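A minimal sketch (assuming scikit-learn): kernel PCA with an RBF kernel can untangle data that is not linearly separable in the original space, such as concentric circles.

from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=5)
X_kpca = kpca.fit_transform(X)                     # the two rings become linearly separable here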

MANIFOLD METHODS
Manifold learning

MANIFOLD LEARNING

MANIFOLD LEARNING

OTHER TECHNIQUES

LOCAL LINEAR EMBEDDING
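A minimal sketch (assuming scikit-learn): locally linear embedding unrolls the Swiss-roll manifold into two dimensions while trying to preserve local neighbourhoods.

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, t = make_swiss_roll(n_samples=1000, noise=0.1, random_state=0)
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2)
X_unrolled = lle.fit_transform(X)                  # 3-D roll -> 2-D embedding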

DECOMPOSITION METHODS
— Random Projections
— Multidimensional Scaling (MDS)
— Isomap
— Linear Discriminant Analysis (LDA)

ADVANTAGES
The main motivations for dimensionality reduction are:
— To speed up a subsequent training algorithm (in some cases it may even remove noise and redundant features, making the training algorithm perform better).
— To visualize the data and gain insights on the most important features.
— Simply to save space (compression).

DIS AD VAN TAGES
The main drawbacks are:
— Some information is lost, possibly degrading the performance of subsequent training algorithms.
— It can be computationally intensive.
— It adds some complexity to your Machine Learning pipelines.
— Transformed features are often hard to interpret.

WHEN IT DOESN’T WORK
IMPLICIT ASSUMPTION: IT MAKES THE PROBLEM EASIER

EMBEDDING PROJECTOR
GOOGLE BRAIN TEAM 2016

UNIVERSITY COLLEGE LONDON Faculty of Engineering Sciences
Department of Computer Science
Problem Set: Classification
Dr. Dariush Hosseini (dariush.hosseini@ucl.ac.uk)

Notation
Inputs: $\mathbf{x} = [1, x_1, x_2, \ldots, x_m]^T \in \mathbb{R}^{m+1}$
Outputs:
$y \in \mathbb{R}$ for regression problems
$y \in \{0, 1\}$ for binary classification problems
Training Data:
$S = \{(\mathbf{x}^{(i)}, y^{(i)})\}_{i=1}^{n}$
Input Training Data:
The design matrix, $X$, is defined as:
$$X = \begin{bmatrix} \mathbf{x}^{(1)T} \\ \mathbf{x}^{(2)T} \\ \vdots \\ \mathbf{x}^{(n)T} \end{bmatrix} = \begin{bmatrix} 1 & x_1^{(1)} & \cdots & x_m^{(1)} \\ 1 & x_1^{(2)} & \cdots & x_m^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^{(n)} & \cdots & x_m^{(n)} \end{bmatrix}$$
Output Training Data:
$$\mathbf{y} = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(n)} \end{bmatrix}$$
Data-Generating Distribution:
The outcomes of S are drawn i.i.d. from a data-generating distribution, D

1. This problem focuses on generative approaches to classification. It begins by asking for basic statements and derivations pertaining to probabilistic classification, before asking you to consider a particular generative model. The model is not one which we discussed in lectures, but is very similar to Naive Bayes. It is known as 'Linear Discriminant Analysis' (LDA). You are asked to investigate the discriminant boundaries that emerge from this model. Following this you are asked to consider a slight generalisation of the model with fewer restrictions placed upon the class conditional covariances. This more general model is known as 'Quadratic Discriminant Analysis' (QDA). Finally you are asked to consider how these models differ from the Naive Bayes model which we discussed in the lectures. Note throughout how different model assumptions imply different discriminant boundaries and hence different classifiers.
(a) [2 marks]
Describe the generative approach to classification. How does it differ from the discriminative approach?
(b) [3 marks]
Derive the Bayes Optimal Classifier for binary classification, assuming misclassification loss.
(c) [10 marks]
In a binary classification setting, assume that classes are distributed according to a Bernoulli random variable, $Y$, whose outcomes are $y$, i.e. $y \sim \text{Bern}(\theta)$, where $\theta = p_Y(y=1)$. Furthermore we model the class conditional probability distributions for the random variable, $X$, whose outcomes are given by instances of particular input attribute vectors, $\mathbf{x} = [x_1, x_2, \ldots, x_m]^T \in \mathbb{R}^m$, as (note that here we will take care of the bias parameter explicitly, hence the absence of a leading '1' in the attribute vector):
$$\mathbf{x}|(y=0) \sim \mathcal{N}(\boldsymbol{\mu}_0, \Sigma_0) \quad \text{where: } \boldsymbol{\mu}_0 \in \mathbb{R}^m,\ \Sigma_0 \in \mathbb{R}^{m \times m},\ \Sigma_0^T = \Sigma_0,\ \Sigma_0 \succ 0$$
$$\mathbf{x}|(y=1) \sim \mathcal{N}(\boldsymbol{\mu}_1, \Sigma_1) \quad \text{where: } \boldsymbol{\mu}_1 \in \mathbb{R}^m,\ \Sigma_1 \in \mathbb{R}^{m \times m},\ \Sigma_1^T = \Sigma_1,\ \Sigma_1 \succ 0$$
The off-diagonal elements of Σ0,Σ1 are not necessarily zero.
Assume that $\Sigma = \Sigma_0 = \Sigma_1$ and show that the discriminant boundaries between the classes can be described by the following expression (you should clearly express $\mathbf{w}$ and $b$):
$$\mathbf{w} \cdot \mathbf{x} + b = 0 \quad \text{where: } \mathbf{w} \in \mathbb{R}^m \text{ and } b \in \mathbb{R}$$
(d) [2 marks]
What does this expression describe? Explain.
(e) [4 marks]
Now assume that $\Sigma_0 \neq \Sigma_1$. What happens to the discriminant boundaries? Explain.
(f) [4 marks]
Explain how this approach differs from that of Naïve Bayes.

2. This problem focuses on discriminative classification. You begin by considering the Logistic Noise Latent Variable model and use it to motivate the Logistic Regression model, as we do in lectures. Following this you are asked to consider whether changing the parameterisation of the underlying logistic noise will imply a different classification model (it won't!). Next you are asked to repeat this analysis but for a Gaussian Latent Variable model. The resulting classification model is known as 'probit regression'. While it is similar in form to logistic regression, the probit function is less easy to manipulate than the logistic sigmoid, and furthermore has more sensitivity to outliers. Finally you are asked to consider a multinomial extension of the logistic regression model, and in particular to examine the form of the boundaries which exist between classes for this model.
(a) [2 marks]
Describe the discriminative approach to classification. How does it differ from the generative approach?
(b) [3 marks]
Recall that in binary logistic regression, we seek to learn a mapping characterised by the weight vector, $\mathbf{w}$, and drawn from the function class, $\mathcal{F}$:
$$\mathcal{F} = \left\{ f_{\mathbf{w}}(\mathbf{x}) = \mathbb{I}[p_Y(y=1|\mathbf{x}) \ge 0.5] \ \middle|\ p_Y(y=1|\mathbf{x}) = \frac{1}{1 + e^{-\mathbf{w} \cdot \mathbf{x}}},\ \mathbf{w} \in \mathbb{R}^{m+1} \right\}$$
Here $p_Y(y|\mathbf{x})$ is the posterior output class probability associated with a data generating distribution, $\mathcal{D}$, which is characterised by the joint distribution $p_{X,Y}(\mathbf{x}, y)$.
Provide a motivation for this form of the posterior output class probability $p_Y(y|\mathbf{x})$ by considering a Logistic Noise Latent Variable Model. Remember that the noise in such a model characterises a random variable $\varepsilon$, with outcomes, $\varepsilon$, which are drawn i.i.d. as follows:
$$\varepsilon \sim \text{Logistic}(a, b) \quad \text{where: } a = 0,\ b = 1$$
The characteristic probability distribution function for such a variable is:
$$p_\varepsilon(\varepsilon|a,b) = \frac{\exp\left(-\frac{\varepsilon - a}{b}\right)}{b\left(1 + \exp\left(-\frac{\varepsilon - a}{b}\right)\right)^2}$$
(c) [3 marks]
If we allow the Logistic parameters to take general values a ∈ R, b > 0 explain the effect which this has on the final logistic regression model.
(d) [4 marks]
Let us assume instead a Gaussian Noise Latent Variable Model. Now ε is drawn i.i.d. as follows:
ε ∼ N(0,1)
Derive an expression for the posterior output class probability pY(y|x) in this case.
(e) [3 marks]
How will the treatment of outliers in the data differ for these two models? Explain.

(f) [2 marks]
For K-class multinomial regression, assuming misclassification loss, we can express a discriminative model for the posterior output class probability as:
$$p_Y(y = j|\mathbf{x}) = \frac{\exp(\mathbf{w}_j \cdot \mathbf{x})}{\sum_{k=1}^{K} \exp(\mathbf{w}_k \cdot \mathbf{x})}$$
where now $y \in \{1, \ldots, K\}$.
Demonstrate that this model reduces to logistic regression when K = 2.
(g) [3 marks]
For K > 2 derive an expression for the discriminant boundaries between classes. What does this expression describe?

MSIN0097
Predictive Analytics Individual Coursework
A P MOORE

INDIVIDUAL COURSEWORK
— Friday 26th February 2021
— 60% of module mark
— 2000 words

BRIEF
The individual coursework task is to identify a dataset and explore building a predictive model using the methods and techniques presented in the first 5 weeks of the course.
There are six main steps:
1. Obtain a dataset and explain the problem you are trying to solve.
This will characterise the type of predictive model you can build
2. Explore the data to gain insights.
Visualize and explain the main trends in the data
3. Prepare the data to better expose the underlying data patterns to Machine Learning algorithms.
4. Explore different models and shortlist the best ones.
5. Fine-tune your models and combine them into a better solution.
6. Present your final solution with any summary conclusions.

GUIDANCE

END-TO-END

NOTEBOOK

DATASETS
Useful places for ML datasets:
— Tabular & cleaned: https://github.com/EpistasisLab/pmlb/tree/master/datasets
— By domain: https://datasetlist.com
— By application: https://github.com/awesomedata/awesome-public-datasets
— Search engine: https://datasetsearch.research.google.com
@rasbt

CRISP CYCLE
DATA DEVELOPMENT LIFECYCLES

GUIDANCE
Fast First Pass
Make a first pass through the project steps as fast as possible. This will give you confidence that you have all the parts that you need and a baseline from which to improve.
Cycles – The process is not linear but cyclic. You will loop between steps, and probably spend most of your time in tight loops between steps 3-4 or 3-4-5 until you achieve a level of accuracy that is sufficient or you run out of time.
The write-up in the final submitted Notebook can be more linear – you do not need to include all of your work (i.e. all the dead-ends), and it should be concise and consistent.

GUIDANCE
Attempt Every Step
It is easy to skip steps, especially if you are not confident or familiar with the tasks of that step. Try to do something at each step in the process, even if it does not improve accuracy. You can always build upon it later. Don't skip steps, just reduce their contribution to your final submission as necessary.
Ratchet Accuracy
The goal of the project is to achieve good model performance (whichever metric you use to measure this). Every step contributes towards this goal.
Set some simple benchmarks early on. Treat changes that you make as experiments that potentially increase accuracy.
Performance is a ratchet that can only move in one direction (better, not worse).

GUIDANCE
Adapt As Needed
Modify the steps as needed for your project, especially as you become more experienced with using the Notebook.
The final submitted Notebook does not need to preserve the suggested structure if you think something else is more appropriate.

A NOTE ON GRADES

KEY DATES
— Submission: Friday 26th February 2021, 10 am

TEACHING SUPPORT
Kamil Tylinski Teaching Assistant
kamil.tylinski.16@ucl.ac.uk
Jiangbo Shangguan Teaching Assistant
j.shangguan.17@ucl.ac.uk
Bartosz Kultys Teaching Assistant
bartosz.kultys.18@ucl.ac.uk
Editha Nemsic
Teaching Assistant
editha.nemsic.19@ucl.ac.uk
Dr Viviana Culmone Teaching Assistant
v.culmone@ucl.ac.uk
Walter Hernandez
Teaching Assistant
walter.hernandez.18@ucl.ac.uk

Notebook: ageron/handson-ml — 01_the_machine_learning_landscape.ipynb (https://github.com/ageron/handson-ml)

Notebook: ageron/handson-ml — 08_dimensionality_reduction.ipynb (https://github.com/ageron/handson-ml)

Notebook: ageron/handson-ml — 03_classification.ipynb (https://github.com/ageron/handson-ml)

Notebook: ageron/handson-ml — 05_support_vector_machines.ipynb (https://github.com/ageron/handson-ml)