# CS代考 CSC 311: Introduction to Machine Learning – cscodehelp代写

CSC 311: Introduction to Machine Learning

Lecture 1 – Introduction

Anthony Bonner &

Based on slides by Amir-massoud Farahmand & Emad A.M. Andrews

Intro ML (UofT) CSC311-Lec1 1 / 53

This course

Broad introduction to machine learning

First half: algorithms and principles for supervised learning

nearest neighbors, decision trees, ensembles, linear regression, logistic regression

Unsupervised learning: PCA, K-means, mixture models

Basics of reinforcement learning

Coursework is aimed at advanced undergrads. We will use multivariate calculus, probability, and linear algebra.

Intro ML (UofT) CSC311-Lec1

Do I have the appropriate background?

This is where we use all the math that you learned, and then some!

Linear algebra (MAT223/240): vector/matrix manipulations, properties.

Calculus (MAT232): partial derivatives/gradient

Probability, Statistics (STA256): common distributions, Bayes Rule; expectation, variance, covariance, median; maximum likelihood.

Geometry and geometric intuition.

Recommend preparation: Numerical Methods (CSC338) or Computational Statistics

Mathematical maturity will be assumed: you should be able to write and understand rigorous proofs.

Intro ML (UofT) CSC311-Lec1

Course Information

Recommended readings will be given for each lecture. But the following will be useful throughout the course:

, , and , The Elements of Statistical Learning, Second Edition, 2009.

. Bishop, Pattern Recognition and Machine Learning, 2006

. Sutton and . Barto, Reinforcement Learning: An Introduction, Second Edition, 2018.

, and , Deep Learning, 2016 , Machine Learning: A Probabilistic Perspective, 2012.

, , , and , An Introduction to Statistical Learning, 2017.

-Shwartz and -David, Understanding Machine Learning: From Theory to Algorithms, 2014.

Kay, Information Theory, Inference, and Learning Algorithms, 2003.

There are lots of freely available, high-quality ML resources.

Intro ML (UofT) CSC311-Lec1 4 / 53

Requirements and Marking

Three (3) assignments

Combination of pen & paper derivations and programming exercises

Due Oct 8, Nov 8, Dec 7 at 11:59pm

October 25, 8–9pm.

Final Exam

Two hours

Date and time TBA

Intro ML (UofT) CSC311-Lec1 5 / 53

More on Assignments

Collaboration on the assignments is not allowed. Each student is responsible for their own work. Discussion of assignments should be limited to clarification of the handout itself, and should not involve any sharing of pseudocode or code or simulation results. Violation of this policy is grounds for a semester grade of F, in accordance with university regulations.

The schedule of assignments will be posted on the course webpage.

Extensions will be granted only in special situations, and you will need a Student Medical Certificate or a written request approved by the course coordinator at least one week before the due date.

Intro ML (UofT) CSC311-Lec1 6 / 53

Related Courses and a Note on the Content

More advanced ML courses such as CSC413 (Neural Networks and Deep Learning) and CSC412 (Probabilistic Learning and Reasoning) both build upon the material in this course.

If you’ve already taken an applied statistics course, there will be some overlap.

Warning: This is not a very difficult course, but it has a lot of content. To be successful, you need to work hard. For example, don’t start working on your homework assignments just two days before the deadline.

Intro ML (UofT) CSC311-Lec1 7 / 53

What is learning?

“The activity or process of gaining knowledge or skill by studying, practicing, being taught, or experiencing something.”

dictionary

“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

Intro ML (UofT) CSC311-Lec1 8 / 53

What is machine learning?

For many problems, it is difficult to program the correct behaviour by hand

recognizing people and objects

understanding human speech

Machine learning approach: program an algorithm to automatically learn from data, or from experience Why might you want to use a learning algorithm?

hard to code up a solution by hand (e.g. vision, speech)

system needs to adapt to a changing environment (e.g. spam

detection)

want the system to perform better than the human programmers

privacy/fairness (e.g. ranking search results)

Intro ML (UofT) CSC311-Lec1

What is machine learning?

It is similar to statistics…

Both fields try to uncover patterns in data

Both fields draw heavily on calculus, probability, and linear algebra,

and share many of the same core algorithms But it is not statistics!

Stats is more concerned with helping scientists and policymakers draw good conclusions; ML is more concerned with building autonomous agents

Stats puts more emphasis on interpretability and mathematical rigor; ML puts more emphasis on predictive performance, scalability, and autonomy

Intro ML (UofT) CSC311-Lec1

Relations to AI

Nowadays, “machine learning” is often brought up with “artificial intelligence” (AI)

AI does not often imply a learning based system

Symbolic reasoning

Rule based system

Tree search etc.

Learning based system → learned based on the data → more flexibility, good at solving pattern recognition problems.

Intro ML (UofT) CSC311-Lec1

Relations to human learning

Human learning is:

Very data efficient

An entire multitasking system (vision, language, motor control, etc.)

Takes at least a few years 🙂

For serving specific purposes, machine learning doesn’t have to look like human learning in the end.

It may borrow ideas from biological systems, e.g., neural networks. It may perform better or worse than humans.

Intro ML (UofT) CSC311-Lec1 12 / 53

What is machine learning?

Types of machine learning

Supervised learning: access to labeled examples of the correct

Reinforcement learning: learning system (agent) interacts with

the world and learn to maximize a reward signal

Unsupervised learning: no labeled examples – instead, looking

for “interesting” patterns in the data

Intro ML (UofT) CSC311-Lec1 13 / 53

History of machine learning

ML conferences selling out like Beyonce tickets.

Source: medium.com/syncedreview/

Intro ML (UofT) CSC311-Lec1 14 / 53

History of machine learning

ML conferences selling out like Beyonce tickets.

Intro ML (UofT) CSC311-Lec1 14 / 53

Computer vision: Object detection, semantic segmentation, pose estimation, and almost every other task is done with ML.

Instance segmentation –

Intro ML (UofT)

CSC311-Lec1 15 / 53

Speech: Speech to text, personal assistants, speaker identification…

Intro ML (UofT) CSC311-Lec1 16 / 53

NLP: Machine translation, sentiment analysis, topic modeling, spam filtering.

Intro ML (UofT) CSC311-Lec1 17 / 53

Playing Games

Intro ML (UofT) CSC311-Lec1 18 / 53

E-commerce & Recommender Systems : Amazon, Netflix, …

Intro ML (UofT) CSC311-Lec1 19 / 53

Why this class?

Why not jump straight to CSC412/413, and learn neural nets first?

The principles you learn in this course will be essential to really understand neural nets.

The techniques in this course are still the first things to try for a new ML problem.

For example, you should try applying logistic regression before building a deep neural net!

There is a whole world of probabilistic graphical models.

Intro ML (UofT) CSC311-Lec1 20 / 53

Why this class?

2017 Kaggle survey of data science and ML practitioners: what data science methods do you use at work?

Intro ML (UofT) CSC311-Lec1 21 / 53

Implementing machine learning systems

You will often need to derive an algorithm (with pencil and paper), and then translate the math into code.

Array processing (NumPy)

vectorize computations (express them in terms of matrix/vector operations) to exploit hardware efficiency

This also makes your code cleaner and more readable!

Intro ML (UofT) CSC311-Lec1 22 / 53

Preliminaries and Nearest Neighbourhood Methods

Intro ML (UofT) CSC311-Lec1 23 / 53

Introduction

Today (and for the next 5-6 lectures) we focus on supervised learning. This means we are given a training set consisting of inputs and

corresponding labels, e.g.

object recognition image captioning document classification speech-to-text

Inputs image image

text audio waveform

Labels object category caption document category text

Intro ML (UofT)

CSC311-Lec1

Supervised Learning Example

What an image looks like to the computer:

[Image credit: ]

Intro ML (UofT) CSC311-Lec1 25 / 53

Representing the Input

Machine learning algorithms need to handle lots of types of data: images, text, audio waveforms, credit card transactions, etc.

Common strategy: represent the input as an input vector in Rd

Representation = mapping to another space that is easy to manipulate

Vectors are a great representation since we can do linear algebra

Intro ML (UofT) CSC311-Lec1

Supervised Learning Setup

Mathematically, our training set consists of a collection of pairs of an input vector x ∈ Rd and its corresponding target, or label, t

Regression: t is a real number (e.g. stock price)

Classification: t is an element of a discrete set {1, . . . , C }

These days, t is often a highly structured object (e.g. image)

Denote the training set {(x(1), t(1)), . . . , (x(N), t(N))}

Note: these superscripts have nothing to do with exponentiation!

Intro ML (UofT) CSC311-Lec1 27 / 53

Input Vectors

Can use raw pixels:

Can do much better if you compute a vector of meaningful features.

Intro ML (UofT) CSC311-Lec1 28 / 53

Nearest Neighbour

Our first supervised learning model!

Suppose we’re given a novel input vector x we’d like to classify.

The idea: find the nearest input vector to x in the training set and copy its label.

Intro ML (UofT) CSC311-Lec1 29 / 53

Nearest Neighbor: Example

Example: We would like to classify which Beetle is in this image x:

We find the closest image x(i) in the training set, and look at its label.

Intro ML (UofT) CSC311-Lec1 30 / 53

Nearest Neighbor: Example

Example: We would like to classify which Beetle is in this image x:

We find the closest image x(i) in the training set, and look at its label.

Intro ML (UofT) CSC311-Lec1 30 / 53

Nearest Neighbor: Example

Example: We would like to classify which Beetle is in this image x:

We find the closest image x(i) in the training set, and look at its label.

Intro ML (UofT) CSC311-Lec1 30 / 53

Nearest Neighbor: Example

Example: We would like to classify which Beetle is in this image x:

We find the closest image x(i) in the training set, and look at its label.

But how do we determine what is the “closest” image?

Intro ML (UofT) CSC311-Lec1 30 / 53

The 1-Nearest Neighbor Algorithm

Can formalize “nearest” in terms of Euclidean distance

||x(a) − x(b)||2 = (x(a) − x(b))2

Algorithm:

input x (that we would like to classify),

training set (x(1), t(1)), (x(2), t(2)), . . . .

Find an example (x(∗),t(∗)) in the training set closest to x,

i.e. that minimizes distance(x(∗), x). 2. Output y = t(∗)

Note: we do not need to compute the square root. Why?

Intro ML (UofT) CSC311-Lec1

Nearest Neighbors: Decision Boundaries

We can visualize the behaviour in the classification setting using a Voronoi diagram.

Here, our data set x ∈ R2 (e.g. a 2-pixel image), and there are two possible labels t either red or blue.

Intro ML (UofT) CSC311-Lec1 32 / 53

Nearest Neighbors: Decision Boundaries

Decision boundary: the boundary between regions of input space assigned to different categories.

Intro ML (UofT) CSC311-Lec1 33 / 53

Nearest Neighbors: Decision Boundaries

Example: Decision boundary in a 3D data set.

Intro ML (UofT) CSC311-Lec1 34 / 53

Estimating Model Accuracy

Typically, we want to have some sense of how well our model performs.

For example, we want to use this model in a real-world application, and want to know how reliable the predictions will be.

There are several ways to measure model performance. The simplest is the accuracy mesaure:

Accuracy = Number of Correct Predictions Total Number of Predictions

Intro ML (UofT) CSC311-Lec1 35 / 53

The Test Set

What might be an issue with computing the accuracy using the training set?

Intro ML (UofT) CSC311-Lec1 36 / 53

The Test Set

What might be an issue with computing the accuracy using the training set? We will always get perfect performance in our 1-nearest neighbor model

(a data point is always closest to itself!)

Not a realistic representation of how our model will perform on unseen data.

Intro ML (UofT) CSC311-Lec1 36 / 53

The Test Set

What might be an issue with computing the accuracy using the training set? We will always get perfect performance in our 1-nearest neighbor model

(a data point is always closest to itself!)

Not a realistic representation of how our model will perform on unseen data.

Solution: set aside a test set (from our set of labeled data) so we can estimate how well our model will generalize.

Intro ML (UofT) CSC311-Lec1 36 / 53

Nearest Neighbors

[Pic by ]

Nearest neighbors sensitive to noise or mis-labeled data (“class noise”). Solution?

Intro ML (UofT) CSC311-Lec1 37 / 53

k-Nearest Neighbors

[Pic by ]

Nearest neighbors sensitive to noise or mis-labeled data (“class noise”). Solution?

Smooth by having k nearest neighbors vote

Intro ML (UofT) CSC311-Lec1 37 / 53

the K-Nearest neighbors Algorithm

Algorithm (kNN):

1. Find k examples {x(i),t(i)} closest to the test instance x

2. Classification output is majority class k

y = argmax I{t(z) = t(i)}

I{statement} is the identity function and is equal to one whenever the statement is true. We could also write this as δ(t(z), t(i)) with δ(a, b) = 1 if a = b, 0 otherwise. I{1}.

Intro ML (UofT) CSC311-Lec1 38 / 53

K-Nearest neighbors (k=1)

[Image credit: ”The Elements of Statistical Learning”]

Intro ML (UofT) CSC311-Lec1 39 / 53

K-Nearest neighbors (k=15)

[Image credit: ”The Elements of Statistical Learning”]

Intro ML (UofT) CSC311-Lec1 40 / 53

How to choose k? Small k

Good at capturing fine-grained patterns

May overfit, i.e. be sensitive to random idiosyncrasies in the

training data Large k

Makes stable predictions by averaging over lots of examples

May underfit, i.e. fail to capture important regularities

Balancing k:

The optimal choice of k depends on the number of data points n.

Nice theoretical properties if k → ∞ and nk → 0. 2

Rule of thumb: Choose k = n2+d .

We will explain another way to choose k using our data.

Intro ML (UofT) CSC311-Lec1 41 / 53

Underfitting and Overfitting

[Image credit: https://cambridgecoding.wordpress.com/2016/03/24/

misleading- modelling- overfitting- cross- validation- and- the- bias- variance- trade- off/

Intro ML (UofT) CSC311-Lec1 42 / 53

Choice of k: Generalization

We would like our algorithm to generalize to data it hasn’t seen before.

We can measure the generalization error (error rate on new examples) using the test set.

[Image credit: ”The Elements of Statistical Learning”]

Intro ML (UofT) CSC311-Lec1 43 / 53

The problem with using the test set

Here, k is an example of a hyperparameter, something we can’t fit as part of the learning algorithm itself.

What are potential issues with using the test set to choose k?

Intro ML (UofT) CSC311-Lec1 44 / 53

The problem with using the test set

Here, k is an example of a hyperparameter, something we can’t fit as part of the learning algorithm itself.

What are potential issues with using the test set to choose k?

The test set is meant to simulate unseen data, in order to estimate how

well the model generalizes.

We therefore shouldn’t use the test set to make any decisions about any aspects of the model!

Keep the test set unseen, so that we can estimate model accuracy.

Intro ML (UofT) CSC311-Lec1 44 / 53

The problem with using the test set

Here, k is an example of a hyperparameter, something we can’t fit as part of the learning algorithm itself.

What are potential issues with using the test set to choose k?

The test set is meant to simulate unseen data, in order to estimate how

well the model generalizes.

We therefore shouldn’t use the test set to make any decisions about any aspects of the model!

Keep the test set unseen, so that we can estimate model accuracy. What can we do instead?

Intro ML (UofT) CSC311-Lec1 44 / 53

Validation and Test Sets

Partition the data into three subsets

Training set: used to train the model

Validation set: used to tune hyperparameters

Test set: used to evaluate final performance once, after hyperparameters are chosen

Intro ML (UofT) CSC311-Lec1 45 / 53

Pitfalls: Normalization

Nearest neighbors can be sensitive to the ranges of different features. Often, the units are arbitrary:

Simple fix: normalize each dimension to be zero mean and unit variance. I.e., compute the mean μj and standard deviation σj, and take

x ̃ j = x j − μ j σj

Caution: depending on the problem, the scale might be important!

Intro ML (UofT) CSC311-Lec1 46 / 53

Pitfalls: Computational Cost

Number of computations at training time: 0

Number of computations at test time, per query (na ̈ıve algorithm)

Calculuate D-dimensional Euclidean distances with N data points: O(ND)

Sort the distances: O(N log N )

This must be done for each query, which is very expensive by the

standards of a learning algorithm!

Need to store the entire dataset in memory!

Tons of work has gone into algorithms and data structures for efficient nearest neighbors with high dimensions and/or large datasets.

Intro ML (UofT) CSC311-Lec1 47 / 53

Example: Digit Classification

Decent performance when lots of data

[SlideIcnrterodiMt:LD(.UCoflTau)s] CSC311-Lec1 48/53

Example: Digit Classification

KNN can perform a lot better with a good similarity measure.

Example: shape contexts for object recognition. In order to achieve invariance to image transformations, they tried to warp one image to match the other image.

Distance measure: average distance between corresponding points on warped images

Achieved 0.63% error on MNIST, compared with 3% for Euclidean KNN. Competitive with conv nets at the time, but required careful engineering.

[Belongie, Malik, and Puzicha, 2002. Shape matching and object recognition using shape contexts.]

Intro ML (UofT) CSC311-Lec1 49 / 53

Example: 80 Million Tiny Images

80 Million Tiny Images was the first extremely large image dataset. It consisted of color images scaled down to 32×32.

With a large dataset, you can find much better semantic matches, and KNN can do some surprising things.

Note: this required a carefully chosen similarity metric.

[Torralba, Fergus, and Freeman, 2007. 80 Million Tiny Images.]

Intro ML (UofT) CSC311-Lec1 50 / 53

Example: Colorizing 80 Million Tiny Images

[Torralba, Fergus, and Freeman, 2007. 80 Million Tiny Images.]

Intro ML (UofT) CSC311-Lec1 51 / 53

Conclusions

Simple algorithm that does all its work at test time — in a sense, no learning!

Can be used for regression too, which we encounter later. Can control the complexity by varying k

Next time: decision trees, another approach to regression and classification

Intro ML (UofT) CSC311-Lec1 52 / 53

Questions?

Intro ML (UofT) CSC311-Lec1 53 / 53