# Lecture 3: K-Nearest Neighbors


Introduction to Machine Learning Semester 2, 2021

Copyright © The University of Melbourne 2021. All rights reserved. No part of this publication may be reproduced in any form by print, photoprint, microfilm or any other means without written permission from the author.


Last time… Machine Learning concepts

• data, features, classes
• models, training

• practical considerations

Today… Our first machine learning algorithm

• K-nearest neighbors

• Application to classification
• Application to regression

Also: the topic of your first assignment!

• Released on Canvas on Wed 4th August at 7pm!

• Questions: assignment1 discussion board (don’t share solutions!)

Introduction

K-Nearest Neighbors: Example

Your ’photographic memory’ of all handwritten digits you’ve ever seen:

Given a new drawing, determine the digit by comparing it to all digits in your ’memory’.

K-Nearest Neighbors: Visualization

K nearest neighbors = K closest stored data points

K-Nearest Neighbors: Visualization

1 nearest neighbor = single closest stored data point

K-Nearest Neighbors: Visualization

4 nearest neighbors = 4 closest stored data points

K-Nearest Neighbors: Algorithm

• Store all training examples

• Compute distance of test instance to all training data points

• Find the K closest training data points (nearest neighbors)

• Compute the target concept of the test instance based on the labels of these nearest neighbors
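
A minimal sketch of these steps in Python (the helper name `k_nearest_neighbors` and the use of Euclidean distance are illustrative choices, not fixed by the lecture):

```python
import numpy as np

def k_nearest_neighbors(X_train, x_test, k):
    """Return the indices of the K stored training points closest to x_test."""
    # Step 1: the training data is simply stored (X_train).
    # Step 2: compute the distance from the test instance to every training point
    #         (Euclidean distance here; any distance from this lecture could be used).
    distances = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
    # Step 3: take the K training points with the smallest distances.
    return np.argsort(distances)[:k]
```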

K-Nearest Neighbors: Target Concepts

KNN Classification

• Return the most common class label among the K nearest neighbors
• Examples: cat vs dog images; text classification; …

KNN Regression

• Return the average target value among the K nearest neighbors
• Example: housing price prediction
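
A rough sketch of the two prediction rules, reusing the hypothetical `k_nearest_neighbors` helper from the sketch above:

```python
from collections import Counter
import numpy as np

def knn_classify(X_train, y_train, x_test, k):
    # Classification: return the most common class label among the neighbors.
    idx = k_nearest_neighbors(X_train, x_test, k)
    return Counter(y_train[idx]).most_common(1)[0][0]

def knn_regress(X_train, y_train, x_test, k):
    # Regression: return the average target value among the neighbors.
    idx = k_nearest_neighbors(X_train, x_test, k)
    return float(np.mean(y_train[idx]))
```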

Four problems

1. How to represent each data point?
2. How to measure the distance between data points?
3. What if the neighbors disagree?
4. How to select K?

Feature Vectors

A data set of 6 instances (a…f) with 4 features and a label

We can represent each instance as a feature vector

Feature (or attribute) Types

Recall from last lecture:

1. Nominal

• set of values with no intrinsic ordering
• possibly boolean

2. Ordinal

• explicitly ordered

3. Numerical

• real-valued, often no upper bound, easy to manipulate mathematically
• possibly vector-valued

1. How to represent each data point?
2. How to measure the distance between data points?
3. What if the neighbors disagree?
4. How to select K?

Comparing Nominal Feature Vectors

First, we convert the nominal features into numeric (binary) features.

| instance | colour | shape  | taste | size  |
|----------|--------|--------|-------|-------|
| apple    | red    | round  | sweet | –     |
| banana   | yellow | curved | sweet | –     |
| cherry   | red    | round  | sweet | small |

| instance | red | yellow | round | sweet | curved | small |
|----------|-----|--------|-------|-------|--------|-------|
| apple    | 1   | 0      | 1     | 1     | 0      | ?     |
| banana   | 0   | 1      | 0     | 1     | 1      | ?     |
| cherry   | 1   | 0      | 1     | 1     | 0      | 1     |

Comparing Nominal Features: Hamming Distance

Hamming distance: the number of differing elements in two ‘strings’ of equal length.

Using the binary feature vectors above (apple and banana differ on red, yellow, round and curved):

d(apple, banana) = 4

Comparing Nominal Features: Simple Matching Distance

Simple matching distance: one minus the number of matching features divided by the total number of features.

d = 1 − k/m

• d: distance
• k: number of matching features
• m: total number of features

Using the binary feature vectors above:

d(apple, banana) = 1 − 2/6 = 4/6

Comparing Nominal Feature Vectors: Jaccard Distance

Jaccard similarity J: the size of the intersection of two sets divided by the size of their union.

d = 1 − J
  = 1 − |A ∩ B| / |A ∪ B|
  = 1 − |A ∩ B| / (|A| + |B| − |A ∩ B|)

Using the sets of features present in each instance (A = {red, round, sweet} for apple, B = {yellow, sweet, curved} for banana):

d(apple, banana) = 1 − 1/5 = 4/5
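
A small sketch of the three measures applied to the apple and banana vectors from the table above; encoding the unknown ‘small’ value as `None` is an assumption, but with it the slide’s numbers are reproduced:

```python
# Binary feature vectors from the table; None marks the unknown 'small' value.
apple  = [1, 0, 1, 1, 0, None]
banana = [0, 1, 0, 1, 1, None]

def hamming(a, b):
    # Number of positions where both values are known and differ.
    return sum(1 for x, y in zip(a, b)
               if x is not None and y is not None and x != y)

def simple_matching(a, b):
    # 1 - (matching positions / total positions); unknown-vs-unknown counts
    # as a match, which reproduces the slide's 1 - 2/6.
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 1 - matches / len(a)

def jaccard(a, b):
    # Treat each vector as the set of features that are present (value 1).
    A = {i for i, x in enumerate(a) if x == 1}
    B = {i for i, x in enumerate(b) if x == 1}
    return 1 - len(A & B) / len(A | B)

print(hamming(apple, banana))          # 4
print(simple_matching(apple, banana))  # 1 - 2/6 ≈ 0.67
print(jaccard(apple, banana))          # 1 - 1/5 = 0.8
```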

Comparing Numerical Feature Vectors: Manhattan Distance

Manhattan Distance (or: L1 distance)

• Given two instances a and b, each with a set of numerical features, e.g.,
  a = [2.0, 1.4, 4.6, 5.5]
  b = [1.0, 2.4, 6.6, 2.5]
• Their distance d is the sum of the absolute differences of each feature:

d(a, b) = Σ_i |a_i − b_i|

d(a, b) = |2.0 − 1.0| + |1.4 − 2.4| + |4.6 − 6.6| + |5.5 − 2.5|
        = 1 + 1 + 2 + 3 = 7

Comparing Numerical Feature Vectors: Euclidean Distance

Euclidean Distance (or: L2 distance)

• Given two instances a and b, each with a set of numerical features, e.g.,
  a = [2.0, 1.4, 4.6, 5.5]
  b = [1.0, 2.4, 6.6, 2.5]
• Their distance d is the straight-line distance in Euclidean space, defined as the square root of the sum of squared differences of each feature:

d(a, b) = √( Σ_i (a_i − b_i)² )

d(a, b) = √( (2.0 − 1.0)² + (1.4 − 2.4)² + (4.6 − 6.6)² + (5.5 − 2.5)² )
        = √(1 + 1 + 4 + 9) = √15 ≈ 3.87
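
As a quick check of the two worked examples, a minimal NumPy sketch:

```python
import numpy as np

a = np.array([2.0, 1.4, 4.6, 5.5])
b = np.array([1.0, 2.4, 6.6, 2.5])

manhattan = np.abs(a - b).sum()            # 1 + 1 + 2 + 3 = 7.0
euclidean = np.sqrt(((a - b) ** 2).sum())  # sqrt(1 + 1 + 4 + 9) = sqrt(15) ≈ 3.87

print(manhattan, euclidean)
```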

Comparing Numerical Feature Vectors: Cosine distance

Cosine Distance

• Cosine similarity = cosine of the angle between two vectors (= inner product of the normalized vectors)
• Cosine distance d: one minus the cosine similarity

cos(a, b) = (a · b) / (|a| |b|) = Σ_i a_i b_i / ( √(Σ_i a_i²) √(Σ_i b_i²) )
d(a, b) = 1 − cos(a, b)

• Cosine distance is normalized by the magnitude of both feature vectors, i.e., we can compare instances of different magnitude
  → word counts: compare long vs short documents
  → pixels: compare high vs low resolution images

Comparing Numerical Feature Vectors: Cosine distance

cos(a, b) = (a · b) / (|a| |b|) = Σ_i a_i b_i / ( √(Σ_i a_i²) √(Σ_i b_i²) )
d(a, b) = 1 − cos(a, b)

| feature | doc1 | doc2 | doc3 |
|---------|------|------|------|
| word1   | 200  | 300  | 50   |
| word2   | 300  | 200  | 40   |
| word3   | 200  | 100  | 25   |

cos(doc1, doc2) = (200×300 + 300×200 + 200×100) / ( √(200² + 300² + 200²) · √(300² + 200² + 100²) ) ≈ 0.91
d(doc1, doc2) ≈ 0.09

cos(doc2, doc3) = (300×50 + 200×40 + 100×25) / ( √(300² + 200² + 100²) · √(50² + 40² + 25²) ) ≈ 0.99
d(doc2, doc3) ≈ 0.01
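
A minimal sketch that reproduces this worked example (the vector names are illustrative):

```python
import numpy as np

def cosine_distance(a, b):
    # One minus the cosine of the angle between the two count vectors.
    return 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

doc1 = np.array([200, 300, 200])
doc2 = np.array([300, 200, 100])
doc3 = np.array([50, 40, 25])

print(cosine_distance(doc1, doc2))  # ≈ 0.09
print(cosine_distance(doc2, doc3))  # ≈ 0.01
```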

Comparing Ordinal Feature Vectors

Normalized Ranks

• sort the values and assign each a rank r ∈ {0, …, m}
• map the ranks to evenly spaced values between 0 and 1
• compute a distance function for numeric features (e.g., Euclidean distance)

Example: Customer ratings on the ordinal scale −2 < −1 < 1 < 2

1. Sorted ratings: {−2, −1, 1, 2}
2. Ranks: {0, 1, 2, 3}
3. Evenly spaced values in [0, 1]: {0, 1/3, 2/3, 1}
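
A small sketch of the normalized-rank conversion for the rating example (the scale values are taken from the example above):

```python
# Ordinal rating scale from the example: -2 < -1 < 1 < 2.
scale = sorted([-2, -1, 1, 2])

# Assign ranks 0..m, then map them to evenly spaced values in [0, 1].
ranks = {value: r for r, value in enumerate(scale)}
normalized = {value: r / (len(scale) - 1) for value, r in ranks.items()}

print(normalized)  # {-2: 0.0, -1: 0.333..., 1: 0.666..., 2: 1.0}
# The resulting numeric values can be fed into any numeric distance,
# e.g. the Euclidean distance defined earlier.
```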

Four problems

1. How to represent each data point?
2. How to measure the distance between data points?
3. What if the neighbors disagree?
4. How to select K?

Majority Voting

Equal weights (= majority vote)

Voting Example (k=4)

• w1 = w2 = w3 = w4 = 1
• red: 1
• blue: 1 + 1 + 1 = 3 → predict blue

Weighted KNN: Inverse Distance

Inverse Distance

w_j = 1 / (d_j + ε),  with ε ≈ 0, e.g., 1e−10

Voting Example (k=4)

• d1 = 0; d2 = 1; d3 = d4 = 1.5; ε = 1e−5
• red: 1 / (0 + ε) = 100000
• blue: 1 / (1 + ε) + 1 / (1.5 + ε) + 1 / (1.5 + ε) = 1.0 + 0.67 + 0.67 = 2.34 → predict red

Weighted KNN: Inverse Linear Distance

Inverse Linear Distance

w_j = (d_k − d_j) / (d_k − d_1)

• d_1 = min distance among the neighbors
• d_k = max distance among the neighbors
• d_j = distance of the jth neighbor

Voting Example (k=4)

• d1 = 0; d2 = 1; d3 = d4 = 1.5
• red: (1.5 − 0) / (1.5 − 0) = 1
• blue: (1.5 − 1) / (1.5 − 0) + (1.5 − 1.5) / (1.5 − 0) + (1.5 − 1.5) / (1.5 − 0) = 0.33 + 0 + 0 = 0.33 → predict red
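
A short sketch comparing the three weighting schemes on this worked example (one red neighbor at d=0 and three blue neighbors at d=1, 1.5, 1.5); the data structure and function names are illustrative:

```python
neighbors = [("red", 0.0), ("blue", 1.0), ("blue", 1.5), ("blue", 1.5)]

def vote(neighbors, weight):
    # Sum the weight of each neighbor per class label.
    scores = {}
    for label, d in neighbors:
        scores[label] = scores.get(label, 0.0) + weight(d)
    return scores

eps = 1e-5
dists = [d for _, d in neighbors]
d_min, d_max = min(dists), max(dists)

print(vote(neighbors, lambda d: 1.0))                            # equal weights: red 1, blue 3
print(vote(neighbors, lambda d: 1.0 / (d + eps)))                # inverse distance: red 100000, blue ≈ 2.33
print(vote(neighbors, lambda d: (d_max - d) / (d_max - d_min)))  # inverse linear: red 1, blue ≈ 0.33
```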

Four problems

1. How to represent each data point?
2. How to measure the distance between data points?
3. What if the neighbors disagree?
4. How to select K?

Selecting the value of K

Small K:

• jagged decision boundary
• we capture noise
• lower classifier performance

Large K:

• smooth decision boundary
• danger of grouping together unrelated classes
• also: lower classifier performance!
• what if K == N? (N = number of training instances)

Sketch the validation error as a function of K:
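
A hedged sketch of selecting K by validation error, assuming a held-out validation set and the hypothetical `knn_classify` helper sketched earlier:

```python
import numpy as np

def validation_error(X_train, y_train, X_val, y_val, k):
    # Fraction of validation instances that the K-NN classifier gets wrong.
    predictions = np.array([knn_classify(X_train, y_train, x, k) for x in X_val])
    return float(np.mean(predictions != y_val))

def select_k(X_train, y_train, X_val, y_val, candidates=(1, 3, 5, 7, 9, 15, 25)):
    # Evaluate each candidate K and keep the one with the lowest validation error.
    errors = {k: validation_error(X_train, y_train, X_val, y_val, k) for k in candidates}
    best_k = min(errors, key=errors.get)
    return best_k, errors
```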

Breaking Ties

What if more than K neighbors have the same (smallest) distance?

• select at random

• change the distance metric

What if two classes are equally likely given the current neighborhood?

• avoid an even K

• random tie breaking

• include the (K+1)th neighbor

• use class with highest prior probability

pollev.com/comp90049

Strengths:

• Intuitive and simple
• No assumptions
• Supports classification and regression
• No training: new data → evolve and adapt immediately

Weaknesses:

• How to decide on the best distance function?
• How to combine multiple neighbors?
• How to select K?
• Expensive with large (or growing) data sets

Lazy vs Eager Learning

Lazy Learning (also known as Instance-based Learning)

• store the training data

• fixed distance function

• fixed prediction rule (majority, weighting, …)
• compare test instances with stored instances
• no learning

Lazy vs Eager Learning

Eager Learning

• train a model using labelled training instances

• the model will generalize from seen data to unseen data
• use the model to predict labels for test instances

• we will look at a variety of eager models and their learning algorithms over the next couple of weeks

Today… Our first machine learning algorithm

• K-nearest neighbors

• Application to classification
• Application to regression

Also: the topic of your first assignment!

Next: Probabilities (recap) and probabilistic modeling

Further Reading

• Data Mining: Concepts and Techniques, 2nd ed., Jiawei Han and Micheline Kamber, 2006. Chapter 2, Chapter 9.5.

• The Elements of Statistical Learning, 2nd ed., Trevor Hastie, Robert Tibshirani, and Jerome Friedman. Springer Series in Statistics, 2001. Chapter 2.3.2.