# CS代考 WiSe 2021/22 – cscodehelp代写

WiSe 2021/22

Machine Learning 1/1-X

Lecture 1 Intro + Bayes

Copyright By cscodehelp代写 加微信 cscodehelp

Introduction: Machine Learning

What is Machine Learning?

� Designing algorithms (machines) that efficiently convert finite sets of data (observations) into models with predictive capability.

Why Machine Learning?

� Autonomous Decision Making: Substitute/support human decision making for achieving gains in efficiency in practical applications.

� Knowledge Discovery: Finding laws that explain empirical phenomena.

Autonomous Decision Making

Modern economies involve taking good decisions in com- plex multi-dimensional problems. In many cases, for effi- ciency reasons, it is necessary that these decisions are taken autonomously, or with the help of predictive models.

� Supply Chain Management (demand forecasting, planning)

� End-User Services (recommendations, personalization,

translation, monitoring)

� Manufacturing

(quality control, materials

optimization)

� Finance/Insurance

(risk management, forecasting)

Autonomous Decision Making

Modern economies involve taking good decisions in com- plex multi-dimensional problems. In many cases, for effi- ciency reasons, it is necessary that these decisions are taken autonomously, or with the help of predictive models.

� Supply Chain Management

(demand forecasting, planning)

� End-User Services (recommendations, personalization,

translation, monitoring)

� Manufacturing

(quality control, materials

optimization)

� Finance/Insurance

(risk management, forecasting)

Demand Forecasting

Source: IBM

Autonomous Decision Making

Modern economies involve taking good decisions in com- plex multi-dimensional problems. In many cases, for effi- ciency reasons, it is necessary that these decisions are taken autonomously, or with the help of predictive models.

� Supply Chain Management (demand forecasting, planning)

� End-User Services (recommendations, personalization,

translation, monitoring)

� Manufacturing

(quality control, materials

optimization)

� Finance/Insurance

(risk management, forecasting)

Elderly Care

Adapted from Islam et al. (2020) Deep Learning Based Systems Developed for Fall

Detection: A Review

Autonomous Decision Making

Modern economies involve taking good decisions in com- plex multi-dimensional problems. In many cases, for effi- ciency reasons, it is necessary that these decisions are taken autonomously, or with the help of predictive models.

� Supply Chain Management (demand forecasting, planning)

� End-User Services (recommendations, personalization,

translation, monitoring)

� Manufacturing

(quality control, materials

optimization)

� Finance/Insurance

(risk management, forecasting)

Surface Inspection

Source: MvTec

Autonomous Decision Making

Modern economies involve taking good decisions in com- plex multi-dimensional problems. In many cases, for effi- ciency reasons, it is necessary that these decisions are taken autonomously, or with the help of predictive models.

� Supply Chain Management (demand forecasting, planning)

� End-User Services (recommendations, personalization,

translation, monitoring)

� Manufacturing

(quality control, materials

optimization)

� Finance/Insurance

(risk management, forecasting)

Forecasting (prices/claims)

Image source: https://www.kaggle.com/robikscube/m5- forecasting-starter-data-exploration

1st-Gen Decision Systems

� The human programs the decision system by hand (e.g. using if/else controls) in a way that it replicates his own decision strategy.

� If the system performs as expected on the few available test cases, it is then run autonomously on new instances.

Source: https://sites.google.com/site/keremitgsnotes

1st-Gen Decision Systems

� The human programs the decision system by hand (e.g. using if/else controls) in a way that it replicates his own decision strategy.

� If the system performs as expected on the few available test cases, it is then run autonomously on new instances.

� What if the user is not able to translate his own decision behavior into an actual program? (e.g. how does one detect objects in natural images?)

? wood scratch

2nd-Gen Decision Systems

Idea: The human collects a dataset of examples, and la- bels them according to his own decision strategy:

Good examples: Examples with scratches:

A machine learning model is trained to map each example (i.e. the array of pixels received as input) to the correct class.

Bayes Decision Theory

Bayesian Decision Theory

A theoretical framework for building decision models that:

� Yields optimal classifiers (under some favourable conditions). � Helps us to understand how

� properties of the data distribution (e.g. Gaussian-distributed),

� prior probabilities of classes,

� the criterion to optimize (e.g. maximum classification accuracy),

affect the classifier qualitatively and quantitatively.

Bayes Decision Theory

Example: (from the book Duda et al. 2000) � Fishes of various species (salmon and

sea bass) arrive on a conveyor belt.

� Sensors placed on the conveyor belt produce a collection of measurements for each observed fish (e.g. length, lightness).

� We would like to build a decision model that assign each fish to one of the two possible classes (salmon and sea bass).

Bayesian Decision Theory

� ω1,ω2,… set of classes (e.g. salmon, sea bass, …),

We are also given the probability laws:

� P(ωj ): probability of being of class j,

� p( | ωj ): density function of measurements for each class,

� p( ): density function of measurements (marginalized),

but what we are truly interested in is:

� P(ωj | ): probability of being of a certain class after observing .

∈ Rd vector of observations (e.g. x1 is the length and x2 is the lightness).

Bayesian Decision Theory

Bayes Theorem:

P(ωj| )=p( |ωj)·P(ωj) p( )

Image source: Duda et al. (2000)

Optimally Accurate Classifier

optimal decision function: arg max P (ωj | ) j

Image source: Duda et al. (2000)

Optimally Accurate Classifier

optimal decision function: arg max P (ωj | ) j

Alternate formulations of the decision:

Multivariate Normal Distributions

p( |ωj)=� 1 exp�−1( −μ)�Σ−1( −μ)� (2π)ddet(Σj) 2 j j j

� μj is the mean (center of the data distribution)

� Σj is the covariance (elongation of the data distribution and correlation between dimensions).

Image source: Duda et al. (2000)

Classifier for Gaussians (Σ1 = Σ2)

Recall: The optimal classifier is arg maxj [log p( |ωj ) + logP(ωj)], and we have the data distributions:

p( |ωj)=� 1 exp�−1( −μ)�Σ−1( −μ)� (2π)ddet(Σj) 2 j j j

Classifier for Gaussians (Σ1 = Σ2)

Image source: Duda et al. (2000)

� Decision boundary is linear and oriented by mean and covariance. � Offset is controlled by class prior probabilities.

Classifier for Gaussians (Σ1 �= Σ2)

� When covariances Σ1 and Σ2 are not the same, the decision boundary is quadric instead of linear. Quadrics include circle, ellipse, parabola, hyperbola, and degenerate forms.

Image source: Duda et al. (2000)

Classifying Non-Numerical Data

Example: Spam Classifier

� Observation: Messages M that need to be handled by the spam classifier come as text and not as numerical vectors.

� Common approach: Represent a message M as a collection of binary predicates testing for typical spam

∈ {0, 1}d

1{“claim”∈M} .

words, and forming a vector containing these predicates:

1{“gift”∈M}

1{“relief ”∈M} = 1{“pain”∈M}

Classifying Binary Data

� Assume that our data is binary, i.e. ∈ {0, 1}d , with each dimension generated independently according to some Bernoulli distribution:

P(xi =0|ωj)=1−qij P(xi =1|ωj)=qij

where qij are the parameters.

� The probability of the whole multivariate observation can be written as:

P( |ωij)=�d �qijxi +(1−qij)·(1−xi)� i=1

� Question: How to express the optimal decision boundary

argmaxP(ωj| ) j

Classifying Binary Data

Recall: The optimal classifier is arg maxj [log p( |ωj ) + logP(ωj)], and we have the data distributions:

P( |ωj)=�d �qijxi +(1−qij)·(1−xi)� i=1

Minimum Cost Decisions

Example: Buying a Car

� Suppose you would like to purchase a second-hand car. After observing the car (collecting a vector of measurements ), you assess that it has a defect with probability

P(defect | ) = 0.1 P(no defect | ) = 0.9

� Concretely, the decision you need to take is not to classify the whether the car has a defect or not, but whether to buy the car or not.

� For this, we need to evaluate the cost of each scenario, e.g.

cost ( buy | defect ) = 100.0 cost ( buy | no defect ) = −20.0

cost(not buy | defect) = 0.0 cost(not buy | no defect) = 0.0

and take the action with the lowest expected cost.

Minimum Cost Decisions

General problem formulation:

� Let (αk )k be the set of actions. The expected cost “λ” of taking a certain action is given by

whereλ(αk|ωj)isthecostoftakingactionαk giventheclassωj. � The optimal action to take is therefore: arg mink λ(αk | )

λ(αk|ωj)P(ωj| )

Minimum Cost Decisions

General problem formulation:

� Let (αk )k be the set of actions. The expected cost “λ” of taking a certain action is given by

� The optimal action to take is therefore: arg mink λ(αk | ) Car example: (cf. previous slide)

λ(αk| ) = whereλ(αk|ωj)isthecostoftakingactionαk giventheclassωj.

λ(αk|ωj)P(ωj| )

Classification Accuracy Special Case

Show that the problem of maximum accuracy classification is a special instance of expected cost minimization with a particular set of actions (αk)k and a particular cost function λ(αk |ωj ).

Recall: λ(αk| )=

λ(αk|ωj)P(ωj| )

Measuring Classification Error

� So far, we have studied what the decision boundary should be in order to predict optimally.

� However, in certain cases, it is also important to determine what is the expected error of the classifier (e.g. to determine whether the classifier is good enough for practical use).

� The expected error is the probability that the data is of a different class than the one predicted, e.g. for a binary classifier:

P(Err| )=� P(ω1| ) if “decideω2” P(ω2 | ) if “decide ω1”

� For the Bayes optimal classifier, this reduces to

P(Err| ) = min{P(ω1 | ),P(ω2 | )}

Measuring Classification Error

� The expected error of this maximally accurate classifier is computed as the integral of its error probability over the distribution p( ).

P(Err) = � P(Err | )p( )d

= min{P(ω1 | ), P(ω2 | )}p( )d

This is also known as the Bayes error rate.

� Generally, this integral cannot be solved analytically, because of the min function. Error must instead be evaluated numerically/empirically, or it can also be bounded analytically.

Bounding the Error of the Classifier

Very basic bound

� Observe for binary classification that P(ω2 | ) = 1 − P(ω1 | ).

� The error of an optimal binary classifier can be bounded as:

P(Err) = � min{P(ω1 | ), P(ω2 | )}p( )d =� min{P(ω1| ),1−P(ω1| )}p( )d

≤ 0.5p( )d

i.e. the classifier predicts the correct class at least 50% of the time.

� Note that, unlike an empirical evaluation, this result is general and independent on the data distributions.

Bounding the Error of the Classifier

Another simple bound

P(Err) = � min{P(ω1 | ), P(ω2 | )}p( )d

=� min{P(ω1| )p( ),P(ω2| )p( )}d

Recall: P(ωj | )p( ) = p( |ωj)P(ωj)

= � min �p( | ω1)P(ω1), p( | ω2)P(ω2)�d

≤ � min ��j {p( | ωj )}P(ω1), �j {p( | ωj )}P(ω2)�d

= ��j � {p( | ωj )}d � · min{P(ω1), P(ω2)}

= 2 · min{P(ω1), P(ω2)}

Additional insight: The optimal classifier improves its accuracy when one class prior probability is strongly dominant over to the other class.

Machine Learning

� Paradigm that provides a solution to the practically highly relevant problem of autonomous decision making.

� Avoids to the user the task of specifying the decision function at hand, and instead, infers it automatically from the data.

Bayes Decision Theory

� Framework that allows to build optimal machine learning classifiers, assuming we have full knowledge knowledge about the class probabilities and the data distributions.

� Bayesian decision theory highlights the effect of class priors, parameters of the data distribution, and specification of the cost function, on the optimal decision function and the expected cost.

程序代写 CS代考 加微信: cscodehelp QQ: 2235208643 Email: kyit630461@163.com