# Exercises for the course

Machine Learning 1

Winter semester 2020/21

Abteilung Maschinelles Lernen, Institut für Softwaretechnik und theoretische Informatik, Fakultät IV, Technische Universität Berlin. Prof. Dr. Klaus-Robert Müller. Email: klaus-robert.mueller@tu-berlin.de

Exercise Sheet 5

Exercise 1: Bias and Variance of Mean Estimators (20 P)

Assume we have an estimator θ̂ for a parameter θ. The bias of the estimator θ̂ is the difference between its expected value and the true value of the parameter:

Bias(θ̂) = E[θ̂] − θ.

If Bias(θ̂) = 0, then θ̂ is called unbiased. The variance of the estimator θ̂ is the expected squared deviation from its expected value:

Var(θ̂) = E[(θ̂ − E[θ̂])²].

The mean squared error of the estimator θ̂ is

Error(θ̂) = E[(θ̂ − θ)²] = Bias(θ̂)² + Var(θ̂).
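These three quantities can be checked numerically. The sketch below is our own illustration (not part of the exercise): it estimates bias, variance, and mean squared error of the plain sample-mean estimator by Monte Carlo simulation; the distribution, sample size N, and number of trials are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, N, trials = 2.0, 1.5, 10, 200_000

# Draw many independent samples of size N and evaluate the estimator on each.
samples = rng.normal(mu, sigma, size=(trials, N))
mu_hat = samples.mean(axis=1)          # one estimate per trial

bias = mu_hat.mean() - mu              # E[mu_hat] - mu
var = mu_hat.var()                     # E[(mu_hat - E[mu_hat])^2]
mse = ((mu_hat - mu) ** 2).mean()      # E[(mu_hat - mu)^2]

# The decomposition Error = Bias^2 + Var is an algebraic identity,
# so it holds exactly even on the simulated estimates.
print(bias, var, mse, bias**2 + var)
```

For the unbiased sample mean, the estimated bias is zero up to simulation noise and the variance approaches σ²/N.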

Let X_1, . . . , X_N be a sample of i.i.d. random variables. Assume that X_i has mean μ and variance σ². Calculate the bias, variance, and mean squared error of the mean estimator

μ̂ = α · (1/N) Σ_{i=1}^N X_i,

where α is a parameter between 0 and 1.

Exercise 2: Bias-Variance Decomposition for Classification (30 P)

The bias-variance decomposition usually applies to regression data. In this exercise, we would like to obtain a similar decomposition for classification, in particular when the prediction is given as a probability distribution over C classes. Let P = [P_1, . . . , P_C] be the ground-truth class distribution associated with a particular input pattern. Assume a random estimator of class probabilities P̂ = [P̂_1, . . . , P̂_C] for the same input pattern. The error function is given by the expected KL divergence between the ground-truth and the estimated probability distribution:

Error = E[DKL(P ‖ P̂)] = E[ Σ_{i=1}^C P_i log(P_i / P̂_i) ].
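As a small illustration of this error function (not part of the exercise), the KL divergence between two discrete distributions can be computed directly; the function name `kl_div` and the example distributions are ours:

```python
import numpy as np

def kl_div(p, q):
    """D_KL(p || q) for two discrete distributions (arrays summing to 1)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    # Terms with p_i = 0 contribute nothing, by the convention 0 log 0 = 0.
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

P = [0.7, 0.2, 0.1]
P_hat = [0.5, 0.3, 0.2]
print(kl_div(P, P_hat))   # strictly positive unless the distributions match
print(kl_div(P, P))       # 0.0
```

Note that the KL divergence is asymmetric: kl_div(P, P_hat) and kl_div(P_hat, P) generally differ, which is why the exercise distinguishes DKL(P ‖ P̂) from DKL(R ‖ P̂).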

First, we would like to determine the mean of the class distribution estimator P̂. We define the mean as the distribution that minimizes its expected KL divergence from the class distribution estimator, that is, the distribution R that solves

min_R E[DKL(R ‖ P̂)].

(a) Show that the solution to the optimization problem above is given by R = [R_1, . . . , R_C] where

R_i = exp(E[log P̂_i]) / Σ_j exp(E[log P̂_j])  for all 1 ≤ i ≤ C.

(Hint: To implement the positivity constraint on R, you can reparameterize its components as R_i = exp(Z_i) and minimize the objective w.r.t. Z.)

(b) Prove the bias-variance decomposition

Error(P̂) = Bias(P̂) + Var(P̂)

where the error, bias and variance are given by

Error(P̂) = E[DKL(P ‖ P̂)],  Bias(P̂) = DKL(P ‖ R),  Var(P̂) = E[DKL(R ‖ P̂)].

(Hint: as a first step, it can be useful to show that E[log R_i − log P̂_i] does not depend on the index i.)
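This decomposition can be checked numerically without giving away the proof. The sketch below is our own illustration: it draws random estimates P̂ from an arbitrary Dirichlet distribution, builds R as the normalized exponential of E[log P̂] (the form stated in part (a)), and compares Error with Bias + Var; the ground-truth P, the Dirichlet parameters, and the seed are all arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
P = np.array([0.6, 0.3, 0.1])                         # ground-truth class distribution
P_hat = rng.dirichlet([8.0, 4.0, 2.0], size=100_000)  # samples of the random estimator

def kl(p, q):
    # KL divergence; handles a batch of q's via broadcasting.
    return np.sum(p * np.log(p / q), axis=-1)

# Mean distribution R from part (a): normalized exp of E[log P_hat].
E_log = np.log(P_hat).mean(axis=0)
R = np.exp(E_log) / np.exp(E_log).sum()

error = kl(P, P_hat).mean()   # E[ D_KL(P || P_hat) ]
bias = kl(P, R)               # D_KL(P || R)
var = kl(R, P_hat).mean()     # E[ D_KL(R || P_hat) ]

# When R is built from the same empirical expectations, the identity
# holds up to floating-point error.
print(error, bias + var)
```

Both the bias and the variance terms come out strictly positive here, since R differs from P and P̂ fluctuates around R.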

Exercise 3: Programming (50 P)

Download the programming files on ISIS and follow the instructions.