
Exercises for the course
Machine Learning 1
Winter semester 2020/21
Abteilung Maschinelles Lernen, Institut für Softwaretechnik und theoretische Informatik, Fakultät IV, Technische Universität Berlin
Prof. Dr. Klaus-Robert Müller
Email: klaus-robert.mueller@tu-berlin.de
Exercise Sheet 5
Exercise 1: Bias and Variance of Mean Estimators (20 P)
Assume we have an estimator $\hat\theta$ for a parameter $\theta$. The bias of the estimator $\hat\theta$ is the difference between its expected value and the true value of the parameter:

$$\mathrm{Bias}(\hat\theta) = \mathbb{E}[\hat\theta] - \theta.$$

If $\mathrm{Bias}(\hat\theta) = 0$, then $\hat\theta$ is called unbiased. The variance of the estimator $\hat\theta$ is the expected squared deviation from its expected value:

$$\mathrm{Var}(\hat\theta) = \mathbb{E}\big[(\hat\theta - \mathbb{E}[\hat\theta])^2\big].$$

The mean squared error of the estimator $\hat\theta$ is

$$\mathrm{Error}(\hat\theta) = \mathbb{E}\big[(\hat\theta - \theta)^2\big] = \mathrm{Bias}(\hat\theta)^2 + \mathrm{Var}(\hat\theta).$$
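As a quick numerical illustration of these definitions (not part of the original sheet), the following Monte-Carlo sketch estimates the bias, variance and mean squared error of a deliberately biased estimator, the scaled sample mean $\alpha \bar X$, and compares them with the analytic values $\mathrm{Bias} = (\alpha - 1)\mu$ and $\mathrm{Var} = \alpha^2 \sigma^2 / N$. The Gaussian sampling distribution and all parameter values are illustrative assumptions.

```python
import numpy as np

# Monte-Carlo check of Bias, Var and Error for the scaled sample mean
# mu_hat = alpha * mean(X).  Analytically: Bias = (alpha - 1) * mu,
# Var = alpha**2 * sigma**2 / N, and Error = Bias**2 + Var.
rng = np.random.default_rng(0)
mu, sigma, N, alpha = 2.0, 1.5, 10, 0.8
trials = 200_000

X = rng.normal(mu, sigma, size=(trials, N))  # `trials` independent samples of size N
mu_hat = alpha * X.mean(axis=1)              # one estimate per sample

bias = mu_hat.mean() - mu                    # ~ (alpha - 1) * mu = -0.4
var = mu_hat.var()                           # ~ alpha**2 * sigma**2 / N = 0.144
error = np.mean((mu_hat - mu) ** 2)          # ~ bias**2 + var
print(bias, var, error)
```

Note that `error` matches `bias**2 + var` up to floating-point error, since the empirical mean squared error decomposes exactly in the same way as its population counterpart.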
Let $X_1, \dots, X_N$ be a sample of i.i.d. random variables. Assume that each $X_i$ has mean $\mu$ and variance $\sigma^2$. Calculate the bias, variance and mean squared error of the mean estimator

$$\hat\mu = \alpha \cdot \frac{1}{N} \sum_{i=1}^N X_i,$$

where $\alpha$ is a parameter between 0 and 1.

Exercise 2: Bias-Variance Decomposition for Classification (30 P)
The bias-variance decomposition usually applies to regression data. In this exercise, we would like to obtain a similar decomposition for classification, in particular when the prediction is given as a probability distribution over $C$ classes. Let $P = [P_1, \dots, P_C]$ be the ground-truth class distribution associated with a particular input pattern. Assume a random estimator of class probabilities $\hat P = [\hat P_1, \dots, \hat P_C]$ for the same input pattern. The error function is given by the expected KL divergence between the ground-truth and the estimated probability distribution:
$$\mathrm{Error} = \mathbb{E}\big[D_{\mathrm{KL}}(P \,\|\, \hat P)\big] = \mathbb{E}\Big[\textstyle\sum_{i=1}^C P_i \log(P_i / \hat P_i)\Big].$$
First, we would like to determine the mean of the class distribution estimator $\hat P$. We define the mean as the distribution that minimizes its expected KL divergence from the class distribution estimator, that is, the distribution $R$ that solves

$$\min_R \; \mathbb{E}\big[D_{\mathrm{KL}}(R \,\|\, \hat P)\big].$$
(a) Show that the solution of the optimization problem above is given by $R = [R_1, \dots, R_C]$, where

$$R_i = \frac{\exp \mathbb{E}[\log \hat P_i]}{\sum_j \exp \mathbb{E}[\log \hat P_j]} \qquad \forall\, 1 \le i \le C.$$
(Hint: To implement the positivity constraint on $R$, you can reparameterize its components as $R_i = \exp(Z_i)$ and minimize the objective w.r.t. $Z$.)

(b) Prove the bias-variance decomposition
$$\mathrm{Error}(\hat P) = \mathrm{Bias}(\hat P) + \mathrm{Var}(\hat P)$$

where the error, bias and variance are given by

$$\mathrm{Error}(\hat P) = \mathbb{E}\big[D_{\mathrm{KL}}(P \,\|\, \hat P)\big], \qquad \mathrm{Bias}(\hat P) = D_{\mathrm{KL}}(P \,\|\, R), \qquad \mathrm{Var}(\hat P) = \mathbb{E}\big[D_{\mathrm{KL}}(R \,\|\, \hat P)\big].$$

(Hint: as a first step, it can be useful to show that $\mathbb{E}[\log R_i - \log \hat P_i]$ does not depend on the index $i$.)
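The decomposition in (b) can be checked numerically. The sketch below (illustrative, not part of the sheet) draws random class-probability estimates $\hat P$ from a Dirichlet distribution, forms the mean distribution $R$ of part (a) from the empirical average of $\log \hat P$, and verifies that the empirical error equals bias plus variance. The choice of Dirichlet sampling and all concentration parameters are arbitrary assumptions.

```python
import numpy as np

# Numerical check of E[KL(P||P_hat)] = KL(P||R) + E[KL(R||P_hat)],
# where R is the softmax of the (empirical) mean of log P_hat.
rng = np.random.default_rng(1)
C, trials = 4, 1000

P = rng.dirichlet(np.ones(C))                         # ground-truth class distribution
P_hat = rng.dirichlet(2.0 * np.ones(C), size=trials)  # samples of the random estimator

def kl(p, q):
    """KL divergence D_KL(p || q), summed over the last (class) axis."""
    return np.sum(p * np.log(p / q), axis=-1)

# Mean distribution from part (a): normalize exp of the average log-probabilities.
z = np.exp(np.log(P_hat).mean(axis=0))
R = z / z.sum()

error = kl(P, P_hat).mean()   # E[KL(P || P_hat)]
bias = kl(P, R)               # KL(P || R)
var = kl(R, P_hat).mean()     # E[KL(R || P_hat)]
print(error, bias + var)      # equal up to floating-point error
```

Because $R$ is built from the same empirical average of $\log \hat P$ that defines the expectations, the identity holds exactly for the finite sample, not just asymptotically.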
Exercise 3: Programming (50 P)