# Exercises for the course

Machine Learning 1
Winter semester 2020/21
Abteilung Maschinelles Lernen
Institut für Softwaretechnik und theoretische Informatik
Fakultät IV, Technische Universität Berlin
Prof. Dr. Klaus-Robert Müller
Email: klaus-robert.mueller@tu-berlin.de
Exercise Sheet 4
Exercise 1: Fisher Discriminant (10 + 10 + 10 P)
The objective function to find the Fisher Discriminant has the form

$$\max_{w}\ \frac{w^\top S_B w}{w^\top S_W w}$$
where $S_B = (m_2 - m_1)(m_2 - m_1)^\top$ is the between-class scatter matrix and $S_W$ is the within-class scatter matrix, assumed to be positive definite. Because there are infinitely many solutions (multiplying $w$ by a nonzero scalar does not change the objective), we can extend the objective with a constraint, e.g. one that enforces $w^\top S_W w = 1$.
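As a quick numerical illustration of the scale invariance, the sketch below uses hypothetical class means and a hypothetical positive definite $S_W$ (none of these values come from the exercise). It evaluates the ratio for a direction and for a rescaled copy, then normalizes the direction so that it satisfies $w^\top S_W w = 1$. The helper name `fisher_ratio` is illustrative only.

```python
import numpy as np

def fisher_ratio(w, S_B, S_W):
    """Generalized Rayleigh quotient J(w) = (w^T S_B w) / (w^T S_W w)."""
    return float(w @ S_B @ w) / float(w @ S_W @ w)

# Hypothetical class means and within-class scatter (positive definite).
m1 = np.array([0.0, 1.0])
m2 = np.array([2.0, 0.0])
S_W = np.array([[2.0, 0.5],
                [0.5, 1.0]])
S_B = np.outer(m2 - m1, m2 - m1)  # between-class scatter (m2 - m1)(m2 - m1)^T

w = np.array([1.0, -0.3])
J = fisher_ratio(w, S_B, S_W)
J_scaled = fisher_ratio(-7.5 * w, S_B, S_W)  # any nonzero rescaling of w

# Rescale w so that it satisfies the constraint w^T S_W w = 1.
w_unit = w / np.sqrt(w @ S_W @ w)
constraint = float(w_unit @ S_W @ w_unit)

print(J, J_scaled, constraint)
```

Because numerator and denominator are both quadratic in $w$, a factor $c^2$ cancels, which is why fixing $w^\top S_W w = 1$ removes the ambiguity without changing the optimum.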
(a) Reformulate the problem above as an optimization problem with a quadratic objective and a quadratic constraint.
(b) Show using the method of Lagrange multipliers that the solution of the reformulated problem is also a solution of the generalized eigenvalue problem:
$$S_B w = \lambda S_W w$$
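One standard way to solve such a symmetric-definite generalized eigenvalue problem numerically is the Cholesky reduction $S_W = LL^\top$, which turns it into an ordinary symmetric eigenproblem. The sketch below applies it to hypothetical $S_B$ and $S_W$; the helper `generalized_eigh` is an illustrative name (SciPy's `scipy.linalg.eigh(a, b)` solves the same problem directly).

```python
import numpy as np

def generalized_eigh(A, B):
    """Solve A w = lam B w for symmetric A and symmetric positive definite B.

    With the Cholesky factor B = L L^T, the substitution w = L^{-T} u turns the
    problem into the ordinary symmetric eigenproblem (L^{-1} A L^{-T}) u = lam u.
    Eigenvalues are returned in ascending order, and every column w of W
    satisfies w^T B w = 1 (the constraint from the reformulated problem).
    """
    L = np.linalg.cholesky(B)
    L_inv = np.linalg.inv(L)
    C = L_inv @ A @ L_inv.T
    lam, U = np.linalg.eigh(C)
    W = L_inv.T @ U
    return lam, W

# Hypothetical two-class example (not from the exercise).
m1 = np.array([0.0, 1.0])
m2 = np.array([2.0, 0.0])
S_W = np.array([[2.0, 0.5],
                [0.5, 1.0]])
S_B = np.outer(m2 - m1, m2 - m1)

lam, W = generalized_eigh(S_B, S_W)
lam_max, w = lam[-1], W[:, -1]  # largest eigenvalue = maximal value of the ratio
residual = np.linalg.norm(S_B @ w - lam_max * S_W @ w)
print(lam_max, w, residual)
```

Since $S_B$ has rank one here, the smaller generalized eigenvalue is zero and only the top eigenvector carries discriminative information.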
(c) Show that the solution of this optimization problem is equivalent (up to a scaling factor) to
$$w^\star = S_W^{-1}(m_1 - m_2)$$
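A numerical sanity check of (c) on hypothetical 2-D data: the closed-form direction $S_W^{-1}(m_1 - m_2)$ is compared with the best direction found by a brute-force sweep over unit vectors. Comparing absolute cosines absorbs both the scaling factor and the sign, since $(m_1 - m_2)$ and $(m_2 - m_1)$ differ only by sign. All values and the helper name are assumptions for illustration.

```python
import numpy as np

# Hypothetical two-class example (not from the exercise).
m1 = np.array([0.0, 1.0])
m2 = np.array([2.0, 0.0])
S_W = np.array([[2.0, 0.5],
                [0.5, 1.0]])
d = m2 - m1
S_B = np.outer(d, d)

def fisher_ratio(w):
    return (w @ S_B @ w) / (w @ S_W @ w)

# Closed-form direction w* = S_W^{-1}(m1 - m2), via a linear solve
# rather than an explicit matrix inverse.
w_star = np.linalg.solve(S_W, m1 - m2)

# Brute-force search over unit directions in 2-D (half circle suffices,
# because w and -w give the same ratio).
thetas = np.linspace(0.0, np.pi, 10_001)
dirs = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
ratios = np.array([fisher_ratio(v) for v in dirs])
best = dirs[np.argmax(ratios)]

# |cosine| close to 1 means the two directions agree up to scale and sign.
cos = abs(best @ w_star) / np.linalg.norm(w_star)
print(fisher_ratio(w_star), ratios.max(), cos)
```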
Exercise 2: Bounding the Error (10 + 10 P)
The direction learned by the Fisher discriminant is equivalent to that of an optimal classifier when the class-conditional data densities are Gaussian with the same covariance. In this particular setting, we can derive a bound on the classification error that gives us insight into the effect of the mean and covariance parameters on the error.
Consider two data-generating distributions $P(x|\omega_1) = \mathcal{N}(\mu, \Sigma)$ and $P(x|\omega_2) = \mathcal{N}(-\mu, \Sigma)$ with $x \in \mathbb{R}^d$. Recall that the Bayes error rate is given by:

$$P(\text{error}) = \int P(\text{error}|x)\, p(x)\, dx$$
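To make the integral concrete, the sketch below estimates it by Monte Carlo for hypothetical values of $\mu$ and $\Sigma$ and assumes equal priors. It uses $P(\text{error}|x) = \min(P(\omega_1|x), P(\omega_2|x))$ for the Bayes decision rule and compares the estimate with the closed form $\Phi(-\sqrt{\mu^\top \Sigma^{-1} \mu})$, which holds for equal priors in this shared-covariance setting.

```python
import math
import numpy as np

# Hypothetical parameters (assumptions, not given in the exercise).
mu = np.array([1.0, 0.5])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
prior1 = prior2 = 0.5  # assumption: equal priors
n = 200_000
rng = np.random.default_rng(0)

# Sample x ~ p(x) = P(w1) N(mu, Sigma) + P(w2) N(-mu, Sigma).
labels = rng.random(n) < prior1
means = np.where(labels[:, None], mu, -mu)
x = means + rng.multivariate_normal(np.zeros(2), Sigma, size=n)

# With a shared covariance the log-likelihood ratio is linear in x:
# log p(x|w1) - log p(x|w2) = 2 mu^T Sigma^{-1} x.
a = np.linalg.solve(Sigma, mu)
log_odds = 2.0 * (x @ a) + math.log(prior1 / prior2)
post1 = 1.0 / (1.0 + np.exp(-log_odds))

# P(error|x) = min(P(w1|x), P(w2|x)); average over samples of p(x).
p_err_given_x = np.minimum(post1, 1.0 - post1)
mc_error = p_err_given_x.mean()

# Closed form for equal priors: Phi(-sqrt(mu^T Sigma^{-1} mu)).
delta = math.sqrt(mu @ a)
exact_error = 0.5 * math.erfc(delta / math.sqrt(2.0))
print(mc_error, exact_error)
```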
(a) Show that $P(\text{error}|x)$ can be upper-bounded as:

$$P(\text{error}|x) \le \sqrt{P(\omega_1|x)\,P(\omega_2|x)}$$

(b) Show that the Bayes error rate can then be upper-bounded by:
$$P(\text{error}) \le \sqrt{P(\omega_1)\,P(\omega_2)} \cdot \exp\left(-\frac{1}{2}\mu^\top \Sigma^{-1} \mu\right)$$
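The sketch below checks both inequalities numerically. It assumes equal priors, for which the exact Bayes error in this setting is $\Phi(-\sqrt{\delta^2})$ with $\delta^2 = \mu^\top \Sigma^{-1} \mu$, and parameterizes everything by $\delta^2$; the helper names are hypothetical. The bound in (b) is the Bhattacharyya bound specialized to two Gaussians with a shared covariance.

```python
import math
import numpy as np

def exact_bayes_error(delta_sq):
    """Bayes error for N(mu, Sigma) vs N(-mu, Sigma) with equal priors,
    where delta_sq = mu^T Sigma^{-1} mu; equals Phi(-sqrt(delta_sq))."""
    return 0.5 * math.erfc(math.sqrt(delta_sq) / math.sqrt(2.0))

def error_bound(delta_sq, prior1=0.5, prior2=0.5):
    """Upper bound sqrt(P(w1) P(w2)) * exp(-delta_sq / 2) from part (b)."""
    return math.sqrt(prior1 * prior2) * math.exp(-0.5 * delta_sq)

# Part (a), pointwise: min(p, 1 - p) <= sqrt(p * (1 - p)) for every posterior p.
p = np.linspace(0.0, 1.0, 1001)
pointwise_gap = np.sqrt(p * (1.0 - p)) - np.minimum(p, 1.0 - p)

# Part (b): compare the exact error with the bound for several separations.
rows = []
for delta_sq in [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]:
    exact, bound = exact_bayes_error(delta_sq), error_bound(delta_sq)
    rows.append((delta_sq, exact, bound))
    print(f"{delta_sq:4.1f}  exact={exact:.4f}  bound={bound:.4f}")

print("min pointwise gap:", pointwise_gap.min())
```

Both quantities decay with the Mahalanobis separation $\delta^2$, and they coincide at $\delta^2 = 0$, where the classes are indistinguishable.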
Exercise 3: Fisher Discriminant (10 + 10 P)
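Consider the case of two classes $\omega_1$ and $\omega_2$ with associated data-generating probabilities

$$p(x|\omega_1) = \mathcal{N}\left(\begin{pmatrix}-1\\-1\end{pmatrix}, \begin{pmatrix}2 & 0\\0 & 1\end{pmatrix}\right) \quad\text{and}\quad p(x|\omega_2) = \mathcal{N}\left(\begin{pmatrix}+1\\+1\end{pmatrix}, \begin{pmatrix}2 & 0\\0 & 1\end{pmatrix}\right)$$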
(a) Find the Fisher discriminant $w$ for this dataset (i.e. the projection $y = w^\top x$ under which the ratio between inter-class and intra-class variability is maximized).
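A hand-derived answer to (a) can be checked numerically. This sketch assumes equal class sizes, so that the within-class scatter is proportional to $\Sigma_1 + \Sigma_2 = 2\Sigma$; the constant factor does not affect the direction. It uses $S_W^{-1}(m_2 - m_1)$, which differs from $S_W^{-1}(m_1 - m_2)$ only by sign.

```python
import numpy as np

# Parameters of Exercise 3.
m1 = np.array([-1.0, -1.0])
m2 = np.array([+1.0, +1.0])
Sigma = np.array([[2.0, 0.0],
                  [0.0, 1.0]])

# Assumption: equal class sizes, so S_W is proportional to Sigma1 + Sigma2 = 2 * Sigma.
S_W = 2.0 * Sigma
w = np.linalg.solve(S_W, m2 - m1)

# Rescale so the first component is 1; only the direction matters.
w_normalized = w / w[0]
print(w_normalized)
```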
(b) Find a projection for which the ratio is minimized.
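For (b), note that the ratio is nonnegative because $S_B$ is positive semidefinite, and that $S_B = (m_2 - m_1)(m_2 - m_1)^\top$ has rank one, so any $w$ orthogonal to $m_2 - m_1$ makes the numerator vanish. The sketch below checks this for one such direction against random directions, taking $S_W$ proportional to $\Sigma$ (an assumption); the helper name is hypothetical.

```python
import numpy as np

m1 = np.array([-1.0, -1.0])
m2 = np.array([+1.0, +1.0])
# Assumption: within-class scatter proportional to the shared covariance Sigma.
S_W = 2.0 * np.array([[2.0, 0.0],
                      [0.0, 1.0]])
d = m2 - m1
S_B = np.outer(d, d)  # rank-one between-class scatter

def fisher_ratio(w):
    """(w^T S_B w) / (w^T S_W w); nonnegative since S_B is PSD and S_W is PD."""
    return (w @ S_B @ w) / (w @ S_W @ w)

w_min = np.array([1.0, -1.0])  # orthogonal to d = (2, 2)
rng = np.random.default_rng(0)
others = [fisher_ratio(v) for v in rng.normal(size=(1000, 2))]
print(d @ w_min, fisher_ratio(w_min), min(others))
```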
Exercise 4: Programming (30 P)