Machine Learning

Tuesday, 11. January 2021

Example exam questions

1. Kernels

Remember that a function k is a kernel function if and only if any kernel matrix generated by k is positive semidefinite. To show that a matrix K is positive definite, please show that vTKv > 0 for all vectors v.

(a) Suppose k is a kernel function. Prove or disprove that α2k is also a kernel function for any α ̸= 0. 5 points

(b) Suppose k and k′ are kernel functions. Prove or disprove that k + k′ is also a kernel function. 5 points

(c) Suppose k and k′ are kernel functions. Prove or disprove that k − k′ is also a kernel function. Hint: you can use the fact that k(a, b) = 1 is a kernel function. 5 points

2. K-means

Consider a dataset in two dimensions with four points x1, x2, x3, x4 that form a rectangle. Assume that the means are initialized by randomly picking K of the data points (each with equal probability). Note, that if during the E-step a data point has equal distances to several cluster centers, pick one of these centers randomly (each with equal probability).

(a) What are the possible 2-means solutions (i.e. K = 2)? What is the probability of each of the solutions? What happens if the rectangle is a square? 5 points

Consider now a different implementation where the K means are initialized randomly from a Gaussian distribution with mean (x1 + x2 + x3 + x4)/4 and covariance matrix the identity matrix.

(b) Running this changed implementation for K = 2, is it always guaranteed that we get two clusters? If not, why not? 5 points

3. Graphical models

(a) What is wrong with the following equation?

p(a, b, c) = p(a|b) p(b|c) 5 points

(b) Draw the directed graphical model for the probability distribution defined by

p(a, b, c, d, e) = p(e|b) p(d|c, b) p(c|a) p(b|a) p(a). 5 points

(c) Draw all six directed acyclic graphical models that are possible for three random variables (omitting the ones that can be obtained by permuting the variables). Which one of those six graphs has the most, which one has the least independence assump- tions? Can one of those six graphs represent any probability distribution with three variables? If yes, which of the graphs? 5 points

4. Gaussian processes

Thesumf+goftwoindependentGaussianprocessesf ∼GP(mf,kf)andg∼GP(mg,kg) is also a Gaussian process, i.e. f +g ∼ GP(mf+g,kf+g). What is the mean function mf+g and the kernel function kf +g of f + g? You don’t need to prove your answer. 5 points

5. Eigenfunctions

Which of the following expressions imply that φ is an Eigenfunction of k?

∫k(x,x′)φ(x′)dx=λφ(x)

∫ k(x, x′) φ(x) dx = λ φ(x′)

∫ k(x, λ) φ(x) dx = φ(λ)

∫ k(x, x′) φ(x) dx = λ φ(x)

6. Neural networks

Let f : {0,1}2 → {0,1} be the xor-function, i.e. f(1,1) = f(0,0) = 0 and f(1,0) = f(0,1) = 1. Let g be the step function, i.e. g(x) = 1 if x > 0 and g(x) = 0 otherwise. Draw a neural network with two nodes in the hidden layers that calculates f. Use g as the nonlinearity. Write down the weights at the edges. Also write down the equation for the neural network. You might need the bias.

(a) The problem to maximize N αi − 1 N αiαjyiyjxTi xj subject to C ≥ αi ≥ 0

for i = 1,…,N and Ni=1 αiyi = 0 is

the primal of the dual for the separable case

the dual of the primal for the separable case

the primal of the dual for the non-separable case the dual of the primal for the non-separable case

(b) Assume α = [α1, . . . , αN ]T is the solution of the above problem. What are the support vectors? 5 points

8. Code mystery

Which algorithms have been implemented by the following two code snippets of Matlab? randn() and rand() sample random numbers from a normal and a uniform distribution. normpdf(x) evaluates the Gaussian pdf N (x; 0, 1). The statement x = [x, xi] appends element xi to list x. It is sufficient to write down the correct names of the algorithms.

for i=1:1000

xi = randn();

pi = p(xi);

qi = normpdf(xi);

if rand() <= c*pi/qi x = [x, xi]; (b)po=0; x=[]; for i = 1:1000 xi = randn(); if p(xi) >= rand()*po

x = [x, xi];

x = [x, x(end)];

po = p(xi);