Exercises for the course

Machine Learning 1

Winter semester 2021/22

Institut für Softwaretechnik und Theoretische Informatik, Fakultät IV, Technische Universität Berlin. Prof. Dr. Klaus-Robert Müller.


Exercise Sheet 12

Exercise 1: Neural Network Optimization (15 + 15 P)

Consider the one-layer neural network

y = w⊤x + b

applied to data points x ∈ Rd, and where w ∈ Rd and b ∈ R are the parameters of the model. We consider

the optimization of the objective:

J(w) = E_p̂[ (1/2) (1 − y·t)² ],

where the expectation is computed over an empirical approximation p̂ of the true joint distribution p(x, t), and t ∈ {−1, 1}. The input data follows the distribution x ∼ N(μ, σ²I), where μ and σ² are the mean and variance.

(a) Compute the Hessian of the objective function J at the current location w in the parameter space, and as a function of the parameters μ and σ of the data.
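Before deriving the Hessian analytically, it can help to look at it numerically. The sketch below is not part of the exercise; the sample size and the values of μ and σ are arbitrary choices. It estimates the Hessian of J by central finite differences on synthetic data and prints its condition number, which you can compare against your results for parts (a) and (b).

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 200_000
mu, sigma = np.array([2.0, 0.0, 0.0]), 1.0

# Synthetic data: x ~ N(mu, sigma^2 I), labels t uniform in {-1, +1}
X = mu + sigma * rng.standard_normal((n, d))
t = rng.choice([-1.0, 1.0], size=n)

def J(w, b=0.0):
    y = X @ w + b
    return 0.5 * np.mean((1 - y * t) ** 2)

# Central finite-difference estimate of the Hessian at an arbitrary w0;
# since J is quadratic in w, the estimate is exact up to floating point.
w0, eps = rng.standard_normal(d), 1e-4
I = np.eye(d)
H = np.empty((d, d))
for i in range(d):
    for j in range(d):
        ei, ej = eps * I[i], eps * I[j]
        H[i, j] = (J(w0 + ei + ej) - J(w0 + ei - ej)
                   - J(w0 - ei + ej) + J(w0 - ei - ej)) / (4 * eps ** 2)

print(np.round(H, 2))
print("condition number:", np.linalg.cond(H))
```

For μ = (2, 0, 0) and σ = 1, the printed condition number should match the expression derived in part (b), up to sampling noise.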

(b) Show that the condition number of the Hessian is given by:

λ_max / λ_min = 1 + ∥μ∥² / σ².

Exercise 2: Neural Network Regularization (10 + 10 + 10 P)

For a neural network to generalize from limited data, it is desirable to make it sufficiently invariant to small local variations. This can be done by limiting the gradient norm ∥∂f/∂x∥ for all x in the input domain. As the input domain can be high-dimensional, it is impractical to minimize the gradient norm directly. Instead, we can minimize an upper-bound of it that depends only on the model parameters.
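As a concrete illustration of this strategy, the sketch below adds a parameter-only upper bound on the gradient norm as a penalty to the training loss. The penalty weight `lam`, the function name, and the use of a √h·∥W∥_F bound are illustrative assumptions, not part of the exercise.

```python
import numpy as np

def regularized_objective(data_loss, W, lam=0.01):
    """Training objective with a parameter-only surrogate for ||df/dx||.

    Instead of penalizing the gradient norm at every input x (intractable
    over a high-dimensional domain), we penalize an upper bound that
    depends only on the weights W, here sqrt(h) * ||W||_F.
    """
    h = W.shape[1]
    bound = np.sqrt(h) * np.linalg.norm(W, "fro")
    return data_loss + lam * bound

W = np.ones((4, 6))
print(regularized_objective(1.0, W))  # 1.0 + 0.01 * sqrt(6) * sqrt(24) = 1.12
```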

We consider a two-layer neural network with d input neurons, h hidden neurons, and one output neuron. Let W be a weight matrix of size d×h, and (b_j)_{j=1}^h a collection of biases. We denote by W_{i,:} the ith row of the weight matrix and by W_{:,j} its jth column. The neural network computes:

a_j = max(0, W_{:,j}⊤x + b_j)    (layer 1)

f(x) = Σ_j s_j a_j    (layer 2)

where s_j ∈ {−1, 1} are fixed parameters. The first layer detects patterns of the input data, and the second

layer computes a fixed linear combination of these detected patterns.
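The two-layer network above can be written compactly as follows. This is a minimal sketch assuming NumPy conventions (W of shape d×h, with columns as hidden units); the randomly drawn parameters are for illustration only.

```python
import numpy as np

def two_layer_net(x, W, b, s):
    """f(x) = sum_j s_j * max(0, W_{:,j}^T x + b_j), with s_j in {-1, +1}."""
    a = np.maximum(0.0, W.T @ x + b)  # layer 1: ReLU pattern detectors
    return s @ a                      # layer 2: fixed +/-1 linear combination

rng = np.random.default_rng(0)
d, h = 4, 6
W = rng.standard_normal((d, h))
b = rng.standard_normal(h)
s = rng.choice([-1.0, 1.0], size=h)
x = rng.standard_normal(d)
print(two_layer_net(x, W, b, s))
```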

(a) Show that the gradient norm of the network can be upper-bounded as:

∥∂f/∂x∥ ≤ √h · ∥W∥_F

(b) Let ∥W∥_Mix = √( Σ_i ∥W_{i,:}∥_1² ) be an l1/l2 mixed matrix norm. Show that the gradient norm of the network can be upper-bounded by it as:

∥∂f/∂x∥ ≤ ∥W∥_Mix

(c) Show that the mixed norm provides a bound that is tighter than the one based on the Frobenius norm, i.e. show that:

∥W∥_Mix ≤ √h · ∥W∥_F.
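The two bounds and the actual gradient norm can also be compared numerically. The sketch below uses random weights (the sizes d and h are arbitrary choices) and evaluates the gradient of the ReLU network at one input point together with both bounds; the chain ∥∂f/∂x∥ ≤ ∥W∥_Mix ≤ √h · ∥W∥_F should hold.

```python
import numpy as np

rng = np.random.default_rng(1)
d, h = 5, 20
W = rng.standard_normal((d, h))
b = rng.standard_normal(h)
s = rng.choice([-1.0, 1.0], size=h)
x = rng.standard_normal(d)

# Gradient of f at x: sum over active ReLUs of s_j * W[:, j]
active = (W.T @ x + b) > 0
grad = W[:, active] @ s[active]

# Frobenius-based bound (part a) and mixed-norm bound (part b)
frob_bound = np.sqrt(h) * np.linalg.norm(W, "fro")
mix_bound = np.sqrt(np.sum(np.linalg.norm(W, ord=1, axis=1) ** 2))

print("||grad||        =", np.linalg.norm(grad))
print("||W||_Mix       =", mix_bound)
print("sqrt(h)*||W||_F =", frob_bound)
```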

Exercise 3: Programming (40 P)

Download the programming files on ISIS and follow the instructions.
