
Exercises for the course
Machine Learning 1
Winter semester 2021/22
Institut für Softwaretechnik und Theoretische Informatik, Fakultät IV, Technische Universität Berlin. Prof. Dr. Klaus-Robert Müller. Email:

Exercise Sheet 12
Exercise 1: Neural Network Optimization (15 + 15 P)
Consider the one-layer neural network
y = w⊤x + b
applied to data points x ∈ Rd, and where w ∈ Rd and b ∈ R are the parameters of the model. We consider
the optimization of the objective:
J(w) = E_p̂ [ (1/2)(1 − y · t)² ],
where the expectation is taken over an empirical approximation p̂ of the true joint distribution p(x, t), with labels t ∈ {−1, 1}. The input data follows the distribution x ∼ N(μ, σ²I), where μ and σ² are its mean and variance.
(a) Compute the Hessian of the objective function J at the current location w in the parameter space, and as a function of the parameters μ and σ of the data.
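As a sanity check for part (a), the Hessian of J with respect to w can be estimated numerically. A short sketch (the concrete values of μ, σ, and the sample size are arbitrary choices for illustration): since J is quadratic in w with curvature E[x x⊤], the empirical second-moment matrix of sampled inputs should match μμ⊤ + σ²I.

```python
import numpy as np

# Sketch: verify numerically that the Hessian of
#   J(w) = E[(1/2)(1 - (w^T x + b) t)^2]
# with respect to w equals E[x x^T] = mu mu^T + sigma^2 I
# (independent of w, b, and t, since t^2 = 1).
rng = np.random.default_rng(0)
d, n = 3, 200_000
mu, sigma = np.array([2.0, 0.0, 1.0]), 0.5   # illustrative values

X = rng.normal(mu, sigma, size=(n, d))        # x ~ N(mu, sigma^2 I)
H_empirical = X.T @ X / n                     # Monte-Carlo estimate of E[x x^T]
H_analytic = np.outer(mu, mu) + sigma**2 * np.eye(d)

print(np.allclose(H_empirical, H_analytic, atol=0.05))  # True up to sampling noise

# Eigenvalues of mu mu^T + sigma^2 I are sigma^2 + ||mu||^2 (once) and
# sigma^2 (d-1 times), so the condition number is 1 + ||mu||^2 / sigma^2.
kappa = np.linalg.cond(H_analytic)
print(np.isclose(kappa, 1 + mu @ mu / sigma**2))        # True
```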
(b) Show that the condition number of the Hessian is given by:

λmax / λmin = 1 + ∥μ∥² / σ²

Exercise 2: Neural Network Regularization (10 + 10 + 10 P)
For a neural network to generalize from limited data, it is desirable to make it sufficiently invariant to small local variations. This can be done by limiting the gradient norm ∥∂f/∂x∥ for all x in the input domain. As the input domain can be high-dimensional, it is impractical to minimize the gradient norm directly. Instead, we can minimize an upper-bound of it that depends only on the model parameters.
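A minimal sketch of this idea, assuming a squared Frobenius-norm penalty as the parameter-only surrogate (the penalty weight `lam` and the helper `regularized_loss` are hypothetical, not part of the exercise):

```python
import numpy as np

# Instead of penalizing ||df/dx|| at every input x, add a bound that
# depends only on the parameters (here: the squared Frobenius norm of W).
def regularized_loss(data_loss, W, lam=1e-3):
    """data_loss: scalar empirical loss; W: first-layer weight matrix."""
    return data_loss + lam * np.sum(W**2)   # data_loss + lam * ||W||_F^2

W = np.ones((4, 3))                         # ||W||_F^2 = 12
print(regularized_loss(1.0, W))             # 1.0 + 1e-3 * 12 = 1.012
```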
We consider a two-layer neural network with d input neurons, h hidden neurons, and one output neuron. Let W be a weight matrix of size d×h, and (b_j)_{j=1}^h a collection of biases. We denote by W_{i,:} the ith row of the weight matrix and by W_{:,j} its jth column. The neural network computes:
a_j = max(0, W_{:,j}⊤ x + b_j)   (layer 1)
f(x) = Σ_j s_j a_j   (layer 2)
where sj ∈ {−1, 1} are fixed parameters. The first layer detects patterns of the input data, and the second
layer computes a fixed linear combination of these detected patterns.
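The network defined above can be sketched in a few lines; the concrete sizes, weights, and sign pattern s below are arbitrary choices for illustration:

```python
import numpy as np

def forward(x, W, b, s):
    """f(x) = sum_j s_j * max(0, W[:, j] @ x + b[j])."""
    a = np.maximum(0.0, W.T @ x + b)   # layer 1: h ReLU pattern detectors
    return s @ a                       # layer 2: fixed +/-1 combination

d, h = 4, 3
W = np.arange(d * h, dtype=float).reshape(d, h) / 10.0
b = np.zeros(h)
s = np.array([1.0, -1.0, 1.0])
x = np.ones(d)
print(forward(x, W, b, s))             # ≈ 2.2  (= 1.8 - 2.2 + 2.6)
```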
(a) Show that the gradient norm of the network can be upper-bounded as:
∥∂f/∂x∥ ≤ √h · ∥W∥F
(b) Let ∥W∥Mix = √( Σ_i ∥W_{i,:}∥₁² ) be an ℓ1/ℓ2 mixed matrix norm. Show that the gradient norm of the network can be upper-bounded by it as:
∥∂f/∂x∥ ≤ ∥W∥Mix
(c) Show that the mixed norm provides a bound that is tighter than the one based on the Frobenius norm, i.e. show that:

∥W∥Mix ≤ √h · ∥W∥F
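The bound chain from parts (a)–(c) can be checked numerically on random weights. A sketch (sizes and random seed are arbitrary), using the fact that away from ReLU kinks the gradient is ∂f/∂x = Σ_j s_j · 1[W_{:,j}⊤x + b_j > 0] · W_{:,j}:

```python
import numpy as np

# Numeric sanity check of: ||df/dx|| <= ||W||_Mix <= sqrt(h) * ||W||_F
rng = np.random.default_rng(1)
d, h = 5, 8
W = rng.normal(size=(d, h))
b = rng.normal(size=h)
s = rng.choice([-1.0, 1.0], size=h)

frob = np.sqrt(h) * np.linalg.norm(W)              # sqrt(h) * ||W||_F
mix = np.sqrt(np.sum(np.abs(W).sum(axis=1) ** 2))  # sqrt(sum_i ||W_i,:||_1^2)

for _ in range(100):
    x = rng.normal(size=d)
    active = (W.T @ x + b > 0).astype(float)       # ReLU indicator per hidden unit
    grad = W @ (s * active)                        # df/dx at x
    assert np.linalg.norm(grad) <= mix <= frob
print("all bounds hold")
```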
Exercise 3: Programming (40 P)