# CS代考 ECE599/CS539 Nonlinear Optimization (Spring 2020) Midterm – cscodehelp代写

ECE599/CS539 Nonlinear Optimization (Spring 2020) Midterm

(Due: 11:59pm on May 22, Friday)

Instruction: Students should provide enough detail of the logical procedure of deriving answers. Answers without sufficient justification will receive partial or no credit. For questions involving MATLAB experiments, provide codes with comments. The maximum possible score is 100.

1. Convergence of Gradient Descent with Armijo’s Rule: Convex Optimization [30 points]. Suppose that f is in C2, i.e., twice continuously differentiable, and there exist A,a > 0 such that for every x ∈ Rn,

Copyright By cscodehelp代写 加微信 cscodehelp

a ≤ all eigenvalues of F(x) ≤ A, (1) where F(x) denotes, as usual, the Hessian of f at x. This implies that f is a strictly convex function.

Consider the unconstrained optimization problem:

min f(x). (2)

In this question, we will analyze convergence of the Gradient Descent algorithm employing Armijo’s

rule as the line search method.

Recall the Armijo’s rule with parameters ε ∈ (0, 1/2) and η > 1: in the k-th iteration, we choose the

step size αk ≥ 0 such that it satisfies

(i) f(xk − αkg(xk)) ≤ f(xk) − ε · αkg(xk)T g(xk), and

(ii) f(xk − ηαkg(xk)) > f(xk) − ε · ηαkg(xk)T g(xk)

where g(xk ) denotes ∇f (xk )T . Now, let {xk } denote the sequence generated by Gradient Descent

with Armijo’s rule. Solve the following questions related to convergence of this sequence.

(Pages 240 and 241 of the textbook contain the convergence analysis of Gradient Descent with Exact Line Search and Gradient Descent with Armijo’s rule. I strongly recommend you to read those two pages before you start working on this question. You will be able to get some hints. Note, however, that you are required to provide full derivations in your answers with enough justification, even for those results that were derived in pages 240-241)

(a) Prove that when 0 ≤ α < 1/A, f(xk − αg(xk)) ≤ f(xk) − εα∥g(xk)∥2. (b) Using the result of part (a), prove that ηαk ≥ A1 .
(c) Using the result of part (b) and the fact that αk satisfies (3), prove that
f(xk+1) ≤ f(xk) − ε ∥g(xk)∥2, (4)
f(xk+1) − f∗ ≤ f(xk) − f∗ − ε ∥g(xk)∥2. (5) ηA
or equivalently,
where f∗ := minx∈Rn f(x).
(d) Prove that
−∥g(xk)∥2 ≤ 2a(f∗ − f(xk)).
(e) Combine the above results to prove that
∗ 2εa ∗
f(xk+1)−f ≤ 1− ηA (f(xk)−f ). (7)
Based on the above inequality, discuss the convergence of the Gradient Descent algorithm with Armijo’s rule when applied to solve the convex optimization problem.
2. Gauss-Southwell Method [35 points]. Consider an unconstrained optimization problem: min f(x)
where the objective function f is continuously differentiable. A coordinate descent method is an iterative descent method where in each iteration, we select one variable and minimize the objective function f over this single variable while fixing all other variables to their values from the previous iteration. Specifically, a coordinate descent method has the following structure:
• Initialize: Set an initial point x0, and let k = 0.
• Iteration: Until the stopping criterion is satisfied, repeat the following steps.
P Coordinate Selection: Determine the coordinate ik (ik ∈ {1, 2, . . . , n}) according to a pre- determined criterion.
P Bidirectional Line Search: xk+1 = xk + αk · eik where αk ∈ R satisfies f(xk +αk ·eik)=minf(xk +α·eik)
P Increase k by 1. • Return xk
In the above, ej denotes the vector in Rn with its j-th entry equal to 1 while all other entries are zeros. Note that in the line search step, we search for an optimal step size in R, i.e., we are not restricting αk to be nonnegative. Adding αk · eik to xk only changes the ik-th entry of xk, so the bidirectional line search step corresponds to minimizing f(x) over x(ik) alone (which denotes the ik-th entry of x), while fixing all other entries of x to be the same as that of xk. See Section 8.9 of the textbook for more explanation on coordinate descent methods.
In many optimization problems, optimizing f along a single coordinate while fixing all other coordinates turns out to be a very simple task (sometimes, a closed-form solution can be obtained). In such cases, the coordinate descent method is popularly used due to the simplicity of the line search step. There are various types of coordinate descent methods employing different criteria in choosing the coordinate ik to update in the k-th iteration. For instance, the simplest type is the cyclic coordinate descent method wherein ik cycles through all coordinates, i.e., the sequence {ik} is
1,2,...,n,1,2,...,n,1,2,...,n,1,2,...,n,...
In this question, we will analyze a coordinate descent method called Gauss-Southwell method wherein the coordinate to update in the k-th iteration is chosen based on the gradient g(xk) := ∇f(xk)T. Specifically, in the k-th iteration, Gauss-Southwell method sets ik to be a maximizer of
max |gj (xk )| (9) j
where gj(xk) denotes the j-th entry of the gradient g(xk). In other words, in Gauss-Southwell method, we set ik to the coordinate at which the magnitude of the gradient entry is maximum. In the rest of this problem, we will focus our attention to a quadratic optimization problem where the objective function is
f(x)= 12xTQx−bTx (10)
with Q being a positive definite matrix. Let A and a denote the largest and smallest eigenvalues of Q, respectively. We will analyze convergence of Gauss-Southwell method when applied to solve this quadratic optimization problem.
(a) Let gk denote g(xk). And, let g ̄k denote the vector in Rn with its ik-th entry equal to the ik-th entry of gk while all other entries are zeros. Show that the update in the k-th iteration of Gauss- Southwell method is equivalent to the following: xk+1 = xk − α ̄k · g ̄k where α ̄k is a nonnegative step size satisfying f(xk − α ̄k · g ̄k) = minα≥0 f(xk − α · g ̄k).
(b) In homework 4, we solved Exercise 19 in Chapter 8 of the textbook. Therein, we proved that if the direction dk of an iterative descent method with exact line search satisfies (i) dTk gk < 0, and (ii) (dTk gk)2 ≥ β(dTk dk)(gkT gk) with β ∈ (0, 1], then
E(xk+1) ≤ 1 − β Aa E(xk) (11)
where E(x) := 12(x − x∗)T Q(x − x∗) (x∗ denotes the unique global minimum point).
Apply this result to prove that the sequence {xk} generated by Gauss-Southwell method satisfies the following inequality:
E(xk+1)≤1− a E(xk) (12) nA
Based on this inequality, discuss about global and local convergence of Gauss-Southwell method when applied to the quadratic optimization. (Hint: use the result of part (a) with dk = −g ̄k.)
(c) (MATLAB experiment) Consider the unconstrained quadratic optimization with
1 0.5 0 −1
Q=0.5 1 0.25, b= 1 . (13)
0 0.25 1 −2
Implement Gauss-Southwell method to solve this problem. You can use the stopping criterion ∥g(xk)∥ ≤ ε with ε being a very small positive real number (e.g., 10−6). Plot ∥g(xk)∥ versus k. Plot E(xk) versus k. When plotting these curves, make the y-axis be in log-scale so that you can observe the order of convergence from the plots. Interpret the plots.
3. Application: Power System State Estimation [35 points]. In controlling and monitoring a power grid, it is of utmost importance to have an accurate estimate of the system state. Consider the simple system with 2 buses in Figure 1. For this system, the system state is consisting of three real variables V1, V2, and θ2, which are bus 1 voltage magnitude, bus 2 voltage magnitude, and bus 2 voltage phase angle respectively (the bus 1 voltage phase angle θ1 is set to zero, i.e., the reference angle).
Figure 1: A power system consisting of two buses.
In practice, the system state is unknown and often not measured directly; most of time, we need to estimate the system state (V1,V2,θ2) ∈ R3 based on available measurements, and this procedure is
called state estimation. Suppose that we have real and imaginary power flow measurements of the line 1 − 2 in both directions, which are related to the state as follows:
P12 = g(V1V2 cos θ2 − V12) − bV1V2 sin θ2 =: h1(V1, V2, θ2) Q12 = −b(V1V2 cos θ2 − V12) − gV1V2 sin θ2 =: h2(V1, V2, θ2) P21 = g(V2V1 cos θ2 − V22) + bV2V1 sin θ2 =: h3(V1, V2, θ2) Q21 = −b(V2V1 cos θ2 − V22) + gV2V1 sin θ2 =: h4(V1, V2, θ2)
where P12 and Q12 are real and imaginary parts, respectively, of measurement of the complex power flow from bus 1 to bus 2. Similarly, P21 and Q21 are real and imaginary parts, respectively, of measurement of the complex power flow from bus 2 to bus 1. In addition, g and b are conductance and susceptance of the line 1 − 2 respectively. In practice, we have measurements corrupted by additive random noise; however, in this question, we consider noiseless measurements for simplicity.
In order to estimate the state (V1,V2,θ2) from the measurements, a typical approach is to solve the unconstrained optimization problem
min f(V1, V2, θ2) (15) (V1 ,V2 ,θ2 )∈R3
where f(V1,V2,θ2) is defined as the sum of the squared estimation residues: f(V1, V2, θ2)
:= (P12 − h1(V1, V2, θ2))2 + (Q12 − h2(V1, V2, θ2))2 + (P21 − h3(V1, V2, θ2))2 + (Q21 − h4(V1, V2, θ2))2 The solution to (15) is taken as the estimate of the system state. Since we are considering the noiseless
case, our estimate is expected to be exactly the same as the true state.
Throughout this question, suppose that the line conductance and susceptance are as follows:
g=1, b=−5.
(a) Obtain the expression of the gradient ∇f (V1 , V2 , θ2 )T .
(b) In power systems, the system state (V1, V2, θ2) is known to be close to (1, 1, 0) which is called “flat start”. So, (1, 1, 0) is a good candidate as the initial point of iterative descent algorithms. Suppose that the true state is
xtrue = (V1, V2, θ2) = (1.05, 0.98, 0.1).
Generate the measurements (P12, Q12, P21, Q21) for this state using the equations (14).
Based on the measurements you generated, implement a Gradient Descent algorithm to solve (15) and eventually obtain an estimate of xtrue. Use the flat start point as the initial point, i.e., x0 = (1, 1, 0). For the line search method, use a backtracking method (explain your parameters). Let xk denote the solution point obtained in the k-th iteration of the Gradient Descent. Plot ∥∇f(xk)∥ versus k. Plot ∥xk − xtrue∥2 (i.e., estimation error) versus k. When plotting these curves, make the y-axis log-scale so that you can observe the order of convergence in the plot. Interpret the plots. Discuss global convergence and local convergene behaviors. What is the order of local convergence?
程序代写 CS代考 加微信: cscodehelp QQ: 2235208643 Email: kyit630461@163.com