Week 9

Lagrangian relaxation

This week we will cover an alternative way of solving programs that involves relaxing certain constraints and lifting them to the objective function. We introduce the high level idea behind La- grangian relaxation with a simple application, and then apply the technique to the traveling salesman problem. Towards the end, we lay down the theoretical foundations of Lagrangian relaxation.

9.1 An introduction to Lagrangian relaxation

We introduce the main ideas of the technique with a concrete prob- lem. Let G = (V,E) be a graph, c : E → R be an edge cost function, b : E → R be a benefit function, and B be a benefit threshold. The objective is to find a spanning tree T of G with total benefit at least B having minimum cost.1 If we let T be the set of all spanning trees of G, the problem we want to solve is

1 This problem has an interesting application. Imagine each edge e ∈ E has associated a failure probability re. We want to pick a minimum cost tree such that the probability of having no failures is at least some threshold θ.

If failures are independent of one another, then we want a tree T such that

(1 − re) ≥ θ e∈T

By taking logarithms on both sides we

get

log(1−re) ≥ logθ, e∈T

so setting be = log(1 − re) and

B = log θ we can reduce this problem to (IP.1).

© Copyright 2021 Julián Mestre.

This work is licensed under a Creative

Commons Attribution-NonCommercial- ShareAlike 4.0 International License.

minimize subject to

e∈T ce

e∈T be ≥ B

T∈T

To that end, let us define for any λ ≥ 0

t(λ)=min ce+λB− be. T∈T e∈T e∈T

Let us make a few observation about t(λ). Observation 1. For any λ ≥ 0 we have t(λ) ≤ z.

Indeed, if we let T ∗ be the optimal solution to (IP.1). Then

(9.1)

t(λ) ≤

e∈T∗

e∈T∗

e∈T∗

ce + λ

B −

be

≤

ce = z,

(IP.1)

Let us denote with z the cost of (IP.1). In order to run a branch and bound algorithm for (IP.1) we need to have a good lowerbound of z.

where the second inequality follows from the fact that since T ∗ is a feasible solution to (IP.1), its benefit is at least B.

week 9 – lagrangian relaxation

discrete optimization 2

Observation 2. For any λ ≥ 0 we can evaluate t(λ) in polynomial time. To see why, we first re-write the definition on t(λ) as follows

t(λ) = min (ce−λbe) +λB.

T∈T e∈T

To evaluate the first term we compute a minimum weight spanning

tree on the modified edge weights ce − λbe.

Observation 3. The function t : R+ → R defined in (9.1) is piece-wise

linear, continuous, and concave.

For a fixed T ∈ T we can interpret ce +λB− be as e∈T e∈T

a linear function of λ. Therefore, the function t(λ) can be regarded as a lower envelope of a bunch of lines. Therefore, t(λ) is piece-wise linear, continuous, and concave.

At this point we can just do binary search on λ to find maxλ≥0 t(λ), which is the strongest lowerbound on z that this approach yields. While binary search works fine with only one variable, it will be useful to look at another method for optimizing t that generalizes

to the case of many variables.

9.2 Unconstrained optimization

Let us take a short detour through the land of unconstrained op- timization. Suppose that we had a continuous, differentiable, and concave2 function f : R → R that we want to maximize. Steepest ascent is a a very simple algorithm for optimizing this type of func- tions. Starting with an arbitrary solution x(0) we repeatedly update the solution according to the following rule

x(k+1) = x(k) + α(k)f′(x(k)),

where α(k) ≥ 0 is a parameter that determines the update step size. If one is careful when choosing the step sizes, we are guaranteed to converge to a maximizer.

Theorem 9.1. Let f : R → R be a continuous, differentiable, and concave function. The following criteria guarantee the convergence of x(k) to a maximizer of f as k tends to infinity:

1. lim α(k) =0andlim k α(i) =∞,or k→∞ k→∞ i=1

2. α(k) = ρβk for large enough ρ and β < 1 close enough to 1.
The advantage of this approach over binary search is that it generalizes to functions of many variables. Let f : Rn → R be a continuous, differentiable, and concave3 function that we want to maximize. The steepest ascent update rule is as follows:
x(k+1) = x(k) + α(k)∇f(x(k)),
where α(k) ≥ 0 is a parameter that determines the update step size,
and ∇f(x) = ∂f(x),... ∂f(x) is the gradient of f at x. ∂x1 ∂xn
t(λ)
λ∗ λ
2 A function f : R → R of one variable is concave in the domain [a, b] if for any x, y ∈ [a, b] and any γ ∈ [0, 1] we have
f(γx+(1−γ)y) ≥ γf(x)+(1−γ)f(y).
In other words, the function is always on or above the straight-line segment connecting (x, f(x)) and (y, f(y)).
f(·)
(x, f(x))
axy
(y, f(y))
b
3 A function f : Rn → R of many variables is concave over the domain S ⊆ Rn ifforanyx,y ∈ Sand
γ ∈ [0,1] we have
f(γx+(1−γ)y) ≥ γf(x)+(1−γ)f(y).
week 9 – lagrangian relaxation
discrete optimization 3
Theorem 9.1 still holds for functions of many variables. In fact, with a small modification of the algorithm we can also maximize continuous convex functions f : Rn → R that are not necessarily differentiable. We say a vector s ∈ Rn is a sub-gradient of f at x if
f(y) ≤ f(x)+s·(x−y) for all y ∈ Rn.
As long as we update the current solution along the direction of a
sub-gradient4 we guaranteed the convergence of Theorem 9.1.
9.3 Lagrangian relaxation for the traveling salesman problem
Consider the following integer linear formulation for the traveling salesman problem5.
4 Notice that this is consistent with our previous discussion since if the function is concave and differentiable then the gradient is the only sub- gradient.
5 Recall that in the traveling salesman problem we are given a complete graph G = (V,E) with edge weights w : E → R+ andwewanttofind
a tour of the vertices with minimum weight.
minimize subjectto
we xe
xe =2
xe ≥ 2
xe ∈{0,1}
e∈E
e∈δ(u)
e∈δ(S)
∀u∈V
∀ ∅ ⊂ S ⊂ V
∀e∈E
(IP.2)
Let us denote with z the value of (IP.2). Before we can apply Lagrangian relaxation, we will turn the above program into a form that will make solving the relaxed problem more computationally tractable. For this, we will arbitrarily select a node v ∈ V.
First, we note we can restrict our attention to connectivity con-
straints of sets S that do not contain v, since
equivalence relation for any S ⊆ V
xe = xe. e∈δ(V S)
e∈δ(S)
Second, we claim that the degree constraints imply the following
e∈E(S)
xe ≤ |S| − 1 ⇐⇒ xe ≥ 2. e∈δ(S)
Third, and last, we note that the degree constraints imply
xe =|V|−2. e∈E(V −v)
Putting all these facts together we get another program, which is equivalent to (IP.2):
minimize subject to
e∈E
e∈δ(u)
e∈E(V −v)
xe ≤|S|−1 ∀∅⊂S⊆V−v e∈E(S)
xe ∈{0,1} ∀e∈E
we xe
xe =2 ∀u∈V xe =|V|−2
(IP.3)
week 9 – lagrangian relaxation
discrete optimization 4
At this point we can use Lagrangian relaxation to get rid of the degree constraints of all vertices, except v. For any λ ∈ Rn, define t(λ) to be the value of the following program6
6 Strictly speaking, since we do not relax the constraint corresponding to v we should only have |V| − 1 multipliers.
On the other hand, since we keep the constraint for v, the contribution of the λv term to the objective is always 0, so its addition does not change anything. Having λv around, however, simplifies the presentation a bit.
minimize subject to
e∈E
wexe + λu u∈V
2 −
e∈δ(u) xe
∀∅⊂S⊆V−v
e∈δ(v)
e∈E(V −v)
e∈E(S)
xe =2
xe =|V|−2
xe ≤|S|−1
xe ∈{0,1}
Program (IP.4) can be split into two independent parts:
(IP.4) 1. The degree constraint for v tells us that we should pick two
edges incident on the vertex v.
2. The subtour elimination constraints and the cardinality con-
straint tell us that we should pick a spanning tree of V − v.
Let R be the collection of subsets of edges formed by a spanning tree of V −v plus two edges on v. Then
t(λ) = min we+ λu2−degR(u). (9.2) R∈R e∈R u∈V
Let us make a few observation about t(λ). Observation 4. For any λ ∈ Rn we have t(λ) ≤ z.
Indeed, if we let T ∗ be the optimal solution tour. Note that T ∗ ∈ R, and so
λu 2−degT∗(u) = we = z, e∈T∗
t(λ) ≤ we + e∈T∗
u∈V
where the first equality follows from the fact that since T ∗ is a tour,
so every vertex has degree 2 in T∗.
Observation 5. For any λ ∈ Rn we can evaluate t(λ) in polynomial
time.
To see why, we first re-write (9.2) as follows
t(λ) = min (wuv −λu −λv)+2 λu. R∈R (u,v)∈R u∈V
To compute the first term we compute a minimum weight spanning tree for V − v on the modified edge weights wuv − λu − λv, and then pick the two cheapest edges on v under these modified weights.
Observation 6. The function t : R+ → R defined in (9.2) is piece-wise linear, continuous, and concave.
∀e∈E
week 9 – lagrangian relaxation
discrete optimization 5
ForafixedR∈Rwecanthinkof e∈R
we+
as a linear function of λ. Therefore, t(λ) can be regarded as a lower
envelope of a bunch of hyperplanes. Therefore, t(λ) is piece-wise linear, continuous, and concave.
To compute the set of Lagrangian multipliers λ ∈ Rn maximiz- ing t(λ), we use the following rule
R(k) = minimizer for t(λ(k)),
λ(k+1) = λ(k) +α(k) ·2−deg (k)(u) for all u ∈ V.
uuR
Notice that we are using as our sub-gradient, the gradient of the cost of R(k):
e∈R(k)
9.4 A theory of Lagrangian relaxation
we + λu 2 − degR(k) (u) . u∈V
How good is the lower bound provided by Lagrangian relaxation? What are we really optimizing when we solve maxλ t(λ)? In this section we answer these questions and lay down the mathematical foundation of Lagrangian relaxation.
Let z be the value of the following integer program: minimize c · x
subject to Ax ≥ b Dx ≥ d
(IP.5)
x ∈ Zn
Suppose we apply Lagrangian relaxation to the first set of con-
straints to obtain the following integer program minimize c·x+λ·(b−Ax)
subject to Dx ≥ d x ∈ Zn
(IP.6)
where A ∈ Rm×n and therefore λ ∈ Rm+ . We let t(λ) be the value of this program for a given choice of λ. If we let
L={x∈Zn :Dx≥d} then we can define t(λ) as
t(λ) = minc·x+λ·(b−Ax) x∈L
Theorem 9.2. The value t = maxλ∈Rn+ t(λ) is given by the following program
minimize c · x
subject to Ax ≥ b
x ∈ CH(L)
λu2−deg ∗(u) u∈V T
week 9 – lagrangian relaxation
discrete optimization 6
This theorem is very informative. First, it tells that t is a lower- bound for z, since the latter value can be viewed as optimizing over x ∈ L, instead of x ∈ CH(L). Second, it tells us precisely how good a lowerbound this is. For example, if the polyhedron
L={x∈Rn :Dx≥d}
is integral then t is the value of the linear relaxation of (IP.5).
To simplify the presentation, we will assume L is a finite set
L = x(1), x(2), . . . , x(k) . The theorem is still true for infinite L, but it requires a bit more work to prove. We rewrite t as follows
maximize t
subject to t+λ·(Ax(i) −b) ≤ c·x(i) ∀x(i) ∈ L t free
λ ∈ R n+
By the Strong Duality Theorem, the value of this linear program
equals the value of its dual:
minimize subject to
x(i) ∈L
minimize subject to
c ·
A
yi x(i)
x(i) ∈L
(i) yi (c · x )
yi = 1 yi (Ax(i) −b) ≥ 0
x(i) ∈L
A quick re-write yields the following equivalent program
yi ≥ 0
∀ x(i) ∈ L
x(i) ∈L
yi = 1
x(i) ∈L
x(i) ∈L
yix(i) ≥b yi ≥ 0
∀ x(i) ∈ L Finally, we note that x(i) yix is a point in CH(L), and there-
(i)
fore, t is the value of the following linear program:
minimize c · x
subject to Ax ≥ b
x ∈ CH(L)
as claimed in Theorem 9.2. Exercises
week 9 – lagrangian relaxation
discrete optimization 7
1. Prove that the problem of finding a minimum weight spanning tree subject to a benefit constraint is weakly NP-hard.
2. The partial vertex cover problem is similar to regular vertex cover except that we do not want to cover all edges, bur rather just a p fraction of them, for some input parameter p ∈ (0, 1).
3. Model the partial vertex cover with an integer program. Use Lagrangian relaxation to remove the constraint that at least p|E| edges are covered. What kind of optimization problem are you left with?
4. The k-median program is similar to the facility location except that there no costs for opening facilities. Instead we are only al- lowed to open k facilities. The objective is to minimize the connec- tion cost of the remaining points.
Model the k-median problem with an integer program. Use La- grangian relaxation to remove the constraint that at most k facility are chosen. What kind of optimization problem are you left with?
5. Lagrangian relaxation can also be used on linear programs. Sup- pose that given a linear program
min c · x : Ax ≥ b, x ≥ 0,
we use Lagrangian relaxation to relax all the constraints. What kind of optimization problem are we left with?