Prof. Mark Bun
CAS CS 591 B: Communication Complexity
Lecture Notes 8: Disjointness Lower Bound
Fall 2019
Reading.
• Rao-Yehudayoff Chapter 6
The lower bound of Ω(n) on the randomized communication complexity of Disjointness is perhaps the most impactful result in the entire area. It has consequences in circuit complexity, property testing, algorithmic game theory, extension complexity, data structures, etc. All known proofs of the tight lower bound use information theory one way or another, but the ideas we’ve developed in our study of information complexity give a particularly simple and intuitively satisfying proof. The presentation here follows the proof of Bar-Yossef, Jayram, Kumar, and Sivakumar, “An information statistics approach to data stream and communication complexity.”
It will be more convenient to work with the complement of the Disjointness function, which we’ll call the Set-Intersection function. Define
INTn(x, y) = ⋁_{i=1}^{n} (xi ∧ yi).
The idea of the proof will be to show that for some distribution ζ over {0,1}², a protocol for computing INTn over inputs drawn from ζ⊗n will have to reveal n times as much information as a protocol for computing AND2 over inputs drawn from ζ. We will then exhibit a distribution ζ over {0,1}² which requires Ω(1) bits of information to compute AND2.
1 From INTn to AND2
Define the distribution ζ over {0, 1}² × {a, b} as follows. Let D be uniformly random from {a, b}. If D = a, let A = 0 and B be uniform over {0, 1}. If D = b, let B = 0 and A be uniform over {0, 1}. Then A and B are independent conditioned on D.
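For concreteness, here is a minimal Python sketch (ours, not part of the notes; the function names are illustrative) that samples from ζ and from ζ⊗n. Note that every coordinate sampled this way is non-intersecting, a fact we will use repeatedly below.

```python
import random

def sample_zeta(rng=random):
    """Sample ((a, b), d) from the distribution zeta defined above."""
    d = rng.choice(['a', 'b'])
    if d == 'a':
        a, b = 0, rng.randint(0, 1)   # A = 0, B uniform over {0, 1}
    else:
        a, b = rng.randint(0, 1), 0   # B = 0, A uniform over {0, 1}
    return (a, b), d

def sample_zeta_n(n, rng=random):
    """Sample ((A, B), D) from zeta^{tensor n}, one coordinate at a time."""
    coords = [sample_zeta(rng) for _ in range(n)]
    A = tuple(ab[0] for ab, _ in coords)
    B = tuple(ab[1] for ab, _ in coords)
    D = tuple(d for _, d in coords)
    return (A, B), D

# In every coordinate at least one of A_i, B_i is 0, so a sample from
# zeta^{tensor n} is always a non-intersecting (disjoint) pair.
(A, B), D = sample_zeta_n(8)
print(A, B, D)
```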
Lemma 1 (Information Cost Decomposition). Let ζ be the distribution defined above, and let ((A, B), D⃗ ) ∼ ζ⊗n. Then for any protocol Π,
I(AB; Π(A, B) | D⃗) ≥ Σ_{i=1}^{n} I(AiBi; Π(A, B) | D⃗).
Proof. Abbreviating Π = Π(A, B), we have
I(AB; Π | D⃗) = H(AB | D⃗) − H(AB | Π D⃗)
= Σ_{i=1}^{n} H(AiBi | D⃗) − H(AB | Π D⃗)    (since the Ai, Bi’s are independent given D⃗)
≥ Σ_{i=1}^{n} H(AiBi | D⃗) − Σ_{i=1}^{n} H(AiBi | Π D⃗)    (by subadditivity of entropy)
= Σ_{i=1}^{n} I(AiBi; Π | D⃗).
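The inequality only uses that the coordinates (Ai, Bi) are independent given D⃗, so it holds with any function of (A, B) in place of a protocol transcript. The following brute-force Python check (ours, not from the notes; the toy transcript function is arbitrary) verifies it for n = 2.

```python
from collections import defaultdict
from itertools import product
from math import log2

def zeta():
    """Yield (prob, a, b, d): the single-coordinate distribution zeta."""
    for d in ('a', 'b'):
        for z in (0, 1):
            a, b = (0, z) if d == 'a' else (z, 0)
            yield 0.25, a, b, d           # Pr[D = d] = Pr[Z = z] = 1/2

def toy_transcript(A, B):
    """An arbitrary toy 'transcript': any function of (A, B) works here."""
    return (A[0], A[0] & B[0], A[1] | B[1])

def cond_mutual_info(events):
    """I(X; Y | W) computed from a list of (prob, x, y, w) tuples."""
    pxyw, pxw, pyw, pw = (defaultdict(float) for _ in range(4))
    for p, x, y, w in events:
        pxyw[x, y, w] += p
        pxw[x, w] += p
        pyw[y, w] += p
        pw[w] += p
    return sum(p * log2(p * pw[w] / (pxw[x, w] * pyw[y, w]))
               for (x, y, w), p in pxyw.items() if p > 0)

# Enumerate zeta^{tensor 2} together with the toy transcript.
events = []
for (p1, a1, b1, d1), (p2, a2, b2, d2) in product(zeta(), zeta()):
    A, B, D = (a1, a2), (b1, b2), (d1, d2)
    events.append((p1 * p2, A, B, D, toy_transcript(A, B)))

lhs = cond_mutual_info([(p, (A, B), T, D) for p, A, B, D, T in events])
rhs = sum(cond_mutual_info([(p, (A[i], B[i]), T, D) for p, A, B, D, T in events])
          for i in range(2))
assert lhs >= rhs - 1e-9
print(f"I(AB;T|D) = {lhs:.4f} >= sum_i I(AiBi;T|D) = {rhs:.4f}")
```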
Lemma 2 (Reduction Lemma). Let Π compute INTn with probability at least 1 − ε on every input. Let ζ be the distribution defined above and let ((A, B), D⃗ ) ∼ ζ⊗n and ((U, V ), D) ∼ ζ. Then for every i ∈ [n],
I(AiBi; Π(A, B) | D⃗) ≥ inf_P I(UV; P(U, V) | D),
where the infimum is taken over protocols P computing AND2 with probability at least 1−ε on every input.
Proof. Let Π be a protocol computing INTn with probability at least 1 − ε on every input. Fix an index i. By the definition of conditional mutual information,
I(AiBi; Π(A, B) | D⃗) = E_{d⃗ ∼ {a,b}^{n−1}} [I(AiBi; Π(A, B) | Di, D⃗−i = d⃗)].
So it suffices to show that for every fixed d⃗ = (d1, …, di−1, di+1, …, dn) ∈ {a, b}^{n−1}, there is a protocol for AND2 with conditional information cost I(AiBi; Π(A, B) | Di, D⃗−i = d⃗).
We now describe such a protocol Pi,d⃗ for AND2. On input u, v, Alice and Bob set xi = u, yi = v and every other (xj, yj) to a sample from ζ conditioned on dj (using private randomness). They then run the protocol Π(x, y) and output its result. This protocol computes the AND2 function: every padded coordinate (xj, yj) is non-intersecting under ζ, so u = v = 1 iff x and y are intersecting inputs.
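A minimal Python sketch of this embedding (ours; the stand-in run_Pi below simply answers Set-Intersection exactly, whereas in the reduction it would be the given protocol Π):

```python
import random

def run_Pi(x, y):
    """Stand-in for Pi: here it simply answers Set-Intersection exactly."""
    return int(any(xi & yi for xi, yi in zip(x, y)))

def embedding_protocol(u, v, i, d, rng=random):
    """Sketch of P_{i,d}: embed the AND_2 input (u, v) into coordinate i and
    fill every other coordinate k according to zeta conditioned on d[k]."""
    n = len(d)
    x, y = [0] * n, [0] * n
    x[i], y[i] = u, v
    for k in range(n):
        if k == i:
            continue
        if d[k] == 'a':                 # A_k = 0, B_k uniform over {0, 1}
            x[k], y[k] = 0, rng.randint(0, 1)
        else:                           # d[k] == 'b': B_k = 0, A_k uniform
            x[k], y[k] = rng.randint(0, 1), 0
    return run_Pi(x, y)

# Every padded coordinate is non-intersecting, so the output equals AND(u, v)
# whenever Pi answers INT_n correctly.
d = ['a', 'b', None, 'a', 'b']          # slot i = 2 carries the embedded input
for u, v in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    assert embedding_protocol(u, v, i=2, d=d) == (u & v)
print("P_{i,d} reproduces AND_2 on all four inputs")
```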
Next, we analyze the conditional information cost of Pi,d⃗. One can check by inspection that the joint distribution of (U, V, D, Pi,d⃗(U, V)) is equal to that of (Ai, Bi, Di, Π(A, B)) conditioned on D⃗−i = d⃗. Hence
I(UV; Pi,d⃗(U, V) | D) = I(AiBi; Π(A, B) | Di, D⃗−i = d⃗).

Theorem 3. Let ζ be the distribution defined above and let ((A, B), D) ∼ ζ. Then

BPP^pub_ε(INTn) ≥ n · inf_P I(AB; P(A, B) | D),
where the infimum is taken over protocols P computing AND2 with probability at least 1 − ε.
Proof. Follows by combining the Reduction Lemma and the Information Cost Decomposition, together with the fact that the conditional external information I(AB; Π(A, B) | D⃗) is a lower bound on communication cost.
2 Why Not a Product Distribution?
Alice and Bob’s inputs when sampled from ζ⊗n are independent conditioned on D, but they are not
fully independent. Why couldn’t we use a fully independent distribution to prove a lower bound
for Set-Intersection? It turns out that Set-Intersection is easy for product distributions: Babai,
Frankl, and Simon showed that for every product distribution μ, there is a distributional protocol
for Set-Intersection showing that Dμ(INTn) = O(√n log n). But this may still not be so convincing,
since there is a distributional protocol with respect to ζ⊗n as well: all inputs in its support are non-intersecting, so nothing needs to be communicated. The magic in this proof lies in the fact that any protocol computing INTn on every input with high probability must reveal a lot of information when run on the distribution ζ⊗n, even though that distribution places no mass at all on yes inputs.
To see why this argument wouldn't work when Alice and Bob's inputs are fully independent, we argue that for any product distribution there is a protocol which succeeds on every input but has low expected communication when run on that distribution. Specifically, let's fix a product distribution on (A, B) and consider the following protocol for Set-Intersection. For some parameter ε > 0, Alice finds a coordinate i (if one exists) such that H(Ai) ≥ ε and H(Bi) ≥ ε. If one is found, the parties exchange that coordinate, output 1 if both bits are 1, and otherwise condition on the values seen and repeat. When they run out of high-entropy coordinates, the remaining entropy in A and B must be at most ε · n, so they can just transmit their sets with roughly this many bits of communication.
How many rounds should we expect this protocol to last? If H(Ai) ≥ ε, then Pr[Ai = 1] ≥ Ω(ε/log(1/ε)). So the probability of finding an intersection in each round is at least about ε². Thus with high probability, the protocol will not last for more than about 1/ε² rounds when actually run on the distribution (A, B). Setting ε ≈ n^{−1/3} balances the roughly 1/ε² bits spent searching for an intersection against the roughly ε·n bits of the final transmission, giving expected total communication roughly n^{2/3}.
3 Information Complexity of AND
Our goal is now to prove a lower bound on the information complexity of any protocol computing AND2. In this section, we’ll give the proof as a consequence of a sequence of lemmas, and give the proofs of those lemmas in the following section.
Theorem 4. Let P be a protocol which computes AND(u, v) with probability at least 1 − ε. Let ((U, V), D) ∼ ζ. Then

I(UV; P(U, V) | D) ≥ ¼ (1 − 2√ε).
To begin analyzing this, we expand the conditional information cost I(UV; P(U, V) | D) to make it easier to work with. Let Z ∈ {0, 1} be uniformly random. Then by definition
I(UV; P(U, V) | D) = ½ I(UV; P(U, V) | D = a) + ½ I(UV; P(U, V) | D = b)
= ½ I(Z; P(0, Z)) + ½ I(Z; P(Z, 0)).
For (u, v) ∈ {0, 1}², let puv denote the distribution of transcripts when running P(u, v). The next step is to relate these mutual information quantities to distances between distributions. The distance which will be convenient for us to use is the Hellinger distance.
Definition 5. The squared Hellinger distance between two probability distributions p, q over domain X is defined by
h²(p, q) = 1 − Σ_{x∈X} √(p(x) q(x)) = ½ Σ_{x∈X} (√p(x) − √q(x))².
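As a quick sanity check (ours, not from the notes), the following Python snippet verifies on a small example that the two expressions above agree, and that h satisfies the triangle inequality, which is used in the calculation below.

```python
from math import sqrt

def hellinger_sq(p, q):
    """h^2(p, q) = 1 - sum_x sqrt(p(x) q(x)), for dicts on a common domain."""
    domain = set(p) | set(q)
    return 1.0 - sum(sqrt(p.get(x, 0.0) * q.get(x, 0.0)) for x in domain)

def hellinger_sq_alt(p, q):
    """h^2(p, q) = (1/2) sum_x (sqrt(p(x)) - sqrt(q(x)))^2."""
    domain = set(p) | set(q)
    return 0.5 * sum((sqrt(p.get(x, 0.0)) - sqrt(q.get(x, 0.0))) ** 2
                     for x in domain)

p = {'T1': 0.5, 'T2': 0.3, 'T3': 0.2}
q = {'T1': 0.1, 'T2': 0.6, 'T3': 0.3}
r = {'T1': 0.25, 'T2': 0.25, 'T3': 0.5}

assert abs(hellinger_sq(p, q) - hellinger_sq_alt(p, q)) < 1e-12
h = lambda a, b: sqrt(hellinger_sq(a, b))
assert h(p, q) <= h(p, r) + h(r, q)     # triangle inequality for h
print("both formulas agree; triangle inequality holds on this example")
```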
Lemma 6 (Hellinger Lower Bound). Let P be a protocol computing AND with probability at least 1 − ε. Then for Z ∈ {0,1} uniform,
I(Z; P(0, Z)) ≥ h²(p00, p01) and I(Z; P(Z, 0)) ≥ h²(p00, p10).

Continuing our calculation,
I(UV; P(U, V) | D) = ½ I(Z; P(0, Z)) + ½ I(Z; P(Z, 0))
≥ ½ h²(p00, p01) + ½ h²(p00, p10)
≥ ¼ (h(p00, p01) + h(p00, p10))²    (Cauchy–Schwarz)
≥ ¼ h²(p01, p10)    (triangle inequality).
At this point, it is not clear why we should expect the distance between p01 and p10 to be large, as both are 0-inputs to the AND function. This is the point where we exploit the fact that these are distributions over transcripts, and that the set of inputs resulting in any given transcript is a rectangle:
Lemma 7 (Cut-and-Paste Lemma). Let P be a randomized protocol over X × Y. Then for every x, x′ ∈ X and every y, y′ ∈ Y, we have h(pxy, px′y′) = h(pxy′, px′y).
Applying the Cut-and-Paste Lemma lets us conclude that
I(UV; P(U, V) | D) ≥ ¼ h²(p00, p11).
And now, we should expect to be done, because any accurate protocol for AND must induce very different distributions on (0, 0) ∈ AND⁻¹(0) and (1, 1) ∈ AND⁻¹(1). Indeed, we have
Lemma 8 (Distinguishing Lemma). If P computes a function f with probability at least 1 − ε on every input, and (x, y) and (x′, y′) are inputs such that f(x, y) ≠ f(x′, y′), then
h²(pxy, px′y′) ≥ 1 − 2√ε.

This allows us to conclude the proof of Theorem 4: combining with the Cut-and-Paste step above, I(UV; P(U, V) | D) ≥ ¼ h²(p00, p11) ≥ ¼ (1 − 2√ε).
4 Deferred Proofs
“Proof” of Hellinger Lower Bound Lemma 6. The statement is true as given, but the proof is more complicated and requires introducing a few more information-theoretic quantities. We will prove
the weaker statement that if Z ∈ {0, 1} is uniform and if P is a randomized function on one bit, then

I(Z; P(Z)) ≥ (log e / 2) · h²(q0, q1),

where p is the distribution of P(Z) and q0 and q1 are the distributions of P(0) and P(1) respectively.

For any random variables X, Y with joint distribution p, we may write

I(X; Y) = Σ_{x,y} p(x, y) log [p(x, y) / (p(x) p(y))] = Σ_y p(y) Σ_x p(x|y) log [p(x|y) / p(x)] = E_{y∼Y} D(p(·|y) ∥ p).

For any pair of distributions q, p, we have

D(q ∥ p) = Σ_T −q(T) log [p(T) / q(T)]
≥ Σ_T q(T) · (2 log e) (1 − √(p(T)/q(T)))    (since ln y ≤ y − 1)
= 2 (log e) · h²(q, p).

Hence

I(Z; P(Z)) = E_{z∼Z} D(qz ∥ p)
= ½ (D(q0 ∥ p) + D(q1 ∥ p))
≥ log e · (h²(q0, p) + h²(q1, p))
≥ (log e / 2) (h(q0, p) + h(q1, p))²    (by Cauchy–Schwarz)
≥ (log e / 2) h²(q0, q1)    (by triangle inequality).
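A numerical spot-check of this weaker bound (ours, not part of the notes): since Z is uniform, I(Z; P(Z)) = ½ D(q0 ∥ p) + ½ D(q1 ∥ p) with p = ½(q0 + q1), and the bound (log e / 2) · h²(q0, q1) never exceeds it on random choices of q0, q1.

```python
import random
from math import log2, sqrt, e

def random_dist(k, rng):
    w = [rng.random() for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]

def kl(q, p):
    """D(q || p) in bits."""
    return sum(qi * log2(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def hellinger_sq(q, p):
    return 1.0 - sum(sqrt(qi * pi) for qi, pi in zip(q, p))

rng = random.Random(0)
for _ in range(1000):
    k = rng.randint(2, 6)                       # size of the "transcript" alphabet
    q0, q1 = random_dist(k, rng), random_dist(k, rng)
    p = [(a + b) / 2 for a, b in zip(q0, q1)]   # distribution of P(Z)
    info = 0.5 * kl(q0, p) + 0.5 * kl(q1, p)    # I(Z; P(Z))
    bound = (log2(e) / 2) * hellinger_sq(q0, q1)
    assert info >= bound - 1e-9, (info, bound)
print("I(Z; P(Z)) >= (log e / 2) h^2(q0, q1) held in all trials")
```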
Proof of Cut-and-Paste Lemma 7. Let P be a private coin protocol and let pxy denote the distribution on transcripts when P is run on (x, y). Recall that for any P, we can, for every transcript T, decompose Pr[P(x, y) = T] = qA(x, T) · qB(y, T) for some functions qA, qB.
Then

1 − h²(pxy, px′y′) = Σ_T √(Pr[P(x, y) = T] · Pr[P(x′, y′) = T])
= Σ_T √(qA(x, T) qB(y, T) qA(x′, T) qB(y′, T))
= Σ_T √(Pr[P(x′, y) = T] · Pr[P(x, y′) = T])
= 1 − h²(px′y, pxy′).
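The following Python sketch (ours; the two-message protocol and its biases are made up for illustration) checks the lemma numerically on a toy private-coin protocol whose transcript distribution factors as qA(x, T) · qB(y, T).

```python
from math import sqrt
from itertools import product

# Pr[m1 = 1 | x] for Alice's message, Pr[m2 = 1 | y, m1] for Bob's reply.
ALICE_BIAS = {0: 0.2, 1: 0.7}
BOB_BIAS = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}

def transcript_dist(x, y):
    """Distribution p_{xy} over transcripts T = (m1, m2); it factors as
    q_A(x, T) * q_B(y, T), exactly the structure the lemma relies on."""
    dist = {}
    for m1, m2 in product((0, 1), repeat=2):
        q_a = ALICE_BIAS[x] if m1 else 1 - ALICE_BIAS[x]
        q_b = BOB_BIAS[y, m1] if m2 else 1 - BOB_BIAS[y, m1]
        dist[m1, m2] = q_a * q_b
    return dist

def hellinger(p, q):
    return sqrt(1.0 - sum(sqrt(p[t] * q[t]) for t in p))

p00, p01, p10, p11 = (transcript_dist(x, y) for x, y in product((0, 1), repeat=2))
assert abs(hellinger(p00, p11) - hellinger(p01, p10)) < 1e-9
print("h(p00, p11) = h(p01, p10) =", round(hellinger(p00, p11), 4))
```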
Proof of Distinguishing Lemma 8. We’ll actually begin by showing that pxy and px′y′ are far in total variation distance, where we recall that the total variation distance between two distributions p, q over T is
TV(p, q) = max_{S⊆T} (p(S) − q(S)) = ½ Σ_{T∈T} |p(T) − q(T)|.
Let S be the set of transcripts on which P outputs f(x, y). Then pxy(S) ≥ 1 − ε and px′y′(S) ≤ ε. Hence TV(pxy, px′y′) ≥ 1 − 2ε.
Now we relate total variation distance to Hellinger distance as follows:
TV²(p, q) = ¼ (Σ_T |p(T) − q(T)|)²
= ¼ (Σ_T (√p(T) + √q(T)) · |√p(T) − √q(T)|)²
≤ ¼ (Σ_T (√p(T) + √q(T))²) · (Σ_T (√p(T) − √q(T))²)    (Cauchy–Schwarz)
= ¼ · 2h²(p, q) · (2 + 2 Σ_T √(p(T) q(T)))
= h²(p, q) (2 − h²(p, q)).
Since TV(pxy, px′y′) ≥ 1 − 2ε, this gives (1 − h²(pxy, px′y′))² = 1 − h²(pxy, px′y′)(2 − h²(pxy, px′y′)) ≤ 1 − (1 − 2ε)² ≤ 4ε. This allows us to conclude that h²(pxy, px′y′) ≥ 1 − 2√ε.
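Finally, a small numerical sanity check (ours, not part of the notes) of the relation TV²(p, q) ≤ h²(p, q)(2 − h²(p, q)) on random distributions:

```python
import random
from math import sqrt

def normalize(w):
    s = sum(w)
    return [x / s for x in w]

def tv(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def hellinger_sq(p, q):
    return 1.0 - sum(sqrt(a * b) for a, b in zip(p, q))

rng = random.Random(1)
for _ in range(1000):
    k = rng.randint(2, 8)
    p = normalize([rng.random() for _ in range(k)])
    q = normalize([rng.random() for _ in range(k)])
    h2 = hellinger_sq(p, q)
    assert tv(p, q) ** 2 <= h2 * (2 - h2) + 1e-9
print("TV^2 <= h^2 (2 - h^2) held in all trials")
```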