# CS代考计算机代写 AI COMP760, SUMMARY OF LECTURE 17.

COMP760, SUMMARY OF LECTURE 17.
HAMED HATAMI
1. How to do compression?
In this lecture we will discuss how to compress a protocol with low information cost, but possibly very high communication cost, to a protocol with low communication cost. It will be important to understand how much information is revealed at every step of the protocol. Thus we will take a closer look at the protocol tree, and we shall try to break the information cost into an expression in terms of the amounts of the information that are revealed nodes of the tree.
Consider a protocol π, and fix the public randomness to R = r (recall that we denoted this protocol by πr) so that there is no public randomness anymore. Note that every internal node v of the protocol tree at distance j from the root corresponds to a value for the partial transcript Π≤j. Let Lj denote the set of the internal nodes at distance j from the root, Vint denote the internal nodes of the tree, and let l be the number of rounds of the protocol. Using the formula I(A;B|C) = Ec∼CI(A;B|C = c), we have
I(X;Π|Y) = I(X;Π1|Y)+I(X;Π2|YΠ1)+…+I(X;Πl|YΠ≤l−1)
= I(X; Π1|Y ) + 􏰕 Pr[πr reaches v]I(X; Π2|Y v) + . . . + 􏰕 Pr[πr reaches v]I(X; Πl|Y v)
v∈L1 v∈Ll−1
= 􏰕 Pr[πr reaches v]I(X;Πv|Yv), v∈Vint
where Πv is the message sent at the node v. Repeating the same argument for I(Y;Π|X), we conclude that
IC(πr)= 􏰕 Pr[πr reachesv](I(X;Πv|Yv)+I(Y;Πv|Xv)), v∈Vint
If Bob is the owner of the node v, then I(X;Πv|Yv) = 0, and if Alice is the owner, then I(Y ; Πv|Xv) = 0. Hence we have the following proposition.
Proposition 1. Let π be a protocol, and πr be the protocol obtained from π by setting the public randomness to R = r. If Vint denotes the internal nodes of the protocol tree of πr, then
(1) IC(πr) = 􏰕 Pr[πr reaches v]I(P;Πv|Qv), v∈Vint
where P =X and Q=Y if Alice owns v, and P =Y and Q=X otherwise. Similarly
(2) ICext(πr)= 􏰕 Pr[πr reachesv]I(XY;Πv|v), v∈Vint
Proposition 1 breaks the information cost to a sum over the terms that correspond to the amount of information that is revealed at node v.
As we saw in Lecture 15, given an input (x,y), every internal node u, defines a probability distribution on the set of its children. This probability distribution is only known to the owner of
1

2 HAMED HATAMI
the node u. If u is owned by Alice, then she knows the probability distribution pxu(·), and if v is
owned by Bob, then he knows the probability distribution pyv(·). Given a leaf t of the protocol tree,
the probability that t will be reached by the protocol for a given pair of inputs x, y is given by 
 􏰖 x   􏰖 y  x y PrRA ,RB [Πxy (t)] =  pvi (vi+1 ) ×  pvi (vi+1 ) = pA (t)pB (t),
 i∈[l−1]   i∈[l−1]  Alice owns vi Bob owns vi
where (v1,v2,…,vl(= t)) is the unique path from the root to the leaf t. The reason that Alice and Bob need to communicate is that one of them knows some of the terms in the product and the other one knows the rest. Indeed if they knew the probabilities PrRA ,RB [Πxy (t)], then they could use the public randomness to mutually sample a leaf according to this probability distribution, and that would be a simulation of πr with no communication. The idea behind the compression is that Alice and Bob will try to communicate in order to be able to choose a leaf with the right probability distribution PrRA ,RB [Πxy (t)], and they will try to save the amount of the communication using the assumption that the information cost of the original protocol is low.
Consider πr and an input (x,y). Let u be a node owned by Alice, and thus she knows the probability distribution pxu(·). Bob, in general, does not know what this probability distribution is, as he does not know x, but he can have an estimate of this probability distribution. Bob knows y, rB and the partial transcript u, and considering them, he has a posterior probability distribution for w (and prior to knowing x). More precisely
quy(w) = PrXY,RA [au(x, RA, r) = w|u, Y = y] = Ex∼μ|u,Y =y [pxu(w)]
is Bob’s best estimate for pxu, and similarly if v is an internal node owned by Bob then
qvx(w)=PrXY,RB[bv(y,RB,r)=w|v,X =x]=Ey∼μ|v,X=x [pyv(w)]
is Alice’s best estimate for pyv. We will think of the probabilities pxu and pyv as the “correct” probabilities and of quy and qvx as the estimate of the other party of these probabilities. Now with this notation we can state Proposition 1 in terms of divergence.
Theorem 2. Let π be a protocol, and let Vint denote the internal nodes of the protocol tree. Then 
 ICμ(π) = ErExy∼μ 􏰕 Pr[πr reaches v]D(pv∥qv) ,
v∈Vint
where pv = pxv and qv = qvy if v is owned by Alice, and pv = pyv and qv = qvx otherwise.
Proof. The proof is by applying the formula
I(A; B|C) = Eac∼AC [D (B|A=a,C=c ∥ B|C=c)]
from Lecture 13 to each term in (1). 􏰮
1.1. Braverman-Rao’s correlated sampling. Theorem 2 shows that if the information cost of a protocol tree is low, then for many nodes on the tree, the divergence of the correct distribution (known to the owner) from the other party’s prior of that distribution is small. Let us look at an internal node u of of the tree, and let us assume that it is owned by Alice. Suppose that u has many children, and thus at this point Alice is likely to send a long message. Note that for example in a protocol with bounded number of rounds, most nodes of the tree will have many children. Now Alice is going to pick a child according to the distribution pxu, which is known to her. If pxu is

COMP760, SUMMARY OF LECTURE 17. 3
close to Bob’s estimate quy, then can they use a small amount of communication to mutually pick a child according to the probability distribution pxu, rather than first Alice picking the child and then sending the possibly long message to Bob? Note for example if pxu = quy and if they both know this fact, then they do not need to communicate at all. Instead they can use public randomness to choose a child randomly according to pxu (which is known to both in this case). Note also the difficultly here that even if pxu = quy, Alice and Bob might not be aware of it. Braverman and Rao [BR14] discovered a protocol whose communication is bounded by a function of the divergence D(pxu∥qux) and allows Alice and Bob to do this sampling mutually.
1.1.1. The correlated sampling protocol. Braverman-Rao’s correlated sampling is very fundamental, and it is interesting even outside the realm of information complexity. Consider two distributions p and q over a universe U. Suppose that p is known to Alice and q is known to Bob. They want to use as little communication as possible to mutually pick a random element from U according to the distribution p.
Alice and Bob will use the public randomness to generate a random sequence (a1, a2, . . .) where each ai = [xi,αi] is sampled independently by picking xi ∈ U and αi ∈ [0,1] both uniformly at random (and independently of each other). Let aj = [xj,αj] be the first element in the sequence with αj ≤ p(xj). Obviously Alice is the only one who knows what aj is, as Bob does not know p. Note that Pr[xj = x] = p(x), which shows that xj has the right distribution p. So what remains is that Alice has to give enough hints to Bob, so that he can figure out what j is. First note that with high probability j = O(|U|). Indeed note that 1[αi≤p(xi)] are independent Bernoulli random variable with expected values 1 . Hence
|U|
Pr[j>m|U|]= 1−|U| p(u)}. On other hand this conditioning does not affect the distribution of (aj+1, aj+1, . . .). Note that in either case l < j, or l > j,
2t Pr[l ∈ Qt|j] ≤ Pr[l ∈ Qt] = |U|.
q(xj) number of rounds is bounded by

COMP760, SUMMARY OF LECTURE 17.
5
Moreover by the property of the hash functions Pr[hi(xl)=hi(xj)∀i=1,…,s+t|j,l,xl ̸=xj]≤2−s−t.
Hence
2−s ε Pr[l∈Qt,xl̸=xj,(hi(xl)=hi(xj)∀i=1,…,s+t)|j]≤ |U| ≤|U|.
Applying the union bound completes the proof. 􏰮
Setting t to be terminating round, shows that the protocol π performs the required task, has error probability at most ε, and its average communication complexity is at most 5 + log(1/ε) + 2D(p∥q). Unfortunately the price 2D(p∥q) is too much to pay for applications such as information equals amortized communication. The coefficient 2 comes from the fact that at each round Bob has to say “fail” so that Alice will send him another hash value. This can be easily remedied. They can combine many rounds together. In other words at each round Alice would send many new hash values to Bob.
• Alice finds the first j with the first αj ≤ p(xj ). • AlicesendsBobk=􏰩 j 􏰪.
• AlicesendsBobthehashvalueshi(xj)fori=1,…,s,wheres=2+⌈log(1/ε)⌉. • Repeat until Bob produces an output, beginning with t = 0:
– Bob defines
Qt ={i∈W :αi ≤2t2q(xi)}.
|U|
– If any element in Qt matches the hash value Bob responds “success” and outputs that value.
– Otherwise he responds “fail”, and Alice sends him another 2t + 3 new hash values.
– Set t = t + 1
This new protocol terminates after fewer rounds as Qt grows much faster. Indeed it will terminate after at most T := 􏰩􏰜log p(xj ) 􏰪 rounds. The analysis of the error probability is exactly as the
q(xj)
previous protocol. The number 2t + 3 is chosen for the convenience so that after t rounds the
total number of hash functions is s + (t + 1)2 − 1. Hence using T ≤ 􏰜log p(xj ) + 1 the average q(xj)
communication is at most CCavg(π) ≤
E􏰥⌈log(k)⌉+s−1+(T +1)2 +2T􏰦
≤ 5 + log(1/ε) + Ex∼p
􏰧 p(x) 􏰝 p(x)􏰨 log q(x) + 5 log q(x)
= D(p∥q) + 5 + log(1/ε) + 5Ex∼p 􏰝
􏰧􏰝 p(x) 􏰨 log q(x)
􏰊 p(x)􏰋 log q(x)
≤ D(p∥q) + 5 + log(1/ε) + 5 Ex∼p
= D(p∥q) + log(1/ε) + O(􏰛D(p∥q) + 1).
This finally finishes the analysis of the correlated sampling, and proves the following theorem.

6 HAMED HATAMI
Theorem 4. Given probability distributions p and q over the same universe, known to Alice and Bob respectively, there exists a protocol with expected communication at most
D(p∥q) + log(1/ε) + O(􏰛D(p∥q))
such that with probability at least 1 − ε both parties output the same output x which is distributed
exactly according to p.
Next we show that how one can use this sampling to compress a protocol that has bounded
number of rounds.
1.2. Compression using this sampling. The idea is that Alice and Bob start from the root and use Theorem 4 at every node to sample a child according to the correct distribution. If no error occurs, then they will reach a mutual leaf which will have the same distribution as in the original protocol, and furthermore Theorem 2 shows that the expected number of communicated bits is bounded by a function of the information cost. However there is one complication. What if an error occurs and Alice and Bob lose synchronization and they end up on different nodes, and thus traverse different paths down the tree. The problem here is that in this case we do not have any control on the amount of communication anymore. In this case, Alice and Bob will do correlated sampling with two completely irrelevant distribution as one thinks that they are on one node of the tree and the other thinks they are on some other node. That is why in the following theorem, there is a conditioning on an event E. Here E is basically the event that all the correlation samplings will go right, and Alice and Bob stay synchronized.
Theorem 5. Consider a k-round protocol π, and ε > 0. There is a public coin protocol τ and and event E(x, y, rπ, rτ ) such that
Pr[E|x,y,rπ]≥1−ε ∀x,y,rπ,
where rπ is the public randomness of π and rτ is the public randomness of τ. Moreover conditioned
on E the following two statements hold:
• τ(x,y) has the same distribution as πr(x,y) for every x,y,r. √ • The expected number of communicated bits is at most I + O(k log(k/ε) +
the information cost of π.
Ik), where I is
Proof. Let δ = ε/k. Alice and Bob use Theorem 4 (with error parameter δ) at every node to sample a child according to the correct distribution. At a node v, this requires the expected communication at most
D(pv∥qv) + log(1/δ) + O(􏰛D(pv∥qv)).
While they mutually sampled a child they move to that child and continue this process until they reach a leaf. Let E be the event that no error occurs in these correlated samplings, and let v1, . . . , vk+1 be the path from the root to the leaf transversed by them. Since there only k rounds, the probability that an error happens in the correlated samplings is at most kδ = ε. Hence Pr[E|x,y,rπ] ≥ 1−ε. On the other hand conditioned on E, the expected communication is at most
k k 􏰭􏰬k  􏰕D(pv ∥qv )+log(1/δ)+O(􏰛D(pv ∥qv ))≤klog(1/δ)+􏰕D(pv ∥qv )+O 􏰬􏰫k􏰕D(pv ∥qv ) ,
ii ii iiii i=1 i=1 i=1
where we used the Cauchy-Schwarz inequality. Taking the expected value with respect to xy ∼ μ, and the public randomness rπ, we conclude that conditioned on E, the expected number of

communicated bits is at most
E [communication|E] ≤ k log(1/δ)+E 􏰕 D(p xy∼μ,r xyrπ
􏰬􏰫kE xyrπ
􏰕 D(p i=1
∥q
COMP760, SUMMARY OF LECTURE 17. 7
􏰧k 􏰨􏰭􏰬􏰧k 􏰨
∥q
where we used concavity of √x to take the expected value inside the square root. Now the proof is
i=1
vi
vi
)
+O

vi
vi
)

,
completed as
Exyr using Theorem 2.
 􏰕 D(pv ∥qv ) = Exyr 􏰕 Pr[π reaches v]D(pv∥qv)
􏰧k􏰨 ii
= ICμ(π),
i=1
v∈Vint
Remark 6. Here basically E is the event that things go right in the simulation, and thus condi- tioned on E, τ will be a perfect simulat√ion of π and furthermore its expected communication will beboundedbyK:=I+klog(k/ε)+O( Ik).NotethatthegainisthatIisoutsideO(·),andthus we know the exact constant in front of I. On the other hand when E does not happen we have no control on the communication. Obviously we can always truncate the protocol and terminate after K/ε bits of communication. Since conditioned on E the expected communication is K, by Markov’s inequality, the probability that it will be larger than K/ε will be bounded by ε. Hence this way we will get a simulation of π with error at most 2√ε and worst case communication
K/ε=Oε(I+klogk+ kI), as it was discussed in the previous lecture.
References
[BR14] Mark Braverman and Anup Rao, Information equals amortized communication, IEEE Trans. Inform. Theory 60 (2014), no. 10, 6058–6069. MR 3265014
School of Computer Science, McGill University, Montre ́al, Canada