# Outline

Covered content:
- Products of Experts (PoE)
- Restricted Boltzmann Machine (RBM)
- Structure of an RBM
- RBM learning algorithms
- Applications of RBMs
Reference:
- Hinton, Geoffrey E. (2002). "Training Products of Experts by Minimizing Contrastive Divergence". Neural Computation 14 (8): 1771–1800.

Beyond Clustering

Mixture model (last week's lecture) vs. product of experts (today's lecture).

Mixture model vs. Product of experts

Mixture Model: Mixture models are commonly used for clustering problems, and the probability is defined as a weighted sum of probability distributions:

$$p(x \mid \alpha, \theta) = \sum_{k=1}^{C} \alpha_k \cdot p(x \mid \theta_k)$$

with $\alpha_k$ denoting the mixing coefficients, subject to the constraints $\sum_{k=1}^{C} \alpha_k = 1$ and $\alpha_k \geq 0$.

Product of Experts (PoE): Products of experts are commonly used for learning distributed representations (e.g. unsupervised feature extraction) and are defined as a product of functions:

$$p(x \mid \theta) = \frac{1}{Z} \prod_{j=1}^{H} \underbrace{g(x \mid \theta_j)}_{j\text{th expert}}$$

where $Z$ is a normalization term set to ensure that $p(x \mid \theta)$ integrates to 1.
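The contrast between the two definitions can be sketched numerically. Below is a minimal illustration (the Gaussian experts/components and their parameters are arbitrary choices, not from the lecture): a mixture keeps probability mass near each component, while a product concentrates mass only where all experts agree.

```python
import numpy as np

# Univariate Gaussian density (illustrative helper).
def gauss(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

x = np.linspace(-5.0, 5.0, 1001)
dx = x[1] - x[0]

# Mixture model: weighted sum of densities (normalized if the alphas sum to 1).
alpha = [0.5, 0.5]
mix = alpha[0] * gauss(x, -2.0, 1.0) + alpha[1] * gauss(x, 2.0, 1.0)

# Product of experts: pointwise product of densities, renormalized by Z.
prod = gauss(x, -2.0, 2.0) * gauss(x, 2.0, 2.0)
Z = prod.sum() * dx          # numerical normalization constant
poe = prod / Z

# The mixture keeps mass near each component mean (-2 and +2), while the
# PoE concentrates mass where *all* experts assign high density.
print(x[np.argmax(poe)])     # → 0.0
```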

Example 1: Product of Gaussians
Consider the experts to be univariate Gaussians, each specialized on a particular input dimension:

$$g(x \mid \theta_i) = \frac{1}{(2\pi \sigma_i^2)^{1/2}} \exp\left(-\frac{(x_i - \mu_i)^2}{2\sigma_i^2}\right)$$

The product of experts (PoE) can be developed as:

$$p(x \mid \theta) = \prod_{i=1}^{d} g(x \mid \theta_i) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu)\right)$$

with $\Sigma = \mathrm{diag}(\sigma_1^2, \dots, \sigma_d^2)$, i.e. a multivariate Gaussian distribution without correlation between features.
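This identity is easy to check numerically at a single point. A minimal sketch, with arbitrary means, variances, and evaluation point:

```python
import numpy as np

# Illustrative parameters (not from the lecture): d = 3 dimensions.
mu = np.array([0.5, -1.0, 2.0])
sigma = np.array([1.0, 0.5, 2.0])
x = np.array([0.0, -0.5, 1.0])

# Left-hand side: product of the univariate expert densities g(x|theta_i).
experts = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
lhs = experts.prod()

# Right-hand side: multivariate Gaussian with Sigma = diag(sigma_1^2, ..., sigma_d^2).
d = len(mu)
Sigma = np.diag(sigma ** 2)
diff = x - mu
rhs = np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / (
    (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma)))

print(np.isclose(lhs, rhs))  # → True
```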

Example 2: Product of Gaussian Mixtures
Let the experts be the univariate mixtures:

$$g(x \mid \theta_i) = \sum_{k=1}^{C} \pi^{-1/2} \exp\left(-(x_i - \mu_{ik})^2\right)$$

The product of experts (PoE) can be developed as:

$$p(x \mid \theta) = \frac{1}{Z} \prod_{i=1}^{d} \sum_{k=1}^{C} \pi^{-1/2} \exp\left(-(x_i - \mu_{ik})^2\right) = \frac{1}{Z} \sum_{k_1=1}^{C} \cdots \sum_{k_d=1}^{C} \underbrace{\prod_{i=1}^{d} \pi^{-1/2} \exp\left(-(x_i - \mu_{ik_i})^2\right)}_{\text{multivariate Gaussian}}$$

i.e. a mixture of exponentially many ($C^d$) multivariate Gaussians. Therefore, a PoE can encode many variations with few parameters.
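The expansion of the product of per-dimension mixtures into a sum over $C^d$ combinations is just distributivity, which can be verified for a tiny case. A sketch with random means (the constant $\pi^{-1/2}$ is dropped since it rescales both sides identically):

```python
import numpy as np
from itertools import product

# Illustrative sizes and random parameters: C components, d dimensions.
C, d = 2, 3
rng = np.random.default_rng(0)
mu = rng.normal(size=(d, C))   # mu[i, k] = mean of component k in dimension i
x = rng.normal(size=d)

# Product over dimensions of the per-dimension mixtures.
lhs = np.prod([sum(np.exp(-(x[i] - mu[i, k]) ** 2) for k in range(C))
               for i in range(d)])

# Expanded form: sum over all C^d index combinations (k_1, ..., k_d),
# each term being one "multivariate Gaussian" factor.
rhs = sum(np.prod([np.exp(-(x[i] - ks_mu) ** 2)
                   for i, ks_mu in enumerate(mu[range(d), list(ks)])])
          for ks in product(range(C), repeat=d))

print(np.isclose(lhs, rhs), C ** d)  # → True 8
```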

Example 3: Product of t-Student Distributions
Define the experts to be t-Student distributions in some projected space ($z_j = w_j^\top x$):

$$g(x \mid \theta_j) = \frac{1}{\alpha_j + (w_j^\top x)^2}$$

The resulting product of experts

$$p(x \mid \theta) = \frac{1}{Z} \prod_{j=1}^{H} \frac{1}{\alpha_j + (w_j^\top x)^2}$$

with $H$ the number of experts, produces a non-Gaussian multivariate distribution, which can be useful to model e.g. image or speech data. This PoE has connections to other analyses, e.g. independent component analysis (ICA).
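The unnormalized density of such a PoE is straightforward to evaluate. A minimal sketch with arbitrary projection directions `W` and $\alpha_j = 1$ (both are made-up choices, and $Z$ is left implicit); each factor decays only polynomially along its projection, giving heavier tails than a Gaussian:

```python
import numpy as np

# Illustrative setup: H t-Student experts over d-dimensional inputs.
rng = np.random.default_rng(1)
H, d = 4, 2
W = rng.normal(size=(H, d))    # rows are the projection directions w_j
alpha = np.ones(H)

def poe_unnorm(x):
    """Unnormalized PoE density: prod_j 1 / (alpha_j + (w_j^T x)^2)."""
    proj = W @ x               # z_j = w_j^T x for all experts at once
    return np.prod(1.0 / (alpha + proj ** 2))

# Each factor is at most 1/alpha_j, attained when the projection is zero,
# so the unnormalized density peaks at the origin here.
print(poe_unnorm(np.zeros(d)), poe_unnorm(np.array([3.0, 3.0])))
```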

The Restricted Boltzmann Machine
[Figure: bipartite RBM graph — a layer of binary latent units h fully connected to a layer of binary input units x]
The restricted Boltzmann machine (RBM) is a joint probability model defined over input features $x \in \{0,1\}^d$ and latent variables $h \in \{0,1\}^H$:

$$p(x, h \mid \theta) = \frac{1}{Z} \exp\left(\sum_{i=1}^{d} \sum_{j=1}^{H} x_i w_{ij} h_j + \sum_{j=1}^{H} b_j h_j\right)$$

The parameter $w_{ij}$ can be interpreted as the connection strength between input feature $x_i$ and latent variable $h_j$: the larger $w_{ij}$, the more strongly $x_i$ and $h_j$ co-activate.
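For a tiny RBM, the partition function $Z$ can be computed by brute-force enumeration of all $2^d \cdot 2^H$ states, which makes the joint model above fully explicit. A sketch with random parameters (sizes and values are illustrative only — real RBMs are far too large for this):

```python
import numpy as np
from itertools import product

# Tiny RBM with random parameters, small enough to enumerate exactly.
rng = np.random.default_rng(0)
d, H = 3, 2
W = rng.normal(scale=0.5, size=(d, H))   # connection strengths w_ij
b = rng.normal(scale=0.5, size=H)        # hidden biases b_j

def unnorm(x, h):
    """Unnormalized joint: exp(sum_ij x_i w_ij h_j + sum_j b_j h_j)."""
    return np.exp(x @ W @ h + b @ h)

def states(n):
    """All binary vectors of length n."""
    return [np.array(s, dtype=float) for s in product([0, 1], repeat=n)]

# Partition function Z: sum of the unnormalized joint over all states.
Z = sum(unnorm(x, h) for x in states(d) for h in states(H))

# Sanity check: after dividing by Z, the joint probabilities sum to 1.
total = sum(unnorm(x, h) / Z for x in states(d) for h in states(H))
print(round(total, 6))  # → 1.0
```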

The Restricted Boltzmann Machine (PoE View)
Connection between RBM and PoE
The RBM, when marginalized over its hidden units, has the structure of a Product of Experts with $g(x, \theta_j) = 1 + \exp(w_j^\top x + b_j)$.

Proof:

$$\begin{aligned}
p(x \mid \theta) &= \sum_{h \in \{0,1\}^H} p(x, h \mid \theta) \\
&= \frac{1}{Z} \sum_{h \in \{0,1\}^H} \exp\left(\sum_{i=1}^{d} \sum_{j=1}^{H} x_i w_{ij} h_j + \sum_{j=1}^{H} b_j h_j\right) \\
&= \frac{1}{Z} \sum_{h \in \{0,1\}^H} \prod_{j=1}^{H} \exp\big((w_j^\top x + b_j) \cdot h_j\big) \\
&= \frac{1}{Z} \prod_{j=1}^{H} \sum_{h_j \in \{0,1\}} \exp\big((w_j^\top x + b_j) \cdot h_j\big) \\
&= \frac{1}{Z} \prod_{j=1}^{H} \big(1 + \exp(w_j^\top x + b_j)\big)
\end{aligned}$$
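The key step of the proof — swapping the sum over hidden configurations with the product over experts — can be verified numerically for a small RBM. A sketch with random parameters (sizes and values are arbitrary):

```python
import numpy as np
from itertools import product

# Small RBM with random parameters (illustrative only).
rng = np.random.default_rng(0)
d, H = 4, 3
W = rng.normal(size=(d, H))
b = rng.normal(size=H)
x = rng.integers(0, 2, size=d).astype(float)

# Explicit marginalization: sum exp(x^T W h + b^T h) over all 2^H states of h.
lhs = sum(np.exp(x @ W @ np.array(h) + b @ np.array(h))
          for h in product([0, 1], repeat=H))

# Factorized product-of-experts form: prod_j (1 + exp(w_j^T x + b_j)).
rhs = np.prod(1.0 + np.exp(W.T @ x + b))

print(np.isclose(lhs, rhs))  # → True
```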

Interpreting the RBM Experts
The experts

$$g(x, \theta_j) = 1 + \exp(w_j^\top x + b_j)$$

forming the RBM implement two behaviors:

- $w_j^\top x + b_j > 0$, i.e. $g(x; \theta_j) \gg 1$: the example $x$ is in the area of competence of the expert, and the expert speaks in favor of $x$.