CS代考计算机代写 STAT 513/413: Lecture 9 The “other” random numbers: distributions
STAT 513/413: Lecture 9 The “other” random numbers: distributions
(transforming uniformly distributed numbers to something else) Rizzo 3.1, 3.2, 3.3, 3.4, 3.5
Almost all of them are somewhere out there
Naming convention in R
ddistrib(x,…) – density function
pdistrib(x,…) – cumulative distribution function
qdistrib(p,…) – quantile function
rdistrib(n,…) – random numbers
If no other parameter is specified, it assumes the “standard” version of the distribution
For instance, rnorm(100) generates 100 random numbers from normal distribution with mean 0 and standard deviation 1
And runif(50) generates 50 random numbers from the “standard random generator”: random numbers with uniform distribution on (0, 1)
And almost all of the distributions can be found somewhere out there, if not in base R – but beware: not all packages are made equal!
1
Discrete distributions
For discrete distributions with finitely many outcomes, sample() is a way to go – but beware whether it is sampling with replacement or without (default)!
> sample(1:20,10)
[1] 918131519 420 3 616
> sample(1:5,10)
Error in sample.int(length(x), size, replace, prob) :
cannot take a sample larger than the population when ’replace = FALSE’
> sample(1:5,10,replace=TRUE)
[1] 2 3 3 5 4 3 1 4 3 3
> sample(1:20,10,replace=TRUE)
[1] 7 115 2 71710191519
> sample(c(0,1),20,replace=TRUE,prob=c(0.1,0.9))
[1] 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0
2
1. Inversion method
Method: apply quantile function of the desired distribution on the standard uniform random numbers
Quantile function: the inverse to the cumulative distributioin function, Q(u) = F−1(u)
Standard uniform random numbers: uniform on (0,1), runif() Example: exponential distribution, f(x) = λe−λx
F(x) = 1 − e−x Q(u) = − log(1 − u)
Generator: − log(1 − U) where U is uniform on (0, 1)
Inverse method works well, but sometimes may be hard to come by
3
f(x) = 1 e−(x−μ)2
2σ2 2πσ
Example: normal distribution
There is no closed form F, and neither that of Q – but we can compute Q numerically – in R it is function qnorm()
So for μ=1 and σ=2 we get
> y=1+2*qnorm(runif(10000))
although we could use also
> z=qnorm(runif(10000),1,2)
Let us check that on a qqplot; the special version, when normal distribution is checked, is
> qqnorm(y)
> qqline(y)
If we want to check whether two version above come to the same, we do (See Lecture 7 again if necessary)
> qqplot(y,z)
> abline(0,1)
4
Seems like it works
-5 0 5
z
-5 0 5
y
5
Why does inversion work?
A very short, but illuminating proof: we show what is the cumulative distribution function of Q(U) = F−1(U). For simplicity, we may assume that F is strictly increasing, F(u) < F(v), whenever u < v - so there is no problem with inverting F and so on. (The principle, however, works in greater generality)
P[Q(U) x] = P[F−1(U) x]
and apply F on both sides of the inequality F−1(U) x; as F is
nondecreasing, the inequality is preserved, which means that P[F−1(U) x] = P[F(F−1(U)) F(x)]
Now F(F−1(U)) = U, by the definition of F−1(u), and
P[F−1(U) x] = P[F(F−1(U)) F(x)] = P[U F(x)] = F(x)
as the last probability is for U uniform on (0,1) equal to F(x). We obtained that P[F−1(U) x] = F(x), that is, the cumulative distribution function F−1(U) of the generated random number is equal to the desired F(x) - which was desired
6
The default for rnorm(): inversion To learn more about random number generators in R:
> ?RNG
> RNGkind()
[1] “Mersenne-Twister” “Inversion”
These are defaults. The second one specifies the way of generating random numbers with normal distribution – we will discuss that later. … normal.kind can be “Kinderman-Ramage”, “Buggy Kinderman-Ramage” (not for set.seed), “Ahrens-Dieter”, “Box-Muller”, “Inversion” (the default), or “user-supplied”. (For inversion, see the reference in qnorm.) The Kinderman-Ramage generator used in versions prior to 1.7.0 (now called “Buggy”) had several approximation errors and should only be used for reproduction of old results. The “Box-Muller” generator is stateful (sic!) as pairs of normals are generated and returned sequentially. The state is reset whenever it is selected (even if it is the current normal generator) and when kind is changed.
7
2. Acceptance-Rejection method
Example: obtain random numbers with a density
f(x)=3x(2−x), x∈[0,2] 4
x is between a=0 and b=2, y between 0 and c=0.75
a
b
c
X=U accepted U rejected
0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0
U
Important: the argument of the density is within [a,b], the values within [0,c]. The desired random number is produced as follows: 1. Generate U uniform on [a,b] and V uniform on [0,c]
2. if f(U)V, take X=U (accept)
3. if not, repeat (that is, reject, return to 1, and repeat until once X is obtained, that is, accepted)
8
V
0.0 0.2 0.4 0.6 0.8
It works!
Inversion
0.0 0.5 1.0 1.5 2.0
0.0 0.5 1.0 1.5 2.0
Acceptance/Rejection
9
Theoretical justification
Notation: a bit tricky, while otherwise straightforward. Let the density of accepted U be h(x). We would like to show that it is actually f(x), the density we would like the random numbers to have
The density of V is fV(x) = 1 on [0,c] c
The density of U is fU(x) = 1 on [a,b] b−a
the latter is not the same as the conditional density we are to start with, but for simplicity we also use fU:
h(x) = fU(x|x accepted) =
= fU(x)P[f(x)V] = b−a c = f(x)
fU(x)P[x = U & f(U) V] fU(z)P[z=U & f(U)V]dz
1 f(x)1
fU(z)P[f(z) V] dz 1 f(z)1 dz f(z) dz
b−a c
= f(x) = f(x) as was desired 1
10
3. Transformation methods
When the desired distribution is a distribution of transformed variable with other distribution, and the transformation is not difficult – then why not to go this way
Example: Example:
Chi-square – sum of k squares of standard normal Box-Muller transform. If U,V are independent random
variables with uniform distribution on (0,1), then √−2logUcos(2πV) and √−2logUsin(2πV)
are independent random variables with standard normal distribution
11
4. What else?
As a special case of transformation methods, one can consider also convolutions (that is, sums) and mixtures
Example. For instance, it was once popular to generate random numbers with normal distribution by a sum of k random numbers with uniform distribution on (0, 1), Usually, k was set to be 12 or 48, as in such case the sum had convenient variance (you may want to exercise yourself to see why). The motivation came from the central limit theorem: with growing k, the distribution of sums of identically distributed random variables is better and better approximated by normal distribution.
In fact, already for k = 12 the approximation is not that bad – in the centre (this is why the limit theorem is called “central”). But not on tails: sum of 12 uniform random numbers cannot be larger than 12 and less than 0. Thus, nowadays other methods are preferred for generating normal random numbers
12
Another example: Poisson-Gamma mixture
> lambda = rgamma(6,4,3)
> lambda
[1] 1.4796001 3.4630581 0.4897232 0.5519801 0.4214378 2.1400598 > set.seed(1)
> rpois(6,lambda)
[1] 1 3 0 2 0 4
> set.seed(1)
> rpois(6,lambda[1])
[1] 1 1 2 3 0 3
> set.seed(1)
> x=NULL
> for (k in (1:length(lambda))) x[k] = rpois(1,lambda[k])
>x
[1] 1 3 0 2 0 4
Beware: it usually works, but better check… Finally, Rizzo again: 3.1, 3.2, 3.3, 3.4, 3.5
13