The exponential distribution compared to the Central Limit Theorem

In this project we investigate the exponential distribution in R and compare it with the predictions of the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and its standard deviation is also 1/lambda. We set lambda = 0.2 for all of the simulations and investigate the distribution of averages of 40 exponentials over a thousand simulations. This was the first part of the course project for the Statistical Inference course, part of the Data Science Specialization by Johns Hopkins University on Coursera.
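As a quick sanity check on those two facts, a minimal sketch can draw a large sample and compare its mean and standard deviation with 1/lambda. The seed, the sample size of 100000 and the name draws are assumptions made for this illustration only; they are not part of the original analysis.

```r
# Illustrative check only; seed and sample size are arbitrary assumptions.
set.seed(1)
lambda <- 0.2
draws <- rexp(100000, rate = lambda)

# Both statistics should land near 1/lambda = 5
mean(draws)
sd(draws)
```

Both printed values should be close to 5, which is why a single number, 1/lambda, describes both the centre and the spread of this distribution.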

1. The sample mean compared to the theoretical mean of the distribution

First, we calculate the theoretical mean according to the general formula:

lambda <- 0.2
mean.exp.dist <- 1/lambda
mean.exp.dist
## [1] 5

Now, we perform the simulation and obtain the sample mean:

lambda <- 0.2
n <- 40
num.sim <- 1000

# simulate num.sim samples of size n (one per column) and average each column
exp.sim <- replicate(num.sim, rexp(n,lambda))
sim.means <- apply(exp.sim, 2, mean)
sample.mean <- mean(sim.means)
sample.mean
## [1] 5.006442

So, both results are very close. The theoretical mean is 5, while the mean of the simulated sample is 5.006442.
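How close is "very close"? A rough yardstick, sketched here under the assumption that the 1000 simulated samples are independent, is the standard error of the overall average, (1/lambda)/sqrt(n * num.sim). The names mc.se and observed.diff are illustrative and not part of the original analysis.

```r
# Rough Monte Carlo yardstick; mc.se and observed.diff are illustrative names.
lambda <- 0.2
n <- 40
num.sim <- 1000

# Standard error of the average of num.sim means, each based on n draws
mc.se <- (1/lambda) / sqrt(n * num.sim)

# Difference between the reported simulated mean and the theoretical mean
observed.diff <- 5.006442 - 1/lambda

mc.se
abs(observed.diff) / mc.se
```

With these settings mc.se is 0.025, so the reported difference of 0.006442 is roughly a quarter of one standard error.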

Let’s see it in a graph:

cum.mean <- cumsum(sim.means)/seq_along(sim.means)

plot(seq_along(sim.means), cum.mean, type = "l", lwd=2,
     main="Simulation of exponential function means", xlab="Iteration")
abline(h=mean.exp.dist, col="red", lwd=2)
legend("bottomright", lty=c(1,1), col= c("red", "black"),
       legend = c("Theoretical", "Simulated"))


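As the graph shows, the running average of the simulated means converges to the theoretical mean. Strictly speaking, this convergence is what the Law of Large Numbers predicts; the Central Limit Theorem (CLT) describes the shape of the distribution of the means, which we examine in part 3.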

2. How variable the sample is (via variance) compared to the theoretical variance of the distribution

As before, we first calculate the theoretical value. For the mean of n exponentials, the standard deviation is (1/lambda)/sqrt(n), and the variance is its square:

lambda <- 0.2
sd.exp.dist <- (1/lambda/sqrt(n))   # standard deviation of the mean of n exponentials
var.exp.dist <- sd.exp.dist^2
var.exp.dist
## [1] 0.625
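The value 0.625 follows from the standard result that the variance of the mean of n independent draws equals the variance of a single draw divided by n. Written out with the numbers used here (a worked restatement, not taken from the original write-up):

```latex
\operatorname{Var}(\bar{X}_n) = \frac{\sigma^2}{n} = \frac{1}{\lambda^2 n}
  = \frac{1}{0.2^2 \times 40} = \frac{25}{40} = 0.625
```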

Now, we calculate the variance of the simulated means:

sample.var <- sd(sim.means)^2
sample.var
## [1] 0.6028838

So, the variances are relatively close. The theoretical variance is 0.625, while the variance of the simulated means is 0.6028838.

We can see that in a graph:

cum.var <- cumsum((sim.means-sample.mean)^2)/(seq_along(sim.means)-1)
plot(seq_along(sim.means), cum.var,type="l", lty=1, lwd=2,
     main="Simulation of exponential function variances", xlab="Iteration")
abline(h=var.exp.dist, col="blue",lwd=2)
legend("topright", lty=c(1,1), col=c("blue","black"),
       legend=c("Theoretical", "Simulated"))


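As the graph shows, the running variance of the simulated means settles near the theoretical variance as the number of simulations grows. As with the mean, this convergence follows from the Law of Large Numbers rather than from the Central Limit Theorem (CLT) itself.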

3. The distribution is approximately normal

To demonstrate this, we draw a histogram of the simulated means, overlaid with their estimated density curve and the theoretical normal density.

hist(sim.means, freq=FALSE, breaks= 35, col="green",
     main= "Distribution of Exponential function mean 
     (simulated vs theoretical)",
     xlab="Simulated means")
curve(dnorm(x, mean=mean.exp.dist, sd=sqrt(var.exp.dist)), add=TRUE, col="blue")
lines(density(sim.means), lty="dashed", lwd=2)
abline(v=mean.exp.dist, lwd=3, col="red")
legend("topright", lty=c(1,2,1), col=c("blue","black","red"),
       legend=c("Theoretical distribution", "Simulated distribution",
                "Theoretical Mean"))


As the graph shows, the distribution of the simulated means is approximately Gaussian: the histogram and the density estimate closely follow the theoretical normal curve.
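A numerical companion to the picture, offered as a sketch rather than as part of the original analysis: standardise the simulated means and count how many fall within 1.96 standard deviations of the theoretical mean. If the normal approximation holds, that share should be near 0.95. The seed and the names z and coverage are assumptions introduced for this example, and the simulation is rerun here so the snippet runs on its own.

```r
# Illustrative normality check; seed, z and coverage are assumptions.
set.seed(2016)
lambda <- 0.2
n <- 40
num.sim <- 1000

# One simulated sample per column, averaged column by column
sim.means <- colMeans(replicate(num.sim, rexp(n, lambda)))

# Standardise with the theoretical mean and standard deviation of the mean
z <- (sim.means - 1/lambda) / ((1/lambda) / sqrt(n))

# Share of standardised means inside the central 95% band of a standard normal
coverage <- mean(abs(z) <= 1.96)
coverage
```

For a visual version of the same check, qqnorm(z) followed by qqline(z) plots the standardised means against normal quantiles; points lying close to the line support the Gaussian reading.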

Written on February 24, 2016