The exponential distribution compared to the Central Limit Theorem
In this project we investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and its standard deviation is also 1/lambda. We set lambda = 0.2 for all of the simulations and study the distribution of averages of 40 exponentials over a thousand simulations. This was the first part of the course project for the Statistical Inference course, part of the Data Science Specialization by Johns Hopkins University on Coursera.
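As a quick sanity check of the stated mean and standard deviation, the following sketch draws one large sample with rexp and compares its summary statistics with 1/lambda. The sample size of 100000, the seed and the name big.sample are arbitrary choices made for illustration and are not part of the original analysis.

```r
# Illustrative check only: sample size, seed and variable name are assumptions
set.seed(1)
lambda <- 0.2
big.sample <- rexp(100000, lambda)

# Both the sample mean and the sample sd should be close to 1/lambda = 5
c(mean = mean(big.sample), sd = sd(big.sample), theoretical = 1/lambda)
```

With a sample this large, both statistics should land within a few hundredths of 5, although the exact digits depend on the seed.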
1. The sample mean compared to the theoretical mean of the distribution
First, we calculate the theoretical mean according to the general formula:
lambda <- 0.2
mean.exp.dist <- 1/lambda
mean.exp.dist
## [1] 5
Now, we perform the simulation and obtain the sample mean:
lambda <- 0.2
n <- 40
num.sim <- 1000

# simulation: each column of exp.sim holds one sample of n exponentials
set.seed(8)
exp.sim <- replicate(num.sim, rexp(n, lambda))

# mean of each simulated sample, then the mean of those means
sim.means <- apply(exp.sim, 2, mean)
sample.mean <- mean(sim.means)
sample.mean
## [1] 5.006442
So, both results are very close. The theoretical mean is 5, while the mean of the simulated sample means is 5.006442.
Let’s see it in a graph:
# running average of the simulated means after each iteration
cum.mean <- cumsum(sim.means)/seq_along(sim.means)
plot(seq_along(sim.means), cum.mean, type = "l", lwd = 2,
     main = "Simulation of exponential function means",
     xlab = "Iteration", ylab = "Mean")
abline(h = mean.exp.dist, col = "red", lwd = 2)
legend("bottomright", lty = c(1, 1), col = c("red", "black"),
       legend = c("Theoretical", "Simulated"))
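As the graph shows, the running average of the simulated means converges to the theoretical mean. Strictly speaking, this convergence is what the Law of Large Numbers predicts; the Central Limit Theorem (CLT) describes the shape of the distribution of the means, which is examined in section 3.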
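How fast the running average should settle can be sketched with a standard error band. Each simulated mean has variance (1/lambda)^2/n, so the average of the first k simulated means has standard error (1/lambda)/sqrt(n*k). The band of plus or minus two standard errors and the names k and se are additions for illustration, not part of the original analysis.

```r
# Rebuild the simulated means with the same seed and parameters as above
set.seed(8)
lambda <- 0.2
n <- 40
num.sim <- 1000
sim.means <- apply(replicate(num.sim, rexp(n, lambda)), 2, mean)

k <- seq_along(sim.means)
cum.mean <- cumsum(sim.means) / k

# Standard error of the average of the first k simulated means (illustrative)
se <- (1/lambda) / sqrt(n * k)

plot(k, cum.mean, type = "l", lwd = 2,
     xlab = "Iteration", ylab = "Mean",
     main = "Running mean with a 2 standard error band")
abline(h = 1/lambda, col = "red", lwd = 2)
lines(k, 1/lambda + 2 * se, col = "red", lty = 2)
lines(k, 1/lambda - 2 * se, col = "red", lty = 2)
```

The band narrows in proportion to 1/sqrt(k), which is why the curve looks noisy during the first iterations and flattens out later.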
2. How variable the sample is (via variance) compared to the theoretical variance of the distribution
As before, we first calculate the theoretical value. Because we are studying averages of n = 40 exponentials, the relevant quantity is the variance of the sample mean, (1/lambda)^2/n:
lambda <- 0.2
# standard deviation of the mean of n exponentials (the standard error)
sd.exp.dist <- (1/lambda)/sqrt(n)
var.exp.dist <- sd.exp.dist^2
var.exp.dist
## [1] 0.625
Now, we calculate the variance for the simulated means:
sample.sd <- sd(sim.means)
sample.var <- sample.sd^2
sample.var
## [1] 0.6028838
So, the variances are relatively close. The theoretical variance is 0.625, while the variance of the simulated means is 0.6028838.
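To put a number on "relatively close", the gap can be expressed as a relative difference. The sketch below rebuilds the simulated means with the same seed and uses var() directly; the names theo.var, samp.var and rel.diff are introduced here for illustration only.

```r
# Rebuild the simulated means with the same seed and parameters as above
set.seed(8)
lambda <- 0.2
n <- 40
num.sim <- 1000
sim.means <- apply(replicate(num.sim, rexp(n, lambda)), 2, mean)

theo.var <- (1/lambda)^2 / n   # variance of the mean of n exponentials
samp.var <- var(sim.means)     # equivalent to sd(sim.means)^2

# Relative difference between simulated and theoretical variance
rel.diff <- (samp.var - theo.var) / theo.var
rel.diff
```

With the two values reported above (0.6028838 and 0.625), the simulated variance sits roughly 3.5% below the theoretical one.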
We can see that in a graph:
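# running variance of the simulated means around the overall sample mean
# (the first value is Inf because of the division by zero; plot() ignores non-finite values)
cum.var <- cumsum((sim.means - sample.mean)^2)/(seq_along(sim.means) - 1)
plot(seq_along(sim.means), cum.var, type = "l", lty = 1, lwd = 2,
     main = "Simulation of exponential function variances",
     xlab = "Iteration", ylab = "Variance")
abline(h = var.exp.dist, col = "blue", lwd = 2)
legend("topright", lty = c(1, 1), col = c("blue", "black"),
       legend = c("Theoretical", "Simulated"))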
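As the graph shows, the running variance of the simulated means approaches the theoretical variance as iterations accumulate, ending at 0.6028838 against a theoretical 0.625. As with the mean, this convergence follows from the Law of Large Numbers rather than from the Central Limit Theorem (CLT) itself.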
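The running variance plotted above measures deviations from the final sample mean, which is only known once all iterations are done. An alternative is sketched below: a running sample variance in which each point uses only the means observed so far. This is a variation offered for comparison, not the method used in the original analysis, and the names k, run.mean and run.var are hypothetical.

```r
# Rebuild the simulated means with the same seed and parameters as above
set.seed(8)
lambda <- 0.2
n <- 40
num.sim <- 1000
sim.means <- apply(replicate(num.sim, rexp(n, lambda)), 2, mean)

k <- seq_along(sim.means)
run.mean <- cumsum(sim.means) / k

# Sample variance of the first k means: (sum of squares - k * mean^2) / (k - 1)
run.var <- (cumsum(sim.means^2) - k * run.mean^2) / (k - 1)
run.var[1] <- NA   # the variance of a single value is undefined

plot(k, run.var, type = "l", lwd = 2,
     xlab = "Iteration", ylab = "Variance",
     main = "Running sample variance of the simulated means")
abline(h = (1/lambda)^2 / n, col = "blue", lwd = 2)
```

By construction, the last value of run.var equals var(sim.means), so both approaches agree at the final iteration and differ only along the way.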
3. The distribution is approximately normal
To demonstrate this, we draw a histogram of the simulated means and overlay the density curve estimated from the simulated data and the theoretical normal density curve.
hist(sim.means, freq = FALSE, breaks = 35, col = "green",
     main = "Distribution of Exponential function mean (simulated vs theoretical)",
     xlab = "Simulated means")
# theoretical normal density with the mean and variance derived above
curve(dnorm(x, mean = mean.exp.dist, sd = sqrt(var.exp.dist)),
      add = TRUE, col = "blue", lwd = 2)
# kernel density estimate of the simulated means
lines(density(sim.means), lty = "dashed", lwd = 2)
abline(v = mean.exp.dist, lwd = 3, col = "red")
legend("topright", lty = c(1, 2, 1), col = c("blue", "black", "red"),
       legend = c("Theoretical distribution", "Simulated distribution", "Theoretical Mean"))
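As the graph shows, the density of the simulated means closely follows the normal curve with mean 5 and variance 0.625, which is the approximately Gaussian behaviour that the CLT predicts.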
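A histogram can hide small departures from normality, so a normal Q-Q plot of the standardized means is a useful complement. The sketch below is an addition for illustration: the names z and probs and the choice of quantiles are not part of the original analysis.

```r
# Rebuild the simulated means with the same seed and parameters as above
set.seed(8)
lambda <- 0.2
n <- 40
num.sim <- 1000
sim.means <- apply(replicate(num.sim, rexp(n, lambda)), 2, mean)

# Standardize with the theoretical mean and standard error
z <- (sim.means - 1/lambda) / ((1/lambda) / sqrt(n))

# Points close to the reference line indicate approximate normality
qqnorm(z, main = "Normal Q-Q plot of standardized means")
abline(0, 1, col = "red", lwd = 2)

# Compare a few empirical quantiles with standard normal quantiles
probs <- c(0.025, 0.5, 0.975)
rbind(simulated = quantile(z, probs), normal = qnorm(probs))
```

Because the exponential distribution is right-skewed, a mild upward bend in the upper tail would not be surprising with only 40 observations per average; the approximation should improve as n grows.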