# The exponential distribution compared to the Central Limit Theorem

In this project we investigate the exponential distribution in R and compare it with the Central Limit Theorem (CLT). The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda, and its standard deviation is also 1/lambda. We set lambda = 0.2 for all of the simulations and study the distribution of averages of 40 exponentials over a thousand simulations. This was the first part of the course project for the Statistical Inference course, part of the Data Science Specialization by Johns Hopkins University on Coursera.
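As a quick sanity check of the two formulas above, the sketch below draws a large number of exponentials and compares the empirical mean and standard deviation with 1/lambda. The sample size n.draws and the seed are arbitrary choices made for this illustration only; they are not part of the project.

```r
# Illustrative check: empirical mean and sd of rexp() draws vs. 1/lambda.
lambda <- 0.2        # rate used throughout the post
n.draws <- 100000    # assumption: sample size chosen only for this check
set.seed(1)          # assumption: arbitrary seed, for reproducibility

draws <- rexp(n.draws, rate = lambda)

# Both empirical values should land close to the theoretical 1/lambda.
check <- c(theoretical = 1 / lambda,
           empirical.mean = mean(draws),
           empirical.sd = sd(draws))
print(check)
```

Both empirical values should fall near 5, differing only by sampling noise.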

#### 1. The sample mean compared to the theoretical mean of the distribution

First, we calculate the theoretical mean according to the general formula:

``````
lambda <- 0.2
mean.exp.dist <- 1/lambda
mean.exp.dist
``````
``````
## [1] 5
``````

Now, we perform the simulation and obtain the sample mean:

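``````
lambda <- 0.2    # rate parameter
n <- 40          # exponentials per sample
num.sim <- 1000  # number of simulated samples

# simulate: each column of exp.sim holds one sample of n exponentials
set.seed(8)
exp.sim <- replicate(num.sim, rexp(n,lambda))
# mean of each column, then the mean of those sample means
sim.means <- apply(exp.sim, 2, mean)
sample.mean <- mean(sim.means)
sample.mean
``````
``````
## [1] 5.006442
``````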

So, both results are very close: the theoretical mean is 5, while the mean of the simulated sample means is 5.006442.
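One way to judge "very close" is to compare the difference with the standard error of the grand mean. Averaging 1000 sample means of 40 draws each gives a standard error of (1/lambda)/sqrt(40 * 1000) = 0.025, so the reported difference of 0.006442 is roughly a quarter of one standard error. The sketch below repeats the simulation and computes this ratio; the names se.grand and z.score are introduced here for illustration.

```r
# Sketch: how large is the gap between simulated and theoretical mean,
# measured in standard errors of the grand mean?
lambda <- 0.2
n <- 40
num.sim <- 1000

set.seed(8)
sim.means <- apply(replicate(num.sim, rexp(n, lambda)), 2, mean)

theo.mean <- 1 / lambda
# The grand mean averages n * num.sim independent draws in total.
se.grand <- (1 / lambda) / sqrt(n * num.sim)
z.score <- (mean(sim.means) - theo.mean) / se.grand

print(c(se.grand = se.grand, z.score = z.score))
```

A z.score well inside the range -2 to 2 indicates that the gap is compatible with ordinary sampling noise.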

Let’s see it in a graph:

``````
# running average of the simulated means after each iteration
cum.mean <- cumsum(sim.means)/seq_along(sim.means)

plot(seq_along(sim.means), cum.mean, type = "l", lwd=2,
main="Simulation of exponential function means", xlab="Iteration",
ylab="Mean")
abline(h=mean.exp.dist, col="red", lwd=2)
legend("bottomright", lty=c(1,1), col= c("red", "black"),
legend = c("Theoretical", "Simulated"))
``````

As the graph shows, the running average of the simulated means converges to the theoretical mean. Strictly speaking, this convergence is the law of large numbers at work; the Central Limit Theorem (CLT) describes the shape of the distribution of those means, which section 3 examines.

#### 2. How variable the sample is (via variance) compared to the theoretical variance of the distribution

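As before, we calculate the theoretical variance from the general formula. Because we are studying averages of n = 40 exponentials, the relevant quantity is the variance of the sample mean, (1/lambda)^2 / n, not the variance of a single exponential draw, (1/lambda)^2 = 25: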

``````
lambda <- 0.2
# standard deviation of the mean of n exponentials: sigma / sqrt(n)
sd.exp.dist <- (1/lambda/sqrt(n))
var.exp.dist <- sd.exp.dist^2
var.exp.dist
``````
``````
## [1] 0.625
``````
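The formula follows from independence: the variance of a sum of n independent draws is n times the variance of one draw, and dividing the sum by n scales the variance by 1/n^2, leaving sigma^2 / n. The sketch below illustrates this scaling for a few sample sizes. The sizes in n.values and the 2000 replications per size are assumptions chosen for this illustration, not part of the project.

```r
# Sketch: the variance of the sample mean shrinks like sigma^2 / n.
lambda <- 0.2
sigma2 <- (1 / lambda)^2          # variance of a single exponential draw (25)
n.values <- c(1, 10, 40, 160)     # assumption: sample sizes picked for illustration
reps <- 2000                      # assumption: replications per sample size

theo.var <- sigma2 / n.values     # theoretical variance of the mean

set.seed(8)
sim.var <- sapply(n.values, function(size) {
  var(replicate(reps, mean(rexp(size, lambda))))
})

var.table <- data.frame(n = n.values,
                        theoretical = theo.var,
                        simulated = sim.var)
print(var.table)
```

The row for n = 40 corresponds to the setting used in this post, with a theoretical value of 25/40 = 0.625.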

Now, we calculate the variance for the simulated means:

``````
sample.sd <- sd(sim.means)
sample.var <- sample.sd^2
sample.var
``````
``````
## [1] 0.6028838
``````

So, the variances are relatively close: the theoretical variance is 0.625, while the variance of the simulated sample means is 0.6028838.

We can see that in a graph:

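``````
# running variance, measured around the final sample mean;
# the first value divides by zero and is non-finite, so plot() skips it
cum.var <- cumsum((sim.means-sample.mean)^2)/(seq_along(sim.means)-1)

plot(seq_along(sim.means), cum.var,type="l", lty=1, lwd=2,
main="Simulation of exponential function variances", xlab="Iteration",
ylab="Variance")
abline(h=var.exp.dist, col="blue",lwd=2)
legend("topright", lty=c(1,1), col=c("blue","black"),
legend=c("Theoretical", "Simulated"))
``````

As the graph shows, the running variance of the simulated means settles near the theoretical variance as the number of simulations grows. This convergence is a consequence of the law of large numbers rather than of the Central Limit Theorem (CLT) itself.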
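The running variance plotted above is measured around the final sample mean. A possible variant, sketched below, measures each running variance around the mean of the values seen so far, so that the k-th entry equals var() of the first k simulated means. The names run.mean, run.ss and run.var are introduced here for illustration.

```r
# Sketch: running sample variance using the running mean at each step.
lambda <- 0.2
n <- 40
num.sim <- 1000

set.seed(8)
sim.means <- apply(replicate(num.sim, rexp(n, lambda)), 2, mean)

k <- seq_along(sim.means)
run.mean <- cumsum(sim.means) / k
# Sum of squared deviations from the running mean: sum(x^2) - k * mean^2
run.ss <- cumsum(sim.means^2) - k * run.mean^2
# The sample variance is undefined for a single value, so start with NA.
run.var <- c(NA, run.ss[-1] / (k[-1] - 1))

print(tail(run.var, 1))
```

The last entry matches var(sim.means), and the vector can be passed to plot() in place of cum.var.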

#### 3. The distribution is approximately normal

To demonstrate this, we plot a histogram of the simulated means, overlaid with the density curve of the simulated data and the theoretical normal distribution curve.

``````
# histogram of the simulated means on the density scale
hist(sim.means, freq=FALSE, breaks= 35, col="green",
main= "Distribution of Exponential function mean
(simulated vs theoretical)",
xlab="Simulated means")
# theoretical normal density; curve() evaluates the expression on its own x grid
curve(dnorm(x, mean=mean.exp.dist, sd=sqrt(var.exp.dist)), add=TRUE, col="blue",
lwd=2)
# kernel density estimate of the simulated means
lines(density(sim.means), lty="dashed", lwd=2)
abline(v=mean.exp.dist, lwd=3, col="red")
legend("topright", lty=c(1,2,1), col=c("blue","black","red"),
legend=c("Theoretical distribution", "Simulated distribution",
"Theoretical Mean"))
``````

As the graph shows, the distribution of the simulated means is approximately Gaussian, as the CLT predicts for averages of 40 exponentials.
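The visual impression can be complemented with a simple numeric check. The sketch below standardizes the simulated means with the theoretical mean and standard deviation, then compares the share of values within 1.96 standard deviations, and a few quantiles, with what a standard normal distribution would give. The chosen probabilities and the names coverage and quantile.table are assumptions made for this illustration.

```r
# Sketch: numeric comparison of the simulated means with a normal distribution.
lambda <- 0.2
n <- 40
num.sim <- 1000

set.seed(8)
sim.means <- apply(replicate(num.sim, rexp(n, lambda)), 2, mean)

mu <- 1 / lambda                  # theoretical mean of the sample mean
sigma <- (1 / lambda) / sqrt(n)   # theoretical sd of the sample mean
z <- (sim.means - mu) / sigma     # standardized simulated means

# For a normal distribution, about 95% of values satisfy |z| <= 1.96.
coverage <- mean(abs(z) <= qnorm(0.975))

# Compare a few empirical quantiles with the standard normal quantiles.
probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
quantile.table <- data.frame(prob = probs,
                             simulated = unname(quantile(z, probs)),
                             normal = qnorm(probs))

print(coverage)
print(quantile.table)
```

For a visual alternative, qqnorm(sim.means) followed by qqline(sim.means) draws a normal Q-Q plot. Some residual right skew is to be expected, since a mean of 40 exponentials is only approximately normal.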

Written on February 24, 2016