ToothGrowth dataset analysis

In this project we analyze the ToothGrowth data in the R datasets package. This dataset records the effect of vitamin C on tooth growth in guinea pigs. The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC). This was the second part of the course project for the Statistical Inference course, part of the Data Science Specialization by Johns Hopkins University on Coursera.

1. Some basic exploratory data analyses and summary of the ToothGrowth data

First, we explore the ToothGrowth data to obtain a first impression:

data("ToothGrowth")
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
tail(ToothGrowth)
##     len supp dose
## 55 24.8   OJ    2
## 56 30.9   OJ    2
## 57 26.4   OJ    2
## 58 27.3   OJ    2
## 59 29.4   OJ    2
## 60 23.0   OJ    2

A general summary:

summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

Now, the summary of length split by supplement:

split.supp <- split(ToothGrowth$len, ToothGrowth$supp)
summary.supp <- sapply(split.supp, summary)
summary.supp
##            OJ    VC
## Min.     8.20  4.20
## 1st Qu. 15.52 11.20
## Median  22.70 16.50
## Mean    20.66 16.96
## 3rd Qu. 25.72 23.10
## Max.    30.90 33.90

And the summary of length split by dose:

split.dose <- split(ToothGrowth$len, ToothGrowth$dose)
summary.dose <- sapply(split.dose, summary)
summary.dose
##            0.5     1     2
## Min.     4.200 13.60 18.50
## 1st Qu.  7.225 16.25 23.52
## Median   9.850 19.25 25.95
## Mean    10.600 19.74 26.10
## 3rd Qu. 12.250 23.38 27.83
## Max.    21.500 27.30 33.90
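
The two summaries above look at supplement and dose separately. As a small sketch (not part of the original analysis), the mean length for each supplement and dose combination could be obtained with `aggregate`; the name `group.means` is an assumption introduced here for illustration.

```r
# Load the dataset so the block runs on its own
data("ToothGrowth")

# Mean tooth length for each of the 6 supplement/dose combinations
group.means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
group.means
```

Each combination contains 10 animals, so the six cell means average out to the overall mean reported by `summary(ToothGrowth)`.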

Finally, boxplots by supplement, dose and supplement & dose:

# one fill colour per supplement group
boxplot(len ~ supp, ToothGrowth, col = c("orange", "purple"), xlab = "Supplement",
        ylab = "Length", main = "Boxplot of length by supplement")

# a single fill colour for the three dose groups
boxplot(len ~ dose, ToothGrowth, col = "lightgray", xlab = "Dose", ylab = "Length",
        main = "Boxplot of length by dose")

boxplot(len ~ supp * dose, ToothGrowth, col = c("orange", "purple"),
        ylab = "Tooth Length", xlab = "Supplement & Dose",
        main = "Boxplots by supplement type and dose")

2. Confidence intervals and hypothesis tests to compare tooth growth by supp and dose.

Comparing length by supplement and by dose with t-tests, we can check whether there are statistically significant differences.

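We assume that the populations are normally distributed and that the samples are random and independent. Equal population standard deviations are not required here, because t.test runs the Welch test by default, which allows unequal variances.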
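
These assumptions can be examined informally before testing. The following is a minimal sketch, not part of the original analysis: it applies the Shapiro-Wilk normality test to each supplement group and an F test to compare the two group variances. The names `normality.supp` and `variance.supp` are assumptions introduced here.

```r
# Load the dataset so the block runs on its own
data("ToothGrowth")

# Shapiro-Wilk normality test p-value within each supplement group
normality.supp <- tapply(ToothGrowth$len, ToothGrowth$supp,
                         function(x) shapiro.test(x)$p.value)
normality.supp

# F test comparing the variances of the two supplement groups
variance.supp <- var.test(len ~ supp, data = ToothGrowth)
variance.supp$p.value
```

With only 30 observations per group these tests have limited power, so they are best read alongside the boxplots rather than as a definitive check.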

First null hypothesis: There is no significant difference in length by supplement

t.test(split.supp$OJ, split.supp$VC)
## 
##  Welch Two Sample t-test
## 
## data:  split.supp$OJ and split.supp$VC
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean of x mean of y 
##  20.66333  16.96333

As the p-value is 0.06063, higher than 0.05, we can’t reject the null hypothesis at the 5% significance level. In addition, the 95% confidence interval contains zero.
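
To make the Welch output above less of a black box, the statistic, degrees of freedom, p-value and confidence interval can be reproduced by hand. This is a sketch using the standard Welch-Satterthwaite formulas; the variable names (`oj`, `vc`, `se`, `t.stat`, `df.welch`, `ci`) are introduced here for illustration.

```r
# Load the dataset so the block runs on its own
data("ToothGrowth")
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error of each group mean
se2.oj <- var(oj) / length(oj)
se2.vc <- var(vc) / length(vc)
se <- sqrt(se2.oj + se2.vc)

# Welch t statistic for the difference in means
t.stat <- (mean(oj) - mean(vc)) / se

# Welch-Satterthwaite approximation of the degrees of freedom
df.welch <- (se2.oj + se2.vc)^2 /
  (se2.oj^2 / (length(oj) - 1) + se2.vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval
p.value <- 2 * pt(-abs(t.stat), df.welch)
ci <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df.welch) * se

c(t = t.stat, df = df.welch, p = p.value)
ci
```

The values should agree with the `t.test` output shown above, up to rounding.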

Second null hypothesis: There is no significant difference in length by dose

t.test(split.dose$`0.5`,split.dose$`1`)
## 
##  Welch Two Sample t-test
## 
## data:  split.dose$`0.5` and split.dose$`1`
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.983781  -6.276219
## sample estimates:
## mean of x mean of y 
##    10.605    19.735
t.test(split.dose$`1`,split.dose$`2`)
## 
##  Welch Two Sample t-test
## 
## data:  split.dose$`1` and split.dose$`2`
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.996481 -3.733519
## sample estimates:
## mean of x mean of y 
##    19.735    26.100
t.test(split.dose$`0.5`,split.dose$`2`)
## 
##  Welch Two Sample t-test
## 
## data:  split.dose$`0.5` and split.dose$`2`
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.15617 -12.83383
## sample estimates:
## mean of x mean of y 
##    10.605    26.100

All the comparisons by dose have p-values far below 0.05 (virtually 0). In addition, none of the confidence intervals contains zero. The null hypothesis can be rejected: there is a significant difference in length by dose.
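
Because three pairwise tests are run on the same data, one might also want to adjust the p-values for multiple comparisons. The sketch below is an addition, not part of the original analysis: it applies a Bonferroni correction with `p.adjust`, and the names `p.raw` and `p.bonf` are assumptions introduced here.

```r
# Load the dataset and rebuild the split so the block runs on its own
data("ToothGrowth")
split.dose <- split(ToothGrowth$len, ToothGrowth$dose)

# Unadjusted p-values of the three pairwise Welch t-tests
p.raw <- c(
  "0.5 vs 1" = t.test(split.dose$`0.5`, split.dose$`1`)$p.value,
  "1 vs 2"   = t.test(split.dose$`1`, split.dose$`2`)$p.value,
  "0.5 vs 2" = t.test(split.dose$`0.5`, split.dose$`2`)$p.value
)

# Bonferroni adjustment: each p-value is multiplied by the number of tests (capped at 1)
p.bonf <- p.adjust(p.raw, method = "bonferroni")
p.bonf
```

Bonferroni is the most conservative of the common adjustments, so comparisons that stay below 0.05 after it would also do so under the less strict methods offered by `p.adjust`.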

3. Conclusions

We performed an exploratory analysis of the ToothGrowth dataset in R. We split the dataset by supplement and by dose, and we checked with t.test whether there were statistically significant differences in length by dose or by supplement.

The conclusion is that there is no statistical evidence of a significant difference in length by supplement at the 5% level. By contrast, there is strong statistical evidence that length differs significantly as the dose increases.

Written on February 25, 2016