# Fuel consumption in cars depending on transmission

The study focuses on fuel consumption (MPG, miles per gallon): which variables affect it most and, specifically, whether manual or automatic transmission is better for MPG. We built a multivariable linear model that explains the relationship between these variables and MPG. The model shows that cars with manual transmission get more MPG than those with automatic transmission, which is also supported by a t-test and a boxplot. This was the course project for the Regression Models course, part of the Data Science Specialization by Johns Hopkins University on Coursera.

## Exploratory Data Analysis

First, we explore the mtcars dataset:

``````
library(datasets)
data(mtcars)
mtcars[1:3,]
``````
``````
##                mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
``````

The Appendix contains a matrix of plots for the dataset (Figure 1) and a boxplot of MPG vs. transmission (Figure 2). To perform the analysis, the variables cyl (number of cylinders), vs (engine shape, V-shaped or straight), gear (number of forward gears), carb (number of carburetors) and am (transmission) were converted from numeric to factor.
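
The conversion code is not shown in the report. A minimal sketch of how it could be done follows. The labels `Automatic` and `Manual` match the output shown later in the analysis, while the use of `within()` is an assumption about style and not necessarily the author's approach.

```r
library(datasets)
data(mtcars)

# Convert the categorical columns from numeric to factor.
mtcars <- within(mtcars, {
  cyl  <- factor(cyl)
  vs   <- factor(vs)
  gear <- factor(gear)
  carb <- factor(carb)
  # In mtcars, am is coded 0 = automatic, 1 = manual.
  am   <- factor(am, levels = c(0, 1), labels = c("Automatic", "Manual"))
})

# Inspect the converted columns.
str(mtcars[, c("cyl", "vs", "am", "gear", "carb")])
```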

## Analysis (inference and regression model)

In general, the boxplot in Figure 2 (Appendix) shows that MPG differs by transmission. With a t-test, assuming that MPG is approximately normally distributed within each transmission group, we can reject the null hypothesis that there is no difference in MPG between transmissions:

``````
t.test(mpg ~ am, mtcars)
``````
``````
##
##  Welch Two Sample t-test
##
## data:  mpg by am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean in group Automatic    mean in group Manual
##                17.14737                24.39231
``````
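
The normality assumption behind the t-test can be checked informally. The sketch below applies a Shapiro-Wilk test to MPG within each transmission group. This check is added for illustration and is not part of the original analysis. It uses the raw 0/1 coding of `am` from the `datasets` package.

```r
library(datasets)
data(mtcars)

# Shapiro-Wilk test of MPG within each transmission group
# (raw coding: am == 0 is automatic, am == 1 is manual).
normality <- tapply(mtcars$mpg, mtcars$am,
                    function(x) shapiro.test(x)$p.value)
names(normality) <- c("Automatic", "Manual")
print(normality)

# Group means, for comparison with the t-test output.
group_means <- tapply(mtcars$mpg, mtcars$am, mean)
print(round(group_means, 5))
```

A large p-value means the test finds no evidence against normality. With groups this small, a Q-Q plot is a useful complement.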

However, as Figure 1 (Appendix) shows, several variables correlate with MPG, so we need to build a multivariable model for MPG. To find the best model, we used the step function, which chooses a model according to the Akaike Information Criterion (AIC) in a stepwise algorithm. Our initial model includes all the variables.

``````
lm.initial <- lm(mpg ~ ., data = mtcars)
lm.aic <- step(lm.initial, direction = "both")
summary(lm.aic)
``````

With AIC, the best model for predicting fuel consumption (MPG) includes the variables cyl, hp, wt and am. We also used the Bayesian Information Criterion (BIC) to choose the best model:

``````
lm.bayesian <- step(lm.initial, k = log(nrow(mtcars)))
``````
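
In `step`, the `k` argument is the penalty applied per estimated parameter: the default `k = 2` corresponds to AIC, and `k = log(n)`, with n the number of observations, corresponds to BIC. The following sketch illustrates that relationship on a small example model. The formula `mpg ~ wt + am` is chosen only for illustration and is not one of the models selected in the report.

```r
library(datasets)
data(mtcars)

fit <- lm(mpg ~ wt + am, data = mtcars)   # illustrative model only
n <- nobs(fit)                            # number of observations

# BIC is AIC with the per-parameter penalty changed from 2 to log(n).
aic_default <- AIC(fit)
bic_via_k   <- AIC(fit, k = log(n))

print(c(AIC = aic_default, BIC = BIC(fit), AIC_with_log_n = bic_via_k))
```

Because log(32) is about 3.47, BIC penalizes each extra variable more heavily than AIC, which is why it tends to select smaller models.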

With BIC, the best model includes the variables wt, qsec and am:

``````
summary(lm.bayesian)
``````
``````
##
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.4811 -1.5555 -0.7257  1.4110  4.6610
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   9.6178     6.9596   1.382 0.177915
## wt           -3.9165     0.7112  -5.507 6.95e-06 ***
## qsec          1.2259     0.2887   4.247 0.000216 ***
## amManual      2.9358     1.4109   2.081 0.046716 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared:  0.8497, Adjusted R-squared:  0.8336
## F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11
``````

We selected the best BIC model because it is simpler (it has fewer variables) while explaining nearly the same variability (adjusted R-squared: BIC = 0.834, AIC = 0.84), and both models have an overall p-value near 0. Figure 3 (Appendix) contains the diagnostic plots for this model. They suggest that the residuals are approximately normally distributed (Normal Q-Q), satisfy the independence condition (Residuals vs Fitted), have constant variance (Scale-Location), and include no disruptive outliers (Residuals vs Leverage).
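
The comparison of the two candidate models can be reproduced roughly as follows. The formulas come from the text above. The factor conversion is repeated so that the block is self-contained, and the names `fit_aic`, `fit_bic` and `comparison` are illustrative.

```r
library(datasets)
data(mtcars)

# Factor conversion for the variables used in the two models.
mtcars$cyl <- factor(mtcars$cyl)
mtcars$am  <- factor(mtcars$am, labels = c("Automatic", "Manual"))

fit_aic <- lm(mpg ~ cyl + hp + wt + am, data = mtcars)  # model chosen by AIC
fit_bic <- lm(mpg ~ wt + qsec + am, data = mtcars)      # model chosen by BIC

# Number of slope coefficients and adjusted R-squared for each model.
comparison <- data.frame(
  model         = c("AIC", "BIC"),
  coefficients  = c(length(coef(fit_aic)) - 1, length(coef(fit_bic)) - 1),
  adj_r_squared = c(summary(fit_aic)$adj.r.squared,
                    summary(fit_bic)$adj.r.squared)
)
print(comparison)
```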

According to this model, when wt and qsec are held constant, cars with manual transmission (amManual) get on average 2.936 more miles per gallon (MPG) than cars with automatic transmission (the reference level).
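
The uncertainty around that estimate can be inspected with a confidence interval. This is a sketch added for illustration, using base R's `confint`. It refits the selected model so that it runs on its own.

```r
library(datasets)
data(mtcars)
mtcars$am <- factor(mtcars$am, labels = c("Automatic", "Manual"))

fit <- lm(mpg ~ wt + qsec + am, data = mtcars)

# Point estimate and 95% confidence interval for the transmission effect.
am_effect <- coef(fit)["amManual"]
am_ci     <- confint(fit, "amManual", level = 0.95)

print(round(am_effect, 3))
print(round(am_ci, 3))
```

An interval that excludes zero is consistent with the p-value below 0.05 in the summary, though its width shows that the size of the effect is estimated imprecisely.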

So the conclusion is that cars with manual transmission are more fuel-efficient than those with automatic transmission (they get 30.5% better MPG, taking the amManual coefficient relative to the model intercept).

# Appendix

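## Figure 1

Matrix of plots for the mtcars dataset.

## Figure 2

Boxplot of MPG by transmission.

## Figure 3

Diagnostic plots for the selected model (Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage).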
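
The figures themselves are not reproduced here. Plots of this kind could be regenerated along the following lines. The choice of `pairs`, `boxplot`, and the default `plot` method for `lm` objects is an assumption based on the figure descriptions in the text, not the author's confirmed code.

```r
library(datasets)
data(mtcars)

# Figure 1 (assumed): scatterplot matrix of all variables.
pairs(mtcars, main = "mtcars scatterplot matrix")

# Figure 2 (assumed): MPG by transmission type.
mtcars$am <- factor(mtcars$am, labels = c("Automatic", "Manual"))
boxplot(mpg ~ am, data = mtcars,
        xlab = "Transmission", ylab = "Miles per gallon (MPG)")

# Figure 3 (assumed): the four standard diagnostic plots for the final model.
fit <- lm(mpg ~ wt + qsec + am, data = mtcars)
par(mfrow = c(2, 2))
plot(fit)
par(mfrow = c(1, 1))
```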

Written on March 12, 2016