# The identical output of lm() and aov() and lm() and t.test() in R

## 2022/11/21

This one hardly counts as a blog post; it is more a reference for myself.

For some time, I’ve known on a conceptual level that a linear regression model (fit using `lm()` in R, the statistical software and programming language), ANOVA (`aov()`), and the t-test (`t.test()`) are closely related, though they’re often taught as separate techniques.

I’m going to add the code now, hoping to return to and expand on this later.

Let’s use a data set about penguins.

``````library(tidyverse)
library(palmerpenguins)

penguins %>%
  glimpse()``````
``````## Rows: 344
## Columns: 8
## $ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
## $ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
## $ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
## $ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
## $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
## $ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
## $ sex               <fct> male, female, female, NA, female, male, female, male…
## $ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…``````

## `lm()` and `aov()`

Note how the F-value, df, and p-value are identical: both outputs report F = 30.86 on 2 and 339 degrees of freedom, with p = 4.86e-13.

``summary(lm(bill_length_mm ~ island, data = penguins))``
``````##
## Call:
## lm(formula = bill_length_mm ~ island, data = penguins)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -12.0677  -3.8559   0.2958   3.8175  14.3425
##
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)      45.2575     0.3897 116.127  < 2e-16 ***
## islandDream      -1.0897     0.5970  -1.825   0.0688 .
## islandTorgersen  -6.3065     0.8057  -7.827 6.44e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.036 on 339 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.154,  Adjusted R-squared:  0.149
## F-statistic: 30.86 on 2 and 339 DF,  p-value: 4.86e-13``````
``summary(aov(bill_length_mm ~ island, data = penguins))``
``````##              Df Sum Sq Mean Sq F value   Pr(>F)
## island        2   1566   782.8   30.86 4.86e-13 ***
## Residuals   339   8599    25.4
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 2 observations deleted due to missingness``````
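
Rather than comparing the printed output by eye, the same check can be done in code. This is only a minimal sketch, assuming the palmerpenguins package is installed; the object names (`fit_lm`, `fit_aov`, and so on) are my own choices for illustration.

```r
library(palmerpenguins)

# Fit the same formula both ways
fit_lm  <- lm(bill_length_mm ~ island, data = penguins)
fit_aov <- aov(bill_length_mm ~ island, data = penguins)

# summary.lm() stores the overall F test as a named vector:
# value, numdf, dendf
f_lm <- summary(fit_lm)$fstatistic

# summary.aov() returns a list; its first element is the ANOVA table
aov_tab <- summary(fit_aov)[[1]]

# F statistic from each fit
f_from_lm  <- unname(f_lm["value"])
f_from_aov <- aov_tab[["F value"]][1]

# p-value: lm's has to be computed from the F distribution,
# aov's can be read from the table
p_from_lm  <- unname(pf(f_lm["value"], f_lm["numdf"], f_lm["dendf"],
                        lower.tail = FALSE))
p_from_aov <- aov_tab[["Pr(>F)"]][1]

all.equal(f_from_lm, f_from_aov)
all.equal(p_from_lm, p_from_aov)
```

Calling `anova(fit_lm)` on the `lm()` fit is another way to get the ANOVA table without using `aov()` at all.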

## `lm()` and `t.test()`

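Note that the t-value for the sex variable, its df, and its p-value match: |t| = 6.667 on 331 df, with p = 1.09e-10. The sign differs only because `lm()` reports the male mean minus the female mean, while `t.test()` reports female minus male. The match relies on `var.equal = TRUE`; by default, `t.test()` runs Welch’s test, which does not assume equal variances.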

``summary(lm(bill_length_mm ~ sex, data = penguins))``
``````##
## Call:
## lm(formula = bill_length_mm ~ sex, data = penguins)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -11.2548  -4.7548   0.8452   4.3030  15.9030
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  42.0970     0.4003 105.152  < 2e-16 ***
## sexmale       3.7578     0.5636   6.667 1.09e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.143 on 331 degrees of freedom
##   (11 observations deleted due to missingness)
## Multiple R-squared:  0.1184, Adjusted R-squared:  0.1157
## F-statistic: 44.45 on 1 and 331 DF,  p-value: 1.094e-10``````
``t.test(bill_length_mm ~ sex, var.equal = TRUE, data = penguins)``
``````##
##  Two Sample t-test
##
## data:  bill_length_mm by sex
## t = -6.667, df = 331, p-value = 1.094e-10
## alternative hypothesis: true difference in means between group female and group male is not equal to 0
## 95 percent confidence interval:
##  -4.866557 -2.649027
## sample estimates:
## mean in group female   mean in group male
##             42.09697             45.85476``````
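
The comparison of `lm()` and `t.test()` can also be checked in code. Again, this is a rough sketch that assumes the palmerpenguins package is installed, with object names (`fit`, `tt`, and so on) chosen only for illustration. It also checks that the overall F statistic from `lm()` is the square of the t statistic, which holds when the predictor has two groups.

```r
library(palmerpenguins)

# Same formula, two functions; var.equal = TRUE requests the
# pooled-variance (Student) t-test rather than the Welch default
fit <- lm(bill_length_mm ~ sex, data = penguins)
tt  <- t.test(bill_length_mm ~ sex, data = penguins, var.equal = TRUE)

# t statistic for the sex coefficient in the regression
t_lm <- coef(summary(fit))["sexmale", "t value"]

# t statistic from the t-test (opposite sign: female minus male)
t_tt <- unname(tt$statistic)

# Same magnitude, same degrees of freedom, same p-value
same_t  <- all.equal(abs(t_lm), abs(t_tt))
same_df <- all.equal(df.residual(fit), unname(tt$parameter))
same_p  <- all.equal(coef(summary(fit))["sexmale", "Pr(>|t|)"], tt$p.value)

# With a two-level predictor, the overall F equals t squared
f_lm <- unname(summary(fit)$fstatistic["value"])
f_is_t_squared <- all.equal(f_lm, t_lm^2)

c(same_t = same_t, same_df = same_df, same_p = same_p,
  f_is_t_squared = f_is_t_squared)
```

Dropping `var.equal = TRUE` from the `t.test()` call gives the Welch test, and its t statistic and df would generally no longer line up with the regression output.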

I may come back to this later to flesh this out more…