Last updated: 2025-01-10
On Wikipedia, a Generalised Linear Model is described as follows:
In statistics, a generalized linear model (GLM) is a flexible generalization of ordinary linear regression. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value.
Predicting the miles per gallon (mpg) of cars based on their weight (wt) using the mtcars dataset.
The glm() function can fit a variety of models by specifying a family
(e.g., Gaussian, binomial, Poisson). For a basic linear regression,
family = gaussian() is used.
glm_model <- glm(mpg ~ wt, data = mtcars, family = gaussian())
summary(glm_model)
glm(formula = mpg ~ wt, family = gaussian(), data = mtcars)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
wt           -5.3445     0.5591  -9.559 1.29e-10 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for gaussian family taken to be 9.277398)
Null deviance: 1126.05 on 31 degrees of freedom
Residual deviance: 278.32 on 30 degrees of freedom
AIC: 166.03
Number of Fisher Scoring iterations: 2
Miles per gallon decreases by 5.3445 for each one-unit increase in weight (in mtcars, wt is measured in units of 1,000 lbs). In other words, the heavier the car, the less distance it travels per gallon of fuel.
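The interpretation of the slope can be checked directly. The following is a minimal sketch that refits the model from the text and compares predictions for two hypothetical cars one unit of wt apart. The weights 3 and 4 and the names `new_cars`, `pred`, and `slope` are illustrative assumptions, not part of the original analysis.

```r
# Refit the model from the text (mtcars ships with base R).
glm_model <- glm(mpg ~ wt, data = mtcars, family = gaussian())

# Slope for wt: expected change in mpg per one-unit (1,000 lbs) increase in weight.
slope <- coef(glm_model)[["wt"]]

# Two hypothetical cars that differ by exactly one unit of wt (illustrative values).
new_cars <- data.frame(wt = c(3, 4))
pred <- predict(glm_model, newdata = new_cars)

# The difference between the two predictions equals the slope.
diff(pred)
slope
```

Because the Gaussian family uses the identity link, the difference in predicted mpg is the same for any pair of weights that are one unit apart.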
The lm() function performs linear regression.
lm_model <- lm(mpg ~ wt, data = mtcars)
summary(lm_model)
lm(formula = mpg ~ wt, data = mtcars)
    Min      1Q  Median      3Q     Max
-4.5432 -2.3647 -0.1252  1.4096  6.8727
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
wt           -5.3445     0.5591  -9.559 1.29e-10 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.046 on 30 degrees of freedom
Multiple R-squared: 0.7528, Adjusted R-squared: 0.7446
F-statistic: 91.38 on 1 and 30 DF, p-value: 1.294e-10
Both functions produce the same coefficient estimates for simple linear
regression because lm() is essentially a special case of glm() with
family = gaussian(). glm() supports various distributions and link
functions through its family argument (e.g., Gaussian, binomial,
Poisson), which defaults to gaussian() when omitted. lm() is designed
specifically for linear regression and assumes the Gaussian
distribution and identity link.
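The equivalence can be verified numerically. The sketch below refits both models and compares the quantities reported in the two summaries; `r2_from_glm` is a name introduced here for illustration.

```r
# Fit the same model with both functions.
lm_model  <- lm(mpg ~ wt, data = mtcars)
glm_model <- glm(mpg ~ wt, data = mtcars, family = gaussian())

# Coefficient estimates agree up to numerical tolerance.
all.equal(coef(lm_model), coef(glm_model))

# The GLM dispersion parameter is the squared residual standard error from lm().
all.equal(summary(glm_model)$dispersion, sigma(lm_model)^2)

# R-squared from lm() can be recovered from the residual and null deviances of the GLM.
r2_from_glm <- 1 - deviance(glm_model) / glm_model$null.deviance
all.equal(r2_from_glm, summary(lm_model)$r.squared)
```

This ties the two outputs together: the dispersion of 9.277398 is 3.046 squared (up to rounding), and 1 - 278.32 / 1126.05 gives the multiple R-squared of 0.7528.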
Use glm() when working with non-normal response variables or when you
need a link function other than the identity. A link function is a
mathematical function that connects the linear predictor of a
generalised linear model (GLM) to the mean of the response variable’s
distribution. It allows the model to handle a wide range of response
variable types (e.g., binary, count, continuous) by transforming the
expected value of the response variable to a scale that matches the
linear predictor.
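To make the role of the link function concrete, here is a minimal sketch using a binary response. It models am (transmission: 0 = automatic, 1 = manual) from wt with a binomial family and logit link. This model is an illustrative assumption and is not part of the original analysis, and the names `logit_model`, `eta`, `mu`, and `default_links` are introduced here.

```r
# Default link functions for three common families.
default_links <- sapply(
  list(gaussian = gaussian(), binomial = binomial(), poisson = poisson()),
  function(f) f$link
)
default_links

# Binary response: am modelled from weight with a logit link (illustrative model).
logit_model <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))

# Linear predictor (log-odds scale) and fitted mean (probability scale).
eta <- predict(logit_model, type = "link")
mu  <- predict(logit_model, type = "response")

# Applying the inverse link to the linear predictor gives the fitted mean.
all.equal(unname(mu), unname(plogis(eta)))
all.equal(unname(mu), unname(binomial()$linkinv(eta)))
```

The linear predictor `eta` can take any real value, while the inverse logit maps it to a probability between 0 and 1, which is what lets a linear model describe a binary outcome.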
