Inferences for Multiple Linear Regression

Objectives

Interpret inferential statistics for various types of regression models, including those with indicator variables, quadratic terms, and interaction terms.
Interpret and communicate results from a multiple regression analysis.

Multiple Linear Regression (MLR) with a Single Predictor: Polynomial Regression

This is still considered a linear model.
- It can capture curvilinear trends.
- It is linear in the \(\beta_i X^j\) term: for every one-unit increase in \(X^j\), \(Y\) changes by a constant \(\beta_i\).
- If the quadratic coefficient is significant but the linear term is not, the linear (lower-order) terms should still be included because they form a singular idea.
- Interpretation can be difficult, especially with higher-order terms.
In interpreting a quadratic model:
- The vertex (absolute maximum or minimum) can be found using: \[ x_{\text{vertex}} = \frac{-\beta_1}{2\beta_2} \]
- This value of \(X\) represents the point at which \(Y\) stops increasing or decreasing, based on the quadratic regression model.

Coefficient of Determination (\(R^2\))

\(R^2\) is a measure of effect size for both simple linear regression (SLR) and MLR.
It represents the proportion of variance in the response variable explained by the model.

Calculation in MLR

Multiple \(R^2\): This is the squared correlation between the observed values of \(Y\) and the predicted values of \(Y\).
Partition of sums of squares: The total variation in \(Y\) (\(\text{SS}_{\text{total}}\)) can be split into the variation explained by the model (\(\text{SS}_{\text{regression}}\)) and the variation left unexplained (\(\text{SS}_{\text{residual}}\)): \[ \text{SS}_{\text{total}} = \text{SS}_{\text{regression}} + \text{SS}_{\text{residual}} \] This partition leads to: \[ R^2 = \frac{\text{SS}_{\text{total}} - \text{SS}_{\text{residual}}}{\text{SS}_{\text{total}}} = \frac{\text{SS}_{\text{regression}}}{\text{SS}_{\text{total}}} = 1 - \frac{\text{SS}_{\text{residual}}}{\text{SS}_{\text{total}}} \]

Relationship with the Overall F-test

The F-test statistic can be rewritten in terms of \(R^2\).
Testing whether all parameters equal zero is equivalent to testing whether \(R^2 = 0\).

\(R^2\) and Parsimony

Adding more predictors will always increase \(R^2\).
An oversaturated model can have \(R^2 = 1\) even if the predictors do not contribute meaningfully.
The goal is to create a parsimonious model—one that is as simple as possible while still maximizing \(R^2\).

Adjusted \(R^2\)

Adjusted \(R^2\) accounts for the number of predictors (\(k\)) and the sample size (\(n\)).
The formula is: \[ R^2_{\text{adj}} = R^2 - (1 - R^2) \frac{k}{n-k-1} \]
Adjusted \(R^2\) penalizes unnecessary predictors and small sample sizes.
It rewards variables that significantly improve model fit, resulting in a net increase if a predictor explains a substantial proportion of the variance.
It is generally a better measure of effect size than \(R^2\) in MLR.

Overfitting

A value of \(R^2\) equal to 1 can always be achieved, but this does not indicate good predictive power.
Overfitting occurs when the model fits both the noise and the signal in the sample data.
A high \(R^2\) value does not guarantee that the model will generalize well to new data.

Controlling for Other Variables in MLR

When we hold \(X_2\) constant (i.e., look at a subpopulation with similar \(X_2\) values), we can examine the effect of changes in \(X_1\) on \(Y\).

Overall F-test

The overall F-test evaluates whether the model is statistically significant.
Hypotheses:
- \(H_0\): All slopes are equal to 0 (i.e., \(R^2 = 0\)).
- \(H_a\): At least one slope is not zero.
F-statistic formula: \[ F = \frac{\text{SS}_{\text{regression}} / k}{\text{SS}_{\text{residual}} / (n - k - 1)} = \frac{\text{MS}_{\text{regression}}}{\text{MS}_{\text{error}}} \]
This test can also be expressed in terms of \(R^2\).

Partial F-test (Extra Sum of Squares)

The partial F-test is used when a subset of \(k\) predictors does not have statistically significant slopes. In other words, it examines a reduced model with \(g\) of the \(k\) predictors.
Hypotheses:
- \(H_0\): The extra \(k-g\) predictors do not contribute. (All their slopes are zero.)
- \(H_a\): At least one of the extra predictors has a nonzero slope.
This test simultaneously evaluates whether additional predictors improve the model.

Example: Partial F-test Using Extra Sum of Squares

Full model:
\(\log(\text{cost}) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \beta_6 X_6\)

Source	DF	SS	MS	F	p-value
Model	6	1532	255	164	<0.0001
Error	778	1208	1.55
Total	784	2740

Reduced model:
\(\log(\text{cost}) = \beta_0 + \beta_3 X_3 + \beta_5 X_5 + \beta_6 X_6\)

Source	DF	SS	MS	F	p-value
Model	3	1523	508	326	<0.0001
Error	781	1217	1.56
Total	784	2740

Hypotheses:

\(H_0\): The extra coefficients are all zero (e.g., \(\beta_1 = \beta_2 = \beta_4 = 0\)).
\(H_a\): At least one of the extra coefficients is nonzero.

Partial F-test comparison:

Model	Error DF	SSE	MSE	F	p-value
Model	3	9	3	1.94	0.1216
Full (Error)	778	1208	1.55
Reduced (Total)	781	1217

Partial F-test results:

Critical value (\(\alpha = 0.05\)): \(F_{0.05, 3, 778} = 2.62\)
Observed statistic: \(F = 1.94\)
Right-tail p-value: \(p = 0.1216\)
Decision: Since \(F_{\text{obs}} < F_{\text{crit}}\) and \(p > 0.05\), fail to reject \(H_0\).

Interpretation:
There is not enough evidence to suggest that the extra predictors (e.g., \(X_1\), \(X_2\), \(X_4\)) explain a significant additional portion of variability in \(\log(\text{cost})\) beyond the reduced model.

**Partial F-test.** F-distribution with \(df_1=3\) and \(df_2=778\) for the partial F-test. The vertical dashed line marks the observed statistic (\(F = 1.94\)), and the shaded region shows the right-tail p-value. For \(\alpha = 0.05\), the critical value is \(F_{0.05, 3, 778} \approx 2.62\).

Significance Tests for Each Predictor (t-tests)

Hypotheses:
- \(H_0: \beta_i = 0\) (the predictor has no effect).
- \(H_a: \beta_i \neq 0\) (the predictor has an effect).
Test statistic: \[ t = \frac{b_i}{SE_{b_i}}, \quad df = n - k - 1 \]
Standard error calculation: \[ SE = \frac{\hat{\sigma}}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}}, \quad \hat{\sigma} = \sqrt{\frac{\text{SSR}}{n - k - 1}} \]

Prediction in MLR

Predicted values in MLR are the same as estimated means.
To obtain confidence intervals for the mean or prediction intervals for individual predictions, statistical software must be used because all variables in the model must be specified.

Least Squares Estimates and Standard Errors

The least squares estimates of the \(\beta\)s appear in the coefficient column of the parameter estimate table.
The estimate of \(\sigma^2\) is calculated as: \[ \frac{\text{sum of squared residuals (SSE)}}{\text{df} (n-p)} = \text{MSE} \]
The square root of MSE gives the residual standard error (or RMSE).

Testing Intercept and Slope Differences in Regression Models (Bat Echolocation Example)

Question of Interest 1: Equality of Slopes Across All Groups

Question: Are the slopes equal across all three groups—birds, echolocating bats (Ebat), and non-echolocating bats (NEbat)—after accounting for body size?

Approach:
We use an Extra Sum of Squares F-test. If slopes are equal (holding body mass constant), the estimated difference in energy expenditure should be the same across groups.

Full model: \[ \begin{aligned} \mu(\text{lenergy} \mid \text{lmass}, \text{TYPE}) &= \beta_0 + \beta_2 \,\text{bird} + \beta_3 \,\text{Ebat} + \beta_1 \,\text{lmass} \\ &\quad + \beta_4 (\text{lmass} \times \text{bird}) + \beta_5 (\text{lmass} \times \text{Ebat}) \end{aligned} \]

Alternative expression: \[ \mu(\text{lenergy} \mid \text{lmass}, \text{TYPE}) = (\beta_0 + \beta_2 \,\text{bird} + \beta_3 \,\text{Ebat}) + (\beta_1 + \beta_4 \,\text{bird} + \beta_5 \,\text{Ebat}) \,\text{lmass} \]

Group-specific models: \[ \begin{aligned} \text{NEbat:} & \quad \mu = \beta_0 + \beta_1\,\text{lmass} &\quad &\text{slope: } \beta_1 \\ \text{Ebat:} & \quad \mu = (\beta_0 + \beta_3) + (\beta_1 + \beta_5)\,\text{lmass} &\quad &\text{slope: } \beta_1 + \beta_5 \\ \text{bird:} & \quad \mu = (\beta_0 + \beta_2) + (\beta_1 + \beta_4)\,\text{lmass} &\quad &\text{slope: } \beta_1 + \beta_4 \end{aligned} \]

Hypotheses:
\[ \begin{aligned} H_0 &: \text{The slopes are equal} &&\Leftrightarrow \beta_1 = \beta_1 + \beta_5 = \beta_1 + \beta_4 \\ & &&\implies \beta_4 = \beta_5 = 0 \quad\text{(lines are parallel)} \\ H_a &: \text{The slopes are not all equal} &&\Leftrightarrow \beta_4 \text{ and } \beta_5 \text{ are not both zero} \end{aligned} \]

Extra Sum of Squares F-test

The tables below show the full model, reduced model, and Extra Sum of Squares results used to test whether slopes differ among birds, Ebats, and NEbats.

Full Model ANOVA Table

Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	5	29.4699	5.8940	163.44	<.0001
Error	14	0.5049	0.0361
Corrected Total	19	29.9748

Reduced Model ANOVA Table

Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	3	29.4215	9.8072	283.59	<.0001
Error	16	0.5533	0.0346
Corrected Total	19	29.9748

Extra Sum of Squares Table

Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	2	0.0484	0.0242	0.6704	0.5272
Error (Full)	14	0.5049	0.0361
Total (Reduced)	16	0.5533

Interpretation: From the output, there is not enough evidence to suggest that the lines are not parallel and have different slopes, p-value = 0.5272 from the Extra Sum of Squares F-test.

Question of Interest 2: Equality of Intercepts for Bat Types

Question: Are the intercepts for Ebat and NEbat different after knowing that the slopes are equal?

Hypotheses:
\[ \begin{aligned} H_0 &: \text{The intercepts are equal} &&\Leftrightarrow \beta_3 = 0 \quad \text{(no difference between NEbat and Ebat)} \\ H_a &: \text{The intercepts are not all equal} &&\Leftrightarrow \beta_3 \neq 0 \end{aligned} \]

Equivalent form of \(H_0\) in intercept terms: \[ \begin{aligned} &\text{If the intercept for Ebat is equal to that of NEbat,} \\ &\text{then } \beta_0 + \beta_3 = \beta_0 \quad \text{or} \quad \beta_3 = 0 \end{aligned} \]

Parameter Estimate Table

Parameter	Estimate	Standard Error	t Value	Pr >
Intercept	-1.5764	0.2872	-5.49	<.0001
lmass	0.8150	0.0445	18.30	<.0001
Type Ebat	0.0787	0.2027	0.39	0.7030
Type bird	0.1023	0.1142	0.90	0.3837
Type NEbat	0.0000	.	.	.

Interpretation:
Data are consistent with the hypothesis of equal intercepts between Ebat and NEbat after accounting for mass (p-value = 0.7030).

Variance of the Sum and Difference of Random Variables

The variance of a linear combination of two random variables \(X\) and \(Y\) is:

\[ \text{Var}(aX + bY) = a^2 \,\text{Var}(X) + b^2 \,\text{Var}(Y) + 2ab\,\text{Cov}(X, Y) \]

Covariance (\(\text{Cov}\)) measures how two variables move together: positive (\(+\)), none (\(0\)), or negative (\(-\)).

Example with Regression Coefficients

Let \(\beta_1\) and \(\beta_2\) be regression estimates, treated as random variables.
We obtain their variances by squaring the \(\text{SE}\) values from the parameter estimate table.

Sum: \[ \begin{aligned} \text{Var}(\beta_1 + \beta_2) &= 1^2 \, \text{Var}(\beta_1) + 1^2 \, \text{Var}(\beta_2) + 2(1)(1)\text{Cov}(\beta_1, \beta_2) \\ &= \text{Var}(\beta_1) + \text{Var}(\beta_2) + 2 \, \text{Cov}(\beta_1, \beta_2) \\ \end{aligned} \]

Difference: \[ \begin{aligned} \text{Var}(\beta_1 - \beta_2) &= 1^2 \, \text{Var}(\beta_1) + (-1)^2 \, \text{Var}(\beta_2) + 2(1)(-1)\text{Cov}(\beta_1, \beta_2) \\ &= \text{Var}(\beta_1) + \text{Var}(\beta_2) - 2 \, \text{Cov}(\beta_1, \beta_2) \end{aligned} \]

We will use the difference formula for the next question of interest.

Question of Interest 3: Confidence Interval for the Difference in Intercepts (Birds vs. Ebats)

We want a 95% confidence interval for the difference in intercepts between birds and Ebats after accounting for mass.

Model equation:
\[ \text{lenergy} = \beta_0 + \beta_1\,\text{lmass} + \beta_2\,\text{bird} + \beta_3\,\text{Ebat} \]

Modified regression equations:

Birds: \(\mu = (\beta_0 + \beta_2) + \beta_1\,\text{lmass}\)
Ebats: \(\mu = (\beta_0 + \beta_3) + \beta_1\,\text{lmass}\)

Difference in intercepts:
\[ (\beta_0 + \beta_2) - (\beta_0 + \beta_3) = \beta_2 - \beta_3 \]

95% confidence interval formula: \[ \begin{aligned} &\hat{\beta}_2 - \hat{\beta}_3 \pm t_{0.975,\, 16} \times \text{SE}(\hat{\beta}_2 - \hat{\beta}_3) \\ &\hat{\beta}_2 - \hat{\beta}_3 \pm 2.12 \times 0.155 \end{aligned} \]

Finding the standard error:

From the parameter estimates table: \[ \text{Var}(\hat{\beta}_2) = 0.0130, \quad \text{Var}(\hat{\beta}_3) = 0.0410, \quad \text{Cov}(\hat{\beta}_2, \hat{\beta}_3) = 0.0150 \]

Apply the difference formula: \[ \begin{aligned} \text{Var}(\hat{\beta}_2 - \hat{\beta}_3) &= (1)^2 \, \text{Var}(\hat{\beta}_2) + (-1)^2 \, \text{Var}(\hat{\beta}_3) + 2(1)(-1)\text{Cov}(\hat{\beta}_2, \hat{\beta}_3) \\ &= \text{Var}(\hat{\beta}_2) + \text{Var}(\hat{\beta}_3) - 2 \, \text{Cov}(\hat{\beta}_2, \hat{\beta}_3) \\ &= 0.013 + 0.041 - 2(0.015) = 0.024 \end{aligned} \]

Thus: \[ \text{SE}(\hat{\beta}_2 - \hat{\beta}_3) = \sqrt{\text{Var}(\hat{\beta}_2 - \hat{\beta}_3)} = \sqrt{0.024} = 0.1549 \]

Final 95% CI:
\[ \begin{aligned} \hat{\beta}_2 - \hat{\beta}_3 &\pm 2.12 \times 0.1549 \\ (0.1023 - 0.0787) &\pm 2.12 \times 0.1549 = (−0.3048, 0.3520) \end{aligned} \]

Interpretation:
The 95% CI includes zero, so there is insufficient evidence to conclude a difference in intercepts between birds and Ebats after adjusting for body mass.

Alternative Comparison: Birds vs. NEbats

Under the current model coding, Type NEbat is the reference group.
This means the parameter estimates for Type bird and Type Ebat are interpreted relative to NEbats.
If the goal were to compare birds and Ebats directly (as in QOI 3), the model would need to be refit with Type Ebat set as the reference group.

Another way to find the difference in intercepts is to note that in the model:

\[ \mu(\text{lenergy} \mid \text{lmass}, \text{TYPE}) = \beta_0 + \beta_1\,\text{lmass} + \beta_2\,\text{bird} + \beta_3\,\text{Ebat} \]

The difference in intercepts between birds and the reference group (NEbat) is simply:

\[ (\beta_0 + \beta_2) - \beta_0 = \beta_2 \]

So you can use the parameter estimate \(\hat{\beta}_2\) directly from the table.

Parameter Estimate Table

Parameter	Estimate	Standard Error	t Value	Pr >
Intercept	-1.5764	0.2872	-5.49	<.0001
lmass	0.8150	0.0445	18.30	<.0001
Type bird	0.1023	0.1142	0.90	0.3837
Type Ebat	0.0787	0.2027	0.39	0.7030
Type NEbat	0.0000	.	.	.

95% Confidence interval calculation:
Since we are assuming the lines are parallel, we can use \(\hat{\beta}_2\) and its standard error to form the CI.

The 95% CI is:
\[ 0.1023 \pm 2.12 \times 0.1142 = (-0.139, 0.344) \] Interpretation: Since the interval contains zero, there is no significant difference in mean energy between birds and NEbats after accounting for mass.