Logistic Regression
Objectives
- Understand how logistic regression models a binary response.
- Understand the structure and purpose of the simple logistic regression model.
- Learn how to interpret regression coefficients for both numeric and categorical predictors.
- Revisit the connection between logistic regression and 2×2 contingency tables.
Simple Logistic Regression Model
Classification
- The main distinction from linear regression: the response variable is categorical
- Binary classification examples:
- Cancer or not
- Clicked advertisement or not
- Defaulted on payment or not
Predictive Models
- Continuous response: \(Y = f(x)\)
- Categorical response: \(P(Y = \text{Default} \mid X) = f(X)\)
- With categorical responses, we predict the probability of the response, not the value itself.
- We often write \(f(X)\) as \(p(X)\) to emphasize that it’s a probability given predictors \(X\).
- This is still a function—it just describes how the probability changes with the predictors rather than the response directly.
The Linearity Problem
- A linear equation for \(p(X)\) doesn’t make sense:
- Linear functions range over \((-\infty, \infty)\).
- But probabilities must stay between 0 and 1.
Visualizing \(p(X)\)
- One way to understand how \(p(X)\) behaves:
- Bin a continuous predictor into categories.
- For each bin, compute the proportion of the response (a short sketch follows).
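A minimal pandas sketch of this binning idea, using simulated data with hypothetical column names (`balance` as the predictor, `default` as the 0/1 response); later sketches reuse this `df`:

```python
import numpy as np
import pandas as pd

# Simulated illustration: continuous predictor and a 0/1 response
rng = np.random.default_rng(1)
balance = rng.uniform(0, 2500, 1000)
default = rng.binomial(1, 1 / (1 + np.exp(-(-10 + 0.0055 * balance))))
df = pd.DataFrame({"balance": balance, "default": default})

# Bin the predictor and compute the observed proportion of the
# response within each bin -- an empirical glimpse of p(X)
df["bin"] = pd.cut(df["balance"], bins=10)
print(df.groupby("bin", observed=True)["default"].mean())
```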
The Path to Linearity
- The key idea is a change of scale: we transform \(p(X)\) so that the model is linear on the new scale.
- \(p(X)\): range \((0,1)\)
- Odds \(= \frac{p(X)}{1 - p(X)}\): range \((0, \infty)\)
- Log odds (logit) \(= \ln\left(\frac{p(X)}{1 - p(X)}\right)\): range \((-\infty, \infty)\)
- The logit function maps probabilities to the full real number line.
- This transformation enables linear modeling: on the logit scale, the relationship between the probability and the predictor tends to be approximately linear. A short numeric sketch follows.
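A quick numeric check of these ranges, using nothing beyond the definitions above:

```python
import numpy as np

p = np.array([0.05, 0.25, 0.50, 0.75, 0.95])  # probabilities in (0, 1)
odds = p / (1 - p)          # maps to (0, infinity)
logit = np.log(odds)        # maps to (-infinity, infinity)
print(np.round(odds, 3))    # [ 0.053  0.333  1.     3.    19.   ]
print(np.round(logit, 3))   # symmetric about 0: [-2.944 -1.099  0.  1.099  2.944]
```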
Model Statement
Logistic regression models the log odds as a linear function of \(X\):
\[ \ln\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X \]
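One way to fit this model in practice is statsmodels' formula interface; a minimal sketch reusing the hypothetical `df` from the binning example above:

```python
import statsmodels.formula.api as smf

# Fit the simple logistic regression; coefficients are reported
# on the log-odds scale
model = smf.logit("default ~ balance", data=df).fit()
print(model.summary())
```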
Parameter Estimation
Multiple Linear Regression (MLR)
- Coefficient estimates minimize the residual sum of squares.
- Closed-form solution exists via matrix algebra.
Logistic Regression
- Coefficient estimates minimize the log loss (cross-entropy) error function.
- No closed-form solution exists, so the estimates cannot be computed directly with matrix algebra.
- Instead, the log loss is minimized numerically (a gradient-descent sketch follows).
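To make the numerical optimization concrete, here is a minimal gradient-descent sketch on simulated data (production software such as statsmodels uses faster Newton-type methods; this is illustrative only):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=500)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.5 * x))))
X = np.column_stack([np.ones_like(x), x])  # intercept column plus predictor

beta = np.zeros(2)
for _ in range(5000):
    p = 1 / (1 + np.exp(-(X @ beta)))  # current predicted probabilities
    beta -= 0.01 * (X.T @ (p - y))     # gradient of the log loss
print(beta)  # approaches the true values (0.5, 1.5)
```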
Hypothesis Testing
- To understand the relationship between a predictor and the response, test the slope coefficient:
- \(H_0\): \(\beta_1 = 0\)
- \(H_a\): \(\beta_1 \ne 0\)
- Test statistic (z-statistic): \(Z = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)}\)
- 95% confidence interval: \(\hat{\beta}_1 \pm z_{\alpha/2} \cdot SE(\hat{\beta}_1)\)
- The null distribution is standard normal: \(Z \sim N(0, 1)\)
- Some software reports a chi-squared test instead of a z-test.
- It squares the z-statistic: \(Z^2\)
- Then uses a \(\chi^2\) distribution with 1 degree of freedom to compute the p-value.
- This gives the same p-value as the two-sided z-test (verified numerically in the sketch below).
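A numeric check of that equivalence, with a hypothetical estimate and standard error:

```python
from scipy import stats

beta_hat, se = 0.0055, 0.002        # hypothetical values
z = beta_hat / se                   # z-statistic: 2.75
p_z = 2 * stats.norm.sf(abs(z))     # two-sided z-test p-value
p_chi2 = stats.chi2.sf(z**2, df=1)  # chi-squared test on z^2 with 1 df
print(p_z, p_chi2)                  # identical (about 0.006)
```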
Predicted Probabilities
- Logistic regression estimates the log odds, but we often want \(p(X)\). Solving for \(p(X)\):
\[ \ln\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X \]
\[ \frac{p(X)}{1 - p(X)} = e^{\beta_0 + \beta_1 X} \]
\[ p(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}} \]
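A short sketch of the back-transformation, with hypothetical coefficient values:

```python
import numpy as np

beta0, beta1 = -10.65, 0.0055        # hypothetical fitted coefficients
eta = beta0 + beta1 * 1500           # log odds at X = 1500
p = np.exp(eta) / (1 + np.exp(eta))  # back-transform to a probability
print(round(p, 3))                   # about 0.083
```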
Interpreting Coefficients: Continuous Predictors
Odds and Odds Ratios
- Logistic regression coefficients are directly interpretable: exponentiating a coefficient yields an odds ratio.
- Whether the predictor is numeric or categorical determines how that odds ratio is interpreted.
- For a continuous predictor \(X\), the odds of the event occurring are:
\[ \text{Odds}(X) = \frac{p(X)}{1 - p(X)} = e^{\beta_0 + \beta_1X} \]
Example:
- \(\text{Odds}(X = 500) = e^{\beta_0 + \beta_1 \cdot 500}\)
- \(\text{Odds}(X = 501) = e^{\beta_0 + \beta_1 \cdot 501}\)
- The odds ratio (OR) for a 1-unit increase in \(X\) is:
\[ \frac{e^{\beta_0 + \beta_1 \cdot 501}}{e^{\beta_0 + \beta_1 \cdot 500}} = e^{\beta_1} \]
- Interpretation: For every 1-unit increase in \(X\), the odds of the event occurring change by a factor of \(e^{\beta_1}\).
Generalized Interpretation
- This interpretation does not depend on the specific value of \(X\).
- For a \(\Delta\)-unit increase in \(X\), the odds change by a factor of \(e^{\Delta \cdot \beta_1}\).
- Example:
- A $500 increase in balance increases the odds of default by a factor of 15.64.
- That is, the odds of defaulting with a $1,000 balance are 15.64 times higher than with a $500 balance.
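The arithmetic behind that example, assuming the slope implied by the stated factor (\(\beta_1 = \ln(15.64)/500 \approx 0.0055\)):

```python
import numpy as np

factor = 15.64
beta1 = np.log(factor) / 500  # slope implied by the example
print(np.exp(500 * beta1))    # 15.64: OR for a $500 increase
```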
Tips
- Odds ratios describe relative, not absolute, changes in risk.
- Use predicted probabilities to assess absolute likelihood.
- For clear communication, report odds ratios using meaningful values of \(X\) and real-world increments.
Interpreting Coefficients: Categorical Predictors
Dummy Variables
- Categorical variables are dummy-coded: each level is converted into a binary (0/1) indicator (see the sketch after this list).
- A variable with \(k\) levels gets \(k - 1\) estimated coefficients.
- The omitted level is the reference category, and its effect is absorbed into the intercept.
- Coefficients represent odds ratios relative to this reference group.
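A small pandas sketch of dummy coding, with a hypothetical student column:

```python
import pandas as pd

d = pd.DataFrame({"student": ["Yes", "No", "No", "Yes"]})
# One indicator per level minus one; the dropped level ("No") becomes
# the reference category absorbed into the intercept
print(pd.get_dummies(d["student"], drop_first=True, dtype=int))
```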
Example: Student Status
- Predictor variable: student (Yes = 1, No = 0)
- Odds if student: \(e^{\beta_0 + \beta_1 \cdot 1}\)
- Odds if not a student: \(e^{\beta_0 + \beta_1 \cdot 0} = e^{\beta_0}\)
- Odds ratio (OR):
\[ \text{OR} = \frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}} = e^{\beta_1} \]
- Interpretation: The odds of defaulting for a student are \(e^{\beta_1}\) times the odds for a nonstudent.
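A minimal sketch of estimating this odds ratio from data, using simulated student/default values (hypothetical rates; statsmodels assumed):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
student = rng.binomial(1, 0.3, 2000)                           # 1 = student
default = rng.binomial(1, np.where(student == 1, 0.04, 0.03))  # hypothetical rates
d = pd.DataFrame({"student": student, "default": default})

fit = smf.logit("default ~ student", data=d).fit(disp=False)
print(np.exp(fit.params["student"]))  # estimated OR: student vs. nonstudent
```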
Confidence Intervals for Odds Ratios
- 95% confidence interval for \(\beta_1\):
\[ \hat{\beta}_1 \pm z_{\alpha/2} \cdot SE(\hat{\beta}_1) \]
- Exponentiate the interval endpoints to obtain a 95% confidence interval for the odds ratio:
\[ \left( e^{\hat{\beta}_1 - z_{\alpha/2} \cdot SE(\hat{\beta}_1)},\ e^{\hat{\beta}_1 + z_{\alpha/2} \cdot SE(\hat{\beta}_1)} \right) \]
- Report the odds ratio with its confidence interval: interpret the point estimate in words, and let the interval convey the sampling uncertainty without a lengthy verbal interpretation.
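Continuing the student sketch above, exponentiating the endpoints takes one line (`conf_int` returns the interval on the log-odds scale):

```python
import numpy as np

ci = fit.conf_int().loc["student"]  # 95% CI for beta_1 (log-odds scale)
print(np.exp(ci))                   # 95% CI for the odds ratio
```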
General Workflow and Considerations
Assumptions
- Observations are independent.
- The log odds of the outcome are linearly related to continuous predictors.
- No important confounding variables have been omitted.
Assessing Model Fit
- Traditional residual plots are less informative with binary outcomes (variance changes are expected).
- Instead:
- Compare predicted probabilities to observed proportions.
- Do the estimated effects align with what you saw during EDA?
- Use the Hosmer-Lemeshow test (a minimal implementation is sketched after this list):
- \(H_0\): model fits well
- \(H_a\): model does not fit well
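statsmodels does not ship a Hosmer-Lemeshow test, so here is a minimal decile-of-risk sketch (the usual degrees of freedom are the number of groups minus 2):

```python
import numpy as np
import pandas as pd
from scipy import stats

def hosmer_lemeshow(y, p_hat, g=10):
    """Minimal Hosmer-Lemeshow test: group observations by predicted risk."""
    d = pd.DataFrame({"y": y, "p": p_hat})
    d["grp"] = pd.qcut(d["p"], q=g, duplicates="drop")
    agg = d.groupby("grp", observed=True).agg(
        obs=("y", "sum"), exp=("p", "sum"), n=("y", "size")
    )
    pbar = agg["exp"] / agg["n"]  # mean predicted risk per group
    stat = np.sum((agg["obs"] - agg["exp"]) ** 2 / (agg["n"] * pbar * (1 - pbar)))
    return stat, stats.chi2.sf(stat, df=len(agg) - 2)

# e.g., with the earlier fit: hosmer_lemeshow(df["default"], model.predict(df))
```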
If the Fit Is Poor
- Unusual trends in EDA may signal that the linearity of log odds assumption is violated.
- Try adding polynomial terms or interaction effects.
- A rejected Hosmer-Lemeshow test does not always indicate a practically important lack of fit, especially with large samples.
Recommended Workflow
- EDA
- Use LOESS curves for continuous predictors (sketched after this list).
- Use mosaic plots or proportion comparisons for categorical predictors.
- Fit the model
- Assess model fit using diagnostic tests and visual checks.
- Inference
- Use a z-test to assess whether coefficients are significantly different from zero.
- Interpret coefficients using odds ratios.
- Report confidence intervals for the odds ratios.
- Choose a sensible unit change based on the variable’s scale and context.
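For the LOESS step, statsmodels provides a lowess smoother; a sketch reusing the hypothetical `balance`/`default` arrays from the first example:

```python
from statsmodels.nonparametric.smoothers_lowess import lowess

# Smooth the 0/1 response against the predictor: the curve approximates p(X)
smooth = lowess(default, balance, frac=0.3)
# smooth[:, 0] holds sorted x values, smooth[:, 1] the estimated p(X);
# plot with, e.g., plt.plot(smooth[:, 0], smooth[:, 1])
```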
Looking Ahead: Multiple Logistic Regression
- Like multiple linear regression, most applications of logistic regression involve multiple predictors.
- Key skills from this chapter:
- Understanding the logit function/transformation.
- Interpreting coefficients using odds ratios.
- Using EDA to guide model building and interpretation.