Logistic Regression

Objectives

  • Understand how logistic regression models a binary response.
  • Understand the structure and purpose of the simple logistic regression model.
  • Learn how to interpret regression coefficients for both numeric and categorical predictors.
  • Revisit the connection between logistic regression and 2×2 contingency tables.

Simple Logistic Regression Model

Classification

  • The main distinction from linear regression: the response variable is categorical
    • Binary classification examples:
      • Cancer or not
      • Clicked advertisement or not
      • Defaulted on payment or not

Predictive Models

  • Continuous response: \(Y = f(x)\)
  • Categorical response: \(P(Y = \text{Default} \mid X) = f(X)\)
  • With categorical responses, we predict the probability of the response, not the value itself.
  • We often write \(f(X)\) as \(p(X)\) to emphasize that it’s a probability given predictors \(X\).
  • This is still a function; it just describes how the probability, rather than the response itself, changes with the predictors.

The Linearity Problem

  • A linear equation for \(p(X)\) doesn’t make sense:
    • Linear functions range over \((-\infty, \infty)\).
    • But probabilities must stay between 0 and 1.

Visualizing \(p(X)\)

  • One way to understand how \(p(X)\) behaves:
    • Bin a continuous predictor into categories.
    • For each bin, compute the proportion of the response (a minimal sketch follows this list).
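
To make this concrete, here is a minimal sketch on simulated data; the variable names (`balance`, `default`) and the coefficients used to generate the data are illustrative assumptions, not values from a real fit:

```python
import numpy as np
import pandas as pd

# Simulated illustration: a continuous predictor `balance` and a 0/1
# response `default` whose probability follows an assumed logistic curve.
rng = np.random.default_rng(0)
balance = rng.uniform(0, 2500, size=5000)
p_true = 1 / (1 + np.exp(-(-10.65 + 0.0055 * balance)))  # assumed curve
df = pd.DataFrame({"balance": balance,
                   "default": rng.binomial(1, p_true)})

# Bin the predictor, then compute the observed proportion of the
# response within each bin; plotting these proportions traces out p(X).
df["bin"] = pd.cut(df["balance"], bins=10)
print(df.groupby("bin", observed=True)["default"].mean())
```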

The Path to Linearity

  • Scales and transformations: we apply transformations so that the model is linear on a new scale.
    • \(p(X)\): range \((0,1)\)
    • Odds \(= \frac{p(X)}{1 - p(X)}\): range \((0, \infty)\)
    • Log odds (logit) \(= \ln\left(\frac{p(X)}{1 - p(X)}\right)\): range \((-\infty, \infty)\)
  • The logit function maps probabilities to the full real number line.
  • This transformation enables linear modeling because the log odds often vary approximately linearly with the predictor (see the numeric sketch below).
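
A small numeric sketch of the chain of transformations; the probabilities are arbitrary example values:

```python
import numpy as np

# Each step widens the range: (0, 1) -> (0, inf) -> (-inf, inf).
p = np.array([0.01, 0.25, 0.50, 0.75, 0.99])
odds = p / (1 - p)
log_odds = np.log(odds)
print(np.round(odds, 3))      # [ 0.01   0.333  1.     3.    99.   ]
print(np.round(log_odds, 3))  # [-4.595 -1.099  0.     1.099  4.595]
```

Note the symmetry: probabilities equally far from 0.5 map to log odds of equal magnitude and opposite sign.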

Model Statement

  • Logistic regression models the log odds as a linear function of \(X\):

    \[ \ln\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X \]

Parameter Estimation

Multiple Linear Regression (MLR)

  • Coefficient estimates minimize the residual sum of squares.
  • Closed-form solution exists via matrix algebra.

Logistic Regression

  • Coefficient estimates minimize the log loss (cross-entropy) error function.
  • No closed-form solution exists; the estimates cannot be obtained by matrix algebra alone.
  • The log loss is therefore minimized numerically (a minimal sketch follows).
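
As a minimal sketch of what the software does internally, the log loss can be handed to a general-purpose optimizer; the data and "true" coefficients below are simulated assumptions:

```python
import numpy as np
from scipy.optimize import minimize

# Simulate data from an assumed logistic relationship.
rng = np.random.default_rng(1)
x = rng.uniform(0, 3, size=2000)
y = rng.binomial(1, 1 / (1 + np.exp(-(-2.0 + 1.5 * x))))

def log_loss(beta):
    """Average log loss (negative log-likelihood per observation)."""
    b0, b1 = beta
    p = 1 / (1 + np.exp(-(b0 + b1 * x)))
    eps = 1e-12  # guard against log(0)
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

# No closed form exists, so the minimization is done numerically;
# statistical software performs an equivalent optimization internally.
fit = minimize(log_loss, x0=np.zeros(2), method="BFGS")
print(fit.x)  # estimates should land near the assumed (-2.0, 1.5)
```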

Hypothesis Testing

  • To understand the relationship between a predictor and the response, test the slope coefficient:
    • \(H_0\): \(\beta_1 = 0\)
    • \(H_a\): \(\beta_1 \ne 0\)
  • Test statistic (z-statistic): \(Z = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)}\)
  • 95% confidence interval: \(\hat{\beta}_1 \pm z_{\alpha/2} \cdot SE(\hat{\beta}_1)\)
  • Under \(H_0\), the test statistic is approximately standard normal (a large-sample result): \(Z \sim N(0, 1)\)
  • Some software reports a chi-squared test instead of a z-test.
    • It squares the z-statistic: \(Z^2\)
    • Then uses a \(\chi^2\) distribution with 1 degree of freedom to compute the p-value.
    • This gives the same p-value as the two-sided z-test (verified in the sketch below).
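
A quick check of the z/chi-squared equivalence, using illustrative numbers for the estimate and standard error:

```python
from scipy import stats

# Illustrative values, not from a real fit.
beta_hat, se = 0.0055, 0.0002
z = beta_hat / se

# Two-sided z-test p-value and the chi-squared(1) p-value for z^2 agree,
# because P(|Z| > z) = P(Z^2 > z^2) when Z is standard normal.
print(2 * stats.norm.sf(abs(z)))
print(stats.chi2.sf(z**2, df=1))
```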

Predicted Probabilities

  • Logistic regression estimates the log odds, but we often want \(p(X)\). Solving for \(p(X)\):

    \[ \ln\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X \]

    \[ \frac{p(X)}{1 - p(X)} = e^{\beta_0 + \beta_1 X} \]

    \[ p(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}} \]
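
A minimal sketch of the inverse transformation; the coefficients are assumed values for illustration:

```python
import numpy as np

def predict_prob(x, b0, b1):
    """Invert the log-odds line to get p(X)."""
    return np.exp(b0 + b1 * x) / (1 + np.exp(b0 + b1 * x))

# Illustrative coefficients (assumed, not from a real fit):
print(predict_prob(1000, b0=-10.65, b1=0.0055))  # ~0.006
print(predict_prob(2000, b0=-10.65, b1=0.0055))  # ~0.59
```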

Interpreting Coefficients: Continuous Predictors

Odds and Odds Ratios

  • Logistic regression coefficients are directly interpretable.
  • They are read as odds ratios.
  • The interpretation depends on whether the predictor is numeric or categorical.
  • For a continuous predictor \(X\), the odds of the event occurring are:
    \[ \text{Odds}(X) = \frac{p(X)}{1 - p(X)} = e^{\beta_0 + \beta_1X} \]

Example:

  • \(\text{Odds}(X = 500) = e^{\beta_0 + \beta_1 \cdot 500}\)
  • \(\text{Odds}(X = 501) = e^{\beta_0 + \beta_1 \cdot 501}\)
  • The odds ratio (OR) for a 1-unit increase in \(X\) is:
    \[ \frac{e^{\beta_0 + \beta_1 \cdot 501}}{e^{\beta_0 + \beta_1 \cdot 500}} = e^{\beta_1} \]
  • Interpretation: For every 1-unit increase in \(X\), the odds of the event occurring change by a factor of \(e^{\beta_1}\).

Generalized Interpretation

  • This interpretation does not depend on the specific value of \(X\).
  • For a \(\Delta\)-unit increase in \(X\), the odds change by a factor of \(e^{\Delta \cdot \beta_1}\).
  • Example (reproduced in the sketch below):
    • A $500 increase in balance increases the odds of default by a factor of 15.64.
    • That is, the odds of defaulting with a $1,000 balance are 15.64 times higher than with a $500 balance.
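
A one-line check of the arithmetic behind the example, assuming a slope of \(\beta_1 = 0.0055\) per dollar (the value implied by the 15.64 figure):

```python
import numpy as np

# Odds ratio for a $500 increase: e^(Delta * beta_1) with Delta = 500.
print(np.exp(500 * 0.0055))  # ~15.64
```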

Tips

  • Odds ratios describe relative, not absolute, changes in risk.
  • Use predicted probabilities to assess absolute likelihood.
  • For clear communication, report odds ratios using meaningful values of \(X\) and real-world increments.

Interpreting Coefficients: Categorical Predictors

Dummy Variables

  • Categorical variables are dummy coded (converted into binary indicator variables).
  • A coefficient is estimated for each level of the categorical variable except one.
  • The omitted level is the reference category, and its effect is absorbed into the intercept.
  • Coefficients represent odds ratios relative to this reference group.

Example: Student Status

  • Predictor variable: student
    • Yes (1), No (0)
  • Odds if student: \(e^{\beta_0 + \beta_1 \cdot 1}\)
  • Odds if not a student: \(e^{\beta_0 + \beta_1 \cdot 0} = e^{\beta_0}\)
  • Odds ratio (OR):
    \[ \text{OR} = \frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}} = e^{\beta_1} \]
  • Interpretation: The odds of defaulting for a student are \(e^{\beta_1}\) times the odds for a nonstudent (see the sketch below).
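
A minimal sketch of the dummy-variable case on simulated data, using `statsmodels`; the group risks and sample size are assumptions for illustration:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate a binary `student` indicator and a binary `default` response
# with assumed per-group risks.
rng = np.random.default_rng(2)
student = rng.binomial(1, 0.3, size=4000)
default = rng.binomial(1, np.where(student == 1, 0.04, 0.03))
df = pd.DataFrame({"default": default, "student": student})

# The slope on the dummy variable is the log odds ratio for students vs.
# nonstudents; exponentiating recovers the odds ratio itself.
fit = smf.logit("default ~ student", data=df).fit(disp=0)
print(np.exp(fit.params["student"]))
```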

Confidence Intervals for Odds Ratios

  • 95% confidence interval for \(\beta_1\):
    \[ \hat{\beta}_1 \pm z_{\alpha/2} \cdot SE(\hat{\beta}_1) \]
  • Exponentiate the interval endpoints to get a 95% confidence interval for the odds ratio:
    \[ \left( e^{\hat{\beta}_1 - z_{\alpha/2} \cdot SE(\hat{\beta}_1)},\ e^{\hat{\beta}_1 + z_{\alpha/2} \cdot SE(\hat{\beta}_1)} \right) \]
  • Report the odds ratio together with its confidence interval; interpret the point estimate in words, and let the interval convey the sampling uncertainty without a lengthy worded interpretation of its own (sketched below).
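
A short sketch of the endpoint-exponentiation step, with illustrative numbers for the estimate and standard error:

```python
import numpy as np
from scipy import stats

# Illustrative values, not from a real fit.
beta_hat, se = 0.40, 0.11
z = stats.norm.ppf(0.975)  # z_{alpha/2} for a 95% interval

# Exponentiate the CI endpoints for beta_1 to get a CI for e^{beta_1}.
ci_or = np.exp([beta_hat - z * se, beta_hat + z * se])
print(np.exp(beta_hat), ci_or)  # point-estimate odds ratio and interval
```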

General Workflow and Considerations

Assumptions

  • Observations are independent.
  • The log odds of the outcome are linearly related to continuous predictors.
  • No important confounding variables have been omitted.

Assessing Model Fit

  • Traditional residual plots are less informative with binary outcomes (the variance of a binary outcome depends on its mean, so nonconstant variance is expected).
  • Instead:
    • Compare predicted probabilities to observed proportions.
      • Do the estimated effects align with what you saw during EDA?
    • Use the Hosmer-Lemeshow test (a minimal implementation follows this list):
      • \(H_0\): model fits well
      • \(H_a\): model does not fit well
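
The Hosmer-Lemeshow test is not built into every library, so here is a minimal hand-rolled sketch of the usual "deciles of risk" version; the function name and grouping details are our own choices:

```python
import numpy as np
import pandas as pd
from scipy import stats

def hosmer_lemeshow(y, p_hat, g=10):
    """Bin observations by quantiles of the predicted probability, compare
    observed and expected event counts per bin, and refer the statistic to
    a chi-squared distribution with (groups - 2) degrees of freedom."""
    d = pd.DataFrame({"y": np.asarray(y), "p": np.asarray(p_hat)})
    d["group"] = pd.qcut(d["p"], q=g, duplicates="drop")
    grp = d.groupby("group", observed=True)
    obs, exp, n = grp["y"].sum(), grp["p"].sum(), grp.size()
    stat = (((obs - exp) ** 2) / (exp * (1 - exp / n))).sum()
    return stat, stats.chi2.sf(stat, df=len(n) - 2)

# Usage: stat, p = hosmer_lemeshow(y, fitted_model.predict(X)),
# where `fitted_model` is whatever logistic fit you produced upstream.
```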

If the Fit Is Poor

  • Unusual trends in EDA may signal that the linearity of log odds assumption is violated.
    • Try adding polynomial terms or interaction effects.
  • A rejected Hosmer-Lemeshow test does not always indicate a practically important lack of fit; with large samples, the test can flag even minor deviations.

Looking Ahead: Multiple Logistic Regression

  • Like multiple linear regression, most applications of logistic regression involve multiple predictors.
  • Key skills from this chapter:
    • Understanding the logit function/transformation.
    • Interpreting coefficients using odds ratios.
    • Using EDA to guide model building and interpretation.