flowchart TD A["Linear models<br>(constant variance,<br>normality assumptions)"] --> B["Errors are<br>independent"] B --> D["*t*-tests"] B --> E["ANOVA"] B --> F["SLR"] A --> C["Errors are<br>correlated"] C --> G["Time series"] C --> H["Repeated measures"] %% Arrow styling linkStyle default stroke:#333,stroke-width:1.5px; %% Node styling style A fill:#d1c4e9,stroke:#512da8,stroke-width:1.25px,color:#000 style B fill:#bbdefb,stroke:#1976d2,stroke-width:1.25px,color:#000 style C fill:#ffcdd2,stroke:#c62828,stroke-width:1.25px,color:#000 style D fill:#e3f2fd,stroke:#64b5f6,stroke-width:1.25px,color:#000 style E fill:#e3f2fd,stroke:#64b5f6,stroke-width:1.25px,color:#000 style F fill:#e3f2fd,stroke:#64b5f6,stroke-width:1.25px,color:#000 style G fill:#ffebee,stroke:#ef5350,stroke-width:1.25px,color:#000 style H fill:#ffebee,stroke:#ef5350,stroke-width:1.25px,color:#000
Repeated Measures
Objectives
This chapter explores repeated measures, which allow us to relax the independence assumption.
- Understand when repeated measures are appropriate.
- Learn about correlation structures, how to visualize them, and select the appropriate structure.
- Explore generalized least squares (GLS) and weighted least squares (WLS).
- Follow a repeated measures workflow to structure analysis.
What is Repeated Measures?
Family Tree of Linear Models
Repeated Measures
- Linear models assume constant variance and normality with independent errors.
- Examples: t-tests, ANOVA, simple linear regression (SLR)
- When errors are correlated:
- Examples: time series, repeated measures
- Ignoring correlation can produce misleading p-values and confidence intervals.
- We need to model the correlation among errors (i.e., include additional parameters in the model) for valid inference and better predictions.
- Examples: time series, repeated measures
- Repeated measures refers to multiple measurements on the same subject (i.e., dependent observations).
- For example, if a subject starts with high (or conversely low) values, their measurements tend to remain high (or low) over time.
Common Hypotheses in Repeated Measures:
- Compare every time point to a baseline (i.e., control), for each group (e.g., asymptomatic vs. symptomatic).
- Compare each time point between groups. (Baseline may not differ, but other time points might.)
Regression Perspective
Treat time as a categorical variable because measurements are made at discrete time points:
\[ Y = \text{Time} + \text{Status} + \text{Time} \times \text{Status} + \varepsilon \]
- Include an interaction since trends depend on status.
- Regression and contrasts help compare groups—we just need to model correlated errors.
Identifying Repeated Measures
- Ask how the data were collected.
- Repeated measures can apply to:
- Repeated measures one-way ANOVA
- Repeated measures MLR
Correlation Structures
Assessing Error Dependence
- Some methods include residual diagnostic tools to visualize correlated errors.
- Similar tools exist for repeated measures, but they aren’t as commonly used.
- The key idea is to explain how residuals are correlated using a correlation structure.
Correlogram (pseudo-code approach)
- Obtain residuals from a regular MLR (some from the same subject).
- For residuals that are one unit apart in time:
- Create a scatterplot (earlier time point on the \(x\)-axis, later time point on the \(y\)-axis).
- Compute the correlation, and store the result.
- Repeat for residuals that are two, three, four, \(\dots, k\) units apart.
- Plot the correlation values against the time lag.
Takeaway
Correlation tends to decrease as the distance between residuals increases and eventually levels off. Large correlations at short time lags are a clear indication of correlated errors.
Correlogram Interpretation
- Observations closer in time are more similar (i.e., more correlated).
- Observations farther apart in time may still be mildly correlated.
Common Correlation Structures
Correlation structures are theoretical models that describe the expected trend in the correlogram.
Compound symmetry (CS)
- Correlation is constant (i.e., flat, horizontal line) regardless of time.
- Often used when time ordering is not meaningful (e.g., students within the same school).
Autoregressive (AR(1))
- High correlation for nearby time points.
- Correlation decreases as time lag increases.
- Eventually approaches zero (like independent errors).
Gaussian
- Similar to AR(1), but the drop-off in correlation is even faster.
- Often used when time is continuous.
Generalized Least Squares (GLS)
- GLS generalizes OLS technique for MLR to handle correlated errors.
- You specify:
- A model with response and predictors
- A correlation structure (parameters estimated by software)
- GLS updates both:
- Regression coefficients
- Standard errors
- These estimates are more reliable than standard MLR if the chosen correlation structure is approximately correct.
Estimating Correlation in Practice
- Variograms and semivariograms (often used in spatial models) can be used as visual alternatives to correlograms.
- In practice, repeated measures datasets may be too small to visualize correlation clearly through the high variability.
- Analysts typically:
- Use theoretical justification for choosing a structure.
- Fit multiple structures and compare using AIC (like feature selection).
- Use correlograms as a visual guide.
- Instructor note (Turner’s experience):
- Compound symmetry works well for biological or human-based data.
- Use structures with rapid decay (e.g., Gaussian) sparingly unless time points are equally spaced and numerous.
- Compound symmetry works well for biological or human-based data.
Repeated Measures Workflow
- Identify that you are in a repeated measures setting.
- Perform EDA based on the equivalent MLR framework (determine the general theme: ANOVA, SLR, or MLR):
- Use boxplots or mean plots in ANOVA-style settings.
- Use scatterplots of predictor vs. response in SLR-style settings.
- Combine approaches for general MLR settings.
- Apply feature selection tools from MLR to assess model complexity and the bias-variance trade-off.
- Residual diagnostics:
- Check for normality and constant variance using residual plots.
- Update MLR to GLS:
- Try multiple correlation structures.
- Use AIC to select the best-fitting model.
- Optionally use a correlogram to guide your choice.
- Conduct inference on regression coefficients:
- Report p-values and confidence intervals.
- Results will be more trustworthy when correlation is accounted for.
Additional Notes on Correlation Structures
- Some correlation structures require a time component:
- Autoregressive (AR; exponential decay)
- Gaussian
- Linear
- Spherical
- Matérn (A general class that includes AR and Gaussian as special cases; more flexible but may overfit.)
- Some correlation structures do not require a time component:
- Compound symmetry (CS)
- Variance components
- Examples that don’t involve time:
- Texas STAR exam: multiple test scores from the same schools/school districts
- Hereditary studies: family studies where siblings or parents/offspring are clustered
Technical Details
Matrix Representation of MLR
\[ Y = X\beta + \varepsilon \]
where:
- \(Y\) is the response vector.
- \(X\) is the design matrix.
- \(\beta\) is the regression coefficient vector.
- \(\varepsilon\) is the error vector.
The model assumes properties of the errors:
\[ \varepsilon \sim \mathcal{N}(0, \sigma^2 I) \]
that is:
- \(\varepsilon\) is multivariate normal
- where \(\mathcal{N}\) denotes the multivariate normal distribution (MVN)
- Mean vector = \(0\), a \(n x 1\) matrix of \(0\)s
- Covariance matrix = \(\sigma^2 I\)
The covariance matrix:
\[ \Sigma = \begin{bmatrix} \sigma^2 & 0 & 0 \\ 0 & \sigma^2 & 0 \\ 0 & 0 & \sigma^2 \\ \end{bmatrix} = \sigma^2I \]
Interpretation:
- Each individual error has the same variance \(\sigma^2\)
- Traditional multiple linear regression (MLR) assumptions:
- Errors are independent, which implies they are also uncorrelated, so all off-diagonal covariances are 0.
Visualizing the Correlation Structure in \(\sigma^2 R\)
We can visualize the error term in repeated measures models as:
\[ Y = X\beta + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 R) \]
Each row in \(\varepsilon\) corresponds to a specific subject and time point:
| Subject | Time | Error Term |
|---|---|---|
| 1 | T1 | \(\varepsilon_1\) |
| 1 | T2 | \(\varepsilon_2\) |
| 1 | T3 | \(\varepsilon_3\) |
| 2 | T1 | \(\varepsilon_4\) |
| 2 | T2 | \(\varepsilon_5\) |
| 2 | T3 | \(\varepsilon_6\) |
The corresponding variance–covariance matrix \(\sigma^2 R\) has a block-diagonal structure and might look like this:
\[ \sigma^2 R = \sigma^2 \begin{bmatrix} 1 & \rho_{12} & \rho_{13} & 0 & 0 & 0 \\ \rho_{12} & 1 & \rho_{23} & 0 & 0 & 0 \\ \rho_{13} & \rho_{23} & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & \rho_{45} & \rho_{46} \\ 0 & 0 & 0 & \rho_{45} & 1 & \rho_{56} \\ 0 & 0 & 0 & \rho_{46} & \rho_{56} & 1 \\ \end{bmatrix} \]
Key points:
- The upper-left 3×3 block corresponds to Subject 1 and correlations among \(\varepsilon_1\), \(\varepsilon_2\), \(\varepsilon_3\).
- The lower-right 3×3 block corresponds to Subject 2 and correlations among \(\varepsilon_4\), \(\varepsilon_5\), \(\varepsilon_6\)
- Off-diagonal correlations exist within subjects (e.g., \(\rho_{12}, \rho_{13}, \rho_{23}\) for subject 1).
- Between-subject correlations are zero, indicated by the block-diagonal structure.
- The values of the \(\rho\) terms are determined by the assumed correlation structure (e.g., compound symmetry, AR(1), etc.).
This block structure is how repeated measures models encode dependency within subjects but maintain independence between subjects.
Correlation Structure Matrices
Different correlation structures determine the form of the matrix \(R\) in the repeated measures model.
Autoregressive (AR(1))
This structure assumes correlation decreases with increasing time lag:
\[ \sigma^2 \begin{bmatrix} 1 & \rho & \rho^2 & \rho^3 \\ \rho & 1 & \rho & \rho^2 \\ \rho^2 & \rho & 1 & \rho \\ \rho^3 & \rho^2 & \rho & 1 \\ \end{bmatrix} \]
- High correlation for adjacent time points
- Correlation decays exponentially with time separation
Compound Symmetry (CS)
This structure assumes constant correlation across all time points:
\[ \sigma^2 \begin{bmatrix} 1 & \rho & \rho & \rho \\ \rho & 1 & \rho & \rho \\ \rho & \rho & 1 & \rho \\ \rho & \rho & \rho & 1 \\ \end{bmatrix} \]
- Often used when time ordering is less meaningful or measurements are equally spaced
- Simpler structure, fewer parameters
Gaussian
- Similar to AR(1), but correlation drops off more quickly
- Often used when time is continuous or spacing between time points varies
Spherical Power
- Same mathematical form as AR(1), just different context or terminology
- Useful when modeling spatial or continuous time-based correlation
Model Fitting
Fitting repeated measures models typically uses restricted maximum likelihood (REML), a technique that improves estimation of variance components.
During model fitting, the software estimates:
- \(\sigma^2 R\): the variance–covariance matrix of the errors
- \(\beta\): the regression coefficient vector
The estimate of \(\beta\) under GLS is:
\[ \hat{\beta} = \left(X^T R^{-1} X\right)^{-1} X^T R^{-1} Y \]
- This formula generalizes the OLS estimator
- When \(R = I\), it reduces to the standard OLS solution
Variance of the GLS Estimator
In addition to estimating the coefficients \(\hat{\beta}\), we can also estimate their variances using the expression:
\[ \widehat{\text{Var}}(\hat{\beta}) = \hat{\sigma}^2 \left( X^T R^{-1} X \right)^{-1} \]
This is the generalized version of the variance formula used in OLS.
- When \(R = I\), this reduces to the traditional multiple linear regression (MLR) variance estimator:
\[ \widehat{\text{Var}}(\hat{\beta}) = \hat{\sigma}^2 \left( X^T X \right)^{-1} \]
Thus, the MLR estimator is a special case of GLS where the correlation structure is identity (i.e., no correlation among errors).
Summary: MLR as a Special Case of GLS
The general linear model framework used in GLS relaxes the independence assumption of traditional MLR and allows for arbitrary correlation structures among errors (via the matrix \(R\)).
- MLR is a special case of GLS when the correlation structure is \(R = I\).
- The structure of \(R\) determines how residuals are related (e.g., repeated measurements, clustered observations).
- Under GLS, we estimate:
- Regression coefficients \(\hat{\beta}\)
- Standard errors of \(\hat{\beta}\)
- The correlation structure (via estimated parameters in \(R\))
Key idea: Accounting for correlation leads to valid statistical inference and improved prediction.
Weighted Least Squares (WLS)
GLS can also be applied when independence holds, but the assumption of constant variance is violated (i.e., heteroscedasticity).
In this case, the error covariance matrix no longer looks like \(\sigma^2 I\). Instead, each observation has its own variance:
\[ \Sigma = \begin{bmatrix} \sigma_1^2 & 0 & 0 \\ 0 & \sigma_2^2 & 0 \\ 0 & 0 & \sigma_3^2 \\ \end{bmatrix} \]
- Each variance term \(\sigma_i^2\) is unique.
- Independence still holds (covariances are 0), but the variances are not constant across observations.
We can re-express each unique variance in terms of a common variance component and a weight:
\[ \sigma_1^2 = \frac{1}{w_1} \sigma^2,\quad \sigma_2^2 = \frac{1}{w_2} \sigma^2,\quad \sigma_3^2 = \frac{1}{w_3} \sigma^2 \]
Each \(w_i\) is a weight term that serves as a multiplicative factor determining how different the variance of each observation is from the baseline \(\sigma^2\).
Weighted Error Structure in Matrix Form
Using the weights, the variance–covariance matrix of the errors can be written as:
\[ \Sigma = \begin{bmatrix} \sigma_1^2 & 0 & 0 \\ 0 & \sigma_2^2 & 0 \\ 0 & 0 & \sigma_3^2 \\ \end{bmatrix} = \begin{bmatrix} \frac{\sigma^2}{w_1} & 0 & 0 \\ 0 & \frac{\sigma^2}{w_2} & 0 \\ 0 & 0 & \frac{\sigma^2}{w_3} \\ \end{bmatrix} = \sigma^2 \begin{bmatrix} \frac{1}{w_1} & 0 & 0 \\ 0 & \frac{1}{w_2} & 0 \\ 0 & 0 & \frac{1}{w_3} \\ \end{bmatrix} = \sigma^2 R \]
- This is a special case of GLS where the correlation matrix \(R\) is diagonal.
- WLS is GLS with unequal variances but uncorrelated errors.
We can use GLS techniques to obtain good estimates of the regression coefficients and their standard errors under this structure.
Estimating Weights in Practice
You can only apply WLS if you know (or can estimate) the weights. There are numerous strategies for estimating the weights, but the basic approach follows a common pattern:
- Fit a regular regression model.
- Plot the absolute values of your residuals against:
- One of your predictors that you suspect may explain the changing variance, or
- The predicted values from your model
Look for a pattern or trend in the spread of the residuals.
- One of your predictors that you suspect may explain the changing variance, or
- Fit a regression model to the absolute value of the residuals to estimate the mean behavior (i.e., trend) of the variability.
- Use the predicted values from that model to compute weights:
\[ w_i = \frac{1}{(\widehat{\text{predicted}})^2} \]
Code
weight.model = lm(abs(model1$residuals) ~ data1$x)
wts = 1 / fitted(weight.model)^2
wls.model = lm(y ~ x, data1, weights = wts)Interpretation
- Observations with higher variance are down-weighted.
- The fitted line will more closely follow observations with lower variance.
- The biggest difference is in the standard error estimates of the regression coefficients.
When to Use WLS
- When transformations do not fix the constant variance problem.
- When an exotic transformation works but makes the model hard to interpret.
Additional Notes
- Raw residual vs. fitted plots may look the same for OLS and WLS.
- Use studentized residuals to check whether the variance has been stabilized.