This chapter explores fundamental matrix concepts, including operations, applications in summarizing data, and their role in multiple linear regression.
Review fundamental matrix operations and properties.
Understand special cases of matrix multiplication.
Apply matrix concepts to summarize multivariate data (e.g., multivariate normal distribution).
Explore how matrix algebra supports multiple linear regression (MLR).
Review of Matrix Operations
Symmetric Matrices
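A symmetric matrix is a square matrix (same number of rows and columns) that equals its own transpose: \(A = A'\), so the element in row \(i\), column \(j\) equals the element in row \(j\), column \(i\).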
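As a quick check in R, a symmetric matrix should match its own transpose. The matrix `S` below is an arbitrary illustration, not taken from these notes.

```r
# Illustrative 3x3 matrix (values chosen only for this sketch)
S <- matrix(c(4, 1, 2,
              1, 5, 3,
              2, 3, 6), nrow = 3, byrow = TRUE)

is_sym <- all(S == t(S))  # TRUE when S equals its transpose
is_sym
isSymmetric(S)            # base R helper that performs the same check
```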
Vectors
A vector is a matrix with a single column.
Variables in vector format: \(X = (X_1, X_2, X_3, X_4)'\), written as a transposed row to save space.
Transposing Matrices
The transpose of a matrix swaps its rows and columns.
If \(A\) is a matrix, then its transpose is denoted as \(A'\) or \(A^T\).
The first column becomes the first row in the transpose.
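A small sketch of transposition in R, using an illustrative \(2 \times 3\) matrix (the values are an assumption made for the example):

```r
A <- matrix(1:6, nrow = 2, byrow = TRUE)  # 2 x 3 matrix
At <- t(A)                                # transpose: 3 x 2

dim(A)    # dimensions before transposing
dim(At)   # dimensions after transposing
A[, 1]    # first column of A
At[1, ]   # first row of the transpose holds the same values
```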
Matrix Addition and Subtraction
Matrices must have the same dimensions for addition or subtraction.
Operations are performed elementwise.
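The sketch below illustrates elementwise addition and subtraction in R, and what happens when the dimensions differ. All three matrices are made-up examples.

```r
A <- matrix(1:4, 2, 2)                 # 2 x 2, filled by column
B <- matrix(c(10, 20, 30, 40), 2, 2)   # 2 x 2
C <- matrix(1:6, 2, 3)                 # 2 x 3: different dimensions

A + B   # each element of A added to the matching element of B
B - A   # elementwise subtraction

# Adding matrices of different dimensions is an error in R;
# tryCatch() captures it here so the script keeps running.
res <- tryCatch(A + C, error = function(e) "non-conformable")
res
```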
Matrix Multiplication
Not all matrices can be multiplied: matrix multiplication is only defined if the number of columns in the first matrix matches the number of rows in the second matrix.
If \(A\) is an \(m \times n\) matrix and \(B\) is an \(n \times p\) matrix, then the product \(AB\) is an \(m \times p\) matrix.
Each element in \(AB\) is obtained by computing the dot product of a row from \(A\) and a column from \(B\).
What is a Dot Product?
The dot product of two vectors is the sum of the element-wise multiplications of their corresponding entries.
For matrix multiplication, to compute the element at row \(i\), column \(j\) of \(AB\), we take:
1. Row \(i\) from \(A\)
2. Column \(j\) from \(B\)
3. Multiply corresponding elements and sum them: \[
AB_{i,j} = A_{i,1}B_{1,j} + A_{i,2}B_{2,j} + \dots + A_{i,n}B_{n,j}
\]
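For example, let: \[
A =
\begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 6
\end{bmatrix},
\quad
B =
\begin{bmatrix}
7 & 8 \\
9 & 10 \\
11 & 12
\end{bmatrix}
\]
Since \(A\) has 3 columns and \(B\) has 3 rows, multiplication is valid, and the resulting matrix \(AB\) has dimension \(2 \times 2\).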
To compute the element in row 1, column 1 of \(AB\): \[
AB_{1,1} = (1 \times 7) + (2 \times 9) + (3 \times 11) = 7 + 18 + 33 = 58
\]
To compute the element in row 1, column 2: \[
AB_{1,2} = (1 \times 8) + (2 \times 10) + (3 \times 12) = 8 + 20 + 36 = 64
\]
To compute the element in row 2, column 1: \[
AB_{2,1} = (4 \times 7) + (5 \times 9) + (6 \times 11) = 28 + 45 + 66 = 139
\]
To compute the element in row 2, column 2: \[
AB_{2,2} = (4 \times 8) + (5 \times 10) + (6 \times 12) = 32 + 50 + 72 = 154
\]
Thus, the final matrix product is: \[
AB =
\begin{bmatrix}
58 & 64 \\
139 & 154
\end{bmatrix}
\]
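The worked example can be checked in R. The entries of `A` and `B` below are read off the row-by-column arithmetic above (rows of \(A\) are 1, 2, 3 and 4, 5, 6; columns of \(B\) are 7, 9, 11 and 8, 10, 12).

```r
A <- matrix(c(1, 2, 3,
              4, 5, 6), nrow = 2, byrow = TRUE)    # 2 x 3
B <- matrix(c(7,  8,
              9,  10,
              11, 12), nrow = 3, byrow = TRUE)     # 3 x 2

AB <- A %*% B                    # matrix product, 2 x 2
AB

# Element (1, 1) as a dot product of row 1 of A and column 1 of B
ab_11 <- sum(A[1, ] * B[, 1])
ab_11
```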
Identity Matrices
An identity matrix is a square, symmetric matrix with:
All diagonal elements equal to 1
All off-diagonal elements equal to 0
The identity matrix behaves like 1 in scalar multiplication:
If \(A\) is an \(r \times c\) matrix and \(I\) is a \(c \times c\) identity matrix, then \(AI = A\).
If \(I\) is an \(r \times r\) identity matrix, then \(IA = A\).
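A minimal sketch of both identities in R, assuming an illustrative \(2 \times 3\) matrix `A`; `diag(k)` builds a \(k \times k\) identity matrix.

```r
A <- matrix(1:6, nrow = 2)   # r = 2 rows, c = 3 columns

I_c <- diag(3)               # c x c identity
I_r <- diag(2)               # r x r identity

right_ok <- all(A %*% I_c == A)   # AI = A
left_ok  <- all(I_r %*% A == A)   # IA = A
right_ok
left_ok
```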
Matrix Inverses
If a matrix is square and nonsingular (its determinant is nonzero), an inverse matrix exists.
The inverse of \(A\) is denoted as \(A^{-1}\).
The inverse satisfies \(AA^{-1} = A^{-1}A = I\).
Inverse operations are the matrix equivalent of division.
Tips for Reading Matrix Formulas
Always check the dimension of the final result when interpreting matrix expressions.
Special Cases of Matrix Multiplication
Let \(C_{n \times 1}\) be a column vector of chosen numbers, and let \(Y_{n \times 1}\) be a column vector representing a sample of data. The computation \(C' Y\) (the transpose of \(C\) multiplied by \(Y\)) allows us to compute:
Averages
If all elements of \(C\) are equal to \(\frac{1}{n}\), then the matrix multiplication: \[
C' Y =
\begin{bmatrix}
\frac{1}{n} & \frac{1}{n} & \dots & \frac{1}{n}
\end{bmatrix}
\begin{bmatrix}
y_1 \\ y_2 \\ \vdots \\ y_n
\end{bmatrix}
= \frac{1}{n} (y_1 + y_2 + \dots + y_n) = \frac{1}{n} \sum_{i=1}^{n} y_i = \bar{Y}
\] This shows that \(C' Y\) computes the sample mean \(\bar{Y}\) using matrix multiplication.
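In R this might look like the following, where the data vector `Y` is made up for illustration:

```r
Y <- c(3, 7, 8, 12)                    # illustrative sample
n <- length(Y)

C <- matrix(rep(1 / n, n), ncol = 1)   # n x 1 column vector of 1/n
ybar <- t(C) %*% Y                     # C'Y, a 1 x 1 matrix

ybar
mean(Y)                                # built-in mean for comparison
```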
Weighted Averages
If \(C\) contains weights that sum to 1 and are all positive, then the matrix multiplication: \[
C' Y =
\begin{bmatrix}
w_1 & w_2 & \dots & w_n
\end{bmatrix}
\begin{bmatrix}
y_1 \\ y_2 \\ \vdots \\ y_n
\end{bmatrix}
= w_1 y_1 + w_2 y_2 + \dots + w_n y_n = \sum_{i=1}^{n} w_i y_i = \bar{Y}_w
\] This shows that \(C' Y\) computes the weighted mean \(\bar{Y}_w\) using matrix multiplication.
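A short sketch of the weighted case, with made-up data and weights; `weighted.mean()` from base R is used only as a cross-check.

```r
Y <- c(3, 7, 8, 12)             # illustrative sample
w <- c(0.1, 0.2, 0.3, 0.4)      # illustrative positive weights summing to 1

C <- matrix(w, ncol = 1)        # weights as an n x 1 column vector
ybar_w <- t(C) %*% Y            # C'Y, a 1 x 1 matrix

ybar_w
weighted.mean(Y, w)             # built-in weighted mean for comparison
```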
Summation
If all elements of \(C\) are set to 1, then the matrix multiplication: \[
C' Y =
\begin{bmatrix}
1 & 1 & \dots & 1
\end{bmatrix}
\begin{bmatrix}
y_1 \\ y_2 \\ \vdots \\ y_n
\end{bmatrix}
= y_1 + y_2 + \dots + y_n = \sum_{i=1}^{n} y_i
\] This shows that \(C' Y\) computes the sum of all values in \(Y\) using matrix multiplication.
Extending to Matrices
If \(C\) is not a vector but a matrix, then multiple computations can be performed simultaneously using matrix multiplication: \[
C' Y =
\begin{bmatrix}
\frac{1}{n} & \frac{1}{n} & \dots & \frac{1}{n} \\
1 & 1 & \dots & 1
\end{bmatrix}
\begin{bmatrix}
y_1 \\ y_2 \\ \vdots \\ y_n
\end{bmatrix}
=
\begin{bmatrix}
\bar{Y} \\
\sum_{i=1}^{n} y_i
\end{bmatrix}
\] This shows that \(C' Y\) stores the mean in the first row and the sum in the second, efficiently computing both.
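The two-row case can be sketched in R by stacking the rows of \(C'\) with `rbind()`. The name `Ct` (for \(C'\)) and the data are assumptions made for this example.

```r
Y <- c(3, 7, 8, 12)               # illustrative sample
n <- length(Y)

# Ct plays the role of C': row 1 computes the mean, row 2 the sum
Ct <- rbind(rep(1 / n, n),
            rep(1, n))            # 2 x n

out <- Ct %*% Y                   # 2 x 1 result: mean on top, sum below
out
```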
Sums of Squares
If \(Y\) is multiplied by its own transpose, then the matrix multiplication: \[
Y' Y =
\begin{bmatrix}
y_1 & y_2 & \dots & y_n
\end{bmatrix}
\begin{bmatrix}
y_1 \\ y_2 \\ \vdots \\ y_n
\end{bmatrix}
= y_1^2 + y_2^2 + \dots + y_n^2 = \sum_{i=1}^{n} y_i^2
\] This shows that \(Y' Y\) computes the sum of squares of \(Y\) using matrix multiplication, which is useful for variance and regression calculations.
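As a sketch of why this matters for variance calculations, the sample variance can be written as \((Y'Y - n\bar{Y}^2)/(n-1)\). The data below are made up, and `crossprod()` is base R's shortcut for \(Y'Y\).

```r
Y <- c(3, 7, 8, 12)                   # illustrative sample
n <- length(Y)

ss <- t(Y) %*% Y                      # Y'Y, a 1 x 1 matrix
ss
crossprod(Y)                          # same quantity via the built-in shortcut

# Sample variance built from the sum of squares
var_from_ss <- (as.numeric(ss) - n * mean(Y)^2) / (n - 1)
var_from_ss
var(Y)                                # built-in variance for comparison
```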
Summarizing Multiple Variables
This section describes how to summarize numerical data when working with multiple variables, leading up to the multivariate normal distribution (MVN) and how to estimate its parameters.
One-Variable Summaries
Basic summaries for individual variables:
Mean and standard deviation (if normally distributed)
Five-number summary: min, 1st quartile, median, 3rd quartile, max
Multiple Variables
When working with multiple variables:
Compute the mean and standard deviation for each variable
Some variables may be correlated
Matrices provide a structured way to organize this information
Parameters for Two Variables
For two variables \(X_1\) and \(X_2\):
\(X_1\)
Mean: \(\mu_1\)
Standard deviation: \(\sigma_1\)
\(X_2\)
Mean: \(\mu_2\)
Standard deviation: \(\sigma_2\)
In addition to individual means and standard deviations, we also measure covariance (\(\sigma_{12}\)), which describes how the two variables vary together. Covariance is directly related to correlation.
Matrix Representation
Matrices provide a compact way to represent the relationships between multiple variables.
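If \(X_1\) and \(X_2\) follow a multivariate normal distribution (MVN), their parameters are collected in a mean vector and a covariance matrix: \[
\begin{bmatrix}
X_1 \\ X_2
\end{bmatrix}
\sim \text{MVN}\left(
\begin{bmatrix}
\mu_1 \\ \mu_2
\end{bmatrix},
\begin{bmatrix}
\sigma_1^2 & \sigma_{12} \\
\sigma_{12} & \sigma_2^2
\end{bmatrix}
\right)
\]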
The diagonal elements are variances; the off-diagonal elements are covariances.
The covariance matrix must be symmetric.
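From a sample, the corresponding estimates are the vector of column means and the sample covariance matrix. A minimal sketch with made-up data for two variables:

```r
# Illustrative data: one row per observation, one column per variable
X <- cbind(x1 = c(2, 4, 6, 8, 10),
           x2 = c(1, 3, 2, 5, 4))

mu_hat    <- colMeans(X)   # estimated mean vector
Sigma_hat <- cov(X)        # estimated covariance matrix (2 x 2)

mu_hat
Sigma_hat
diag(Sigma_hat)            # diagonal: the two sample variances
Sigma_hat[1, 2]            # off-diagonal: the sample covariance
isSymmetric(Sigma_hat)     # covariance matrices are symmetric
```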
Covariance and Correlation
Covariance (\(\sigma_{12}\)) measures how two variables move together but depends on scale.
Correlation is the standardized version of covariance. \[
\text{COR}(X_1,X_2) = \frac{\text{COV}(X_1, X_2)}{\sigma_1 \sigma_2}
\]
Properties:
Covariance has units; correlation does not.
A covariance of zero means the variables are not linearly related.
Correlation ranges from -1 to 1, making it easier to interpret.
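The relationship between covariance and correlation, and the effect of rescaling, can be sketched in R with made-up data:

```r
x1 <- c(2, 4, 6, 8, 10)   # illustrative variable 1
x2 <- c(1, 3, 2, 5, 4)    # illustrative variable 2

cov12 <- cov(x1, x2)                   # covariance (depends on units)
r12   <- cov12 / (sd(x1) * sd(x2))     # standardized covariance

r12
cor(x1, x2)                            # built-in correlation for comparison

# Rescaling x1 (e.g., metres to centimetres) changes the covariance
# but leaves the correlation unchanged
cov(100 * x1, x2)
cor(100 * x1, x2)
```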
Multivariate Normal Distribution (MVN)
A multivariate normal distribution (MVN) extends the normal distribution to multiple variables, describing their relationships through both their individual distributions and their dependencies. It is defined by:
A mean vector, which gives the expected values of the variables.
A covariance matrix, which describes how the variables vary together.
A joint probability distribution, which specifies the likelihood of different combinations of variable values.
Theoretical Properties
Each individual variable follows a normal distribution.
The relationships between variables are linear.
Data points are denser near the mean vector.
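To see these properties in simulated data, one option is to generate MVN draws in base R by multiplying independent standard normals by a Cholesky factor of the covariance matrix. The values of `mu` and `Sigma` below are assumptions chosen for illustration.

```r
set.seed(1)

mu    <- c(10, 20)                       # illustrative mean vector
Sigma <- matrix(c(4, 3,
                  3, 9), 2, 2)           # illustrative covariance matrix
n     <- 5000

Z <- matrix(rnorm(n * 2), ncol = 2)      # independent standard normals
R <- chol(Sigma)                         # upper triangular, t(R) %*% R = Sigma
X <- sweep(Z %*% R, 2, mu, "+")          # add mu[j] to column j

colMeans(X)   # should be close to mu
cov(X)        # should be close to Sigma
```

A scatterplot of the two columns of `X` should show an elliptical cloud that is densest near `mu`.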
Estimating Parameters from Data
In practice, we estimate MVN parameters from a sample and check whether the MVN assumption is plausible:
Start with visualizations.
QQ plots (e.g., mqqnorm(dataset)) should be linear, with at most slight deviations in the tails.
Check scatterplot matrices for pairwise relationships.
Patterns should be elliptical or circular, and denser in the middle.
Linear relationships are consistent with MVN, while unusual trends indicate deviations.
Perform hypothesis tests where the null hypothesis is MVN:
Royston’s test
Mardia’s test
No single test is definitive
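As a base-R complement to these checks, one common diagnostic is a chi-square QQ plot of squared Mahalanobis distances: under MVN with \(p\) variables, the distances are approximately chi-square with \(p\) degrees of freedom. This is a different technique from Royston's or Mardia's test and is not the `mqqnorm` function mentioned above; the data here are simulated stand-ins.

```r
set.seed(2)

# Stand-in data: replace with your own numeric columns
X <- cbind(rnorm(200), rnorm(200))

# Squared Mahalanobis distance of each row from the sample mean
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))

# Matching chi-square quantiles with df = number of variables
q <- qchisq(ppoints(nrow(X)), df = ncol(X))
d2_sorted <- sort(d2)

head(cbind(q, d2_sorted))

# Uncomment to draw the plot; points near the line y = x are consistent with MVN
# plot(q, d2_sorted); abline(0, 1)
```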
Applications of MVN
The multivariate normal distribution is widely used in statistics and machine learning:
Classification:
Discriminant analysis
Regression:
Multiple linear regression (MLR) assumes the error terms are MVN.
Hypothesis testing in regression uses MVN properties.
Repeated measures and time series analysis rely on MVN structure.
Multiple Linear Regression (MLR) Revisited
How does linear regression use matrix operations and MVN assumptions?
Simple Linear Regression
Model: \(y = \beta_0 + \beta_1x + \epsilon\)
This relationship is assumed to hold for each observation, meaning the model consists of \(n\) equations. For \(n\) observations: \[
\begin{aligned}
y_1 &= \beta_0 + \beta_1x_1 + \epsilon_1 \\
&\vdots \\
y_n &= \beta_0 + \beta_1x_n + \epsilon_n
\end{aligned}
\]
Assumptions:
Errors \(\epsilon_i\) are independent, normally distributed, and have constant variance.
The error terms form a vector \(\epsilon\) that follows a multivariate normal distribution.
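In matrix form, the model is \(Y = X\beta + \epsilon\), where \(X\) is the design matrix (a column of 1s followed by the predictor values), and the estimate is \(\hat{\beta} = (X^T X)^{-1} X^T Y\).
Compute the covariance matrix of the estimates: \(\text{Var}(\hat{\beta}) = \sigma^2 (X^T X)^{-1}\)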
Take the square root of diagonal elements to get standard errors.
Use \(\hat{\beta}\) and the standard errors to compute t-statistics, p-values, and confidence intervals.
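The steps above can be sketched end to end in R. The data are the same `x` and `y` used in the code section of these notes; the error variance is estimated from the residuals with \(n - p\) degrees of freedom, and the 95% confidence level is an assumption of this sketch.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(6.5, 10.8, 14, 21.2, 26.8)

X <- cbind(1, x)                     # design matrix: intercept column, then x
n <- nrow(X)
p <- ncol(X)

XtX_inv  <- solve(t(X) %*% X)
beta_hat <- XtX_inv %*% t(X) %*% y   # estimated coefficients

res        <- y - X %*% beta_hat             # residuals
sigma2_hat <- sum(res^2) / (n - p)           # estimated error variance
se         <- sqrt(diag(sigma2_hat * XtX_inv))  # standard errors

b      <- as.numeric(beta_hat)
t_stat <- b / se
p_val  <- 2 * pt(abs(t_stat), df = n - p, lower.tail = FALSE)
ci     <- cbind(lower = b - qt(0.975, n - p) * se,
                upper = b + qt(0.975, n - p) * se)

cbind(estimate = b, se = se, t = t_stat, p = p_val)
ci
```

The results can be compared with `coef(summary(lm(y ~ x)))` and `confint(lm(y ~ x))`.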
The Matrix Advantage
The matrix formula \(\hat{\beta} = (X^T X)^{-1} X^T Y\) has the same form regardless of the number of predictors or the sample size, provided \(X^T X\) is invertible.
Matrix form provides a general framework for linear regression.
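To illustrate, the sketch below applies the same formula with two predictors; only the design matrix gains a column. The data are made up for this example.

```r
# Illustrative data with two predictors
x1 <- c(1, 2, 3, 4, 5, 6)
x2 <- c(2, 1, 4, 3, 6, 5)
y  <- c(5.1, 6.9, 11.2, 12.8, 17.1, 18.9)

X <- cbind(1, x1, x2)                              # intercept, x1, x2

beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y       # same formula as before
beta_hat

fit <- lm(y ~ x1 + x2)                             # built-in fit for comparison
coef(fit)
```

In practice, `solve(crossprod(X), crossprod(X, y))` computes the same estimate without forming the inverse explicitly.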
Code Examples (in R)
Matrix Operations
```r
M <- matrix(c(0, 1, 2, 3), 2, 2)  # 2x2 (r x c) matrix, filled by column
N <- matrix(c(4, 3, 2, 1), 2, 2)

t(M)              # Transpose
M * N             # Elementwise multiplication
M %*% N           # Matrix multiplication

Minv <- solve(M)  # Matrix inverse
M %*% Minv        # Should return identity matrix c(1,0,0,1): confirms inverse
```
Multiple Linear Regression
```r
# Predict y from x using matrix algebra
x <- c(1, 2, 3, 4, 5)
y <- c(6.5, 10.8, 14, 21.2, 26.8)  # Response vector

# Design matrix: column of 1s for intercept, then x
bigX <- cbind(rep(1, 5), x)

# Estimate beta using matrix multiplication
beta_hat <- solve(t(bigX) %*% bigX) %*% t(bigX) %*% y
beta_hat

# Compare to lm()
fit <- lm(y ~ x)
coef(summary(fit))[, 1]  # Estimated coefficients
summary(fit)

# Estimate standard errors manually
sigma2 <- 1.261^2  # From residual standard error in model summary
cov_beta <- sigma2 * solve(t(bigX) %*% bigX)
sqrt(diag(cov_beta))  # Standard errors

# Compare to lm() standard errors
coef(summary(fit))[, 2]
```