Matrices

Objectives

This chapter explores fundamental matrix concepts, including operations, applications in summarizing data, and their role in multiple linear regression.

Review fundamental matrix operations and properties.
Understand special cases of matrix multiplication.
Apply matrix concepts to summarize multivariate data (e.g., multivariate normal distribution).
Explore how matrix algebra supports multiple linear regression (MLR).

Review of Matrix Operations

Symmetric Matrices

A symmetric matrix has the same number of rows and columns (i.e, a square matrix).

Vectors

A vector is a matrix with a single column.
Variables in vector format: \(X = (X_1, X_2, X_3, X_4)\)

Transposing Matrices

The transpose of a matrix swaps its rows and columns.
If \(A\) is a matrix, then its transpose is denoted as \(A'\) or \(A^T\).
The first column becomes the first row in the transpose.

Matrix Addition and Subtraction

Matrices must have the same dimensions for addition or subtraction.
Operations are performed elementwise.

Matrix Multiplication

Not all matrices can be multiplied—matrix multiplication is only defined if the number of columns in the first matrix matches the number of rows in the second matrix.
If \(A\) is an \(m \times n\) matrix and \(B\) is an \(n \times p\) matrix, then the product \(AB\) is an \(m \times p\) matrix.
Each element in \(AB\) is obtained by computing the dot product of a row from \(A\) and a column from \(B\).

What is a Dot Product?

The dot product of two vectors is the sum of the element-wise multiplications of their corresponding entries.

For matrix multiplication, to compute the element at row \(i\), column \(j\) of \(AB\), we take:
1. Row \(i\) from \(A\)
2. Column \(j\) from \(B\)
3. Multiply corresponding elements and sum them:
\[ AB_{i,j} = A_{i,1}B_{1,j} + A_{i,2}B_{2,j} + \dots + A_{i,n}B_{n,j} \]

Step-by-Step Example

Let:
\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \] (\(2 \times 3\) matrix) and
\[ B = \begin{bmatrix} 7 & 8 \\ 9 & 10 \\ 11 & 12 \end{bmatrix} \] (\(3 \times 2\) matrix)

Since \(A\) has 3 columns and \(B\) has 3 rows, multiplication is valid, and the resulting matrix \(AB\) has dimension \(2 \times 2\).

To compute the element in row 1, column 1 of \(AB\):
\[ AB_{1,1} = (1 \times 7) + (2 \times 9) + (3 \times 11) = 7 + 18 + 33 = 58 \]

To compute the element in row 1, column 2:
\[ AB_{1,2} = (1 \times 8) + (2 \times 10) + (3 \times 12) = 8 + 20 + 36 = 64 \]

To compute the element in row 2, column 1:
\[ AB_{2,1} = (4 \times 7) + (5 \times 9) + (6 \times 11) = 28 + 45 + 66 = 139 \]

To compute the element in row 2, column 2:
\[ AB_{2,2} = (4 \times 8) + (5 \times 10) + (6 \times 12) = 32 + 50 + 72 = 154 \]

Thus, the final matrix product is:
\[ AB = \begin{bmatrix} 58 & 64 \\ 139 & 154 \end{bmatrix} \]

Identity Matrices

An identity matrix is always symmetric, with:
- All diagonal elements of 1
- All off-diagonal elements of 0
The identity matrix behaves like 1 in scalar multiplication:
If \(A\) is an \(r \times c\) matrix and \(I\) is a \(c \times c\) identity matrix, then \(AI = A\).
If \(I\) is an \(r \times r\) identity matrix, then \(IA = A\).

Matrix Inverses

If a matrix is square and meets certain conditions, an inverse matrix exists.
The inverse of \(A\) is denoted as \(A^{-1}\).
The property: \(AA^{-1} = A^{-1}A = I\)
Inverse operations are the matrix equivalent of division.

Tips for Reading Matrix Formulas

Always check the dimension of the final result when interpreting matrix expressions.

Special Cases of Matrix Multiplication

Let \(C_{n \times 1}\) be a column vector of chosen numbers, and let \(Y_{n \times 1}\) be a column vector representing a sample of data. The computation \(C' Y\) (the transpose of \(C\) multiplied by \(Y\)) allows us to compute:

Averages

If all elements of \(C\) are equal to \(\frac{1}{n}\), then the matrix multiplication: \[ C' Y = \begin{bmatrix} \frac{1}{n} & \frac{1}{n} & \dots & \frac{1}{n} \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \frac{1}{n} (y_1 + y_2 + \dots + y_n) = \frac{1}{n} \sum_{i=1}^{n} y_i = \bar{Y} \] This shows that \(C' Y\) computes the sample mean \(\bar{Y}\) using matrix multiplication.

Weighted Averages

If \(C\) contains weights that sum to 1 and are all positive, then the matrix multiplication: \[ C' Y = \begin{bmatrix} w_1 & w_2 & \dots & w_n \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = w_1 y_1 + w_2 y_2 + \dots + w_n y_n = \sum_{i=1}^{n} w_i Y_i = \bar{Y}_w \] This shows that \(C' Y\) computes the weighted mean \(\bar{Y}_w\) using matrix multiplication.

Summation

If all elements of \(C\) are set to 1, then the matrix multiplication: \[ C' Y = \begin{bmatrix} 1 & 1 & \dots & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = y_1 + y_2 + \dots + y_n = \sum_{i=1}^{n} y_i \] This shows that \(C' Y\) computes the sum of all values in \(Y\) using matrix multiplication.

Extending to Matrices

If \(C\) is not a vector but a matrix, then multiple computations can be performed simultaneously using matrix multiplication: \[ C' Y = \begin{bmatrix} \frac{1}{n} & \frac{1}{n} & \dots & \frac{1}{n} \\ 1 & 1 & \dots & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} \bar{Y} \\ \sum Y \end{bmatrix} \] This shows that \(C' Y\) stores the mean in the first row and the sum in the second, efficiently computing both.

Sums of Squares

If \(Y\) is multiplied by its own transpose, then the matrix multiplication: \[ Y' Y = \begin{bmatrix} y_1 & y_2 & \dots & y_n \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = y_1^2 + y_2^2 + \dots + y_n^2 = \sum_{i=1}^{n} Y_i^2 \] This shows that \(Y' Y\) computes the sum of squares of \(Y\) using matrix multiplication, which is useful for variance and regression calculations.

Summarizing Multiple Variables

This section describes how to summarize numerical data when working with multiple variables, leading up to the multivariate normal distribution (MVN) and how to estimate its parameters.

One-Variable Summaries

Basic summaries for individual variables:

Mean and standard deviation (if normally distributed)
Five-number summary: min, 1st quartile, median, 3rd quartile, max

Multiple Variables

When working with multiple variables:

Compute the mean and standard deviation for each variable
Some variables may be correlated
Matrices provide a structured way to organize this information

Parameters for Two Variables

For two variables \(X_1\) and \(X_2\):

\(X_1\)

Mean: \(\mu_1\)
Standard deviation: \(\sigma_1\)

\(X_2\)

Mean: \(\mu_2\)
Standard deviation: \(\sigma_2\)

In addition to individual means and standard deviations, we also measure covariance (\(\sigma_{12}\)), which describes how the two variables vary together. Covariance is directly related to correlation.

Matrix Representation

Matrices provide a compact way to represent the relationships between multiple variables.

\(X_1\) and \(X_2\) follow a multivariate normal distribution (MVN):

Mean vector: \[ \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} \]
Covariance matrix: \[ \Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix} \]
The diagonal elements are variances; the off-diagonal elements are covariances.
The covariance matrix must be symmetric.

Covariance and Correlation

Covariance (\(\sigma_{12}\)) measures how two variables move together but depends on scale.
Correlation is the standardized version of covariance. \[ \text{COR}(X_1,X_2) = \frac{\text{COV}(X_1, X_2)}{\sigma_1 \sigma_2} \]
Properties:.
- Covariance has units, correlation does not.
- A covariance of zero means the variables are not linearly related.
- Correlation ranges from -1 to 1, making it easier to interpret.

Multivariate Normal Distribution (MVN)

A multivariate normal distribution (MVN) extends the normal distribution to multiple variables, describing their relationships through both their individual distributions and their dependencies. It is defined by:

A mean vector, which gives the expected values of the variables.
A covariance matrix, which describes how the variables vary together.
A joint probability distribution, which specifies the likelihood of different combinations of variable values.

Theoretical Properties

Each individual variable follows a normal distribution.
The relationships between variables are linear.
Data points are denser near the mean vector.

Estimating Parameters from Data

In practice, we estimate MVN parameters from a sample:

Sample Mean Vector: \[ \bar{X} = \begin{bmatrix} \bar{X}_1 \\ \bar{X}_2 \end{bmatrix} \] Sample Covariance Matrix: \[ S = \begin{bmatrix} s_1^2 & s_{12} \\ s_{12} & s_2^2 \end{bmatrix} \]

where:

\(s_1^2\) and \(s_2^2\) are sample variances.
\(s_{12}\) is the sample covariance.

Assessing Multivariate Normality

To check if data follows an MVN distribution:

Each variable should be normally distributed.
Density plots should be bell-shaped.
QQ plots (e.g., mqqnorm(dataset)) should be linear with slight tail deviations.
Check scatterplot matrices for pairwise relationships.
Start with these visualizations.
Patterns should be elliptical or circular, and denser in the middle.
Linear relationships suggest MVN, while unusual trends indicate deviations.
Perform hypothesis tests where the null is MVN:
- Royston’s test
- Mardia’s test
- No single test is definitive

Applications of MVN

The multivariate normal distribution is widely used in statistics and machine learning:

Classification:
- Discriminant analysis
Regression:
- Multiple linear regression (MLR) assumes MVN.
- Hypothesis testing in regression uses MVN properties.
- Repeated measures and time series analysis rely on MVN structure.

Multiple Linear Regression (MLR) Revisited

How does linear regression use matrix operations and MVN assumptions?

Simple Linear Regression

Model: \(y = \beta_0 + \beta_1x + \epsilon\)
This relationship is assumed to hold for each observation, meaning the model consists of \(n\) equations. For \(n\) observations: \[ \begin{aligned} y_1 &= \beta_0 + \beta_1x_1 + \epsilon_1 \\ &\vdots \\ y_n &= \beta_0 + \beta_1x_n + \epsilon_n \end{aligned} \]
Assumptions:.
- Errors \(\epsilon_i\) are independent, normally distributed, and have constant variance.
- The error terms form a vector \(\epsilon\) that follows a multivariate normal distribution.
These equations can be rewritten in matrix form.

Big Picture: Computing Regression Coefficients

To estimate and test coefficients in MLR:

Estimate coefficients using: \(\hat{\beta} = (X^T X)^{-1} X^T Y\)
Compute the covariance matrix of estimates: \(\text{Var}(\hat{\beta}) = \sigma^2 (X^T X)^{-1}\)
- Take the square root of diagonal elements to get standard errors.
Use \(\hat{\beta}\) and standard errors to compute t-statistics, p-values. and confidence intervals.

The Matrix Advantage

The matrix formula \(\hat{\beta} = (X^T X)^{-1} X^T Y\) works regardless of the number of predictors or sample size.
Matrix form provides a general framework for linear regression.

Code Examples (in R)

Matrix Operations

Code

M = matrix(c(0,1,2,3), 2, 2)  # 2x2 (r x c) matrix, filled by column
N = matrix(c(4,3,2,1), 2, 2)

t(M)        # Transpose
M * N       # Elementwise multiplication
M %*% N     # Matrix multiplication
Minv = solve(M)     # Matrix inverse
M %*% Minv          # Should return identity matrix c(1,0,0,1): confirms inverse

Multiple Linear Regression

Code

# Predict y from x using matrix algebra

x = c(1,2,3,4,5)
y = c(6.5, 10.8, 14, 21.2, 26.8) # Response vector

# Design matrix: column of 1s for intercept, then x
bigX = cbind(rep(1, 5), x)

# Estimate beta using matrix multiplication
beta_hat = solve(t(bigX) %*% bigX) %*% t(bigX) %*% y
beta_hat

# Compare to lm()
fit = lm(y ~ x)
coef(summary(fit))[,1]   # Estimated coefficients
summary(fit)


# Estimate standard errors manually
sigma2 = 1.261^2  # From residual standard error in model summary
cov_beta = sigma2 * solve(t(bigX) %*% bigX)
sqrt(diag(cov_beta))      # Standard errors

# Compare to lm() standard errors
coef(summary(fit))[,2]