Housing Regression

Linear regression analysis and prediction of home sale prices in the Ames housing dataset

Overview

A two-part regression analysis of the Ames, Iowa housing dataset (De Cock, 2011), completed for SMU’s MSDS Statistical Foundations course. The first part estimates the relationship between sale price and above-ground living area in three Ames neighborhoods (North Ames, Edwards, and Brookside). It uses log transformations, residual diagnostics, and influential-point analysis (Cook’s D and leverage), plus an interaction term to test whether the price-area relationship depends on neighborhood. The second part builds a predictive model for sale price across all of Ames. It compares forward, backward, stepwise, and custom variable-selection strategies and evaluates the resulting models by adjusted R², cross-validated PRESS, AIC, and Kaggle leaderboard score. My best model achieved a Kaggle RMSLE of 0.136.
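For readers unfamiliar with the influence and validation measures named above, here is a minimal sketch of how they can be computed for a one-predictor log-log fit. It is written in Python with invented numbers, not the R/SAS code or the data from the paper. The function names (`fit_line`, `diagnostics`, `rmsle`) are hypothetical, and `rmsle` assumes the metric is RMSE on plain logs of price; some definitions use log(1 + x) instead.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def diagnostics(x, y):
    """Leverage, Cook's D, and PRESS for a simple linear regression."""
    n, p = len(x), 2  # p = number of estimated coefficients
    b0, b1 = fit_line(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in resid) / (n - p)
    # Leverage h_i: how far x_i sits from the mean of x.
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # Cook's D combines residual size and leverage.
    cooks_d = [e ** 2 / (p * mse) * h / (1 - h) ** 2
               for e, h in zip(resid, leverage)]
    # PRESS: sum of squared leave-one-out prediction errors,
    # obtained without refitting via e_i / (1 - h_i).
    press = sum((e / (1 - h)) ** 2 for e, h in zip(resid, leverage))
    return leverage, cooks_d, press

def rmsle(actual, predicted):
    """RMSE between log(predicted) and log(actual) (assumed metric form)."""
    sq = [(math.log(p) - math.log(a)) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(sq) / len(sq))

# Invented illustration data: living area (sq ft) and sale price (USD).
area = [850, 1100, 1400, 1750, 2100, 3200]
price = [105000, 128000, 139000, 172000, 190000, 215000]
log_area = [math.log(a) for a in area]
log_price = [math.log(p) for p in price]

leverage, cooks_d, press = diagnostics(log_area, log_price)
b0, b1 = fit_line(log_area, log_price)
predicted = [math.exp(b0 + b1 * la) for la in log_area]
print("PRESS:", press)
print("max Cook's D:", max(cooks_d))
print("in-sample RMSLE:", rmsle(price, predicted))
```

The PRESS shortcut avoids refitting the model once per observation. With several predictors the same formulas apply, but leverage comes from the diagonal of the hat matrix rather than the closed form used here.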

Read the paper

[Image: first page of the housing regression paper]

Interactive companion

Toggle neighborhoods, switch to log scale, remove outliers, and compare simple, parallel-slopes, and independent-slopes linear regression fits. Built on an earlier R Shiny version of this tool. Data: Kaggle — House Prices: Advanced Regression Techniques.
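The three fits named above differ only in what the neighborhoods share: everything (simple), the slope only (parallel slopes), or nothing (independent slopes). The sketch below shows one way to compute them. It is Python with invented numbers, not the code behind the companion or the Shiny app, and every function name in it is hypothetical.

```python
import math

def _sums(x, y):
    """Means and centered sums of squares/cross-products for one sample."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return x_bar, y_bar, sxx, sxy

def simple_fit(groups):
    """One intercept and one slope for all neighborhoods pooled."""
    x = [v for gx, _ in groups.values() for v in gx]
    y = [v for _, gy in groups.values() for v in gy]
    x_bar, y_bar, sxx, sxy = _sums(x, y)
    slope = sxy / sxx
    return {name: (y_bar - slope * x_bar, slope) for name in groups}

def parallel_slopes_fit(groups):
    """Common slope, separate intercept per neighborhood."""
    stats = {name: _sums(gx, gy) for name, (gx, gy) in groups.items()}
    slope = (sum(s[3] for s in stats.values())
             / sum(s[2] for s in stats.values()))
    return {name: (s[1] - slope * s[0], slope) for name, s in stats.items()}

def independent_slopes_fit(groups):
    """Separate intercept and slope per neighborhood (full interaction)."""
    fit = {}
    for name, (gx, gy) in groups.items():
        x_bar, y_bar, sxx, sxy = _sums(gx, gy)
        slope = sxy / sxx
        fit[name] = (y_bar - slope * x_bar, slope)
    return fit

def sse(groups, fit):
    """Residual sum of squares of a fit over all neighborhoods."""
    return sum((yi - (fit[name][0] + fit[name][1] * xi)) ** 2
               for name, (gx, gy) in groups.items()
               for xi, yi in zip(gx, gy))

# Invented illustration data: (living area in sq ft, sale price in USD).
raw = {
    "North Ames": ([900, 1200, 1500, 1900], [118000, 135000, 149000, 171000]),
    "Edwards": ([800, 1100, 1600, 2200], [95000, 112000, 140000, 158000]),
    "Brookside": ([700, 1000, 1300, 1700], [88000, 110000, 131000, 160000]),
}
# Work on the log-log scale, as in the analysis.
groups = {name: ([math.log(a) for a in areas], [math.log(p) for p in prices])
          for name, (areas, prices) in raw.items()}

fit_simple = simple_fit(groups)
fit_par = parallel_slopes_fit(groups)
fit_ind = independent_slopes_fit(groups)
sse_simple = sse(groups, fit_simple)
sse_par = sse(groups, fit_par)
sse_ind = sse(groups, fit_ind)

# Extra-sum-of-squares F statistic for the interaction terms:
# does letting slopes vary by neighborhood reduce error enough?
n = sum(len(gx) for gx, _ in groups.values())
g = len(groups)
f_stat = ((sse_par - sse_ind) / (g - 1)) / (sse_ind / (n - 2 * g))
print("SSE simple / parallel / independent:", sse_simple, sse_par, sse_ind)
print("F statistic for interaction:", f_stat)
```

Because the models are nested, the residual sum of squares can only fall or stay the same moving from simple to parallel to independent slopes. The F statistic asks whether the last step's drop is larger than chance would explain.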


Skills

R · SAS · R Shiny · D3.js · Linear regression · Variable selection · Residual diagnostics · Predictive modeling