Housing Regression
Linear regression analysis and prediction of home sale prices in the Ames housing dataset
Kristin Henderson
Summer 2024
Overview
A two-part regression analysis of the Ames, Iowa housing dataset (De Cock, 2011), completed for SMU’s MSDS Statistical Foundations course. The first part estimates the relationship between sale price and above-ground living area in three Ames neighborhoods (North Ames, Edwards, and Brookside), using log transformations, residual diagnostics, influential-point analysis (Cook’s D and leverage), and an interaction term to test whether the price-area relationship depends on neighborhood. The second part builds a predictive model for sale price across all of Ames, comparing forward, backward, stepwise, and custom variable-selection strategies and evaluating models by adjusted R², cross-validated PRESS, AIC, and Kaggle leaderboard score. My best model achieved a Kaggle RMSLE of 0.136.
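To make the diagnostics named above concrete, here is a minimal Python sketch. It is not the R or SAS code used for the project. It fits a log-log simple regression of price on living area, then computes leverage, Cook's D, PRESS, and RMSLE from their textbook formulas. The six data points are made-up illustrative numbers, not rows from the Ames dataset, and every function name is hypothetical. The sketch assumes a single predictor so the hat-matrix diagonal has a closed form, and it assumes RMSLE is defined with log(1 + x), which for values the size of house prices is practically indistinguishable from using a plain log.

```python
import math


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def leverages(x):
    """Hat-matrix diagonal for a one-predictor model with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]


def diagnostics(x, y):
    """Return coefficients, leverage, Cook's D, and PRESS for a simple fit."""
    b0, b1 = fit_simple_ols(x, y)
    h = leverages(x)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    p = 2  # number of estimated coefficients: intercept and slope
    mse = sum(e * e for e in resid) / (len(x) - p)
    # PRESS: sum of squared leave-one-out prediction errors, e_i / (1 - h_i)
    press = sum((e / (1 - hi)) ** 2 for e, hi in zip(resid, h))
    # Cook's D: residual size weighted by how much leverage the point has
    cooks_d = [(e * e / (p * mse)) * hi / (1 - hi) ** 2
               for e, hi in zip(resid, h)]
    return {"b0": b0, "b1": b1, "leverage": h,
            "cooks_d": cooks_d, "press": press}


def rmsle(actual, predicted):
    """Root mean squared logarithmic error, using log(1 + x)."""
    sq = [(math.log1p(p) - math.log1p(a)) ** 2
          for a, p in zip(actual, predicted)]
    return math.sqrt(sum(sq) / len(sq))


# Hypothetical toy data (square feet, dollars); NOT taken from the Ames dataset.
area = [900, 1100, 1300, 1600, 2000, 2400]
price = [105000, 120000, 139000, 160000, 189000, 235000]

# Log-transform both variables before fitting.
log_area = [math.log(a) for a in area]
log_price = [math.log(p) for p in price]

result = diagnostics(log_area, log_price)

# Back-transform fitted values to dollars before scoring with RMSLE.
fitted_price = [math.exp(result["b0"] + result["b1"] * la) for la in log_area]
in_sample_rmsle = rmsle(price, fitted_price)
```

In a log-log fit like this one, the slope `result["b1"]` reads as an elasticity: a 1% increase in living area is associated with roughly a `b1`% change in median sale price. An interaction model with neighborhood would allow that slope to differ by neighborhood, which this single-group sketch does not attempt.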
Interactive companion
Skills
R · SAS · R Shiny · D3.js · Linear regression · Variable selection · Residual diagnostics · Predictive modeling