Stats Notes for Data Science
Open-Source Statistics Reference for Data Science, with R and SAS Examples
Kristin Henderson
Published August 2025
Overview
This is an open-source reference book written as a companion for graduate-level statistics courses in the SMU Master of Science in Data Science program. It covers both the theoretical foundations and practical applications of statistical methods, with worked examples in R and SAS throughout.
Read the book · View on GitHub
Contents
Part 1: Statistical Foundations. Covers t-distributions, data screening and transformations, Type I/II error and power, ANOVA, multiple comparisons, simple and multiple linear regression, and model selection and validation.
Part 2: Applied Statistics. Covers the bias-variance tradeoff, bootstrap methods, classification, logistic regression, PCA, and clustering.
Appendices. R and SAS code examples, plus a hypothesis-test flowchart for method selection.
About the Project
The book was built with Quarto and is licensed under CC BY 4.0, so anyone is free to share and adapt it with attribution. It grew out of personal course notes and was expanded into a full reference at the suggestion of the course instructor. It is intended to bridge the gap between statistical theory and the practical needs of data science students.
Skills
R · SAS · Quarto · Statistical Modeling · Technical Writing