Stats Notes for Data Science

Open-Source Statistics Reference for Data Science, with R and SAS Examples

Overview

This is an open-source reference book written as a companion for graduate-level statistics courses in the SMU Master of Science in Data Science program. It covers both the theoretical foundations and practical applications of statistical methods, with worked examples in R and SAS throughout. The project grew out of personal course notes and was expanded into a full reference at the suggestion of the course instructor; it is licensed under CC BY 4.0 for free reuse and adaptation.

Read the book

View on GitHub

Contents

Part 1: Statistical Foundations
t-distributions and inference, data screening and transformations, Type I/II error and power, non-parametric alternatives, ANOVA and multiple comparisons, correlation and linear regression, confidence and prediction intervals, regression diagnostics, model selection and validation.

Part 2: Applied Statistics
Bias-variance tradeoff, bootstrap methods, repeated measures, contingency tables, classification, logistic regression, PCA, clustering, and a chapter on communicating statistical results to non-technical audiences.

Appendices
R and SAS code examples and a hypothesis test flowchart for method selection.


Skills

R · SAS · Quarto · Hypothesis Testing · Statistical Modeling · Technical Writing