Drawing Statistical Conclusions

Objectives

This chapter introduces how statistical studies are designed, how we draw conclusions from them, and how uncertainty is measured through hypothesis testing and resampling methods.

Distinguish between observational studies and experiments, including the role of randomization.
Understand key sampling methods and definitions (parameter vs. statistic).
Recognize ethical requirements for data collection, including informed consent and IRB review.
Conduct hypothesis tests and interpret p-values in context.
Use permutation testing to assess the likelihood of observed results under random assignment.
Communicate statistical results clearly, including context, limitations, and reproducibility.

Observational Studies vs. Experiments

Experiment: The investigator actively manipulates the environment with random assignment to treatment groups.
- Randomization ensures groups are comparable across all potential confounding variables and removes researcher bias.
- Enables cause-and-effect conclusions.
Observational study: The researcher is passive and observes without intervention.
- Suitable when experiments would be unethical or impractical.
- Can build evidence for causation when combined with multiple studies and theoretical reasoning (e.g., smoking causes lung cancer).

Statistical Sampling

Random sampling uses chance to select a representative sample, allowing us to infer population characteristics.

Key Definitions

Parameter: Characteristic of a population
Statistic: Characteristic of a sample

Sampling Designs

Simple random sample: Each individual has an equal chance of selection.
Stratified random sample: The population is divided into subgroups (e.g., gender), and random sampling is performed within each group.

Ethics of Gathering Data

Informed consent must be obtained from participants so they are aware of the study’s purpose, procedures, and potential risks or benefits.
The Institutional Review Board (IRB) evaluates the study to determine its ethical and scientific appropriateness:
- Ensures the study minimizes risks and maximizes benefits.
- Reviews the experimental design and statistical plan for ethical and scientific rigor.

Measuring Uncertainty

Hypothesis Testing

Key Concepts

Hypotheses:
- Null hypothesis (\(H_0\)): Assumes no effect or difference (e.g., treatment means are equal).
  - Because treatment group assignment is random, we assume that if the treatment has no effect, the population means are equal across groups.
- Alternative hypothesis (\(H_a\)): Suggests an effect or difference exists.
Test statistic: A sample estimate (e.g., difference in means) divided by a measure of variability (e.g., standard error).
- Example: A t-statistic in a two-sample t-test.
p-value: Probability of observing by random chance a result as extreme as the sample, assuming \(H_0\) is true. Calculated using the test statistic.
Decision: Compare the p-value to the significance level (\(\alpha\)). Under the assumption that the population means are equal, a small p-value indicates that the observed result would be unlikely due to chance alone.
- If \(p < \alpha\): Reject \(H_0\).
- Otherwise: Fail to reject \(H_0\).
Conclusion: Summarize the result in plain language, referencing both hypotheses and the p-value.
- In observational studies, we use a fictitious chance model to apply the same calculations, but causal interpretations are limited due to lack of randomization.

General Principles

Evidence is gathered by collecting a sample and analyzing differences in key metrics.
If \(H_0\) is true, differences between sample means should be small. The p-value quantifies how small is “small enough.”

Example: Lottery Case

Assumption (\(H_0\)): Bivin is playing fair.
Evidence (test statistic): He won 10 times in a row.
Probability (p-value): The probability of winning 10 times in a row, assuming fairness, is extremely low.
Decision: Reject \(H_0\). The outcome is too unlikely under fair play.
Conclusion: Since the probability of winning 10 times in a row under fair play is so small, we conclude that Bivin is likely cheating.

Permutation Testing

Purpose: Tests results against all possible randomizations to assess how unusual the observed result is.
- In an experiment, permutations represent all possible assignments of participants to treatment groups.
p-value calculation: Count how many permutations are as extreme as the observed result, and divide by the total number of permutations.
- Example: If 10 out of 1,000 permutations are as extreme as the observed result, the p-value is \(10 / 1{,}000 = 0.01\).

Writing Up Results

Results are typically communicated in writing, often for non-expert audiences.

Include context about the experiment:
- What was the research question?
- What were the treatment groups?
- What was measured?
Include descriptive statistics:
- Mean
- Standard deviation
- Outliers (with explanations)
- Important differences observed in the data
State the p-value:
- Does it support or contradict \(H_0\)?
- What does this imply in the context of the study?
Mention challenges:
- Difficulties encountered during the study
- Confounding variables that may have affected results
Clarify the scope of the study:
- Can results be generalized to the target population?
- Was the sample representative?
Ensure reproducibility:
- Provide enough detail for another researcher to reproduce the study.