Free Statistics 101 Flashcards

All 30 Cards

Mean

Average; sum of all values divided by the number of values; sensitive to outliers

Median

Middle value when data is ordered (the average of the two middle values when the count is even); resistant to outliers; often a better measure of center for skewed data

Mode

Most frequently occurring value; data can have more than one mode or no mode; the only measure of center that applies to categorical (nominal) data
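A minimal sketch of the three measures of center using Python's standard-library `statistics` module. The data values are hypothetical, chosen so that a single outlier (100) shows how much more the mean moves than the median.

```python
import statistics

# Hypothetical data with one large outlier (100)
data = [2, 3, 3, 5, 7, 10, 100]
without_outlier = data[:-1]

mean_all = statistics.mean(data)                     # 130 / 7, pulled up by the outlier
mean_trimmed = statistics.mean(without_outlier)      # 30 / 6
median_all = statistics.median(data)                 # middle of 7 sorted values
median_trimmed = statistics.median(without_outlier)  # even count: average of 3 and 5
mode_all = statistics.mode(data)                     # most frequent value

# Mode also works for categorical data, where mean and median do not apply
colors = ["red", "blue", "red", "green"]
mode_color = statistics.mode(colors)
# multimode returns every value tied for most frequent
modes = statistics.multimode([1, 1, 2, 2, 3])

print(f"mean: {mean_all:.2f} -> {mean_trimmed}")
print(f"median: {median_all} -> {median_trimmed}")
print(mode_all, mode_color, modes)
```

Dropping the outlier moves the mean from about 18.57 to 5, while the median only shifts from 5 to 4.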

Standard Deviation

Measure of spread around the mean; square root of variance; low SD = data clustered near mean

Variance

Average of squared deviations from the mean; SD²; measures data spread; population variance (σ²) divides by N, sample variance (s²) divides by n − 1

Range

Maximum value minus minimum value; simplest measure of spread; sensitive to outliers
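The spread measures above can be sketched with the `statistics` module on a small hypothetical sample. `pvariance` divides by N (population), while `variance` and `stdev` divide by n − 1 (sample); the hand computation is included only to show the formula.

```python
import math
import statistics

# Hypothetical sample
data = [4, 8, 6, 5, 3, 7]

data_range = max(data) - min(data)      # 8 - 3
pop_var = statistics.pvariance(data)    # sum of squared deviations / N
sample_var = statistics.variance(data)  # sum of squared deviations / (n - 1)
sample_sd = statistics.stdev(data)      # square root of the sample variance

# Same sample variance by hand: mean = 5.5, squared deviations sum to 17.5
mean = sum(data) / len(data)
manual_var = sum((x - mean) ** 2 for x in data) / (len(data) - 1)

print(data_range, pop_var, sample_var, round(sample_sd, 4))
```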

Normal Distribution

Bell-shaped, symmetric curve; mean = median = mode; follows the 68-95-99.7 rule; many natural phenomena approximately follow this distribution

68-95-99.7 Rule

In a normal distribution, approximately 68% of values lie within 1 SD, 95% within 2 SD, and 99.7% within 3 SD of the mean
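The rule can be checked with `statistics.NormalDist` from the Python standard library: the share within k SD is the CDF at +k minus the CDF at −k on a standard normal curve.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Share of a normal distribution within k standard deviations of the mean
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in coverage.items():
    print(f"within {k} SD: {share:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```

The "95%" figure is a rounding of 95.45%; exactly 95% lies within about 1.96 SD.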

Z-Score

Number of standard deviations a value is from the mean; Z = (X - μ) / σ; positive = above mean, negative = below
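A direct translation of Z = (X − μ) / σ into Python. The exam mean of 70 and SD of 8 are hypothetical values chosen for round numbers.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies from the mean mu."""
    return (x - mu) / sigma

# Hypothetical exam: mean 70, SD 8
above = z_score(82, mu=70, sigma=8)  # 12 / 8 = 1.5 (above the mean)
below = z_score(62, mu=70, sigma=8)  # -8 / 8 = -1.0 (below the mean)
print(above, below)
```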

Probability

Likelihood of an event; ranges from 0 (impossible) to 1 (certain); when all outcomes are equally likely, P(A) = favorable outcomes / total outcomes

Independent Events

Occurrence of one does not affect the other; P(A and B) = P(A) × P(B); example: coin flips

Mutually Exclusive Events

Cannot occur simultaneously, so P(A and B) = 0; P(A or B) = P(A) + P(B); example: rolling a 1 or a 6 on one die
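A small sketch that enumerates die outcomes with exact fractions to check the addition rule (mutually exclusive events) and the multiplication rule (independent events). The helper `prob` is a hypothetical name and assumes every outcome is equally likely; dice stand in for the coin-flip example.

```python
from fractions import Fraction
from itertools import product

def prob(event, sample_space):
    """Classical probability, assuming every outcome is equally likely."""
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favorable, len(sample_space))

die = range(1, 7)

# Mutually exclusive: a single roll cannot be both 1 and 6
p_1 = prob(lambda r: r == 1, die)
p_6 = prob(lambda r: r == 6, die)
p_1_or_6 = prob(lambda r: r in (1, 6), die)
p_1_and_6 = prob(lambda r: r == 1 and r == 6, die)

# Independent: two separate rolls, 36 equally likely ordered pairs
two_rolls = list(product(die, repeat=2))
p_first_6 = prob(lambda o: o[0] == 6, two_rolls)
p_second_6 = prob(lambda o: o[1] == 6, two_rolls)
p_both_6 = prob(lambda o: o[0] == 6 and o[1] == 6, two_rolls)

print(p_1_or_6, p_1 + p_6)                # addition rule: 1/3 both ways
print(p_both_6, p_first_6 * p_second_6)   # multiplication rule: 1/36 both ways
```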

Conditional Probability

P(A|B) = probability of A given that B occurred; P(A|B) = P(A and B) / P(B), defined when P(B) > 0; Bayes' theorem reverses the condition: P(A|B) = P(B|A) × P(A) / P(B)
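A worked sketch with a hypothetical screening test (1% prevalence, 90% sensitivity, 5% false-positive rate; illustrative numbers, not real data). It computes P(disease | positive) from the definition and again from Bayes' theorem, using exact fractions.

```python
from fractions import Fraction

# Hypothetical screening test (illustrative numbers, not real data)
p_disease = Fraction(1, 100)             # prevalence P(D)
p_pos_given_disease = Fraction(90, 100)  # sensitivity P(+|D)
p_pos_given_healthy = Fraction(5, 100)   # false-positive rate P(+|not D)

# Joint probabilities
p_pos_and_disease = p_pos_given_disease * p_disease
p_pos_and_healthy = p_pos_given_healthy * (1 - p_disease)
p_pos = p_pos_and_disease + p_pos_and_healthy  # total probability of a positive

# Definition: P(D|+) = P(D and +) / P(+)
p_disease_given_pos = p_pos_and_disease / p_pos
# Bayes' theorem gives the same value: P(+|D) * P(D) / P(+)
bayes = p_pos_given_disease * p_disease / p_pos

print(p_disease_given_pos, float(p_disease_given_pos))
```

Under these assumptions a positive result means only a 2/13 (about 15%) chance of disease, far from the 90% sensitivity, which is why P(A|B) and P(B|A) must not be confused.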

Population vs Sample

Population: the entire group of interest. Sample: a subset of the population. Sample statistics are used to estimate population parameters

Parameter vs Statistic

Parameter: describes population (μ, σ). Statistic: describes sample (x̄, s). We use statistics to estimate parameters

Sampling Bias

When the way a sample is collected makes it unrepresentative of the population; types: selection bias, response bias, voluntary response bias

Central Limit Theorem

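As sample size increases, the sampling distribution of the sample mean approaches a normal distribution regardless of the population's shape (assuming independent observations and finite variance); n ≥ 30 is a common rule of thumb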
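A quick simulation sketch of the theorem, assuming a strongly skewed exponential population with mean 1 and SD 1 (a hypothetical choice). The means of many samples of size 30 should center on 1 with a spread close to σ / √n.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

n = 30                # size of each sample
num_samples = 10_000  # number of samples drawn

# Hypothetical skewed population: exponential with mean 1 and SD 1
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

# CLT prediction: means center on 1 with SD close to sigma / sqrt(n)
predicted_se = 1.0 / math.sqrt(n)
observed_center = statistics.fmean(sample_means)
observed_se = statistics.stdev(sample_means)

print(f"center {observed_center:.3f}, SE {observed_se:.3f} vs predicted {predicted_se:.3f}")
```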

Confidence Interval

Range of values likely to contain the population parameter; 95% CI means that if sampling were repeated many times, about 95% of the intervals constructed this way would capture the true value

Margin of Error

Half-width of a confidence interval; decreases with larger sample size (proportional to 1/√n); increases with higher confidence level and greater variability
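A sketch of a confidence interval and its margin of error for a proportion. It uses the normal-approximation (Wald) interval, one common method among several; the function name `proportion_ci` and the poll numbers are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_ci(successes: int, n: int, confidence: float = 0.95):
    """Normal-approximation (Wald) confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    se = math.sqrt(p_hat * (1 - p_hat) / n)              # standard error of p_hat
    margin = z * se                                      # margin of error = half-width
    return p_hat - margin, p_hat + margin, margin

# Hypothetical poll: 520 of 1,000 respondents answer "yes"
low, high, moe_95 = proportion_ci(520, 1000)
_, _, moe_99 = proportion_ci(520, 1000, confidence=0.99)  # higher confidence -> wider
_, _, moe_4x = proportion_ci(2080, 4000)                  # 4x the sample -> half the margin

print(f"95% CI: ({low:.3f}, {high:.3f}), margin {moe_95:.3f}")
```

Quadrupling n halves the margin because it shrinks with 1/√n.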

Hypothesis Testing

Null hypothesis (H₀): no effect/difference. Alternative hypothesis (Hₐ): there is an effect. Collect data, calculate the p-value, then decide whether to reject H₀

P-Value

Probability of observing data at least as extreme as the sample, assuming H₀ is true; small p-value (< α) = reject H₀

Significance Level (α)

Threshold for rejecting H₀; typically 0.05 (5%); if p-value < α, result is 'statistically significant'
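The hypothesis-testing steps above, sketched as a two-sided one-sample z-test. This assumes the population SD is known; the bottle-filling scenario (target 500 ml, σ = 4, n = 25, sample mean 498.2) and the function name are hypothetical.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Two-sided z-test of H0: population mean == mu0, assuming sigma is known."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # p-value: probability under H0 of a z at least this far from 0 in either direction
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

alpha = 0.05
z, p = one_sample_z_test(sample_mean=498.2, mu0=500, sigma=4, n=25)
print(f"z = {z:.2f}, p = {p:.4f}")
print("reject H0" if p < alpha else "fail to reject H0")
```

Here z = −2.25 and p is about 0.024, below α = 0.05, so the result would be called statistically significant.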

Type I Error

Rejecting H₀ when it is actually true (false positive); probability = α; 'seeing an effect that isn't there'

Type II Error

Failing to reject H₀ when it is actually false (false negative); probability = β; 'missing a real effect'
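A simulation sketch of both error types, assuming normal data with known σ = 1, n = 20, and a two-sided z-test at α = 0.05 (all hypothetical settings). When H₀ is true the rejection rate should land near α; when the true mean is 0.5 the failure-to-reject rate estimates β.

```python
import math
import random
import statistics
from statistics import NormalDist

random.seed(1)

ALPHA = 0.05
N = 20           # observations per simulated study
TRIALS = 10_000  # number of simulated studies
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-sided critical value, about 1.96

def rejects_h0(true_mean: float) -> bool:
    """One z-test of H0: mean == 0 on N draws from Normal(true_mean, 1), sigma known."""
    sample = [random.gauss(true_mean, 1.0) for _ in range(N)]
    z = statistics.fmean(sample) / (1.0 / math.sqrt(N))
    return abs(z) > Z_CRIT

# H0 true (mean really 0): every rejection is a Type I error
type1_rate = sum(rejects_h0(0.0) for _ in range(TRIALS)) / TRIALS
# H0 false (mean really 0.5): every failure to reject is a Type II error
type2_rate = sum(not rejects_h0(0.5) for _ in range(TRIALS)) / TRIALS

print(f"Type I rate  ~ {type1_rate:.3f} (alpha = {ALPHA})")
print(f"Type II rate ~ {type2_rate:.3f} (power ~ {1 - type2_rate:.3f})")
```

With these settings β comes out near 0.39; a larger sample or a larger true effect would shrink it.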

Correlation Coefficient (r)

Measures the strength and direction of a linear relationship; ranges from -1 to +1; common rule of thumb: |r| > 0.7 strong, 0.3-0.7 moderate, < 0.3 weak
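A from-scratch sketch of Pearson's r on hypothetical data (Python 3.10+ also ships `statistics.correlation`). The last case shows why r measures only *linear* relationships: y = x² is completely determined by x, yet r = 0.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient: covariance scaled by both standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data
r_linear = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])    # 6 / sqrt(60), about 0.775
r_perfect = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])  # exact line: 1.0
# y = x**2: fully determined by x, but the pattern is not linear, so r = 0
r_curve = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
print(r_linear, r_perfect, r_curve)
```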

Correlation vs Causation

Correlation does NOT imply causation; confounding variables may explain the relationship; randomized controlled experiments are typically needed to establish causation

Linear Regression

ŷ = a + bx; predicts the dependent variable (y) from the independent variable (x); b = slope (change in ŷ per one-unit increase in x); a = y-intercept (ŷ when x = 0)

R-Squared (R²)

Proportion of variance in y explained by the model; ranges 0-1; R² = 0.85 means 85% of the variation in y is explained by the model; in simple linear regression, R² = r²
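A least-squares sketch of ŷ = a + bx and R² on a small hypothetical dataset. The function names are illustrative; b = Sxy / Sxx and a = ȳ − b·x̄ are the standard ordinary-least-squares formulas.

```python
def fit_line(x, y):
    """Least-squares fit of y-hat = a + b*x; returns (a, b)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope: change in y-hat per unit of x
    a = mean_y - b * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return a, b

def r_squared(x, y, a, b):
    """R^2 = 1 - SS_residual / SS_total."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_line(x, y)       # a = 2.2, b = 0.6
r2 = r_squared(x, y, a, b)  # 1 - 2.4 / 6 = 0.6
prediction = a + b * 6      # predicted y at x = 6
print(f"y-hat = {a:.1f} + {b:.1f}x, R^2 = {r2:.2f}, y-hat(6) = {prediction:.1f}")
```

For this data Pearson's r is 6 / √60 ≈ 0.775, and r² = 0.6 matches the R² above.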

Chi-Square Test

Tests association between categorical variables; compares observed (O) vs expected (E) frequencies with χ² = Σ (O − E)² / E; larger χ² = more evidence of association
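A sketch of the chi-square test of independence on a hypothetical 2×2 table. Expected counts come from row total × column total / grand total. The standard library has no general chi-square distribution, so the p-value here uses the 1-degree-of-freedom identity P(χ² > x) = erfc(√(x/2)) and is skipped for other table sizes.

```python
import math

def chi_square_independence(observed):
    """Chi-square statistic and degrees of freedom for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the two variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, dof

# Hypothetical 2x2 table: rows = group A / group B, columns = yes / no
table = [[30, 20],
         [20, 30]]
chi2, dof = chi_square_independence(table)  # every expected count is 25, chi2 = 4.0

# Valid for 1 degree of freedom only: P(chi2_1 > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi2 / 2)) if dof == 1 else None
print(chi2, dof, p_value)
```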

T-Test

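Compares means; one-sample (vs known value), two-sample (two independent groups), paired (before/after on the same subjects); uses the t-distribution because the population SD is estimated from the sample, which matters most for small samples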
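A sketch of the paired t-test, written as what it is: a one-sample t-test on the before/after differences against 0. The measurements are hypothetical, and the critical value 2.776 (two-sided 5%, 4 degrees of freedom) is hard-coded from a t table, so it applies only to this sample size.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for H0: population mean == mu0."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # SD estimated from the sample
    return (statistics.fmean(sample) - mu0) / se, n - 1

def paired_t(before, after):
    """Paired t-test: a one-sample t-test of H0: mean difference == 0."""
    diffs = [a - b for a, b in zip(after, before)]
    return one_sample_t(diffs, 0)

# Hypothetical before/after measurements on the same five subjects
before = [72, 75, 68, 80, 77]
after = [70, 73, 67, 76, 74]
t, dof = paired_t(before, after)

T_CRIT = 2.776  # two-sided 5% critical value of t with 4 df, from a t table
print(f"t = {t:.2f}, df = {dof}, significant at 5%: {abs(t) > T_CRIT}")
```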
