AP Statistics Master Practice Exam

📌 Directions: Each question is accompanied by a concept review. After reading the concept, select the best answer. Detailed solutions appear immediately after each response. Your final score and answer key appear at the end.

Concept Unit 1 — Exploring One-Variable Data

Measures of Center & Spread

The mean is the arithmetic average and is sensitive to outliers. The median is the middle value and is resistant to outliers. Standard deviation (s) measures typical distance from the mean. The IQR (Q3 − Q1) is the resistant measure of spread. Use mean/SD for symmetric distributions; median/IQR for skewed distributions.

📌 Must Memorize

Mean: x̄ = Σxᵢ / n
IQR = Q3 − Q1
Outlier rule: < Q1 − 1.5·IQR or > Q3 + 1.5·IQR
Skewed right → mean > median
Skewed left → mean < median

📝 Quick Example

Data: {2, 4, 6, 8, 100}. Which is larger, the mean or the median?

→ Mean = 120/5 = 24, Median = 6. The extreme value (100) pulls the mean up, so mean > median.
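The Quick Example can be checked with a short Python sketch that uses only the standard library. The second data set (with 100 replaced by 10) is an added illustration, not part of the original example.

```python
from statistics import mean, median

data = [2, 4, 6, 8, 100]  # data set from the Quick Example

data_mean = mean(data)      # arithmetic average: sensitive to the value 100
data_median = median(data)  # middle value: resistant to the value 100
print(f"mean = {data_mean}, median = {data_median}")

# Illustrative variation (an assumption, not from the example):
# swapping 100 for 10 changes the mean a lot but leaves the median alone.
tamer_data = [2, 4, 6, 8, 10]
print(f"mean = {mean(tamer_data)}, median = {median(tamer_data)}")
```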

1 Easy Unit 1

A dataset of household incomes in a city is strongly right-skewed. Which of the following correctly describes the relationship between the mean and median?

A. The mean is less than the median because the left tail pulls the mean down.

B. The mean equals the median because they always balance each other.

C. The mean is greater than the median because large values in the right tail pull the mean up.

D. The mean is greater than the median only when the sample size exceeds 30.

E. The median is undefined for right-skewed distributions.

Correct Answer: C

In a right-skewed distribution, a few very large values stretch the right tail. The mean is sensitive to these extreme values and gets pulled in the direction of the skew (upward). The median, which depends only on the position of the middle value, is largely unaffected by them. Therefore, in a right-skewed distribution: mean > median. This is a foundational AP Statistics fact.

Right skew → mean > median | Left skew → mean < median | Symmetric → mean ≈ median

Choice A describes left skew. Choice B is only true for symmetric distributions. Choices D and E are false: the relationship comes from the shape of the distribution, not the sample size, and the median is defined for any data set.

Concept Unit 1 — Exploring One-Variable Data

The Normal Distribution & Empirical Rule

The Empirical Rule (68–95–99.7 Rule): In a Normal distribution, approximately 68% of data falls within 1 SD, 95% within 2 SD, and 99.7% within 3 SD of the mean. A z-score measures how many standard deviations an observation is from the mean.

📌 Must Memorize

z = (x − μ) / σ
68%: μ ± 1σ
95%: μ ± 2σ
99.7%: μ ± 3σ

📝 Quick Example

Heights: μ = 68 in, σ = 3 in. What % are between 62 and 74 inches?

→ 62 = μ − 2σ, 74 = μ + 2σ → Empirical Rule: 95%

2 Medium Unit 1

The scores on an AP Statistics exam are approximately normally distributed with a mean of 72 and a standard deviation of 8. What is the approximate percentage of students who scored between 56 and 88?

A. 34%

B. 68%

C. 95%

D. 99.7%

E. 47.5%

Correct Answer: C

First, find how many standard deviations 56 and 88 are from the mean (72):

z for 56: (56 − 72) / 8 = −16 / 8 = −2
z for 88: (88 − 72) / 8 = 16 / 8 = +2

The interval [56, 88] spans from μ − 2σ to μ + 2σ. By the Empirical Rule, approximately 95% of data in a Normal distribution falls within 2 standard deviations of the mean. Answer: C.
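As a rough check on the Empirical Rule answer, the sketch below computes the two z-scores and the exact Normal area using `statistics.NormalDist` from the Python standard library. The helper name `z_score` is an illustrative choice.

```python
from statistics import NormalDist

scores = NormalDist(mu=72, sigma=8)  # exam-score model from the question

def z_score(x, dist):
    """Number of standard deviations x lies from the mean of dist."""
    return (x - dist.mean) / dist.stdev

z_low = z_score(56, scores)
z_high = z_score(88, scores)

# Exact area between 56 and 88 under the Normal curve.
share_between = scores.cdf(88) - scores.cdf(56)
print(z_low, z_high, round(share_between, 4))
```

The exact area is slightly above 95%, which is why the Empirical Rule is stated as an approximation.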

Concept Unit 2 — Exploring Two-Variable Data

Correlation & Least-Squares Regression

The correlation coefficient r measures the direction and strength of a linear relationship (−1 ≤ r ≤ 1). The least-squares regression line (LSRL) minimizes the sum of squared residuals. Its slope is b₁ = r(Sᵧ/Sₓ), and the line always passes through the point (x̄, ȳ).

📌 Must Memorize

ŷ = b₀ + b₁x
b₁ = r · (Sᵧ / Sₓ)
b₀ = ȳ − b₁x̄
r² = coefficient of determination
r² = % variation in y explained by x

📝 Quick Example

r = 0.9, Sᵧ = 5, Sₓ = 2, x̄ = 10, ȳ = 20. Find the LSRL.

→ b₁ = 0.9(5/2) = 2.25; b₀ = 20 − 2.25(10) = −2.5 → ŷ = −2.5 + 2.25x

3 Medium Unit 2

A regression analysis of hours studied (x) and exam score (y) for 30 students yields the following: x̄ = 5, ȳ = 75, Sₓ = 2, Sᵧ = 10, and r = 0.8. What is the slope of the least-squares regression line?

A. 0.16

B. 0.8

C. 4.0

D. 5.0

E. 1.6

Correct Answer: C

The formula for the slope of the LSRL is:

b₁ = r · (Sᵧ / Sₓ) = 0.8 · (10 / 2) = 0.8 · 5 = 4.0

The slope is 4.0, meaning each additional hour of study is associated with an average increase of 4.0 points on the exam.

Note: The y-intercept would be b₀ = ȳ − b₁x̄ = 75 − 4(5) = 55, giving LSRL: ŷ = 55 + 4.0x.
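The slope and intercept formulas translate directly into code. This is a minimal sketch; the function name `lsrl_from_summary` and its parameter names are illustrative, not part of any library.

```python
def lsrl_from_summary(r, s_x, s_y, x_bar, y_bar):
    """Return (intercept, slope) of the LSRL from summary statistics."""
    slope = r * (s_y / s_x)            # b1 = r * (Sy / Sx)
    intercept = y_bar - slope * x_bar  # line passes through (x_bar, y_bar)
    return intercept, slope

# Summary statistics from Question 3.
b0, b1 = lsrl_from_summary(r=0.8, s_x=2, s_y=10, x_bar=5, y_bar=75)
print(f"y-hat = {b0} + {b1}x")

# Predicted exam score for a student who studies 6 hours.
predicted_score = b0 + b1 * 6
```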

Concept Unit 3 — Collecting Data

Sampling Methods & Bias

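A simple random sample (SRS) of size n gives every possible group of n individuals an equal chance of being the sample, so every individual also has an equal chance of selection. Stratified sampling divides the population into similar groups (strata) and takes a random sample from each. Cluster sampling randomly selects whole groups (clusters) and includes everyone in the chosen groups. Voluntary response and convenience samples are biased.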

📌 Must Memorize

Voluntary response bias → overrepresents strong opinions
Undercoverage → population groups missed
Response bias → question wording affects answers
Nonresponse bias → non-responders differ from responders

4 Easy Unit 3

A radio station asks listeners to call in and vote for their favorite song. Of the 2,000 callers, 85% voted for Song A. Which type of bias is most likely present in this survey?

A. Undercoverage bias

B. Voluntary response bias

C. Nonresponse bias

D. Response bias

E. Cluster sampling bias

Correct Answer: B

This is a classic voluntary response scenario: listeners choose whether to call in. People with strong opinions (e.g., passionate fans of Song A) are far more likely to call in than those with mild preferences. This self-selection creates a sample that does not represent the broader listening audience.

Voluntary response → people with strong opinions self-select → biased results

Key AP distinction: Nonresponse bias (C) occurs when selected people refuse to respond. Here, there was no formal selection; people volunteered themselves, making this voluntary response bias.

Concept Unit 3 — Collecting Data

Experimental Design

A well-designed experiment includes: random assignment of treatments, control groups, and replication. A confounding variable is associated with both the explanatory and response variable. Blocking groups subjects by a known variable before random assignment.

📌 Must Memorize

3 Principles: Control · Randomization · Replication
Placebo effect → controlled by blind/double-blind design
Confounding variable → lurking variable that affects both
Blocking → reduces variability; NOT the same as stratifying

5 Medium Unit 3

A researcher wants to test whether a new fertilizer increases tomato yield. She plants 40 tomato plants and randomly assigns 20 to receive the new fertilizer and 20 to receive a standard fertilizer. She measures the yield (in pounds) at the end of the season. Which aspect of this design BEST allows the researcher to conclude that any difference in yield is caused by the fertilizer?

A. The large sample size of 40 plants

B. The measurement of yield in pounds

C. The random assignment of plants to treatment groups

D. The use of a standard fertilizer as a comparison

E. The fact that tomatoes are a consistent crop

Correct Answer: C

Random assignment is the key feature that allows researchers to make causal conclusions. By randomly assigning plants to treatments, potential confounding variables (e.g., soil quality, sunlight exposure, plant genetics) are distributed roughly equally across groups. Any systematic difference in outcomes can then be attributed to the treatment.

Random Assignment → Balanced Confounders → Causal Inference Possible

Note: Using a control group (D) is important for comparison but alone doesn't establish causation. Random assignment does. This is a critical AP Statistics concept: observational studies show association; experiments with random assignment can show causation.
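One way to carry out the random assignment described in Question 5 is to shuffle the plant labels and split the shuffled list. The sketch below is one possible approach, not the researcher's actual procedure; the function name, the plant numbering 1 to 40, and the seed are all assumptions made for illustration.

```python
import random

def randomly_assign(units, group_size, seed=None):
    """Shuffle the units and split them into a treatment and a comparison group."""
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)
    return shuffled[:group_size], shuffled[group_size:]

plants = list(range(1, 41))  # hypothetical labels for the 40 tomato plants
new_fertilizer, standard_fertilizer = randomly_assign(plants, 20, seed=1)
print(sorted(new_fertilizer))
print(sorted(standard_fertilizer))
```

Because chance alone decides which group each plant joins, differences in soil or sunlight tend to balance out between the two groups.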

Concept Unit 4 — Probability

Probability Rules & Conditional Probability

For any events A and B: P(A or B) = P(A) + P(B) − P(A and B). Events are independent if P(A|B) = P(A). Conditional probability: P(A|B) = P(A and B) / P(B).

📌 Must Memorize

Addition Rule: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Multiplication Rule: P(A ∩ B) = P(A) · P(B|A)
Independent: P(A ∩ B) = P(A) · P(B)
Conditional: P(A|B) = P(A ∩ B) / P(B)

6 Medium Unit 4

In a statistics class, 60% of students passed the midterm (M), 70% passed the final (F), and 50% passed both. What is the probability that a randomly selected student passed the midterm given that they passed the final?

A. 0.42

B. 0.50

C. 5/7 ≈ 0.714

D. 0.80

E. 0.60

Correct Answer: C

We need P(M|F), the probability of passing the midterm given that the student passed the final.

P(M|F) = P(M ∩ F) / P(F) = 0.50 / 0.70 = 5/7 ≈ 0.714

Interpretation: Among the 70% of students who passed the final, 50% passed both, so the fraction who also passed the midterm is 0.50/0.70 = 5/7.

Note: Are M and F independent? Check: P(M)·P(F) = 0.6 × 0.7 = 0.42 ≠ 0.50 = P(M∩F), so they are not independent.
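The same calculation can be done with exact fractions to avoid rounding. This is a small sketch using the standard-library `fractions` module; the variable names are illustrative.

```python
from fractions import Fraction

p_m = Fraction(60, 100)     # P(M): passed the midterm
p_f = Fraction(70, 100)     # P(F): passed the final
p_both = Fraction(50, 100)  # P(M and F)

p_m_given_f = p_both / p_f     # conditional probability P(M|F)
p_m_or_f = p_m + p_f - p_both  # general addition rule P(M or F)

# Independence would require P(M) * P(F) to equal P(M and F).
independent = (p_m * p_f == p_both)

print(p_m_given_f, float(p_m_given_f), p_m_or_f, independent)
```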

Concept Unit 4 — Probability

Discrete Random Variables — Expected Value & Variance

The expected value (mean) of a discrete random variable X is μₓ = Σ[x · P(X=x)]. The variance is σ²ₓ = Σ[(x − μ)² · P(X=x)]. For linear transformations: if Y = a + bX, then μᵧ = a + bμₓ and σᵧ = |b|σₓ.

📌 Must Memorize

E(X) = Σ[x · P(x)]
Var(X) = Σ[(x − μ)² · P(x)]
E(aX + b) = aE(X) + b
Var(aX + b) = a²·Var(X) [b shifts, doesn't scale variance]
SD(X+Y) = √[Var(X) + Var(Y)] only if X,Y independent

7 Medium Unit 4

A game awards the following payouts based on the result of spinning a wheel:

Payout ($)	0	5	10	20
P(X)	0.50	0.25	0.20	0.05

What is the expected payout per spin?

A. $3.75

B. $4.25

C. $8.75

D. $5.00

E. $6.50

Correct Answer: B

Apply the expected value formula:

E(X) = 0(0.50) + 5(0.25) + 10(0.20) + 20(0.05)
= 0 + 1.25 + 2.00 + 1.00
= $4.25

The expected payout per spin is $4.25. This is the long-run average if you played many times, not necessarily what you'd win on any single spin.
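The expected value and variance formulas from the concept review can be sketched in a few lines. The function names are illustrative; the variance and standard deviation go one step beyond what the question asks.

```python
from math import sqrt

payouts = [0, 5, 10, 20]          # payout values from the table
probs = [0.50, 0.25, 0.20, 0.05]  # matching probabilities

def expected_value(values, probabilities):
    """E(X) = sum of x * P(x)."""
    return sum(x * p for x, p in zip(values, probabilities))

def variance(values, probabilities):
    """Var(X) = sum of (x - mu)^2 * P(x)."""
    mu = expected_value(values, probabilities)
    return sum((x - mu) ** 2 * p for x, p in zip(values, probabilities))

mu_x = expected_value(payouts, probs)
var_x = variance(payouts, probs)
sd_x = sqrt(var_x)
print(mu_x, var_x, round(sd_x, 3))
```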

Concept Unit 5 — Sampling Distributions

The Central Limit Theorem (CLT)

If the sample size n is large enough (generally n ≥ 30), the sampling distribution of x̄ is approximately Normal, regardless of the shape of the population distribution. The mean of the sampling distribution is μ, and the standard deviation (standard error) is σ/√n.

📌 Must Memorize

x̄ ~ N(μ, σ/√n) when n is large (≥30)
Standard Error of x̄ = σ/√n
CLT: shape of x̄ ≈ Normal (n ≥ 30)
Larger n → smaller SE → x̄ closer to μ

8 Medium Unit 5

The weight of packages shipped by a courier service has a population mean of 12 lbs and a population standard deviation of 4 lbs. A random sample of 64 packages is selected. What is the standard deviation of the sampling distribution of the sample mean?

A. 4 lbs

B. 1 lb

C. 0.5 lbs

D. 2 lbs

E. 8 lbs

Correct Answer: C

The standard deviation of the sampling distribution of x̄ (also called the standard error) is:

SE = σ / √n = 4 / √64 = 4 / 8 = 0.5 lbs

The SE decreases as n increases because larger samples produce more consistent estimates of the population mean. With n = 64, the sample mean will typically be within about 0.5 lbs of the true mean.

Note: Do not confuse the population SD (σ = 4) with the SE of x̄.
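To see how the standard error shrinks as the sample grows, the short sketch below evaluates σ/√n for a few sample sizes. Only n = 64 comes from the question; the other sample sizes are added for comparison.

```python
from math import sqrt

def standard_error_of_mean(sigma, n):
    """Standard deviation of the sampling distribution of x-bar."""
    return sigma / sqrt(n)

sigma = 4  # population SD in lbs, from the question

# n = 64 is the question's sample size; 16 and 256 are illustrative.
for n in (16, 64, 256):
    print(n, standard_error_of_mean(sigma, n))
```

Quadrupling the sample size halves the standard error, since n sits under a square root.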

Concept Unit 5 — Sampling Distributions

Sampling Distribution of Sample Proportions

For large samples, the sampling distribution of p̂ is approximately Normal with mean p and standard deviation √(p(1−p)/n). Conditions: np ≥ 10 and n(1−p) ≥ 10 (Large Counts Condition).

📌 Must Memorize

p̂ ~ N(p, √(p(1−p)/n))
Conditions: np ≥ 10 AND n(1−p) ≥ 10
Mean of p̂ = p (unbiased)
10% Condition: n ≤ 10% of population

9 Hard Unit 5

Suppose 40% of voters in a large city support Candidate A. A pollster takes an SRS of 100 voters. What is the probability that the sample proportion supporting Candidate A is greater than 0.45?

A. Approximately 0.1539

B. Approximately 0.3446

C. Approximately 0.4015

D. Approximately 0.0500

E. Approximately 0.5000

Correct Answer: A

Check conditions: np = 100(0.4) = 40 ≥ 10 ✓; n(1−p) = 100(0.6) = 60 ≥ 10 ✓. The sampling distribution of p̂ is approximately Normal.

μ(p̂) = 0.40
σ(p̂) = √(0.4 × 0.6 / 100) = √(0.0024) = 0.04899

z = (0.45 − 0.40) / 0.04899 = 0.05 / 0.04899 ≈ 1.02

P(p̂ > 0.45) = P(Z > 1.02) = 1 − P(Z < 1.02) ≈ 1 − 0.8461 = 0.1539.
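The table-based calculation can be cross-checked with `statistics.NormalDist`. This is a sketch under the same Normal approximation; because it keeps the unrounded z, the result differs slightly in the fourth decimal place from the value obtained by rounding z to 1.02.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.40, 100  # population proportion and sample size from the question

# Large Counts Condition: both expected counts should be at least 10.
large_counts_ok = n * p >= 10 and n * (1 - p) >= 10

sd_p_hat = sqrt(p * (1 - p) / n)    # SD of the sampling distribution of p-hat
z = (0.45 - p) / sd_p_hat           # unrounded z-score for p-hat = 0.45
tail_prob = 1 - NormalDist().cdf(z) # P(p-hat > 0.45)

print(large_counts_ok, round(sd_p_hat, 5), round(z, 2), round(tail_prob, 3))
```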

Concept Unit 6 — Inference for Categorical Data (Proportions)

Confidence Intervals for Proportions

A confidence interval for a population proportion p uses the formula: p̂ ± z* · √(p̂(1−p̂)/n). The margin of error (ME) = z* · SE. A 95% CI uses z* = 1.96; 90% uses z* = 1.645; 99% uses z* = 2.576.

📌 Must Memorize

CI: p̂ ± z* √(p̂(1−p̂)/n)
90% → z* = 1.645
95% → z* = 1.960
99% → z* = 2.576
Wider CI ↔ Higher confidence (not more accurate!)

10 Medium Unit 6

A survey of 400 students found that 240 own a laptop. A statistician constructs a 95% confidence interval for the true proportion of students who own a laptop. Which of the following is the correct interval?

A. (0.552, 0.648)

B. (0.560, 0.640)

C. (0.540, 0.660)

D. (0.548, 0.652)

E. (0.530, 0.670)

Correct Answer: A

p̂ = 240/400 = 0.60
SE = √(0.60 × 0.40 / 400) = √(0.0006) = 0.02449
ME = 1.96 × 0.02449 ≈ 0.04801 ≈ 0.048

CI: 0.60 ± 0.048 → (0.552, 0.648)

The 95% confidence interval is (0.552, 0.648), answer A.

Interpretation: We are 95% confident that the true proportion of students who own a laptop is between 55.2% and 64.8%.
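The interval formula can be wrapped in a small function. This is a minimal sketch; the function name is illustrative, and z* is taken from `NormalDist().inv_cdf` rather than a table, so it is 1.95996... instead of the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_interval(successes, n, confidence=0.95):
    """Return (low, high) for p-hat +/- z* * sqrt(p-hat(1 - p-hat)/n)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = one_proportion_z_interval(240, 400)  # data from Question 10
print(round(low, 3), round(high, 3))

# Same data, higher confidence: the interval gets wider.
low_99, high_99 = one_proportion_z_interval(240, 400, confidence=0.99)
```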

Concept Unit 6 — Inference: Significance Tests for Proportions

Hypothesis Testing — p-values & Conclusions

A p-value is the probability of observing a result at least as extreme as the one obtained, assuming H₀ is true. If p-value ≤ α, reject H₀. The significance level α is set before the test (commonly 0.05).

📌 Must Memorize

H₀: null hypothesis (equality)
Hₐ: alternative hypothesis (what we're testing)
If p-value ≤ α → Reject H₀ (statistically significant)
If p-value > α → Fail to reject H₀
NEVER say "Accept H₀" — always "Fail to reject"

11 Easy Unit 6

A researcher conducts a significance test with α = 0.05 and obtains a p-value of 0.032. Which of the following is the correct conclusion?

A. Fail to reject H₀; there is not sufficient evidence to support Hₐ.

B. Reject H₀; there is sufficient evidence to support Hₐ.

C. Accept H₀; the null hypothesis is proven true.

D. Accept Hₐ; the alternative hypothesis is proven true.

E. Reject H₀; there is a 3.2% chance the null hypothesis is true.

Correct Answer: B

Since p-value (0.032) ≤ α (0.05), we reject H₀. There is sufficient evidence at the 5% significance level to support the alternative hypothesis.

p-value = 0.032 < α = 0.05 → Reject H₀

Critical AP pitfalls: Choices C and D are always wrong because we never "prove" or "accept" hypotheses. Choice E misinterprets the p-value: the p-value is NOT the probability that H₀ is true. It is the probability of obtaining results this extreme or more extreme, assuming H₀ is true.

Concept Unit 7 — Inference for Quantitative Data (Means)

One-Sample t-Test & t-Interval

When σ is unknown, use the t-distribution with df = n − 1. The test statistic is t = (x̄ − μ₀) / (s/√n). Conditions: Random sample, Normal/large sample (n ≥ 30 or data roughly Normal), and 10% condition.

📌 Must Memorize

t = (x̄ − μ₀) / (s / √n), df = n − 1
t-interval: x̄ ± t* · (s / √n)
t-dist: heavier tails than Normal
Conditions: Random + Normal/Large + 10%

12 Hard Unit 7

A gym claims that the average weight loss of its members after 3 months is more than 10 lbs. A random sample of 25 members shows x̄ = 11.4 lbs and s = 3.5 lbs. Which of the following is the value of the test statistic for testing H₀: μ = 10 vs. Hₐ: μ > 10?

A. t = 0.40

B. t = 1.40

C. t = 2.00

D. t = 7.00

E. t = 3.14

Correct Answer: C

Apply the one-sample t-test statistic formula:

t = (x̄ − μ₀) / (s / √n)
t = (11.4 − 10) / (3.5 / √25)
t = 1.4 / (3.5 / 5)
t = 1.4 / 0.70
t = 2.00

With df = n − 1 = 24, t = 2.00. Using a t-table, for a one-tailed test with df = 24, t = 2.00 gives a p-value between 0.025 and 0.05. At α = 0.05, we would reject H₀.
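The test statistic is simple enough to compute by hand, but a short sketch makes the pieces explicit. The function name is illustrative. The p-value is left to a t-table or calculator because the Python standard library has no t-distribution.

```python
from math import sqrt

def one_sample_t_statistic(x_bar, mu_0, s, n):
    """Return (t, df) for a one-sample t-test of H0: mu = mu_0."""
    standard_error = s / sqrt(n)  # estimated SD of x-bar
    t = (x_bar - mu_0) / standard_error
    return t, n - 1

# Values from Question 12.
t, df = one_sample_t_statistic(x_bar=11.4, mu_0=10, s=3.5, n=25)
print(round(t, 2), df)
```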

Concept Unit 7 — Type I & Type II Errors

Type I Error, Type II Error & Power

A Type I error (false positive) occurs when we reject H₀ when it is actually true. Its probability = α. A Type II error (false negative) occurs when we fail to reject H₀ when Hₐ is actually true. Its probability = β. Power = 1 − β = probability of correctly rejecting a false H₀.

📌 Must Memorize

Type I: Reject H₀ when H₀ is TRUE → P = α
Type II: Fail to reject H₀ when Hₐ is TRUE → P = β
Power = 1 − β
Increasing n → Power increases → β decreases
Increasing α → Power increases → Type I error risk ↑

13 Medium Unit 7

A pharmaceutical company tests whether a new drug reduces blood pressure. The null hypothesis is that the drug has no effect. The company concludes the drug is effective, but in reality the drug has no effect. What type of error has been made?

A. Type II error, because they failed to reject a false null hypothesis.

B. Type I error, because they rejected a null hypothesis that was actually true.

C. Type II error, because they accepted the alternative hypothesis incorrectly.

D. Type I error, because they failed to reject the null hypothesis.

E. No error was made; any conclusion can be correct with a low p-value.

Correct Answer: B

The company concluded the drug is effective → they rejected H₀ (no effect). But in reality, H₀ is true (the drug has no effect). Rejecting a true null hypothesis is the definition of a Type I error.

Type I Error: Reject H₀ when H₀ is TRUE (false positive)
Probability of Type I error = α (significance level)

Memory trick: Type I = "an innocent person is declared guilty" (you wrongly convicted someone). Type II = "a guilty person goes free" (you missed the real effect).

Concept Unit 8 — Inference for Categorical Data (Two-Way Tables)

Chi-Square Test for Independence

Used to test whether two categorical variables are associated. The test statistic is χ² = Σ[(O − E)² / E], where O = observed and E = expected count. df = (r−1)(c−1) for a two-way table with r rows and c columns. Expected count = (row total × column total) / grand total.

📌 Must Memorize

χ² = Σ[(O − E)² / E]
Expected: E = (row total × col total) / n
df = (rows − 1)(columns − 1)
Condition: All expected counts ≥ 5
Large χ² → strong evidence against independence

14 Medium Unit 8

A researcher surveys 200 students and records their gender (Male/Female) and whether they prefer coffee or tea. The data are displayed in a 2×2 table. For a chi-square test of independence, what are the degrees of freedom?

E. 199

Correct Answer: A

For a chi-square test of independence on a two-way table:

df = (number of rows − 1) × (number of columns − 1)
df = (2 − 1) × (2 − 1) = 1 × 1 = 1

For a 2×2 table (2 rows: Male/Female; 2 columns: Coffee/Tea), df = 1. Common trap: students choose n − 1 = 199 (that's for a t-test or χ² goodness-of-fit with 200 categories, which isn't the case here).
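The question does not give the cell counts, so the sketch below uses a made-up 2×2 table that totals 200 students, purely to show how expected counts, the χ² statistic, and df fit together. The counts and the function name are hypothetical.

```python
def chi_square_independence(table):
    """Return (expected counts, chi-square statistic, df) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count = (row total * column total) / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, statistic, df

# HYPOTHETICAL counts (rows: Male, Female; columns: Coffee, Tea).
observed = [[60, 40],
            [40, 60]]
expected, chi_sq, df = chi_square_independence(observed)
print(expected, chi_sq, df)
```

Whatever the counts are, a 2×2 table always gives df = 1; only the expected counts and the statistic depend on the data.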

Concept Unit 9 — Inference for Quantitative Data: Slopes

Inference for Linear Regression Slope

We can test whether the slope β₁ of the population regression line is zero (meaning no linear relationship). The test statistic is t = b₁ / SE(b₁), with df = n − 2. A CI for β₁ is b₁ ± t* · SE(b₁).

📌 Must Memorize

H₀: β₁ = 0 (no linear relationship)
t = b₁ / SE(b₁), df = n − 2
CI: b₁ ± t* · SE(b₁)
If 0 is NOT in CI → reject H₀ → significant relationship

15 Hard Unit 9

Computer output for a regression of y on x based on n = 22 observations gives a slope of b₁ = 3.2 with SE(b₁) = 1.6. At α = 0.05 (two-tailed), which conclusion is correct?

A. Fail to reject H₀: β₁ = 0; there is no significant linear relationship at α = 0.05 (t = 2.0, p ≈ 0.059).

B. Reject H₀: β₁ = 0; there is a significant linear relationship at α = 0.05 (t = 2.0, p < 0.05).

C. Fail to reject H₀; t = 0.5, which is not significant.

D. Reject H₀; t = 5.0, which is highly significant.

E. Cannot determine; the sample size is too small for regression inference.

Correct Answer: A

Compute the test statistic:

t = b₁ / SE(b₁) = 3.2 / 1.6 = 2.0
df = n − 2 = 22 − 2 = 20

For df = 20 (two-tailed, α = 0.05), the critical value from the t-table is t* = 2.086. Since our computed |t| = 2.0 < 2.086, and the corresponding p-value ≈ 0.059 > 0.05, we fail to reject H₀ at the 5% level.

|t| = 2.0 < t*(df=20, α=0.05) = 2.086 → p ≈ 0.059 > 0.05 → Fail to Reject H₀

There is not sufficient evidence of a significant linear relationship at α = 0.05. Answer: A.

Concept Unit 2 — Exploring Two-Variable Data

Residuals & Residual Plots

A residual = Observed y − Predicted ŷ. A residual plot shows residuals on the y-axis and x (or ŷ) on the x-axis. A linear model is appropriate when the residual plot shows no pattern (random scatter). Patterns suggest the linear model is not appropriate.

📌 Must Memorize

Residual = y − ŷ (observed minus predicted)
Good fit → residuals randomly scattered around 0
Pattern in residual plot → linear model NOT appropriate
Curved pattern → consider nonlinear model

16 Medium Unit 2

A researcher fits a linear regression model to data. The residual plot shows a clear U-shaped (curved) pattern. What does this indicate?

A. The linear model is a good fit for the data.

B. The linear model is not appropriate; a nonlinear model may be better.

C. The correlation coefficient r is close to 0.

D. There are influential outliers that must be removed.

E. The sample size is too small to draw any conclusions.

Correct Answer: B

A curved (U-shaped or arch-shaped) pattern in a residual plot is a clear diagnostic signal that the relationship between x and y is not linear. The linear model is systematically over- or under-predicting at different values of x.

Good linear fit → residuals scattered randomly (no pattern)
Curved residual plot → relationship is nonlinear → linear model inappropriate

A well-fitted linear model should show residuals randomly scattered around the horizontal line at 0, with no discernible pattern. A U-shape suggests a quadratic or exponential model might better capture the relationship.
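To see where a U-shaped residual pattern comes from, the sketch below fits a least-squares line to made-up data that actually follow y = x². The data set and the helper name `fit_lsrl` are assumptions chosen for illustration.

```python
def fit_lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares line for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# HYPOTHETICAL data with a curved (quadratic) relationship.
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]

b0, b1 = fit_lsrl(xs, ys)
# Residual = observed y minus predicted y-hat.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(b0, b1, residuals)
```

The residuals are positive at both ends and negative in the middle, which is the U shape the question describes. They still sum to zero, as LSRL residuals always do.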

Concept Unit 4 — Probability

Binomial Distribution

A Binomial setting requires: Binary outcomes (success/failure), Independent trials, a fixed Number of trials n, and the Same probability of success p on every trial (BINS). X ~ B(n, p): P(X = k) = C(n,k) · pᵏ · (1−p)ⁿ⁻ᵏ. Mean = np; SD = √(np(1−p)).

📌 Must Memorize

BINS: Binary, Independent, Number of trials fixed, Same p
P(X=k) = C(n,k) · pᵏ(1−p)ⁿ⁻ᵏ
μ = np
σ = √(np(1−p))

17 Hard Unit 4

Suppose 30% of emails received by a company are spam. In a sample of 10 randomly selected emails, what is the probability that exactly 3 are spam? (Note: C(10,3) = 120)

A. Approximately 0.267

B. Approximately 0.300

C. Approximately 0.057

D. Approximately 0.900

E. Approximately 0.233

Correct Answer: A

This is a Binomial setting: n = 10, p = 0.30, X = number of spam emails. We want P(X = 3):

P(X = 3) = C(10,3) · (0.30)³ · (0.70)⁷
= 120 · 0.027 · 0.0823543
= 120 · 0.002223566
≈ 0.2668 ≈ 0.267

Detailed calculation: (0.30)³ = 0.027; (0.70)⁷ = 0.0823543; 120 × 0.027 × 0.0823543 ≈ 0.2668. Answer: A ≈ 0.267.
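The binomial formula maps directly onto `math.comb` from the Python standard library. This is a minimal sketch; the function name `binomial_pmf` is illustrative, and the mean and SD are included as an extra check on the formulas from the concept review.

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.30  # 10 emails, 30% spam rate
prob_exactly_3 = binomial_pmf(3, n, p)

mean_spam = n * p                # mu = np
sd_spam = sqrt(n * p * (1 - p))  # sigma = sqrt(np(1 - p))

print(round(prob_exactly_3, 4), mean_spam, round(sd_spam, 3))
```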

Concept Unit 6 — Two-Proportion Z-Test

Comparing Two Proportions

To test H₀: p₁ = p₂, use the combined (pooled) proportion p̂_c = (x₁ + x₂)/(n₁ + n₂). The test statistic is z = (p̂₁ − p̂₂) / SE, where SE = √[p̂_c(1−p̂_c)(1/n₁ + 1/n₂)].

📌 Must Memorize

H₀: p₁ = p₂ (use pooled p̂_c)
p̂_c = (x₁ + x₂) / (n₁ + n₂)
z = (p̂₁ − p̂₂) / √[p̂_c(1−p̂_c)(1/n₁ + 1/n₂)]
CI for (p₁−p₂): use individual p̂₁, p̂₂ (NOT pooled)

18 Hard Unit 6

In City A, 150 out of 300 voters support a ballot measure. In City B, 120 out of 200 voters support it. A researcher tests H₀: p_A = p_B vs. Hₐ: p_A ≠ p_B. What is the pooled sample proportion p̂_c?

A. 0.50

B. 0.54

C. 0.55

D. 0.60

E. 0.575

Correct Answer: B

The pooled proportion combines the successes and sample sizes from both groups:

p̂_c = (x_A + x_B) / (n_A + n_B)
= (150 + 120) / (300 + 200)
= 270 / 500
= 0.54

The pooled proportion is 0.54, answer B. Verify: p̂_A = 150/300 = 0.50; p̂_B = 120/200 = 0.60. The pooled estimate (0.54) lies between these two individual proportions, weighted by sample size.
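The question stops at the pooled proportion, but the same numbers can be carried through to the z statistic from the concept review. The sketch below does both; the function name is illustrative, and the z value is an extension beyond what the question asks.

```python
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Return (pooled proportion, z statistic) for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    # Pooled standard error, valid under H0: p1 = p2.
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return p_pooled, (p1 - p2) / se

# City A: 150 of 300; City B: 120 of 200.
p_c, z = two_proportion_z(150, 300, 120, 200)
print(p_c, round(z, 3))
```

The z statistic is negative because City A's sample proportion (0.50) is below City B's (0.60).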

Concept Unit 5 — Sampling Distributions

Unbiased Estimators & Variability

A statistic is an unbiased estimator of a parameter if its sampling distribution is centered at the parameter value (i.e., the mean of all possible sample statistics equals the true parameter). Reducing variability requires larger sample sizes.

📌 Must Memorize

Unbiased: E(statistic) = parameter
x̄ is an unbiased estimator of μ
p̂ is an unbiased estimator of p
s² is an unbiased estimator of σ²
Larger n → less variability (not less bias)

19 Medium Unit 5

A statistics teacher wants to estimate the mean height of all students in the school. She takes many random samples of size n = 30 and records the sample mean for each. Which of the following BEST describes the center of the sampling distribution of x̄?

A. It is equal to the mean of the largest sample taken.

B. It is equal to the population mean μ, making x̄ an unbiased estimator.

C. It is slightly larger than μ because samples tend to overestimate the mean.

D. It depends on the shape of the population distribution.

E. It cannot be determined without knowing the population standard deviation.

Correct Answer: B

The sampling distribution of x̄ is always centered at the population mean μ, regardless of the population distribution or sample size. This means x̄ is an unbiased estimator of μ.

E(x̄) = μ (always; this is what "unbiased" means)
SD(x̄) = σ/√n (decreases as n increases)

Key distinction: The center (μ) doesn't depend on n. The variability (σ/√n) does depend on n. A larger n makes the distribution more concentrated around μ, but the center remains at μ regardless.
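Unbiasedness can be verified exactly on a tiny population by listing every possible sample. The sketch below uses a made-up population of five heights and samples of size 2, far smaller than the n = 30 in the question, so that all samples can be enumerated.

```python
from itertools import combinations
from statistics import mean

# HYPOTHETICAL population of five student heights (inches).
population = [60, 62, 65, 70, 73]
mu = mean(population)

# Every possible sample of size 2, and the mean of each one.
sample_means = [mean(sample) for sample in combinations(population, 2)]

# Center of the sampling distribution of x-bar.
center = mean(sample_means)
print(len(sample_means), mu, center)
```

Individual sample means range from 61 to 71.5, yet their average equals μ exactly, which is what "unbiased" means.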

Concept Unit 8 — Chi-Square Goodness of Fit

Chi-Square Goodness-of-Fit Test

Used to test whether a single categorical variable follows a hypothesized distribution. df = (number of categories − 1). All expected counts must be ≥ 5. A large χ² provides evidence against the stated distribution.

📌 Must Memorize

H₀: Data follow the specified distribution
χ² = Σ[(O − E)² / E]
df = (number of categories) − 1
χ² is always ≥ 0; right-tailed test only
Expected = n × stated proportion

20 Hard Unit 8

A die is rolled 120 times. A researcher tests whether the die is fair. The observed frequencies for faces 1–6 are recorded. Under the null hypothesis that the die is fair, what is the expected count for each face, and what are the degrees of freedom for the chi-square goodness-of-fit test?

A. Expected = 120; df = 5

B. Expected = 20; df = 5

C. Expected = 20; df = 6

D. Expected = 20; df = 119

E. Expected = 6; df = 5

Correct Answer: B

For a fair die, each of the 6 faces has probability 1/6.

Expected count per face = n × p = 120 × (1/6) = 20
df = (number of categories) − 1 = 6 − 1 = 5

All expected counts = 20 ≥ 5 ✓, so the condition for the chi-square test is met. If the observed counts deviate substantially from 20, the χ² statistic will be large, providing evidence that the die is not fair. With df = 5 and α = 0.05, the critical value is χ² = 11.07.
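The question does not list the observed frequencies, so the sketch below uses made-up counts that total 120 to show the full goodness-of-fit calculation. The counts and the function name are hypothetical; the critical value 11.07 is the one quoted in the solution.

```python
def goodness_of_fit(observed, proportions):
    """Return (expected counts, chi-square statistic, df)."""
    n = sum(observed)
    expected = [n * p for p in proportions]  # Expected = n * stated proportion
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, statistic, len(observed) - 1

# HYPOTHETICAL observed counts for faces 1 to 6 (120 rolls in total).
observed = [25, 17, 15, 23, 24, 16]
expected, chi_sq, df = goodness_of_fit(observed, [1 / 6] * 6)

critical_value = 11.07  # df = 5, alpha = 0.05, from the solution above
reject_h0 = chi_sq > critical_value
print([round(e, 1) for e in expected], round(chi_sq, 2), df, reject_h0)
```

With these particular counts the statistic falls well below 11.07, so the test would not reject the hypothesis of a fair die.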

Exam Complete
