AP Statistics
20 Exam-Style Questions · All 9 Units · Full Solutions
The mean is the arithmetic average and is sensitive to outliers. The median is the middle value and is resistant to outliers. Standard deviation (s) measures typical distance from the mean. The IQR (Q3 − Q1) is the resistant measure of spread. Use mean/SD for symmetric distributions; median/IQR for skewed distributions.
Data: {2, 4, 6, 8, 100}. Which is larger, the mean or the median?
→ Mean = 24, Median = 6. The outlier (100) pulls the mean up. Mean > Median.
A dataset of household incomes in a city is strongly right-skewed. Which of the following correctly describes the relationship between the mean and median?
The Empirical Rule (68–95–99.7 Rule): In a Normal distribution, approximately 68% of data falls within 1 SD, 95% within 2 SD, and 99.7% within 3 SD of the mean. A z-score measures how many standard deviations an observation is from the mean.
Heights: μ = 68 in, σ = 3 in. What % are between 62 and 74 inches?
→ 62 = μ − 2σ, 74 = μ + 2σ → Empirical Rule: 95%
The scores on an AP Statistics exam are approximately normally distributed with a mean of 72 and a standard deviation of 8. What is the approximate percentage of students who scored between 56 and 88?
The correlation coefficient r measures the direction and strength of a linear relationship (−1 ≤ r ≤ 1). The least-squares regression line (LSRL) minimizes the sum of squared residuals. The slope b₁ = r(Sᵧ/Sₓ) and it passes through (x̄, ȳ).
r = 0.9, Sᵧ = 5, Sₓ = 2, x̄ = 10, ȳ = 20. Find the LSRL.
→ b₁ = 0.9(5/2) = 2.25; b₀ = 20 − 2.25(10) = −2.5 → ŷ = −2.5 + 2.25x
A regression analysis of hours studied (x) and exam score (y) for 30 students yields the following: x̄ = 5, ȳ = 75, Sₓ = 2, Sᵧ = 10, and r = 0.8. What is the slope of the least-squares regression line?
A simple random sample (SRS) gives every individual an equal chance of selection. Stratified sampling divides the population into groups and randomly samples each. Cluster sampling randomly selects groups. Voluntary response and convenience samples are biased.
A radio station asks listeners to call in and vote for their favorite song. Of the 2,000 callers, 85% voted for Song A. Which type of bias is most likely present in this survey?
A well-designed experiment includes: random assignment of treatments, control groups, and replication. A confounding variable is associated with both the explanatory and response variable. Blocking groups subjects by a known variable before random assignment.
A researcher wants to test whether a new fertilizer increases tomato yield. She plants 40 tomato plants and randomly assigns 20 to receive the new fertilizer and 20 to receive a standard fertilizer. She measures the yield (in pounds) at the end of the season. Which aspect of this design BEST allows the researcher to conclude that any difference in yield is caused by the fertilizer?
For any events A and B: P(A or B) = P(A) + P(B) − P(A and B). Events are independent if P(A|B) = P(A). Conditional probability: P(A|B) = P(A and B) / P(B).
In a statistics class, 60% of students passed the midterm (M), 70% passed the final (F), and 50% passed both. What is the probability that a randomly selected student passed the midterm given that they passed the final?
The expected value (mean) of a discrete random variable X is μₓ = Σ[x · P(X=x)]. The variance is σ²ₓ = Σ[(x − μ)² · P(X=x)]. For linear transformations: if Y = a + bX, then μᵧ = a + bμₓ and σᵧ = |b|σₓ.
A game awards the following payouts based on the result of spinning a wheel:
| Payout ($) | 0 | 5 | 10 | 20 |
|---|---|---|---|---|
| P(X) | 0.50 | 0.25 | 0.20 | 0.05 |
What is the expected payout per spin?
If the sample size n is large enough (generally n ≥ 30), the sampling distribution of x̄ is approximately Normal, regardless of the shape of the population distribution. The mean of the sampling distribution is μ, and the standard deviation (standard error) is σ/√n.
The weight of packages shipped by a courier service has a population mean of 12 lbs and a population standard deviation of 4 lbs. A random sample of 64 packages is selected. What is the standard deviation of the sampling distribution of the sample mean?
For large samples, the sampling distribution of p̂ is approximately Normal with mean p and standard deviation √(p(1−p)/n). Conditions: np ≥ 10 and n(1−p) ≥ 10 (Large Counts Condition).
Suppose 40% of voters in a large city support Candidate A. A pollster takes an SRS of 100 voters. What is the probability that the sample proportion supporting Candidate A is greater than 0.45?
A confidence interval for a population proportion p uses the formula: p̂ ± z* · √(p̂(1−p̂)/n). The margin of error (ME) = z* · SE. A 95% CI uses z* = 1.96; 90% uses z* = 1.645; 99% uses z* = 2.576.
A survey of 400 students found that 240 own a laptop. A statistician constructs a 95% confidence interval for the true proportion of students who own a laptop. Which of the following is the correct interval?
A p-value is the probability of observing a result at least as extreme as the one obtained, assuming H₀ is true. If p-value ≤ α, reject H₀. The significance level α is set before the test (commonly 0.05).
A researcher conducts a significance test with α = 0.05 and obtains a p-value of 0.032. Which of the following is the correct conclusion?
When σ is unknown, use the t-distribution with df = n − 1. The test statistic is t = (x̄ − μ₀) / (s/√n). Conditions: Random sample, Normal/large sample (n ≥ 30 or data roughly Normal), and 10% condition.
A gym claims that the average weight loss of its members after 3 months is more than 10 lbs. A random sample of 25 members shows x̄ = 11.4 lbs and s = 3.5 lbs. Which of the following is the value of the test statistic for testing H₀: μ = 10 vs. Hₐ: μ > 10?
A Type I error (false positive) occurs when we reject H₀ when it is actually true. Its probability = α. A Type II error (false negative) occurs when we fail to reject H₀ when Hₐ is actually true. Its probability = β. Power = 1 − β = probability of correctly rejecting a false H₀.
A pharmaceutical company tests whether a new drug reduces blood pressure. The null hypothesis is that the drug has no effect. The company concludes the drug is effective, but in reality the drug has no effect. What type of error has been made?
Used to test whether two categorical variables are associated. The test statistic is χ² = Σ[(O − E)² / E], where O = observed and E = expected count. df = (r−1)(c−1) for a two-way table with r rows and c columns. Expected count = (row total × column total) / grand total.
A researcher surveys 200 students and records their gender (Male/Female) and whether they prefer coffee or tea. The data are displayed in a 2×2 table. For a chi-square test of independence, what are the degrees of freedom?
We can test whether the slope β₁ of the population regression line is zero (meaning no linear relationship). The test statistic is t = b₁ / SE(b₁), with df = n − 2. A CI for β₁ is b₁ ± t* · SE(b₁).
Computer output for a regression of y on x based on n = 22 observations gives a slope of b₁ = 3.2 with SE(b₁) = 1.6. At α = 0.05 (two-tailed), which conclusion is correct?
A residual = Observed y − Predicted ŷ. A residual plot shows residuals on the y-axis and x (or ŷ) on the x-axis. A linear model is appropriate when the residual plot shows no pattern (random scatter). Patterns suggest the linear model is not appropriate.
A researcher fits a linear regression model to data. The residual plot shows a clear U-shaped (curved) pattern. What does this indicate?
A Binomial setting requires: Binary outcomes (S/F), fixed number of trials n, Independent trials, same probability of success p (BSIP). X ~ B(n, p): P(X = k) = C(n,k) · pᵏ · (1−p)ⁿ⁻ᵏ. Mean = np; SD = √(np(1−p)).
Suppose 30% of emails received by a company are spam. In a sample of 10 randomly selected emails, what is the probability that exactly 3 are spam? (Note: C(10,3) = 120)
To test H₀: p₁ = p₂, use the combined (pooled) proportion p̂_c = (x₁ + x₂)/(n₁ + n₂). The test statistic is z = (p̂₁ − p̂₂) / SE, where SE = √[p̂_c(1−p̂_c)(1/n₁ + 1/n₂)].
In City A, 150 out of 300 voters support a ballot measure. In City B, 120 out of 200 voters support it. A researcher tests H₀: p_A = p_B vs. Hₐ: p_A ≠ p_B. What is the pooled sample proportion p̂_c?
A statistic is an unbiased estimator of a parameter if its sampling distribution is centered at the parameter value (i.e., the mean of all possible sample statistics equals the true parameter). Reducing variability requires larger sample sizes.
A statistics teacher wants to estimate the mean height of all students in the school. She takes many random samples of size n = 30 and records the sample mean for each. Which of the following BEST describes the center of the sampling distribution of x̄?
Used to test whether a single categorical variable follows a hypothesized distribution. df = (number of categories − 1). All expected counts must be ≥ 5. A large χ² provides evidence against the stated distribution.
A die is rolled 120 times. A researcher tests whether the die is fair. The observed frequencies for faces 1–6 are recorded. Under the null hypothesis that the die is fair, what is the expected count for each face, and what are the degrees of freedom for the chi-square goodness-of-fit test?
Exam Complete