Slide 1 — Statistical inference about the difference of population means(第1页——总体均值差的统计推断)

Knowledge Points (知识点)

  1. Inference for the difference of two population means(两个总体均值差的统计推断)
  2. Case 1: population standard deviations σ₁ and σ₂ known(情形一:已知总体标准差 σ₁、σ₂)
  3. Case 2: population standard deviations σ₁ and σ₂ unknown(情形二:未知总体标准差 σ₁、σ₂)

Explanation(解释)

  • We study how to use two independent samples to make inferences about the difference in population means, $\mu_1 - \mu_2$.
  • There are two main scenarios:
    • When both population standard deviations $\sigma_1$ and $\sigma_2$ are known, we use the z-distribution.
    • When at least one of $\sigma_1$, $\sigma_2$ is unknown, we estimate them using the sample standard deviations $s_1$, $s_2$ and use t-based methods.
  • The goal is to construct confidence intervals and perform hypothesis tests about $\mu_1 - \mu_2$.

Example(例子)

  • Suppose we compare average spending of Group 1 (students who mainly shop online) and Group 2 (students who mainly shop offline).
  • We want to know whether the population means differ, that is, whether $\mu_1 - \mu_2 \neq 0$.
  • Depending on whether $\sigma_1$ and $\sigma_2$ are known or unknown, we choose different formulas and distributions, but the target parameter is always $\mu_1 - \mu_2$.

Extension(拓展)

  • This framework applies to many business problems: comparing two products, two branches, or two marketing strategies.
  • Later sections will also discuss assumptions such as independence, normality, and large-sample approximations that justify using z or t procedures.

Summary(小结)

  • This chapter introduces two-sample inference for the difference in means, with separate methods for known and unknown population standard deviations.

Slide 2 — Notation for two-population mean comparison(第2页——两个总体均值比较的符号约定)

Knowledge Points (知识点)

  1. Population parameters (总体参数:均值与标准差)
  2. Sample statistics (样本统计量:均值与样本量)
  3. Difference of population means and difference of sample means (总体均值差与样本均值差)

Explanation(解释)

  • Population 1 has mean $\mu_1$ and standard deviation $\sigma_1$; Population 2 has mean $\mu_2$ and standard deviation $\sigma_2$.
  • We draw independent samples:
    • Sample 1: size $n_1$, sample mean $\bar{x}_1$ from Population 1.
    • Sample 2: size $n_2$, sample mean $\bar{x}_2$ from Population 2.
  • The parameter of interest is the difference in population means, $\mu_1 - \mu_2$.
  • The point estimator of $\mu_1 - \mu_2$ is the difference in sample means, $\bar{x}_1 - \bar{x}_2$.

Example(例子)

  • Population 1: monthly online spending of all business students; Population 2: monthly online spending of all non-business students.
  • We sample $n_1$ business students and $n_2$ non-business students and compute their sample means $\bar{x}_1$ and $\bar{x}_2$.
  • We will use $\bar{x}_1 - \bar{x}_2$ to estimate $\mu_1 - \mu_2$.

Extension(拓展)

  • The notation extends naturally to paired-sample designs, but here we focus on independent samples.
  • Clear notation helps when we later derive the sampling distribution of $\bar{x}_1 - \bar{x}_2$ and construct confidence intervals.

Summary(小结)

  • We distinguish clearly between population parameters and sample statistics, and recognize $\bar{x}_1 - \bar{x}_2$ as the point estimator of $\mu_1 - \mu_2$.

Slide 3 — Distribution of the difference of sample means(第3页——样本均值差的分布)

Knowledge Points (知识点)

  1. Expected value of $\bar{x}_1 - \bar{x}_2$(样本均值差的期望)
  2. Standard deviation / standard error of $\bar{x}_1 - \bar{x}_2$ when $\sigma_1$, $\sigma_2$ are known(已知总体标准差时的标准差/标准误)
  3. Role of sample sizes $n_1$, $n_2$(样本量对变异性的影响)

Explanation(解释)

  • Under the usual assumptions (independent samples, each from a population with mean $\mu_i$ and variance $\sigma_i^2$, $i = 1, 2$), the expected value of the difference in sample means is $E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2$.
  • When the population standard deviations $\sigma_1$, $\sigma_2$ are known, the standard deviation (also called the standard error) of $\bar{x}_1 - \bar{x}_2$ is $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$.
  • Larger sample sizes $n_1$, $n_2$ make the standard error smaller, leading to more precise estimates.
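
The standard-error formula above can be sketched in a few lines of Python. The function name `standard_error` and the numeric inputs are hypothetical, chosen only to show how the standard error shrinks as the sample sizes grow.

```python
import math

def standard_error(sigma1, n1, sigma2, n2):
    """Standard error of (xbar1 - xbar2) for independent samples, sigmas known."""
    return math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# Hypothetical inputs: same sigmas, sample sizes quadrupled in the second call.
se_small = standard_error(sigma1=10, n1=25, sigma2=10, n2=25)
se_large = standard_error(sigma1=10, n1=100, sigma2=10, n2=100)

print(f"n = 25 each:  SE = {se_small:.3f}")
print(f"n = 100 each: SE = {se_large:.3f}")
```

With these inputs, quadrupling both sample sizes halves the standard error, because each variance term is divided by its sample size before the square root is taken.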

Example(例子)

  • Let .
  • Then
  • This tells us the typical sampling variation of the difference in sample means.

Extension(拓展)

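  • When both populations are normal or when sample sizes are large, the distribution of $\bar{x}_1 - \bar{x}_2$ is approximately normal with mean $\mu_1 - \mu_2$ and standard deviation $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$.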
  • This normality justifies using z-based confidence intervals and hypothesis tests in the “σ known” case.

Summary(小结)

  • The difference in sample means is an unbiased estimator of the difference in population means, and its standard error depends on both population variances and sample sizes.

Slide 4 — Confidence interval for μ₁ − μ₂ when σ₁, σ₂ are known(第4页——已知 σ₁、σ₂ 时的均值差置信区间)

Knowledge Points (知识点)

  1. Point estimate $\bar{x}_1 - \bar{x}_2$(总体均值差的点估计)
  2. $100(1-\alpha)\%$ confidence interval formula with known σ₁, σ₂(已知总体标准差时的置信区间公式)
  3. Meaning of significance level $\alpha$ in a two-tailed interval(双侧置信区间中的显著性水平)

Explanation(解释)

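  • The point estimate of the difference in population means is $\bar{x}_1 - \bar{x}_2$.
  • When $\sigma_1$, $\sigma_2$ are known and the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is (approximately) normal, a $100(1-\alpha)\%$ confidence interval for $\mu_1 - \mu_2$ is $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$.
  • Here $z_{\alpha/2}$ is the critical value from the standard normal distribution such that the two tails together have area $\alpha$.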
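
As an illustration of the interval formula, here is a minimal Python sketch using only the standard library. The function name `z_interval` and the input values are hypothetical and are not taken from the slides.

```python
import math
from statistics import NormalDist

def z_interval(xbar1, xbar2, sigma1, sigma2, n1, n2, confidence=0.95):
    """Two-sided z interval for mu1 - mu2 when sigma1 and sigma2 are known."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    std_error = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    diff = xbar1 - xbar2                           # point estimate
    return diff - z_crit * std_error, diff + z_crit * std_error

# Hypothetical inputs, for illustration only.
low, high = z_interval(xbar1=50, xbar2=40, sigma1=8, sigma2=6, n1=36, n2=36)
print(f"95% CI for mu1 - mu2: ({low:.2f}, {high:.2f})")
```

The interval is always centred on the point estimate, and its half-width is the critical value times the combined standard error.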

Example(例子)

  • Suppose .
  • For a 95% confidence interval, $\alpha = 0.05$ and $z_{\alpha/2} = z_{0.025} = 1.96$.
  • The standard error is
  • The interval is $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\,\sigma_{\bar{x}_1 - \bar{x}_2} = 10 \pm 6.54 = (3.46,\ 16.54)$.
  • We are 95% confident that $\mu_1 - \mu_2$ lies between 3.46 and 16.54.

Extension(拓展)

  • The same formula can be adapted for one-sided intervals by using $z_\alpha$ instead of $z_{\alpha/2}$.
  • The structure parallels the one-sample z-interval, but now the standard error combines the variability from both populations.

Summary(小结)

  • With known population standard deviations, the confidence interval for $\mu_1 - \mu_2$ is built from the point estimate plus/minus a z-critical value times the combined standard error.

Slide 5 — Confidence interval for μ₁ − μ₂(第5页——均值差置信区间)

Knowledge Points (知识点)

  1. Point estimate for $\mu_1 - \mu_2$(总体均值差的点估计)
  2. Confidence interval formula with known $\sigma_1$, $\sigma_2$(已知总体标准差时的置信区间公式)
  3. Significance level $\alpha$ and two-tailed intervals(显著性水平与双侧区间)

Explanation(解释)

  • The point estimate of the difference in population means is $\bar{x}_1 - \bar{x}_2$.
  • When population standard deviations $\sigma_1$, $\sigma_2$ are known and samples are independent, a $100(1-\alpha)\%$ confidence interval for $\mu_1 - \mu_2$ is $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$.
  • Here $z_{\alpha/2}$ is the critical value from the standard normal distribution such that each tail has probability $\alpha/2$.
  • $\alpha$ is the significance level for a two-tailed interval: it is the total probability outside the interval.

Example(例子)

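  • For a 95% confidence interval, we set $\alpha = 0.05$.
  • The corresponding critical value is $z_{\alpha/2} = z_{0.025} = 1.96$.
  • Any confidence interval using 95% confidence will have the form $(\bar{x}_1 - \bar{x}_2) \pm 1.96\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$.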
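
The critical value for other confidence levels can be looked up the same way. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value z_{alpha/2} for a given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
# 90% confidence: z = 1.645
# 95% confidence: z = 1.960
# 99% confidence: z = 2.576
```

Higher confidence means a larger critical value and therefore a wider interval for the same data.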

Extension(拓展)

  • If the hypothesized difference $D_0$ (often $D_0 = 0$) lies outside the confidence interval, we will later reject the null hypothesis in a two-tailed test.
  • Thus confidence intervals and hypothesis tests are closely connected.

Summary(小结)

  • With known $\sigma_1$, $\sigma_2$, the confidence interval for $\mu_1 - \mu_2$ is constructed by taking the point estimate plus/minus a z-critical value times the combined standard error.

Slide 6 — Example setup: ABC company vs competitor(第6页——示例设定:ABC 公司与竞争对手)

Knowledge Points (知识点)

  1. Comparing two population means using sample information(用样本信息比较两个总体均值)
  2. Identifying $n_1, n_2, \bar{x}_1, \bar{x}_2, \sigma_1, \sigma_2$ from a table(从表格中识别参数)
  3. Significance level $\alpha$ in practice(实际问题中的显著性水平)

Explanation(解释)

  • The ABC company compares the average life of its own product (Sample 1) with that of a competitor (Sample 2).
  • Both samples are independent, and population standard deviations are assumed known.

Example(例子)

Data table(数据表)

                                | Sample 1 (ABC) | Sample 2 (Competitor)
Sample size ($n$)               | 120 units      | 80 units
Sample mean ($\bar{x}$)         | 275 min        | 258 min
Standard deviation ($\sigma$)   | 15 min         | 20 min

  • Significance level: $\alpha = 0.05$.

Extension(拓展)

  • The question “Is there any difference?” corresponds to testing $H_0: \mu_1 - \mu_2 = 0$ against $H_a: \mu_1 - \mu_2 \neq 0$, or to constructing a confidence interval for $\mu_1 - \mu_2$ and checking whether 0 is included.

Summary(小结)

  • This example provides realistic sample data and a chosen significance level, allowing us to compute a confidence interval for $\mu_1 - \mu_2$ and judge whether ABC’s product differs from its competitor’s.

Slide 7 — Example calculation: confidence interval and conclusion(第7页——示例计算:置信区间与结论)

Knowledge Points (知识点)

  1. Computing the standard error of $\bar{x}_1 - \bar{x}_2$(计算样本均值差的标准误)
  2. Finding the confidence interval numerically(数值上求出置信区间)
  3. Using the interval to judge significance(利用置信区间判断显著性)

Explanation(解释)

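  • From the ABC example, the point estimate is $\bar{x}_1 - \bar{x}_2 = 275 - 258 = 17$ min.
  • The standard error of $\bar{x}_1 - \bar{x}_2$ is $\sqrt{\dfrac{15^2}{120} + \dfrac{20^2}{80}} = \sqrt{1.875 + 5} = \sqrt{6.875} \approx 2.622$.
  • With $\alpha = 0.05$, the critical value is $z_{\alpha/2} = z_{0.025} = 1.96$.
  • The confidence interval is $17 \pm 1.96(2.622) = 17 \pm 5.14 = (11.86,\ 22.14)$.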
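
The ABC calculation can be checked step by step with a short Python sketch. The data come from the table on Slide 6; the variable names are illustrative only.

```python
import math
from statistics import NormalDist

# Data from the ABC example (Slide 6).
n1, xbar1, sigma1 = 120, 275.0, 15.0   # ABC
n2, xbar2, sigma2 = 80, 258.0, 20.0    # competitor
alpha = 0.05

point_estimate = xbar1 - xbar2
std_error = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
margin = z_crit * std_error
lower, upper = point_estimate - margin, point_estimate + margin

print(f"point estimate = {point_estimate:.0f}")
print(f"standard error = {std_error:.3f}")
print(f"95% CI = ({lower:.2f}, {upper:.2f})")
```

The lower bound is well above 0, which is what supports the interpretation that follows.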

Example(例子):Interpretation(区间解释)

  • Because the entire interval is above 0, we can say:
    • At the 5% significance level, there is significant evidence that ABC’s product has a larger mean life than its competitor’s.
    • The estimated difference in mean life is between about 12 and 22 minutes.

Extension(拓展)

  • If 0 had been contained in the interval, we would conclude that the data are consistent with no difference in mean lifetimes.
  • This logic is equivalent to performing a two-tailed hypothesis test with null hypothesis $H_0: \mu_1 - \mu_2 = 0$.

Summary(小结)

  • For the ABC example, the 95% confidence interval shows a positive difference far from 0, indicating that ABC’s product performs significantly better than the competitor’s in terms of mean life.

Slide 8 — Hypothesis tests about μ₁ − μ₂ with known σ₁, σ₂(第8页——已知 σ₁、σ₂ 时的均值差假设检验)

Knowledge Points (知识点)

  1. Null and alternative hypotheses for comparing two means(比较两个均值的原假设与备择假设)
  2. Three types of tests: left-tailed, right-tailed, two-tailed(三种检验形式:左尾、右尾、双尾)
  3. z test statistic for $\mu_1 - \mu_2$ when σ’s are known(已知总体标准差时的 z 检验统计量)

Explanation(解释)

Hypotheses(假设形式)

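  • Left-tailed test (testing if population 1 mean is smaller): $H_0: \mu_1 - \mu_2 \ge D_0$ versus $H_a: \mu_1 - \mu_2 < D_0$.
  • Right-tailed test (testing if population 1 mean is larger): $H_0: \mu_1 - \mu_2 \le D_0$ versus $H_a: \mu_1 - \mu_2 > D_0$.
  • Two-tailed test (testing if there is any difference): $H_0: \mu_1 - \mu_2 = D_0$ versus $H_a: \mu_1 - \mu_2 \neq D_0$.
  • Test statistic (known $\sigma_1$, $\sigma_2$): $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$, where $D_0$ is the hypothesized difference.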

Example(例子)

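  • For the ABC company, to test whether there is any difference, we set $D_0 = 0$ and use the two-tailed hypotheses $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 \neq 0$.
  • The test statistic becomes $z = \dfrac{(275 - 258) - 0}{\sqrt{\dfrac{15^2}{120} + \dfrac{20^2}{80}}} = \dfrac{17}{2.622} \approx 6.48$, which is far into the rejection region for $\alpha = 0.05$ (it is much larger than the critical value $z_{0.025} = 1.96$), so we reject $H_0$.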


Extension(拓展)

  • Decision rules:
    • Left-tailed: reject $H_0$ if $z < -z_\alpha$.
    • Right-tailed: reject $H_0$ if $z > z_\alpha$.
    • Two-tailed: reject $H_0$ if $|z| > z_{\alpha/2}$.
  • p-value methods lead to equivalent conclusions.
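
The three decision rules can be collected into one function. This is a minimal sketch under the slide's assumptions (independent samples, known population standard deviations); the function name `two_sample_z_test` and the parameter names `d0` and `tail` are hypothetical.

```python
import math
from statistics import NormalDist

def two_sample_z_test(xbar1, xbar2, sigma1, sigma2, n1, n2,
                      d0=0.0, tail="two", alpha=0.05):
    """Return (z, reject_H0) for a test about mu1 - mu2 with known sigmas."""
    std_error = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z = (xbar1 - xbar2 - d0) / std_error
    std_normal = NormalDist()
    if tail == "left":
        reject = z < -std_normal.inv_cdf(1 - alpha)
    elif tail == "right":
        reject = z > std_normal.inv_cdf(1 - alpha)
    elif tail == "two":
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return z, reject

# ABC example data (Slide 6), two-tailed test at alpha = 0.05.
z, reject = two_sample_z_test(275, 258, 15, 20, 120, 80, tail="two")
print(f"z = {z:.2f}, reject H0: {reject}")
```

Only the comparison against the critical value changes between the three forms; the test statistic itself is computed the same way each time.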

Summary(小结)

  • Hypothesis tests about $\mu_1 - \mu_2$ specify a null value $D_0$, choose the appropriate tail form, and use the z statistic based on the combined standard error when $\sigma_1$, $\sigma_2$ are known.

Slide 9 — One-sided test: ABC product vs competitor (p-value)(第9页——单侧检验:ABC 产品与竞争对手(p 值法))

Knowledge Points (知识点)

  1. Right-tailed test for difference of means(均值差的右尾检验)
  2. p-value vs. significance level $\alpha$(p 值与显著性水平的比较)
  3. Interpretation: “significantly higher” vs “not higher”(“显著更高”的解释)

Explanation(解释)

  • Question: Is ABC’s mean product life higher than the competitor’s?
  • We use a right-tailed test for the difference of two population means.

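Step 1: Hypotheses

  • $H_0: \mu_1 - \mu_2 \le 0$ versus $H_a: \mu_1 - \mu_2 > 0$, where
  • $\mu_1$: mean life of ABC’s product
  • $\mu_2$: mean life of competitor’s product

Step 2: Significance level

  • $\alpha = 0.01$

Step 3: Test statistic and p-value

  • $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \dfrac{275 - 258}{\sqrt{\dfrac{15^2}{120} + \dfrac{20^2}{80}}} = \dfrac{17}{2.622} \approx 6.48$
  • For $z = 6.48$, the p-value < 0.001 (very close to 0).

Decision rule (right-tailed):
If p-value < $\alpha$, reject $H_0$.

Since p-value < 0.001 < 0.01 = $\alpha$, we reject $H_0$ and support $H_a: \mu_1 - \mu_2 > 0$.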
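
The right-tailed p-value can be computed directly rather than read from a table. The sketch below uses the ABC data from Slide 6 and Python's standard-library `statistics.NormalDist`; the upper-tail area is taken as the CDF at −z (valid by symmetry of the normal distribution) to avoid rounding loss when z is large.

```python
import math
from statistics import NormalDist

alpha = 0.01
std_error = math.sqrt(15**2 / 120 + 20**2 / 80)
z = (275 - 258 - 0) / std_error

# Right-tailed p-value: P(Z > z) = P(Z < -z) for a standard normal Z.
p_value = NormalDist().cdf(-z)

print(f"z = {z:.2f}")
print(f"p-value = {p_value:.2e}")
print(f"reject H0 at alpha = {alpha}: {p_value < alpha}")
```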

Example(例子)结论

  • We conclude that ABC’s mean life is significantly higher than the competitor’s at the 1% significance level.

Extension(拓展)

  • The p-value approach does not require computing the critical value.
  • It directly tells how extreme the observed $z$ is under $H_0$.

Summary(小结)

  • For a one-sided right-tailed test, if the computed p-value is smaller than $\alpha$, we conclude that population 1’s mean is significantly larger than population 2’s mean.

Slide 10 — One-sided test: ABC vs competitor (critical value)(第10页——单侧检验:ABC 与竞争对手(临界值法))

Knowledge Points (知识点)

  1. z-critical value $z_\alpha$ for right-tailed test(右尾检验的临界值)
  2. Compare test statistic $z$ with $z_\alpha$(用 $z$ 与 $z_\alpha$ 比较做决策)
  3. Connection to p-value approach(与 p 值法的一致性)

Explanation(解释)

Same hypotheses and data as Slide 9:

  • Test statistic is still $z = 6.48$.

Critical value for right-tailed test

  • With $\alpha = 0.01$, the critical value is $z_\alpha = z_{0.01} = 2.33$.

Decision rule (right-tailed):
Reject $H_0$ if $z > z_\alpha$.

Since $z = 6.48 > 2.33 = z_{0.01}$, we reject $H_0$ and conclude that ABC’s mean is significantly higher.
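
A short Python sketch can confirm that the critical-value rule and the p-value rule agree for the ABC data. It uses only the standard library, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

alpha = 0.01
std_error = math.sqrt(15**2 / 120 + 20**2 / 80)
z = (275 - 258) / std_error

z_alpha = NormalDist().inv_cdf(1 - alpha)   # right-tail critical value
p_value = NormalDist().cdf(-z)              # right-tail p-value, by symmetry

reject_by_critical_value = z > z_alpha
reject_by_p_value = p_value < alpha

print(f"z = {z:.2f}, z_alpha = {z_alpha:.2f}")
print(f"critical-value rule rejects H0: {reject_by_critical_value}")
print(f"p-value rule rejects H0:        {reject_by_p_value}")
```

The two rules are the same comparison on different scales: z exceeds the critical value exactly when the tail area beyond z is smaller than α.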


Example(例子)比较两种方法

  • p-value approach (Slide 9): compare p-value with $\alpha$.
  • Critical value approach (this slide): compare $z$ with $z_\alpha$.
  • Both give the same conclusion.

Extension(拓展)

  • For a left-tailed test, we would use $-z_\alpha$ as the critical value.
  • For a two-tailed test, we use $\pm z_{\alpha/2}$ as critical values.

Summary(小结)

  • In the critical-value approach, if the standardized test statistic lies in the rejection region (beyond the critical value), we reject $H_0$; otherwise, we fail to reject $H_0$.

Slide 11 — Practice: TOEFL scores of two universities(第11页——练习:两所大学托福成绩比较)

Knowledge Points (知识点)

  1. Two-sample z test for difference in mean scores(两总体均值差的 z 检验)
  2. Setting up hypotheses for “significant difference”(“是否有显著差异”的假设设定)
  3. Interpreting test results in context(在情境中解释统计结论)

Explanation(解释)

We compare TOEFL scores of Newland University and ABC University.

Data table(数据表)

Group              | Score ($\bar{x}$) | Standard deviation ($\sigma$) | Sample size ($n$)
Newland University | 103               | 15                            | 50
ABC University     | 96                | 10                            | 50

  • Significance level: $\alpha = 0.05$.

Step 1: Hypotheses

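“Is there a significant difference?” → two-tailed test

$H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 \neq 0$

where

  • $\mu_1$: mean TOEFL score of Newland students,
  • $\mu_2$: mean TOEFL score of ABC students.

Step 2: Test statistic

Point estimate: $\bar{x}_1 - \bar{x}_2 = 103 - 96 = 7$

Standard error: $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{15^2}{50} + \dfrac{10^2}{50}} = \sqrt{4.5 + 2} = \sqrt{6.5} \approx 2.55$

z-score: $z = \dfrac{7 - 0}{2.55} \approx 2.75$

Step 3: Decision

  • For a two-tailed test with $\alpha = 0.05$, critical values: $\pm z_{0.025} = \pm 1.96$.
  • Since $|z| = 2.75 > 1.96$, we reject $H_0$.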

Example(例子)结论

  • There is a significant difference in mean TOEFL scores between Newland and ABC at the 5% level.
  • Since $\bar{x}_1 - \bar{x}_2 = 7 > 0$, Newland students have higher average TOEFL scores.

Extension(拓展)

  • We could also construct a 95% confidence interval for $\mu_1 - \mu_2$: $7 \pm 1.96(2.55) = 7 \pm 5.00 = (2.00,\ 12.00)$.
  • Because 0 is not in this interval, the confidence-interval approach gives the same conclusion as the hypothesis test.
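
The whole TOEFL exercise can be reproduced in a few lines of Python. The data come from the table above; the variable names are illustrative, and the two-tailed p-value is an addition not shown on the slide.

```python
import math
from statistics import NormalDist

# TOEFL data: Newland (group 1) and ABC University (group 2).
xbar1, sigma1, n1 = 103, 15, 50
xbar2, sigma2, n2 = 96, 10, 50
alpha = 0.05

diff = xbar1 - xbar2
std_error = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
z = diff / std_error                              # hypothesized difference is 0
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * NormalDist().cdf(-abs(z))           # two-tailed p-value
ci = (diff - z_crit * std_error, diff + z_crit * std_error)

print(f"z = {z:.2f}, critical value = ±{z_crit:.2f}")
print(f"p-value = {p_value:.4f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"reject H0: {abs(z) > z_crit}")
```

The test and the interval agree: |z| exceeds the critical value exactly when 0 falls outside the 95% interval.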

Summary(小结)

  • For the TOEFL example, we used a two-sample z test (and an equivalent confidence interval) to show that Newland University’s mean TOEFL score is significantly higher than ABC University’s at $\alpha = 0.05$.