Simple Linear Regression — Lectures 18 & 21(简单线性回归)


1. Purpose and Intuition of Regression(回归的目的与直觉)

Why Use Regression(为什么使用回归)

  • Quantify the relationship between variables.(量化变量之间的关系)
  • Support prediction and decision-making.(支持预测与决策)

Dependent and Independent Variables(因变量与自变量)

  • Dependent variable y: outcome to be predicted.(因变量 y:被预测结果)
  • Independent variable x: explanatory variable.(自变量 x:解释变量)

Scatterplot as First Step(散点图作为第一步)

  • Used to visually inspect direction and linearity.(观察方向与线性)
  • Regression is meaningful only if a relationship seems plausible.(有关系才建模)

2. Regression Model Framework(回归模型框架)

Population Regression Model(总体回归模型)

  • Model form:
    y = β₀ + β₁x + ε
  • β₀: population intercept(总体截距)
  • β₁: population slope(总体斜率)
  • ε: random error term(随机误差)

Expected Value Form(期望形式)

  • E(y) = β₀ + β₁x, since E(ε) = 0.
  • Shows the average relationship between x and y.(平均关系)

Sample Regression Equation(样本回归方程)

  • Estimated model:
    ŷ = b₀ + b₁x
  • b₀, b₁ are estimates of β₀, β₁.(样本估计值)
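
The link between the population model and its sample estimates can be illustrated by simulation. A minimal sketch assuming NumPy; the parameter values, sample size, and noise level are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate the population model y = β0 + β1·x + ε (values chosen for illustration)
beta0, beta1 = 10.0, 5.0
x = rng.uniform(0, 4, size=500)
eps = rng.normal(0, 2.0, size=500)   # random error term with E(ε) = 0
y = beta0 + beta1 * x + eps

# Least squares yields sample estimates b0, b1 that approximate β0, β1
b1, b0 = np.polyfit(x, y, 1)
print(round(b0, 1), round(b1, 1))    # close to 10.0 and 5.0
```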

3. Sample Data and Means(样本数据与均值)

Structure of Sample Data(样本结构)

  • Observations: (x₁, y₁), (x₂, y₂), … , (xₙ, yₙ)
  • Used to estimate unknown parameters.(用于估计参数)

Sample Means(样本均值)

  • Mean of x:
    x̄ = Σxᵢ / n
  • Mean of y:
    ȳ = Σyᵢ / n

Key Property(关键性质)

  • The regression line always passes through the point (x̄, ȳ).(回归线必定经过样本均值点)
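
This property is easy to verify numerically. A minimal sketch assuming NumPy; the data are hypothetical, chosen to be consistent with the lecture's fitted line ŷ = 10 + 5x:

```python
import numpy as np

# Hypothetical sample: TV ads (x) and cars sold (y)
x = np.array([1.0, 3.0, 2.0, 1.0, 3.0])
y = np.array([14.0, 24.0, 18.0, 17.0, 27.0])

b1, b0 = np.polyfit(x, y, 1)         # least-squares slope and intercept

# Evaluating the fitted line at x̄ recovers ȳ
print(np.isclose(b0 + b1 * x.mean(), y.mean()))  # True
```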

4. Least Squares Method(最小二乘法)

Objective Function(目标函数)

  • Minimize the sum of squared prediction errors:(最小化预测误差平方和)
    min Σ(yᵢ − ŷᵢ)²

Residuals(残差)

  • Residual for observation i:
    eᵢ = yᵢ − ŷᵢ
  • Measures prediction error.(预测误差)

Why Squared Errors(为什么平方)

  • Prevent positive and negative errors from canceling out.
  • Penalize large errors more heavily.(惩罚大误差)
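
Both points can be checked directly: signed residuals cancel around the least-squares line, while squared residuals do not. A sketch assuming NumPy, with hypothetical data consistent with the fitted line ŷ = 10 + 5x:

```python
import numpy as np

# Hypothetical data consistent with the fitted line ŷ = 10 + 5x
x = np.array([1.0, 3.0, 2.0, 1.0, 3.0])
y = np.array([14.0, 24.0, 18.0, 17.0, 27.0])

y_hat = 10 + 5 * x        # fitted values ŷᵢ
e = y - y_hat             # residuals eᵢ = yᵢ − ŷᵢ

print(e.sum())            # 0.0 — signed errors cancel out
print((e ** 2).sum())     # 14.0 — squared errors (SSE) do not
```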

5. Estimation of Regression Coefficients(回归系数估计)

Formula for Slope b₁(斜率公式)

  • b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²
  • Measures the average change in y for a one-unit increase in x.(x 每增加 1,y 的平均变化量)

Interpretation of b₁(斜率解释)

  • b₁ > 0 → positive relationship(正相关)
  • b₁ < 0 → negative relationship(负相关)

Formula for Intercept b₀(截距公式)

  • b₀ = ȳ − b₁x̄
  • Determines the vertical position of the regression line.
  • Usually has no practical meaning when x = 0 lies outside the realistic range of the data.(x = 0 不现实时通常无经济含义)
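
The two formulas translate directly into code. A minimal sketch assuming NumPy; the data are hypothetical values consistent with the lecture's result b₀ = 10, b₁ = 5:

```python
import numpy as np

# Hypothetical sample data
x = np.array([1.0, 3.0, 2.0, 1.0, 3.0])
y = np.array([14.0, 24.0, 18.0, 17.0, 27.0])

xbar, ybar = x.mean(), y.mean()

# b1 = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²
b1 = ((x - xbar) * (y - ybar)).sum() / ((x - xbar) ** 2).sum()

# b0 = ȳ − b1·x̄
b0 = ybar - b1 * xbar

print(b0, b1)  # 10.0 5.0
```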

6. Estimated Regression Equation(估计回归方程)

General Prediction Equation(预测公式)

  • ŷ = b₀ + b₁x
  • Used for prediction and explanation.(预测与解释)

Example from Lecture(课件示例)

  • ŷ = 10 + 5x
  • Each additional TV ad increases expected sales by 5 cars.(每多 1 条广告,销量平均增加 5 辆)

Prediction Usage(预测用途)

  • Plug in new x values to estimate y.
  • Used for business decision-making and planning.(用于商业决策与规划)
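
A minimal sketch of prediction with the lecture's equation (the function name here is ours, not from the lecture):

```python
def predicted_sales(tv_ads):
    """Expected number of cars sold for a given number of TV ads, per ŷ = 10 + 5x."""
    return 10 + 5 * tv_ads

print(predicted_sales(3))  # 25 — running 3 ads predicts 25 cars sold
```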

7. Variance Decomposition(变异分解)

Total Sum of Squares SST(总变异)

  • SST = Σ(yᵢ − ȳ)²
  • Measures total variation in y.(y 的总波动)

Explained Sum of Squares SSR(解释变异)

  • SSR = Σ(ŷᵢ − ȳ)²
  • Variation explained by regression model.(模型解释部分)

Error Sum of Squares SSE(误差变异)

  • SSE = Σ(yᵢ − ŷᵢ)²
  • Unexplained variation.(未解释误差)

Decomposition Identity(分解恒等式)

  • SST = SSR + SSE
  • Least-squares regression with an intercept always partitions total variation this way.
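
The decomposition can be verified numerically. A sketch assuming NumPy, with hypothetical data consistent with the lecture's fitted line ŷ = 10 + 5x:

```python
import numpy as np

# Hypothetical data consistent with the lecture's fitted line ŷ = 10 + 5x
x = np.array([1.0, 3.0, 2.0, 1.0, 3.0])
y = np.array([14.0, 24.0, 18.0, 17.0, 27.0])

b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

sst = ((y - y.mean()) ** 2).sum()      # total variation in y
ssr = ((y_hat - y.mean()) ** 2).sum()  # variation explained by the model
sse = ((y - y_hat) ** 2).sum()         # unexplained variation

print(round(sst, 1), round(ssr, 1), round(sse, 1))  # 114.0 100.0 14.0
print(np.isclose(sst, ssr + sse))                   # True
```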

8. Coefficient of Determination R²(决定系数)

Definition of R²(定义)

  • R² = SSR / SST
  • Proportion of the variation in y explained by x.(y 的变异中被模型解释的比例)

Interpretation of R²(解释)

  • Lecture example: R² = 0.8772 → the model explains 87.72% of the variation in y.
  • Higher R² indicates a better fit to the data.(拟合更好)

Limitation(局限)

  • High R² does NOT imply causation.(不代表因果)
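
The lecture's value can be reproduced from the sums of squares. A sketch assuming NumPy and hypothetical data consistent with ŷ = 10 + 5x:

```python
import numpy as np

# Hypothetical data consistent with the lecture example (R² = 0.8772)
x = np.array([1.0, 3.0, 2.0, 1.0, 3.0])
y = np.array([14.0, 24.0, 18.0, 17.0, 27.0])

y_hat = 10 + 5 * x
ssr = ((y_hat - y.mean()) ** 2).sum()
sst = ((y - y.mean()) ** 2).sum()

r2 = ssr / sst
print(round(r2, 4))  # 0.8772
```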

9. Correlation Coefficient r(相关系数)

Relationship Between r and R²(r 与 R² 的关系)

  • r = (sign of b₁) √R²
  • Sign determined by slope direction.(符号由斜率决定)

Range and Meaning(取值范围与含义)

  • −1 ≤ r ≤ 1
  • |r| close to 1 → strong linear relationship.

Example from Lecture(课件示例)

  • b₁ > 0
  • r = +√0.8772 = 0.9366
  • Extremely strong positive correlation.(极强正相关)
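
The sign rule can be checked against the direct correlation computation. A sketch assuming NumPy and hypothetical data consistent with the lecture's numbers:

```python
import numpy as np

# Hypothetical data consistent with the lecture example
x = np.array([1.0, 3.0, 2.0, 1.0, 3.0])
y = np.array([14.0, 24.0, 18.0, 17.0, 27.0])

# Slope and R² from the least-squares fit
b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
y_hat = (y.mean() - b1 * x.mean()) + b1 * x
r2 = ((y_hat - y.mean()) ** 2).sum() / ((y - y.mean()) ** 2).sum()

# r takes the sign of the slope b1
r = np.sign(b1) * np.sqrt(r2)
print(round(r, 4))                             # 0.9366
print(np.isclose(r, np.corrcoef(x, y)[0, 1]))  # True
```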

10. Steps to Develop a Regression Model(建立回归模型的步骤)

Step 1: Explore Relationship(第一步:探索关系)

  • Use a scatterplot and domain experience.
  • Decide whether a linear model is worth fitting.(判断是否值得建模)

Step 2: Estimate Model(第二步:估计模型)

  • Compute b₀ and b₁ using least squares.

Step 3: Evaluate Model(第三步:评估模型)

  • Use R² to assess explanatory power.
  • Use r to assess linear strength.

Step 4: Predict and Decide(第四步:预测与决策)

  • Apply ŷ = b₀ + b₁x to new data.
  • Support business decisions.
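
The four steps above can be sketched end to end. Assuming NumPy; the data are hypothetical (TV ads vs. cars sold), consistent with the lecture's ŷ = 10 + 5x:

```python
import numpy as np

# Hypothetical sample: TV ads (x) vs. cars sold (y)
x = np.array([1.0, 3.0, 2.0, 1.0, 3.0])
y = np.array([14.0, 24.0, 18.0, 17.0, 27.0])

# Step 1: explore — in practice, start with a scatterplot of x against y
# Step 2: estimate b0 and b1 by least squares
b1, b0 = np.polyfit(x, y, 1)

# Step 3: evaluate explanatory power (R²) and linear strength (r)
r = np.corrcoef(x, y)[0, 1]
r2 = r ** 2

# Step 4: predict for a new x value, e.g. 4 TV ads
y_new = b0 + b1 * 4

print(round(b0, 2), round(b1, 2), round(r2, 4), round(y_new, 1))  # 10.0 5.0 0.8772 30.0
```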

11. Final Takeaways(核心总结)

Key Formulas to Remember(必须记住的公式)

  • ŷ = b₀ + b₁x
  • b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²
  • b₀ = ȳ − b₁x̄
  • SST = SSR + SSE
  • R² = SSR / SST
  • r = (sign of b₁) √R²

Conceptual Summary(概念总结)

  • Regression quantifies relationships.
  • Least squares finds the best-fitting line.
  • R² measures the share of variation explained; r measures the strength and direction of the linear relationship.
  • Correlation does not imply causation.