Simple Linear Regression — Lectures 18 & 21
1. Purpose and Intuition of Regression
Why Use Regression
- Quantify the relationship between variables.
- Support prediction and decision-making.
Dependent and Independent Variables
- Dependent variable y: the outcome to be predicted.
- Independent variable x: the explanatory variable.
Scatterplot as First Step
- Used to visually inspect direction and linearity.
- Regression is meaningful only if a relationship seems plausible.
2. Regression Model Framework
Population Regression Model
- Model form:
y = β₀ + β₁x + ε
- β₀: population intercept
- β₁: population slope
- ε: random error term
- E(y) = β₀ + β₁x
- Describes the average relationship between x and y.
Sample Regression Equation
- Estimated model:
ŷ = b₀ + b₁x
- b₀ and b₁ are sample estimates of β₀ and β₁.
3. Sample Data and Means
Structure of Sample Data
- Observations: (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ)
- Used to estimate the unknown parameters.
Sample Means
- Mean of x:
x̄ = Σxᵢ / n
- Mean of y:
ȳ = Σyᵢ / n
Key Property
- The least-squares regression line always passes through (x̄, ȳ).
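This property is easy to verify numerically. A minimal sketch in Python, using a hypothetical five-observation dataset (the x and y values are illustrative, not from the lecture):

```python
# Illustrative data (hypothetical values).
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
n = len(x)

x_bar = sum(x) / n  # sample mean of x
y_bar = sum(y) / n  # sample mean of y

# Least-squares slope and intercept (formulas from Section 5).
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
    / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

# Evaluating the fitted line at x_bar recovers y_bar.
print(b0 + b1 * x_bar, y_bar)
```

The property is forced by the intercept formula b₀ = ȳ − b₁x̄: substituting x̄ into the line gives (ȳ − b₁x̄) + b₁x̄ = ȳ.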
4. Least Squares Method
Objective Function
- Minimize the sum of squared errors:
min Σ(yᵢ − ŷᵢ)²
Residuals
- Residual for observation i:
eᵢ = yᵢ − ŷᵢ
- Measures the prediction error for that observation.
Why Squared Errors
- Prevents positive and negative errors from canceling out.
- Penalizes large errors more heavily.
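The cancellation point can be seen directly: for a least-squares fit the signed residuals sum to zero, while the squared residuals do not. A short sketch using a hypothetical dataset whose least-squares line is ŷ = 10 + 5x:

```python
# Hypothetical data; the least-squares fit for these values is y_hat = 10 + 5x.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]

y_hat = [10 + 5 * xi for xi in x]                    # fitted values
residuals = [yi - yhi for yi, yhi in zip(y, y_hat)]  # e_i = y_i - y_hat_i

print(sum(residuals))                  # signed errors cancel to 0
print(sum(e ** 2 for e in residuals))  # squared errors do not: SSE = 14
```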
5. Estimation of Regression Coefficients
Slope Estimate b₁
- b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²
- Measures the average change in y for a one-unit increase in x.
Interpretation of b₁
- b₁ > 0 → positive relationship
- b₁ < 0 → negative relationship
Intercept Estimate b₀
- b₀ = ȳ − b₁x̄
- Determines the vertical position of the regression line.
- Often has no practical meaning when x = 0 is unrealistic.
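The two formulas can be applied directly. A minimal sketch, using an illustrative five-observation dataset (hypothetical values, chosen so the fit matches the lecture's ŷ = 10 + 5x):

```python
# Illustrative data: TV ads (x) and cars sold (y); values are hypothetical.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Slope: cross-product numerator over the x sum of squares.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
    / sum((xi - x_bar) ** 2 for xi in x)
# Intercept: forces the line through (x_bar, y_bar).
b0 = y_bar - b1 * x_bar

print(b0, b1)  # -> 10.0 5.0
```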
6. Estimated Regression Equation
General Prediction Equation
- ŷ = b₀ + b₁x
- Used for prediction and explanation.
Example from Lecture
- ŷ = 10 + 5x
- Each additional TV ad increases expected sales by 5 cars.
Prediction Usage
- Plug in new x values to estimate y.
- Supports business decisions and planning.
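Prediction then amounts to evaluating the estimated equation at a new x. A one-line sketch using the lecture's ŷ = 10 + 5x:

```python
def predict(ads):
    """Expected car sales for a given number of TV ads (y_hat = 10 + 5x)."""
    return 10 + 5 * ads

print(predict(3))  # 3 ads -> expected sales of 25 cars
```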
7. Variance Decomposition
Total Sum of Squares (SST)
- SST = Σ(yᵢ − ȳ)²
- Measures the total variation in y.
Regression Sum of Squares (SSR)
- SSR = Σ(ŷᵢ − ȳ)²
- Variation explained by the regression model.
Error Sum of Squares (SSE)
- SSE = Σ(yᵢ − ŷᵢ)²
- Variation left unexplained.
Decomposition Identity
- SST = SSR + SSE
- Least-squares regression always partitions total variation this way.
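The identity can be confirmed numerically. A sketch on a hypothetical dataset whose least-squares line is ŷ = 10 + 5x:

```python
# Hypothetical data; fitted values come from the least-squares line 10 + 5x.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
y_bar = sum(y) / len(y)
y_hat = [10 + 5 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
ssr = sum((yhi - y_bar) ** 2 for yhi in y_hat)           # explained
sse = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))  # unexplained

print(sst, ssr, sse)  # 114.0 100.0 14.0, and 114 = 100 + 14
```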
8. Coefficient of Determination R²
Definition of R²
- R² = SSR / SST
- The proportion of the variation in y explained by x.
Interpretation of R²
- R² = 0.8772 → 87.72% of the variation is explained.
- A higher R² indicates a better fit.
Limitation
- A high R² does NOT imply causation.
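The lecture's value of 0.8772 can be reproduced on an illustrative dataset whose least-squares line is ŷ = 10 + 5x:

```python
# Hypothetical data; values chosen to match the lecture's summary numbers.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
y_bar = sum(y) / len(y)
y_hat = [10 + 5 * xi for xi in x]

ssr = sum((yhi - y_bar) ** 2 for yhi in y_hat)  # explained variation
sst = sum((yi - y_bar) ** 2 for yi in y)        # total variation
r_squared = ssr / sst

print(round(r_squared, 4))  # 0.8772
```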
9. Correlation Coefficient r
Relationship Between r and R²
- r = (sign of b₁) √R²
- The sign is determined by the direction of the slope.
Range and Meaning
- −1 ≤ r ≤ 1
- |r| close to 1 → strong linear relationship.
Example from Lecture
- b₁ > 0
- r = +√0.8772 = 0.9366
- A very strong positive correlation.
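Recovering r from R² and the slope sign can be sketched as follows (R² = 100/114 comes from the illustrative dataset used above; the slope is positive):

```python
import math

b1 = 5.0               # positive slope from the fitted line
r_squared = 100 / 114  # R^2 = 0.8772 (illustrative values)

# r takes magnitude sqrt(R^2) and the sign of b1.
r = math.copysign(math.sqrt(r_squared), b1)

print(round(r, 4))  # 0.9366
```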
10. Steps to Develop a Regression Model
Step 1: Explore the Relationship
- Use a scatterplot and domain knowledge.
- Decide whether the relationship is worth modeling.
Step 2: Estimate the Model
- Compute b₀ and b₁ using least squares.
Step 3: Evaluate the Model
- Use R² to assess explanatory power.
- Use r to assess the strength of the linear relationship.
Step 4: Predict and Decide
- Apply ŷ = b₀ + b₁x to new data.
- Support business decisions.
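Steps 2-4 can be bundled into one small routine. A sketch on the same hypothetical dataset; the function name fit_and_evaluate is an assumption for illustration:

```python
import math

def fit_and_evaluate(x, y):
    """Estimate b0 and b1 by least squares, then compute R^2 and r."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
        / sum((xi - x_bar) ** 2 for xi in x)
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    sse = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - sse / sst                    # equivalent to SSR / SST
    r = math.copysign(math.sqrt(r2), b1)  # sign follows the slope
    return b0, b1, r2, r

# Step 4: predict at a new x value (hypothetical data).
b0, b1, r2, r = fit_and_evaluate([1, 3, 2, 1, 3], [14, 24, 18, 17, 27])
print(b0 + b1 * 4)  # predicted y when x = 4
```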
11. Final Takeaways
- ŷ = b₀ + b₁x
- b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²
- b₀ = ȳ − b₁x̄
- SST = SSR + SSE
- R² = SSR / SST
- r = (sign of b₁) √R²
Conceptual Summary
- Regression quantifies relationships between variables.
- Least squares finds the best-fitting line.
- R² measures explained variation; r measures the strength and direction of the linear relationship.
- Correlation does not imply causation.