Lecture 6 — Numerical Measures (第六讲——数值度量)


1. Roadmap & Objectives (路线图与目标)

Scope (范围)

  • Location measures: mean, median, mode, trimmed mean, percentiles, quartiles
  • 位置度量:均值、中位数、众数、截尾均值、百分位数、四分位数

Outcomes (学习目标)

  • Compute & interpret location measures for business data
  • 计算并解释商业数据的多种位置度量
  • Choose appropriate measures under skewness/outliers
  • 在偏态与极端值情形下选择合适度量

2. Sample vs. Population (样本与总体)

Definitions (定义)

  • Sample: subset of population used for analysis
  • 样本:用于分析的总体子集
  • Population: all elements of interest
  • 总体:研究关注的全部对象

Statistics vs. Parameters (统计量与参数)

  • Sample statistics estimate population parameters (point estimation)
  • 样本统计量用于估计总体参数(点估计)

Notation (符号)

  • Sample mean \bar{x} vs Population mean \mu
  • 样本均值 \bar{x} 与总体均值 \mu

3. Measures of Location (位置度量)

Mean (均值)

  • Definition: arithmetic average of all observations
  • 定义:所有观测值的算术平均
  • Formulas (公式):
    • Sample: \bar{x} = (∑_{i=1}^{n} x_i)/n
    • 样本:\bar{x} = (∑_{i=1}^{n} x_i)/n
    • Population: \mu = (∑_{i=1}^{N} x_i)/N
    • 总体:\mu = (∑_{i=1}^{N} x_i)/N
  • Properties (性质):sensitive to outliers; efficient under symmetry
  • 性质:对极端值敏感;在对称分布下效率高
  • When to use (适用场景):symmetric distributions, cost/price averaging
  • 适用:近似对称的分布、成本/价格平均
  • Excel (函数):=AVERAGE(range)
  • Excel(函数):=AVERAGE(range)
  • Interpretation (解读):represents center of gravity of data
  • 解读:代表“数据重心”的中心位置

Median (中位数)

  • Definition: 50th percentile; middle of ordered data
  • 定义:第50百分位;排序后位于中间
  • Odd/even n (奇偶样本量):
    • Odd: position (n+1)/2
    • 奇数:位置 (n+1)/2
    • Even: average of n/2 and n/2+1
    • 偶数:第 n/2n/2+1 的平均
  • Robustness (稳健性):resists outlier influence
  • 稳健:对极端值不敏感
  • Use cases (应用):income, property values, executive pay
  • 应用:收入、房价、高管薪酬
  • Excel (函数):=MEDIAN(range)
  • Excel(函数):=MEDIAN(range)

Mode (众数)

  • Definition: most frequent value(s)
  • 定义:出现频率最高的数值
  • Types (类型):unimodal, bimodal, multimodal
  • 类型:单峰、双峰、多峰
  • Categorical/Discrete (分类/离散):particularly useful
  • 对分类/离散变量尤为有效
  • Excel (函数):=MODE.SNGL(range) (first mode only)
  • Excel(函数):=MODE.SNGL(range)(仅返回首个众数)

Trimmed Mean (截尾均值)

  • Idea (思想):drop lowest & highest p% then average remaining
  • 思想:去掉最低与最高 p% 后再求平均
  • Steps (步骤):sort → trim → average
  • 步骤:排序→截尾→求均值
  • Trade-off (权衡):more robust than mean; closer to mean than median
  • 权衡:比均值更稳健,结果又比中位数更接近总体均值
  • Typical p (常见 p):5% 或 10%
  • 常见取值:5% 或 10%
  • Excel (函数):=TRIMMEAN(range, 2p)
  • Excel(函数):=TRIMMEAN(range, 2p)

Percentiles (百分位数)

  • Definition (定义):value below which p% of data fall
  • 定义:p% 的数据不超过该数值
  • Position (位置):i = (p/100) * n
  • 位置:i = (p/100) * n
  • Interpolation (插值):if i not integer, average adjacent as needed
  • 插值:i 非整数时在相邻位置间插值/取平均
  • Excel (函数):=PERCENTILE.INC(range, p) / =PERCENTILE.EXC
  • Excel(函数):=PERCENTILE.INC(range, p) / =PERCENTILE.EXC
  • Uses (用途):cutoffs, market segmentation, benchmarks
  • 用途:阈值、市场细分、基准线

Quartiles (四分位数)

  • Relationship (关系):Q1=25th, Q2=50th(=Median), Q3=75th
  • 关系:Q1=25%、Q2=50%(中位数)、Q3=75%
  • Boxplot & IQR (箱线图与四分位距):IQR = Q3 − Q1
  • 箱线图与 IQR:IQR = Q3 − Q1
  • Outliers (异常值界定):often via IQR rules in practice
  • 异常值:实践中常结合 IQR 规则识别

4. Worked Example — Apartment Rents (完整案例——公寓租金)

Data Overview (数据概览)

  • n = 70 efficiency apartments; range 425–615
  • 70 套单间,租金范围 425–615
  • Sum = 34,356; Mean = 490.80
  • 总和 34,356;均值 490.80

Median/Mode (中位数/众数)

  • Median = 475 (35th & 36th in ordered list)
  • 中位数 = 475(排序后第 35 与 36 个值的平均)
  • Mode = 450 (most frequent)
  • 众数 = 450(频数最高)

Percentiles (百分位数示例)

  • 80th: i = 0.80*70 = 56 → avg of 56th & 57th = (535+549)/2 = 542
  • 第 80 百分位:i = 0.80*70 = 56 → 取第 56 与 57 个平均 (535+549)/2 = 542

Quartile (四分位数示例)

  • Q3 (75th): i = 0.75*70 = 52.5 → 取第 53 个值 = 525
  • Q3(第 75 百分位):i = 0.75*70 = 52.5 → 第 53 个观测值 = 525

Interpretation (解读)

  • Typical rent ≈ 491; half ≤ 475; common price point at 450
  • 典型租金约 491;一半 ≤ 475;450 为常见价位
  • Top 20% rents ≥ ~542 → premium segment
  • 顶部 20% 租金 ≥ ~542 → 高端市场分段

5. Computation Workflow (计算流程)

Clean & Order (清洗与排序)

  • Remove errors/duplicates; sort ascending
  • 去错/去重;升序排列

Choose Measure (选择度量)

  • Symmetric: mean; Skewed/outliers: median/trimmed mean
  • 对称分布:均值;偏态/极端值:中位数或截尾均值

Calculate & Verify (计算与核验)

  • Cross-check manual vs Excel outputs
  • 手算与 Excel 结果交叉核验
  • Record positions (for percentiles/quartiles)
  • 记录位置索引(用于百分位/四分位)

6. Excel Implementation (Excel 实现)

Functions (函数速览)

  • Mean: =AVERAGE(B2:B71)
  • 均值:=AVERAGE(B2:B71)
  • Median: =MEDIAN(B2:B71)
  • 中位数:=MEDIAN(B2:B71)
  • Mode: =MODE.SNGL(B2:B71)
  • 众数:=MODE.SNGL(B2:B71)
  • Percentile: =PERCENTILE.INC(B2:B71, 0.80)
  • 百分位:=PERCENTILE.INC(B2:B71, 0.80)
  • Trimmed mean: =TRIMMEAN(B2:B71, 0.10) (5% each tail)
  • 截尾均值:=TRIMMEAN(B2:B71, 0.10)(两端各 5%)

Tips (技巧)

  • Use absolute references for reusable formulas
  • 使用绝对引用便于批量复用
  • Document assumptions (INC vs EXC)
  • 记录假设(含端/不含端)

7. Interpretation & Decision Logic (解读与决策逻辑)

Market Insights (市场洞察)

  • Mean for budgeting; Median for affordability; High percentiles for premium pricing
  • 均值用于预算;中位数用于可负担性;高百分位用于高端定价

Policy/Management (政策/管理)

  • Identify rent thresholds for subsidies or caps
  • 设定补贴/限价阈值
  • Track shifts in median/mode over time
  • 追踪中位数/众数的时间变化

8. Image/Table Insights (图片/表格要点)

What to Read (阅读要点)

  • Confirm ordering before median/percentiles
  • 计算中位数/百分位数前确认已排序
  • Locate exact positions (i) and adjacent values
  • 精确定位 i 值与相邻观测
  • Cross-highlight frequent values for mode
  • 高亮频繁值以识别众数

Quality Checks (质量核验)

  • Range sanity (min/max) and unusual spikes
  • 检查范围(最小/最大)与异常跳点
  • Consistency between list, totals, and Excel outputs
  • 列表、总和与 Excel 结果一致性

9. Quick Reference (速查表)

Key Numbers (关键数值)

  • Mean = 490.80; Median = 475; Mode = 450
  • 均值 490.80;中位数 475;众数 450
  • 80th ≈ 542; Q3 = 525
  • 第 80 百分位 ≈ 542;Q3 = 525

Core Formulas (核心公式)

  • \bar{x} = (∑ x_i)/n, \mu = (∑ x_i)/N
  • \bar{x} = (∑ x_i)/n\mu = (∑ x_i)/N
  • i = (p/100)*n (percentile position)
  • i = (p/100)*n(百分位位置)

10. Summary & Takeaways (总结与要点)

Selection Matters (选择很重要)

  • Use measure aligned with distribution & decision need
  • 度量选择需匹配分布形态与决策需求

Case-Driven (案例驱动)

  • Apartment rents illustrate full pipeline: compute → verify → interpret → decide
  • 公寓租金案例贯穿流程:计算→核验→解读→决策