Lecture 6 — Numerical Measures (第六讲——数值度量)
1. Roadmap & Objectives (路线图与目标)
Scope (范围)
- Location measures: mean, median, mode, trimmed mean, percentiles, quartiles
- 位置度量:均值、中位数、众数、截尾均值、百分位数、四分位数
Outcomes (学习目标)
- Compute & interpret location measures for business data
- 计算并解释商业数据的多种位置度量
- Choose appropriate measures under skewness/outliers
- 在偏态与极端值情形下选择合适度量
2. Sample vs. Population (样本与总体)
Definitions (定义)
- Sample: subset of population used for analysis
- 样本:用于分析的总体子集
- Population: all elements of interest
- 总体:研究关注的全部对象
Statistics vs. Parameters (统计量与参数)
- Sample statistics estimate population parameters (point estimation)
- 样本统计量用于估计总体参数(点估计)
Notation (符号)
- Sample mean
\bar{x} vs Population mean \mu
- 样本均值
\bar{x} 与总体均值 \mu
3. Measures of Location (位置度量)
Mean (均值)
- Definition: arithmetic average of all observations
- 定义:所有观测值的算术平均
- Formulas (公式):
- Sample:
\bar{x} = (∑_{i=1}^{n} x_i)/n
- 样本:
\bar{x} = (∑_{i=1}^{n} x_i)/n
- Population:
\mu = (∑_{i=1}^{N} x_i)/N
- 总体:
\mu = (∑_{i=1}^{N} x_i)/N
- Properties (性质):sensitive to outliers; efficient under symmetry
- 性质:对极端值敏感;在对称分布下效率高
- When to use (适用场景):symmetric distributions, cost/price averaging
- 适用:近似对称的分布、成本/价格平均
- Excel (函数):
=AVERAGE(range)
- Excel(函数):
=AVERAGE(range)
- Interpretation (解读):represents center of gravity of data
- 解读:代表“数据重心”的中心位置
- Definition: 50th percentile; middle of ordered data
- 定义:第50百分位;排序后位于中间
- Odd/even n (奇偶样本量):
- Odd: position
(n+1)/2
- 奇数:位置
(n+1)/2
- Even: average of
n/2 and n/2+1
- 偶数:第
n/2 与 n/2+1 的平均
- Robustness (稳健性):resists outlier influence
- 稳健:对极端值不敏感
- Use cases (应用):income, property values, executive pay
- 应用:收入、房价、高管薪酬
- Excel (函数):
=MEDIAN(range)
- Excel(函数):
=MEDIAN(range)
Mode (众数)
- Definition: most frequent value(s)
- 定义:出现频率最高的数值
- Types (类型):unimodal, bimodal, multimodal
- 类型:单峰、双峰、多峰
- Categorical/Discrete (分类/离散):particularly useful
- 对分类/离散变量尤为有效
- Excel (函数):
=MODE.SNGL(range) (first mode only)
- Excel(函数):
=MODE.SNGL(range)(仅返回首个众数)
Trimmed Mean (截尾均值)
- Idea (思想):drop lowest & highest p% then average remaining
- 思想:去掉最低与最高 p% 后再求平均
- Steps (步骤):sort → trim → average
- 步骤:排序→截尾→求均值
- Trade-off (权衡):more robust than mean; closer to mean than median
- 权衡:比均值更稳健,结果又比中位数更接近总体均值
- Typical p (常见 p):5% 或 10%
- 常见取值:5% 或 10%
- Excel (函数):
=TRIMMEAN(range, 2p)
- Excel(函数):
=TRIMMEAN(range, 2p)
Percentiles (百分位数)
- Definition (定义):value below which p% of data fall
- 定义:p% 的数据不超过该数值
- Position (位置):
i = (p/100) * n
- 位置:
i = (p/100) * n
- Interpolation (插值):if i not integer, average adjacent as needed
- 插值:i 非整数时在相邻位置间插值/取平均
- Excel (函数):
=PERCENTILE.INC(range, p) / =PERCENTILE.EXC
- Excel(函数):
=PERCENTILE.INC(range, p) / =PERCENTILE.EXC
- Uses (用途):cutoffs, market segmentation, benchmarks
- 用途:阈值、市场细分、基准线
Quartiles (四分位数)
- Relationship (关系):Q1=25th, Q2=50th(=Median), Q3=75th
- 关系:Q1=25%、Q2=50%(中位数)、Q3=75%
- Boxplot & IQR (箱线图与四分位距):
IQR = Q3 − Q1
- 箱线图与 IQR:
IQR = Q3 − Q1
- Outliers (异常值界定):often via IQR rules in practice
- 异常值:实践中常结合 IQR 规则识别
4. Worked Example — Apartment Rents (完整案例——公寓租金)
Data Overview (数据概览)
- n = 70 efficiency apartments; range 425–615
- 70 套单间,租金范围 425–615
- Sum = 34,356; Mean = 490.80
- 总和 34,356;均值 490.80
- Median = 475 (35th & 36th in ordered list)
- 中位数 = 475(排序后第 35 与 36 个值的平均)
- Mode = 450 (most frequent)
- 众数 = 450(频数最高)
Percentiles (百分位数示例)
- 80th:
i = 0.80*70 = 56 → avg of 56th & 57th = (535+549)/2 = 542
- 第 80 百分位:
i = 0.80*70 = 56 → 取第 56 与 57 个平均 (535+549)/2 = 542
Quartile (四分位数示例)
- Q3 (75th):
i = 0.75*70 = 52.5 → 取第 53 个值 = 525
- Q3(第 75 百分位):
i = 0.75*70 = 52.5 → 第 53 个观测值 = 525
Interpretation (解读)
- Typical rent ≈ 491; half ≤ 475; common price point at 450
- 典型租金约 491;一半 ≤ 475;450 为常见价位
- Top 20% rents ≥ ~542 → premium segment
- 顶部 20% 租金 ≥ ~542 → 高端市场分段
5. Computation Workflow (计算流程)
Clean & Order (清洗与排序)
- Remove errors/duplicates; sort ascending
- 去错/去重;升序排列
Choose Measure (选择度量)
- Symmetric: mean; Skewed/outliers: median/trimmed mean
- 对称分布:均值;偏态/极端值:中位数或截尾均值
Calculate & Verify (计算与核验)
- Cross-check manual vs Excel outputs
- 手算与 Excel 结果交叉核验
- Record positions (for percentiles/quartiles)
- 记录位置索引(用于百分位/四分位)
6. Excel Implementation (Excel 实现)
Functions (函数速览)
- Mean:
=AVERAGE(B2:B71)
- 均值:
=AVERAGE(B2:B71)
- Median:
=MEDIAN(B2:B71)
- 中位数:
=MEDIAN(B2:B71)
- Mode:
=MODE.SNGL(B2:B71)
- 众数:
=MODE.SNGL(B2:B71)
- Percentile:
=PERCENTILE.INC(B2:B71, 0.80)
- 百分位:
=PERCENTILE.INC(B2:B71, 0.80)
- Trimmed mean:
=TRIMMEAN(B2:B71, 0.10) (5% each tail)
- 截尾均值:
=TRIMMEAN(B2:B71, 0.10)(两端各 5%)
Tips (技巧)
- Use absolute references for reusable formulas
- 使用绝对引用便于批量复用
- Document assumptions (INC vs EXC)
- 记录假设(含端/不含端)
7. Interpretation & Decision Logic (解读与决策逻辑)
Market Insights (市场洞察)
- Mean for budgeting; Median for affordability; High percentiles for premium pricing
- 均值用于预算;中位数用于可负担性;高百分位用于高端定价
Policy/Management (政策/管理)
- Identify rent thresholds for subsidies or caps
- 设定补贴/限价阈值
- Track shifts in median/mode over time
- 追踪中位数/众数的时间变化
8. Image/Table Insights (图片/表格要点)
What to Read (阅读要点)
- Confirm ordering before median/percentiles
- 计算中位数/百分位数前确认已排序
- Locate exact positions (i) and adjacent values
- 精确定位 i 值与相邻观测
- Cross-highlight frequent values for mode
- 高亮频繁值以识别众数
Quality Checks (质量核验)
- Range sanity (min/max) and unusual spikes
- 检查范围(最小/最大)与异常跳点
- Consistency between list, totals, and Excel outputs
- 列表、总和与 Excel 结果一致性
9. Quick Reference (速查表)
Key Numbers (关键数值)
- Mean = 490.80; Median = 475; Mode = 450
- 均值 490.80;中位数 475;众数 450
- 80th ≈ 542; Q3 = 525
- 第 80 百分位 ≈ 542;Q3 = 525
\bar{x} = (∑ x_i)/n, \mu = (∑ x_i)/N
\bar{x} = (∑ x_i)/n,\mu = (∑ x_i)/N
i = (p/100)*n (percentile position)
i = (p/100)*n(百分位位置)
10. Summary & Takeaways (总结与要点)
Selection Matters (选择很重要)
- Use measure aligned with distribution & decision need
- 度量选择需匹配分布形态与决策需求
Case-Driven (案例驱动)
- Apartment rents illustrate full pipeline: compute → verify → interpret → decide
- 公寓租金案例贯穿流程:计算→核验→解读→决策