Group A — MGS 2150 Lecture 2 · Chap 1-1 (10 questions · data types, sampling, visualization, and data quality)

Question A1 — Scale Choice for Satisfaction

A manager wants to compare “average satisfaction” across 4 stores. Ratings are recorded as Very bad/Bad/Neutral/Good/Very good. What summary and chart are appropriate?

Answer: Use the median and quartiles (the scale is ordinal), not the arithmetic mean. Visualize with a stacked bar chart of proportions, or with a box plot if the categories are cautiously coded as scores.

Why: Ordinal categories are not equally spaced, so averaging naively coded numbers is misleading. The median and IQR respect the ordering, and convey the typical level and the spread, without assuming equal intervals.

Question A2 — Cross-Section vs Time Series

You have April 2025 “same-day” delivery times for all stores (cross-section) and a monthly series from 2023–2025 (time series). Which chart suits each, and why?

Answer: Cross-section: box or violin plots by store to compare distributions. Time series: a line chart with a moving average and seasonal markers.

Why: A cross-section calls for comparing distributions; a time series calls for detecting trend and seasonality.

Question A3 — Sampling Frame

To estimate the average app waiting time over a day with strong peaks, how should you sample?

Answer: Stratify by hour; within each hour, sample systematically at fixed intervals; weight the strata by traffic.

Why: Stratification reduces variance and avoids peak-time bias; systematic sampling spreads the draws across each hour, which weakens the effect of autocorrelation between neighboring observations.

Question A4 — Data Quality & Outliers

A few tickets show delivery times of “> 200 hours”. Should they be deleted?

Answer: Do a business check first (holiday, re-routing), then run robust diagnostics (IQR/MAD). If the values are confirmed errors, correct or flag them; for reporting, use winsorization or robust statistics (median/IQR).

Why: Blind deletion risks bias; robust summaries reduce the impact of outliers.

Question A5 — Variable Types

Classify: (a) coupon amount, (b) member or not, (c) store tier, (d) reorder rate.

Answer: (a) Ratio, (b) Nominal (binary), (c) Ordinal, (d) Ratio (a proportion).

Why: The measurement scale determines which summaries and tests are allowed.

Question A6 — Dashboard Ethics

A chart starts its y-axis at 90% to exaggerate an improvement. What should be done?

Answer: Use a zero-based axis for KPIs, or mark the axis break clearly, and provide a table with both absolute and percentage changes.

Why: This prevents misinterpretation; transparency builds trust.

Question A7 — Simpson’s Paradox Lite

Overall conversion is lower for Channel A than for Channel B, but within each of the “new” and “returning” segments A is higher. What should be reported?

Answer: Report the stratified metrics together with a standardized overall rate computed from common segment weights.

Why: Segment mix is a confounder; standardization removes the bias caused by differing mix.
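The standardization step can be sketched as follows. The visitor and conversion counts are hypothetical, chosen only so that the paradox appears, and the common weights are taken from the pooled segment mix (one possible choice among several).

```python
# Hypothetical (visitors, conversions) per channel and segment.
data = {
    "A": {"new": (800, 40), "returning": (200, 40)},
    "B": {"new": (200, 8), "returning": (800, 144)},
}

def crude_rate(segments):
    """Overall conversion rate, ignoring segment mix."""
    visitors = sum(n for n, _ in segments.values())
    conversions = sum(c for _, c in segments.values())
    return conversions / visitors

def standardized_rate(segments, weights):
    """Segment rates re-weighted with a common segment mix."""
    return sum(weights[s] * c / n for s, (n, c) in segments.items())

# Common weights: share of each segment in the pooled traffic.
total = sum(n for ch in data.values() for n, _ in ch.values())
weights = {s: sum(data[ch][s][0] for ch in data) / total
           for s in ("new", "returning")}

crude = {ch: crude_rate(seg) for ch, seg in data.items()}
standardized = {ch: standardized_rate(seg, weights) for ch, seg in data.items()}
print(crude)         # A looks worse overall
print(standardized)  # A is better once the mix is held fixed
```

With these numbers A converts better in both segments yet has the lower crude rate, because most of its traffic is new visitors; the standardized rates restore the within-segment ordering.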

Question A8 — Survey Weighting

A satisfaction survey over-samples heavy users. Without re-surveying, how can this be adjusted?

Answer: Apply post-stratification or raking by visit frequency and membership; report weighted and unweighted results side by side.

Why: Weighting reduces selection bias, and the side-by-side view makes any variance inflation from the weights visible.

Question A9 — Frequency Table & Bin Alignment

When building a histogram of call durations, how should the bin width be chosen so that bins align with an SLA threshold at 120 s?

Answer: Start from the Freedman–Diaconis rule, then shift the edges so that 120 s is a bin boundary; check sensitivity.

Why: This keeps the statistical guidance while preserving business interpretability.
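One way to implement the alignment is sketched below. The helper name `aligned_bin_edges` and the synthetic exponential call durations are assumptions for illustration; the width is the Freedman–Diaconis value 2·IQR/n^(1/3), and the edges are laid out as anchor + k·width so the anchor is always a boundary.

```python
import math
import random
import statistics

def aligned_bin_edges(values, anchor):
    """Freedman-Diaconis bin width, with edges shifted so `anchor` is a boundary."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    width = 2 * (q3 - q1) / len(values) ** (1 / 3)
    k_lo = math.floor((min(values) - anchor) / width)
    k_hi = math.ceil((max(values) - anchor) / width)
    return [anchor + k * width for k in range(k_lo, k_hi + 1)]

random.seed(0)
# Synthetic call durations in seconds (hypothetical data, mean 90 s).
durations = [random.expovariate(1 / 90) for _ in range(500)]
edges = aligned_bin_edges(durations, anchor=120.0)
print(len(edges) - 1, "bins; 120 s is an edge:", 120.0 in edges)
```

For the sensitivity check, the same function can be rerun with the width scaled up or down to see whether the share of calls above 120 s, or the apparent shape, changes materially.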

Question A10 — Metric Definition

“Store-visit conversion = store visits / app opens” is distorted by a large number of remote opens. Fix it.

Answer: Use a geo-matched denominator (local MAU) or stage rates (open → coupon → visit).

Why: The denominator must reflect the population that can actually be influenced.

Group B — MGS 2150 Lecture 3 · Chap 1-2 v.1 (10 questions · table/chart design, Pareto, annotated lines, heatmaps)

Question B1 — Pareto for Returns

After plotting a Pareto chart of return reasons, how do you pick the first fixes for the next 30 days?

Answer: Target the causes that cover about 80% of the cumulative total, then rank them by scoring impact × controllability × cost.

Why: The Pareto chart shows contribution; feasibility triage turns the chart into actions.

Question B2 — Small Multiples

You must compare the distribution of turnover days for 12 stores on one page. What is the best approach?

Answer: Small-multiples box plots on a shared axis with a target line; sort by median or IQR.

Why: A shared scale enables visual comparison; the target line ties the view to goals.

Question B3 — Dual-Axis Caution

Why can a dual-axis line chart of sales vs. ad spend mislead, and what should be used instead?

Answer: The axis scaling is arbitrary, which can create pseudo-correlation. Use indexed lines (base = 100) or a scatter plot with a trend line, plus event annotations.

Why: Putting both series on a common basis shows the real co-movement.

Question B4 — Heatmap Standardization

A region × category heatmap always lights up the big cities. How do you avoid population-scale bias?

Answer: Use per-capita or per-10k-household rates; provide raw totals in a side table.

Why: This separates scale from performance.

Question B5 — Annotated Lines

How do you show the effect of a pricing policy change in a time series without clutter?

Answer: Add a vertical marker with a short note; consider pre/post window averages or a segmented trend.

Why: This links the data to events and supports causal discussion.

Question B6 — Table for Decisions

List three must-have columns for an executive action table.

Answer: Absolute value, relative (%) change, and gap to target; highlight exceptions.

Why: These columns turn numbers into actions.

Question B7 — Category Granularity

Merging “stock-out” and “supply break” into a single “unavailable” category caused root-cause analysis to fail. How do you fix it?

Answer: Restore the hierarchical categories and visualize them with a tree or Sankey diagram to show flow.

Why: Appropriate granularity is key to actionability.

Question B8 — Stem-and-Leaf Use

When is a stem-and-leaf plot better than a box plot for managers?

Answer: When the sample is small to medium and exact values matter (for example, quality reviews).

Why: It preserves each observation while still showing the shape.

Question B9 — Real-time vs. Accuracy

The daily dashboard is T+1. How do you add timely insight without “passing off estimates as truth”?

Answer: Add a nowcast panel with an uncertainty band, and automatically replace it with final data the next day.

Why: This transparently separates estimates from final numbers.

Question B10 — Consistency Checks

POS sales and WMS shipments differ because of returns timing. What documentation and checks are needed?

Answer: A metric dictionary, a reconciliation matrix, and time-alignment rules.

Why: These prevent “definition drift” and audit issues.

Group C — MGS 2150 Lecture 4 · Chap 2-1 (10 questions · central tendency, dispersion, robust measures)

Question C1 — Mean vs Median

Delivery times are right-skewed. Which measure of center best represents the “typical user experience”?

Answer: The median (with the IQR).

Why: The median is robust to a long right tail.

Question C2 — Weighted Mean

Why use a weighted mean for the average discounted price?

Answer: Weight by sales volume; a simple mean overstates the discount if small-volume, deep-discount SKUs exist.

Why: Pricing decisions should rest on what customers actually paid, which is the volume-weighted price.

Question C3 — Geometric Mean

For three yearly growth rates of 10%, −5%, and 15%, why prefer the geometric mean for multi-year growth?

Answer: It respects compounding; the arithmetic mean misstates the multi-period effect.

Why: Multi-year growth is a product, not a sum.
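A short check of the compounding argument, using the three rates from the question (variable names are illustrative):

```python
import math

rates = [0.10, -0.05, 0.15]
factors = [1 + r for r in rates]

total_growth = math.prod(factors)                      # compounded 3-year factor
geometric = total_growth ** (1 / len(factors)) - 1     # constant yearly rate with the same result
arithmetic = sum(rates) / len(rates)

print(f"3-year factor:   {total_growth:.5f}")
print(f"geometric mean:  {geometric:.4%}")
print(f"arithmetic mean: {arithmetic:.4%}")
```

Compounding the geometric rate for three years reproduces the actual three-year factor, while compounding the arithmetic mean overshoots it.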

Question C4 — Range, IQR, SD

Suggest a simple stability KPI that is less sensitive to outliers.

Answer: The IQR or a trimmed SD (for example, after removing the top and bottom 5%).

Why: Robust metrics better reflect typical variability.

Question C5 — Coefficient of Variation

Two production lines: μ₁=100, σ₁=8; μ₂=60, σ₂=7. Which is more stable?

Answer: Compute the CV (σ/μ): 8/100 = 0.08 vs 7/60 ≈ 0.1167, so Line 1 is more stable.

Why: The CV enables comparison of stability across different means.

Question C6 — Empirical Rule (Appropriate Use)

When can you use the 68-95-99.7 rule to flag unusual observations?

Answer: When the data are approximately normal (symmetric, light-tailed).

Why: Otherwise, use quantiles or the MAD.

Question C7 — Chebyshev (Distribution-Free)

Give a conservative coverage bound for mean ± 2σ that holds for any distribution.

Answer: At least 1 − 1/2² = 75%.

Why: The bound is distribution-free, which makes it useful for worst-case planning.

Question C8 — z-Scores & Standardization

Why standardize metrics before comparing across categories with different scales?

Answer: Standardizing removes scale effects, so deviations are compared relative to each category's own variability.

Why: This prevents unfair rankings caused by scale.

Question C9 — Outlier Policy

Give two principles for an outlier policy in reporting.

Answer: (1) Business-first verification; (2) transparent thresholds and impact notes (for example, winsorization levels).

Why: This ensures auditability and trust.

Question C10 — Percentiles for SLA

SLA: “95% of orders delivered within 24h”. What should the weekly report include?

Answer: P50/P90/P95/P99, a breakdown of late cases, and the trend vs. last week.

Why: Percentiles cover both typical and tail performance.

Group D — MGS 2150 Lecture 5 · Chap 2-2 (10 questions · percentiles, box plots, skewness and kurtosis, robust statistics, presentation standards)

Question D1 — Percentiles for SLA

A courier service promises “95% of orders ≤ 24h”. Which percentiles should the weekly report show?

Answer: Show P50 (median), P90, P95, and P99, plus a breakdown of reasons and counts for orders beyond P95.

Why: Percentiles reflect both typical performance and tail risk; P95 corresponds to the promise, and P99 helps identify extreme delays.

Question D2 — Boxplot for Carriers

Three carriers’ delivery times are summarized by box plots. Which features should guide a contract decision?

Answer: A lower median and IQR, a shorter upper whisker, and fewer outliers; for long tails, add penalty or service-level clauses.

Why: Box plots directly show stability and tail risk, which is more reliable than looking at the mean alone.

Question D3 — Special vs. Common Cause

A single-day spike of late orders is observed. How do you decide whether to exclude it from the KPI?

Answer: Check the system, weather, and holiday logs. If the spike matches a one-off event, treat it as a “special cause”: annotate it and report it separately. Otherwise, keep it as part of normal variation.

Why: This follows SPC thinking: a special cause calls for fixing the process rather than penalizing frontline staff, while common causes call for continuous improvement.

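Question D4 — IQR Stability KPI

Design a simple stability KPI using the IQR.

Answer: KPI: the daily IQR ≤ a target threshold (for example, ≤ 40 minutes); a day counts as passing only when the median target is met as well.

Why: The IQR resists outliers and measures concentration; pairing it with the median avoids the misleading case of “uniformly slow but very stable”.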

Question D5 — Skewness Interpretation

Customer spending shows positive skewness (a long right tail). What promotion idea fits this pattern?

Answer: Offer tiered benefits, such as premium perks or bundles, to high spenders; a basic discount is enough for the mass market.

Why: The right tail represents a small number of high-value transactions, so a tiered strategy is more effective.

Question D6 — Kurtosis & Tail Risk

High kurtosis is observed in delivery times. What is the operational risk?

Answer: Fatter tails mean more occasional extremely slow orders; provision surge capacity or backup routes, and monitor P99.

Why: High kurtosis signals higher tail probability, which the mean and median do not capture.

Question D7 — Combine Plots

How do you show a distribution and a threshold without clutter?

Answer: An ECDF (empirical cumulative distribution) with a vertical line at 24h, plus small-multiple box plots per store; use a shared scale and go from overall to detail.

Why: Look at overall coverage first, then drill down into differences.

Question D8 — Robust Center

Why prefer the median over the mean when outliers exist?

Answer: The median is insensitive to a few extreme values and better represents the “typical experience”.

Why: The mean gets pulled by extreme values, which distorts improvement priorities.

Question D9 — Winsorization Policy

When is winsorization appropriate, and how should it be documented for audit?

Answer: Only at the reporting/analysis layer, with a fixed proportion (for example, the top and bottom 1%); log the cut points, the proportion, an impact assessment, and the approval.

Why: Keeping the raw data ensures traceability and consistency.

Question D10 — Dashboard Guardrails

List three guardrails to reduce misinterpretation.

Answer: Zero-based axes for KPIs or clearly marked axis breaks; show the sample size and distribution; give absolute values alongside percentiles or relative changes.

Why: Transparency and context significantly reduce misreading.

Group E — MGS 2150 Lecture 6 · Chap 3-1 (10 questions · addition, multiplication, conditional probability, independence, counting)

Question E1 — Addition Rule

70% are members, 40% used coupons, and 25% are both. What is P(at least one)?

Answer: 0.70 + 0.40 − 0.25 = 0.85, i.e. 85%.

Why: P(A∪B) = P(A) + P(B) − P(A∩B).

Question E2 — Only One of Two

Using the same data, what is the probability of “only one of the two”?

Answer: (0.70 − 0.25) + (0.40 − 0.25) = 0.60, i.e. 60%.

Why: Remove the intersection from each event, then add the two remainders.

Question E3 — Complement

The system success rate is 99.3%. What is the failure probability?

Answer: 1 − 0.993 = 0.007, i.e. 0.7%.

Why: The complement is more intuitive and can feed staffing and disaster-recovery capacity planning.

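Question E4 — Independence Check

If P(coupon | new) = 0.6 but P(coupon) = 0.4, are the events independent?

Answer: No, they are not independent (being a new customer is positively associated with coupon use).

Why: Independence requires the conditional probability to equal the overall probability.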

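Question E5 — Conditional Probability

The click-to-purchase rate is 12% and the overall purchase rate is 3%. What is the click rate?

Answer: 0.03 = 0.12 × P(Click), so P(Click) = 0.25, i.e. 25%.

Why: By the law of total probability, P(Buy) = P(Buy|Click)P(Click) + P(Buy|No click)P(No click). Assuming every purchase follows a click, the second term is zero and P(Buy) = P(Buy|Click)P(Click).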

Question E6 — Law of Total Probability

Overall churn is 8%; new users make up 30% and churn at 15%. What is the old-user churn rate?

Answer: 0.08 = 0.3 × 0.15 + 0.7 × x, so x = 0.05, i.e. 5%.

Why: The segment rates, weighted by segment share, sum to the overall rate.

Question E7 — Basic Counting: Combinations

How many 2-item bundles can be chosen from 12 SKUs, and how many of them contain at least 1 of the 3 new SKUs?

Answer: C(12,2) = 66; subtracting the bundles with no new SKU, C(9,2) = 36, leaves 30.

Why: Counting the total first and then excluding is faster.
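The exclusion count can be cross-checked by brute-force enumeration. The SKU labels below are placeholders invented for the sketch.

```python
from itertools import combinations
from math import comb

total = comb(12, 2)                  # all 2-item bundles
none_new = comb(9, 2)                # bundles drawn only from the 9 existing SKUs
at_least_one_new = total - none_new

# Brute-force check with placeholder SKU labels.
skus = [f"new{i}" for i in range(3)] + [f"old{i}" for i in range(9)]
brute = sum(
    1 for pair in combinations(skus, 2)
    if any(s.startswith("new") for s in pair)
)
print(total, none_new, at_least_one_new, brute)
```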

Question E8 — Basic Counting: Permutations

Three hosts cover Mon/Tue/Wed, one per day with no repeats. How many arrangements are there?

Answer: 3! = 6. If host A is fixed on Wednesday, the remaining two days give 2! = 2.

Why: Permutations take order into account; handle fixed constraints first.

Question E9 — Multiplication Rule

P(A) = 0.4, P(B) = 0.5, and A and B are independent. What is P(A∩B)?

Answer: 0.4 × 0.5 = 0.20, i.e. 20%.

Why: For independent events, the probabilities multiply.

Question E10 — Bayes Lite (Interpretation)

High-score leads make up 20%; P(sale|high) = 15% and P(sale|low) = 2%. If a random sale occurs, is it more likely to come from a high-score lead?

Answer: Yes. The posterior probability that it came from a high-score lead is about 65% (see F1 for the calculation).

Why: Intuition: high-score leads are fewer, but their conversion rate per lead is much higher.

Group F — MGS 2150 Lecture 7 · Chap 3-2 (10 questions · Bayes, prior updating, thresholds and costs)

Question F1 — Bayes Posterior

High-score leads are 20% with 15% conversion; low-score leads are 80% with 2%. Given a sale, what is P(high)?

Answer: 0.15 × 0.20 / (0.15 × 0.20 + 0.02 × 0.80) ≈ 65.2%.

Why: The posterior is proportional to prior × likelihood; the numerator is the high-score path and the denominator is the sum over all paths.
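The "prior × likelihood, then normalize" recipe can be written as a small helper. The function name `posterior` and the dictionary layout are illustrative choices, not part of the course material.

```python
def posterior(priors, likelihoods):
    """Bayes' rule over a finite set of hypotheses: normalize prior * likelihood."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(joint.values())          # P(sale), summed over all paths
    return {h: v / evidence for h, v in joint.items()}

lead = posterior(
    priors={"high": 0.20, "low": 0.80},
    likelihoods={"high": 0.15, "low": 0.02},   # P(sale | segment)
)
print({h: round(p, 3) for h, p in lead.items()})
```

The same helper applies to any two-hypothesis setup by swapping in the relevant priors and conditional rates.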

Question F2 — Base Rate Fallacy

Disease rate 1%, sensitivity 95%, false-positive rate 5%. What is P(disease | positive)?

Answer: (0.95 × 0.01) / (0.95 × 0.01 + 0.05 × 0.99) ≈ 16.1%.

Why: With a low base rate the positive predictive value is limited, so a second test is needed.

Question F3 — Precision Under Imbalance

The alert fires on 5% of cases, true threats are 0.2%, and the TPR (recall) is 90%. What is the approximate precision?

Answer: TP ≈ 0.2% × 0.9 = 0.18%, FP ≈ 5% − 0.18% = 4.82%, so precision ≈ 0.18 / (0.18 + 4.82) ≈ 3.6%.

Why: Under class imbalance, look at the PR curve and the cost at each threshold.

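Question F4 — Decision Threshold by Cost

How do you set a blocking threshold from the posterior probability and the costs?

Answer: Block if P(Fraud|x) ≥ C(FP)/(C(FP)+C(FN)); a “review band” can be added.

Why: The criterion is to minimize expected cost. With p = P(Fraud|x), blocking costs (1 − p)·C(FP) in expectation and passing costs p·C(FN), so blocking wins once p·C(FN) ≥ (1 − p)·C(FP). The more expensive a missed fraud, the lower the threshold. Combine this with review-queue capacity to set two thresholds.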
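The expected-cost comparison behind a cost-based threshold can be written out as follows, with p = P(Fraud | x), C_FP the cost of blocking a legitimate transaction, and C_FN the cost of passing a fraudulent one. The numeric costs in the last line are hypothetical.

```latex
\begin{aligned}
\mathbb{E}[\text{cost} \mid \text{block}] &= (1-p)\,C_{FP},
\qquad
\mathbb{E}[\text{cost} \mid \text{pass}] = p\,C_{FN}, \\
\text{block} \iff p\,C_{FN} \ge (1-p)\,C_{FP}
&\iff p \ge \frac{C_{FP}}{C_{FP}+C_{FN}}, \\
C_{FN}=100,\; C_{FP}=5 &\;\Rightarrow\; p \ge \tfrac{5}{105} \approx 0.048 .
\end{aligned}
```

When a missed fraud is far more expensive than a false block, the threshold falls well below 50%, which matches the intuition that costly misses justify blocking on weaker evidence.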

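Question F5 — Cascaded Screening

A low-precision first screen is followed by a high-precision review: why use two thresholds?

Answer: Set a low threshold on the first screen to raise recall, and a high threshold at review to control false blocks; together they minimize expected loss.

Why: A tiered setup fits business resources and risk better than a single threshold.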

Question F6 — Sequential Updating

How do you update a lead's probability after each new signal?

Answer: Posterior_t ∝ Likelihood_t × Posterior_{t−1}; multiply by the likelihood of the new signal and renormalize.

Why: This is the basic framework for online learning in a CRM.

Question F7 — Prior Setting

When entering a new city with few observations, how do you set priors?

Answer: Use hierarchical priors that borrow from similar cities (hierarchical Bayes), and run a sensitivity analysis.

Why: This guards against both overconfidence and excessive conservatism.

Question F8 — Posterior to Action

A coupon costs $5; the expected margin uplift is ΔG. What is the decision rule?

Answer: Issue the coupon if P(Buy|x, coupon) × ΔG ≥ 5; otherwise do not. In other words, trigger when the expected gain is at least the cost.

Why: This converts a probability into an expected-gain comparison.

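Question F9 — Value of Information

When should you stop testing and ship the better variant?

Answer: Stop when the expected probability of the decision changing, multiplied by the gain from changing it, is less than the cost of continued sampling; that is, when the value of information (VOI) falls below the sampling cost.

Why: This reduces the question to a single cost-benefit rule.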

Question F10 — Communicating Posterior

How do you present a posterior to non-technical managers?

Answer: Give a three-part conclusion: an interval (for example, 60–70%), a comparison against the threshold, and cost-based action scenarios (block / pass / review).

Why: A probability-to-action lookup table is the easiest to understand.

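Group G — MGS 2150 Lecture 8 · Chap 4-1 (10 questions · binomial, Poisson, geometric, negative binomial, mixtures, thinning)

Question G1 — Binomial Basics

Email delivery has p = 0.98 and n = 1000. How would you approximate P(X = 980)?

Answer: Use the normal approximation with μ = 980 and σ ≈ √(1000 · 0.98 · 0.02) ≈ 4.427, and apply a continuity correction: P(X = 980) ≈ P(979.5 ≤ Y ≤ 980.5) for Y ~ N(μ, σ²).

Why: When n is large and p is not too extreme, the binomial can be approximated by a normal; here n(1 − p) = 20, so the approximation is reasonable.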
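A quick comparison of the continuity-corrected normal approximation with the exact binomial probability, using only the standard library:

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, k = 1000, 0.98, 980
mu = n * p
sigma = sqrt(n * p * (1 - p))

# Continuity correction: P(X = k) is approximated by P(k - 0.5 <= Y <= k + 0.5).
normal = NormalDist(mu, sigma)
approx = normal.cdf(k + 0.5) - normal.cdf(k - 0.5)

# Exact binomial probability for comparison.
exact = comb(n, k) * p**k * (1 - p) ** (n - k)

print(f"sigma  = {sigma:.3f}")
print(f"approx = {approx:.4f}")
print(f"exact  = {exact:.4f}")
```

Both values come out close to 9%, which suggests the approximation is adequate here even though p is near 1.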

Question G2 — Hypergeometric QA

200 items contain 10 defects; draw 5 without replacement. What is P(at least one defect)?

Answer: 1 − C(190,5)/C(200,5).

Why: A finite population sampled without replacement gives the hypergeometric distribution.
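The expression can be evaluated directly with `math.comb`:

```python
from math import comb

# P(no defect in 5 draws) = ways to pick 5 good items / ways to pick any 5 items.
p_none = comb(190, 5) / comb(200, 5)
p_at_least_one = 1 - p_none
print(f"P(at least one defect) = {p_at_least_one:.4f}")
```

The result is a little under 23%, slightly below the 1 − 0.95⁵ ≈ 22.6% one would get... when sampling with replacement the two figures differ only in the third decimal, because 5 draws barely deplete a lot of 200.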

Question G3 — Poisson Tail

λ = 0.8 per hour. What is P(at least 2 events in the next hour)?

Answer: 1 − e^{-0.8}(1 + 0.8) ≈ 0.191.

Why: The tail is the complement of P(0) + P(1).
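A minimal numerical check of the complement calculation (the helper `poisson_pmf` is defined here for illustration):

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(N = k) for N ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

lam = 0.8
p_at_least_two = 1 - poisson_pmf(0, lam) - poisson_pmf(1, lam)
print(f"P(N >= 2) = {p_at_least_two:.4f}")
```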

Question G4 — Geometric Expectation

The first-call resolution rate is p = 0.7. What is the expected number of calls?

Answer: 1/p ≈ 1.43.

Why: The geometric distribution counts the number of attempts up to and including the first success.

Question G5 — Negative Binomial

You need 3 successes and each trial succeeds with p = 0.6. What is the expected number of trials?

Answer: r/p = 3/0.6 = 5.

Why: The negative binomial counts the total trials needed to reach r successes.

Question G6 — Poisson Approximation

When can Bin(n,p) be approximated by Pois(λ = np)? Give an example.

Answer: When n is large, p is small, and np is moderate; for example, an error rate of 0.0005 across one million requests.

Why: It allows quick estimates for rare events.

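Question G7 — Thinning Property

Events follow Pois(λ = 10/hr), and only 20% of alerts are kept. What is the new process?

Answer: Still Poisson, with λ′ = 2/hr.

Why: This is the thinning property of the Poisson process: keeping each event independently with probability q gives a Poisson process with rate qλ.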

Question G8 — Mixture Mean & Var

A mixture is 30% peak with λ = 12 and 70% off-peak with λ = 4. Find the mean and variance.

Answer: E = 0.3 × 12 + 0.7 × 4 = 6.4; Var = 6.4 + 0.3(12 − 6.4)² + 0.7(4 − 6.4)² = 6.4 + 9.408 + 4.032 = 19.84.

Why: Variance = within-component variance (which equals E for Poisson components) + between-component variance.
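The within-plus-between decomposition (the law of total variance) can be checked numerically. The sketch assumes each component is Poisson, so its variance equals its rate.

```python
weights = [0.3, 0.7]   # share of peak and off-peak periods
lambdas = [12, 4]      # Poisson rate in each period

mean = sum(w * lam for w, lam in zip(weights, lambdas))

# Within-component part: E[Var(X | period)] = E[lambda] for Poisson components.
within = mean
# Between-component part: Var(E[X | period]) = variance of the rates.
between = sum(w * (lam - mean) ** 2 for w, lam in zip(weights, lambdas))

variance = within + between
print(f"mean = {mean:.2f}, within = {within:.2f}, "
      f"between = {between:.2f}, variance = {variance:.2f}")
```

The variance is roughly three times the mean, which is the overdispersion a single Poisson model would miss.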

Question G9 — Compound Poisson Sum

Orders N ~ Pois(λ), and order amounts are i.i.d. with mean μ. What is E(total sales)?

Answer: E(Sum) = E(N) · E(Amount) = λ · μ.

Why: This follows from linearity of expectation after conditioning on N, with N independent of the amounts.

Question G10 — Net Difference: Skellam

A and B are two independent Poisson counts with rates λ_A and λ_B. What is the distribution of A − B?

Answer: Skellam(λ_A, λ_B), with E = λ_A − λ_B and Var = λ_A + λ_B.

Why: It suits “net increase / net flow” analysis.

Group H — MGS 2150 Lecture 9 · Chap 4-2 (10 questions · normal, exponential, uniform, CLT, intervals, sample size)

Question H1 — Normal Tail Fee

Weight ~ N(1.00, 0.06²) kg, and a fee applies above 1.10 kg. What proportion pays the fee?

Answer: Z = (1.10 − 1.00)/0.06 ≈ 1.667, giving ≈ 4.78%.

Why: This is a normal upper-tail probability.
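The upper-tail probability can be computed without a z-table using the standard library's `statistics.NormalDist`:

```python
from statistics import NormalDist

weight = NormalDist(mu=1.00, sigma=0.06)   # parcel weight in kg
z = (1.10 - weight.mean) / weight.stdev
share_over_limit = 1 - weight.cdf(1.10)    # P(weight > 1.10 kg)

print(f"z = {z:.3f}, share paying the fee = {share_over_limit:.2%}")
```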

Question H2 — Exponential Waiting

λ = 6 per minute (mean 10 s). What is P(wait > 30 s)?

Answer: e^{-6 × 0.5} = e^{-3} ≈ 4.98%.

Why: The exponential distribution is memoryless, which makes it a convenient approximation for queues.

Question H3 — Uniform Tolerance

X ~ U[−0.5, 0.5] mm. What is P(|X| > 0.3)?

Answer: 1 − 0.6/1 = 0.4, i.e. 40%.

Why: For a uniform distribution, probability is the ratio of interval lengths.

Question H4 — CLT Sample Mean

σ = 0.12, n = 36, μ = 1.00. What is P(X̄ > 1.03)?

Answer: Z = (1.03 − 1.00)/(0.12/√36) = 1.5, giving ≈ 6.68%.

Why: The standard error of X̄ is σ/√n.

Question H5 — z-Interval for Mean

σ = 0.12 (known), n = 100, X̄ = 1.02 kg. What is the 95% CI?

Answer: 1.02 ± 1.96 × 0.12/√100 = 1.02 ± 0.0235, i.e. [0.996, 1.044] kg.

Why: Interpret the 95% as the coverage rate of the procedure, not as a probability statement about this single interval.
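A small helper for the known-σ interval, useful for checking the arithmetic. The function name `z_interval` is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(mean, sigma, n, level=0.95):
    """Two-sided confidence interval for a mean when sigma is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    half_width = z * sigma / sqrt(n)
    return mean - half_width, mean + half_width

low, high = z_interval(mean=1.02, sigma=0.12, n=100)
print(f"95% CI: [{low:.3f}, {high:.3f}] kg")
```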

Question H6 — CI for Proportion

n = 200 with 182 successes. What is the 95% CI for the proportion?

Answer: p̂ = 0.91, SE ≈ 0.020; CI ≈ 0.91 ± 1.96 × 0.020, i.e. about [0.871, 0.949].

Why: For small samples the Wilson interval is more robust.
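To see how the Wilson interval differs from the simple (Wald) interval on this data, both can be computed side by side. This is a sketch with z fixed at 1.96; the function names are illustrative.

```python
from math import sqrt

def wald_interval(successes, n, z=1.96):
    """Simple normal-approximation interval: p_hat +/- z * SE."""
    p = successes / n
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval, which stays well-behaved near 0 and 1."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

wald = wald_interval(182, 200)
wilson = wilson_interval(182, 200)
print("Wald:  ", tuple(round(x, 3) for x in wald))
print("Wilson:", tuple(round(x, 3) for x in wilson))
```

The Wilson interval is pulled slightly toward 0.5 and is not symmetric around p̂, which is what keeps it inside [0, 1] when p̂ is extreme.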

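Question H7 — t vs z

When should you use a t-interval for the mean instead of a z-interval?

Answer: Use t when σ is unknown, the sample is small, and the data are approximately normal; with a large sample the z approximation is acceptable.

Why: The t distribution reflects the extra uncertainty from estimating σ.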

Question H8 — One- vs Two-Tailed

You only want to show that “time decreased”. Which tail should you use, and why pre-register the direction?

Answer: One-tailed; pre-registration prevents switching direction after the fact, which would inflate the Type I error rate.

Why: As emphasized in class: set the alternative hypothesis first, then collect data.

Question H9 — Practical vs Statistical

p = 0.02, but the mean time drops by only 0.2 minutes. How do you communicate this?

Answer: “Statistically significant but a small effect”; give the confidence interval and ROI, and recommend a staged (pilot) rollout first.

Why: Present statistical significance together with the business threshold.

Question H10 — Sample Size for Proportion

Target p ≈ 0.05, margin ±0.5 percentage points, 95% confidence. Roughly what n is needed?

Answer: n ≈ z²p(1 − p)/E² = 1.96² × 0.05 × 0.95 / 0.005² ≈ 7,300.

Why: If p is unknown, using p = 0.5 is more conservative.
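The formula is easy to wrap in a helper, which also shows how much the conservative choice p = 0.5 costs. The function name is illustrative and z is fixed at 1.96.

```python
from math import ceil

def sample_size_for_proportion(p, margin, z=1.96):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= margin (normal approximation)."""
    return ceil(z**2 * p * (1 - p) / margin**2)

n_planned = sample_size_for_proportion(p=0.05, margin=0.005)
n_conservative = sample_size_for_proportion(p=0.5, margin=0.005)
print(n_planned, n_conservative)
```

With the same margin, assuming p = 0.5 requires roughly five times as many observations as p = 0.05, so a defensible prior estimate of p can save a lot of sampling.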

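Group I — MGS 2150 6th.pdf (10 questions · integration: KPI stack, governance, A/B compliance, scenario analysis)

Question I1 — KPI Stack

Design a traceable KPI stack from raw logs to metrics to decisions.

Answer: Raw layer (read-only) → cleaning layer (outlier flagging) → analytics layer (median/IQR/percentiles) → decision layer (thresholds + ROI), supported by a metric dictionary with versioning and an audit log.

Why: Statistical methods and data governance have to be implemented together.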

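Question I2 — Mixed Methods

A sales drop coincides with both data fluctuations and process changes. How do you combine quantitative and qualitative methods to diagnose it?

Answer: Quantitative: stratified comparisons and before/after analysis around the breakpoint. Qualitative: interviews and walkthroughs. Triangulate the evidence to reach one conclusion.

Why: As emphasized in class, “triangulating evidence” improves credibility.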

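Question I3 — Metric Hierarchy

Build a North Star → drivers → operational metric hierarchy to avoid local optima.

Answer: North Star: long-term value; drivers: conversion and retention; operational: clicks and reach. Use a causal map as a guardrail so that optimizing one level does not harm downstream metrics.

Why: A clear hierarchy is needed to align incentives.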

Question I4 — Experiment Under Constraint

There are three variants but limited traffic. How do you design the experiment efficiently?

Answer: Use a factorial design or a multi-armed bandit; set a stopping rule and a minimum effect threshold.

Why: This raises the information gained per sample.

Question I5 — Robust Weekly Pack

Name five standard elements in a robust weekly analytics pack.

Answer: P50/IQR, P90/P95, sample size N, a note on outlier handling, and a decomposition of the change from the previous period.

Why: A uniform template makes review and audit easier.

Question I6 — Scenario Planning

Demand is uncertain. How do you plan inventory with scenarios?

Answer: Derive a P50/P90 demand band, compute safety stock with the service-level method, and define optimistic / baseline / cautious scenarios with a replenishment policy.

Why: This turns uncertainty into executable parameters.

Question I7 — A/B Governance

List key guardrails to prevent p-hacking.

Answer: Pre-registered hypotheses and primary metric, a fixed sample size or a sequential rule, a single stopping rule, and audit logs with version control.

Why: The class requirement is “plan first, experiment second”.

Question I8 — Metric Drift

How do you define and monitor metric drift?

Answer: Version the metrics, validate data sources, monitor distribution drift (for example, with PSI), and set up anomaly alarms with a rollback policy.

Why: A metric is a contract, so it needs versions and a change log.

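Question I9 — From Correlation to Causation

Propose a roadmap from correlation to causal inference.

Answer: Controls/stratification → panel fixed effects → natural experiments / IV / regression discontinuity → randomized experiments; run balance and robustness checks at each step.

Why: Each step increases the strength of identification.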

Question I10 — Three-Region Threshold

Turn the posterior into three regions, “block / review / pass”, using costs.

Answer: t₁ = C(FP)/(C(FP)+C(FN)); set t₂ > t₁ based on review-queue capacity. Block if p ≥ t₂; review if p is at least t₁ but below t₂; pass if p is below t₁.

Why: The thresholds are set jointly by expected-cost minimization and resource constraints.
