Group A — MGS 2150 Lecture 2 · Chap 1-1 (10 questions · data types, sampling, visualization, and data quality)
Question A1 — Scale Choice for Satisfaction
A manager wants to compare “average satisfaction” across 4 stores. Ratings are recorded as Very bad/Bad/Neutral/Good/Very good. What summary and chart are appropriate?
Answer: Use the median and quartiles (the scale is ordinal), not the arithmetic mean. Visualize with a stacked bar chart of proportions, or with a box plot if the categories are cautiously coded as scores.
Why: Ordinal categories are not equally spaced, so averaging naively coded numbers is misleading. The median and IQR respect the ordering without assuming equal intervals, and still convey the typical level and the spread.
Question A2 — Cross-Section vs Time Series
You have April 2025 “same-day” delivery times for all stores (cross-section) and a monthly series for 2023–2025 (time series). Which chart suits each, and why?
Answer: Cross-section: box or violin plots by store to compare distributions. Time series: a line chart with a moving average and seasonal markers.
Why: A cross-section calls for comparing distributions; a time series calls for detecting trend and seasonality.
Question A3 — Sampling Frame
How should you sample to estimate the average app waiting time over a day with strong peaks?
Answer: Stratify by hour; within each hour, sample systematically at fixed intervals; weight each stratum by its traffic.
Why: Stratification reduces variance and avoids peak-time bias; systematic sampling spreads the sample over each hour, which weakens the effect of autocorrelation between adjacent observations.
Question A4 — Data Quality & Outliers
A few tickets show delivery times of “> 200 hours”. Should they be deleted?
Answer: Run a business check first (holiday, re-routing), then robust diagnostics (IQR/MAD). If the values are confirmed errors, correct or flag them; for reporting, use winsorization or robust statistics (median/IQR).
Why: Blind deletion risks bias; robust summaries reduce the impact of outliers.
Question A5 — Variable Types
Classify: (a) coupon amount, (b) member or not, (c) store tier, (d) reorder rate.
Answer: (a) Ratio, (b) Nominal (binary), (c) Ordinal, (d) Ratio (a proportion).
Why: The measurement scale determines which summaries and tests are valid.
Question A6 — Dashboard Ethics
A chart starts its y-axis at 90% to exaggerate an improvement. What should you do?
Answer: Use a zero-based axis for KPIs, or clearly mark the axis break, and provide tables of both absolute values and % change.
Why: This prevents misinterpretation, and transparency builds trust.
Question A7 — Simpson’s Paradox Lite
Overall conversion is lower for Channel A than for Channel B, but within each of the “new” and “returning” segments A is higher. What should you report?
Answer: Report the stratified metrics together with a standardized overall rate computed with common segment weights.
Why: Segment mix is a confounder; standardization removes the bias caused by differences in mix.
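A minimal sketch of direct standardization for this situation. The visitor counts and the 50/50 reference mix are hypothetical, chosen only so that A wins in each segment but loses overall:

```python
# Hypothetical counts: channel -> segment -> (visitors, conversions)
data = {
    "A": {"new": (900, 90), "returning": (100, 30)},
    "B": {"new": (100, 8), "returning": (900, 252)},
}

def crude_rate(segments):
    """Overall conversion rate using the channel's own segment mix."""
    visitors = sum(n for n, _ in segments.values())
    conversions = sum(c for _, c in segments.values())
    return conversions / visitors

def standardized_rate(segments, weights):
    """Conversion rate re-weighted to a common segment mix."""
    return sum(weights[s] * c / n for s, (n, c) in segments.items())

common_weights = {"new": 0.5, "returning": 0.5}  # assumed reference mix

crude = {ch: crude_rate(seg) for ch, seg in data.items()}
standardized = {ch: standardized_rate(seg, common_weights) for ch, seg in data.items()}

print("crude:", crude)
print("standardized:", standardized)
```

With these numbers the crude rates favor B, while the standardized rates favor A, because A's traffic is mostly low-converting new visitors.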
Question A8 — Survey Weighting
A satisfaction survey over-samples heavy users. How can you adjust without re-surveying?
Answer: Apply post-stratification or raking on visit frequency and membership; report weighted and unweighted results side by side.
Why: Weighting reduces selection bias, and the side-by-side view makes the resulting change in variance visible.
Question A9 — Frequency Table & Bin Alignment
When building a histogram of call duration, how do you choose the bin width so that bins align with an SLA threshold at 120 s?
Answer: Start from the Freedman–Diaconis rule for the bin width, then shift the edges so that 120 s is a bin boundary; check sensitivity.
Why: This keeps the statistical guidance while making the chart interpretable for the business.
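One way to sketch the "FD width, then shift the edges" idea with the standard library. The call durations are hypothetical, and the helper names `fd_bin_width` and `aligned_edges` are illustrative, not from the course material:

```python
import math
import statistics

def fd_bin_width(values):
    """Freedman-Diaconis width: 2 * IQR / n^(1/3)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return 2 * (q3 - q1) / len(values) ** (1 / 3)

def aligned_edges(values, threshold, width):
    """Equal-width bin edges, shifted so that `threshold` is itself an edge."""
    k_low = math.floor((min(values) - threshold) / width)
    k_high = math.ceil((max(values) - threshold) / width)
    return [threshold + k * width for k in range(k_low, k_high + 1)]

# Hypothetical call durations in seconds.
durations = [35, 48, 60, 72, 85, 90, 95, 110, 118, 125, 130, 150, 170, 210, 300]

width = fd_bin_width(durations)
edges = aligned_edges(durations, threshold=120, width=width)
print(round(width, 1), [round(e, 1) for e in edges])
```

For the sensitivity check, the same `aligned_edges` call can be repeated with a slightly smaller or larger `width` to see whether the shape of the histogram changes materially.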
Question A10 — Metric Definition
“Store-visit conversion = store visits / app opens” is distorted by a large number of remote opens. How would you fix it?
Answer: Use a geo-matched denominator (local MAU), or report stage rates (open→coupon→visit).
Why: The denominator must reflect the population that can actually be influenced.
Group B — MGS 2150 Lecture 3 · Chap 1-2 v.1 (10 questions · table/chart design, Pareto, lines/annotations, heatmaps)
Question B1 — Pareto for Returns
After plotting a Pareto chart of return reasons, how do you pick the first fixes for the next 30 days?
Answer: Target the causes that cover about 80% of cumulative returns, and rank them by a score of impact × controllability × cost.
Why: The Pareto chart shows contribution; a feasibility triage turns the chart into actions.
Question B2 — Small Multiples
You must compare the distribution of turnover days for 12 stores on one page. What is the best approach?
Answer: Small-multiples box plots on a shared axis with a target line, sorted by median or IQR.
Why: A common scale enables direct visual comparison; the target line ties the chart to goals.
Question B3 — Dual-Axis Caution
Why can a dual-axis line chart of sales vs. ad spend mislead, and what should you use instead?
Answer: Axis scaling is arbitrary, which can create a pseudo-correlation. Use indexed lines (base = 100) or a scatter plot with a trend line, plus event annotations.
Why: Putting both series on a common standardized scale makes real co-movement easier to judge.
Question B4 — Heatmap Standardization
A region × category heatmap is always dominated by big cities. How do you avoid population bias?
Answer: Use per-capita or per-10k-household rates, and provide raw totals in a side table.
Why: This separates scale from performance.
Question B5 — Annotated Lines
How do you show the effect of a pricing policy change in a time series without clutter?
Answer: Add a vertical marker with a short note; consider pre/post window averages or a segmented trend.
Why: This links the data to events and supports a causal discussion.
Question B6 — Table for Decisions
List three must-have columns for an executive action table.
Answer: Absolute value, relative (%) change, and gap to target; highlight exceptions.
Why: These columns turn numbers into actions.
Question B7 — Category Granularity
Merging “stock-out” and “supply break” into a single “unavailable” category caused root-cause analysis to fail. How do you fix it?
Answer: Restore the hierarchical categories and visualize them with a tree or Sankey diagram to show the flow.
Why: Appropriate granularity is key to actionability.
Question B8 — Stem-and-Leaf Use
When is a stem-and-leaf plot better than a box plot for managers?
Answer: When the sample is small to medium and exact values matter (e.g., quality reviews).
Why: It preserves each observation while still showing the shape.
Question B9 — Real-time vs. Accuracy
The daily dashboard is T+1. How do you add timely insight without “passing off estimates as truth”?
Answer: Add a nowcast panel with an uncertainty band, and automatically replace it with final data the next day.
Why: This transparently separates estimates from final numbers.
Question B10 — Consistency Checks
POS sales and WMS shipments differ because of returns timing. What documentation and checks are needed?
Answer: A metric dictionary, a reconciliation matrix, and time-alignment rules.
Why: These prevent “definition drift” and audit issues.
Group C — MGS 2150 Lecture 4 · Chap 2-1 (10 questions · central tendency, dispersion, robust measures)
Question C1 — Mean vs Median
Delivery times are right-skewed. Which measure of center best represents the “typical user experience”?
Answer: The median (with the IQR).
Why: The median is robust to a long right tail.
Question C2 — Weighted Mean
Why use a weighted mean for the average discounted price?
Answer: Weight by sales volume; a simple mean overstates the discount when small-volume, deep-discount SKUs exist.
Why: Pricing decisions depend on what was actually sold, so each price should count in proportion to its volume.
Question C3 — Geometric Mean
For three yearly growth rates of 10%, −5%, and 15%, why prefer the geometric mean for multi-year growth?
Answer: It respects compounding; the arithmetic mean misstates the multi-period effect.
Why: Multi-year growth is a product, not a sum.
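A short worked check of the two averages for the rates in the question, as a sketch (variable names are illustrative):

```python
import math

growth_rates = [0.10, -0.05, 0.15]  # the three yearly rates from the question

factors = [1 + g for g in growth_rates]
total_factor = math.prod(factors)                    # cumulative growth over 3 years
geometric = total_factor ** (1 / len(factors)) - 1   # average compound rate per year
arithmetic = sum(growth_rates) / len(growth_rates)   # simple average of the rates

print(f"cumulative factor: {total_factor:.5f}")
print(f"geometric mean:    {geometric:.2%}")
print(f"arithmetic mean:   {arithmetic:.2%}")
```

Compounding the geometric rate for three years reproduces the cumulative factor of 1.20175; compounding the arithmetic mean (about 6.67%) would overshoot it.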
Question C4 — Range, IQR, SD
Suggest a simple stability KPI that is less sensitive to outliers.
Answer: The IQR, or a trimmed SD (e.g., after removing the top and bottom 5%).
Why: Robust metrics better reflect typical variability.
Question C5 — Coefficient of Variation
Two lines: μ₁=100, σ₁=8; μ₂=60, σ₂=7. Which is more stable?
Answer: Compute CV = σ/μ: 8/100 = 0.08 vs 7/60 ≈ 0.1167, so Line 1 is more stable.
Why: The CV enables comparison of stability across different means.
Question C6 — Empirical Rule (Appropriate Use)
When can you use the 68-95-99.7 rule to flag unusual observations?
Answer: When the data are approximately normal (symmetric, light-tailed).
Why: Otherwise, use quantiles or the MAD.
Question C7 — Chebyshev (Distribution-Free)
Give a conservative coverage bound for “mean ± 2σ” that holds for any distribution.
Answer: At least 1 − 1/2² = 75%.
Why: The bound is distribution-free (it needs only a finite variance), which makes it useful for worst-case planning.
Question C8 — z-Scores & Standardization
Why standardize metrics before comparing across categories with different scales?
Answer: Standardizing removes scale effects, so deviations are compared relative to each category's own variability.
Why: This prevents unfair rankings caused by differences in scale.
Question C9 — Outlier Policy
Give two principles for an outlier policy in reporting.
Answer: (1) Business-first verification; (2) transparent thresholds and impact notes (e.g., winsorization levels).
Why: This ensures auditability and trust.
Question C10 — Percentiles for SLA
SLA: “95% of orders delivered within 24h”. What should the weekly report include?
Answer: P50/P90/P95/P99, a breakdown of late cases, and the trend vs. last week.
Why: Percentiles track both typical and tail performance.
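- Group D — Lecture 5 · Chap 2-2 (quartiles, box plots, skewness and kurtosis, robust statistics, presentation standards)
- Group E — Lecture 6 · Chap 3-1 (addition/multiplication/conditional probability, independence, counting)
- Group F — Lecture 7 · Chap 3-2 (Bayes, prior updating, thresholds and costs)
- Group G — Lecture 8 · Chap 4-1 (discrete distributions/Poisson approximation, geometric/negative binomial, mixtures/thinning)
- Group H — Lecture 9 · Chap 4-2 (normal/exponential/uniform, CLT, interval estimation and sample size)
- Group I — MGS 2150 6th (integrated application: KPI stack, governance, A/B compliance, scenario analysis)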
Group D — MGS 2150 Lecture 5 · Chap 2-2 (10 questions · percentiles, box plots, skewness and kurtosis, robust statistics, presentation standards)
Question D1 — Percentiles for SLA
A courier service promises “95% of orders ≤ 24h”. Which percentiles should the weekly report show?
Answer: Show P50 (median), P90, P95, and P99, plus counts and a reason breakdown for orders beyond P95.
Why: Percentiles capture both typical performance and tail risk; P95 corresponds to the promised level, and P99 helps identify extreme delays.
Question D2 — Boxplot for Carriers
Delivery times for three carriers are summarized by boxplots. Which features should guide a contract decision?
Answer: Prefer a lower median and IQR, a shorter upper whisker, and fewer outliers; for long tails, add penalty or service-level clauses.
Why: A boxplot shows stability and tail risk directly, which is more reliable than looking at the mean alone.
Question D3 — Special vs. Common Cause
A single-day spike of late orders is observed. How do you decide whether to exclude it from the KPI?
Answer: Check system, weather, and holiday logs. If the spike matches a one-off event, treat it as a “special cause”: annotate it and report it separately. Otherwise, keep it as part of normal variation.
Why: This is the SPC approach: special causes call for fixing the process, not penalizing front-line staff; common causes call for continuous improvement.
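Question D4 — IQR Stability KPI
Design a simple stability KPI using the IQR.
Answer: KPI: the daily IQR is at or below a target threshold (e.g., ≤ 40 minutes); a day passes only if the median target is met as well.
Why: The IQR resists outliers and measures how concentrated the data are; pairing it with the median avoids rewarding service that is “consistently slow but very stable”.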
Question D5 — Skewness Interpretation
Customer spending shows positive skewness (a long right tail). Which promotion idea fits this pattern?
Answer: Offer tiered benefits such as premium perks or bundles to high spenders; a basic discount is enough for the mass market.
Why: The right tail represents a small number of high-value transactions, so a tiered strategy is more effective.
Question D6 — Kurtosis & Tail Risk
High kurtosis is observed in delivery times. What is the operational risk?
Answer: Fatter tails mean more occasional extremely slow orders; add surge capacity or backup routes, and monitor P99.
Why: High kurtosis signals higher tail probability, which the mean and median do not capture.
Question D7 — Combine Plots
How do you show the distribution and a threshold together without clutter?
Answer: Overlay a vertical 24h line on an ECDF (empirical cumulative distribution), and add small-multiple boxplots per store; use a common scale and go from overall to detail.
Why: Look at overall coverage first, then drill down into the differences.
Question D8 — Robust Center
Why prefer the median over the mean when outliers exist?
Answer: The median is insensitive to a few extreme values, so it better represents the “typical experience”.
Why: The mean is pulled by extreme values, which distorts improvement priorities.
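Question D9 — Winsorization Policy
When is winsorization appropriate, and how should it be documented for audit?
Answer: Apply it only at the reporting/analysis layer, with a fixed proportion (e.g., top and bottom 1%); log the cut points, the proportion, the impact assessment, and the approval.
Why: Keeping the raw data intact ensures traceability and consistency.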
Question D10 — Dashboard Guardrails
List three guardrails that reduce misinterpretation.
Answer: Use a zero-based axis for KPIs or clearly marked axis breaks; show the sample size and distribution; give absolute values alongside percentiles or relative change.
Why: Transparency and context significantly reduce misreading.
Group E — MGS 2150 Lecture 6 · Chap 3-1 (10 questions · addition, multiplication, conditional probability, independence, counting)
Question E1 — Addition Rule
70% of customers are members, 40% used coupons, and 25% are both. What is P(at least one)?
Answer: 0.70 + 0.40 − 0.25 = 0.85, i.e., 85%.
Why: P(A∪B) = P(A) + P(B) − P(A∩B).
Question E2 — Only One of Two
Using the same data, what is the probability of “only one of the two”?
Answer: (0.70 − 0.25) + (0.40 − 0.25) = 0.60, i.e., 60%.
Why: Subtract the intersection from each event, then add the two remainders.
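The member/coupon figures from E1 and E2 can be checked in a few lines. This is only a sketch of the arithmetic; the variable names are illustrative:

```python
p_member = 0.70
p_coupon = 0.40
p_both = 0.25

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_at_least_one = p_member + p_coupon - p_both

# "Only one": each event minus the overlap, then summed
p_only_one = (p_member - p_both) + (p_coupon - p_both)

# Complement of "at least one"
p_neither = 1 - p_at_least_one

print(round(p_at_least_one, 2), round(p_only_one, 2), round(p_neither, 2))
```

The three disjoint pieces (only one, both, neither) add up to 1, which is a quick consistency check on the inputs.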
Question E3 — Complement
The system success rate is 99.3%. What is the failure probability?
Answer: 1 − 0.993 = 0.007, i.e., 0.7%.
Why: The complement is more intuitive and can be used directly for staffing and disaster-recovery capacity planning.
Question E4 — Independence Check
If P(coupon | new) = 0.6 but P(coupon) = 0.4, are the events independent?
Answer: No. They are not independent (they are positively associated).
Why: Independence requires the conditional probability to equal the unconditional probability.
Question E5 — Conditional Probability
The click-to-purchase rate is 12% and the overall purchase rate is 3%. What is the click rate?
Answer: 0.03 = 0.12 × P(click), so P(click) = 0.25, i.e., 25%.
Why: By the law of total probability, P(Buy) = P(Buy|Click)P(Click), assuming a purchase can only happen after a click.
Question E6 — Law of Total Probability
Overall churn is 8%; new users make up 30% and churn at 15%. What is the churn rate of old users?
Answer: 0.08 = 0.3 × 0.15 + 0.7 × x, so x = 0.05, i.e., 5%.
Why: The weighted sum of the stratum rates equals the overall rate.
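Question E7 — Basic Counting: Combinations
How many 2-item bundles can be chosen from 12 SKUs, and how many of them contain at least 1 of the 3 new SKUs?
Answer: C(12,2) = 66; subtracting the bundles with no new SKU, C(9,2) = 36, leaves 30. So 66 and 30.
Why: Counting the total first and then excluding the complement is faster.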
Question E8 — Basic Counting: Permutations
Three hosts cover Mon/Tue/Wed, one per day, with no repeats. How many arrangements are there?
Answer: 3! = 6. If host A is fixed on Wednesday, 2! = 2 remain.
Why: Permutations take order into account; handle fixed constraints first.
Question E9 — Multiplication Rule
P(A) = 0.4, P(B) = 0.5, and A and B are independent. What is P(A∩B)?
Answer: 0.4 × 0.5 = 0.20, i.e., 20%.
Why: For independent events, the joint probability is the product of the individual probabilities.
Question E10 — Bayes Lite (Interpretation)
High-score leads are 20% of all leads; P(sale|high) = 15% and P(sale|low) = 2%. If a random sale occurs, is it more likely to come from a high-score lead?
Answer: Yes. The posterior probability that it came from a high-score lead is about 65% (see F1 for the calculation).
Why: Intuition: high-score leads are fewer, but their conversion rate is much higher.
Group F — MGS 2150 Lecture 7 · Chap 3-2 (10 questions · Bayes, prior updating, thresholds and costs)
Question F1 — Bayes Posterior
High-score leads are 20% with 15% conversion; low-score leads are 80% with 2% conversion. Given a sale, what is P(high)?
Answer: 0.15 × 0.20 / (0.15 × 0.20 + 0.02 × 0.80) = 0.030 / 0.046 ≈ 65.2%.
Why: Posterior ∝ prior × likelihood; the numerator is the high-score path and the denominator is the sum over all paths.
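The "prior × likelihood, then normalize" recipe can be sketched as a small helper. The function name `posterior` and the dictionary layout are illustrative choices, not part of the course material:

```python
def posterior(priors, likelihoods):
    """Return P(class | evidence) for each class via Bayes' rule."""
    joint = {k: priors[k] * likelihoods[k] for k in priors}  # prior x likelihood
    evidence = sum(joint.values())                           # sum over all paths
    return {k: v / evidence for k, v in joint.items()}

# Lead scoring (F1): which group did an observed sale come from?
lead_posterior = posterior(
    priors={"high": 0.20, "low": 0.80},
    likelihoods={"high": 0.15, "low": 0.02},
)

# Diagnostic test with a 1% base rate (the F2 setup).
test_posterior = posterior(
    priors={"disease": 0.01, "healthy": 0.99},
    likelihoods={"disease": 0.95, "healthy": 0.05},
)

print(f"P(high | sale) = {lead_posterior['high']:.3f}")
print(f"P(disease | positive) = {test_posterior['disease']:.3f}")
```

The same helper covers both the lead-scoring question and the base-rate question, since only the priors and likelihoods change.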
Question F2 — Base Rate Fallacy
The disease rate is 1%, sensitivity is 95%, and the false-positive rate is 5%. What is P(disease | positive)?
Answer: (0.95 × 0.01) / (0.95 × 0.01 + 0.05 × 0.99) ≈ 16.1%.
Why: With a low base rate, the positive predictive value is limited, so a second test is needed.
Question F3 — Precision Under Imbalance
Alerts fire on 5% of events, the true threat rate is 0.2%, and TPR (recall) is 90%. What is the approximate precision?
Answer: TP ≈ 0.9 × 0.2% = 0.18% and FP ≈ 5% − 0.18% = 4.82%, so precision ≈ 0.18 / (0.18 + 4.82) ≈ 3.6%.
Why: Under class imbalance, look at the PR curve and threshold costs.
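The precision figure follows from three rates. A minimal sketch of that arithmetic (variable names are illustrative):

```python
alert_rate = 0.05     # P(alert)
threat_rate = 0.002   # P(threat)
recall = 0.90         # P(alert | threat), i.e. TPR

true_positive = recall * threat_rate          # P(alert and threat)
false_positive = alert_rate - true_positive   # P(alert and no threat)
precision = true_positive / alert_rate        # P(threat | alert)

print(f"TP={true_positive:.4%}  FP={false_positive:.4%}  precision={precision:.1%}")
```

Even with 90% recall, fewer than 4 in 100 alerts are real threats here, because threats are so rare relative to the alert volume.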
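Question F4 — Decision Threshold by Cost
How do you use the posterior probability and costs to decide when to block a transaction?
Answer: Block if P(Fraud|x) ≥ C(FP)/(C(FN)+C(FP)); a “review band” can be added around the threshold. Threshold = C_FP/(C_FN+C_FP).
Why: The criterion is minimum expected cost: blocking costs (1 − p)·C(FP) in expectation and passing costs p·C(FN), so blocking is cheaper once p ≥ C(FP)/(C(FN)+C(FP)). Combine this with queue capacity to set two thresholds.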
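A sketch of where the cost-based threshold comes from, assuming C(FP) is the cost of blocking a legitimate transaction and C(FN) is the cost of passing a fraudulent one. With p = P(Fraud|x), blocking costs (1 − p)·C(FP) in expectation and passing costs p·C(FN); the two are equal at p = C(FP)/(C(FP)+C(FN)). The cost figures below are hypothetical:

```python
def expected_costs(p_fraud, cost_fn, cost_fp):
    """Expected cost of each action given P(fraud | x)."""
    return {
        "block": (1 - p_fraud) * cost_fp,  # a loss only if the transaction is legitimate
        "pass": p_fraud * cost_fn,         # a loss only if the transaction is fraud
    }

def block_threshold(cost_fn, cost_fp):
    """Probability at which blocking and passing have equal expected cost."""
    return cost_fp / (cost_fp + cost_fn)

# Hypothetical costs: a missed fraud costs 200, a wrongly blocked order costs 10.
threshold = block_threshold(cost_fn=200, cost_fp=10)
at_threshold = expected_costs(threshold, cost_fn=200, cost_fp=10)
above = expected_costs(0.10, cost_fn=200, cost_fp=10)

print(f"threshold = {threshold:.4f}")
print("costs at p=0.10:", above)
```

Because a missed fraud is assumed to be 20 times as costly as a false block, the break-even probability is low (about 4.8%), so blocking becomes worthwhile well below 50%.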
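Question F5 — Cascaded Screening
With a low-precision first screen followed by a high-precision review, why use two thresholds?
Answer: Set a low threshold at the first screen to raise recall, and a high threshold at review to control false blocks; together they minimize expected loss. Recall first, then precision.
Why: A tiered design fits business resources and risk better than a single threshold.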
Question F6 — Sequential Updating
How do you update a lead's conversion probability after each new signal?
Answer: Posterior_t ∝ Likelihood_t × Posterior_{t−1}: multiply by the likelihood of the new evidence and renormalize.
Why: This is the basic framework for online learning in a CRM.
Question F7 — Prior Setting
When entering a new city with few observations, how do you set priors?
Answer: Use hierarchical priors that borrow strength from similar cities (hierarchical Bayes), and run a sensitivity analysis.
Why: This guards against both overconfidence and excessive conservatism.
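Question F8 — Posterior to Action
A coupon costs 5 and the expected gross-profit uplift is ΔG. How do you make the decision based on the posterior?
Answer: Send the coupon if P(Buy|x, coupon) × ΔG ≥ 5; otherwise do not. Trigger when the expected gain covers the cost.
Why: Convert the probability into an expected gain and compare it with the cost.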
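Question F9 — Value of Information
When should you stop testing and ship the better variant?
Answer: Stop when “expected probability of reversing the decision × gain from the reversal” < “cost of continued sampling”, i.e., when VOI < sampling cost.
Why: This reduces the decision to a single cost-benefit rule.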
Question F10 — Communicating Posterior
How do you present a posterior to non-technical managers?
Answer: Give a three-part conclusion: an interval (e.g., 60–70%), a comparison against the threshold, and cost-based action scenarios (block / pass / review).
Why: A probability-to-action lookup table is the easiest format to understand.
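Group G — MGS 2150 Lecture 8 · Chap 4-1 (10 questions · binomial, Poisson, geometric, negative binomial, mixtures, thinning)
Question G1 — Binomial Basics
Email delivery succeeds with p = 0.98 and n = 1000 emails are sent. How would you approximate P(X = 980)?
Answer: Use the normal approximation with μ = 980 and σ = √(1000·0.98·0.02) ≈ 4.427, with a continuity correction: P(979.5 < X < 980.5).
Why: When n is large and p is not too extreme (here n(1 − p) = 20), the binomial can be approximated by a normal distribution.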
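To see how close the approximation is, the exact binomial probability can be compared with the continuity-corrected normal value. A sketch using only the standard library:

```python
import math
from statistics import NormalDist

n, p, k = 1000, 0.98, 980

# Exact binomial probability P(X = k)
exact = math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Normal approximation with continuity correction: P(k - 0.5 < X < k + 0.5)
mu = n * p
sigma = math.sqrt(n * p * (1 - p))
normal = NormalDist(mu, sigma)
approx = normal.cdf(k + 0.5) - normal.cdf(k - 0.5)

print(f"sigma  = {sigma:.3f}")
print(f"exact  = {exact:.4f}")
print(f"approx = {approx:.4f}")
```

Both values come out near 9%, so the approximation is adequate here even though p is close to 1.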
Question G2 — Hypergeometric QA
There are 200 items with 10 defects; 5 are drawn without replacement. What is P(at least one defect)?
Answer: 1 − C(190,5)/C(200,5) ≈ 22.8%.
Why: A finite population sampled without replacement gives a hypergeometric distribution.
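The expression can be evaluated directly with `math.comb`; a brief sketch:

```python
import math

population, defects, draws = 200, 10, 5

# P(no defect in the sample): all 5 drawn from the 190 good items
p_none = math.comb(population - defects, draws) / math.comb(population, draws)
p_at_least_one = 1 - p_none

print(f"P(at least one defect) = {p_at_least_one:.4f}")
```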
Question G3 — Poisson Tail
λ = 0.8 per hour. What is P(≥ 2 events in the next hour)?
Answer: 1 − e^{−0.8}(1 + 0.8) ≈ 19.1%.
Why: The tail is the complement of P(0) + P(1).
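A small sketch of the complement calculation; `poisson_pmf` is an illustrative helper name:

```python
import math

def poisson_pmf(k, lam):
    """P(N = k) for a Poisson count with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 0.8
p_at_least_two = 1 - poisson_pmf(0, lam) - poisson_pmf(1, lam)

print(f"P(N >= 2) = {p_at_least_two:.4f}")
```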
Question G4 — Geometric Expectation
The first-call resolution rate is p = 0.7. What is the expected number of calls?
Answer: 1/p ≈ 1.43.
Why: Geometric distribution: the number of attempts until the first success.
Question G5 — Negative Binomial
You need 3 successes and each trial succeeds with p = 0.6. What is the expected number of trials?
Answer: r/p = 3/0.6 = 5.
Why: Negative binomial: the total number of trials needed to reach r successes.
Question G6 — Poisson Approximation
When can Bin(n,p) be approximated by Pois(λ = np)? Give an example.
Answer: When n is large, p is small, and np is moderate; for example, rare errors at a rate of 0.0005 across a million requests.
Why: It allows quick estimates for rare events.
Question G7 — Thinning Property
Alerts arrive as a Poisson process with λ = 10/hr and only 20% are kept. What is the new process?
Answer: Still Poisson, with λ′ = 0.2 × 10 = 2/hr.
Why: This is the thinning property of the Poisson process (each event is kept independently).
Question G8 — Mixture Mean & Var
A mixture has peak periods (30%) with λ = 12 and off-peak periods (70%) with λ = 4. Find the mean and variance.
Answer: E = 0.3 × 12 + 0.7 × 4 = 6.4; Var = 6.4 + 0.3(12 − 6.4)² + 0.7(4 − 6.4)² = 6.4 + 9.408 + 4.032 = 19.84.
Why: Variance = within-group variance (equal to E for Poisson components) + between-group variance.
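The within-plus-between decomposition for this Poisson mixture can be checked numerically. A sketch with illustrative variable names:

```python
weights = [0.3, 0.7]   # share of peak and off-peak periods
rates = [12, 4]        # Poisson rate in each period

mean_rate = sum(w * lam for w, lam in zip(weights, rates))

# A Poisson's variance equals its rate, so the average within-group
# variance is the average rate.
within = mean_rate

# Between-group variance: spread of the rates around the overall mean.
between = sum(w * (lam - mean_rate) ** 2 for w, lam in zip(weights, rates))

variance = within + between
print(mean_rate, between, variance)
```

The terms work out to 6.4 within and 13.44 between, for a total variance of 19.84, roughly three times the mean, which is the overdispersion a single Poisson would miss.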
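Question G9 — Compound Poisson Sum
The number of orders is N ~ Pois(λ) and order amounts are i.i.d. with mean μ. What is E(total sales)?
Answer: E(Sum) = E(N)·E(Amount) = λ·μ.
Why: This follows from linearity of expectation conditional on N (Wald's identity), with N independent of the amounts.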
Question G10 — Net Difference: Skellam
A and B are two independent Poisson counts with rates λ_A and λ_B. What is the distribution of A − B?
Answer: Skellam(λ_A, λ_B), with E = λ_A − λ_B and Var = λ_A + λ_B.
Why: It applies to “net increase / net flow” analyses.
Group H — MGS 2150 Lecture 9 · Chap 4-2 (10 questions · normal, exponential, uniform, CLT, intervals, sample size)
Question H1 — Normal Tail Fee
Weight ~ N(1.00, 0.06²) kg, and a fee applies above 1.10 kg. What proportion pays the fee?
Answer: Z = (1.10 − 1.00)/0.06 ≈ 1.667, so the upper tail is ≈ 4.78%.
Why: This is an upper-tail probability of the normal distribution.
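The tail probability can be read from `statistics.NormalDist` instead of a z-table; a minimal sketch:

```python
from statistics import NormalDist

weight = NormalDist(mu=1.00, sigma=0.06)   # kg
limit = 1.10

z = (limit - weight.mean) / weight.stdev
p_fee = 1 - weight.cdf(limit)              # upper-tail probability

print(f"z = {z:.3f}, P(weight > {limit}) = {p_fee:.2%}")
```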
Question H2 — Exponential Waiting
λ = 6/min (mean 10 s). What is P(wait > 30 s)?
Answer: e^{−6×0.5} = e^{−3} ≈ 4.98%.
Why: The exponential distribution is memoryless, which makes it a suitable approximation for queues.
Question H3 — Uniform Tolerance
X ~ U[−0.5, 0.5] mm. What is P(|X| > 0.3)?
Answer: 1 − 0.6/1 = 0.4, i.e., 40%.
Why: For a uniform distribution, probability is the ratio of interval lengths.
Question H4 — CLT Sample Mean
σ = 0.12, n = 36, μ = 1.00. What is P(X̄ > 1.03)?
Answer: Z = (1.03 − 1.00)/(0.12/√36) = 1.5, so the probability is ≈ 6.68%.
Why: The standard error of X̄ is σ/√n.
Question H5 — z-Interval for Mean
σ = 0.12 is known, n = 100, X̄ = 1.02 kg. What is the 95% CI?
Answer: 1.02 ± 1.96 × 0.12/√100 = 1.02 ± 0.0235, i.e., [0.996, 1.044] kg.
Why: Interpret the 95% as the coverage rate of the procedure, not as the probability for this single interval.
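A sketch of the z-interval arithmetic, taking the critical value from `NormalDist().inv_cdf` rather than hard-coding 1.96:

```python
import math
from statistics import NormalDist

sigma, n, sample_mean, level = 0.12, 100, 1.02, 0.95

z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for 95%
margin = z * sigma / math.sqrt(n)               # z times the standard error
ci = (sample_mean - margin, sample_mean + margin)

print(f"margin = {margin:.4f}, CI = [{ci[0]:.3f}, {ci[1]:.3f}] kg")
```

The standard error is 0.12/10 = 0.012, so the margin is about 0.0235 and the interval runs from roughly 0.996 to 1.044 kg.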
Question H6 — CI for Proportion
n = 200 with 182 successes. What is the 95% CI?
Answer: p̂ = 0.91, SE = √(0.91 × 0.09/200) ≈ 0.0202; CI ≈ 0.91 ± 1.96 × 0.0202, i.e., [0.870, 0.950].
Why: For small samples, the Wilson interval is more robust.
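For comparison, the normal-approximation (Wald) interval and the Wilson score interval can be computed side by side. This is a sketch using the standard Wilson formula; variable names are illustrative:

```python
import math
from statistics import NormalDist

successes, n, level = 182, 200, 0.95
p_hat = successes / n
z = NormalDist().inv_cdf(1 - (1 - level) / 2)

# Wald (normal-approximation) interval
se = math.sqrt(p_hat * (1 - p_hat) / n)
wald = (p_hat - z * se, p_hat + z * se)

# Wilson score interval
denom = 1 + z**2 / n
center = (p_hat + z**2 / (2 * n)) / denom
half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
wilson = (center - half, center + half)

print(f"Wald:   [{wald[0]:.3f}, {wald[1]:.3f}]")
print(f"Wilson: [{wilson[0]:.3f}, {wilson[1]:.3f}]")
```

The Wilson interval is pulled slightly toward 0.5 and never leaves [0, 1], which is why it behaves better when n is small or p̂ is near a boundary.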
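Question H7 — t vs z
When should you use a t-interval for the mean instead of a z-interval?
Answer: Use t when σ is unknown, the sample is small, and the data are approximately normal; for large samples the z approximation is acceptable.
Why: The t distribution reflects the extra uncertainty from estimating σ.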
Question H8 — One- vs Two-Tailed
You only want to show that “time decreased”. Which tail should you use, and why pre-register the direction?
Answer: One-tailed; preregistration prevents switching direction after the fact, which would inflate the Type I error rate.
Why: As emphasized in class: set the alternative hypothesis first, then collect the data.
Question H9 — Practical vs Statistical
p = 0.02, but the mean time drops by only 0.2 minutes. How do you communicate this?
Answer: “Statistically significant, but the effect is small”; give the confidence interval and ROI, and recommend a limited pilot rollout first.
Why: Present significance together with the business threshold.
Question H10 — Sample Size for Proportion
Expected p ≈ 0.05, margin ±0.5 percentage points, 95% confidence. Roughly what n is needed?
Answer: n ≈ z²p(1 − p)/E² = 1.96² × 0.05 × 0.95/0.005² ≈ 7,300.
Why: If p is unknown, using p = 0.5 is more conservative.
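A sketch of the sample-size formula n = z²p(1 − p)/E², including the conservative p = 0.5 case; the function name is illustrative:

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(p, margin, level=0.95):
    """Smallest n whose normal-approximation margin of error is <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.ceil(z**2 * p * (1 - p) / margin**2)

n_planned = sample_size_for_proportion(p=0.05, margin=0.005)
n_conservative = sample_size_for_proportion(p=0.5, margin=0.005)

print(n_planned, n_conservative)
```

The planned size is about 7,300, while the worst-case p = 0.5 would require roughly 38,400, so a credible prior estimate of p saves a great deal of sampling.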
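Group I — MGS 2150 6th.pdf (10 questions · integrated application: KPI stack, governance, A/B compliance, scenario analysis)
Question I1 — KPI Stack
Design a traceable KPI stack from raw logs to metrics to decisions.
Answer: Raw layer (read-only) → cleaning layer (outliers flagged) → analytics layer (median/IQR/percentiles) → decision layer (thresholds + ROI), supported by a metric dictionary with versioning and an audit log.
Why: Statistical methods and data governance need to be put in place together.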
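Question I2 — Mixed Methods
A sales drop coincides with both data fluctuations and process changes. How do you combine quantitative and qualitative methods to diagnose it?
Answer: Quantitative: stratified comparisons and before/after-breakpoint analysis; qualitative: interviews and walkthroughs; triangulate to reach a single conclusion.
Why: The class emphasized that “evidence triangulation” improves credibility.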
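Question I3 — Metric Hierarchy
Build a North Star → drivers → operational metrics hierarchy to avoid local optima.
Answer: North Star: long-term value; drivers: conversion and retention; operational: clicks and reach. Use a causal map and guardrails so that optimizing one metric does not harm downstream ones.
Why: A clear hierarchy is needed to align incentives.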
Question I4 — Experiment Under Constraint
There are three variants but limited traffic. How do you design the experiment efficiently?
Answer: Use a factorial design or a multi-armed bandit; set a stopping rule and a minimum effect threshold.
Why: This raises the information gained per sample.
Question I5 — Robust Weekly Pack
Name five standard elements of a robust weekly analytics pack.
Answer: P50/IQR, P90/P95, sample size N, a note on outlier handling, and a breakdown of the drivers of week-over-week change.
Why: A uniform template makes reviews and audits easier.
Question I6 — Scenario Planning
Demand is uncertain. How do you plan inventory with scenarios?
Answer: Derive P50/P90 demand bands, compute safety stock with the service-level method, and plan optimistic / base / cautious scenarios together with a replenishment policy.
Why: This turns uncertainty into executable parameters.
Question I7 — A/B Governance
List the key guardrails that prevent p-hacking.
Answer: Preregistered hypotheses and primary metric, a fixed sample size or a sequential plan, a single stopping rule, and audit logs with version control.
Why: The class requirement is “plan first, then experiment”.
Question I8 — Metric Drift
How do you define and monitor metric drift?
Answer: Version the metrics, validate data sources, monitor distribution drift (e.g., PSI), and set up anomaly alarms with a rollback policy.
Why: A metric is a contract, so it needs versions and a change log.
Question I9 — From Correlation to Causation
Propose a roadmap from correlation to causal inference.
Answer: Controls/stratification → panel fixed effects → natural experiments/IV/regression discontinuity → randomized experiments; run balance and robustness checks at each step.
Why: Each step strengthens identification.
Question I10 — Three-Region Threshold
Turn the posterior probability into three regions, “block / review / pass”, using costs.
Answer: Set t₁ = C(FP)/(C(FN)+C(FP)) from costs, and set t₂ > t₁ according to queue capacity: block if p ≥ t₂; review if t₁ ≤ p < t₂; pass if p < t₁.
Why: The thresholds are determined jointly by expected-cost minimization and resource constraints.
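A minimal sketch of the three-region rule. It assumes the lower threshold is the expected-cost break-even point C(FP)/(C(FP)+C(FN)) and that the upper threshold `t_block` is supplied by whoever tunes review-queue capacity; the cost figures and the 0.60 value are hypothetical:

```python
def decide(p_fraud, cost_fn, cost_fp, t_block):
    """Map a posterior fraud probability to 'pass', 'review', or 'block'."""
    t_review = cost_fp / (cost_fp + cost_fn)   # cost-based lower threshold
    if t_block <= t_review:
        raise ValueError("t_block must exceed the cost-based threshold")
    if p_fraud >= t_block:
        return "block"
    if p_fraud >= t_review:
        return "review"
    return "pass"

# Hypothetical setup: missed fraud costs 200, a false block costs 10,
# and the review queue can absorb everything below a 0.60 posterior.
for p in (0.01, 0.20, 0.80):
    print(p, decide(p, cost_fn=200, cost_fp=10, t_block=0.60))
```

Raising `t_block` sends more cases to manual review instead of automatic blocking, so it is the lever to adjust when reviewer capacity changes.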