Question 1 — Elements, Variables, and Observations

In a study about student performance, “students” are the ______, “GPA” is a ______, and “each student’s GPA record” is a(n) ______.

📖 点击查看答案 Elements; Variable; Observation 元素;变量;观测值
📝 点击查看解析 统计学中,元素是研究对象(学生),变量是特征(GPA),观测值是每个元素的测量数据(学生的GPA记录)。

Question 2 — Nominal Scale

Which of the following is measured on a nominal scale?
A) Temperature in Celsius
B) Student ID number
C) Credit rating (AAA, AA, A, B)
D) Student GPA

📖 点击查看答案 B) Student ID number 学号
📝 点击查看解析 名义尺度只用于分类或标记,学号只是身份识别,没有顺序或数量意义。

Question 3 — Ordinal Scale

A company evaluates customer satisfaction as “Very satisfied, Satisfied, Neutral, Dissatisfied.” Which measurement scale is used?

📖 点击查看答案 Ordinal scale (顺序尺度)
📝 点击查看解析 顺序尺度强调排序意义,尽管等级间差距未必相等。

Question 4 — Interval Scale

A student scores 600 on TOEFL while another scores 550. Which scale is used and why?

📖 点击查看答案 Interval scale (区间尺度)
📝 点击查看解析 区间尺度有固定间隔,600 与 550 相差 50 分,但不存在绝对零点。

Question 5 — Ratio Scale

If Kevin has 80kg weight and Melissa has 40kg weight, what scale is applied?

📖 点击查看答案 Ratio scale (比率尺度)
📝 点击查看解析 比率尺度有绝对零点,可以进行比率比较。Kevin 的体重是 Melissa 的两倍。

Question 6 — Categorical vs Quantitative

Gender of a person is ______ data, while their height is ______ data.

📖 点击查看答案 Categorical; Quantitative 类别型;数量型
📝 点击查看解析 性别是分类信息;身高是可以测量的数值数据。

Question 7 — Cross-sectional Data

Which dataset is cross-sectional?
A) Quarterly profits of Apple from 2015 to 2024
B) Survey of 200 people’s income levels in 2024
C) Daily stock price of Tesla in 2023

📖 点击查看答案 B) Survey of 200 people’s income levels in 2024
📝 点击查看解析 横截面数据在同一时间点收集;A 和 C 都涉及时间序列。

Question 8 — Time Series Data

Tracking U.S. gasoline prices from 2012 to 2018 is an example of ______ data.

📖 点击查看答案 Time series data (时间序列数据)
📝 点击查看解析 时间序列数据强调随时间收集的变化趋势。

Question 9 — Observational Study

Observing how long customers stay in Walmart without interference is an example of ______ study.

📖 点击查看答案 Observational study (观察性研究)
📝 点击查看解析 观察性研究不控制变量,只是记录数据。

Question 10 — Experimental Study

In a medical trial, one group receives a vaccine and another receives a placebo. What type of study is this?

📖 点击查看答案 Experimental study (实验性研究)
📝 点击查看解析 实验性研究通过控制变量(是否接种疫苗)来观察结果。

Question 11 — Data Acquisition Challenges

Name two key challenges of data acquisition.

📖 点击查看答案 Time requirement; Cost of acquisition; Data errors 时间需求;获取成本;数据错误
📝 点击查看解析 获取数据可能耗时且昂贵,错误数据可能误导分析。

Question 12 — Mean Calculation

If the costs of five parts are 60, 80, $90, what is the mean cost?

📖 点击查看答案 Mean = (50+60+70+80+90)/5 = $70 均值 = $70
📝 点击查看解析 均值反映集中趋势,是最常见的数值描述统计。

Question 13 — Population vs Sample

Surveying all students in WKU is a ______, while surveying 200 students is a ______.

📖 点击查看答案 Census (普查); Sample survey (抽样调查)
📝 点击查看解析 普查是对总体所有元素进行测量,抽样只研究总体的一部分。

Question 14 — Descriptive vs Inferential Statistics

Which type of statistics is used to “summarize GPA of surveyed students” and which is used to “predict GPA of all students”?

📖 点击查看答案 Descriptive statistics; Inferential statistics 描述统计;推断统计
📝 点击查看解析 描述统计只总结已有数据,推断统计用样本去推测总体。

Question 15 — Excel Functions

Which Excel function calculates the maximum value in a dataset?

📖 点击查看答案 =MAX(range)
📝 点击查看解析 Excel 提供 MAX、MIN、AVERAGE 等函数来进行基础统计。

Question 16 — Analytics Types

Matching:

  1. Descriptive analysis
  2. Predictive analysis
  3. Prescriptive analysis

A) Uses past data to forecast future outcomes
B) Summarizes what happened in the past
C) Recommends best actions

📖 点击查看答案 1 → B; 2 → A; 3 → C
📝 点击查看解析 描述性分析总结过去,预测性分析预测未来,规范性分析推荐最佳行动。

Question 17 — Big Data

Walmart captures 20–30 million transactions per day. This is an example of ______.

📖 点击查看答案 Big Data (大数据)
📝 点击查看解析 大数据指规模庞大且复杂的数据集,需要数据仓储进行管理。

Question 18 — Data Warehousing

The process of capturing, storing, and maintaining data is called ______.

📖 点击查看答案 Data warehousing (数据仓储)
📝 点击查看解析 数据仓储是大数据处理中核心环节。

Question 19 — Data Mining Applications

Name one business application of data mining.

📖 点击查看答案 - Recommending related products - Identifying customers for discounts 推荐相关产品;识别优惠客户
📝 点击查看解析 数据挖掘常用于零售、金融、通信领域,提升客户价值与利润。

Question 20 — Overfitting in Data Mining

Why is overfitting a risk in data mining?

📖 点击查看答案 Because a model may fit the training data well but fail on new data. 因为模型可能在训练数据上表现良好,但在新数据上失效。
📝 点击查看解析 过拟合会导致虚假的关联与误导结论,因此必须通过测试集验证并谨慎解释。