Distinctions in Data & Measurement (数据与测量的区分)

1. Data, Elements, Variables, Observations (数据、元素、变量、观测值的区分)

  • 1.1 Data (数据)
    • Definition / 定义: Facts and figures collected for analysis, usually recorded as numbers or text in a table. (为分析而收集的事实和数字,通常以数字或文字形式记录在表格中。)
    • Key Idea / 核心要点: Data = the whole collection (整张表或数据集).
    • Example / 例子: January sales = 2000 units. (一月份销量 = 2000件。)
  • 1.2 Elements (元素)
    • Definition / 定义: Objects/entities on which data are collected. (数据收集的对象或实体。)
    • Example / 例子: A student, a product, a country. (学生、产品、国家。)
  • 1.3 Variables (变量)
    • Definition / 定义: Characteristics or attributes of elements. (元素的特征或属性。)
    • Example / 例子: Age, gender, GPA. (年龄、性别、绩点。)
  • 1.4 Observations (观测值)
    • Definition / 定义: The set of values recorded for one element across all variables, i.e., one row of the data set. (某一元素在所有变量上的一组取值,即数据集中的一行。)
    • Example / 例子: Student A: gender = female, age = 20, GPA = 3.5. (学生A:性别=女,年龄=20,GPA=3.5。)
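One way to see the four terms side by side is to store a tiny data set as a list of records. The Python sketch below uses hypothetical students; treating the `student` key as an identifier for the element rather than as a variable is an assumption of this sketch.

```python
# A tiny data set stored as a list of records (hypothetical values).
# Each dict is one element (a student); each non-identifier key is a variable.
data = [
    {"student": "A", "gender": "female", "age": 20, "gpa": 3.5},
    {"student": "B", "gender": "male", "age": 22, "gpa": 3.1},
    {"student": "C", "gender": "female", "age": 19, "gpa": 3.8},
]

# Elements: the entities the data describe, one per row.
elements = [row["student"] for row in data]

# Variables: the attributes recorded for every element (the columns),
# excluding the identifier column.
variables = [key for key in data[0] if key != "student"]

# Observation: the full set of variable values for one element (one row).
observation_a = {var: data[0][var] for var in variables}

# Data: the whole collection, here elements x variables individual values.
n_values = len(elements) * len(variables)

print("Elements:", elements)
print("Variables:", variables)
print("Observation for A:", observation_a)
print("Number of data values:", n_values)
```

With three elements and three variables, the data set holds nine values, and the observation for student A matches the example above.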

2. Data Types (数据类型的区分)

  • 2.1 Categorical Data (分类数据)
    • Overview (概述)
      • Definition / 定义: Values are labels or categories that place each element in a group; includes Nominal and Ordinal scales. (以标签或类别表示、将元素归入组别的数据,包括名义尺度和顺序尺度。)
      • Example / 例子: Gender (male/female), opinion (agree/disagree). (性别:男/女;意见:同意/不同意。)
      • Visualization / 可视化: Bar chart, pie chart. (条形图、饼图。)
    • 2.1.1 Nominal (名义尺度)
      • Definition / 定义: Classification without order. (分类,无顺序。)
      • Example / 例子: Gender (male/female, 男/女), blood type (A/B/AB/O, 血型).
      • Key Use / 用途: Counting and grouping only. (用于计数和分组。)
    • 2.1.2 Ordinal (顺序尺度)
      • Definition / 定义: Ordered, but intervals between levels are not necessarily equal. (有顺序,但间隔不一定相等。)
      • Example / 例子: Satisfaction rating 1–5. (满意度1–5。)
      • Key Use / 用途: Ranking analysis. (排序分析。)
  • 2.2 Quantitative Data (数量数据)
    • Overview (概述)
      • Definition / 定义: Numeric values that measure how much or how many, so arithmetic on them is meaningful; includes Interval and Ratio scales. (表示数量多少的数值,可进行有意义的算术运算,包括区间尺度和比率尺度。)
      • Example / 例子: Age, distance, income. (年龄、距离、收入。)
      • Visualization / 可视化: Histogram, line chart. (直方图、折线图。)
    • 2.2.1 Interval (区间尺度)
      • Definition / 定义: Ordered, equal intervals, no true zero. (有顺序,间隔相等,无绝对零点。)
      • Example / 例子: Celsius temperature, calendar years. (摄氏温度、年份。)
      • Key Use / 用途: Differences are meaningful; ratios are not (20 °C is not "twice as hot" as 10 °C). (差值有意义,比例无意义:20°C并不是10°C的两倍热。)
    • 2.2.2 Ratio (比率尺度)
      • Definition / 定义: Ordered, equal intervals, with true zero. (有顺序,间隔相等,有绝对零点。)
      • Example / 例子: Income, weight, age, distance. (收入、体重、年龄、距离。)
      • Key Use / 用途: All arithmetic including ratios. (可进行所有算术运算,包括比例。)
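The four scales form a hierarchy: each level allows everything the level below it allows, plus one more kind of comparison. The Python sketch below encodes that hierarchy. The enum `Scale`, the mapping `ADDED_BY_LEVEL`, and the helper names are illustrative assumptions, and the operation lists follow a common textbook summary rather than a fixed standard.

```python
from enum import IntEnum


class Scale(IntEnum):
    """Levels of measurement, ordered from least to most informative."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# What each level adds on top of the levels below it.
ADDED_BY_LEVEL = {
    Scale.NOMINAL: ["count", "mode"],
    Scale.ORDINAL: ["rank/sort", "median"],
    Scale.INTERVAL: ["difference", "mean"],
    Scale.RATIO: ["ratio"],
}


def meaningful_operations(scale: Scale) -> list[str]:
    """Return the operations that are meaningful at `scale`.

    The list is cumulative: each level keeps everything lower levels allow.
    """
    ops: list[str] = []
    for level in Scale:  # iterates in definition order, NOMINAL first
        if level <= scale:
            ops.extend(ADDED_BY_LEVEL[level])
    return ops


def data_type(scale: Scale) -> str:
    """Nominal and ordinal are categorical; interval and ratio are quantitative."""
    return "categorical" if scale <= Scale.ORDINAL else "quantitative"


for s in Scale:
    print(f"{s.name:<8} {data_type(s):<12} {meaningful_operations(s)}")
```

The printed table shows, for example, that a mean is meaningful from the interval level upward, while a ratio appears only at the ratio level.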

3. Quick Rules to Distinguish (快速区分法则)

  • Step 1 → Ask: Categorical or Quantitative? (先判断分类还是数量)
  • Step 2 → If Categorical → Nominal or Ordinal? (分类数据 → 名义或顺序)
  • Step 3 → If Quantitative → Interval or Ratio? (数量数据 → 区间或比率)
  • Nominal / 名义: Just labels. (只有标签)
  • Ordinal / 顺序: Has order; gaps not necessarily equal. (有顺序,间隔不一定相等)
  • Interval / 区间: Equal intervals, no true zero. (等间隔,无零点)
  • Ratio / 比率: Equal intervals + true zero. (等间隔+有零点)
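The three steps can be written as a small decision function. This is a minimal Python sketch; the function name `classify_scale` and its yes/no parameters are hypothetical, and the answers for each example variable are supplied by hand from the definitions in this note rather than detected from the data.

```python
def classify_scale(quantitative: bool, ordered: bool = False,
                   true_zero: bool = False) -> str:
    """Name a measurement scale by following the three quick-rule steps.

    quantitative: Step 1. Do the values measure an amount on equal intervals?
    ordered:      Step 2. For categorical data, do categories have a natural order?
    true_zero:    Step 3. For quantitative data, does 0 mean "none of the quantity"?
    """
    if not quantitative:
        # Step 2: categorical data is nominal or ordinal.
        return "ordinal" if ordered else "nominal"
    # Step 3: quantitative data is interval or ratio (`ordered` is not consulted).
    return "ratio" if true_zero else "interval"


# Hand-supplied answers for example variables from this note.
examples = {
    "blood type": classify_scale(quantitative=False),
    "satisfaction 1-5": classify_scale(quantitative=False, ordered=True),
    "Celsius temperature": classify_scale(quantitative=True),
    "weight": classify_scale(quantitative=True, true_zero=True),
}

for variable, scale in examples.items():
    print(f"{variable:<20} -> {scale}")
```

Asking "true zero?" only after "quantitative?" mirrors Step 3: a zero is only checked once the values are known to be equally spaced amounts.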

4. Common Confusions (常见混淆点)

  • Numbers as labels / 数字作标签: ZIP code, product ID = Nominal. (邮编、编号 → 名义尺度)
  • Likert scale / 李克特量表: Satisfaction 1–5 is Ordinal, though often treated as approximately Interval in practice. (满意度1–5 = 顺序,实践中常近似视为区间)
  • Temperature / 温度: Celsius/Fahrenheit = Interval; Kelvin = Ratio. (摄氏/华氏=区间;开尔文=比率)
  • Age vs Year / 年龄与年份: Age = Ratio; Birth year = Interval. (年龄=比率;出生年份=区间)
  • Income / 收入: Ratio (0 = no income). Negative values (e.g., net losses) can occur, but ratios between values of opposite sign are meaningless. (比率;可能为负,但跨正负号的倍数无意义)
  • Ranks / 名次: Ordinal, not Interval; the gap between 1st and 2nd need not equal the gap between 2nd and 3rd. (名次=顺序,不是区间)
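The temperature and rank confusions can be checked numerically. In this minimal Python sketch (illustrative values: 20 °C vs 10 °C, plus hypothetical exam scores), the ratio of two interval-scale readings changes when the unit changes, which is why it carries no meaning, while differences convert consistently between units; Kelvin, with its true zero, gives the physically meaningful ratio. The helper names are assumptions of this sketch.

```python
def celsius_to_fahrenheit(c: float) -> float:
    return c * 9 / 5 + 32


def celsius_to_kelvin(c: float) -> float:
    return c + 273.15


warm_c, cool_c = 20.0, 10.0  # illustrative temperatures in degrees Celsius

# Interval scales: the ratio depends on where each unit places its zero.
ratio_c = warm_c / cool_c
ratio_f = celsius_to_fahrenheit(warm_c) / celsius_to_fahrenheit(cool_c)

# Ratio scale: Kelvin has a true zero, so this ratio is the meaningful one.
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

# Differences convert consistently: a 10-degree Celsius gap is an
# 18-degree Fahrenheit gap and a 10 K gap.
diff_c = warm_c - cool_c
diff_f = celsius_to_fahrenheit(warm_c) - celsius_to_fahrenheit(cool_c)
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

print(f"ratio in C: {ratio_c:.3f}, in F: {ratio_f:.3f}, in K: {ratio_k:.3f}")
print(f"difference in C: {diff_c:.1f}, in F: {diff_f:.1f}, in K: {diff_k:.1f}")

# Ranks keep only order: equal rank gaps can hide very unequal score gaps.
scores = {"P": 95, "Q": 94, "R": 60}  # hypothetical exam scores
ranked = sorted(scores, key=scores.get, reverse=True)
ranks = {name: position + 1 for position, name in enumerate(ranked)}
print("ranks:", ranks)
```

The Celsius ratio of 2 shrinks to about 1.36 in Fahrenheit and about 1.035 in Kelvin, and ranks 1, 2, 3 hide score gaps of 1 and 34.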

5. Mini Cheat-Sheet (速查表)

  • Data / 数据: Whole collection (整体信息)
  • Elements / 元素: Rows, i.e., the entities measured (行/对象)
  • Variables / 变量: Columns/features (列/特征)
  • Observations / 观测值: One row’s full record (单行完整记录)
  • Categorical / 分类: Nominal + Ordinal
  • Quantitative / 数量: Interval + Ratio