Files
clutch/docs/DATABASE_LOGICAL_STRUCTURE.md
2026-02-05 23:26:03 +08:00

3.6 KiB
Raw Permalink Blame History

Database Logical Structure (ER Diagram)

This diagram illustrates the logical relationships and data flow between the storage layers (L1, L2, L3) in the optimized architecture.

erDiagram
    %% ==========================================
    %% L1 LAYER: RAW DATA (Data Lake)
    %% ==========================================
    
    L1A_raw_iframe_network {
        string match_id PK
        json content "Raw API Response"
        timestamp processed_at
    }

    L1B_tick_snapshots_parquet {
        string match_id FK
        int tick
        int round
        json player_states "Positions, HP, Equip"
        json bomb_state
        string file_path "Parquet File Location"
    }

    %% ==========================================
    %% L2 LAYER: DATA WAREHOUSE (Structured)
    %% ==========================================

    dim_players {
        string steam_id_64 PK
        string username
        float rating
        float avg_clutch_win_rate
    }

    dim_maps {
        int map_id PK
        string map_name "de_mirage"
        string nav_mesh_path
    }

    fact_matches {
        string match_id PK
        int map_id FK
        timestamp start_time
        int winner_team
        int final_score_ct
        int final_score_t
    }

    fact_rounds {
        string round_id PK
        string match_id FK
        int round_num
        int winner_side
        string win_reason "Elimination/Bomb/Time"
    }

    L2_Spatial_NavMesh {
        string map_name PK
        string zone_id
        binary distance_matrix "Pre-calculated paths"
    }

    %% ==========================================
    %% L3 LAYER: FEATURE STORE (AI Ready)
    %% ==========================================

    L3_Offline_Features {
        string snapshot_id PK
        float feature_tpi "Time Pressure Index"
        float feature_crossfire "Tactical Score"
        float feature_equipment_diff
        int label_is_win "Target Variable"
    }

    %% ==========================================
    %% RELATIONSHIPS
    %% ==========================================

    %% L1 -> L2 Flow
    L1A_raw_iframe_network ||--|{ fact_matches : "Extracts to"
    L1A_raw_iframe_network ||--|{ dim_players : "Extracts to"
    L1B_tick_snapshots_parquet }|--|| fact_matches : "Belongs to"
    L1B_tick_snapshots_parquet }|--|| fact_rounds : "Details"

    %% L2 Relations
    fact_matches }|--|| dim_maps : "Played on"
    fact_rounds }|--|| fact_matches : "Part of"
    
    %% L2 -> L3 Flow (Feature Engineering)
    L3_Offline_Features }|--|| L1B_tick_snapshots_parquet : "Computed from"
    L3_Offline_Features }|--|| L2_Spatial_NavMesh : "Uses Physics from"
    L3_Offline_Features }|--|| dim_players : "Enriched with"

结构说明 (Structure Explanation)

  1. L1 源数据层:

    • 左上 (L1A): 传统的数据库表,存储比赛结果元数据。
    • 左下 (L1B): 虚线框表示的文件系统。虽然物理上是 Parquet 文件但在逻辑上它是一张巨大的“Tick 级快照表”,通过 match_id 与其他层关联。
  2. L2 数仓层:

    • 核心 (Dim/Fact): 标准的星型模型。fact_matches 是核心事实表,关联 dim_players (人) 和 dim_maps (地)。
    • 空间 (Spatial): 独立的查找表逻辑,为每一张 dim_maps 提供物理距离计算支持。
  3. L3 特征层:

    • 右侧 (Features): 这是宽表Wide Table每一行直接对应模型的一个训练样本。它不存储原始数据而是存储计算后的数值 (如 TPI 指数),直接由 L1B (位置) + L2 Spatial (距离) + Dim Players (能力) 融合计算而来。