clutch/docs/DATABASE_LOGICAL_STRUCTURE.md

# Database Logical Structure (ER Diagram)

This diagram illustrates the logical relationships and data flow between the storage layers (L1, L2, L3) in the optimized architecture.

```mermaid
erDiagram
    %% ==========================================
    %% L1 LAYER: RAW DATA (Data Lake)
    %% ==========================================

    L1A_raw_iframe_network {
        string match_id PK
        json content "Raw API Response"
        timestamp processed_at
    }

    L1B_tick_snapshots_parquet {
        string match_id FK
        int tick
        int round
        json player_states "Positions, HP, Equip"
        json bomb_state
        string file_path "Parquet File Location"
    }

    %% ==========================================
    %% L2 LAYER: DATA WAREHOUSE (Structured)
    %% ==========================================

    dim_players {
        string steam_id_64 PK
        string username
        float rating
        float avg_clutch_win_rate
    }

    dim_maps {
        int map_id PK
        string map_name "de_mirage"
        string nav_mesh_path
    }

    fact_matches {
        string match_id PK
        int map_id FK
        timestamp start_time
        int winner_team
        int final_score_ct
        int final_score_t
    }

    fact_rounds {
        string round_id PK
        string match_id FK
        int round_num
        int winner_side
        string win_reason "Elimination/Bomb/Time"
    }

    L2_Spatial_NavMesh {
        string map_name PK
        string zone_id
        binary distance_matrix "Pre-calculated paths"
    }

    %% ==========================================
    %% L3 LAYER: FEATURE STORE (AI Ready)
    %% ==========================================

    L3_Offline_Features {
        string snapshot_id PK
        float feature_tpi "Time Pressure Index"
        float feature_crossfire "Tactical Score"
        float feature_equipment_diff
        int label_is_win "Target Variable"
    }

    %% ==========================================
    %% RELATIONSHIPS
    %% ==========================================

    %% L1 -> L2 Flow
    L1A_raw_iframe_network ||--|{ fact_matches : "Extracts to"
    L1A_raw_iframe_network ||--|{ dim_players : "Extracts to"
    L1B_tick_snapshots_parquet }|--|| fact_matches : "Belongs to"
    L1B_tick_snapshots_parquet }|--|| fact_rounds : "Details"

    %% L2 Relations
    fact_matches }|--|| dim_maps : "Played on"
    fact_rounds }|--|| fact_matches : "Part of"

    %% L2 -> L3 Flow (Feature Engineering)
    L3_Offline_Features }|--|| L1B_tick_snapshots_parquet : "Computed from"
    L3_Offline_Features }|--|| L2_Spatial_NavMesh : "Uses Physics from"
    L3_Offline_Features }|--|| dim_players : "Enriched with"
```

## 结构说明 (Structure Explanation)

1.  **L1 源数据层**:
    *   **左上 (L1A)**: 传统的数据库表，存储比赛结果元数据。
    *   **左下 (L1B)**: **虚线框表示的文件系统**。虽然物理上是 Parquet 文件，但在逻辑上它是一张巨大的“Tick 级快照表”，通过 `match_id` 与其他层关联。

2.  **L2 数仓层**:
    *   **核心 (Dim/Fact)**: 标准的星型模型。`fact_matches` 是核心事实表，关联 `dim_players` (人) 和 `dim_maps` (地)。
    *   **空间 (Spatial)**: 独立的查找表逻辑，为每一张 `dim_maps` 提供物理距离计算支持。

3.  **L3 特征层**:
    *   **右侧 (Features)**: 这是宽表（Wide Table），每一行直接对应模型的一个训练样本。它不存储原始数据，而是存储**计算后的数值** (如 TPI 指数)，直接由 L1B (位置) + L2 Spatial (距离) + Dim Players (能力) 融合计算而来。