Files
clutch/docs/DATABASE_LOGICAL_STRUCTURE.md
2026-02-05 23:26:03 +08:00

110 lines
3.6 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Database Logical Structure (ER Diagram)
This diagram illustrates the logical relationships and data flow between the storage layers (L1, L2, L3) in the optimized architecture.
```mermaid
erDiagram
%% ==========================================
%% L1 LAYER: RAW DATA (Data Lake)
%% ==========================================
L1A_raw_iframe_network {
string match_id PK
json content "Raw API Response"
timestamp processed_at
}
L1B_tick_snapshots_parquet {
string match_id FK
int tick
int round
json player_states "Positions, HP, Equip"
json bomb_state
string file_path "Parquet File Location"
}
%% ==========================================
%% L2 LAYER: DATA WAREHOUSE (Structured)
%% ==========================================
dim_players {
string steam_id_64 PK
string username
float rating
float avg_clutch_win_rate
}
dim_maps {
int map_id PK
string map_name "de_mirage"
string nav_mesh_path
}
fact_matches {
string match_id PK
int map_id FK
timestamp start_time
int winner_team
int final_score_ct
int final_score_t
}
fact_rounds {
string round_id PK
string match_id FK
int round_num
int winner_side
string win_reason "Elimination/Bomb/Time"
}
L2_Spatial_NavMesh {
string map_name PK
string zone_id
binary distance_matrix "Pre-calculated paths"
}
%% ==========================================
%% L3 LAYER: FEATURE STORE (AI Ready)
%% ==========================================
L3_Offline_Features {
string snapshot_id PK
float feature_tpi "Time Pressure Index"
float feature_crossfire "Tactical Score"
float feature_equipment_diff
int label_is_win "Target Variable"
}
%% ==========================================
%% RELATIONSHIPS
%% ==========================================
%% L1 -> L2 Flow
L1A_raw_iframe_network ||--|{ fact_matches : "Extracts to"
L1A_raw_iframe_network ||--|{ dim_players : "Extracts to"
L1B_tick_snapshots_parquet }|--|| fact_matches : "Belongs to"
L1B_tick_snapshots_parquet }|--|| fact_rounds : "Details"
%% L2 Relations
fact_matches }|--|| dim_maps : "Played on"
fact_rounds }|--|| fact_matches : "Part of"
%% L2 -> L3 Flow (Feature Engineering)
L3_Offline_Features }|--|| L1B_tick_snapshots_parquet : "Computed from"
L3_Offline_Features }|--|| L2_Spatial_NavMesh : "Uses Physics from"
L3_Offline_Features }|--|| dim_players : "Enriched with"
```
## 结构说明 (Structure Explanation)
1. **L1 源数据层**:
* **左上 (L1A)**: 传统的数据库表,存储比赛结果元数据。
* **左下 (L1B)**: **虚线框表示的文件系统**。虽然物理上是 Parquet 文件但在逻辑上它是一张巨大的“Tick 级快照表”,通过 `match_id` 与其他层关联。
2. **L2 数仓层**:
* **核心 (Dim/Fact)**: 标准的星型模型。`fact_matches` 是核心事实表,关联 `dim_players` (人) 和 `dim_maps` (地)。
* **空间 (Spatial)**: 独立的查找表逻辑,为每一张 `dim_maps` 提供物理距离计算支持。
3. **L3 特征层**:
* **右侧 (Features)**: 这是宽表Wide Table每一行直接对应模型的一个训练样本。它不存储原始数据而是存储**计算后的数值** (如 TPI 指数),直接由 L1B (位置) + L2 Spatial (距离) + Dim Players (能力) 融合计算而来。