feat: Initial commit of Clutch-IQ project
76
docs/API_INTERFACE_GUIDE.md
Normal file
@@ -0,0 +1,76 @@
# Clutch-IQ Inference API Interface Guide

## Overview
The Inference Service (`src/inference/app.py`) supports **two types of payloads** to accommodate different use cases: real-time game integration and strategy simulation (dashboard).

## 1. Raw Game State Payload (Game Integration)
Used when receiving data directly from the CS2 Game State Integration (GSI) or the parser. The server performs feature engineering.

**Use Case:** Real-time match prediction.

**Payload Structure:**
```json
{
  "game_time": 60.0,
  "is_bomb_planted": 0,
  "site": 0,
  "players": [
    {
      "team_num": 2, // 2=T, 3=CT
      "is_alive": true,
      "health": 100,
      "X": -1200, "Y": 500, "Z": 128,
      "active_weapon_name": "ak47",
      "balance": 4500,
      "equip_value": 2700
    },
    ...
  ]
}
```

**Processing Logic:**
- `process_payload` extracts the `players` list.
- It calculates `t_alive`, `health_diff`, `t_spread`, `pincer_index`, etc.
- It returns the feature vector.
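
The feature step can be illustrated with a minimal sketch. It is not the code in `src/inference/app.py`: the function names `engineer_features` and `spread` are hypothetical, `t_spread` is assumed to mean the average distance of living players from their team centroid, and `pincer_index` is left out because the guide does not define it.

```python
import math
from statistics import mean

T, CT = 2, 3  # team_num values used in the payload (2=T, 3=CT)


def spread(points):
    """Mean distance of each point from the group's centroid (assumed definition)."""
    if len(points) < 2:
        return 0.0
    cx = mean(p[0] for p in points)
    cy = mean(p[1] for p in points)
    return mean(math.hypot(p[0] - cx, p[1] - cy) for p in points)


def engineer_features(payload):
    """Hypothetical stand-in for the feature step of `process_payload`."""
    alive = [p for p in payload.get("players", []) if p.get("is_alive")]
    t = [p for p in alive if p["team_num"] == T]
    ct = [p for p in alive if p["team_num"] == CT]
    t_health = sum(p["health"] for p in t)
    ct_health = sum(p["health"] for p in ct)
    return {
        "t_alive": len(t),
        "ct_alive": len(ct),
        "t_health": t_health,
        "ct_health": ct_health,
        "health_diff": ct_health - t_health,  # ct - t convention
        "t_spread": spread([(p["X"], p["Y"]) for p in t]),
        "ct_spread": spread([(p["X"], p["Y"]) for p in ct]),
        "is_bomb_planted": payload.get("is_bomb_planted", 0),
        "site": payload.get("site", 0),
        "game_time": payload.get("game_time", 0.0),
    }
```

Dead players are filtered out first, so every aggregate describes only the players who can still affect the round.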

---

## 2. Pre-calculated Feature Payload (Dashboard/Simulation)
Used when the client (e.g., the Streamlit dashboard) sets the tactical situation manually. The server skips feature engineering and uses the provided values.

**Use Case:** "What-if" analysis, strategy dashboard.

**Payload Structure:**
```json
{
  "t_alive": 2,
  "ct_alive": 3,
  "t_health": 180,
  "ct_health": 290,
  "t_equip_value": 8500,
  "ct_equip_value": 14000,
  "t_total_cash": 1200,
  "ct_total_cash": 3500,
  "team_distance": 1500.5,
  "t_spread": 400.2,
  "ct_spread": 800.1,
  "t_area": 40000.0,
  "ct_area": 64000.0,
  "t_pincer_index": 0.45,
  "ct_pincer_index": 0.22,
  "is_bomb_planted": 0,
  "site": 0,
  "game_time": 60.0
}
```

**Processing Logic:**
- `process_payload` detects the presence of `t_alive` / `ct_alive`.
- It uses the provided values directly.
- It auto-calculates derived fields such as `health_diff` (`ct - t`) if they are missing.
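
A rough sketch of how the two payload types could be told apart and how a missing derived field could be filled in. The helper names `classify_payload` and `fill_derived` are hypothetical, and requiring both `t_alive` and `ct_alive` (rather than either one) is an assumption; the actual check in `process_payload` may differ.

```python
def classify_payload(payload):
    """Return 'features', 'raw', or None for an unsupported payload."""
    if not isinstance(payload, dict):
        return None
    # Assumption: both alive counts must be present to count as pre-calculated
    if "t_alive" in payload and "ct_alive" in payload:
        return "features"
    if isinstance(payload.get("players"), list):
        return "raw"
    return None


def fill_derived(features):
    """Add derived fields the client left out, without overwriting supplied ones."""
    out = dict(features)
    if "health_diff" not in out and {"t_health", "ct_health"} <= out.keys():
        out["health_diff"] = out["ct_health"] - out["t_health"]  # ct - t
    return out
```

With the example payload above, `fill_derived` would add `health_diff = 290 - 180 = 110`.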

## Error Handling
If you receive `Error: {"error":"Not supported type for data.<class 'NoneType'>"}`:
- **Cause:** You sent a payload that matches neither format (e.g., it has no `players` list and no direct feature fields).
- **Fix:** Ensure your JSON body matches one of the structures above.
109
docs/Clutch_Prediction_Implementation_Plan.md
Normal file
@@ -0,0 +1,109 @@
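# Project Clutch-IQ: Implementation Plan for a CS2 Real-Time Win Probability Prediction System

> **Version**: 3.0 (Final Architecture)
> **Date**: 2026-01-31
> **Status**: Ready for Implementation

---

## 1. Vision
Build a **professional-grade, physics-aware, tactics-driven** real-time clutch win probability engine for CS2.
The system outputs more than a win probability (e.g., "CT Win 30%"): it also explains the tactical cause (e.g., "no defuse kit and not enough time"), serving post-match review, broadcast enhancement, and tactical analysis.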

---

## 2. Core Architecture

### 2.1 Three-Layer Pipeline Design
1.  **Phase 1: Snapshot Engine** - *ETL layer*
    - Parses high-frequency, high-precision "tactical slices" from demos.
2.  **Phase 2: Feature Factory** - *Logic layer*
    - Transforms raw data into physical features (path distance) and game-theoretic features (crossfire).
3.  **Phase 3: Inference Service** - *Application layer*
    - Provides millisecond-level real-time prediction based on XGBoost/LightGBM.

---

## 3. Implementation Roadmap

### Phase 1: High-Precision Data Snapshots (The Snapshot Engine)

#### 1.1 Smart Triggers
To filter out redundant data, the system captures a snapshot only at the following moments:
*   **Key events**: `Player_Death`, `Bomb_Plant`, `Bomb_Defuse_Start`, `Bomb_Defuse_End`
*   **Sharp state changes**: any player loses more than 20 HP (captures the outcome of a duel)
*   **Time heartbeat**: during the clutch phase (≤3v3), a sample is forced every 5 seconds
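
The three triggers could be combined into one decision function, sketched below. The function and parameter names are hypothetical. Two points are assumptions because the plan does not pin them down: "≤3v3" is read as at most three players alive on each side, and `max_hp_loss` is taken to be the largest HP loss of any single player since the previously processed tick.

```python
KEY_EVENTS = {"Player_Death", "Bomb_Plant", "Bomb_Defuse_Start", "Bomb_Defuse_End"}
HP_LOSS_THRESHOLD = 20    # trigger when a player loses more than this
HEARTBEAT_SECONDS = 5.0   # forced sampling interval during a clutch


def is_clutch(t_alive, ct_alive):
    """Assumed reading of '≤3v3': at most 3 players alive on each side."""
    return t_alive <= 3 and ct_alive <= 3


def should_snapshot(event, max_hp_loss, t_alive, ct_alive, now, last_snapshot_time):
    """Return the name of the trigger that fires, or None if no snapshot is needed."""
    if event in KEY_EVENTS:
        return "key_event"
    if max_hp_loss > HP_LOSS_THRESHOLD:
        return "hp_drop"
    if is_clutch(t_alive, ct_alive) and now - last_snapshot_time >= HEARTBEAT_SECONDS:
        return "heartbeat"
    return None
```

Returning the trigger name instead of a boolean makes it easy to store the reason next to each snapshot, which helps when auditing the training set later.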

#### 1.2 Snapshot Schema
Each snapshot contains five categories of core data:

| Category | Field | Description | Source |
| :--- | :--- | :--- | :--- |
| **Metadata** | `match_id`, `round`, `tick` | Unique index | Demo |
| **Situation** | `bomb_state`, `bomb_timer` | C4 state (0: not planted, 1: planted, 2: defused) | Demo |
| **Situation** | `seconds_remaining` | Round/C4 countdown | Demo |
| **Players** | `ct_alive`, `t_alive` | Number of players alive | Demo |
| **Players** | `ct_hp_sum`, `t_hp_sum` | Total team HP | Demo |
| **Equipment** | `ct_has_kit`, `t_has_c4` | **Key items** (defuse kit/C4) | Demo |
| **Spatial** | `ct_positions`, `t_positions` | Raw coordinates (for downstream computation) | Demo |
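
One possible in-memory shape for a snapshot row is sketched below. The field names come from the table; the Python types (for example `float` for `bomb_timer`, 3-tuples for positions) and the `key` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Tuple

Position = Tuple[float, float, float]  # assumed (X, Y, Z) layout


@dataclass
class Snapshot:
    # Metadata: together these form the unique index
    match_id: str
    round: int
    tick: int
    # Situation
    bomb_state: int            # 0: not planted, 1: planted, 2: defused
    bomb_timer: float
    seconds_remaining: float
    # Players
    ct_alive: int
    t_alive: int
    ct_hp_sum: int
    t_hp_sum: int
    # Equipment
    ct_has_kit: bool
    t_has_c4: bool
    # Spatial: raw coordinates, kept for later feature computation
    ct_positions: List[Position] = field(default_factory=list)
    t_positions: List[Position] = field(default_factory=list)

    def key(self):
        """Unique index of the snapshot."""
        return (self.match_id, self.round, self.tick)
```

`asdict(snapshot)` turns an instance into a plain dictionary, which is convenient when writing rows out to CSV or Parquet.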

---

### Phase 2: Feature Engineering and Fusion

#### 2.1 Physics-Aware Features
*   **F1: Path Distance (NavMesh Distance)**
    *   *Innovation*: drop Euclidean distance and compute the true travel distance over the map's navigation graph.
    *   *Implementation*: precompute `Map_Zone_Distance_Matrix` and query it at runtime.
*   **F2: Time Pressure Index (TPI)**
    *   *Formula*: $TPI = \frac{\text{TravelTime} + \text{DefuseTime}}{\text{TimeRemaining}}$
    *   *Rule*: $TPI > 1.0 \rightarrow$ win probability is forced to zero.
*   **F3: Line of Sight and Cover**
    *   *Features*: `is_blind` (blinded), `is_in_smoke` (inside smoke).
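
F1 and F2 fit together as a table lookup followed by a ratio. The sketch below is illustrative only: the zone names and distances in `ZONE_DISTANCE` are made up, the 250 units/s run speed is an assumed constant, and applying the zero rule to the CT side is an assumption (the plan does not say whose probability is zeroed). The defuse times are the usual 10 s without a kit and 5 s with one.

```python
RUN_SPEED = 250.0  # game units per second; assumed constant run speed
DEFUSE_SECONDS = {True: 5.0, False: 10.0}  # with / without a defuse kit

# Hypothetical slice of a precomputed zone-to-zone path distance matrix
ZONE_DISTANCE = {
    ("ct_spawn", "a_site"): 1500.0,
    ("ct_spawn", "b_site"): 2000.0,
}


def time_pressure_index(from_zone, bomb_zone, has_kit, time_remaining):
    """TPI = (TravelTime + DefuseTime) / TimeRemaining."""
    if time_remaining <= 0:
        return float("inf")
    travel_time = ZONE_DISTANCE[(from_zone, bomb_zone)] / RUN_SPEED
    return (travel_time + DEFUSE_SECONDS[has_kit]) / time_remaining


def apply_tpi_rule(ct_win_prob, tpi):
    """If TPI > 1.0 the defuse cannot finish in time, so the probability is zeroed."""
    return 0.0 if tpi > 1.0 else ct_win_prob
```

For example, a CT with a kit who is 1500 units from the bomb with 22 seconds left has TPI = (6 + 5) / 22 = 0.5, so the model output is left untouched.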

#### 2.2 Tactical Features
*   **F4: Crossfire Coefficient**
    *   *Logic*: compute the angle that multiple CTs form at the target T. The win probability bonus peaks when the angle approaches 90°.
*   **F5: Economy Momentum**
    *   *Formula*: $\Delta E = \text{CT\_Equip\_Value} - \text{T\_Equip\_Value}$
    *   *Purpose*: quantifies the equipment advantage of "rifles versus pistols".
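
One way to turn "closest to 90° is best" into a number is to take the sine of the angle that two CTs form at the target, which is 0 when they stand in line and 1 at a right angle. This scoring function is an assumption, not the plan's definition, and the 2D positions and function names are hypothetical.

```python
import math


def crossfire_coefficient(target, shooters):
    """Best |sin(angle)| over all shooter pairs, with the angle measured at the target.

    Returns 0.0 with fewer than two shooters and 1.0 for a perfect 90° crossfire.
    """
    best = 0.0
    for i in range(len(shooters)):
        for j in range(i + 1, len(shooters)):
            ax, ay = shooters[i][0] - target[0], shooters[i][1] - target[1]
            bx, by = shooters[j][0] - target[0], shooters[j][1] - target[1]
            norm = math.hypot(ax, ay) * math.hypot(bx, by)
            if norm == 0:
                continue  # a shooter stands exactly on the target; angle undefined
            # |a x b| / (|a| |b|) equals |sin| of the angle between a and b
            best = max(best, abs(ax * by - ay * bx) / norm)
    return best


def economy_momentum(ct_equip_value, t_equip_value):
    """Delta E = CT_Equip_Value - T_Equip_Value."""
    return ct_equip_value - t_equip_value
```

The sine also scores a 180° sandwich as 0; if that case should count as strong positioning, a different scoring curve would be needed.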
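
#### 2.3 Player Profiling
Use the L3 database to strengthen the model's understanding of the human factor:
*   **F6: Star Aura**: `max_alive_rating` (rating of the strongest player still alive).
*   **F7: Clutch Specialist**: `avg_clutch_win_rate` (historical clutch win rate of the players still alive).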

---

### Phase 3: Model Training and Strategy (Modeling Strategy)

#### 3.1 Training Configuration
*   **Algorithm**: **XGBoost** (classifier)
*   **Objective**: `LogLoss` (optimizes probability accuracy)
*   **Evaluation metrics**: `AUC` (ranking ability), `Brier Score` (calibration)
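
To make the objective and the two metrics concrete, here is a dependency-free sketch of each for binary labels. It is meant for understanding and small sanity checks; a real training run would more likely rely on the implementations shipped with XGBoost or scikit-learn.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def brier_score(y_true, y_prob):
    """Mean squared error between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true, y_prob):
    """Probability that a random positive is ranked above a random negative."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    # Ties count as half a win
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

AUC only looks at ordering, so a model can rank perfectly and still be poorly calibrated; that is why the plan pairs it with the Brier score.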
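
#### 3.2 Sample Cleaning Strategy
*   **Filter Save Rounds**:
    *   If, at the end of the clutch: `Damage_Dealt == 0` AND `Dist_To_Enemy > 50m` AND `Weapon_Value > 2000`
    *   the round is classified as "deliberately conceded" and the sample is dropped so that it does not contaminate the win probability model.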
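
The save-round rule translates directly into a predicate. The sketch below assumes each sample is a dictionary with lowercase keys `damage_dealt`, `dist_to_enemy` (in the same unit as the 50 threshold), and `weapon_value`; those key names are hypothetical.

```python
def is_save_round(damage_dealt, dist_to_enemy, weapon_value):
    """True when the player dealt no damage, stayed far away, and kept a valuable weapon."""
    return damage_dealt == 0 and dist_to_enemy > 50 and weapon_value > 2000


def drop_save_rounds(samples):
    """Keep only samples whose clutch was actually contested."""
    return [
        s for s in samples
        if not is_save_round(s["damage_dealt"], s["dist_to_enemy"], s["weapon_value"])
    ]
```

All three conditions must hold at once, so a player who hides with a pistol or who trades damage before retreating stays in the training set.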

---

## 4. Deliverables

1.  **`extract_snapshots.py`**
    *   A Python script based on `demoparser2` that batch-processes demos into a CSV training set.
2.  **`map_nav_graph.json`**
    *   Zone distance lookup tables for the core maps (Mirage, Inferno, etc.).
3.  **`Clutch_Predictor_Model.pkl`**
    *   The trained XGBoost model file.
4.  **`Win_Prob_Service.py`**
    *   A simple Flask endpoint: current-state JSON in $\rightarrow$ `{ "ct_win_prob": 0.35, "key_factor": "time_pressure" }` out.

---

## 5. Action Items

1.  **[High Priority]** Develop a prototype of `extract_snapshots.py` and get the basic data flow working end to end.
2.  **[Medium Priority]** Build a simple grid distance table for Mirage.
3.  **[Medium Priority]** Integrate the L3 database and generate the player ability feature table.
109
docs/DATABASE_LOGICAL_STRUCTURE.md
Normal file
@@ -0,0 +1,109 @@
# Database Logical Structure (ER Diagram)

This diagram illustrates the logical relationships and data flow between the storage layers (L1, L2, L3) in the optimized architecture.

```mermaid
erDiagram
    %% ==========================================
    %% L1 LAYER: RAW DATA (Data Lake)
    %% ==========================================

    L1A_raw_iframe_network {
        string match_id PK
        json content "Raw API Response"
        timestamp processed_at
    }

    L1B_tick_snapshots_parquet {
        string match_id FK
        int tick
        int round
        json player_states "Positions, HP, Equip"
        json bomb_state
        string file_path "Parquet File Location"
    }

    %% ==========================================
    %% L2 LAYER: DATA WAREHOUSE (Structured)
    %% ==========================================

    dim_players {
        string steam_id_64 PK
        string username
        float rating
        float avg_clutch_win_rate
    }

    dim_maps {
        int map_id PK
        string map_name "de_mirage"
        string nav_mesh_path
    }

    fact_matches {
        string match_id PK
        int map_id FK
        timestamp start_time
        int winner_team
        int final_score_ct
        int final_score_t
    }

    fact_rounds {
        string round_id PK
        string match_id FK
        int round_num
        int winner_side
        string win_reason "Elimination/Bomb/Time"
    }

    L2_Spatial_NavMesh {
        string map_name PK
        string zone_id
        binary distance_matrix "Pre-calculated paths"
    }

    %% ==========================================
    %% L3 LAYER: FEATURE STORE (AI Ready)
    %% ==========================================

    L3_Offline_Features {
        string snapshot_id PK
        float feature_tpi "Time Pressure Index"
        float feature_crossfire "Tactical Score"
        float feature_equipment_diff
        int label_is_win "Target Variable"
    }

    %% ==========================================
    %% RELATIONSHIPS
    %% ==========================================

    %% L1 -> L2 Flow
    L1A_raw_iframe_network ||--|{ fact_matches : "Extracts to"
    L1A_raw_iframe_network ||--|{ dim_players : "Extracts to"
    L1B_tick_snapshots_parquet }|--|| fact_matches : "Belongs to"
    L1B_tick_snapshots_parquet }|--|| fact_rounds : "Details"

    %% L2 Relations
    fact_matches }|--|| dim_maps : "Played on"
    fact_rounds }|--|| fact_matches : "Part of"

    %% L2 -> L3 Flow (Feature Engineering)
    L3_Offline_Features }|--|| L1B_tick_snapshots_parquet : "Computed from"
    L3_Offline_Features }|--|| L2_Spatial_NavMesh : "Uses Physics from"
    L3_Offline_Features }|--|| dim_players : "Enriched with"
```
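
## Structure Explanation

1.  **L1 Source Data Layer**:
    *   **Top left (L1A)**: a conventional database table that stores match result metadata.
    *   **Bottom left (L1B)**: **a file system, drawn as a dashed box**. Although physically stored as Parquet files, logically it is one huge "tick-level snapshot table" linked to the other layers through `match_id`.

2.  **L2 Data Warehouse Layer**:
    *   **Core (Dim/Fact)**: a standard star schema. `fact_matches` is the central fact table and links to `dim_players` (who) and `dim_maps` (where).
    *   **Spatial**: standalone lookup-table logic that provides physical distance computation for each map in `dim_maps`.

3.  **L3 Feature Layer**:
    *   **Right (Features)**: a wide table in which each row corresponds directly to one training sample for the model. It stores no raw data, only **computed values** (such as the TPI), fused from L1B (positions) + L2 Spatial (distances) + Dim Players (ability).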
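
As an illustration of how part of the L2 core could look in SQLite, the sketch below creates `dim_maps`, `fact_matches`, and `fact_rounds` in an in-memory database and follows the "Part of" and "Played on" relationships in a join. The column types, the sample rows, and the `2=T, 3=CT` encoding of `winner_side` are assumptions; the real `L2.db` schema may differ.

```python
import sqlite3

SCHEMA = """
CREATE TABLE dim_maps (
    map_id INTEGER PRIMARY KEY,
    map_name TEXT,
    nav_mesh_path TEXT
);
CREATE TABLE fact_matches (
    match_id TEXT PRIMARY KEY,
    map_id INTEGER REFERENCES dim_maps(map_id),
    start_time TIMESTAMP,
    winner_team INTEGER,
    final_score_ct INTEGER,
    final_score_t INTEGER
);
CREATE TABLE fact_rounds (
    round_id TEXT PRIMARY KEY,
    match_id TEXT REFERENCES fact_matches(match_id),
    round_num INTEGER,
    winner_side INTEGER,
    win_reason TEXT
);
"""


def build_demo_db():
    """Create the tables in memory and load one made-up match."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO dim_maps VALUES (1, 'de_mirage', NULL)")
    conn.execute("INSERT INTO fact_matches VALUES ('m1', 1, NULL, 3, 13, 9)")
    conn.executemany(
        "INSERT INTO fact_rounds VALUES (?, 'm1', ?, ?, ?)",
        [("m1-r1", 1, 3, "Elimination"), ("m1-r2", 2, 2, "Bomb")],
    )
    return conn


def rounds_with_map(conn):
    """Join rounds -> matches -> maps, following 'Part of' and 'Played on'."""
    return conn.execute(
        """SELECT r.round_num, m.map_name, r.win_reason
           FROM fact_rounds r
           JOIN fact_matches f ON f.match_id = r.match_id
           JOIN dim_maps m ON m.map_id = f.map_id
           ORDER BY r.round_num"""
    ).fetchall()
```

Calling `rounds_with_map(build_demo_db())` returns one tuple per round with its map name, which is the kind of row an L3 feature build would start from before adding spatial and player features.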
130
docs/OPTIMIZED_ARCHITECTURE.md
Normal file
@@ -0,0 +1,130 @@
# Clutch-IQ & Data Warehouse Optimized Architecture (v4.0)

## 0. Repository Directory Mapping

- L1A (raw web-scraped data, SQLite): `database/L1/L1.db`
- L1B (demo snapshots, Parquet): `data/processed/*.parquet`
- L2 (structured data warehouse, SQLite): `database/L2/L2.db`
- L3 (feature store, SQLite): `database/L3/L3.db`
- Offline ETL: `src/etl/` (Demo → Parquet)
- Training: `src/training/train.py`
- Online inference: `src/inference/app.py`
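
## 1. Core Design Philosophy: Hybrid Batch/Stream Architecture

To serve both **large-scale historical data analysis** (L2/L3) and **millisecond-level real-time win probability prediction** (Clutch-IQ), the architecture is reworked into a modern data-platform pattern.

Key changes:
1.  **Tiered storage**: high-frequency snapshots (tick/frame) are stored as **Parquet**; aggregated business and feature data are stored in **SQLite**.
2.  **Feature decoupling**: a **Feature Store** is introduced to manage the features used by offline training and online inference in one place.
3.  **Closed feedback loop (optional)**: predictions can be written back to L2/L3 for later analysis and iteration.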

---

## 2. Optimized Layered Architecture Diagram

```mermaid
graph TD
    %% Data Sources
    Web[5eplay Web Data] --> L1A
    Demo[CS2 .dem Files] --> L1B
    GSI[Real-time GSI Stream] --> Inference

    %% L1 Layer: Data Lake (Raw)
    subgraph "L1: Data Lake (Raw Ingestion)"
        L1A[L1A: Metadata Store] -- SQLite --> L1A_DB[(database/L1/L1.db)]
        L1B[L1B: Telemetry Engine] -- Parquet --> L1B_Files[(data/processed/*.parquet)]
    end

    %% L2 Layer: Data Warehouse (Clean)
    subgraph "L2: Data Warehouse (Structured)"
        L1A_DB --> L2_ETL
        L1B_Files --> L2_ETL[L2 Processors]
        L2_ETL --> L2_SQL[(database/L2/L2.db)]
        L2_ETL --> L2_Spatial[(L2_Spatial: NavMesh/Grids)]
    end

    %% L3 Layer: Feature Store (Analytics & AI)
    subgraph "L3: Feature Store (Machine Learning)"
        L2_SQL --> L3_Offline
        L2_Spatial --> L3_Offline
        L3_Offline[Offline Feature Build] --> L3_DB[(database/L3/L3.db)]
        L3_Offline -- XGBoost --> Model[Clutch Predictor Model]

        L3_DB --> Inference
    end

    %% Application Layer
    subgraph "App: Clutch-IQ Service"
        Inference[Inference Engine]
        Model --> Inference
        Inference --> API[Win Prob API]
    end

    %% Feedback loop: predictions are logged back to L2
    API -.->|"Feedback Loop (Log Predictions)"| L2_SQL
```

---

## 3. Layer Definitions and Optimizations

### **L1: Data Lake Layer**
*   **L1A (Web Metadata)**: unchanged.
    *   *Storage*: SQLite
    *   *Content*: match metadata and scores.
*   **L1B (Demo Telemetry) [optimization focus]**:
    *   *Change*: **tick/frame snapshots are not written directly into SQLite**. Demo snapshots are high-volume (64/128 ticks/s), so SQLite would bloat and read/write slowly.
    *   *Optimization*: store snapshots as **Parquet** (columnar storage), which suits batch training and analysis.
    *   *Benefits*: high compression, high throughput, and a good fit for Pandas/XGBoost training workflows.

### **L2: Data Warehouse Layer**
*   **L2 Core (Business)**: unchanged.
    *   *Storage*: SQLite
    *   *Content*: cleaned data for the player dimension (Dim_Player) and the match dimension (Fact_Match).
*   **L2 Spatial (Physics) [new]**:
    *   *Content*: **map navigation meshes (Nav Mesh)**, distance matrices, and map zone partitions.
    *   *Purpose*: gives L3 a physical basis for computation (e.g., the real travel time from point A to point B instead of the straight-line distance).
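
### **L3: Feature Store Layer**
*   **Definition**: no longer just a database, but a **feature registry**.
*   **Offline Store**:
    *   Aggregates player/team features from L2 and persists them in L3 for reuse and fast lookup.
    *   Training labels still come from match or round results (e.g., `round_winner`).
*   **Online Store**:
    *   Fast lookup data used during online inference (e.g., player ability and precomputed map data).
    *   *Example*: a map distance matrix (precomputed point-to-point distances) that is looked up at inference time to reduce latency.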

---

## 4. Comprehensive Evaluation

### ✅ Pros

1.  **Performance**:
    *   Parquet removes the I/O bottleneck of massive tick data.
    *   L2 Spatial data is precomputed to keep real-time prediction latency below 50 ms.

2.  **Scalability**:
    *   The file-based storage of L1B and L3 supports distributed processing (with a possible future migration to Spark/Dask).
    *   Adding a map only requires updating L2 Spatial; the model logic is unaffected.

3.  **Real-time Readiness (balancing immediacy and accuracy)**:
    *   The architecture clearly separates "offline training" (accuracy first, slow processing) from "online inference" (speed first, mostly table lookups).

4.  **Modularity**:
    *   L1/L2/L3 have clear responsibilities, so the risk of data contamination is low. Clutch-IQ is just one "consumer" of L3 and does not disturb the existing warehouse structure.

### ⚠️ Cons (Potential Challenges)

1.  **Technology stack complexity**:
    *   Parquet requires the Python `pyarrow` or `fastparquet` library.
    *   Two storage paradigms must be maintained: the file system and the database (SQLite).

2.  **Cold-start cost**:
    *   L2 Spatial needs navigation mesh data built separately for every map (Mirage, Inferno, Nuke...), which is a large up-front effort.
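
---

## 5. Conclusion

This optimized architecture moves from a **single-machine analytics setup** to an **industrial-grade AI production setup**. It supports today's win probability prediction and also lays a solid foundation for future extensions such as anti-cheat behavior analysis and an AI coaching system.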
11
docs/README.md
Normal file
@@ -0,0 +1,11 @@
# docs/

Central directory for project documentation.

## Document Index

- OPTIMIZED_ARCHITECTURE.md: overall repository architecture and the L1/L2/L3 layering
- DATABASE_LOGICAL_STRUCTURE.md: logical structure of the data warehouse (ER/relationships)
- API_INTERFACE_GUIDE.md: payload formats and usage of the online inference endpoint (/predict)
- Clutch_Prediction_Implementation_Plan.md: implementation roadmap and planned deliverables