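# Three-Layer Database Architecture Refactoring Plan

## 1. Project Background and Goals

### Current State

- **Existing three-layer architecture**: L1A (raw JSON) → L2 (structured fact/dimension tables) → L3 (feature mart)
- **Main problems**:
  1. Database file names are inconsistent (`L1A.sqlite`, `L2_Main.sqlite`, `L3_Features.sqlite`).
  2. The JSON contains two Round data formats (leetify carries economy data, classic carries xyz coordinates). They are currently distinguished by the `data_source_type` flag, but the Schema is not fully unified.
  3. The web/services layer contains a large amount of data-processing logic (`feature_service.py` at 2257 lines, `stats_service.py` at 1113 lines) that should be pushed down into the database build layer.
  4. `L2_Builder.py` is a 1470-line monolithic file with no modularization.

### Refactoring Goals

1. **Standardized naming**: rename the database files to `L1.db`, `L2.db`, `L3.db`.
2. **Schema optimization**: design a unified Round table structure that supports source-specific fields.
3. **Logic push-down**: migrate aggregation from web/services into processor modules in the database layer.
4. **Modular decoupling**: establish a sub-processor pattern and split processors by functional domain.
5. **Reserve L1B**: reserve a directory for a future pipeline that parses Demo files directly.

---
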
## 2. Directory Restructuring

### 2.1 Standardized Three-Layer Directory

```
database/
├── L1/
│   ├── L1.db                      # standardized name (formerly L1A.sqlite)
│   ├── L1_Builder.py              # data ingestion script (formerly L1A_Builder.py)
│   └── README.md
├── L1B/                           # reserved for a future Demo parsing pipeline
│   └── README.md                  # explains this directory's purpose and why it is reserved
├── L2/
│   ├── L2.db                      # standardized name (formerly L2_Main.sqlite)
│   ├── L2_Builder.py              # main builder (refactored, slimmed down)
│   ├── schema.sql                 # optimized unified Schema
│   ├── processors/                # new: sub-processor modules
│   │   ├── __init__.py
│   │   ├── match_processor.py     # basic match information
│   │   ├── player_processor.py    # player statistics
│   │   ├── round_processor.py     # unified Round data handling
│   │   ├── economy_processor.py   # economy data (leetify)
│   │   ├── event_processor.py     # event stream (kill/bomb, etc.)
│   │   └── spatial_processor.py   # spatial coordinates (classic)
│   └── README.md
├── L3/
│   ├── L3.db                      # standardized name (formerly L3_Features.sqlite)
│   ├── L3_Builder.py              # main builder (refactored)
│   ├── schema.sql                 # keeps the existing L3 schema
│   ├── processors/                # new: feature computation modules
│   │   ├── __init__.py
│   │   ├── basic_processor.py     # basic features (avg rating/kd/kast)
│   │   ├── sta_processor.py       # stability / time-series features
│   │   ├── bat_processor.py       # duel (battle) ability features
│   │   ├── hps_processor.py       # high-pressure scenario features
│   │   ├── ptl_processor.py       # pistol-round features
│   │   ├── side_processor.py      # T/CT side features
│   │   ├── util_processor.py      # utility usage features
│   │   ├── eco_processor.py       # economic efficiency features
│   │   └── pace_processor.py      # pace / aggression features
│   └── README.md
├── original_json_schema/          # unchanged
└── Force_Rebuild.py               # updated to reference the new paths
```

---

## 3. L2 Schema Optimization

### 3.1 Unified Schema Design for Round Data

**Core idea**: design one table structure that contains every field from both sources, and populate fields selectively according to `data_source_type`. Fields that a source does not provide stay NULL.

#### 3.1.1 Enhanced fact_rounds Table

```sql
CREATE TABLE IF NOT EXISTS fact_rounds (
    match_id TEXT,
    round_num INTEGER,

    -- Common fields (present in both data sources)
    winner_side TEXT CHECK(winner_side IN ('CT', 'T', 'None')),
    win_reason INTEGER,
    win_reason_desc TEXT,
    duration REAL,
    ct_score INTEGER,
    t_score INTEGER,

    -- Leetify-only fields
    ct_money_start INTEGER,     -- leetify only
    t_money_start INTEGER,      -- leetify only
    begin_ts TEXT,              -- leetify only
    end_ts TEXT,                -- leetify only

    -- Classic-only fields
    end_time_stamp TEXT,        -- classic only
    final_round_time INTEGER,   -- classic only
    pasttime INTEGER,           -- classic only

    -- Data source flag (inherited from fact_matches)
    data_source_type TEXT CHECK(data_source_type IN ('leetify', 'classic', 'unknown')),

    PRIMARY KEY (match_id, round_num),
    FOREIGN KEY (match_id) REFERENCES fact_matches(match_id) ON DELETE CASCADE
);
```
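To make the "populate selectively" idea concrete, here is a minimal sketch using Python's standard `sqlite3` module. It assumes a trimmed-down version of `fact_rounds` (two shared columns plus one source-specific column per source); the `insert_round` helper and the sample values are hypothetical and only illustrate that columns a source does not supply end up as NULL.

```python
import sqlite3

# Trimmed-down fact_rounds for illustration only (not the full schema above).
DDL = """
CREATE TABLE fact_rounds (
    match_id         TEXT,
    round_num        INTEGER,
    winner_side      TEXT CHECK(winner_side IN ('CT', 'T', 'None')),
    ct_money_start   INTEGER,   -- leetify only
    final_round_time INTEGER,   -- classic only
    data_source_type TEXT CHECK(data_source_type IN ('leetify', 'classic', 'unknown')),
    PRIMARY KEY (match_id, round_num)
);
"""

COLUMNS = ("match_id", "round_num", "winner_side",
           "ct_money_start", "final_round_time", "data_source_type")


def insert_round(conn, row):
    """Insert one round; keys missing from `row` are stored as NULL."""
    conn.execute(
        f"INSERT INTO fact_rounds ({', '.join(COLUMNS)}) "
        f"VALUES ({', '.join('?' for _ in COLUMNS)})",
        [row.get(col) for col in COLUMNS],
    )


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
insert_round(conn, {"match_id": "m1", "round_num": 1, "winner_side": "CT",
                    "ct_money_start": 4000, "data_source_type": "leetify"})
insert_round(conn, {"match_id": "m2", "round_num": 1, "winner_side": "T",
                    "final_round_time": 87, "data_source_type": "classic"})
rows = conn.execute(
    "SELECT match_id, ct_money_start, final_round_time "
    "FROM fact_rounds ORDER BY match_id"
).fetchall()
```

The leetify row leaves `final_round_time` NULL and the classic row leaves `ct_money_start` NULL, so downstream queries can filter on `data_source_type` or simply skip NULLs.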

#### 3.1.2 Enhanced fact_round_events Table

```sql
CREATE TABLE IF NOT EXISTS fact_round_events (
    event_id TEXT PRIMARY KEY,
    match_id TEXT,
    round_num INTEGER,

    event_type TEXT CHECK(event_type IN ('kill', 'bomb_plant', 'bomb_defuse', 'suicide', 'unknown')),
    event_time INTEGER,

    -- Kill-related fields
    attacker_steam_id TEXT,
    victim_steam_id TEXT,
    assister_steam_id TEXT,
    flash_assist_steam_id TEXT,
    trade_killer_steam_id TEXT,

    weapon TEXT,
    is_headshot BOOLEAN DEFAULT 0,
    is_wallbang BOOLEAN DEFAULT 0,
    is_blind BOOLEAN DEFAULT 0,
    is_through_smoke BOOLEAN DEFAULT 0,
    is_noscope BOOLEAN DEFAULT 0,

    -- Classic spatial data (xyz coordinates)
    attacker_pos_x INTEGER,         -- classic only
    attacker_pos_y INTEGER,         -- classic only
    attacker_pos_z INTEGER,         -- classic only
    victim_pos_x INTEGER,           -- classic only
    victim_pos_y INTEGER,           -- classic only
    victim_pos_z INTEGER,           -- classic only

    -- Leetify score impact
    score_change_attacker REAL,     -- leetify only
    score_change_victim REAL,       -- leetify only
    twin REAL,                      -- leetify only (team win probability)
    c_twin REAL,                    -- leetify only
    twin_change REAL,               -- leetify only
    c_twin_change REAL,             -- leetify only

    -- Data source flag
    data_source_type TEXT CHECK(data_source_type IN ('leetify', 'classic', 'unknown')),

    FOREIGN KEY (match_id, round_num) REFERENCES fact_rounds(match_id, round_num) ON DELETE CASCADE
);
```

#### 3.1.3 Enhanced fact_round_player_economy Table

```sql
CREATE TABLE IF NOT EXISTS fact_round_player_economy (
    match_id TEXT,
    round_num INTEGER,
    steam_id_64 TEXT,

    side TEXT CHECK(side IN ('CT', 'T')),

    -- Leetify economy data (leetify only)
    start_money INTEGER,
    equipment_value INTEGER,
    main_weapon TEXT,
    has_helmet BOOLEAN,
    has_defuser BOOLEAN,
    has_zeus BOOLEAN,
    round_performance_score REAL,

    -- Classic equipment snapshot (classic only, stored as JSON)
    equipment_snapshot_json TEXT,   -- serialized form of classic's equiped field

    -- Data source flag
    data_source_type TEXT CHECK(data_source_type IN ('leetify', 'classic', 'unknown')),

    PRIMARY KEY (match_id, round_num, steam_id_64),
    FOREIGN KEY (match_id, round_num) REFERENCES fact_rounds(match_id, round_num) ON DELETE CASCADE
);
```
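One practical caveat for the `FOREIGN KEY ... ON DELETE CASCADE` clauses above: SQLite only enforces foreign keys when `PRAGMA foreign_keys = ON` has been issued on the connection, and the setting is off by default. The sketch below demonstrates this with reduced two-table versions of `fact_rounds` and `fact_round_player_economy`; the sample IDs are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Foreign keys are off by default and must be enabled on every connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE fact_rounds (
    match_id TEXT,
    round_num INTEGER,
    PRIMARY KEY (match_id, round_num)
);
CREATE TABLE fact_round_player_economy (
    match_id TEXT,
    round_num INTEGER,
    steam_id_64 TEXT,
    PRIMARY KEY (match_id, round_num, steam_id_64),
    FOREIGN KEY (match_id, round_num)
        REFERENCES fact_rounds(match_id, round_num) ON DELETE CASCADE
);
""")

conn.execute("INSERT INTO fact_rounds VALUES ('m1', 1)")
conn.execute("INSERT INTO fact_round_player_economy VALUES ('m1', 1, 'player-a')")

# A row pointing at a round that does not exist is rejected.
try:
    conn.execute("INSERT INTO fact_round_player_economy VALUES ('m1', 99, 'player-a')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting the parent round removes its economy rows via ON DELETE CASCADE.
conn.execute("DELETE FROM fact_rounds WHERE match_id = 'm1'")
remaining = conn.execute(
    "SELECT COUNT(*) FROM fact_round_player_economy"
).fetchone()[0]
```

If the builders open their own connections, each one would need the same PRAGMA, otherwise the constraints in `schema.sql` are parsed but silently not enforced.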

### 3.2 Force Buy Fix

In the `fact_round_player_economy` table, ensure that:

- the `start_money` and `equipment_value` columns are of type INTEGER;
- the processor correctly parses leetify's `bron_equipment` and `player_bron_crash` fields.

---

## 4. L2 Processor Modular Design

### 4.1 Architecture Pattern

```
L2_Builder.py (main controller, ~300 lines)
    ↓ calls
processors/
├── match_processor.py      # handles fact_matches, fact_match_teams
├── player_processor.py     # handles dim_players, fact_match_players
├── round_processor.py      # unified dispatcher for round data
│   ├── internally calls economy_processor
│   ├── internally calls event_processor
│   └── internally calls spatial_processor
├── economy_processor.py    # dedicated to leetify economy data
├── event_processor.py      # handles kill/bomb events
└── spatial_processor.py    # handles classic coordinate data
```

### 4.2 Processor Interface Specification

Every processor module exposes the same standard interface:

```python
import sqlite3


class XxxProcessor:
    @staticmethod
    def process(match_data: "MatchData", conn: sqlite3.Connection) -> bool:
        """
        Args:
            match_data: the unified MatchData object (contains all raw data)
            conn: connection to the L2 database
        Returns:
            bool: True if processing succeeded
        """
        pass
```
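As an illustration of the interface, the following sketch implements a tiny `MatchProcessor`. The `MatchData` dataclass here is a hypothetical stand-in (the real class is not shown in this plan), and the three-column `fact_matches` table is reduced for the example; only the `process(match_data, conn) -> bool` shape comes from the specification above.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class MatchData:
    """Hypothetical stand-in for the plan's unified MatchData object."""
    match_id: str
    data_source_type: str = "unknown"
    data_match: dict = field(default_factory=dict)


class MatchProcessor:
    @staticmethod
    def process(match_data: MatchData, conn: sqlite3.Connection) -> bool:
        """Write one fact_matches row; return False instead of raising on DB errors."""
        main = match_data.data_match.get("main", {})
        try:
            conn.execute(
                "INSERT OR REPLACE INTO fact_matches "
                "(match_id, map_name, data_source_type) VALUES (?, ?, ?)",
                (match_data.match_id, main.get("map"), match_data.data_source_type),
            )
            return True
        except sqlite3.Error:
            return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_matches "
    "(match_id TEXT PRIMARY KEY, map_name TEXT, data_source_type TEXT)"
)
ok = MatchProcessor.process(
    MatchData("m1", "classic", {"main": {"map": "de_mirage"}}), conn
)
```

Returning a bool keeps the builder loop simple: it can log the failing match and continue, rather than aborting the whole rebuild on one bad record.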

### 4.3 Responsibilities of the Core Processors

#### match_processor.py

- **Responsibility**: handle the main match table and team information
- **Input**: the `main` field of `MatchData.data_match`
- **Output**: writes `fact_matches`, `fact_match_teams`
- **Key logic**:
  - Extract the 40+ basic attributes from the `main` field
  - Parse the group_1/group_2 team information
  - Store raw JSON such as `treat_info_raw`
  - Set the `data_source_type` flag

#### player_processor.py

- **Responsibility**: handle the player dimension table and per-match statistics
- **Input**: the group_1/group_2 player lists of `MatchData.data_match`, plus `data_vip`
- **Output**: writes `dim_players`, `fact_match_players`, `fact_match_players_t`, `fact_match_players_ct`
- **Key logic**:
  - Merge the three fields fight/fight_t/fight_ct
  - Process the VIP+ advanced statistics (kast, awp_kill, etc.)
  - Compute utility usage (accumulated from round details)
  - UPSERT into `dim_players` (to avoid duplicates)

#### round_processor.py (dispatcher)

- **Responsibility**: act as the single entry point for Round data and dispatch on `data_source_type`
- **Input**: `MatchData.data_leetify` or `MatchData.data_round_list`
- **Output**: delegates to the other processors
- **Key logic**:

  ```python
  if match_data.data_source_type == 'leetify':
      economy_processor.process_leetify(...)
      event_processor.process_leetify_events(...)
  elif match_data.data_source_type == 'classic':
      event_processor.process_classic_events(...)
      spatial_processor.process_positions(...)
  ```

#### economy_processor.py

- **Responsibility**: handle leetify economy data
- **Input**: `data_leetify['leetify_data']['round_stat']`
- **Output**: writes `fact_round_player_economy` and the economy columns of `fact_rounds`
- **Key logic**:
  - Parse `bron_equipment` (equipment list)
  - Parse `player_bron_crash` (starting money)
  - Compute `equipment_value`

#### event_processor.py

- **Responsibility**: handle kill and bomb events
- **Input**: leetify's `show_event` or classic's `all_kill`
- **Output**: writes `fact_round_events`
- **Key logic**:
  - Generate `event_id` (UUID)
  - Distinguish `event_type`: kill/bomb_plant/bomb_defuse
  - leetify: extract `killer_score_change`, `victim_score_change`, and the twin changes
  - classic: extract the attacker/victim pos (x, y, z)

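
The plan only says that `event_id` is a UUID. Because the pipeline relies on full rebuilds, one option worth considering is a deterministic UUID (version 5) derived from the event's position, so that rebuilding the same match yields the same IDs. The sketch below is an assumption, not the plan's stated method: the namespace string, the `make_event_id` helper, and the choice of key fields are all hypothetical. A random `uuid.uuid4()` would be simpler if stable IDs are not needed.

```python
import uuid

# Hypothetical fixed namespace; any constant UUID works as long as it never changes.
EVENT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "l2/fact_round_events")


def make_event_id(match_id: str, round_num: int, event_type: str, seq: int) -> str:
    """Derive a stable event_id from the event's position within its round.

    `seq` is the event's index inside the round's event list.
    """
    key = f"{match_id}|{round_num}|{event_type}|{seq}"
    return str(uuid.uuid5(EVENT_NAMESPACE, key))


first = make_event_id("m1", 3, "kill", 0)
again = make_event_id("m1", 3, "kill", 0)   # same inputs, same ID
other = make_event_id("m1", 3, "kill", 1)   # next event in the round, different ID
```

Stable IDs also make it possible to diff `fact_round_events` between two rebuilds, which helps the before/after comparison planned for the testing phase.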

#### spatial_processor.py

- **Responsibility**: handle classic spatial data
- **Input**: the `pos` fields of `data_round_list['round_list']`
- **Output**: updates the coordinate columns of `fact_round_events`
- **Key logic**:
  - Extract attacker.pos.x/y/z
  - Extract victim.pos.x/y/z
  - Lay the groundwork for future heatmap and tactical-board analysis

---

## 5. L3 Processor Modular Design

### 5.1 Current State and Problems

**Current state**:

- `L3_Builder.py` currently delegates to `web.services.feature_service.FeatureService.rebuild_all_features()`
- `feature_service.py` contains 2257 lines of code with a large amount of feature computation logic mixed in

**Goals**:

- Move all feature computation logic into `database/L3/processors/`
- Keep only query and caching logic in feature_service
- Create processors for the six major dimensions defined in FeatureRDD.md, plus the basic features

### 5.2 Processor Module Breakdown

#### basic_processor.py

- **Responsibility**: compute the basic statistical features (metrics 0-42)
- **Data source**: `fact_match_players`
- **Feature examples**:
  - `basic_avg_rating`: AVG(rating)
  - `basic_avg_kd`: AVG(kills/deaths)
  - `basic_headshot_rate`: SUM(headshot_count)/SUM(kills)
  - `basic_first_kill_rate`: SUM(first_kill)/(SUM(first_kill)+SUM(first_death))
- **Implementation**: SQL aggregation plus simple Python calculations

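
When these ratios are written as SQLite queries, two details are easy to miss: dividing two INTEGER columns truncates, and a zero denominator needs handling. The sketch below shows one way to express three of the features with `* 1.0` and `NULLIF`. The reduced `fact_match_players` table, the sample rows, and the `calculate_basic` helper name are assumptions for illustration; treating a zero-death match as "no KD value" is also just one possible convention.

```python
import sqlite3

BASIC_SQL = """
SELECT
    AVG(rating)                                        AS basic_avg_rating,
    AVG(kills * 1.0 / NULLIF(deaths, 0))               AS basic_avg_kd,
    SUM(headshot_count) * 1.0 / NULLIF(SUM(kills), 0)  AS basic_headshot_rate
FROM fact_match_players
WHERE steam_id_64 = ?
"""


def calculate_basic(conn, steam_id_64):
    """Return the basic features for one player as a {name: value} dict."""
    cur = conn.execute(BASIC_SQL, (steam_id_64,))
    names = [col[0] for col in cur.description]
    return dict(zip(names, cur.fetchone()))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_match_players "
    "(steam_id_64 TEXT, rating REAL, kills INTEGER, deaths INTEGER, headshot_count INTEGER)"
)
conn.executemany(
    "INSERT INTO fact_match_players VALUES (?, ?, ?, ?, ?)",
    [
        ("p1", 1.0, 20, 10, 10),
        ("p1", 1.5, 10, 0, 5),   # zero deaths: KD becomes NULL and AVG skips it
    ],
)
features = calculate_basic(conn, "p1")
```

Without `* 1.0`, a match with 10 kills and 20 deaths would contribute a KD of 0 instead of 0.5, so the cast matters for correctness and not only for style.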

#### sta_processor.py (stability / time series)

- **Responsibility**: compute the STA dimension features
- **Data source**: `fact_match_players`, `fact_matches` (ordered by start_time)
- **Feature examples**:
  - `sta_last_30_rating`: average rating over the last 30 matches
  - `sta_win_rating`, `sta_loss_rating`: rating grouped by wins and losses
  - `sta_rating_volatility`: STDDEV(last 10 ratings)
  - `sta_fatigue_decay`: performance drop in later matches versus earlier matches on the same day
- **Implementation**: pandas time-series analysis

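
The two windowed features can be sketched without any third-party dependency. The plan names pandas as the implementation; the version below swaps in the standard-library `statistics` module purely to keep the example self-contained. The function names are hypothetical, and the plan does not say whether STDDEV means the sample or the population deviation, so the sample deviation used here is an assumption.

```python
import statistics


def rating_volatility(ratings, window=10):
    """Sample standard deviation of the most recent `window` ratings.

    `ratings` must be ordered oldest to newest. Returns None when fewer
    than two ratings are available, because the deviation is undefined.
    """
    recent = ratings[-window:]
    if len(recent) < 2:
        return None
    return statistics.stdev(recent)


def last_n_average(ratings, n=30):
    """Average of the most recent `n` ratings, or None for an empty history."""
    recent = ratings[-n:]
    return sum(recent) / len(recent) if recent else None


history = [0.9, 1.1, 1.0, 1.3, 0.8]      # made-up ratings, oldest first
volatility = rating_volatility(history)
recent_avg = last_n_average(history)
```

Returning None for short histories lines up with the plan's rule that unavailable features are stored as NULL in L3.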

#### bat_processor.py (duel ability)

- **Responsibility**: compute the BAT dimension features
- **Data source**: `fact_round_events` (kill relationship network), `fact_match_players`
- **Feature examples**:
  - `bat_kd_diff_high_elo`: KD difference against the highest-elo opponent
  - `bat_avg_duel_win_rate`: 1v1 duel win rate
  - `bat_win_rate_close/mid/far`: duel win rate at different distances (requires classic coordinates)
- **Implementation**: build an opponent relationship matrix, then aggregate conditionally

#### hps_processor.py (high-pressure scenarios)

- **Responsibility**: compute the HPS dimension features
- **Data source**: `fact_rounds`, `fact_round_events`, `fact_match_players`
- **Feature examples**:
  - `hps_clutch_win_rate_1v1/1v2/1v3_plus`: clutch win rate
  - `hps_match_point_win_rate`: match-point performance
  - `hps_pressure_entry_rate`: first-kill rate after a losing streak
  - `hps_comeback_kd_diff`: KD improvement during comebacks
- **Implementation**: identify special scenarios (match point/losing streak/clutch), then compute conditional statistics

#### ptl_processor.py (pistol rounds)

- **Responsibility**: compute the PTL dimension features
- **Data source**: `fact_rounds` (round_num=1,13), `fact_round_events`
- **Feature examples**:
  - `ptl_pistol_win_rate`: pistol-round win rate
  - `ptl_pistol_kd`: pistol-round KD
  - `ptl_pistol_multikills`: number of multi-kills in pistol rounds
  - `ptl_pistol_util_efficiency`: rate of utility-assisted kills
- **Implementation**: filter on round_num and check the weapon type

#### side_processor.py (T/CT sides)

- **Responsibility**: compute the T/CT dimension features
- **Data source**: `fact_match_players_t`, `fact_match_players_ct`
- **Feature examples**:
  - `side_rating_t`, `side_rating_ct`: rating per side
  - `side_kd_diff_ct_t`: KD difference, CT minus T
  - `side_first_kill_rate_t/ct`: first-kill rate per side
  - `side_plants_t`, `side_defuses_ct`: number of plants and defuses
- **Implementation**: aggregate each side table, then compute the differences

#### util_processor.py (utility usage)

- **Responsibility**: compute the UTIL dimension features
- **Data source**: `fact_match_players` (the util_xxx_usage columns)
- **Feature examples**:
  - `util_avg_nade_dmg`: average grenade damage
  - `util_avg_flash_time`: average blind duration
  - `util_usage_rate`: utility usage frequency
- **Implementation**: simple aggregation

#### eco_processor.py (economic efficiency)

- **Responsibility**: compute the ECO dimension features
- **Data source**: `fact_round_player_economy` (leetify data only)
- **Feature examples**:
  - `eco_avg_damage_per_1k`: damage dealt per 1000 of equipment value
  - `eco_rating_eco_rounds`: rating in eco rounds
  - `eco_kd_ratio`: KD in eco rounds
- **Implementation**: bucket rounds by economy, then correlate with performance
- **Note**: available only for the leetify data source

#### pace_processor.py (pace / aggression)

- **Responsibility**: compute the PACE dimension features
- **Data source**: `fact_round_events` (event_time)
- **Feature examples**:
  - `pace_avg_time_to_first_contact`: average time to first contact
  - `pace_opening_kill_time`: opening kill speed
  - `pace_trade_kill_rate`: trade-kill rate
  - `rd_phase_kill_early/mid/late_share`: share of kills in the early/mid/late phase of a round
- **Implementation**: event timestamp analysis

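
As a sketch of the timestamp analysis behind the early/mid/late shares: the function below buckets kill times and returns each bucket's share. It assumes `event_time` is measured in seconds from the start of the round, and the 30 s and 60 s phase boundaries are placeholder values; neither is specified in this plan, and the function name is hypothetical.

```python
from collections import Counter


def phase_kill_shares(kill_times, early_end=30, mid_end=60):
    """Split kill timestamps (seconds into the round) into early/mid/late shares.

    Returns None for every phase when there are no kills, so the feature
    can be stored as NULL instead of a misleading 0.
    """
    phases = ("early", "mid", "late")
    if not kill_times:
        return {phase: None for phase in phases}
    counts = Counter(
        "early" if t < early_end else "mid" if t < mid_end else "late"
        for t in kill_times
    )
    total = len(kill_times)
    return {phase: counts[phase] / total for phase in phases}


shares = phase_kill_shares([10, 20, 45, 90])
```

With the sample input, two of the four kills fall before 30 s, one between 30 s and 60 s, and one after, so the shares are 0.5, 0.25, and 0.25.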

### 5.3 Refactored L3_Builder Structure

```python
# L3_Builder.py (slimmed down to ~150 lines)
import sqlite3

from database.L3.processors import (
    basic_processor,
    sta_processor,
    bat_processor,
    hps_processor,
    ptl_processor,
    side_processor,
    util_processor,
    eco_processor,
    pace_processor,
)

# Paths follow the standardized layout from section 2.1; adjust to the project config.
L2_DB_PATH = "database/L2/L2.db"
L3_DB_PATH = "database/L3/L3.db"

PROCESSORS = (
    basic_processor, sta_processor, bat_processor,
    hps_processor, ptl_processor, side_processor,
    util_processor, eco_processor, pace_processor,
)


def rebuild_all_features():
    # get_all_players() and upsert_player_features() are helpers
    # expected to be defined elsewhere in this module.
    conn_l2 = sqlite3.connect(L2_DB_PATH)
    conn_l3 = sqlite3.connect(L3_DB_PATH)
    try:
        for player in get_all_players(conn_l2):
            features = {}
            # Run every processor against L2 and merge the results
            for processor in PROCESSORS:
                features.update(processor.calculate(player, conn_l2))
            # Write to L3
            upsert_player_features(conn_l3, player['steam_id_64'], features)
        conn_l3.commit()  # sqlite3 does not commit pending writes on close()
    finally:
        conn_l2.close()
        conn_l3.close()
```
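The `upsert_player_features` helper called above is not defined in this plan. A minimal sketch of what it could look like follows, assuming `dm_player_features` has `steam_id_64` as its primary key and one column per feature, and assuming SQLite 3.24 or newer for the `ON CONFLICT ... DO UPDATE` syntax. Because column names are interpolated into the SQL text, the sketch validates them first.

```python
import re
import sqlite3

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def upsert_player_features(conn, steam_id_64, features):
    """Insert or update one dm_player_features row keyed by steam_id_64."""
    cols = list(features)
    if not cols:
        return  # nothing to write
    # Column names end up in the SQL text, so accept plain identifiers only.
    for col in cols:
        if not _IDENTIFIER.match(col):
            raise ValueError(f"invalid feature name: {col!r}")
    col_list = ", ".join(["steam_id_64", *cols])
    placeholders = ", ".join("?" for _ in range(len(cols) + 1))
    updates = ", ".join(f"{col} = excluded.{col}" for col in cols)
    conn.execute(
        f"INSERT INTO dm_player_features ({col_list}) VALUES ({placeholders}) "
        f"ON CONFLICT(steam_id_64) DO UPDATE SET {updates}",
        [steam_id_64, *features.values()],
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dm_player_features "
    "(steam_id_64 TEXT PRIMARY KEY, basic_avg_rating REAL, basic_avg_kd REAL)"
)
upsert_player_features(conn, "p1", {"basic_avg_rating": 1.1, "basic_avg_kd": 1.3})
# A second call updates only the columns it supplies and keeps the rest.
upsert_player_features(conn, "p1", {"basic_avg_rating": 1.2})
row = conn.execute("SELECT * FROM dm_player_features").fetchone()
```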

---

## 6. Decoupling Web Services

### 6.1 Migration Strategy

**Principle**: the web layer only queries and caches; it performs no computation.

#### Refactoring feature_service.py

- **Kept**:
  - `get_player_features(steam_id)`: query from L3
  - `get_players_list()`: paginated query
- **Removed** (migrated to the L3 processors):
  - `rebuild_all_features()` → `L3_Builder.py`
  - all `_calculate_xxx()` methods → `L3/processors/xxx_processor.py`

#### Refactoring stats_service.py

- **Kept**:
  - `get_player_basic_stats()`: simple L2 query
  - `get_match_details()`: query match details
- **Optimized**:
  - `get_team_stats_summary()`: switch to querying an L2 VIEW (a newly created aggregate view)
  - Move complex aggregation logic into the L2 processors or into database VIEWs

### 6.2 New L2 VIEWs

Add the following to `database/L2/schema.sql`:

```sql
-- Player statistics across all matches
CREATE VIEW IF NOT EXISTS v_player_all_stats AS
SELECT
    steam_id_64,
    COUNT(DISTINCT match_id) as total_matches,
    AVG(rating) as avg_rating,
    AVG(kd_ratio) as avg_kd,
    AVG(kast) as avg_kast,
    SUM(kills) as total_kills,
    SUM(deaths) as total_deaths,
    SUM(assists) as total_assists,
    SUM(mvp_count) as total_mvps
FROM fact_match_players
GROUP BY steam_id_64;

-- Per-map performance statistics
CREATE VIEW IF NOT EXISTS v_map_performance AS
SELECT
    fmp.steam_id_64,
    fm.map_name,
    COUNT(*) as matches_on_map,
    AVG(fmp.rating) as avg_rating,
    AVG(fmp.kd_ratio) as avg_kd,
    SUM(CASE WHEN fmp.is_win THEN 1 ELSE 0 END) * 1.0 / COUNT(*) as win_rate
FROM fact_match_players fmp
JOIN fact_matches fm ON fmp.match_id = fm.match_id
GROUP BY fmp.steam_id_64, fm.map_name;
```
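Once the views exist, a service method such as `get_player_basic_stats()` can shrink to a single SELECT. The sketch below shows the idea against an in-memory database with a reduced `fact_match_players` table and a cut-down `v_player_all_stats`; the function body and the sample rows are assumptions, since the actual service code is not part of this plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row   # rows can then be converted to dicts by column name
conn.executescript("""
CREATE TABLE fact_match_players (
    match_id TEXT, steam_id_64 TEXT, rating REAL, kills INTEGER
);
CREATE VIEW v_player_all_stats AS
SELECT steam_id_64,
       COUNT(DISTINCT match_id) AS total_matches,
       AVG(rating)              AS avg_rating,
       SUM(kills)               AS total_kills
FROM fact_match_players
GROUP BY steam_id_64;
INSERT INTO fact_match_players VALUES ('m1', 'p1', 1.0, 18), ('m2', 'p1', 1.4, 25);
""")


def get_player_basic_stats(conn, steam_id_64):
    """Read one player's aggregates from the view; None if the player is unknown."""
    row = conn.execute(
        "SELECT * FROM v_player_all_stats WHERE steam_id_64 = ?", (steam_id_64,)
    ).fetchone()
    return dict(row) if row else None


stats = get_player_basic_stats(conn, "p1")
missing = get_player_basic_stats(conn, "nobody")
```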

---

## 7. Data Flow and Cross-References

### 7.1 Data Flow Diagram

```
Raw data (output_arena/*/iframe_network.json)
    ↓
[L1] L1.db: raw_iframe_network (1 table)
    └─ match_id (PK)
    └─ content (full JSON text)
    ↓
[L2] L2.db: 9 core tables
    ├─ dim_players (player dimension, 75 fields)
    ├─ dim_maps (map dimension)
    ├─ fact_matches (main match table, 50+ fields)
    ├─ fact_match_teams (team information)
    ├─ fact_match_players (per-match player statistics, 100+ fields)
    ├─ fact_match_players_t/ct (per-side statistics)
    ├─ fact_rounds (main round table, unified Schema)
    ├─ fact_round_events (event stream, unified Schema)
    └─ fact_round_player_economy (economy snapshot, unified Schema)
    ↓
[L3] L3.db: feature mart
    ├─ dm_player_features (player profile, 150+ features)
    └─ fact_match_features (per-match feature snapshot, optional)
```

### 7.2 JSON → L2 Field Mapping

| JSON path | L2 table | L2 field | Data source | Processor |
|-----------|----------|----------|-------------|-----------|
| `data.main.match_code` | fact_matches | match_code | common | match_processor |
| `data.main.map` | fact_matches | map_name | common | match_processor |
| `data.group_1[].fight.rating` | fact_match_players | rating | common | player_processor |
| `data.group_1[].fight_t.kill` | fact_match_players_t | kills | common | player_processor |
| `data.<steamid>.kast` | fact_match_players | kast | VIP | player_processor |
| `leetify_data.round_stat[].t_money_group` | fact_rounds | t_money_start | leetify | economy_processor |
| `leetify_data.round_stat[].bron_equipment` | fact_round_player_economy | equipment_value | leetify | economy_processor |
| `leetify_data.round_stat[].show_event[].kill_event` | fact_round_events | weapon, is_headshot | leetify | event_processor |
| `leetify_data.round_stat[].show_event[].killer_score_change` | fact_round_events | score_change_attacker | leetify | event_processor |
| `round_list[].all_kill[].attacker.pos.x` | fact_round_events | attacker_pos_x | classic | spatial_processor |
| `round_list[].c4_event[]` | fact_round_events | event_type='bomb_plant' | classic | event_processor |

### 7.3 L2 → L3 Feature Mapping

| L3 feature | Data source (L2 table) | Computation | Processor |
|------------|------------------------|-------------|-----------|
| `basic_avg_rating` | fact_match_players.rating | AVG() | basic_processor |
| `basic_headshot_rate` | fact_match_players | SUM(headshot_count)/SUM(kills) | basic_processor |
| `sta_last_30_rating` | fact_match_players + fact_matches.start_time | ORDER BY start_time LIMIT 30 | sta_processor |
| `sta_rating_volatility` | fact_match_players.rating | STDDEV(last_10_ratings) | sta_processor |
| `bat_kd_diff_high_elo` | fact_match_players + fact_match_teams.group_origin_elo | kills minus deaths against the highest-elo opponent | bat_processor |
| `hps_clutch_win_rate_1v1` | fact_round_events + fact_rounds.winner_side | identify 1v1 situations, then count wins and losses | hps_processor |
| `ptl_pistol_win_rate` | fact_rounds(round_num=1,13) + fact_match_players | pistol-round win rate | ptl_processor |
| `side_kd_diff_ct_t` | fact_match_players_ct.kd_ratio - fact_match_players_t.kd_ratio | KD difference between sides | side_processor |
| `eco_avg_damage_per_1k` | fact_round_player_economy.equipment_value + fact_match_players.damage_total | damage/equipment_value*1000 | eco_processor |
| `pace_opening_kill_time` | fact_round_events.event_time (first kill) | AVG(time of first kill) | pace_processor |

---

## 8. Implementation Steps

### Phase 1: Directory and Naming Standardization (1-2 hours)

1. **Rename the database files**:
   - `database/L1A/L1A.sqlite` → `database/L1/L1.db`
   - `database/L2/L2_Main.sqlite` → `database/L2/L2.db`
   - `database/L3/L3_Features.sqlite` → `database/L3/L3.db`
2. **Rename the Builder script**:
   - `L1A_Builder.py` → `L1_Builder.py`
3. **Update every path reference**:
   - `web/config.py`
   - `Force_Rebuild.py`
   - paths inside each Builder script
4. **Create the processor directory structure**:

   ```bash
   mkdir -p database/L2/processors
   mkdir -p database/L3/processors
   touch database/L2/processors/__init__.py
   touch database/L3/processors/__init__.py
   ```

5. **Create the reserved L1B directory**:
   - Create `database/L1B/README.md` describing its purpose

### Phase 2: L2 Schema Optimization (2-3 hours)

1. **Modify `database/L2/schema.sql`**:
   - Update `fact_rounds` with the leetify/classic source-specific fields
   - Update `fact_round_events` with the coordinate and score fields
   - Update `fact_round_player_economy` with `data_source_type` and `equipment_snapshot_json`
   - Add the VIEWs `v_player_all_stats` and `v_map_performance`
2. **Verify Schema compatibility**:
   - Create a test database and run the new Schema against it
   - Confirm that the foreign-key and CHECK constraints behave as expected

### Phase 3: L2 Processor Development (8-10 hours)

Develop in dependency order:

1. **match_processor.py** (1h):
   - Extract the `_parse_base_info()` logic from `L2_Builder.py`
   - Implement the `process(match_data, conn)` interface
2. **player_processor.py** (2h):
   - Extract `_parse_players_base()` and `_parse_players_vip()`
   - Merge fight/fight_t/fight_ct
   - Handle the `dim_players` UPSERT
3. **round_processor.py** (0.5h):
   - Implement the data-source dispatch logic
4. **economy_processor.py** (2h):
   - Parse leetify `bron_equipment`
   - Compute `equipment_value`
   - Write to `fact_round_player_economy`
5. **event_processor.py** (2h):
   - Handle leetify and classic kill events uniformly
   - Extract bomb_plant/defuse events
   - Generate the UUID `event_id`
6. **spatial_processor.py** (1h):
   - Extract the classic xyz coordinates
   - Link them to `fact_round_events`
7. **L2_Builder.py refactoring** (1.5h):
   - Slim down to ~300 lines
   - Call each processor
   - Implement error handling and logging

### Phase 4: L3 Processor Development (12-15 hours)

1. **basic_processor.py** (1.5h):
   - Implement the 42 basic feature calculations
   - SQL aggregation plus pandas processing
2. **sta_processor.py** (2h):
   - Time-series analysis
   - Sliding-window calculations
3. **bat_processor.py** (2.5h):
   - Build the opponent relationship network
   - Duel matrix analysis
4. **hps_processor.py** (2.5h):
   - Scenario identification (clutch/match point/losing streak)
   - Conditional statistics
5. **ptl_processor.py** (1h):
   - Pistol-round filtering
   - Weapon type checks
6. **side_processor.py** (1.5h):
   - Aggregate the T and CT tables separately
   - Compute the differences
7. **util_processor.py** (0.5h):
   - Simple aggregation
8. **eco_processor.py** (1h):
   - Economy bucketing logic
   - Correlation with performance
9. **pace_processor.py** (1.5h):
   - Event timestamp analysis
   - Time-window partitioning
10. **L3_Builder.py refactoring** (1h):
    - Orchestrate the processors
    - Batch-update `dm_player_features`

### Phase 5: Decoupling Web Services (4-5 hours)

1. **Slim down feature_service.py** (2h):
   - Remove all computation logic
   - Keep the query functions
   - Update the unit tests
2. **Optimize stats_service.py** (1.5h):
   - Switch to querying the L2 VIEWs
   - Simplify the aggregation logic
3. **Adapt the routing layer** (1h):
   - Update `web/routes/players.py` and related files
   - Confirm that the profile page renders correctly
4. **Caching strategy** (0.5h):
   - Consider a caching mechanism for L3 features

### Phase 6: Testing and Verification (3-4 hours)

1. **Unit tests**:
   - Write test cases for each processor
   - Verify the output against mock data
2. **Integration tests**:
   - Run the full L1→L2→L3 pipeline
   - Compare feature values before and after the refactoring
3. **Data quality checks**:
   - Run `verify_L2.py`
   - Check field coverage
4. **Performance tests**:
   - Measure pipeline run time
   - Optimize SQL queries

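
For the before/after comparison in the integration tests, an exact equality check on floats is likely to report spurious differences once calculations move between Python and SQL. One possible approach, sketched below with a hypothetical `diff_features` helper and an arbitrary tolerance, is to compare two feature dicts with `math.isclose` and report only the names that really changed.

```python
import math


def diff_features(before, after, rel_tol=1e-6):
    """Return {name: (old, new)} for features that differ between two builds.

    Numeric values are compared with a tolerance; everything else
    (None, strings, features missing on one side) must match exactly.
    """
    changed = {}
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        both_numeric = isinstance(old, (int, float)) and isinstance(new, (int, float))
        if both_numeric:
            same = math.isclose(old, new, rel_tol=rel_tol, abs_tol=1e-9)
        else:
            same = old == new
        if not same:
            changed[name] = (old, new)
    return changed


old_build = {"basic_avg_rating": 1.0, "eco_kd_ratio": None}
new_build = {"basic_avg_rating": 1.0000000001, "eco_kd_ratio": None}
regressions = diff_features(old_build, new_build)
```

An empty result means the refactored pipeline reproduces the old feature values within the tolerance; any entry points directly at the processor that owns the feature's prefix.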

### Phase 7: Documentation and Delivery (2 hours)

1. **Update README.md**:
   - New directory structure
   - Description of the processor modules
2. **Write the processor READMEs**:
   - `database/L2/processors/README.md`
   - `database/L3/processors/README.md`
3. **Update the API documentation**:
   - Notes on the web/services API changes
4. **Schema mapping table**:
   - Generate an Excel sheet with the complete JSON→L2→L3 field mapping

---

## 9. Risks and Considerations

### 9.1 Data Consistency

- **Risk**: Schema changes during the refactoring may make existing data incompatible
- **Mitigation**:
  - Use `Force_Rebuild.py` for a full rebuild
  - Keep the raw L1 data so that everything can be regenerated at any time

### 9.2 Performance Impact

- **Risk**: splitting the code into processors may add function-call overhead
- **Mitigation**:
  - Process in batches (several matches at a time)
  - Use `executemany()` to speed up INSERTs
  - Use SQL aggregation instead of Python loops on critical paths

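
A minimal sketch of the `executemany()` mitigation, using a reduced `fact_round_events` table and made-up event rows. The `insert_events` name is hypothetical. Wrapping the call in `with conn:` puts the whole batch in one transaction, which is where most of the speed-up over row-by-row commits usually comes from.

```python
import sqlite3


def insert_events(conn, events):
    """Insert many events with one prepared statement inside one transaction."""
    with conn:  # commits on success, rolls back if any row fails
        conn.executemany(
            "INSERT INTO fact_round_events "
            "(event_id, match_id, round_num, event_type) "
            "VALUES (:event_id, :match_id, :round_num, :event_type)",
            events,
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_round_events "
    "(event_id TEXT PRIMARY KEY, match_id TEXT, round_num INTEGER, event_type TEXT)"
)
events = [
    {"event_id": f"e{i}", "match_id": "m1", "round_num": 1 + i // 5, "event_type": "kill"}
    for i in range(10)
]
insert_events(conn, events)
count = conn.execute("SELECT COUNT(*) FROM fact_round_events").fetchone()[0]
```

Because the batch is atomic, a duplicate `event_id` anywhere in the list rolls back the entire batch, so a failed match never leaves partial rows behind.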

### 9.3 Leetify vs Classic Coverage

- **Risk**: some features (such as eco and spatial) are available from only one data source
- **Mitigation**:
  - Check `data_source_type` inside the processor
  - Store unavailable features as NULL
  - State the dependency explicitly in the documentation

### 9.4 Web Service Interruption

- **Risk**: refactoring feature_service may affect production functionality
- **Mitigation**:
  - Finish the L2/L3 processors first, then change the web layer
  - Use feature flags
  - Roll out gradually

---

## 10. Expected Outcomes

### 10.1 Clear Directory Structure

```
database/
├── L1/               # unified naming
├── L1B/              # clearly reserved
├── L2/               # modular processors
├── L3/               # modular processors
└── Force_Rebuild.py
```

### 10.2 Schema Completeness

- A unified Schema for Round data that supports both leetify-specific and classic-specific fields
- A clear `data_source_type` flag
- Complete foreign keys and constraints

### 10.3 Code Maintainability

- `L2_Builder.py` shrinks from 1470 lines to ~300 lines
- `L3_Builder.py` orchestrates local processors instead of delegating to the web service
- web/services shrinks from 4000+ lines to ~1000 lines

### 10.4 Extensibility

- Adding a feature only requires adding a processor module
- Adding a data source only requires extending the Schema and the processors
- L1B is reserved for a future Demo parsing pipeline

### 10.5 Documentation Completeness

- Complete JSON→L2→L3 mapping table
- Description of each processor's function and dependencies
- Data flow diagram

---

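## 11. Future Optimization Directions

### 11.1 Performance

- Consider materialized views for L2/L3 (SQLite has no native support, but they can be implemented manually)
- Incremental updates (the pipeline currently performs full rebuilds)
- Process multiple matches in parallel

### 11.2 Functionality

- Full design of the L1B layer (Demo parsing)
- More L3 features (the Phase 5 content in FeatureRDD.md)
- An API for real-time feature updates

### 11.3 Tooling

- Visualized Schema relationship diagram
- Generated processor dependency graph
- Automated data quality reports

---

## Summary

This plan lays out the complete path from directory structure and Schema design through code refactoring to testing and delivery. The core goals are:

1. **Standardization**: unify naming and directory structure
2. **Modularization**: split processors by functional domain
3. **Decoupling**: push computation logic down from the web layer to the database layer
4. **Extensibility**: reserve extension points for future data sources and features

Estimated total effort: **32-41 hours** (the sum of the per-phase estimates above). The work can be carried out in stages, and each Phase can be verified independently.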