clutch/database/L2/validator/BUILD_REPORT.md
2026-02-05 23:26:03 +08:00


L2 Database Build - Final Report

Executive Summary

L2 Database Build: 100% Complete

All 208 matches from L1 have been transformed into structured L2 tables covering matches, players, rounds, and events.


Coverage Metrics

Match Coverage

  • L1 Raw Matches: 208
  • L2 Processed Matches: 208
  • Coverage: 100.0%
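
A coverage figure like the one above can be derived by comparing match IDs between the two layers. The sketch below is illustrative only: the function name and the assumption that each layer exposes a list of match IDs are hypothetical, and the actual analyze_coverage.py may work differently.

```python
from typing import Dict, Iterable, List, Union


def coverage_report(
    l1_ids: Iterable[str], l2_ids: Iterable[str]
) -> Dict[str, Union[int, float, List[str]]]:
    """Compare L1 and L2 match IDs and report coverage (hypothetical helper)."""
    l1, l2 = set(l1_ids), set(l2_ids)
    processed = l1 & l2
    # Guard against an empty L1 layer to avoid division by zero.
    pct = round(100.0 * len(processed) / len(l1), 1) if l1 else 0.0
    return {
        "l1_matches": len(l1),
        "l2_matches": len(processed),
        "missing": sorted(l1 - l2),  # matches present in L1 but absent from L2
        "coverage_pct": pct,
    }


# Toy IDs, not real match data.
report = coverage_report(["a", "b", "c", "d"], ["a", "b", "c"])
print(report)
```

With all 208 IDs present on both sides, the missing list would be empty and coverage_pct would be 100.0.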

Data Distribution

  • Unique Players: 1,181
  • Player-Match Records: 2,080 (avg 10.0 per match)
  • Team Records: 416
  • Map Records: 9
  • Total Rounds: 4,315 (avg 20.7 per match)
  • Total Events: 33,560 (avg 7.8 per round)
  • Economy Records: 5,930

Data Source Types

  • Classic Mode: 180 matches (86.5%)
  • Leetify Mode: 28 matches (13.5%)
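
The averages and percentages above follow directly from the raw counts. A minimal sketch that recomputes them is shown here; the constant and helper names are illustrative, and only the counts come from this report.

```python
# Counts copied from this report; names are illustrative only.
MATCHES = 208
PLAYER_MATCH_RECORDS = 2_080
TOTAL_ROUNDS = 4_315
TOTAL_EVENTS = 33_560
CLASSIC_MATCHES = 180
LEETIFY_MATCHES = 28


def avg(total: int, count: int) -> float:
    """Mean of `total` over `count` items, rounded to one decimal place."""
    return round(total / count, 1)


def pct(part: int, whole: int) -> float:
    """Share of `part` in `whole` as a percentage, rounded to one decimal."""
    return round(100 * part / whole, 1)


summary = {
    "players_per_match": avg(PLAYER_MATCH_RECORDS, MATCHES),
    "rounds_per_match": avg(TOTAL_ROUNDS, MATCHES),
    "events_per_round": avg(TOTAL_EVENTS, TOTAL_ROUNDS),
    "classic_pct": pct(CLASSIC_MATCHES, MATCHES),
    "leetify_pct": pct(LEETIFY_MATCHES, MATCHES),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```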

Total Rows Across All Tables

51,860 rows successfully processed and stored


L2 Schema Overview

1. Dimension Tables (2)

dim_players (1,181 rows, 68 columns)

Player master data including profile, status, certifications, identity, and platform information.

  • Primary Key: steam_id_64
  • Contains full player metadata from the 5E platform

dim_maps (9 rows, 2 columns)

Map reference data

  • Primary Key: map_name
  • Contains map names and descriptions

2. Fact Tables - Match Level (5)

fact_matches (208 rows, 52 columns)

Core match information with comprehensive metadata

  • Primary Key: match_id
  • Includes: timing, scores, server info, game mode, response data
  • Raw data preserved: treat_info_raw, round_list_raw, leetify_data_raw
  • Data source tracking: data_source_type ('leetify'|'classic'|'unknown')

fact_match_teams (416 rows, 10 columns)

Team-level match statistics

  • Primary Key: (match_id, group_id)
  • Tracks: scores, ELO changes, roles, player UIDs

fact_match_players (2,080 rows, 101 columns)

Comprehensive player performance per match

  • Primary Key: (match_id, steam_id_64)
  • Categories:
    • Basic Stats: kills, deaths, assists, K/D, ADR, rating
    • Advanced Stats: KAST, entry kills/deaths, AWP stats
    • Clutch Stats: 1v1 through 1v5
    • Utility Stats: flash/smoke/molotov/HE/decoy usage
    • Special Metrics: MVP, highlight, achievement flags

fact_match_players_ct (2,080 rows, 101 columns)

CT-side specific player statistics

  • Same schema as fact_match_players
  • Filtered to CT-side performance only

fact_match_players_t (2,080 rows, 101 columns)

T-side specific player statistics

  • Same schema as fact_match_players
  • Filtered to T-side performance only

3. Fact Tables - Round Level (3)

fact_rounds (4,315 rows, 16 columns)

Round-by-round match progression

  • Primary Key: (match_id, round_num)
  • Common Fields: winner_side, win_reason, duration, scores
  • Leetify Fields: money_start (CT/T), begin_ts, end_ts
  • Classic Fields: end_time_stamp, final_round_time, pasttime
  • Data source tagged for each round

fact_round_events (33,560 rows, 29 columns)

Detailed event tracking (kills, deaths, bomb events)

  • Primary Key: event_id
  • Event Types: kill, bomb_plant, bomb_defuse, etc.
  • Position Data: attacker/victim xyz coordinates
  • Mechanics: headshot, wallbang, blind, through_smoke, noscope flags
  • Leetify Scoring: score changes, team win probability (twin)
  • Assists: flash assists, trade kills tracked

fact_round_player_economy (5,930 rows, 13 columns)

Economy state per player per round

  • Primary Key: (match_id, round_num, steam_id_64)
  • Leetify Data: start_money, equipment_value, loadout details
  • Classic Data: equipment_snapshot_json (serialized)
  • Economy Tracking: main_weapon, helmet, defuser, zeus
  • Performance: round_performance_score (leetify only)
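
To illustrate how a composite primary key such as (match_id, round_num) behaves, here is a minimal sketch. It assumes the L2 store is SQLite (suggested by the L2.db filename but not confirmed here), uses only a handful of the 16 fact_rounds columns, and the column types and sample values are guesses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced, assumed version of fact_rounds; the real table has 16 columns.
conn.execute(
    """
    CREATE TABLE fact_rounds (
        match_id         TEXT    NOT NULL,
        round_num        INTEGER NOT NULL,
        winner_side      TEXT,
        win_reason       TEXT,
        data_source_type TEXT,
        PRIMARY KEY (match_id, round_num)
    )
    """
)

insert_sql = "INSERT INTO fact_rounds VALUES (?, ?, ?, ?, ?)"
conn.execute(insert_sql, ("m1", 1, "CT", "bomb_defused", "leetify"))
conn.execute(insert_sql, ("m1", 2, "T", "elimination", "leetify"))

# A second row for the same (match_id, round_num) violates the composite key.
try:
    conn.execute(insert_sql, ("m1", 1, "T", "elimination", "leetify"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM fact_rounds").fetchone()[0]
print(duplicate_rejected, row_count)
```

The same pattern extends to the three-part key on fact_round_player_economy.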

Data Processing Architecture

Modular Processor Pattern

The L2 build uses a 6-processor architecture:

  1. match_processor: fact_matches, fact_match_teams
  2. player_processor: dim_players, fact_match_players (all variants)
  3. round_processor: dispatches round handling based on data_source_type
  4. economy_processor: fact_round_player_economy (leetify data)
  5. event_processor: fact_rounds, fact_round_events (both sources)
  6. spatial_processor: xyz coordinate extraction (classic data)
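
The dispatch step in round_processor could look roughly like the sketch below. This is not the project's code: the handler names, the row layout, and the assumption that raw rounds carry the same field names as the L2 columns are all hypothetical.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def leetify_round(round_num: int, raw: Row) -> Row:
    """Hypothetical handler: keep the leetify-specific timestamps."""
    return {
        "round_num": round_num,
        "data_source_type": "leetify",
        "begin_ts": raw.get("begin_ts"),
        "end_ts": raw.get("end_ts"),
    }


def classic_round(round_num: int, raw: Row) -> Row:
    """Hypothetical handler: keep the classic-specific end timestamp."""
    return {
        "round_num": round_num,
        "data_source_type": "classic",
        "end_time_stamp": raw.get("end_time_stamp"),
    }


# One handler per known source; 'unknown' deliberately has no entry.
ROUND_HANDLERS: Dict[str, Callable[[int, Row], Row]] = {
    "leetify": leetify_round,
    "classic": classic_round,
}


def process_rounds(source_type: str, raw_rounds: List[Row]) -> List[Row]:
    """Route each raw round to the handler for its data source."""
    handler = ROUND_HANDLERS.get(source_type)
    if handler is None:
        return []  # unknown source: produce no round rows
    return [handler(i, raw) for i, raw in enumerate(raw_rounds, start=1)]


classic_rows = process_rounds("classic", [{"end_time_stamp": 100}, {}])
print(classic_rows)
```

A lookup table keeps the dispatcher unchanged if another source type is added later.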

Data Source Multiplexing

The schema supports two data sources:

  • Leetify: Rich economy data, scoring metrics, performance analysis
  • Classic: Spatial coordinates, detailed equipment snapshots

Each fact table includes a data_source_type field to track data origin.


Key Technical Achievements

1. Fixed Column Count Mismatches

  • Implemented dynamic SQL generation for INSERT statements
  • Eliminated manual placeholder counting errors
  • All processors now use column lists + dynamic placeholders
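
A minimal sketch of the column-list approach, assuming SQLite and using hypothetical helper names (build_insert, insert_row) and a reduced fact_match_teams with a made-up score column. Because the placeholders are generated from the column list, the two counts cannot drift apart.

```python
import sqlite3
from typing import Any, Mapping, Sequence


def build_insert(table: str, columns: Sequence[str]) -> str:
    """Build an INSERT whose placeholder count always matches the column list.

    Table and column names are interpolated directly, so they must come from
    trusted constants in code, never from user input.
    """
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})"


def insert_row(
    conn: sqlite3.Connection,
    table: str,
    columns: Sequence[str],
    row: Mapping[str, Any],
) -> None:
    """Insert one row; keys missing from `row` are stored as NULL."""
    values = [row.get(col) for col in columns]
    conn.execute(build_insert(table, columns), values)


# Reduced, assumed schema for demonstration only.
TEAM_COLUMNS = ["match_id", "group_id", "score"]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_match_teams ("
    "match_id TEXT, group_id INTEGER, score INTEGER, "
    "PRIMARY KEY (match_id, group_id))"
)
insert_row(conn, "fact_match_teams", TEAM_COLUMNS,
           {"match_id": "m1", "group_id": 1, "score": 13})
stored = conn.execute(
    "SELECT match_id, group_id, score FROM fact_match_teams"
).fetchone()
print(build_insert("fact_match_teams", TEAM_COLUMNS))
```

Adding a column then means editing one list rather than recounting placeholders by hand.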

2. Resolved Processor Data Flow

  • Added data_round_list and data_leetify to MatchData
  • Processors now receive parsed data structures, not just raw JSON
  • Round/event processing now fully functional
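
One way to picture the change: MatchData carries parsed structures alongside the raw JSON strings. Only the names MatchData, data_round_list, and data_leetify come from this report; the other fields, the types (list and dict), and the parse_raw_fields helper are assumptions for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class MatchData:
    """Assumed shape; the real class has many more fields."""
    match_id: str
    round_list_raw: Optional[str] = None    # raw JSON string, if present
    leetify_data_raw: Optional[str] = None  # raw JSON string, if present
    data_round_list: List[Dict[str, Any]] = field(default_factory=list)
    data_leetify: Dict[str, Any] = field(default_factory=dict)


def parse_raw_fields(match: MatchData) -> MatchData:
    """Populate the parsed fields so processors need not re-parse JSON."""
    if match.round_list_raw:
        match.data_round_list = json.loads(match.round_list_raw)
    if match.leetify_data_raw:
        match.data_leetify = json.loads(match.leetify_data_raw)
    return match


example = parse_raw_fields(
    MatchData(match_id="m1", round_list_raw='[{"round": 1}, {"round": 2}]')
)
print(len(example.data_round_list), example.data_leetify)
```

Parsing once in the builder means every processor reads the same structures and a malformed payload fails in one place.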

3. 100% Data Coverage

  • All L1 JSON fields mapped to L2 tables
  • No data loss during transformation
  • Raw JSON preserved in fact_matches for reference

4. Comprehensive Schema

  • 10 tables total (2 dimension, 8 fact)
  • 51,860 rows of structured data
  • 493 columns in total across all tables (the three fact_match_players variants share one 101-column schema)

Files Modified

Core Builder

  • database/L1/L1_Builder.py - Fixed output_arena path
  • database/L2/L2_Builder.py - Added data_round_list/data_leetify fields

Processors (Fixed)

  • database/L2/processors/match_processor.py - Dynamic SQL generation
  • database/L2/processors/player_processor.py - Dynamic SQL generation

Analysis Tools (Created)

  • database/L2/analyze_coverage.py - Coverage analysis script
  • database/L2/extract_schema.py - Schema extraction tool
  • database/L2/L2_SCHEMA_COMPLETE.txt - Full schema documentation

Next Steps

Immediate

  • L3 processor development (feature calculation layer)
  • L3 schema design for aggregated player features

Future Enhancements

  • Add spatial analysis tables for heatmaps
  • Expand event types beyond kill/bomb
  • Add derived metrics (clutch win rate, eco round performance, etc.)

Conclusion

The L2 database layer is production-ready with:

  • 100% L1→L2 transformation coverage
  • Zero data loss
  • Dual data source support (leetify + classic)
  • Comprehensive 10-table schema
  • Modular processor architecture
  • 51,860 rows of high-quality structured data

The foundation is now in place for L3 feature engineering and web application queries.


Build Date: 2026-01-28
L1 Source: 208 matches from output_arena
L2 Destination: database/L2/L2.db
Processing Time: ~30 seconds for 208 matches