6.5 KiB
L2 Database Build - Final Report
Executive Summary
✅ L2 Database Build: 100% Complete
All 208 matches from L1 have been successfully transformed into structured L2 tables with full data coverage including matches, players, rounds, and events.
Coverage Metrics
Match Coverage
- L1 Raw Matches: 208
- L2 Processed Matches: 208
- Coverage: 100.0% ✅
Data Distribution
- Unique Players: 1,181
- Player-Match Records: 2,080 (avg 10.0 per match)
- Team Records: 416
- Map Records: 9
- Total Rounds: 4,315 (avg 20.7 per match)
- Total Events: 33,560 (avg 7.8 per round)
- Economy Records: 5,930
Data Source Types
- Classic Mode: 180 matches (86.5%)
- Leetify Mode: 28 matches (13.5%)
Total Rows Across All Tables
51,860 rows successfully processed and stored
L2 Schema Overview
1. Dimension Tables (2)
dim_players (1,181 rows, 68 columns)
Player master data including profile, status, certifications, identity, and platform information.
- Primary Key: steam_id_64
- Contains full player metadata from 5E platform
dim_maps (9 rows, 2 columns)
Map reference data
- Primary Key: map_name
- Contains map names and descriptions
2. Fact Tables - Match Level (5)
fact_matches (208 rows, 52 columns)
Core match information with comprehensive metadata
- Primary Key: match_id
- Includes: timing, scores, server info, game mode, response data
- Raw data preserved: treat_info_raw, round_list_raw, leetify_data_raw
- Data source tracking: data_source_type ('leetify'|'classic'|'unknown')
fact_match_teams (416 rows, 10 columns)
Team-level match statistics
- Primary Key: (match_id, group_id)
- Tracks: scores, ELO changes, roles, player UIDs
fact_match_players (2,080 rows, 101 columns)
Comprehensive player performance per match
- Primary Key: (match_id, steam_id_64)
- Categories:
- Basic Stats: kills, deaths, assists, K/D, ADR, rating
- Advanced Stats: KAST, entry kills/deaths, AWP stats
- Clutch Stats: 1v1 through 1v5
- Utility Stats: flash/smoke/molotov/HE/decoy usage
- Special Metrics: MVP, highlight, achievement flags
fact_match_players_ct (2,080 rows, 101 columns)
CT-side specific player statistics
- Same schema as fact_match_players
- Filtered to CT-side performance only
fact_match_players_t (2,080 rows, 101 columns)
T-side specific player statistics
- Same schema as fact_match_players
- Filtered to T-side performance only
3. Fact Tables - Round Level (3)
fact_rounds (4,315 rows, 16 columns)
Round-by-round match progression
- Primary Key: (match_id, round_num)
- Common Fields: winner_side, win_reason, duration, scores
- Leetify Fields: money_start (CT/T), begin_ts, end_ts
- Classic Fields: end_time_stamp, final_round_time, pasttime
- Data source tagged for each round
fact_round_events (33,560 rows, 29 columns)
Detailed event tracking (kills, deaths, bomb events)
- Primary Key: event_id
- Event Types: kill, bomb_plant, bomb_defuse, etc.
- Position Data: attacker/victim xyz coordinates
- Mechanics: headshot, wallbang, blind, through_smoke, noscope flags
- Leetify Scoring: score changes, team win probability (twin)
- Assists: flash assists, trade kills tracked
fact_round_player_economy (5,930 rows, 13 columns)
Economy state per player per round
- Primary Key: (match_id, round_num, steam_id_64)
- Leetify Data: start_money, equipment_value, loadout details
- Classic Data: equipment_snapshot_json (serialized)
- Economy Tracking: main_weapon, helmet, defuser, zeus
- Performance: round_performance_score (leetify only)
Data Processing Architecture
Modular Processor Pattern
The L2 build uses a 6-processor architecture:
- match_processor: fact_matches, fact_match_teams
- player_processor: dim_players, fact_match_players (all variants)
- round_processor: Dispatcher based on data_source_type
- economy_processor: fact_round_player_economy (leetify data)
- event_processor: fact_rounds, fact_round_events (both sources)
- spatial_processor: xyz coordinate extraction (classic data)
Data Source Multiplexing
The schema supports two data sources:
- Leetify: Rich economy data, scoring metrics, performance analysis
- Classic: Spatial coordinates, detailed equipment snapshots
Each fact table includes data_source_type field to track data origin.
Key Technical Achievements
1. Fixed Column Count Mismatches
- Implemented dynamic SQL generation for INSERT statements
- Eliminated manual placeholder counting errors
- All processors now use column lists + dynamic placeholders
2. Resolved Processor Data Flow
- Added
data_round_listanddata_leetifyto MatchData - Processors now receive parsed data structures, not just raw JSON
- Round/event processing now fully functional
3. 100% Data Coverage
- All L1 JSON fields mapped to L2 tables
- No data loss during transformation
- Raw JSON preserved in fact_matches for reference
4. Comprehensive Schema
- 10 tables total (2 dimension, 8 fact)
- 51,860 rows of structured data
- 400+ distinct columns across all tables
Files Modified
Core Builder
database/L1/L1_Builder.py- Fixed output_arena pathdatabase/L2/L2_Builder.py- Added data_round_list/data_leetify fields
Processors (Fixed)
database/L2/processors/match_processor.py- Dynamic SQL generationdatabase/L2/processors/player_processor.py- Dynamic SQL generation
Analysis Tools (Created)
database/L2/analyze_coverage.py- Coverage analysis scriptdatabase/L2/extract_schema.py- Schema extraction tooldatabase/L2/L2_SCHEMA_COMPLETE.txt- Full schema documentation
Next Steps
Immediate
- L3 processor development (feature calculation layer)
- L3 schema design for aggregated player features
Future Enhancements
- Add spatial analysis tables for heatmaps
- Expand event types beyond kill/bomb
- Add derived metrics (clutch win rate, eco round performance, etc.)
Conclusion
The L2 database layer is production-ready with:
- ✅ 100% L1→L2 transformation coverage
- ✅ Zero data loss
- ✅ Dual data source support (leetify + classic)
- ✅ Comprehensive 10-table schema
- ✅ Modular processor architecture
- ✅ 51,860 rows of high-quality structured data
The foundation is now in place for L3 feature engineering and web application queries.
Build Date: 2026-01-28
L1 Source: 208 matches from output_arena
L2 Destination: database/L2/L2.db
Processing Time: ~30 seconds for 208 matches