# Downloader Usage Guide

## Purpose

Scrapes the JSON results from the iframe embedded in a 5E Arena match page and, optionally, downloads the demo files to a local directory.

## Requirements

- Python 3.9+
- Playwright

Install the dependencies:

```bash
python -m pip install playwright
python -m playwright install
```
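
The requirements above can be checked before running the script. The following is a minimal sketch, not part of `downloader.py`: the function name `check_environment` is hypothetical, and it only checks the Python version and whether the `playwright` package is importable. It does not verify that the browser binaries from `python -m playwright install` are present.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 9)  # minimum version stated in this README


def check_environment(version_info=sys.version_info, module_name="playwright"):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper for illustration only.
    """
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, "
            f"found {version_info[0]}.{version_info[1]}"
        )
    # find_spec returns None when a top-level module cannot be found
    if importlib.util.find_spec(module_name) is None:
        problems.append(
            f"{module_name} is not installed; "
            f"run: python -m pip install {module_name}"
        )
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

Running this file prints one line per problem and prints nothing when both checks pass.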

## Quick start

Download a single match (using the default URL):

```bash
python downloader.py
```

Specify a match URL:

```bash
python downloader.py --url https://arena.5eplay.com/data/match/g161-20260118222715609322516
```

Batch download (read URLs from a file):

```bash
python downloader/downloader.py --url-list downloader/match_list_temp.txt --concurrency 4 --headless true --fetch-type iframe
```

Specify the output directory:

```bash
python downloader.py --out output_arena
```

Fetch only the iframe data, or download only the demo:

```bash
python downloader.py --fetch-type iframe
python downloader.py --fetch-type demo
```
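
## Main options

- `--url`: URL of a single match; the default URL is used when omitted
- `--url-list`: text file containing multiple match URLs, one URL per line
- `--out`: output directory, default `output_arena`
- `--match-name`: prefix name for the output directory, extracted from the URL by default
- `--headless`: whether to run the browser headless, `true`/`false`, default `false`
- `--timeout-ms`: page load timeout in milliseconds, default 30000
- `--capture-ms`: how long to listen for JSON on the main page, in milliseconds, default 5000
- `--iframe-capture-ms`: how long to listen for JSON on the iframe page, in milliseconds, default 8000
- `--concurrency`: number of concurrent downloads, default 3
- `--goto-retries`: number of retries when opening a page, default 1
- `--fetch-type`: what to fetch, `iframe`/`demo`/`both`, default `both`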
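
For reference, the options above could be declared with `argparse` roughly as follows. This is a sketch reconstructed from the list only, not the actual code of `downloader.py`: the helper names `str_to_bool` and `build_parser` are hypothetical, and the default for `--url` is left as `None` because this README does not state the default URL.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Parse the literal strings 'true' / 'false' (case-insensitive)."""
    lowered = value.lower()
    if lowered not in ("true", "false"):
        raise argparse.ArgumentTypeError("expected true or false")
    return lowered == "true"


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(prog="downloader.py")
    # Assumption: the real default URL is not given in this README.
    parser.add_argument("--url", default=None)
    parser.add_argument("--url-list", default=None)
    parser.add_argument("--out", default="output_arena")
    # None here stands for "derive the name from the URL".
    parser.add_argument("--match-name", default=None)
    parser.add_argument("--headless", type=str_to_bool, default=False)
    parser.add_argument("--timeout-ms", type=int, default=30000)
    parser.add_argument("--capture-ms", type=int, default=5000)
    parser.add_argument("--iframe-capture-ms", type=int, default=8000)
    parser.add_argument("--concurrency", type=int, default=3)
    parser.add_argument("--goto-retries", type=int, default=1)
    parser.add_argument(
        "--fetch-type", choices=["iframe", "demo", "both"], default="both"
    )
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Passing `--headless true --fetch-type iframe --concurrency 4` to this parser yields `headless=True`, `fetch_type='iframe'` and `concurrency=4`, with every other option at its documented default.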

## Output layout

A subdirectory named after the match ID, or after a custom name, is created inside the download directory:

```
output_arena/
  g161-20260118222715609322516/
    iframe_network.json
    g161-20260118222715609322516_de_ancient.zip
    g161-20260118222715609322516_de_ancient.dem
```
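
The mapping from a match URL to its subdirectory can be sketched as below. This assumes the match ID is the last path segment of the URL, which matches the example URLs in this README. The function names `match_name_from_url` and `match_dir` are hypothetical, and the script itself may derive the name differently.

```python
from pathlib import Path
from urllib.parse import urlparse


def match_name_from_url(url: str) -> str:
    """Use the last non-empty path segment of the URL as the match name."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if not segments:
        raise ValueError(f"no match ID found in URL: {url!r}")
    return segments[-1]


def match_dir(out_root: str, url: str, match_name=None) -> Path:
    """Build the per-match directory; a custom name overrides the URL-derived one."""
    return Path(out_root) / (match_name or match_name_from_url(url))


if __name__ == "__main__":
    url = "https://arena.5eplay.com/data/match/g161-20260118222715609322516"
    print(match_dir("output_arena", url).as_posix())
```

With the example URL this gives `output_arena/g161-20260118222715609322516`, the directory shown in the tree above.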

## URL list format

The list is a plain-text file with one URL per line. Blank lines and lines starting with `#` are ignored:

```
https://arena.5eplay.com/data/match/g161-20260118222715609322516
# comment
https://arena.5eplay.com/data/match/g161-20260118212021710292006
```
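
The rules above are simple enough to express in a few lines. The sketch below is an illustration, not the parser used by `downloader.py`: the function name `parse_url_list` is hypothetical, and stripping surrounding whitespace before checking for `#` is an assumption.

```python
def parse_url_list(text: str) -> list:
    """Return the URLs in `text`, skipping blank lines and # comments."""
    urls = []
    for line in text.splitlines():
        stripped = line.strip()  # assumption: surrounding whitespace is ignored
        if not stripped or stripped.startswith("#"):
            continue
        urls.append(stripped)
    return urls


if __name__ == "__main__":
    sample = (
        "https://arena.5eplay.com/data/match/g161-20260118222715609322516\n"
        "# comment\n"
        "\n"
        "https://arena.5eplay.com/data/match/g161-20260118212021710292006\n"
    )
    for url in parse_url_list(sample):
        print(url)
```

To read a real list, pass the file contents, for example `parse_url_list(open("match_list_temp.txt", encoding="utf-8").read())`.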
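
## FAQ

- If the script reports that Playwright is not installed, run the install commands first and then rerun the script.
- If the files already exist in the download directory, they are not downloaded again.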
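
The skip-if-present behaviour can be pictured as a check made before each download. This is a minimal sketch under assumptions: the function name `needs_download` is hypothetical, and treating an empty file as missing is a choice made here for illustration, not something this README states about the script.

```python
from pathlib import Path


def needs_download(target: Path) -> bool:
    """Return True when `target` is missing or empty.

    Assumption: a zero-byte file is treated as a failed earlier download.
    """
    return not (target.is_file() and target.stat().st_size > 0)


if __name__ == "__main__":
    demo = Path("output_arena") / "g161-20260118222715609322516" / "iframe_network.json"
    print("download" if needs_download(demo) else "skip")
```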