Overview

AI Wildlife Camera

Automated wildlife detection pipeline for infrared trail camera footage. Version 2.2 — pre-execution interaction + configurable frame density.

What's New in v2.2

Optimization	Problem Addressed	Implementation
-------------	------------------	----------------
Pre-execution interaction	Agent extracted frames before asking user preferences	All questions moved to Phase 0 before any file operations
Configurable frame density	High density caused excessive token usage	User selects High/Medium/Low before extraction starts
Three-zone frame extraction	Missed detections (animals entering mid-video were missed)	First 25% (50% of frames) + Middle 50% (35% of frames) + Last 25% (15% of frames)
Increased frame density	Small/fast animals missed in sparse sampling	8-28 frames per video (up from 5-20), 800px resolution (up from 640px)
Similar species contrast reasoning	Species misjudgment	Prompt model to compare distinguishing features dynamically
Human-priority detection rules	False positives in human activity scenes	Prompt: "detected humans → lower wildlife threshold, flag separately"
"Suspected wildlife" category	Unclear cases were rejected outright	Output: `has_wildlife: True/False/疑似` (`疑似` means "suspected") instead of binary
Cross-frame deduplication prompt	Count errors (same animal counted multiple times)	Prompt instructs model to check "跨帧一致性" (cross-frame consistency)
Dual-model API support	Single model bias	`analyze_api.py` supports Qwen-VL with checkpointing and retry

Workflow Overview

5-phase pipeline for batch-processing trail camera videos:

Phase	Task	Script	Agent Role
-------	------	--------	------------
0	Pre-execution confirmation (mandatory)	none	Agent asks user
1	Scan videos, extract metadata	`inventory.py`	Auto
2	Extract frames from each video (three-zone, user-selected density)	`extract_frames.py`	Auto
3	Vision analysis + location correction + write results	`analyze_api.py` / Agent review	API batch or Agent reviews
4	Export results to Excel	`export_excel.py`	Auto

Key change (v2.2): all interactive confirmation happens in Phase 0, and no file operation runs until the user has confirmed.

Prerequisites

FFmpeg with ffprobe (ffmpeg.exe / ffprobe.exe on Windows)
Python 3.8+ with openpyxl
NVIDIA GPU (optional, speeds up if vision model runs locally — currently uses agent vision)

Configuration

Edit the CONFIG section at the top of scripts/inventory.py and scripts/extract_frames.py:

FFMPEG_BIN = r"C:\path\to\ffmpeg\bin"      # Windows
# FFMPEG_BIN = "/usr/bin"                    # Linux/macOS

INPUT_DIR = r"C:\TrailCamera\Videos"        # Your footage folder
OUTPUT_DIR = r"C:\TrailCamera\Output"        # Results folder
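Before running either script, it may help to check that the configured folder really contains both binaries. The following is a minimal sketch under that assumption; `resolve_tool` and `missing_tools` are hypothetical helpers for illustration, not functions from the shipped scripts.

```python
import sys
from pathlib import Path


def resolve_tool(bin_dir: str, name: str, platform: str = sys.platform) -> Path:
    """Build the expected path of an FFmpeg tool inside bin_dir.

    On Windows the executables carry an .exe suffix (ffmpeg.exe, ffprobe.exe).
    """
    suffix = ".exe" if platform.startswith("win") else ""
    return Path(bin_dir) / f"{name}{suffix}"


def missing_tools(bin_dir: str, platform: str = sys.platform) -> list:
    """Return the names of the required tools that are absent from bin_dir."""
    required = ("ffmpeg", "ffprobe")
    return [
        name for name in required
        if not resolve_tool(bin_dir, name, platform).is_file()
    ]
```

Calling `missing_tools(FFMPEG_BIN)` with the value from the CONFIG section should return an empty list when the path is set correctly.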

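Phase 0: Pre-execution Confirmation (Mandatory)

⚠️ Before performing any file operation, the Agent must complete the interactions below and obtain the user's explicit confirmation.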

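Step 0a: Ask for the camera location

Prompt the user:

> "🦌 Wildlife visual identification is about to start. Please specify where the infrared camera is installed (at least down to the province or region, e.g. 'Yunnan Province, China'); this is used to correct species identification results. Default: 'China'."

Store location in output/location.txt (single line, e.g. Gaoligong Mountains, Yunnan Province, China).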

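Step 0b: Describe the available models and ask the user to choose

Prompt the user with full transparency:

> "🧠 Vision models currently available:
> - A) Qwen-VL-Plus API (default; called through the Alibaba Cloud Bailian interface, batch processing, supports similar-species contrast reasoning, cross-frame deduplication, and checkpoint resume)
> - B) Kimi built-in vision (I review the frames one by one; suited to small batches or when the API is unavailable)
> - C) Another API (GPT-4o / Claude / Gemini; you need to supply your own API key)
>
> Choose A, B, or C, or tell me your preference. If you choose C, please provide the API key and model name."

Configure the corresponding script according to the user's choice:

A → confirm that analyze_api.py already contains an Alibaba Cloud key; if so, it can be used directly
B → enter Agent manual review mode
C → once the user provides a key, write it into the CONFIG section of analyze_api.py, or create a new analysis script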

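Step 0c: Ask for frame density

Prompt the user:

> "📐 Frame density (affects identification accuracy and token / API consumption):
> - High: current setting (8-28 frames per video across the three zones; highest accuracy, highest token consumption)
> - Medium: reduced by 50% (4-14 frames per video across the three zones; balances accuracy and cost)
> - Low: reduced by 75% (2-7 frames per video across the three zones; lowest cost, suited to quick first-pass screening)
>
> Please choose High, Medium, or Low."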

Store density selection in output/frame_density.txt (single line: high / medium / low).
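One way to turn the user's answer into the stored token is sketched below. It is an illustration only: `normalize_density` and `save_density` are hypothetical names, and accepting both the Chinese labels (高/中/低) and their English equivalents is an assumption about what users might type.

```python
from pathlib import Path

# Accepted spellings for each stored token (an assumption for illustration).
_DENSITY_ALIASES = {
    "high": {"high", "高"},
    "medium": {"medium", "中"},
    "low": {"low", "低"},
}


def normalize_density(choice: str) -> str:
    """Map a user answer to one of the tokens high / medium / low."""
    cleaned = choice.strip().lower()
    for token, aliases in _DENSITY_ALIASES.items():
        if cleaned in aliases:
            return token
    raise ValueError(f"unrecognized frame density: {choice!r}")


def save_density(choice: str, output_dir: str = "output") -> Path:
    """Write the normalized token as a single line to frame_density.txt."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    target = out / "frame_density.txt"
    target.write_text(normalize_density(choice) + "\n", encoding="utf-8")
    return target
```

Rejecting unknown answers with an error, rather than silently falling back to a default, keeps the choice explicit, which matches the intent of Phase 0.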

Frame density scaling rules:

User Choice	Scaling	<30s	30-60s	60-120s	120-300s	300-600s	>600s
-------------	---------	------	--------	---------	----------	----------	--------
High	100%	8	12	15	18	22	28
Medium	50%	4	6	8	9	11	14
Low	25%	2	3	4	5	6	7

> Minimum guarantee: at least 1 frame is extracted from each zone, so the beginning, middle, and end of every video are covered.
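The scaling table can be reproduced with a small lookup plus rounding up. The sketch below is an assumption about how the numbers are derived, not the code in extract_frames.py: rounding up is inferred from table entries such as 15 → 8 and 18 → 5, and the treatment of exact bucket boundaries (for example, exactly 60 s) is a guess.

```python
import math

# Base (high density) frame counts per duration bucket, copied from the table.
# Each entry is (exclusive upper bound in seconds, frames).
BASE_FRAMES = [(30, 8), (60, 12), (120, 15), (300, 18), (600, 22), (math.inf, 28)]
SCALE = {"high": 1.0, "medium": 0.5, "low": 0.25}


def base_frame_count(duration_s: float) -> int:
    """Look up the high-density frame count for a video duration."""
    for upper, frames in BASE_FRAMES:
        if duration_s < upper:
            return frames
    return BASE_FRAMES[-1][1]


def scaled_frame_count(duration_s: float, density: str) -> int:
    """Apply the density factor, rounding up as the table values suggest."""
    return math.ceil(base_frame_count(duration_s) * SCALE[density])
```

For example, `scaled_frame_count(90, "medium")` gives 8 and `scaled_frame_count(200, "low")` gives 5, matching the table. Note that low density yields 2 frames for clips under 30 s, which is fewer than the three zones; if the one-frame-per-zone minimum is enforced, such clips would presumably end up with 3 frames.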

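Step 0d: Final confirmation

Prompt the user:

> "📋 Please confirm:
> - Location: [location provided by the user]
> - Model: [model chosen by the user]
> - Frame density: [High/Medium/Low]
> - Videos to process: [INPUT_DIR path]
>
> If everything is correct, reply 'start' and I will run scan → frame extraction (at the chosen density) → vision analysis → report export."

Proceed to Phase 1 only after the user replies with an explicit confirmation (for example "start", "confirm", or "go ahead").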
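A later phase could guard against a skipped Phase 0 by checking that both answer files exist and are valid before touching any video. This is a sketch under assumptions: `load_phase0` is a hypothetical helper, and because the document does not say where the model choice is stored, only the two documented files are checked.

```python
from pathlib import Path

VALID_DENSITIES = {"high", "medium", "low"}


def load_phase0(output_dir: str = "output") -> dict:
    """Read the Phase 0 answers, raising if any is missing or invalid."""
    out = Path(output_dir)
    answers = {}
    for key, filename in (("location", "location.txt"),
                          ("density", "frame_density.txt")):
        path = out / filename
        if not path.is_file():
            raise FileNotFoundError(f"Phase 0 incomplete: {path} not found")
        value = path.read_text(encoding="utf-8").strip()
        if not value:
            raise ValueError(f"Phase 0 incomplete: {path} is empty")
        answers[key] = value
    if answers["density"] not in VALID_DENSITIES:
        raise ValueError(f"invalid frame density: {answers['density']!r}")
    return answers
```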

Phase 1: Inventory (Auto)

python scripts/inventory.py

Scans INPUT_DIR recursively for video files (.mp4, .mov, .avi, .mkv, .m4v, .mpg, .mpeg).

Extracts per video:

原始文件名 — filename
拍摄时间 — Parsed from filename patterns, then ffprobe creation_time, fallback to file mtime
视频时长 — Duration in seconds
分辨率 — Width × Height
文件大小 — MB
编码格式 — Video codec

Outputs:

output/inventory.json — Raw data
output/inventory.xlsx — Excel preview (Phase 1 data only)
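The scan and metadata step might look roughly like the sketch below. It is not the contents of inventory.py: the function names and the English dictionary keys are assumptions for illustration, and only the ffprobe options shown (`-v error`, `-print_format json`, `-show_format`, `-show_streams`) are relied on.

```python
import json
import subprocess
from pathlib import Path

VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv", ".m4v", ".mpg", ".mpeg"}


def find_videos(input_dir: str) -> list:
    """Recursively collect video files, matching extensions case-insensitively."""
    return sorted(
        p for p in Path(input_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS
    )


def run_ffprobe(ffprobe: str, video: Path) -> dict:
    """Ask ffprobe for container and stream metadata as JSON."""
    cmd = [ffprobe, "-v", "error", "-print_format", "json",
           "-show_format", "-show_streams", str(video)]
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)


def summarize_probe(probe: dict, filename: str, size_bytes: int) -> dict:
    """Reduce ffprobe output to the per-video fields listed above."""
    video = next(
        (s for s in probe.get("streams", []) if s.get("codec_type") == "video"),
        {},
    )
    fmt = probe.get("format", {})
    return {
        "filename": filename,
        "duration_s": round(float(fmt.get("duration", 0.0)), 2),
        "resolution": f"{video.get('width', 0)}x{video.get('height', 0)}",
        "size_mb": round(size_bytes / (1024 * 1024), 2),
        "codec": video.get("codec_name", "unknown"),
        "creation_time": fmt.get("tags", {}).get("creation_time"),
    }
```

Keeping `summarize_probe` separate from the subprocess call makes the parsing testable without FFmpeg installed.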

Date Parsing Priority

Filename patterns: IMG_YYYYMMDD_HHMMSS, YYYY-MM-DD_HH-MM-SS, YYYYMMDD_HHMMSS, YYYY_MM_DD_HH_MM_SS
ffprobe creation_time / date tag
File modification time (fallback)
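The priority order above can be sketched as a chain of fallbacks. This is an illustrative reconstruction rather than the parser in inventory.py: a single regular expression stands in for the four filename patterns (and is slightly more permissive than them), and the ffprobe timestamp is read without time zone handling, which is a simplification.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional

# Covers IMG_YYYYMMDD_HHMMSS, YYYY-MM-DD_HH-MM-SS, YYYYMMDD_HHMMSS
# and YYYY_MM_DD_HH_MM_SS by making the inner separators optional.
_STAMP = re.compile(
    r"(?<!\d)(\d{4})[-_]?(\d{2})[-_]?(\d{2})_(\d{2})[-_]?(\d{2})[-_]?(\d{2})(?!\d)"
)


def parse_from_filename(name: str) -> Optional[datetime]:
    """Return the timestamp embedded in a filename, or None if there is none."""
    match = _STAMP.search(name)
    if match is None:
        return None
    try:
        return datetime(*(int(part) for part in match.groups()))
    except ValueError:  # digits matched but do not form a valid date
        return None


def capture_time(path: Path, creation_time: Optional[str] = None) -> datetime:
    """Apply the priority: filename, then ffprobe creation_time, then mtime."""
    parsed = parse_from_filename(path.name)
    if parsed is not None:
        return parsed
    if creation_time:
        try:
            # ffprobe usually reports ISO 8601, e.g. 2024-03-15T06:30:12.000000Z
            return datetime.strptime(creation_time[:19], "%Y-%m-%dT%H:%M:%S")
        except ValueError:
            pass
    return datetime.fromtimestamp(path.stat().st_mtime)
```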

Phase 2: Frame Extraction (Auto) — Three-Zone + User Density

python scripts/extract_frames.py

Reads output/inventory.json and output/frame_density.txt, extracts frames per video using three-zone strategy with user-selected density.

Frame naming: Flat structure — output/frames/PIRT0001_frame_001.jpg, PIRT0001_frame_002.jpg, etc.

Frame extraction reads frame_density.txt to determine scaling factor:

high → 100% of base frame count (8-28 frames)
medium → 50% of base frame count (4-14 frames)
low → 25% of base frame count (2-7 frames)

Base frame count (high density):

Video Duration	Total Frames	First 25% (trigger zone)	Middle 50% (activity zone)	Last 25% (exit zone)
----------------	-------------	--------------------------	---------------------------	---------------------
< 30 sec	8	4	3	1
30–60 sec	12	6	4	2
60–120 sec	15	7	5	3
120–300 sec	18	9	6	3
300–600 sec	22	11	8	3
> 600 sec	28	14	10	4

> Rationale for three-zone split: Trail camera videos are triggered by motion, but animals may enter at start, linger mid-video, or exit at the end. Three-zone coverage maximizes detection probability across the full clip.

Frame resolution: 800px width (up from 640px) for better small animal detail.

Frame quality: JPEG quality=2 (high; presumably on FFmpeg's JPEG quality scale, where lower values mean higher quality).
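The zone split and sampling times can be sketched as follows. The allocation rule (half the frames to the first zone, about 35% to the middle, the remainder to the last) is inferred from the base table and is an assumption, as are the evenly spaced sample times within each zone and the exact FFmpeg options; extract_frames.py may do this differently.

```python
def split_zones(total: int) -> tuple:
    """Split a frame budget into (first, middle, last), at least 1 per zone."""
    front = max(1, total // 2)
    middle = max(1, round(total * 0.35))
    last = max(1, total - front - middle)
    return front, middle, last


def zone_timestamps(duration_s: float, counts: tuple) -> list:
    """Place each zone's frames at the midpoints of equal slices of that zone."""
    bounds = [(0.0, 0.25), (0.25, 0.75), (0.75, 1.0)]
    stamps = []
    for (start, end), count in zip(bounds, counts):
        width = (end - start) * duration_s / count
        for i in range(count):
            stamps.append(round(start * duration_s + (i + 0.5) * width, 3))
    return stamps


def frame_name(video_stem: str, index: int) -> str:
    """Flat naming scheme, e.g. PIRT0001_frame_001.jpg."""
    return f"{video_stem}_frame_{index:03d}.jpg"


def ffmpeg_frame_command(ffmpeg: str, video: str, timestamp_s: float,
                         out_path: str) -> list:
    """Build a command that grabs one 800px-wide JPEG at the given time."""
    return [ffmpeg, "-y", "-ss", f"{timestamp_s:.3f}", "-i", video,
            "-frames:v", "1", "-vf", "scale=800:-2", "-q:v", "2", out_path]
```

With this rule `split_zones(15)` returns `(7, 5, 3)`, matching the 60–120 sec row, while a budget of 2 frames becomes `(1, 1, 1)` because of the per-zone minimum.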

Phase 3: Vision Analysis + Correction + Write Results

Step 3a: Run Vision Analysis

Run the vision analysis that corresponds to the user's choice in Step 0b:

Option A — API Batch Mode (Qwen-VL)

python scripts/analyze_api.py

Sends frames per video to Qwen-VL-Plus API (frame count depends on density selected in Phase 0c).
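Checkpointing and retry, which the batch mode is described as supporting, can be illustrated generically. In the sketch below `analyze_video` is a hypothetical callable standing in for the actual Qwen-VL request, and the checkpoint layout (one JSON object keyed by filename) and backoff schedule are assumptions, not the behavior of analyze_api.py.

```python
import json
import time
from pathlib import Path


def call_with_retry(func, *args, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func, retrying with exponential backoff on any exception."""
    for attempt in range(1, attempts + 1):
        try:
            return func(*args)
        except Exception:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


def analyze_batch(videos, analyze_video, checkpoint_path, **retry_options):
    """Analyze each video once, resuming from the checkpoint file if present."""
    checkpoint = Path(checkpoint_path)
    if checkpoint.is_file():
        done = json.loads(checkpoint.read_text(encoding="utf-8"))
    else:
        done = {}
    for name in videos:
        if name in done:
            continue  # already analyzed in an earlier run
        done[name] = call_with_retry(analyze_video, name, **retry_options)
        # Persist after every video so an interruption loses at most one result.
        checkpoint.write_text(
            json.dumps(done, ensure_ascii=False, indent=2), encoding="utf-8"
        )
    return done
```

Writing the checkpoint after each video trades a little I/O for the ability to resume a long batch without repeating paid API calls.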

Option B — Agent Manual Review

> The Agent views the extracted frames by using the read tool on the image files and records a per-video summary.

Option C — Other API

> Use user-provided API key and model.

Step 3b: Apply Location-Based Correction

After raw vision results are in, read output/location.txt and apply correction rules:

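Species not distributed in the region → exclude or lower confidence
Species common in the region → raise confidence
Migratory birds / migratory species → annotate seasonality
Domestic animals found in China → distinguish from wildlife
Invasive species → flag specifically
Easily confused species pairs → correct using distinguishing features. Refer to the regional species reference and morphological descriptions in references/wildlife_guide.md and reason by elimination among similar species. Do not hard-code specific species pairs; instead, compare the actual detections dynamically against the morphological features of species common in the region (body size, coat color, tail shape, behavior patterns, and so on).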
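As a deliberately simplified illustration of the first rule only, the sketch below lowers confidence and appends a note when a detected species is not in a regional list. The function name, the `regional_species` set, and the note wording are all hypothetical; the correction described above relies on the Agent's reasoning over references/wildlife_guide.md and is not reducible to a set lookup.

```python
def apply_location_correction(record: dict, regional_species: set,
                              location: str) -> dict:
    """Return a copy of record, downgraded if it lists out-of-range species."""
    corrected = dict(record)
    unexpected = [
        species for species in record.get("species_detected", [])
        if species not in regional_species
    ]
    if unexpected:
        corrected["confidence"] = "low"
        note = f"not expected in {location}: {', '.join(unexpected)}"
        existing = record.get("notes", "")
        corrected["notes"] = f"{existing}; {note}" if existing else note
    return corrected
```

Returning a copy instead of mutating the input keeps the raw vision result available for comparison with the corrected one.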

Step 3c: Write Results to vision_analysis.json

After applying location-based correction, write the final results to output/vision_analysis.json:

{
  "location": "Gaoligong Mountains, Yunnan Province, China",
  "model_used": "qwen-vl-plus",
  "frame_density": "medium",
  "correction_applied": true,
  "videos": [
    {
      "filename": "RCNX0001.AVI",
      "has_human": false,
      "has_wildlife": true,
      "species_detected": ["wild boar"],
      "individual_count": {"wild boar": 2},
      "confidence": "high",
      "notes": "Night footage; adult with young, entering the frame from the left"
    }
  ]
}

Field reference:

Field	Description
-------	-------------
`has_human`	`True` / `False` / `疑似` (suspected)
`has_wildlife`	`True` / `False` / `疑似` (suspected; added in v2 for cases that are hard to identify)
`species_detected`	List of species names or `[]`
`individual_count`	Dict: `{species: count}` or total int
`confidence`	`high` / `medium` / `low`（`low` for suspected/unclear cases）
`notes`	Free text: behavior, weather, lighting, API raw response, correction notes
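A record can be checked against the field reference before it is written. The validator below is a sketch based only on the table above; `validate_record` is a hypothetical helper, and treating `notes` as optional is an assumption.

```python
SUSPECTED = "疑似"  # literal used for "suspected" in the field reference
CONFIDENCE_LEVELS = {"high", "medium", "low"}


def validate_record(record: dict) -> list:
    """Return a list of problems found in one per-video record."""
    problems = []
    for field in ("has_human", "has_wildlife"):
        value = record.get(field)
        if not (isinstance(value, bool) or value == SUSPECTED):
            problems.append(f"{field} must be True, False or {SUSPECTED}")
    species = record.get("species_detected")
    if not (isinstance(species, list)
            and all(isinstance(s, str) for s in species)):
        problems.append("species_detected must be a list of names")
    count = record.get("individual_count")
    if isinstance(count, dict):
        if not all(isinstance(v, int) and v >= 0 for v in count.values()):
            problems.append("individual_count values must be non-negative ints")
    elif isinstance(count, bool) or not isinstance(count, int):
        problems.append("individual_count must be a dict or an int")
    if record.get("confidence") not in CONFIDENCE_LEVELS:
        problems.append("confidence must be high, medium or low")
    if not isinstance(record.get("notes", ""), str):
        problems.append("notes must be text")
    return problems
```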

Writing command (for Agent or script):

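import json
from pathlib import Path

out = Path("output")
# Phase 0 answers, each stored as a single line of text.
location = (out / "location.txt").read_text(encoding="utf-8").strip()
density = (out / "frame_density.txt").read_text(encoding="utf-8").strip()
model_name = "qwen-vl-plus"  # or "kimi-vision", depending on the Step 0b choice
corrected_results = []  # fill with the per-video dicts produced by Step 3b

vision_analysis = {
    "location": location,
    "model_used": model_name,
    "frame_density": density,
    "correction_applied": True,
    "videos": corrected_results,
}

# ensure_ascii=False keeps non-ASCII text (species names, notes) readable.
with open(out / "vision_analysis.json", "w", encoding="utf-8") as f:
    json.dump(vision_analysis, f, ensure_ascii=False, indent=2)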

Phase 4: Export to Excel (Auto)

python scripts/export_excel.py

Reads inventory.json + vision_analysis.json, merges data, writes structured Excel:

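Column	Meaning	Source
--------	---------	--------
序号	Row number	auto
原始文件名	Original filename	inventory
拍摄时间	Capture time	inventory (parsed)
视频时长(秒)	Video duration (seconds)	inventory
是否有人类	Human present	vision_analysis
是否有野生动物	Wildlife present	vision_analysis
识别物种	Species identified	vision_analysis (comma-separated)
个体数量	Individual count	vision_analysis
置信度	Confidence	vision_analysis
备注	Notes	vision_analysis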

Output: output/wildlife_report.xlsx

Color coding:

🟢 Light green: wildlife present
🟠 Light orange: human present
🔴 Light red: row with a read error
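The merge and color selection can be sketched without any spreadsheet library. Everything here is an assumption for illustration: the English keys, the hex color values, the precedence (error, then human, then wildlife), and the choice to leave suspected rows uncolored are not taken from export_excel.py.

```python
FILL_ERROR = "FFC7CE"     # light red
FILL_HUMAN = "FFE5CC"     # light orange
FILL_WILDLIFE = "C6EFCE"  # light green


def merge_rows(inventory: list, analysis: dict) -> list:
    """Join inventory entries with vision results by filename."""
    by_name = {video["filename"]: video for video in analysis.get("videos", [])}
    rows = []
    for index, item in enumerate(inventory, start=1):
        row = {"index": index, **item}
        vision = by_name.get(item["filename"])
        if vision is None:
            row["error"] = "no vision result"
        else:
            row.update(vision)
            row["species_detected"] = ", ".join(vision.get("species_detected", []))
        rows.append(row)
    return rows


def row_fill(row: dict):
    """Pick a fill color for a merged row, or None for no highlight."""
    if "error" in row:
        return FILL_ERROR
    if row.get("has_human") is True:
        return FILL_HUMAN
    if row.get("has_wildlife") is True:
        return FILL_WILDLIFE
    return None
```

With openpyxl, the returned hex string could be applied to a row's cells through `PatternFill(start_color=..., end_color=..., fill_type="solid")`.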

Batch Processing Tips

For >50 videos: split into folders of 20–30 videos to manage frame review load
Frame extraction can run overnight; vision review can resume per-folder
Use --fps 1 if you want 1 frame per second (modify extract_frames.py CONFIG)
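Splitting a large set into similarly sized batches can be done as sketched below. `split_batches` is a hypothetical helper; capping batches at 30 follows the first tip above, and balancing the batch sizes is a design choice of this sketch rather than something the scripts are known to do.

```python
import math


def split_batches(videos: list, max_size: int = 30) -> list:
    """Split videos into the fewest batches of at most max_size, evenly sized."""
    if max_size < 1:
        raise ValueError("max_size must be positive")
    if not videos:
        return []
    batch_count = math.ceil(len(videos) / max_size)
    base, extra = divmod(len(videos), batch_count)
    batches, start = [], 0
    for i in range(batch_count):
        size = base + (1 if i < extra else 0)  # spread the remainder
        batches.append(videos[start:start + size])
        start += size
    return batches
```

Balancing avoids a small leftover batch: 61 videos become batches of 21, 20, and 20 rather than 30, 30, and 1.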

Known Limitations

Audio is ignored — no acoustic species identification
API dependency — Qwen-VL API requires valid key and network; fallback to agent manual review when unavailable
Night/IR footage — low-light frames may still reduce accuracy; infrared-trained models TBD for v3
Small animals — v2 improved with 800px frames and three-zone sampling, but very distant rodents/birds may still be missed
Cross-model validation — v2 uses single API model; v3 planned: dual-model consensus (Kimi + Qwen-VL) with disagreement flagging

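Version History

3 versions in total

v2.2.1 (current): Updated security settings. 2026-05-20 17:38. Security status: safe
v2.2.0: Initial release. 2026-05-20 17:07. Security status: safe
v1.0.0: Initial release. 2026-05-20 15:46. Security status: safe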

Security Scan

Tencent Cloud Security (Keen): safe, no risk
Tencent Cloud Security (Sanbu): suspicious
