This skill batch-processes images to detect faces and cover them with masks. Detection uses OpenCV's DNN SSD deep learning face detector (ResNet-10 backbone), and the skill adds:
- Multi-scale tiled detection for small/distant faces
- Post-processing filters that remove false positives
- Automatic front/side face classification to select the appropriate mask template
- An enhanced detection mode (--enhanced) for crowded classroom/group photos with many small/distant faces
- Adaptive mask positioning for upward-angle selfie/wide-angle photos
- Manual face coordinate injection (--manual-faces) for missed detections
- Model instance reuse for efficient batch processing
Note: The DNN model weights file (res10_300x300_ssd_iter_140000.caffemodel, ~10MB) is not bundled in the skill package. It is automatically downloaded from OpenCV's official GitHub repository on first use. Subsequent runs reuse the downloaded models with no network required.
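Use this skill when users need to:
- Process crowded scenes with 20+ people in enhanced detection mode (--enhanced), where standard mode misses small/distant faces
- Mask faces that automatic detection missed by supplying their coordinates manually (--manual-faces)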
| Feature | v1.0 (Haar Cascade) | v2.0 (DNN SSD) | v4.0 Enhanced Mode | v4.1 Enhanced + Skin | v4.3 Adaptive |
|---------|---------------------|-----------------|---------------------|----------------------|---------------|
| Detection accuracy | ~85% for frontal faces | ~95%+ for frontal/slight angle | ~95%+ with higher recall | ~95%+ with higher precision | ~95%+ with adaptive positioning |
| Small face detection | Poor | Good (2-4 tile levels) | Excellent (2-8 tile levels + region enhancement) | Excellent | Excellent |
| False positive rate | High | Very low (intelligent filtering) | Very low (multi-stage filtering + overlap cleanup) | Lowest (+ skin color verification) | Lowest |
| Side face support | Very limited | Better coverage | Better coverage | Better coverage | Better coverage |
| Crowded scenes (20+ people) | Poor | Good | Excellent (60-100% more faces detected) | Excellent with fewer false positives | Excellent |
| Selfie/wide-angle | Poor alignment | Moderate | Moderate | Moderate | Adaptive positioning |
| Manual face override | No | No | No | No | Yes (--manual-faces) |
| Batch dual-template | No | No | No | No | Yes |
| Speed | Fast | Moderate | Slower but much more thorough | Similar to v4.0 | Similar to v4.1 |
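The Haar Cascade classifier relies on simple rectangular features; as the comparison table shows, it handles small faces, side faces, and crowded scenes poorly and produces many false positives.
The DNN SSD detector (ResNet-10 backbone) uses deep learning for much more robust face detection, and this skill combines it with multi-scale tiled detection.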
```bash
pip install opencv-python pillow numpy
```
Model files (stored in the models/ directory):
models/deploy.prototxt & models/res10_300x300_ssd_iter_140000.caffemodel
OpenCV DNN face detection model files (ResNet-10 SSD architecture). Pre-trained on a face detection dataset and significantly more accurate than the Haar Cascade.
scripts/add_mask_with_image.py
Core script for adding masks to images using DNN detection.
Usage:
```bash
# Single mask mode (all faces use same mask)
python scripts/add_mask_with_image.py <input_image> [mask_image] [output_path]

# Dual mask mode (auto front/side face recognition)
python scripts/add_mask_with_image.py --front <front_mask> --side <side_mask> <input_image> [output_path]

# Enhanced detection mode (for crowded classroom/group photos with many small faces)
python scripts/add_mask_with_image.py --enhanced <input_image> [mask_image] [output_path]
python scripts/add_mask_with_image.py --enhanced --front <front_mask> --side <side_mask> <input_image> [output_path]

# Manual face coordinates (for faces missed by auto-detection)
python scripts/add_mask_with_image.py --manual-faces "x,y,w,h;x,y,w,h" <input_image> [mask_image] [output_path]
```
Features:
- Enhanced detection mode (--enhanced) for crowded scenes
- Manual face coordinates (--manual-faces "x,y,w,h;x,y,w,h") for faces the detector misses
scripts/batch_process.py
Batch processing script with shared model instance for performance.
Usage:
```bash
# Single mask batch processing
python scripts/batch_process.py <input_dir> [mask_image] [output_dir]

# Dual mask batch processing
python scripts/batch_process.py --front <front_mask> --side <side_mask> <input_dir> [output_dir]

# Enhanced detection batch processing
python scripts/batch_process.py --enhanced --front <front_mask> --side <side_mask> <input_dir> [output_dir]
```
Features:
- Dual-mask mode (--front / --side) for batch processing
- Enhanced detection mode (--enhanced) for batch processing
assets/mask.png
Default mask template (currently a little-red-flower mask design). Used in single-mask mode.
assets/mask_front.png
Front-face mask template. Used in dual-mask mode for faces detected as frontal.
assets/mask_side.png
Side-face mask template. Used in dual-mask mode for faces detected as side-facing. Auto-flipped horizontally when the face is facing right (template default is left-facing).
references/usage_guide.md
Complete user guide with troubleshooting and examples.
The following parameters can be adjusted in scripts/add_mask_with_image.py:
```python
# Detection (Standard mode)
CONFIDENCE_THRESHOLD = 0.25  # Balanced threshold for recall vs false positive control
MIN_FACE_SIZE = 12  # Minimum face size in pixels
ASPECT_RATIO_RANGE = (0.4, 2.5)  # Face width/height ratio filter

# Enhanced mode detection (--enhanced flag)
ENHANCED_CONFIDENCE_THRESHOLD = 0.20  # Lower threshold for higher recall (v4.1: tuned from 0.15)
ENHANCED_MIN_FACE_SIZE = 12  # Minimum face size in enhanced mode
ENHANCED_TILE_LEVELS = [2, 3, 4, 5, 6]  # Extended tiling levels (v4.1: tuned from 2-8 to 2-6)
ENHANCED_OVERLAP_THRESHOLD = 0.30  # Overlap cleanup threshold (v4.1: tuned from 0.35)

# Skin color verification (v4.1)
SKIN_COLOR_MIN_RATIO = 0.15  # Min skin pixel ratio in upper 40% of face box
SKIN_COLOR_CONF_THRESHOLD = 0.45  # Detections below this conf require skin check
SKIN_HSV_LOWER1 = (0, 30, 60)  # Skin HSV range 1 lower bound
SKIN_HSV_UPPER1 = (20, 180, 255)  # Skin HSV range 1 upper bound
SKIN_HSV_LOWER2 = (160, 30, 60)  # Skin HSV range 2 lower bound (red wrap-around)
SKIN_HSV_UPPER2 = (180, 180, 255)  # Skin HSV range 2 upper bound

# Upward-angle adaptive positioning (v4.3)
UPWARD_ANGLE_ASPECT_THRESHOLD = 1.35  # fh/fw > 1.35 → upward angle detected
UPWARD_ANGLE_Y_START_RATIO = 0.15  # Mask starts at 15% (vs normal 30%) for selfies
UPWARD_ANGLE_Y_END_RATIO = 1.10  # Slightly shorter mask for selfies

# Mask positioning (ratios relative to face bounding box)
MASK_Y_START_RATIO = 0.30  # Start at 30% = below eyes (covers nose, mouth, chin)
MASK_Y_END_RATIO = 1.25  # End below chin for full coverage
MASK_X_START_RATIO = -0.10  # Extend 10% beyond left edge (cover cheeks)
MASK_X_END_RATIO = 1.10  # Extend 10% beyond right edge (cover cheeks)

# Front/side face classification (v3.1 multi-feature)
# Composite scoring: ar_deviation_score * 0.4 + edge_asymmetry_score * 0.3 + gradient_bias_score * 0.3
# total_score > 0.40 → side face
```
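The detector resizes its input to 300×300, so in a high-resolution photo a face smaller than about 50px in the original image shrinks to only a few pixels and is missed. Multi-scale tiled detection solves this by also running the detector on sub-image tiles at several grid levels (2-4 levels in standard mode), so each face fills a larger part of the 300×300 input.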
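A minimal sketch of the tiling bookkeeping: split the image into a level × level grid, run the detector on each tile, and shift tile-local boxes back into full-image coordinates. It assumes non-overlapping tiles with the last row/column absorbing any remainder; the script may use overlapping tiles instead. make_tiles, to_global, and tiles_for_levels are hypothetical names.

```python
from typing import Iterator, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h) in pixels


def make_tiles(width: int, height: int, level: int) -> Iterator[Box]:
    """Yield the tiles of a level x level grid that covers the whole image.

    Assumption: tiles do not overlap; the last row/column absorbs the
    remainder when the image size is not divisible by `level`.
    """
    tile_w, tile_h = width // level, height // level
    for row in range(level):
        for col in range(level):
            x, y = col * tile_w, row * tile_h
            w = width - x if col == level - 1 else tile_w
            h = height - y if row == level - 1 else tile_h
            yield (x, y, w, h)


def to_global(tile: Box, local_box: Box) -> Box:
    """Translate a detection found inside `tile` into full-image coordinates."""
    tx, ty, _, _ = tile
    x, y, w, h = local_box
    return (tx + x, ty + y, w, h)


def tiles_for_levels(width: int, height: int, levels: List[int]) -> List[Box]:
    """Collect the tiles for every grid level, e.g. [2, 3, 4] in standard mode."""
    return [tile for level in levels for tile in make_tiles(width, height, level)]
```

As a rough illustration: in a 1200px-wide photo the full-image pass shrinks a 40px face to about 10px of the 300px input, while a level-3 tile (400px wide) keeps it near 30px.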
Using a high threshold (e.g., 0.45) misses many real faces. A very low threshold (e.g., 0.18) catches nearly all faces but introduces too many false positives. The chosen approach combines a moderate threshold (0.25) with two-stage post-processing.
In standing group photos, the DNN detector often picks up clothing patterns, buttons, belt buckles, and knee textures as faces. These false positives share a key spatial property: they are directly below a real face (same x-range). The body-part filter checks if a low-confidence detection falls within the vertical column under any high-confidence face anchor, effectively eliminating chest/waist/knee misdetections.
Many mask template images have black or white backgrounds. Simply overlaying them creates ugly rectangles. Solution:
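One common way to avoid the rectangle artifact is to treat pixels close to pure black or pure white as transparent and alpha-blend the rest onto the face. This is a sketch of that general technique, not necessarily the script's exact method; the tol cutoff and the helper names background_alpha and blend are assumptions.

```python
import numpy as np


def background_alpha(mask_rgb: np.ndarray, tol: int = 30) -> np.ndarray:
    """Return a float alpha map in [0, 1]: 0 for near-black/near-white pixels, 1 elsewhere.

    Assumption: the template background is uniformly black or white, and a pixel
    counts as background when every channel is within `tol` of 0 or of 255.
    """
    rgb = mask_rgb.astype(np.int16)
    near_black = np.all(rgb <= tol, axis=-1)
    near_white = np.all(rgb >= 255 - tol, axis=-1)
    return np.where(near_black | near_white, 0.0, 1.0)


def blend(face_region: np.ndarray, mask_rgb: np.ndarray, tol: int = 30) -> np.ndarray:
    """Alpha-blend a same-sized mask template (uint8 HxWx3) onto a face region."""
    alpha = background_alpha(mask_rgb, tol)[..., None]  # HxWx1 so it broadcasts over channels
    out = alpha * mask_rgb.astype(np.float32) + (1.0 - alpha) * face_region.astype(np.float32)
    return out.round().astype(np.uint8)
```

A template that is itself mostly white would lose its body under this rule; if a template already carries an alpha channel, that channel can be used directly instead.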
DNN SSD detection boxes carry little orientation information (they are often near-square for both frontal and side faces). The v3.1 strategy therefore uses multi-feature composite scoring for significantly improved accuracy.
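The composite rule from the configuration comments (ar_deviation_score * 0.4 + edge_asymmetry_score * 0.3 + gradient_bias_score * 0.3, side face when the total exceeds 0.40) can be expressed as below. How each feature score is computed is not described here, so this sketch assumes each one arrives already normalized to [0, 1]; composite_score and classify_orientation are hypothetical names.

```python
from typing import Literal

Orientation = Literal["front", "side"]

# Weights and cutoff taken from the v3.1 configuration comments.
WEIGHTS = (0.4, 0.3, 0.3)
SIDE_FACE_THRESHOLD = 0.40


def composite_score(ar_deviation: float, edge_asymmetry: float, gradient_bias: float) -> float:
    """Weighted sum of the three per-face feature scores (each assumed to lie in [0, 1])."""
    w_ar, w_edge, w_grad = WEIGHTS
    return ar_deviation * w_ar + edge_asymmetry * w_edge + gradient_bias * w_grad


def classify_orientation(ar_deviation: float, edge_asymmetry: float, gradient_bias: float) -> Orientation:
    """Return 'side' when the composite score exceeds the threshold, else 'front'."""
    score = composite_score(ar_deviation, edge_asymmetry, gradient_bias)
    return "side" if score > SIDE_FACE_THRESHOLD else "front"
```

A weighted sum lets no single noisy feature decide on its own: a face needs moderate evidence from several features, or strong evidence from the aspect-ratio feature, to be classified as side-facing.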
The original v1.0 script only processed the single largest face. For group photos (classrooms, teams), ALL faces need masks.
Standard detection (2-4 tile levels, conf=0.25) works well for most scenarios but misses many small/distant faces in crowded classroom photos (20+ people). The enhanced mode (--enhanced) addresses this with:
Even with spatial filtering, some low-confidence detections on blackboards, desks, backpacks, or wall decorations slip through. The v4.1 skin color verification adds an independent signal based on color rather than position or shape.
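The skin check (upper 40% of the box, two OpenCV-style HSV bands to cover the red hue wrap-around, minimum skin ratio 0.15, applied only below conf 0.45) can be sketched with NumPy alone. The sketch assumes the crop has already been converted with cv2.cvtColor(crop, cv2.COLOR_BGR2HSV), so hue runs 0-180; skin_ratio and passes_skin_check are hypothetical names.

```python
import numpy as np

# Values mirror SKIN_HSV_* and SKIN_COLOR_* in the configuration.
SKIN_RANGES = [((0, 30, 60), (20, 180, 255)), ((160, 30, 60), (180, 180, 255))]
SKIN_COLOR_MIN_RATIO = 0.15
SKIN_COLOR_CONF_THRESHOLD = 0.45


def skin_ratio(face_hsv: np.ndarray) -> float:
    """Fraction of skin-toned pixels in the upper 40% (forehead/eyes) of an HSV face crop."""
    top = face_hsv[: max(1, int(face_hsv.shape[0] * 0.4))]
    is_skin = np.zeros(top.shape[:2], dtype=bool)
    for lower, upper in SKIN_RANGES:
        # Inclusive bounds on every channel, like cv2.inRange.
        is_skin |= np.all((top >= lower) & (top <= upper), axis=-1)
    return float(is_skin.mean())


def passes_skin_check(face_hsv: np.ndarray, conf: float) -> bool:
    """Confident detections skip the check; the rest need enough skin-toned pixels."""
    if conf >= SKIN_COLOR_CONF_THRESHOLD:
        return True
    return skin_ratio(face_hsv) >= SKIN_COLOR_MIN_RATIO
```

Sampling only the upper part of the box avoids counting lips, beards, or an already-masked lower face, and keeps the check focused on the forehead/eye area.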
In addition to the body-below filter (v4.0), v4.1 adds lateral spatial filtering to catch hand/arm misdetections:
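A sketch of the hand/arm rule (conf below 0.45, lateral to an anchor face by 0.8-3.0 face widths, below eye level, area under 50% of the median). Several details are assumptions: anchors are detections with conf of at least 0.50 (the cutoff the standard body-part filter uses), lateral distance is measured between box centers in units of the anchor's width, "eye level" is taken as 40% down the anchor box, and the median is over anchor areas. Det, is_hand_misdetection, and filter_hands are hypothetical names.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class Det:
    x: float
    y: float
    w: float
    h: float
    conf: float

    @property
    def cx(self) -> float:
        return self.x + self.w / 2

    @property
    def cy(self) -> float:
        return self.y + self.h / 2

    @property
    def area(self) -> float:
        return self.w * self.h


ANCHOR_CONF = 0.50      # assumption: same anchor cutoff as the body-part filter
HAND_MAX_CONF = 0.45
EYE_LEVEL_RATIO = 0.40  # assumption: eyes sit about 40% down the anchor box


def is_hand_misdetection(det: Det, anchors: List[Det], median_area: float) -> bool:
    """True if `det` is small, low-confidence, beside an anchor face, and below its eyes."""
    if det.conf >= HAND_MAX_CONF or det.area >= 0.5 * median_area:
        return False
    for a in anchors:
        lateral = abs(det.cx - a.cx) / a.w
        below_eyes = det.cy > a.y + EYE_LEVEL_RATIO * a.h
        if 0.8 <= lateral <= 3.0 and below_eyes:
            return True
    return False


def filter_hands(dets: List[Det]) -> List[Det]:
    """Drop hand/arm misdetections; without anchors there is nothing to compare against."""
    anchors = [d for d in dets if d.conf >= ANCHOR_CONF]
    if not anchors:
        return dets
    med = median(a.area for a in anchors)
    return [d for d in dets if not is_hand_misdetection(d, anchors, med)]
```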
When processing screenshots or images containing both real people and cartoon/anime/illustration characters (e.g., PPT slides with avatar illustrations), the DNN detector picks up cartoon faces too. The v4.2 cartoon filter distinguishes them using three texture-based features:
Key design decisions:
Real-world results on classroom photos:
Trade-off: Enhanced mode is 3-5x slower due to more detection passes. Use standard mode for simple photos, enhanced mode for crowded scenes.
Selfies and wide-angle group photos taken from a low camera angle produce elongated face bounding boxes (height/width ratio > 1.35). With the standard mask positioning (y_start=30%), the mask sits too low — leaving the nose exposed or barely covered. The v4.3 adaptive positioning detects this condition and shifts the mask upward:
face_height / face_width > 1.35 AND orientation is 'front'
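A minimal sketch of how the positioning ratios combine into a mask rectangle. It assumes each ratio is applied as an offset from the face box (x = fx + ratio × fw, and likewise for y), which matches the configuration comments but is not the confirmed code; clamping to image bounds is omitted, and mask_box is a hypothetical name.

```python
from typing import Tuple

# Ratios copied from the configuration.
MASK_Y_START_RATIO = 0.30
MASK_Y_END_RATIO = 1.25
MASK_X_START_RATIO = -0.10
MASK_X_END_RATIO = 1.10
UPWARD_ANGLE_ASPECT_THRESHOLD = 1.35
UPWARD_ANGLE_Y_START_RATIO = 0.15
UPWARD_ANGLE_Y_END_RATIO = 1.10


def mask_box(fx: int, fy: int, fw: int, fh: int, orientation: str) -> Tuple[int, int, int, int]:
    """Return (x1, y1, x2, y2) of the mask for a face box (fx, fy, fw, fh)."""
    # Upward-angle adjustment only applies to elongated frontal faces.
    upward = orientation == "front" and fh / fw > UPWARD_ANGLE_ASPECT_THRESHOLD
    y_start = UPWARD_ANGLE_Y_START_RATIO if upward else MASK_Y_START_RATIO
    y_end = UPWARD_ANGLE_Y_END_RATIO if upward else MASK_Y_END_RATIO
    x1 = fx + round(fw * MASK_X_START_RATIO)
    x2 = fx + round(fw * MASK_X_END_RATIO)
    y1 = fy + round(fh * y_start)
    y2 = fy + round(fh * y_end)
    return x1, y1, x2, y2
```

For a 100×200 frontal face at (100, 100), the aspect ratio is 2.0, so the mask top moves up from y = 160 to y = 130 and the bottom from y = 350 to y = 320.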
No face detector achieves 100% recall. In production, some faces are consistently missed — especially small/partially-occluded faces at frame edges or unusual angles. Rather than endlessly tuning detector parameters (which risks introducing new false positives), v4.3 adds a practical escape hatch:
Manual face coordinates (--manual-faces "x,y,w,h;x,y,w,h"): Specifies exact bounding boxes for missed faces. These are merged with the auto-detected results before masks are applied, so they go through the same mask processing (front/side classification, template selection, positioning).
Model instance reuse: add_mask_with_image() now accepts pre-loaded net, mask_front, mask_side, and mask_single objects. This avoids redundant model loading when processing multiple images (e.g., in batch scripts or fix_masks.py workflows), reducing per-image overhead from ~2s to <0.1s.
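The --manual-faces value uses the format "x,y,w,h;x,y,w,h". A minimal parser for that format is sketched below; parse_manual_faces is a hypothetical name, and the validation (exactly four integers, positive width and height, a tolerated trailing ";") is an assumption. The resulting boxes would then be appended to the auto-detected list before masks are applied.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h) in pixels


def parse_manual_faces(spec: str) -> List[Box]:
    """Parse 'x,y,w,h;x,y,w,h' into a list of integer (x, y, w, h) boxes."""
    boxes: List[Box] = []
    for chunk in spec.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate a trailing ';' or empty segments
        parts = [p.strip() for p in chunk.split(",")]
        if len(parts) != 4:
            raise ValueError(f"expected x,y,w,h but got {chunk!r}")
        x, y, w, h = (int(p) for p in parts)
        if w <= 0 or h <= 0:
            raise ValueError(f"width and height must be positive: {chunk!r}")
        boxes.append((x, y, w, h))
    return boxes
```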
```text
Rule 1: conf < 0.40 AND area > 3× avg_high_conf_area → REJECT (large object)
Rule 2: conf < 0.45 AND face_center_y > 70% image_height → REJECT (furniture/floor)
Rule 3: conf < 0.35 AND face_y < 10% image_height → REJECT (ceiling/lights)
Rule 4: conf < 0.35 AND (face_x < 3% OR face_right > 97%) → REJECT (edge noise)
Rule 5: conf < 0.45 AND area < 25% median_high_conf_area → REJECT (tiny noise)
```
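A sketch of the five standard-mode rules applied in order. The rules reference avg_high_conf_area and median_high_conf_area without defining "high confidence"; this sketch assumes the same 0.50 cutoff the body-part filter uses for anchors, and reads face_y / face_x as the top and left edges of the box. When no high-confidence face exists, the size-based rules (1 and 5) are skipped. Det, reject_reason, and stage1_filter are hypothetical names.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import List


@dataclass
class Det:
    x: float
    y: float
    w: float
    h: float
    conf: float


HIGH_CONF = 0.50  # assumption: reference faces use the anchor cutoff


def reject_reason(d: Det, img_w: int, img_h: int, avg_area: float, med_area: float) -> str:
    """Return the first standard-mode rule that rejects `d`, or '' if it survives."""
    area = d.w * d.h
    if d.conf < 0.40 and area > 3 * avg_area:
        return "large object"
    if d.conf < 0.45 and d.y + d.h / 2 > 0.70 * img_h:
        return "furniture/floor"
    if d.conf < 0.35 and d.y < 0.10 * img_h:
        return "ceiling/lights"
    if d.conf < 0.35 and (d.x < 0.03 * img_w or d.x + d.w > 0.97 * img_w):
        return "edge noise"
    if d.conf < 0.45 and area < 0.25 * med_area:
        return "tiny noise"
    return ""


def stage1_filter(dets: List[Det], img_w: int, img_h: int) -> List[Det]:
    """Keep detections that no rule rejects."""
    ref = [d.w * d.h for d in dets if d.conf >= HIGH_CONF]
    # inf / 0 make the size rules unsatisfiable when there is no reference face.
    avg_area = mean(ref) if ref else float("inf")
    med_area = median(ref) if ref else 0.0
    return [d for d in dets if not reject_reason(d, img_w, img_h, avg_area, med_area)]
```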
```text
For each low-confidence detection (conf < 0.50):
  For each high-confidence anchor face (conf >= 0.50):
    IF x-aligned (within 1.5× anchor width)
    AND below anchor face bottom
    AND within 4× anchor face height distance
    → REJECT (body part: chest/waist/knee/shoe)
```
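The body-part filter translates fairly directly into Python. Two readings are assumptions: "x-aligned within 1.5× anchor width" is taken as the horizontal distance between box centers, and "within 4× height distance" as the gap between the anchor's bottom edge and the candidate's top edge. Det, is_body_part, and body_part_filter are hypothetical names; enhanced mode would swap in conf < 0.45 and 3× height.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Det:
    x: float
    y: float
    w: float
    h: float
    conf: float


ANCHOR_CONF = 0.50
X_ALIGN_FACTOR = 1.5   # max center-to-center horizontal offset, in anchor widths
MAX_DROP_FACTOR = 4.0  # max gap below the anchor's bottom edge, in anchor heights


def is_body_part(d: Det, anchor: Det) -> bool:
    """True if `d` sits in the column directly below `anchor`, within the drop limit."""
    dx = abs((d.x + d.w / 2) - (anchor.x + anchor.w / 2))
    gap = d.y - (anchor.y + anchor.h)
    return dx <= X_ALIGN_FACTOR * anchor.w and 0 <= gap <= MAX_DROP_FACTOR * anchor.h


def body_part_filter(dets: List[Det]) -> List[Det]:
    """Reject low-confidence detections that fall under any high-confidence face."""
    anchors = [d for d in dets if d.conf >= ANCHOR_CONF]
    return [
        d for d in dets
        if d.conf >= ANCHOR_CONF or not any(is_body_part(d, a) for a in anchors)
    ]
```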
```text
Stage 1: Tighter filtering rules
  Rule 0: conf < 0.50 AND (aspect < 0.65 OR aspect > 1.50) → REJECT (abnormal shape)
  Rule 1: conf < 0.55 AND area > 2× avg_high_conf_area → REJECT (oversized object)
  Rule 2a: face_center_y > 68% image_height → REJECT unconditionally (floor/furniture)
  Rule 2b: conf < 0.50 AND face_center_y > 55% image_height → REJECT (bottom zone)
  Rule 3: conf < 0.40 AND face_y < 25% image_height → REJECT (ceiling/signs)
  Rule 4: conf < 0.25 AND (face_x < 2% OR face_right > 98%) → REJECT (edge noise)
  Rule 5a: conf < 0.40 AND area < 20% median_high_conf_area → REJECT (too small)
  Rule 5b: conf < 0.55 AND area < 10% median_high_conf_area → REJECT (extremely small)
Stage 2: Body part + hand spatial filtering (v4.1 improved)
  - Body part: conf < 0.45, x-aligned with anchor face, below anchor, within 3× height → REJECT
  - Hand region (NEW): conf < 0.45, lateral to anchor (0.8-3.0× face width), below eye level,
    area < 50% median → REJECT (hand/arm misdetection)
Stage 3: Skin color verification (v4.1 NEW)
  - For detections with conf < 0.45:
    - Extract upper 40% of detection box (forehead/eye region)
    - Convert to HSV and check skin-tone pixel ratio
    - HSV ranges: H=0-20 or 160-180, S=30-180, V=60-255
    - skin_ratio >= 0.15 → PASS (real face)
    - skin_ratio < 0.15 → REJECT (blackboard, desk, backpack, etc.)
Stage 4: Overlap cleanup (>30% overlap with a higher-conf detection → REJECT;
  ENHANCED_OVERLAP_THRESHOLD, tuned from 35% in v4.1)
```
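Overlap cleanup can be sketched as a greedy pass in descending confidence order, using the configured ENHANCED_OVERLAP_THRESHOLD (0.30). The document does not say how "overlap" is measured; this sketch assumes intersection area divided by the smaller box's area (IoU would be another reasonable choice). overlap_ratio and overlap_cleanup are hypothetical names.

```python
from typing import List, Tuple

Det = Tuple[float, float, float, float, float]  # (x, y, w, h, conf)

ENHANCED_OVERLAP_THRESHOLD = 0.30


def overlap_ratio(a: Det, b: Det) -> float:
    """Intersection area over the smaller box's area (assumed overlap metric)."""
    ax, ay, aw, ah, _ = a
    bx, by, bw, bh, _ = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    smaller = min(aw * ah, bw * bh)
    return (iw * ih) / smaller if smaller > 0 else 0.0


def overlap_cleanup(dets: List[Det], threshold: float = ENHANCED_OVERLAP_THRESHOLD) -> List[Det]:
    """Visit detections by descending confidence; drop any that overlap a kept one too much."""
    kept: List[Det] = []
    for d in sorted(dets, key=lambda t: t[4], reverse=True):
        if all(overlap_ratio(d, k) <= threshold for k in kept):
            kept.append(d)
    return kept
```

Normalizing by the smaller box means a small duplicate box nested inside a larger face box counts as fully overlapping, which IoU would underestimate.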
Verify that models/deploy.prototxt and models/res10_300x300_ssd_iter_140000.caffemodel exist
Raise CONFIDENCE_THRESHOLD slightly (e.g., to 0.30) if false positives persist
Tune the filtering rules in post_filter_faces() for specific scenarios
Use the --enhanced flag for enhanced detection mode (recommended for 20+ people)
Lower CONFIDENCE_THRESHOLD (e.g., from 0.25 to 0.18)
Increase MASK_Y_START_RATIO (e.g., from 0.30 to 0.40 or 0.50)
pip install opencv-python pillow numpy
opencv-python package does)