This skill transforms raw Apple Health export data into a multi-report system of fully Chinese-localized, interactive health dashboards with cross-correlation analysis, personal dynamic baselines, and personalized recommendations. It handles the full pipeline: XML parsing (with token-efficient streaming), data cleaning, statistical analysis, and interactive Plotly visualization — all while adapting to each user's unique data profile, devices, and health goals.
When this skill is activated, follow this decision tree:
User has Apple Health data?
├── YES: XML file found in workspace
│ ├── Step 1: DATA PROFILING (lightweight scan — never load full XML into context)
│ ├── Step 2: USER INTERVIEW (goals, life stages, preferences)
│ ├── Step 3: ADAPTIVE ANALYSIS PLAN (based on available data + goals)
│ ├── Step 4: PARSE & EXTRACT (streaming XML → aggregated CSV)
│ ├── Step 5: ANALYZE & VISUALIZE (generate dashboard)
│ └── Step 6: INSIGHTS & RECOMMENDATIONS (personalized advice)
│
├── YES: Pre-parsed CSV/JSON files exist
│ ├── Skip to Step 2 (interview)
│ └── Continue from Step 3
│
└── NO: No health data found
└── Guide user through Apple Health export process
CRITICAL: Token Conservation Strategy
Apple Health XML files are typically 100MB–2GB+. NEVER read the raw XML into the conversation context. Instead:
- Run the profiling script (scripts/parse_health_xml.py --profile-only) to generate a compact JSON summary:
python3 {SKILL_DIR}/scripts/parse_health_xml.py --profile-only --input "<path_to_export.xml>"
This produces a health_profile.json containing:
After profiling, read ONLY the JSON summary:
read_file("<workspace>/health_data/health_profile.json")
The profiler automatically classifies the user's setup into one of three tiers:
| Tier | Devices | Available Data | Analysis Scope |
|---|---|---|---|
| Tier 1: iPhone Only | iPhone (no wearable) | Steps, distance, flights climbed, walking metrics, headphone audio, sleep (if using phone-based tracking app) | Activity trends, mobility analysis, audio exposure |
| Tier 2: iPhone + Watch (basic) | iPhone + Apple Watch (older/SE) | Tier 1 + heart rate, active energy, exercise time, basic sleep stages | + Heart rate analysis, energy expenditure, workout tracking |
| Tier 3: iPhone + Watch (advanced) | iPhone + Apple Watch Series 7+ / Ultra | Tier 2 + HRV, blood oxygen, respiratory rate, wrist temperature, sleep breathing disturbances, ECG | + Full cardiovascular analysis, sleep quality deep-dive, cycle tracking correlation |
Fallback rule: If a metric is missing, NEVER error out. Gracefully skip that analysis module and note what additional data would unlock.
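The tier table and the fallback rule can be illustrated with a small classifier. This is a minimal sketch, not the profiler's actual logic: the function name `classify_tier` and the two marker sets are assumptions chosen to mirror the table, and the real script may use different signals (for example device model strings).

```python
# Hypothetical marker sets derived from the tier table; the real profiler may differ.
TIER3_TYPES = {"HeartRateVariabilitySDNN", "OxygenSaturation", "RespiratoryRate",
               "AppleSleepingWristTemperature"}
TIER2_TYPES = {"HeartRate", "ActiveEnergyBurned", "AppleExerciseTime"}


def classify_tier(record_types):
    """Return 1, 2, or 3 based on which record types appear in the profile."""
    present = set(record_types)
    if present & TIER3_TYPES:
        return 3
    if present & TIER2_TYPES:
        return 2
    # Nothing beyond phone-level metrics: fall back to Tier 1 instead of failing.
    return 1
```

Because the function only checks for the presence of types, a missing metric lowers the tier rather than raising an error, which matches the fallback rule.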
Before analysis, ask the user about their goals using ask_followup_question. Keep it to 2–3 focused questions based on what the data profile reveals.
Always ask about analysis goal. Select remaining questions adaptively based on available data:
Question 1 (ALWAYS ASK): Analysis Goal
What's your primary goal for this health analysis?
Options:
- General health overview / curiosity
- Fitness optimization (training, performance)
- Sleep improvement
- Weight management / body composition
- Stress & recovery monitoring
- Reproductive health tracking (cycle analysis)
- Health condition monitoring (post-illness recovery, chronic condition)
- Pre/post pregnancy health tracking
Question 2 (CONDITIONAL): Special Life Periods
Ask ONLY IF the data contains MenstrualFlow, Pregnancy, or Lactation records, OR if the user profile indicates female sex:
Were there any special health periods during the data timeframe we should account for?
Options:
- Pregnancy / postpartum
- Breastfeeding period
- Major illness or surgery recovery
- Significant lifestyle change (new job, relocation, etc.)
- Menopause transition
- None / prefer not to specify
Question 3 (CONDITIONAL): Analysis Depth
Ask ONLY IF data spans 3+ years:
What time period should we focus on?
Options:
- Full history (comprehensive longitudinal view)
- Last 12 months (recent trends)
- Year-over-year comparison
- Specific period (I'll specify dates)
Data source priority (when the profile shows multiple sourceName values): Inform the user which sources were auto-detected and which will be prioritized. Only ask for a manual override if auto-detection finds conflicting sources with similar data quality. Source identification uses pattern matching (see Data Robustness Rule 2), not exact string matching.
Based on the data profile and the user's answers, construct an analysis plan. The plan selects from these analysis modules:
| Module | Required Data | Tier | Priority |
|---|---|---|---|
| Daily Activity | StepCount, DistanceWalkingRunning, FlightsClimbed | 1+ | P0 |
| Workout Analysis | Workout records | 1+ | P0 |
| Heart Rate Overview | HeartRate (daily aggregates) | 2+ | P0 |
| Resting HR Trend | RestingHeartRate | 3 | P0 |
| HRV & Recovery | HeartRateVariabilitySDNN | 3 | P1 |
| Sleep Duration | SleepAnalysis | 1+ | P0 |
| Sleep Stages | SleepAnalysis (with stage values) | 2+ | P1 |
| Sleep Quality | SleepAnalysis + AppleSleepingWristTemperature | 3 | P2 |
| Body Composition | BodyMass, BodyFatPercentage | 1+ | P1 |
| Menstrual Cycle | MenstrualFlow | 1+ | P1 |
| Cycle-Vital Correlation | MenstrualFlow + RestingHeartRate + HRV | 3 | P2 |
| Cardio Fitness | VO2Max | 3 | P1 |
| Respiratory | RespiratoryRate, OxygenSaturation | 3 | P2 |
| Audio Exposure | HeadphoneAudioExposure, EnvironmentalAudioExposure | 1+ | P2 |
| Mobility & Gait | WalkingSpeed, WalkingStepLength, WalkingAsymmetryPercentage | 1+ | P2 |
| Swimming Analysis (v2.2.0) | Workout (Swimming) + SwimmingStrokeCount + SwimmingDistance + WaterTemperature | 2+ | P1 |
| Cross-Correlation (v2.2.0) | SleepAnalysis + RestingHeartRate + HRV | 3 | P1 |
| Personal Dynamic Baselines (v2.2.0) | Any long-term metric (30+ days) | 1+ | P1 |
Before executing, briefly tell the user which modules will run and which are skipped (with reason). Example:
> Based on your data, I'll analyze: Daily Activity (N years of step data), Workouts (N sessions), Heart Rate (from YYYY), Sleep (YYYY–present), Menstrual Cycles (N records). Skipping: Blood Oxygen (insufficient data), Respiratory Rate (limited data). Special period (if any) will be handled separately in trend analysis.
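One way to derive the "run" and "skipped (with reason)" lists is to compare each module's required types against the types found in the profile. The sketch below covers only three rows of the module table and assumes that every listed type is required; `MODULE_REQUIREMENTS` and `plan_modules` are hypothetical names, not part of the shipped scripts.

```python
# Excerpt of the module table: module name -> record types it requires.
# Assumption: a module runs only when ALL of its listed types are present.
MODULE_REQUIREMENTS = {
    "Daily Activity": {"StepCount", "DistanceWalkingRunning", "FlightsClimbed"},
    "Resting HR Trend": {"RestingHeartRate"},
    "Cycle-Vital Correlation": {"MenstrualFlow", "RestingHeartRate",
                                "HeartRateVariabilitySDNN"},
}


def plan_modules(available_types):
    """Return (run, skipped); skipped maps module name -> sorted missing types."""
    available = set(available_types)
    run, skipped = [], {}
    for name, required in MODULE_REQUIREMENTS.items():
        missing = required - available
        if missing:
            skipped[name] = sorted(missing)  # reason to report to the user
        else:
            run.append(name)
    return run, skipped
```

The `skipped` mapping doubles as the explanation shown to the user, since it names exactly which data would unlock each module.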
Run the parsing script to extract data into lightweight CSV files:
python3 {SKILL_DIR}/scripts/parse_health_xml.py \
--input "<path_to_export.xml>" \
--output-dir "<workspace>/health_data/" \
--modules "activity,workout,heartrate,sleep,menstrual,body" \
--start-date "2016-01-01"
- Use iterparse; never ET.parse() the full tree for files >50MB.
- Call elem.clear() after processing each element to release memory immediately.
- Timestamps look like 2025-03-30 08:15:23 +0800. Current limitation: the scripts truncate timezone info for simplicity, so all dates are treated as local time at the moment of recording. This works correctly for users who stay in one timezone. For users who travel across timezones, some date attributions may be slightly off. A future version will parse full timezone offsets and convert to the user's home timezone.
- Text fields can contain Unicode whitespace variants (non-breaking space \xa0, narrow no-break space \u202F, figure space \u2007, etc.) in sourceName and other text fields. Always apply unicodedata.normalize('NFKC', s) and collapse whitespace before any string matching or comparison. The normalize_str() helper in parse_health_xml.py handles this.
See references/health_data_types.md for complete field definitions of each output CSV.
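The streaming rules above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in parse_health_xml.py: the function name `count_record_types` is hypothetical, and it only counts `Record` elements by their `type` attribute.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_record_types(source):
    """Stream an export file (path or file object) and count Record elements per type."""
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Record":
            counts[elem.get("type", "unknown")] += 1
        # Clear every element once its end tag is seen so attributes and
        # children do not accumulate in memory while the file streams by.
        elem.clear()
    return counts


# Tiny in-memory stand-in for a real export file.
sample = io.BytesIO(
    b"<HealthData>"
    b'<Record type="HKQuantityTypeIdentifierStepCount" value="120"/>'
    b'<Record type="HKQuantityTypeIdentifierStepCount" value="80"/>'
    b'<Record type="HKQuantityTypeIdentifierHeartRate" value="62"/>'
    b"</HealthData>"
)
print(count_record_types(sample))
```

Clearing on every end event is safe here because only the `Record` attributes are read. A parser that needs child elements (for example metadata nested inside a workout) would have to read them before the parent is cleared.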
Core Dashboard (v2.1.0 pipeline):
python3 {SKILL_DIR}/scripts/generate_dashboard.py \
--data-dir "<workspace>/health_data/" \
--output "<workspace>/health_dashboard.html" \
--modules "<comma-separated module list>" \
--special-periods '<JSON array of special period configs>'
Multi-Report System (v2.2.0):
In addition to the core dashboard, v2.2.0 provides three specialized, independent analysis reports. Each report reads from the parsed CSV files in health_data/ and generates a self-contained HTML file. Run these after Step 4 (Parse & Extract) completes.
Report 1: Comprehensive Health Analysis
python3 {SKILL_DIR}/scripts/health_analysis.py
- Input: health_data/*.csv (in current working directory)
- Output: health_report.html
- Run from the directory where health_data/ exists.
Report 2: Sleep Deep-Dive Dashboard
python3 {SKILL_DIR}/scripts/sleep_analysis_dashboard.py
- Input: health_data/*.csv (in current working directory)
- Output: sleep_analysis_report.html
Report 3: Yearly Data Overview
# Step 1: Extract yearly statistics
python3 {SKILL_DIR}/scripts/yearly_stats.py
# Step 2: Generate the report
python3 {SKILL_DIR}/scripts/yearly_analysis_report.py
- Input: 导出.xml or export.xml (for yearly_stats.py), yearly_stats.json (for yearly_analysis_report.py)
- Output: yearly_stats.json, then yearly_analysis_report.html
Data Exploration (utility):
python3 {SKILL_DIR}/scripts/data_exploration.py
- Colors: #007AFF (blue), #FF9500 (orange), #34C759 (green), #FF3B30 (red), #AF52DE (purple)
- Background: #FAFAFA, Grid: #E5E5EA
Data Type Name Mappings:
| English Identifier | Chinese Name |
|-------------------|-------------|
| StepCount | 步数 |
| DistanceWalkingRunning | 步行+跑步距离 |
| FlightsClimbed | 已爬楼层 |
| ActiveEnergyBurned | 活动能量 |
| HeartRate | 心率 |
| RestingHeartRate | 静息心率 |
| HeartRateVariabilitySDNN | 心率变异性(HRV) |
| VO2Max | 最大摄氧量(VO2Max) |
| OxygenSaturation | 血氧饱和度 |
| RespiratoryRate | 呼吸频率 |
| BodyMass | 体重 |
| BodyFatPercentage | 体脂率 |
| SleepAnalysis | 睡眠分析 |
| MenstrualFlow | 月经 |
| BodyTemperature | 体温 |
| AppleSleepingWristTemperature | 腕部温度 |
| WalkingSpeed | 步速 |
| WalkingStepLength | 步幅 |
| WalkingAsymmetryPercentage | 步行不对称性 |
| HeadphoneAudioExposure | 耳机音量 |
| EnvironmentalAudioExposure | 环境声级 |
| SwimmingStrokeCount | 游泳划水次数 |
Unit Mappings:
| English | Chinese |
|---------|---------|
| bpm | 次/分 |
| ms | 毫秒 |
| kcal | 千卡 |
| mL/(kg·min) | 毫升/(千克·分钟) |
| km | 公里 |
| count | 次 |
| % | % |
Sleep Stage Mappings:
| English | Chinese |
|---------|---------|
| InBed | 在床上 |
| Asleep / Core | 浅睡 |
| Deep | 深睡 |
| REM | 快速眼动(REM) |
| Awake | 清醒 |
English abbreviations (HRV, REM, VO2Max, SWOLF, BMI) are retained in parentheses after the Chinese name for professional context.
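The mapping tables above could be applied through a small lookup that falls back to the English identifier when no translation exists. The sketch uses an excerpt of the tables; `localize_type` and `axis_title` are hypothetical helper names, and the prefix handling assumes the export's usual `HKQuantityTypeIdentifier...` and `HKCategoryTypeIdentifier...` naming.

```python
# Excerpt of the mapping tables above.
TYPE_NAMES_ZH = {
    "StepCount": "步数",
    "RestingHeartRate": "静息心率",
    "SleepAnalysis": "睡眠分析",
}
UNITS_ZH = {"bpm": "次/分", "ms": "毫秒", "kcal": "千卡"}

# Prefixes that precede type names in the raw export.
_PREFIXES = ("HKQuantityTypeIdentifier", "HKCategoryTypeIdentifier")


def localize_type(identifier):
    """Return the Chinese label, or the bare identifier if no mapping exists."""
    for prefix in _PREFIXES:
        if identifier.startswith(prefix):
            identifier = identifier[len(prefix):]
            break
    return TYPE_NAMES_ZH.get(identifier, identifier)


def axis_title(identifier, unit):
    """Build an axis title such as '静息心率 (次/分)'."""
    return f"{localize_type(identifier)} ({UNITS_ZH.get(unit, unit)})"
```

Falling back to the identifier keeps charts readable for data types that are not yet in the table instead of raising a KeyError.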
- Workout records where workoutActivityType contains Swimming
- SWOLF taken from HKSWOLFScore or derived from HKLapLength and stroke count
When the user has declared special periods (pregnancy, illness, etc.):
Example pregnancy handling:
Pregnancy detected: YYYY-MM-DD (from health records)
→ Mark charts with pregnancy period (approx. start to end)
→ Expect: elevated resting HR, altered sleep patterns, paused menstrual tracking
→ Post-pregnancy: track recovery metrics vs. pre-pregnancy baseline
After generating the dashboard, provide a written summary with:
Based on user goals, emphasize relevant categories:
| User Goal | Primary Recommendation Focus |
|---|---|
| General health | Balance of activity, sleep, stress metrics |
| Fitness optimization | Training load, recovery, VO2Max improvement |
| Sleep improvement | Sleep hygiene, consistency, stage optimization |
| Weight management | Activity-calorie balance, trend correlation |
| Stress & recovery | HRV optimization, activity-rest balance |
| Cycle tracking | Cycle regularity, phase-specific adjustments |
| Condition monitoring | Trend stability, anomaly awareness |
Data gaps are extremely common in Apple Health data. Handle them at every level:
| Gap Type | Definition | Handling Strategy |
|---|---|---|
| Device transition | No Watch data before purchase date | Show "data available from [date]" marker; don't interpolate |
| Sporadic recording | Random missing days/weeks | Use available data with appropriate caution notes |
| Metric not available | Entire metric type is absent (e.g., no VO2Max) | Skip the analysis module; suggest how to enable it |
| Source conflict | Multiple devices recording same metric | Deduplicate using source priority rules |
| Low-frequency manual entry | Body weight recorded only occasionally | Show raw points + moving average; don't interpolate aggressively |
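For the "Source conflict" row, a per-day source priority is one plausible deduplication rule for additive metrics such as steps. The sketch below assumes a priority order of Apple Watch, then iPhone, then everything else; that order, the `SOURCE_PRIORITY` table, and the function name are assumptions, and the scripts' actual rules may differ.

```python
from collections import defaultdict

# Assumed priority (lower wins); not taken from the scripts.
SOURCE_PRIORITY = {"apple_watch": 0, "iphone": 1, "other": 2}


def dedupe_daily_totals(rows):
    """rows: iterable of (date, source_class, value). Returns {date: total}.

    For each day, only values from the highest-priority source present that
    day are summed, so steps seen by both Watch and iPhone are not doubled.
    """
    per_day = defaultdict(lambda: defaultdict(float))
    for date, source, value in rows:
        per_day[date][source] += value
    result = {}
    for date, by_source in per_day.items():
        best = min(by_source, key=lambda s: SOURCE_PRIORITY.get(s, 99))
        result[date] = by_source[best]
    return result
```

On days when the Watch was not worn, the iPhone total is used automatically, so a device transition does not create artificial zero days.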
When a preferred metric is unavailable, fall back to alternatives:
RestingHeartRate unavailable?
→ Calculate from HeartRate records (min HR during 2am–5am window)
→ If HeartRate also unavailable → skip HR analysis
SleepAnalysis stages unavailable?
→ Use total InBed/Asleep duration only
→ If no sleep data at all → analyze rest patterns from activity gaps
VO2Max unavailable?
→ Estimate fitness level from resting HR trend + activity level
→ Note: "Estimated fitness level (not clinical VO2Max)"
BodyMass infrequent?
→ Show sparse data points connected, no interpolation
→ Note: "Weight recorded [N] times over [M] months — consider more frequent tracking"
MenstrualFlow incomplete?
→ Calculate available cycle lengths with confidence intervals
→ Note which cycles might have missing data
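The first fallback above (deriving a resting heart rate from the 2am–5am window) might look like the following. It is a rough sketch: `estimate_resting_hr` is a hypothetical name, the input is assumed to be `(datetime, bpm)` pairs, and a single-sample minimum is a noisy proxy that should be labeled as an estimate in any report.

```python
from collections import defaultdict
from datetime import datetime


def estimate_resting_hr(samples, start_hour=2, end_hour=5):
    """samples: iterable of (datetime, bpm). Returns {date: min bpm in window}.

    Days with no samples inside the window are omitted rather than
    interpolated.
    """
    night_min = defaultdict(lambda: float("inf"))
    for ts, bpm in samples:
        if start_hour <= ts.hour < end_hour:
            day = ts.date()
            night_min[day] = min(night_min[day], bpm)
    return dict(night_min)
```

If the returned dictionary is empty, the heart-rate analysis should be skipped, as the fallback chain describes.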
1. Run profiling script → read small JSON profile
2. Interview user → decide analysis modules
3. Run parsing script → generates CSV files (don't read them)
4. Run dashboard script → generates HTML file
5. Preview HTML in browser
6. Read any small summary files for insights text
| File Size | Profile Time | Parse Time | Dashboard Time |
|---|---|---|---|
| <100MB | <10s | <30s | <15s |
| 100MB–500MB | <30s | 1–3 min | <30s |
| 500MB–1GB | <1 min | 3–5 min | <30s |
| >1GB | 1–2 min | 5–10 min | <1 min |
This skill must work for diverse user profiles. Key adaptations:
| Error | Recovery |
|---|---|
| XML file too large for memory | Switch from ET.parse() to iterparse() streaming |
| XML file not found | Guide user: Settings → Health → Export All Health Data |
| Malformed XML (invalid schema) | Attempt lenient parsing; report unparseable sections |
| No data for requested module | Show empty state with explanation of what's needed |
| Script execution fails | Fall back to in-context Python with small data samples |
| Plotly not installed | Guide pip install plotly (pandas is optional, only needed for custom analysis beyond the scripts) |
| CSV generation fails mid-way | Partial results are still usable; report which modules succeeded |
These rules address common failure modes in Apple Health data processing. They are general-purpose and must be followed regardless of the specific user, device, or data history.
Apple Health exports frequently contain Unicode whitespace variants in text fields, especially sourceName. This is caused by iOS localization, firmware changes, or device-specific formatting. The most common case is non-breaking space (\xa0 / U+00A0) instead of regular space in device names like "XXX的Apple\xa0Watch", but other Unicode spaces also occur.
MUST DO:
- Apply unicodedata.normalize('NFKC', s) followed by whitespace collapsing to all string fields before any comparison, matching, or filtering operation
- Use the normalize_str() helper provided in parse_health_xml.py
- Normalize sourceName, value (for category types), workout type, and any user-facing text
NEVER hardcode specific device names (like "John's Apple Watch" or "陈XX的Apple Watch"). Device names contain personal information and change when users rename devices, switch languages, or upgrade hardware.
MUST DO:
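- Apple Watch: sourceName contains "Apple Watch" (case-insensitive)
- iPhone: sourceName contains "iPhone"
- Third-party apps and wearables: sourceName contains "Pokémon Sleep", "AutoSleep", "Oura", "Garmin", etc.
Example of correct pattern matching: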
import unicodedata

def classify_source(source_name):
    """Classify a data source by pattern matching, not exact strings."""
    normalized = unicodedata.normalize('NFKC', source_name).lower()
    normalized = ' '.join(normalized.split())  # collapse whitespace
    if 'apple watch' in normalized:
        return 'apple_watch'
    elif 'iphone' in normalized:
        return 'iphone'
    elif any(app in normalized for app in ['pokémon sleep', 'pokemon sleep']):
        return 'pokemon_sleep'
    elif any(app in normalized for app in ['autosleep', 'pillow', 'sleep cycle']):
        return 'sleep_tracker_app'
    elif any(app in normalized for app in ['oura', 'garmin', 'whoop', 'zepp', 'fitbit']):
        return 'third_party_wearable'
    else:
        return 'other'
NEVER use datetime.now() as a reference point for "recent N days" calculations or any time-relative analysis. Users frequently:
MUST DO:
```python
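from datetime import timedelta
sorted_dates = sorted(d['date'] for d in data)  # `data`: list of dicts with a 'date' key
last_date = sorted_dates[-1]  # NOT datetime.now()
recent_30 = [d for d in data if d['date'] >= (last_date - timedelta(days=30))]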
```
- The recent_n_days() helper in generate_dashboard.py implements this correctly
Chart configurations MUST adapt to the actual data being displayed. Never use fixed tick intervals that assume a specific data range.
MUST DO:
- Use adaptive_xaxis(dates) for all time-series charts; it automatically selects an appropriate tickformat and dtick based on the data span:
| Data Span | dtick | tickformat | Example |
|-----------|-------|------------|---------|
| < 3 months | M1 | %m-%d | 03-15 |
| 3–12 months | M1 | %Y-%m | 2025-03 |
| 1–3 years | M3 | %Y-%m | 2025-03 |
| 3–5 years | M6 | %Y-%m | 2025-06 |
| > 5 years | M12 | %Y | 2025 |
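The span table can be expressed as a simple threshold function. This sketch is not the adaptive_xaxis() implementation in generate_dashboard.py: the name `adaptive_xaxis_sketch` is hypothetical, and the day thresholds (90 days for 3 months, 365 days per year) are assumptions about where the boundaries fall.

```python
from datetime import date


def adaptive_xaxis_sketch(dates):
    """Pick Plotly dtick/tickformat from the span between first and last date."""
    span_days = (max(dates) - min(dates)).days
    if span_days < 90:           # < 3 months
        return {"dtick": "M1", "tickformat": "%m-%d"}
    if span_days < 365:          # 3-12 months
        return {"dtick": "M1", "tickformat": "%Y-%m"}
    if span_days < 3 * 365:      # 1-3 years
        return {"dtick": "M3", "tickformat": "%Y-%m"}
    if span_days < 5 * 365:      # 3-5 years
        return {"dtick": "M6", "tickformat": "%Y-%m"}
    return {"dtick": "M12", "tickformat": "%Y"}  # > 5 years
```

The returned dictionary uses Plotly's own axis property names, so it could be passed to a figure with `fig.update_xaxes(**adaptive_xaxis_sketch(dates))`.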
- Use adaptive_category_xaxis(labels) for monthly/categorical aggregation charts
- Set tickangle to prevent label overlap on dense axes
- Set tickfont.size (recommended: 11px) for consistency
NEVER silently discard data without documentation. Extreme values may be genuine (marathon day, illness, jet lag) or data errors.
MUST DO:
Sleep sessions that start before a cutoff hour belong to the previous calendar date's night. The current implementation uses 18:00 (6 PM) as the cutoff — any sleep session starting before 18:00 is attributed to the previous day's night.
This handles common cases:
Improvement consideration: In a future version, distinguish naps from main sleep sessions by duration (naps typically < 2 hours) and time of day. For now, the cutoff approach works for the primary use case of nightly sleep tracking.
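The cutoff rule is compact enough to show directly. The sketch below follows the 18:00 rule as described; the function name `night_date` is hypothetical and is not claimed to match the helper inside generate_dashboard.py.

```python
from datetime import datetime, timedelta


def night_date(start, cutoff_hour=18):
    """Return the calendar date of the night a sleep session belongs to."""
    if start.hour < cutoff_hour:
        # Started after midnight (or during the day): counts toward last night.
        return (start - timedelta(days=1)).date()
    return start.date()
```

A session starting at 23:30 on March 14 and one starting at 01:00 on March 15 both map to the night of March 14. An afternoon nap at 14:00 on March 15 also maps to March 14, which is the limitation the improvement note describes.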
- parse_health_xml.py (v2.1.0): Streaming XML parser with profiling mode. Handles data extraction, daily aggregation, source-based deduplication for additive metrics (steps, distance, energy, flights), Unicode normalization, and CSV generation.
- generate_dashboard.py (v2.1.0): Plotly-based interactive dashboard generator. Reads CSV files and produces self-contained offline HTML (Plotly JS embedded). Features include multi-source sleep deduplication, adaptive axis scaling, data-relative time calculations, data range header display, and smart body fat percentage detection.
- health_analysis.py (v2.2.0): Comprehensive health analysis report generator. Produces health_report.html with heart rate/HRV/sleep/workout/menstrual/swimming analysis, cross-correlations, personal dynamic baselines, and fully Chinese-localized chart labels. Includes swimming depth analysis (SWOLF, stroke distribution, water temperature correlation) and stress warning system.
- sleep_analysis_dashboard.py (v2.2.0): Sleep-focused dashboard with multi-source deduplication, sleep stage/efficiency/scoring analysis, pregnancy period three-phase comparison (before/during/after), and physiological indicators (RHR/HRV/SpO2/respiratory rate/wrist temperature). All labels fully Chinese-localized.
- yearly_analysis_report.py (v2.2.0): Yearly data overview report. Generates heatmap of data types × years, annual data volume trends, data type distribution, device source breakdown, and automated analysis strategy recommendations. Chinese data type name mapping included.
- yearly_stats.py (v2.2.0): Yearly statistics extractor using streaming XML parsing. Produces yearly_stats.json with per-year record counts by data type.
- data_exploration.py (v2.2.0): Data exploration utility for investigating swimming details, device inventory, and data type specifics. Useful for ad-hoc data inspection during analysis.
- health_data_types.md: Complete mapping of Apple Health data type identifiers to human-readable names, units, expected ranges, and analysis notes.
- analysis_templates.md: Statistical analysis templates for each module, including formulas, reference ranges, and insight generation patterns.
(Reserved: all output is generated dynamically. No static assets required.)
This section clarifies which features are fully implemented in the scripts vs. described in this document as guidelines for the LLM to implement via custom code during analysis.
| Feature | Script | Status |
|---|---|---|
| Streaming XML parse + profiling | parse_health_xml.py | Done |
| Unicode NFKC normalization | parse_health_xml.py | Done |
| Pattern-based source classification | parse_health_xml.py | Done |
| Step/distance/energy source deduplication | parse_health_xml.py | Done |
| Sample standard deviation (Bessel's correction) | parse_health_xml.py | Done |
| Sleep multi-source deduplication | generate_dashboard.py | Done |
| Sleep night-date attribution (18:00 cutoff) | generate_dashboard.py | Done |
| Data-relative recent_n_days() | generate_dashboard.py | Done |
| Adaptive x-axis scaling | generate_dashboard.py | Done |
| Body fat smart % detection | generate_dashboard.py | Done |
| Data date range in dashboard header | generate_dashboard.py | Done |
| Offline-capable HTML (Plotly embedded) | generate_dashboard.py | Done |
| Activity module (steps, flights) | generate_dashboard.py | Done |
| Heart rate module (RHR, HRV, HR range, VO2Max) | generate_dashboard.py | Done |
| Sleep module (duration, stages) | generate_dashboard.py | Done |
| Workout module (types, frequency) | generate_dashboard.py | Done |
| Menstrual cycle module | generate_dashboard.py | Done |
| Body composition module (weight, body fat) | generate_dashboard.py | Done |
| Feature | Script | Status |
|---|---|---|
| Full Chinese localization (all labels/legends/tooltips/axes) | health_analysis.py, sleep_analysis_dashboard.py, yearly_analysis_report.py | Done |
| Comprehensive health analysis report (HR/HRV/sleep/workout/menstrual) | health_analysis.py → health_report.html | Done |
| Swimming depth analysis (SWOLF, stroke distribution, water temp, progress) | health_analysis.py | Done |
| Cross-correlation analysis (sleep→recovery, deep sleep→HRV) | health_analysis.py | Done |
| Personal dynamic baselines (P25-P75 percentile self-assessment) | health_analysis.py | Done |
| Stress warning system (RHR↑ + HRV↓ dual-indicator detection) | health_analysis.py | Done |
| Actionable health insights (data-driven recommendations) | health_analysis.py | Done |
| Sleep-focused dashboard (stages/efficiency/scoring) | sleep_analysis_dashboard.py → sleep_analysis_report.html | Done |
| Pregnancy period comparison (before/during/after three-phase analysis) | sleep_analysis_dashboard.py | Done |
| Sleep physiological indicators (RHR/HRV/SpO2/respiratory rate/wrist temp) | sleep_analysis_dashboard.py | Done |
| Multi-source sleep deduplication (in sleep dashboard) | sleep_analysis_dashboard.py | Done |
| Yearly data overview (heatmap, type distribution, device breakdown) | yearly_analysis_report.py → yearly_analysis_report.html | Done |
| Chinese data type name mapping (22+ types) | yearly_analysis_report.py | Done |
| Analysis strategy recommendations (auto-generated from data distribution) | yearly_analysis_report.py | Done |
| Yearly statistics extraction (streaming XML → JSON) | yearly_stats.py → yearly_stats.json | Done |
| Data exploration utility (swimming/device/type inspection) | data_exploration.py | Done |
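Two rows in the table above, personal dynamic baselines (P25-P75) and the dual-indicator stress warning, can be illustrated together. This is a sketch under assumptions, not the logic in health_analysis.py: the function names are hypothetical, and tying the stress flag to the P75 and P25 bounds is one plausible reading of "RHR↑ + HRV↓".

```python
import statistics


def personal_band(values):
    """Return (P25, P75) of a metric's own history using inclusive quantiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q3


def stress_flag(rhr_today, hrv_today, rhr_history, hrv_history):
    """True when resting HR is above its P75 AND HRV is below its P25."""
    _, rhr_p75 = personal_band(rhr_history)
    hrv_p25, _ = personal_band(hrv_history)
    return rhr_today > rhr_p75 and hrv_today < hrv_p25
```

Requiring both indicators to move keeps a single noisy reading from triggering a warning. The histories should cover enough days (the module table suggests 30+) for the percentiles to be stable.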
| Feature | Notes |
|---|---|
| Mobility module visualization | Parser extracts data to CSV; dashboard generator not yet implemented |
| Audio exposure module visualization | Parser extracts data to CSV; dashboard generator not yet implemented |
| GitHub-style calendar heatmap | Described in guidelines; implement with Plotly heatmap if user wants |
| Bedtime/waketime scatter plot | Implement from sleep_analysis.csv data |
| Heart rate zone distribution | Implement using age-based HR zones from analysis_templates.md |
| Weekday vs weekend comparison charts | Stats templates available; charts not auto-generated |
| Year-over-year overlay | Implement for users with 2+ years of data |
| Data density indicator on charts | Nice-to-have background heatmap |
| Large gap (>30 days) dotted line | Currently draws solid lines across all gaps |
| Full timezone parsing | Current: timezone truncated; works for single-timezone users |
| Tab navigation in dashboard | All modules displayed vertically; tabs not yet implemented |
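As an example of implementing one of these items with custom code, the large-gap row could be approached by breaking the line at long gaps. Note the swap: this sketch leaves a visible break instead of drawing a dotted segment, which is simpler and relies on Plotly line traces not connecting across None values when connectgaps is left at its default. The function name `break_at_gaps` is hypothetical.

```python
from datetime import date, timedelta


def break_at_gaps(dates, values, max_gap_days=30):
    """Insert a None value wherever consecutive dates are too far apart.

    dates must be sorted ascending. The placeholder point is dated one day
    after the last observation before the gap.
    """
    out_dates, out_values = [], []
    for i, (d, v) in enumerate(zip(dates, values)):
        if i > 0 and (d - dates[i - 1]).days > max_gap_days:
            out_dates.append(dates[i - 1] + timedelta(days=1))
            out_values.append(None)
        out_dates.append(d)
        out_values.append(v)
    return out_dates, out_values
```

The returned lists can be used as the x and y of a line trace, so no values are interpolated across the gap.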
Always include in generated reports: