A privacy-first mobility intelligence skill developed by DataHive AI.
Use this skill to collect ride-sharing receipts from Gmail, extract structured trip data locally, and generate rich
personal insights on spending, habits, repeated routes, likely anchor locations, time-of-day patterns, and more.
You can also share an anonymized data report with DataHive AI to participate in missions and earn
rewards (missions blog).
This skill uses a unique local-processing approach that showcases the real power of OpenClaw: agent intelligence
processes sensitive source data locally and ensures that no raw data is shared externally.
Local-only by design: this skill requires a loopback OpenClaw Gateway and will fail if the Gateway URL points anywhere
else.
Generate an anonymized/shareable CSV version of the ride history when the user wants to upload it to DataHive and earn
points without exposing raw receipt emails or obvious personal identifiers.
gog CLI authenticated for the target Gmail account.
Run gog auth list before fetching, even if the user already named an account.
Ask: "Which account should I use: (A) name1@example.com or (B) name2@example.com?" Do not summarize as "default" or make the user infer which accounts exist.
default exists.
OPENCLAW_GATEWAY_TOKEN or ~/.openclaw/openclaw.json at gateway.auth.token (legacy fallback: gateway.token).
OPENCLAW_GATEWAY_TOKEN, OPENCLAW_GATEWAY_URL, and OPENCLAW_GATEWAY_MODEL.
data/ride-insights/emails.json stores fetched receipt emails locally and may include full HTML receipt content.
Gateway /v1/responses endpoint.
model.
any remote/private host.
localhost, 127.0.0.1, and ::1 are accepted Gateway hosts.
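The loopback rule above can be sketched as a small host check. This is a minimal illustration, not the skill's actual implementation: the function name `is_loopback_gateway` and the port used in the usage lines are hypothetical, and only the three accepted host names come from this document.

```python
from urllib.parse import urlparse

# The three hosts this document lists as accepted Gateway hosts.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_loopback_gateway(url: str) -> bool:
    """Return True only when the URL's host is one of the accepted loopback names.

    Hypothetical helper for illustration; the bundled scripts may check differently.
    """
    # urlparse().hostname lowercases the host and strips the brackets
    # around IPv6 literals, so "http://[::1]:8080" yields "::1".
    host = urlparse(url).hostname
    return host in LOOPBACK_HOSTS


if __name__ == "__main__":
    # Port 8080 is a placeholder, not a documented Gateway default.
    print(is_loopback_gateway("http://127.0.0.1:8080/v1/responses"))
    print(is_loopback_gateway("https://gateway.example.com/v1/responses"))
```

Comparing the parsed host against an exact allow-list, rather than testing for a substring, avoids accepting look-alike hosts such as `127.0.0.1.example.com`.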
Primary artifacts:
data/ride-insights/emails.json: fetched receipt emails in one JSON array; may include full HTML receipt content
data/ride-insights/rides.json: extracted ride records in one JSON array
data/ride-insights/rides.sqlite: queryable SQLite database containing normalized ride fields plus extracted_ride_json, but not raw source email JSON
Retention note:
emails.json persists raw fetched receipt content until the user deletes it.
rides.json and rides.sqlite persist extracted ride data locally until deleted.
dataset and should be treated as potentially sensitive.
Run each step in order. Stop and report on failure.
python3 skills/ride-insights/scripts/init_db.py \
--db ./data/ride-insights/rides.sqlite \
--schema skills/ride-insights/references/schema_rides.sql
emails.json
python3 skills/ride-insights/scripts/fetch_emails_json.py \
--account <gmail-account> \
--after YYYY-MM-DD \
--before YYYY-MM-DD \
--max-per-provider 5000 \
--out ./data/ride-insights/emails.json
Notes:
--after / --before when not needed.
references/provider_queries.json.
data/ride-insights/emails.json.
/v1/responses into rides.json
python3 skills/ride-insights/scripts/extract_rides_gateway.py \
--emails-json ./data/ride-insights/emails.json \
--out ./data/ride-insights/rides.json
Notes:
OPENCLAW_GATEWAY_TOKEN when ~/.openclaw/openclaw.json already contains gateway.auth.token.
gateway.token if present, but gateway.auth.token is the expected current config path.
OPENCLAW_GATEWAY_URL and OPENCLAW_GATEWAY_MODEL; these should be declared anywhere the skill metadata or packaging contract lists env dependencies.
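The token lookup described in these notes can be sketched as follows. This is an illustrative sketch under stated assumptions: the helper name `resolve_gateway_token` is hypothetical, and checking the environment variable before the config file is an assumed precedence that this document does not spell out.

```python
import json
import os
from pathlib import Path


def resolve_gateway_token(config_path=None, env=None):
    """Return a Gateway token, or None when no source provides one.

    Hypothetical helper. Assumed order: OPENCLAW_GATEWAY_TOKEN first,
    then gateway.auth.token, then the legacy gateway.token key.
    """
    env = os.environ if env is None else env
    token = env.get("OPENCLAW_GATEWAY_TOKEN")
    if token:
        return token

    path = Path(config_path) if config_path else Path.home() / ".openclaw" / "openclaw.json"
    try:
        config = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None  # missing or unreadable config: nothing to fall back to

    gateway = config.get("gateway") if isinstance(config, dict) else None
    if not isinstance(gateway, dict):
        return None
    auth = gateway.get("auth")
    current = auth.get("token") if isinstance(auth, dict) else None
    # Prefer the current config path; use the legacy key only as a fallback.
    return current or gateway.get("token")
```

Passing `env` and `config_path` explicitly keeps the lookup easy to exercise without touching the real environment or home directory.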
Notes:
/v1/responses endpoint.
data/ride-insights/rides.json after each successful extraction, so progress is checkpointed.
If data/ride-insights/rides.json already exists, it skips emails whose gmail_message_id is already present there.
--delay-ms .interval.
data/ride-insights/emails.json and extract only the first 50.
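The resume behavior described in these notes (checkpoint after each extraction, skip message IDs already present) can be sketched like this. It is a minimal illustration, not the bundled script: the function names are hypothetical, and it assumes each record in emails.json carries the same `gmail_message_id` key that the extracted rides use.

```python
import json
from pathlib import Path


def pending_emails(emails, rides):
    """Return the emails whose gmail_message_id is not yet in the extracted rides.

    Assumption: email records expose a `gmail_message_id` key.
    """
    done = {ride.get("gmail_message_id") for ride in rides}
    done.discard(None)  # rides without an ID must not mask emails without one
    return [email for email in emails if email.get("gmail_message_id") not in done]


def save_checkpoint(rides, out_path):
    """Write the full rides array to a temp file, then swap it into place."""
    out_path = Path(out_path)
    tmp_path = out_path.with_suffix(".tmp")
    tmp_path.write_text(json.dumps(rides, ensure_ascii=False, indent=2), encoding="utf-8")
    tmp_path.replace(out_path)  # rename so a crash never leaves a half-written file
```

Rewriting the whole array after each successful extraction is simple and keeps rides.json a single valid JSON array at every point, at the cost of repeated writes on large mailboxes.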
rides.json into SQLite
python3 skills/ride-insights/scripts/insert_rides_json_sqlite.py \
--db ./data/ride-insights/rides.sqlite \
--rides-json ./data/ride-insights/rides.json
Do this as an agent action, not a dedicated insights script.
Recommended workflow:
data/ride-insights/rides.json as the primary source because it preserves the extracted ride objects directly.
data/ride-insights/rides.sqlite for lightweight deterministic counts, filters, grouping, and cross-checks.
PRAGMA table_info(rides) or read skills/ride-insights/references/schema_rides.sql.
Notes:
behavior, weekday/weekend habits, time-of-day patterns, outliers, and premium ride choices.
rides.json for rich per-ride context and rides.sqlite for quick factual checks; combine both when useful.
phrasing like likely base, recurring destination, or commute-like pattern.
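As an example of the kind of quick factual check rides.sqlite is suited for, the sketch below totals spend per provider and currency. It is illustrative only: the `rides` table name comes from the PRAGMA hint above, but the `provider`, `currency`, and `amount` column names are assumptions borrowed from the export column list, so confirm them against schema_rides.sql before relying on the query. The demo rows are made up.

```python
import sqlite3


def spend_by_provider(conn):
    """Return (provider, currency, ride_count, total_amount) rows, largest total first.

    Assumes a `rides` table with provider, currency, and amount columns.
    """
    return conn.execute(
        """
        SELECT provider, currency, COUNT(*) AS ride_count, ROUND(SUM(amount), 2) AS total
        FROM rides
        WHERE amount IS NOT NULL
        GROUP BY provider, currency
        ORDER BY total DESC
        """
    ).fetchall()


def demo_connection():
    """Build a throwaway in-memory database with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rides (provider TEXT, currency TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO rides VALUES (?, ?, ?)",
        [
            ("provider_a", "USD", 12.5),
            ("provider_a", "USD", 7.5),
            ("provider_b", "USD", 9.0),
            ("provider_b", "USD", None),  # unknown amount is excluded from totals
        ],
    )
    return conn


if __name__ == "__main__":
    for row in spend_by_provider(demo_connection()):
        print(row)
```

Grouping by currency as well as provider avoids summing amounts in different currencies into one figure.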
Use the bundled Python exporter when the user asks for an anonymized/shareable ride report.
python3 skills/ride-insights/scripts/export_anonymized_rides_csv.py \
--db ./data/ride-insights/rides.sqlite \
--out ./data/ride-insights/anonymized_rides.csv
Export rules:
provider, email_month, start_time_15m, end_time_15m, currency, amount, distance_km, duration_min, pickup_city, pickup_country, dropoff_city, dropoff_country.
email_date_text to month-only format like 2025-05.
start_time_text and end_time_text upward to the next 15-minute bucket. Exact quarter-hours stay unchanged.
distance_km and duration_min when available by reading them from extracted_ride_json; leave blank when unavailable.
output.
.csv file in the workspace; do not paste inline CSV text into chat.
data/ride-insights/anonymized_rides.csv.
containing exactly MEDIA:./data/ride-insights/anonymized_rides.csv.
Done — I regenerated the anonymized CSV and attached the updated file. followed by the MEDIA: line.
null when unknown.
skills/ride-insights/references/schema_rides.sql
skills/ride-insights/references/provider_queries.json
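The two coarsening rules from the export rules above (month-only email dates, times rounded up to the next 15-minute bucket) can be sketched as follows. This is a hedged illustration rather than the exporter's code: both function names are hypothetical, and it assumes the time text has already been parsed into a `datetime`, since the stored text format is not specified here.

```python
from datetime import datetime, timedelta


def ceil_to_15_minutes(ts: datetime) -> datetime:
    """Round a timestamp up to the next quarter-hour; exact quarter-hours are unchanged."""
    floored = ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)
    if floored == ts:
        return ts  # already on a 15-minute boundary
    # May roll over into the next hour or day, e.g. 23:50 becomes 00:00.
    return floored + timedelta(minutes=15)


def month_only(ts: datetime) -> str:
    """Reduce a date to month granularity, such as 2025-05."""
    return ts.strftime("%Y-%m")


if __name__ == "__main__":
    sample = datetime(2025, 5, 17, 10, 7, 30)
    print(month_only(sample), ceil_to_15_minutes(sample).strftime("%H:%M"))
```

Rounding upward into buckets keeps the export useful for time-of-day analysis while hiding exact pickup and dropoff times.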