This skill delegates multi-hop artifact retrieval + structured entity extraction to a lightweight subagent, keeping the main agent’s context lean.
It is designed for datasets where a workspace contains many interlinked artifacts (documents, chat logs, meeting transcripts, PRs, URLs) plus reference metadata (employee/customer directories).
This version adds two critical upgrades:
1) Product grounding & anti-distractor filtering (prevents mixing CoFoAIX/other products when asked about CoachForce).
2) Key reviewer extraction rules (prevents “meeting participants == reviewers” mistake; prefers explicit reviewers, then evidence-based contributors).
Invoke when ANY of the following is true:
Without this skill: you manually grep many files, risk missing cross-links, and often accept the first “looks right” report (common failure: wrong product).
With this skill: a subagent:
Typical context savings: 70–95%.
Use this format:
Task(subagent_type="enterprise-artifact-search", prompt="""
Dataset root: /root/DATA
Question: <paste the question verbatim>
Output requirements:
- Return JSON-ready extracted entities (employee IDs, doc IDs, etc.).
- Provide evidence pointers: artifact_id(s) + short supporting snippets.
Constraints:
- Avoid oracle/label fields (ground_truth, gold answers).
- Prefer primary artifacts (docs/chat/meetings/PRs/URLs) over metadata-only shortcuts.
- MUST enforce product grounding: only accept artifacts proven to be about the target product.
""")
If the product name is missing from the question, infer it cautiously from nearby context ONLY if artifacts explicitly support it; otherwise mark the result AMBIGUOUS.
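The delegation prompt shown earlier can be assembled programmatically so the question is always passed through verbatim. This is a minimal sketch: `build_search_prompt` is a hypothetical helper, not part of the skill, and it only builds the prompt string (the `Task(...)` call itself is left to the host agent).

```python
def build_search_prompt(question: str, dataset_root: str = "/root/DATA") -> str:
    """Fill the delegation template; the question is inserted unchanged."""
    lines = [
        f"Dataset root: {dataset_root}",
        f"Question: {question}",
        "Output requirements:",
        "- Return JSON-ready extracted entities (employee IDs, doc IDs, etc.).",
        "- Provide evidence pointers: artifact_id(s) + short supporting snippets.",
        "Constraints:",
        "- Avoid oracle/label fields (ground_truth, gold answers).",
        "- Prefer primary artifacts (docs/chat/meetings/PRs/URLs) over metadata-only shortcuts.",
        "- MUST enforce product grounding: only accept artifacts proven to be about the target product.",
    ]
    return "\n".join(lines)
```

The returned string would then be passed as the `prompt` argument of the `Task` call.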
Search in this order:
1) Product artifact file(s): /root/DATA/products/ if it exists.
2) Global sweep (if needed): other product files and docs that mention the product name.
3) Within found channels/meetings: follow doc links (e.g., /archives/docs/), referenced meeting chats, PR mentions.
Collect all candidates matching:
A candidate report is VALID only if it shows at least 2 independent grounding signals:
Grounding signals (choose any 2+):
A) Located under the correct product artifact container (e.g., inside products/CoachForce.json and associated with that product’s planning channels/meetings).
B) Document content/title explicitly mentions the target product name (“CoachForce”) or a canonical alias list you derive from artifacts.
C) Shared in a channel whose name is clearly for the target product (e.g., planning-CoachForce, #coachforce-) OR a product-specific meeting series (e.g., CoachForce_planning_).
D) The document id/link path contains a product-specific identifier consistent with the target product (not another product).
E) A meeting transcript discussing the report includes the target product context in the meeting title/series/channel reference.
Reject rule (very important):
Why: Benchmarks intentionally insert same doc type across products; “first hit wins” is a common failure.
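The two-signal check can be sketched as below. The artifact field names (`container`, `title`, `content`, `channel`, `link`, `meeting_series`) are assumptions made for illustration; the real dataset schema may differ, and simple substring matching stands in for whatever alias handling is derived from the artifacts.

```python
def grounding_signals(artifact: dict, product: str) -> set:
    """Return the grounding signals (A-E) that this artifact satisfies."""
    p = product.lower()
    signals = set()
    # A) stored under the product's own artifact container
    if artifact.get("container", "").lower() == f"products/{p}.json":
        signals.add("A")
    # B) title or content explicitly names the product
    text = f"{artifact.get('title', '')} {artifact.get('content', '')}".lower()
    if p in text:
        signals.add("B")
    # C) shared in a product-specific channel
    if p in artifact.get("channel", "").lower():
        signals.add("C")
    # D) doc id/link path carries a product-specific identifier
    if p in artifact.get("link", "").lower():
        signals.add("D")
    # E) discussed in a product-specific meeting series
    if p in artifact.get("meeting_series", "").lower():
        signals.add("E")
    return signals


def is_valid_candidate(artifact: dict, product: str, min_signals: int = 2) -> bool:
    """A candidate is VALID only with at least `min_signals` grounding signals."""
    return len(grounding_signals(artifact, product)) >= min_signals
```

A same-titled report that belongs to another product collects no signals for the target product, so it is rejected even if it appears first in the search results.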
If multiple VALID reports exist, choose the “final/latest” by this precedence:
1) Explicit “latest” marker (id/title/link contains latest, or most recent date field)
2) Explicit “final” marker
3) Otherwise, pick the most recent by date field
4) If dates are missing, choose the one most frequently referenced in follow-up discussions (Slack replies/meeting chats)
Keep the selected report’s doc_id and link as the anchor.
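One way to encode this precedence is a sort key compared as a tuple. The sketch below assumes hypothetical fields `doc_id`, `title`, `link`, `date` (an ISO 8601 string, so lexicographic order matches chronological order) and `reference_count`. It treats only a textual "latest" marker as tier 1 and leaves date recency to the third position.

```python
def version_rank(report: dict) -> tuple:
    """Higher tuples win: latest marker, then final marker, then date, then references."""
    label = " ".join(
        str(report.get(key, "")) for key in ("doc_id", "title", "link")
    ).lower()
    return (
        "latest" in label,                 # 1) explicit "latest" marker
        "final" in label,                  # 2) explicit "final" marker
        report.get("date") or "",          # 3) most recent date (missing sorts lowest)
        report.get("reference_count", 0),  # 4) most referenced in follow-ups
    )


def select_report(valid_reports: list) -> dict:
    """Pick the anchor report among candidates that already passed grounding."""
    return max(valid_reports, key=version_rank)
```

Only product-grounded candidates should be passed in, so version selection never reintroduces a report from another product.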
Extract authors in this priority order:
1) Document fields: author, authors, created_by, owner
2) PR fields if the report is introduced via PR: author, created_by
3) Slack: the user who posted the “Here is the report…” message (only if it clearly links to the report doc_id and is product-grounded)
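The priority order above amounts to a fallback chain: use the first source that yields any author. In this sketch the record shapes are assumptions, and `user` is a hypothetical name for the poster field of a Slack message. The caller is expected to pass `share_message` only when it links to the report doc_id and is product-grounded.

```python
DOC_AUTHOR_FIELDS = ("author", "authors", "created_by", "owner")
PR_AUTHOR_FIELDS = ("author", "created_by")
SLACK_AUTHOR_FIELDS = ("user",)  # hypothetical field name for the message poster


def as_list(value) -> list:
    """Normalize a missing, scalar, or list-valued field into a list."""
    if value is None or value == "":
        return []
    return list(value) if isinstance(value, (list, tuple)) else [value]


def extract_authors(doc: dict, pr: dict = None, share_message: dict = None) -> list:
    """Return authors from the highest-priority source that has any."""
    sources = (
        (doc, DOC_AUTHOR_FIELDS),
        (pr, PR_AUTHOR_FIELDS),
        (share_message, SLACK_AUTHOR_FIELDS),
    )
    for record, fields in sources:
        if not record:
            continue
        found = [v for name in fields for v in as_list(record.get(name))]
        if found:
            return list(dict.fromkeys(found))  # dedupe, keep first-seen order
    return []
```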
Normalize into employee IDs:
If a value is already in eid_* form, keep it.
Key reviewers must be evidence-based contributors, not simply attendees.
Use this priority order:
Tier 1 (best): explicit reviewer fields
reviewers, key_reviewers, approvers, requested_reviewers
reviewers, approvers, requested_reviewers
Tier 2: explicit feedback authors
feedback sections that attribute feedback to specific people/IDs
Tier 3: Slack thread replies to the report-share message
Critical rule:
A participants list alone is NOT sufficient.
If the benchmark expects “key reviewers” to be “the people who reviewed in the review meeting”, then your evidence must cite the transcript lines/turns that contain their suggestions.
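The tiered reviewer rules can be sketched as a fallback chain that never consults a participants list. The input shapes are assumptions: `feedback_items` and `thread_replies` are hypothetical lists of dicts carrying an `employee_id`, with `text` holding the attributed feedback.

```python
REVIEWER_FIELDS = ("reviewers", "key_reviewers", "approvers", "requested_reviewers")


def dedupe(ids) -> list:
    """Remove duplicates while keeping first-seen order."""
    return list(dict.fromkeys(ids))


def extract_key_reviewers(report: dict, feedback_items=(), thread_replies=()) -> list:
    """Tier 1: explicit fields. Tier 2: attributed feedback. Tier 3: thread replies."""
    explicit = [eid for name in REVIEWER_FIELDS for eid in report.get(name, [])]
    if explicit:
        return dedupe(explicit)
    attributed = [
        item["employee_id"]
        for item in feedback_items
        if item.get("employee_id") and item.get("text")
    ]
    if attributed:
        return dedupe(attributed)
    # Tier 3: replies in the thread under the report-share message.
    return dedupe(r["employee_id"] for r in thread_replies if r.get("employee_id"))
```

Because `participants` is not among the inputs, attendance alone can never promote someone to key reviewer.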
All IDs must be well-formed (eid_...) and exist in the employee directory if provided.
1) authors first
2) key reviewers next
Return:
{
"target_product": "<ProductName>",
"report_doc_id": "<doc_id>",
"author_employee_ids": ["eid_..."],
"key_reviewer_employee_ids": ["eid_..."],
"all_employee_ids_union": ["eid_..."]
}
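A small builder keeps the split fields and the ordered union consistent. `build_result` is a hypothetical helper; the keys match the schema above, and the union lists authors first, then key reviewers, with duplicates removed.

```python
import json


def build_result(product: str, doc_id: str, authors: list, reviewers: list) -> dict:
    """Assemble the output object with an ordered, de-duplicated union."""
    authors = list(dict.fromkeys(authors))
    reviewers = list(dict.fromkeys(reviewers))
    return {
        "target_product": product,
        "report_doc_id": doc_id,
        "author_employee_ids": authors,
        "key_reviewer_employee_ids": reviewers,
        # Authors first, then key reviewers; an ID in both roles appears once.
        "all_employee_ids_union": list(dict.fromkeys(authors + reviewers)),
    }


result = build_result("CoachForce", "<doc_id>", ["eid_a"], ["eid_b", "eid_a"])
print(json.dumps(result, indent=2))
```

Keeping the split fields alongside the union avoids the schema mismatch that arises when an evaluator expects authors and reviewers separately.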
For each extracted ID, include:
Example evidence record:
{
"employee_id": "eid_xxx",
"role": "key_reviewer",
"evidence": [
{
"artifact_type": "meeting_transcript",
"artifact_id": "CoachForce_planning_2",
"snippet": "…Alex: We should add a section comparing CoachForce to competitor X…"
}
]
}
Return one of:
1) Cross-product leakage
Picking “Market Research Report” for another product (e.g., CoFoAIX) because it appears first.
→ Fixed by Step 2 (2-signal product grounding).
2) Over-inclusive reviewers
Treating all meeting participants as reviewers.
→ Fixed by Step 5 (evidence-based reviewer definition).
3) Wrong version
Choosing draft over final/latest.
→ Fixed by Step 3.
4) Schema mismatch
Returning a flat list when evaluator expects split fields.
→ Fixed by Output Format.
Question:
“Find employee IDs of the authors and key reviewers of the Market Research Report for the CoachForce product?”
Correct behavior:
Extract authors from the author field.
Extract key reviewers from reviewers/key_reviewers if present; else from transcript turns or Slack replies showing concrete feedback.