Use SysOM CLI and backend envelopes as the diagnosis source of truth. This Skill
replaces the older SysOM diagnosis Skill and is the single entry point for SysOM
ECS performance and stability diagnosis.
When the user reports a symptom and has not provided fresh SysOM envelope output,
run the matching SysOM command from Domain Routing below before ad hoc Linux
inspection or manual probing. Then follow the returned agent.summary,
agent.findings[].detail/category, and agent.next_steps[]. Raw Linux commands
are bounded fallbacks only when a SysOM command is unavailable, outputs
contradict each other, or a required entity remains missing after the focused
SysOM command.
Never print, echo, or ask for AccessKey ID or AccessKey Secret values. Remote
commands perform their own authentication checks. If a command returns an
authentication or permission error, explain the error and point the user to
references/ram-policies.md; credential setup must happen outside the
conversation.
Check whether the CLI is available:

    command -v sysom-osops

If it is missing, install it:

    curl -fsSL --connect-timeout 1000 https://sysom-prd-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/sysom_prd/skill_cli/install.sh | sudo bash

Then verify only the binary:

    command -v sysom-osops
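
The availability check above can also be made from a script. The following is a minimal sketch, assuming only the binary name `sysom-osops` from the text; the helper name `cli_available` is hypothetical and not part of the SysOM CLI.

```python
import shutil

CLI_NAME = "sysom-osops"  # binary name taken from the text above


def cli_available(name: str = CLI_NAME) -> bool:
    """Return True when `name` resolves on PATH, similar to `command -v`."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    if cli_available():
        print(f"{CLI_NAME} found at {shutil.which(CLI_NAME)}")
    else:
        print(f"{CLI_NAME} is missing; install it before running a diagnosis")
```

Like the `command -v` step, this verifies only that the binary exists; it does not run any diagnosis or touch credentials.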
1. Identify the symptom domain: memory, IO, load, network, or Java memory.
2. Run memory classify for unclear memory symptoms; for other domains, use the
   matching documented remote action.
3. Read the envelope fields ok, error, command, and agent.
4. Answer from agent.summary, agent.findings[].detail, agent.findings[].category, and agent.next_steps[]. Keep evidence
   qualifiers that change interpretation, including currentness, unavailable
   direct signals, fallback evidence, and remediation preconditions.
5. If the root cause and required entities are already visible, stop and answer. Run one targeted follow-up only when a
   required entity is missing or the command explicitly recommends it.
When classify returns a command in agent.next_steps[] and no root-cause
finding already contains enough evidence to answer, run the first command next.
Do not replace an Agent-visible SysOM next step with manual shell probing. Raw
Linux checks are bounded fallbacks after the SysOM next step succeeds, fails, or
times out.
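
The next-step rule above can be sketched as a small selection function. This is an illustration under stated assumptions, not the Skill's implementation: the field names (`agent.findings[].category`, `agent.next_steps[].kind`, `agent.next_steps[].command`) follow the envelope shape shown in this document, while `pick_next_command` and the `has_enough_evidence` callback are hypothetical names, because deciding whether a finding holds enough evidence is a judgment the text leaves to the Agent.

```python
from typing import Callable, Optional


def pick_next_command(
    envelope: dict,
    has_enough_evidence: Callable[[dict], bool],
) -> Optional[str]:
    """Return the first command in agent.next_steps[], or None when a
    root-cause finding already answers the question."""
    agent = envelope.get("agent") or {}

    # Stop when any root-cause finding is judged sufficient by the caller.
    for finding in agent.get("findings", []):
        if finding.get("category") == "root_cause" and has_enough_evidence(finding):
            return None

    # Otherwise take the first step that actually carries a command.
    for step in agent.get("next_steps", []):
        if step.get("kind") == "command" and step.get("command"):
            return step["command"]
    return None
```

Returning `None` covers both the stop-ready case and the case where no command is offered; a caller would answer from the visible envelope in either situation rather than fall back to manual probing.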
Use the documented commands exactly as shown by default. Do not add raw,
debug, or backend evidence expansion flags unless the user explicitly asks for
that view.
Final answers should name evidence, root cause, owner/scope, and operational
action targets. Do not add shell snippets for verification or remediation unless
the user explicitly asks for commands. Prefer phrases such as "review dependency
and disable or upgrade the leaking component in a change window" over raw module,
cgroup, sysctl, cache-drop, or process-kill commands.
Do not include command-looking inline snippets such as module inspection/removal,
memory summary commands, cgroup file writes, cache-drop controls, sysctl changes,
or process-kill commands as default final-answer steps.
The agent view must be self-contained for diagnosis. Structured evidence is a
backend/UI view and must not be treated as the default Agent source for required
entities.
| User symptom | First route |
|---|---|
| Unclear memory issue, OOM, high RSS, file cache, shmem/tmpfs, memory cgroup, socket memory, kernel memory | sysom-osops memory classify |
| Java heap, GC, or JVM memory issue | sysom-osops memory javamem when Java is explicit; otherwise start with memory classify |
| Slow disk, high iowait, disk latency, blocked IO | sysom-osops io iofsstat, then io iodiagnose if the overview points to slow IO |
| High load, runqueue backlog, task stuck waiting for CPU | sysom-osops load loadtask or load delay based on the visible symptom |
| Packet loss, retransmits, network timeout, jitter | sysom-osops net packetdrop for loss/drop symptoms; net netjitter for latency fluctuation |
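
The routing table above can be read as a lookup from symptom domain to first command. The sketch below is illustrative only: the commands come from the table, but the keyword sets, the rule order, and the function name `first_route` are assumptions made for the example, and real routing should follow the visible symptom rather than keyword matching alone.

```python
import re
from typing import List, Optional

# (commands, trigger tokens); first matching rule wins.
# Commands are from the routing table; the token sets are hypothetical.
RULES = [
    (["sysom-osops memory javamem"], {"java", "jvm"}),
    (["sysom-osops memory classify"],
     {"memory", "oom", "rss", "cache", "shmem", "tmpfs", "cgroup", "heap", "gc"}),
    (["sysom-osops io iofsstat"], {"disk", "iowait", "io"}),
    # The table offers two load commands; the visible symptom picks one.
    (["sysom-osops load loadtask", "sysom-osops load delay"], {"load", "runqueue"}),
    (["sysom-osops net packetdrop"], {"loss", "drop", "drops", "retransmit", "retransmits"}),
    (["sysom-osops net netjitter"], {"jitter"}),
]


def first_route(symptom: str) -> Optional[List[str]]:
    """Return candidate first commands for a symptom, or None if unmatched."""
    tokens = set(re.findall(r"[a-z0-9]+", symptom.lower()))
    for commands, triggers in RULES:
        if tokens & triggers:
            return commands
    return None


print(first_route("Java heap keeps growing"))
print(first_route("heap growth and OOM kills"))
```

The Java rule sits first so that `javamem` is chosen only when Java is explicit; a heap or GC complaint without a Java mention falls through to `memory classify`, matching the table.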
For command parameters, read references/deep-actions.md and
references/parameter-guide.md. For OS and region support, read
references/supported-environments.md. These references are Skill material; do
not use remote target file tools to open .claude/skills paths on the diagnosed
host.
Memory follows the same Core Workflow and Follow-up Rules as every domain: start
from sysom-osops memory classify, then pick the next action from visible output
or agent.next_steps[]. For choosing among memory deep actions or checking which
entity is still missing, load references/memory-triage.md (parallel to
references/non-memory-triage.md for other domains).
Choose the next memory action from visible SysOM output. Do not infer a memory mechanism from symptom wording alone.
Default command output is the Agent contract:

    {
      "ok": true,
      "command": "sysom-osops memory classify",
      "agent": {
        "status": "warning",
        "summary": "Concise diagnosis summary.",
        "findings": [
          {
            "severity": "high",
            "title": "Short finding title",
            "detail": "Root cause, key entities, and evidence summary.",
            "category": "root_cause"
          }
        ],
        "next_steps": [
          {
            "kind": "command",
            "label": "Run focused deep diagnosis",
            "command": "sysom-osops memory oom",
            "reason": "The missing entity this command can fill."
          }
        ]
      }
    }

agent.findings[] may contain only severity, title, detail, and
category. Required entities such as PID, cgroup, service, file path, OOM
victim, limit/current, residue, holder, or cleanup target must be written in
agent.summary or agent.findings[].detail.
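
The field restriction on `agent.findings[]` can be checked mechanically. The following is a minimal sketch, assuming the envelope arrives as a JSON string in the shape shown above; `check_findings` is a hypothetical helper, not a SysOM API.

```python
import json
from typing import List

# The only keys the contract allows inside each finding.
ALLOWED_FINDING_KEYS = {"severity", "title", "detail", "category"}


def check_findings(raw_envelope: str) -> List[str]:
    """Return a list of contract problems; an empty list means none found."""
    envelope = json.loads(raw_envelope)
    agent = envelope.get("agent") or {}
    problems = []
    for index, finding in enumerate(agent.get("findings", [])):
        extra = set(finding) - ALLOWED_FINDING_KEYS
        if extra:
            problems.append(
                f"findings[{index}] has unexpected keys: {sorted(extra)}"
            )
    return problems


sample = json.dumps({
    "ok": True,
    "command": "sysom-osops memory classify",
    "agent": {"findings": [{"severity": "high", "title": "t",
                            "detail": "d", "category": "root_cause",
                            "pid": 42}]},
})
print(check_findings(sample))
```

In the sample, the stray `pid` key is flagged: under the contract, a PID belongs in the `detail` or `summary` text rather than in a separate field.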
category=root_cause, then highest severity, then the finding that best matches the user's reported symptom.
root_cause as stop-ready when visible detail contains the entities needed to explain the symptom and a safe next action.
agent.next_steps[] as a priority plan, not a checklist.
change remediation.
historical evidence, unavailable direct signals, fallback evidence used to
close currentness, and safety preconditions for remediation.
state both parts in the final answer. Do not reduce the conclusion to the
fallback metric alone.
extra commands to make the report comprehensive, and do not chase earlier
classify anomalies or observations unless they share the same entity and
expose a named evidence gap.
in summary or detail.
category=root_cause with the required entities visible, answer from that envelope. Raw Linux checks are only for
contradictions, command errors, or a clearly missing entity.
verification commands. Express remediation as dependency-aware action targets
and change-window plans unless the envelope itself provides an executable safe
next step.
only for post-change verification, name the SysOM check or metric to re-run
instead of raw Linux commands.
summary commands, cgroup file writes, cache-drop controls, sysctl changes, and
process-kill actions; describe the dependency gate and operational action
target in prose.
symptom and another SysOM domain names a stronger root cause.
state, such as killing processes, removing files, changing sysctl values, or
writing to cache-drop controls. Present those as recommendations unless the
user explicitly asks you to perform the repair.
answer when the required entities are visible.
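
The finding priority named in the rules above (category=root_cause first, then highest severity, then the best match to the reported symptom) can be sketched as a sort key. Several parts are assumptions for illustration: the source shows only the severity value "high", so the scale in `SEVERITY_RANK` is hypothetical; the "observation" category in the usage example is hypothetical; and symptom matching is approximated here by simple word overlap.

```python
import re
from typing import List

# Assumed severity scale; only "high" appears in the documented envelope.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def _tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def order_findings(findings: List[dict], symptom: str) -> List[dict]:
    """Sort findings: root_cause first, then severity, then symptom overlap."""
    wanted = _tokens(symptom)

    def key(finding: dict):
        not_root = 0 if finding.get("category") == "root_cause" else 1
        severity = SEVERITY_RANK.get(finding.get("severity", ""), len(SEVERITY_RANK))
        text = f"{finding.get('title', '')} {finding.get('detail', '')}"
        overlap = len(wanted & _tokens(text))
        return (not_root, severity, -overlap)  # more overlap sorts earlier

    return sorted(findings, key=key)


findings = [
    {"severity": "high", "title": "Page cache growth",
     "detail": "file cache is large", "category": "observation"},
    {"severity": "medium", "title": "OOM in cgroup",
     "detail": "cgroup limit reached", "category": "root_cause"},
    {"severity": "high", "title": "Socket memory",
     "detail": "socket buffers high", "category": "observation"},
]
for f in order_findings(findings, "socket memory is high"):
    print(f["title"])
```

The root-cause finding leads even though its severity is lower, and the two equally severe observations are separated by how closely they match the symptom wording.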
| error.code | Action |
|---|---|
| Sysom.TargetRequired | Ask for instance ID and region, or explain ECS metadata auto-detection requirements |
| Sysom.FallbackClassify | Present the local classify result and continue only if a focused next step is available |
| Sysom.PermissionDenied | Use references/ram-policies.md to explain required RAM permissions |
| Sysom.AuthenticationFailure | Ask the user to configure credentials outside this session |
| Sysom.InvalidParameter | Ask the user to correct the instance, region, or command parameter |
| Sysom.DiagnosisVersionNotSupported | Explain that the target instance diagnosis components need an update |
| Sysom.DiagnosisJsonParseFailed | Retry once only when the user still needs the same evidence |
| Sysom.PollError | Retry the same focused action once when the missing evidence is still required |
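
The error table above maps naturally onto a lookup plus a retry-once guard. This is a sketch under stated assumptions: the codes and action texts are copied from the table, while the function names, the `retries_done` counter, and returning `None` for an unlisted code are choices made for the example.

```python
from typing import Optional

ERROR_ACTIONS = {
    "Sysom.TargetRequired": "Ask for instance ID and region, or explain ECS metadata auto-detection requirements",
    "Sysom.FallbackClassify": "Present the local classify result and continue only if a focused next step is available",
    "Sysom.PermissionDenied": "Use references/ram-policies.md to explain required RAM permissions",
    "Sysom.AuthenticationFailure": "Ask the user to configure credentials outside this session",
    "Sysom.InvalidParameter": "Ask the user to correct the instance, region, or command parameter",
    "Sysom.DiagnosisVersionNotSupported": "Explain that the target instance diagnosis components need an update",
    "Sysom.DiagnosisJsonParseFailed": "Retry once only when the user still needs the same evidence",
    "Sysom.PollError": "Retry the same focused action once when the missing evidence is still required",
}

# The two codes for which the table allows a single retry.
RETRY_ONCE_CODES = frozenset({"Sysom.DiagnosisJsonParseFailed", "Sysom.PollError"})


def action_for(code: str) -> Optional[str]:
    """Look up the documented action; None means the code is not in the table."""
    return ERROR_ACTIONS.get(code)


def should_retry(code: str, retries_done: int, evidence_still_needed: bool) -> bool:
    """Allow at most one retry, and only while the evidence is still required."""
    return code in RETRY_ONCE_CODES and retries_done == 0 and evidence_still_needed
```

Keeping the retry decision separate from the lookup makes the "once" limit explicit: a second `Sysom.PollError` yields `False` and the error is explained to the user instead of retried.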
| Reference | Use when |
|---|---|
| references/classify-output-guide.md | Reading local memory classify output |
| references/memory-triage.md | Choosing a memory deep action or checking memory entity completeness |
| references/non-memory-triage.md | Routing IO, load/CPU, network, and Java diagnosis |
| references/deep-actions.md | Looking up SysOM commands by domain |
| references/parameter-guide.md | Validating command parameters |
| references/report-interpretation.md | Interpreting envelope fields and answer shape |
| references/ram-policies.md | Explaining RAM permissions |
| references/supported-environments.md | Checking OS, architecture, and region support |