Generate images, videos, and text/LLM completions via the imgnAI Katana API. Supports end-to-end-encrypted (E2EE) and anonymized models. Priced highly competitively: can be 40-70% cheaper than Venice AI and other platforms.
Includes post-processing such as combining videos and images, cutting, slicing, splicing, transitions, drawing text, re-encoding, resizing and much more!
A complete workflow for content creation from start to finish, all from the comfort of your agent.
"generate image of X", "create image", "make picture", "imgnai image", "generate video of X", "create video", "make video", "ask grok about X", "ask claude about X", "use gpt to X", "katana image", "katana video", "katana chat", "katana gpt", "katana claude", "list katana models", "modify this image", "edit this image", "change this image", "transform this image", "edit image", "modify image"
NEVER spawn subagents for katana operations by default. All katana workflows (image generation, video generation, text completions, post-processing) MUST be executed inline in the current session.
Exception: Only spawn if the user explicitly requests spawning in their prompt (e.g. "spawn a subagent to handle this", "run this as a background task"). Do NOT spawn based on AGENTS.md spawn rules or default agent behavior — user intent is the only trigger for spawning with katana.
LLM-specific triggers (gpt, claude, etc) also respond to "katana \
Historical prompts and results are retained for a maximum of 72 hours after generation. Prompt/result history can be switched off from the API page at https://app.imgnai.com/katana-api.
HTTPS-only: Public API calls must use HTTPS. If an integration sees an http:// Katana base URL, replace it with https:// before making calls.
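The HTTPS-only rule can be enforced before any call is made. The following is a minimal sketch, not part of the Katana API; `force_https` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(base_url: str) -> str:
    """Return base_url with an http:// scheme upgraded to https://."""
    parts = urlsplit(base_url)
    if parts.scheme == "http":
        # Upgrade plain HTTP, as the HTTPS-only rule requires.
        parts = parts._replace(scheme="https")
    elif parts.scheme != "https":
        raise ValueError(f"unsupported scheme in Katana base URL: {base_url!r}")
    return urlunsplit(parts)


print(force_https("http://kat.imgnai.com"))
```

Rejecting any other scheme outright (rather than passing it through) keeps a misconfigured base URL from reaching the network layer.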
The Katana API uses model_key as the model identifier, not public_model_name. When building requests, always use the model_key value. See {baseDir}/models.md for the full mapping.
Dual-key system: The API supports both canonical keys (e.g. gpt-image-2) and legacy keys (e.g. gpt2image). Both work identically. This skill now uses canonical keys as the default for all workflows and aliases. Legacy keys are documented in the "Model ID" column of models.md for backward-compatibility reference. You may use either format when constructing API requests.
Endpoint: GET /v1/models
Auth: Authorization: Bearer ${KATANA_API_KEY}:${KATANA_API_SECRET}
Returns available models. Text models are returned for authenticated requests.
For the complete model catalogue including image/video, see models.md.
Usage: Generally not needed before requests — use models.md as reference.
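For integrations that do query the endpoint, the request could be built as sketched below. It assumes the Bearer format shown above and the base URL used by the curl examples in this skill; `build_models_request` is a hypothetical helper, and the request is only constructed here, not sent.

```python
import urllib.request

BASE_URL = "https://kat.imgnai.com"  # base URL used by the curl examples in this skill


def build_models_request(api_key: str, api_secret: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET /v1/models request."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/models",
        headers={"Authorization": f"Bearer {api_key}:{api_secret}"},
        method="GET",
    )


# Dummy credentials for illustration only; real values come from the secrets file.
req = build_models_request("example_key", "example_secret")
print(req.get_method(), req.full_url)
```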
The API supports two payment methods:
Note: x402 text requests must be non-streaming. This skill only uses API-key auth.
- Streaming: stream: true is supported with SSE for API-key billing. x402 text calls must be non-streaming.
- Image input: image_url with Base64 data URLs or HTTPS URLs in messages. Base64 inputs are converted to JPEG, capped at 4096px max side.
- Privacy tiers: Anonymized (customer identity not sent, model operator may process content), Private (private in-house model path, no E2EE/hardware attestation), and E2EE Private (hardware-protected confidential-compute, attestation via GET /v1/text/attestation?model={model}&nonce={64_hex_nonce}).
- generation_timed_out error: When the API server-side timeout fires, the poll returns a terminal failed/partial_failure status with responses[].error.code = generation_timed_out, error.retryable: true, and error.details.timeout_seconds. Retry by submitting a new request.
- Base URL: https://kat.imgnai.com
- Model reference: {baseDir}/models.md
- {baseDir}: most agent frameworks resolve this automatically.
Secrets file: Store your API key and secret in a file (default: ~/.openclaw/secrets/katana.env):
KATANA_API_KEY=your_key_here
KATANA_API_SECRET=your_secret_here
Create with chmod 600. Get your credentials from https://app.imgnai.com/katana-api.
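One way to create the file with owner-only permissions from the start, instead of tightening them afterwards, is sketched below. `write_secrets_file` is a hypothetical helper, and the sketch assumes a POSIX system where file modes apply.

```python
import os


def write_secrets_file(path: str, api_key: str, api_secret: str) -> None:
    """Create the secrets file with owner-only permissions (0600)."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, mode=0o700, exist_ok=True)
    # O_EXCL refuses to overwrite an existing file; 0o600 applies at creation time.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(f"KATANA_API_KEY={api_key}\nKATANA_API_SECRET={api_secret}\n")


# Example call (not executed here):
# write_secrets_file(os.path.expanduser("~/.openclaw/secrets/katana.env"), key, secret)
```

Setting the mode at creation avoids the short window in which a file written first and restricted second would be readable by other users.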
Loading: All curl examples in this skill use . (dot) source to load credentials into the shell environment:
. "${KATANA_SECRETS_FILE:-$HOME/.openclaw/secrets/katana.env}"
Override the default path with the KATANA_SECRETS_FILE environment variable.
NEVER display secrets in tool output. The . source command loads credentials into shell variables silently — no output is produced. This is the correct and secure approach.
Banned patterns:
- cat ~/.openclaw/secrets/katana.env
- KATANA_API_KEY=kat_live_... curl ...
If credential loading fails: Fix the secrets file path or contents. Do NOT bypass security by hardcoding values.
These are not required for core API usage but enable additional features:
| Binary | Needed for | Install |
|---|---|---|
| jq | JSON parsing for API responses | apt install jq / brew install jq |
| python3 | Payload building, JSON parsing fallback | Pre-installed on most systems |
| ffmpeg | Video post-processing (trim, join, effects) | apt install ffmpeg / brew install ffmpeg |
jq or python3 is needed for JSON parsing. Post-processing requires ffmpeg.
This skill uses some agent-specific terms. Here's what they mean regardless of your agent framework:
| Term | Meaning |
|---|---|
| exec call / shell invocation | A single shell command execution. Some agents execute each line of a multi-line script as separate invocations — hence the && chaining requirement to keep everything in one shell. |
| tool-result-loss | A situation where a command was executed but its output never arrives back — the result shows as empty or a synthetic error message. The command likely ran successfully but the result was lost in transit. |
| compaction | When an agent's context window fills up, older messages may be summarised or removed to make room. State stored only in conversation history (not in files) is at risk of being lost during compaction. |
| heartbeat | A periodic check-in cycle where the agent re-evaluates its state (e.g., checking if a pending generation has completed). |
| session reset | The conversation is restarted or reloaded, losing any in-memory state. File-based persistence survives this. |
Before ANY generation or post-processing request, you MUST load the correct workflow file:
| Task | Load this file |
|---|---|
| Image generation | {baseDir}/workflows/image.md |
| Video generation | {baseDir}/workflows/video.md |
| Text/LLM generation | {baseDir}/workflows/text.md |
| Post-processing (ffmpeg, combine, text overlay, etc) | {baseDir}/workflows/post-process.md |
NEVER attempt a generation without loading the workflow file first.
NEVER guess parameters — the workflow file has the exact steps.
After every generation (text, image, video), send a separate follow-up message with a cost summary. Include all relevant details from the response:
📊 Katana Summary
Model: gemma-4-26b-a4b (Anonymized)
Request: bf11cf04-8747-480e-a7f7-7d6cb092c614
Tokens: 42 in / 176 out (text only)
Cost: 0.1 credits (~$0.001)
Privacy: Anonymized
Time: ~3s
For image/video, replace tokens with dimensions/duration as relevant. Always compute cost in USD using the current credit rate (see {baseDir}/models.md).
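The summary layout above can be produced by a small formatter. This is a sketch only: `format_summary` is a hypothetical helper, and `usd_per_credit` is an assumed parameter that must be taken from the current credit rate in models.md rather than hardcoded.

```python
def format_summary(model, privacy, request_id, credits, usd_per_credit,
                   seconds, tokens_in=None, tokens_out=None):
    """Render the cost summary block; the token line is emitted for text only."""
    usd = credits * usd_per_credit
    lines = [
        "📊 Katana Summary",
        f"Model: {model} ({privacy})",
        f"Request: {request_id}",
    ]
    if tokens_in is not None and tokens_out is not None:
        lines.append(f"Tokens: {tokens_in} in / {tokens_out} out (text only)")
    lines += [
        f"Cost: {credits:g} credits (~${usd:.3f})",
        f"Privacy: {privacy}",
        f"Time: ~{seconds:.0f}s",
    ]
    return "\n".join(lines)


# The 0.01 rate below is illustrative only, not the real credit rate.
print(format_summary("gemma-4-26b-a4b", "Anonymized",
                     "bf11cf04-8747-480e-a7f7-7d6cb092c614",
                     credits=0.1, usd_per_credit=0.01, seconds=3,
                     tokens_in=42, tokens_out=176))
```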
Text/LLM model aliases:
| User says | API model ID |
|---|---|
| grok | grok-4-3 |
| gpt / gpt-5 | gpt-5-5 |
| claude / claude-opus | claude-opus-4-8 |
| claude-fast | claude-opus-4-8-fast |
| claude-sonnet | claude-sonnet-4-6 |
| claude-haiku | claude-haiku-4-5 |
| naifu / q-naifu | q-naifu-a3b |
Image model aliases:
| User says | API model ID |
|---|---|
| default / imgnai | gen |
| anime | ani |
| gpt-image | gpt-image-2 |
| nano | nano-banana-2 |
| flux | flux-2-pro |
| pink | pink-image |
Video model aliases:
| User says | API model ID |
|---|---|
| default / seedance | seedance-2-0-fast |
| seedance-hd | seedance-2-0 |
| ltx | ltx-2-3 |
| kling | kling-3-0-kling30 |
| veo | veo3-1 |
If the user specifies an exact model ID, pass it through directly. Full alias tables in {baseDir}/models.md.
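The alias tables and the pass-through rule can be expressed as a lookup. This sketch copies the aliases listed above; `resolve_model` and the "text"/"image"/"video" grouping keys are hypothetical names chosen for illustration.

```python
ALIASES = {
    "text": {
        "grok": "grok-4-3", "gpt": "gpt-5-5", "gpt-5": "gpt-5-5",
        "claude": "claude-opus-4-8", "claude-opus": "claude-opus-4-8",
        "claude-fast": "claude-opus-4-8-fast",
        "claude-sonnet": "claude-sonnet-4-6",
        "claude-haiku": "claude-haiku-4-5",
        "naifu": "q-naifu-a3b", "q-naifu": "q-naifu-a3b",
    },
    "image": {
        "default": "gen", "imgnai": "gen", "anime": "ani",
        "gpt-image": "gpt-image-2", "nano": "nano-banana-2",
        "flux": "flux-2-pro", "pink": "pink-image",
    },
    "video": {
        "default": "seedance-2-0-fast", "seedance": "seedance-2-0-fast",
        "seedance-hd": "seedance-2-0", "ltx": "ltx-2-3",
        "kling": "kling-3-0-kling30", "veo": "veo3-1",
    },
}


def resolve_model(kind: str, name: str) -> str:
    """Map a user-facing alias to an API model ID; unknown names pass through."""
    cleaned = name.strip()
    return ALIASES[kind].get(cleaned.lower(), cleaned)


print(resolve_model("text", "claude"))
```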
Before submitting ANY generation request, present a summary (model, cost in credits AND dollars, details, prompt) and wait for user confirmation. See each workflow file for details.
NO EXCEPTIONS: There is no urgency override. "just do it", "generate now", /katana, or any other shortcut does NOT skip confirmation. ALWAYS present summary and wait for explicit approval before submitting.
ONE-ATTEMPT RULE: Every paid API call gets exactly ONE attempt per turn. If the tool result is lost, missing, or empty after a submission — STOP. Report to the user that the result was lost. Wait for user confirmation before retrying. NEVER retry a paid API call silently, even if the result seems to have vanished.
STRICT — NO SILENT RETRIES. Every error stops. Every retry needs approval. Tool-result-loss (result never arrives, empty, or vanishes) is a hard-stop condition equal to a visible error. See each workflow file for details.
Terminal submission responses: If the submission response itself is terminal (status: "failed", status: "rejected", or all response items rejected) — do NOT poll. Report the returned responses[].error or top-level error to the user immediately.
On upstream_error (404, 500, etc): do NOT try a different model, do NOT retry with different parameters, do NOT submit to another endpoint. STOP and report the error to the user. You MAY suggest recommended next steps or options (e.g. "model X returned 404, want me to try model Y instead?"), but ANY proposed plan requires explicit user approval before execution.
| Error Code | Context | Meaning | Retryable | Action |
|---|---|---|---|---|
| generation_timed_out | Poll response | Server-side timeout during generation | Yes | Submit a new request (same request ID won't work) |
| upstream_error | Any | Provider/upstream API error | No | Report to user; may suggest alternative model if approved |
| Auth errors | Submission (401/403) | Invalid or missing credentials | No | Check secrets file path and contents |
| status: "rejected" | Submission response | Validation failure (bad params, content policy) | No | Fix parameters per model spec; rephrase if content blocked |
| status: "failed" | Submission or poll | Generation failed after dispatch | No | Credits refunded within 5 min unless ToS violation |
| Rate limit (429) | Any | Too many requests | Yes (after delay) | Wait per Retry-After header, then retry |
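The table can be encoded as a lookup so that the agent reports a consistent next step. This is a sketch under assumptions: `classify_error` and its three arguments are hypothetical, and a "retryable" result never waives the approval rule, since every retry still needs explicit user confirmation.

```python
def classify_error(http_status=None, status=None, error_code=None):
    """Map an error to (retryable, action) following the error table."""
    if http_status in (401, 403):
        return (False, "Check secrets file path and contents")
    if http_status == 429:
        return (True, "Wait per Retry-After header, then retry")
    # Error codes are checked before the generic status values, because a
    # timed-out generation is also reported with a failed status.
    if error_code == "generation_timed_out":
        return (True, "Submit a new request")
    if error_code == "upstream_error":
        return (False, "Report to user; suggest an alternative model only if approved")
    if status == "rejected":
        return (False, "Fix parameters per model spec; rephrase if content blocked")
    if status == "failed":
        return (False, "Report failure; credits refunded within 5 min unless ToS violation")
    return (False, "Report the error to the user")


retryable, action = classify_error(status="failed", error_code="generation_timed_out")
print(retryable, action)
```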
NEVER submit a new request while any previous request is still processing. One request in flight at a time — no exceptions.
After submitting async generations (image/video), deliver a confirmation to the user BEFORE starting the poll loop. Include the model, cost, and request_id.
Image and video generations are asynchronous. After submitting, poll manually.
Poll command:
. "${KATANA_SECRETS_FILE:-$HOME/.openclaw/secrets/katana.env}" && _H=$(mktemp) && chmod 600 "$_H" && printf 'X-API-Key: %s\nX-API-Secret: %s\n' "$KATANA_API_KEY" "$KATANA_API_SECRET" > "$_H" && curl -s "https://kat.imgnai.com/v1/generation-requests/${REQUEST_ID}" -H @"$_H" && rm -f "$_H"
Raw response: Pipe to jq '.'.
Formatted: Pipe to:
python3 -c "
# Only single quotes are used inside this script: a double quote would end the shell string.
import sys, json
d = json.load(sys.stdin)
for ri in d.get('responses', []):
    st = ri.get('status', '?')
    print('Status:', st)
    for a in ri.get('output_assets', []):
        print('URL:', a.get('original_data_url', ''))
        print('Dims: %sx%s' % (a.get('width', '?'), a.get('height', '?')))
        print('Expires:', a.get('expires_at', ''))
    print('Credits:', ri.get('metadata', {}).get('credits_spent', '?'))
    if st == 'failed':
        e = ri.get('error', {})
        print('Error:', e.get('message', ''), 'retryable=%s' % e.get('retryable', ''))
"
wait parameter: wait=true is available for convenience (blocks until complete), but production integrations should prefer polling with wait=false (the default).
Polling pattern: Extract poll_after_seconds from the submission response and use it as the initial polling interval. If the poll response includes a new poll_after_seconds, use that for the next interval. If poll_after_seconds is absent or null, fall back to polling every 30 seconds for the first 5 minutes, then every 60 seconds.
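The interval selection described above reduces to a small function. This is a sketch; `next_poll_interval` is a hypothetical helper, and treating a zero or negative hint as absent is an assumption rather than documented API behaviour.

```python
def next_poll_interval(elapsed_seconds, poll_after_seconds=None):
    """Return how many seconds to wait before the next single poll."""
    # A server-provided hint wins whenever it is present and positive.
    if poll_after_seconds is not None and poll_after_seconds > 0:
        return float(poll_after_seconds)
    # Fallback: every 30 s for the first 5 minutes, then every 60 s.
    return 30.0 if elapsed_seconds < 300 else 60.0


print(next_poll_interval(elapsed_seconds=120))
```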
Agent responsibility: The agent decides how to schedule polls (intervals, background tasks, etc). Do not use long-running background processes — use single polls at intervals.
Keep . source and curl in the same command chain. A shell sleep or process poll placed between them as a separate invocation breaks credential loading, because the loaded environment variables are lost.
Correct: Single shell invocation containing the full chain (see poll command above).
Wrong: Separating . + sleep + curl into different shell invocations.
If your agent cannot chain commands: Use the agent-native polling mechanism (background execution, process polling, etc) with the full command as one unit.
Response handling for completed polls:
- URL: use original_data_url for delivery (full-resolution).
- Dimensions: responses[].output_assets[].width/height (NOT from the submission response).
- Credits: responses[].metadata.credits_spent.
- Expiry: responses[].output_assets[].expires_at; display in the user's local timezone in the delivery summary.
After 10 minutes of cumulative polling for image/text (or 100 minutes for video), STOP polling and inform the user:
"⚠️ Poll timeout: generation has been processing for [10/100] minutes. The API poll endpoint still says 'processing' but this may be stale — generations that time out (600s image/text, 6000s video) or get blocked by content safety often don't update the poll status. Check the Katana dashboard at https://app.imgnai.com/katana-api for the real status. Should I keep polling, or consider this failed?"
WAIT for user response before continuing:
Track cumulative poll time via wall-clock: record submission timestamp after confirmation, check elapsed time before each poll cycle.
API timeout values: Image/text = 600s (10 min). Video = 6000s (100 min). The 10-minute guard is appropriate for image/text but too aggressive for video — video jobs may legitimately run for up to 100 minutes. Use 100-minute guard for video requests.
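The wall-clock guard can be checked before each poll cycle as sketched below. `guard_exceeded` is a hypothetical helper; the limits are the timeout values stated above, and timestamps are assumed to be epoch seconds.

```python
import time

# Guard limits in seconds, matching the API timeout values above.
GUARD_SECONDS = {"image": 600, "text": 600, "video": 6000}


def guard_exceeded(kind, submitted_at, now=None):
    """True when cumulative polling time has reached the guard for this kind."""
    current = time.time() if now is None else now
    return (current - submitted_at) >= GUARD_SECONDS[kind]


# A video request submitted 15 minutes ago is still within its 100-minute guard.
print(guard_exceeded("video", submitted_at=0, now=900))
```

When the guard trips, the agent stops polling and asks the user how to proceed; it does not retry on its own.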
This guard exists because the API poll endpoint has been observed returning "processing" even after:
The only reliable source of truth for stale generations is the Katana dashboard.
After submitting any async generation, IMMEDIATELY write the request metadata to a persistence file. Use the same KATANA_SECRETS_FILE env var pattern for the path, defaulting to the secrets directory:
import datetime
import json
import os

# State directory: KATANA_STATE_DIR if set, otherwise the directory holding the secrets file.
base = os.environ.get('KATANA_STATE_DIR', os.path.dirname(os.environ.get('KATANA_SECRETS_FILE', os.path.expanduser('~/.openclaw/secrets/katana.env'))))
path = os.path.join(base, 'katana_pending.json')
# Replace REQUEST_ID, MODEL, CREDITS and PROMPT_SUMMARY with the real values before running.
meta = {
    'request_id': 'REQUEST_ID',
    'model': 'MODEL',
    'credits': CREDITS,  # numeric placeholder, e.g. 0.1
    'submitted': datetime.datetime.now().isoformat(),
    'prompt': 'PROMPT_SUMMARY',
    'status': 'processing'
}
with open(path, 'w') as f:
    json.dump(meta, f)
print(f'written: {path}')
This file survives compaction. On recovery (after compaction, after tool result loss, or at heartbeat), use the same path derivation:
import json
import os

# Same path derivation as when the file was written.
base = os.environ.get('KATANA_STATE_DIR', os.path.dirname(os.environ.get('KATANA_SECRETS_FILE', os.path.expanduser('~/.openclaw/secrets/katana.env'))))
path = os.path.join(base, 'katana_pending.json')
if os.path.exists(path):
    with open(path) as f:
        meta = json.load(f)
    print(f'request_id={meta["request_id"]} status={meta["status"]} model={meta["model"]}')
Recovery steps:
Delete the persistence file ONLY when the generation reaches a terminal state (completed, failed, delivered to user). Never delete while still processing.
This prevents the pattern where: agent submits → compaction happens → agent forgets → user has to ask for status manually.
- Submission response (requests[].width/height): PREVIEW dimensions, NOT actual output size.
- Completed poll response (responses[].output_assets[].width/height): ACTUAL output dimensions.
Always report dimensions from the completed poll response, never from the submission acknowledgement.
- original_data_url: full-resolution original. Always use this for delivery.
- url: may be a compressed/reduced version. Do NOT use for delivery.
- thumbnail_image_url: small thumbnail only.
- thumbnail_silent_video_mp4_url: silent lightweight MP4 preview for video galleries/hover previews. This is only the thumbnail preview, NOT the full video; the full video output (via original_data_url) may include generated audio.
- final_frame_image_url: last-frame still image for completed videos. Use as first-frame input for video continuation workflows. Blank string when unavailable.
responses[].output_assets[].metadata.tags contains CLIP-derived tags with confidence scores (e.g. {"tag": "ceramic_mug", "confidence": 0.94}). Only available on in-house imgnAI models; external/provider-hosted models return no CLIP-tag metadata.
Completed media responses may normalize requests[].model and responses[].metadata.model (e.g. legacy key → canonical key). Use GET /v1/models for canonical display names.
- responses[].started_at: item-level processing start timestamp
- responses[].completed_at: item-level processing end timestamp
- created_at: top-level request submission timestamp
- updated_at: top-level request last-modified timestamp
- responses[].output_assets[].kind: asset type (e.g. "image", "video")
- responses[].output_assets[].mime_type: MIME type (e.g. "image/png", "video/mp4")
Data is under responses[].output_assets[]. Do NOT look for results[].url; that is NOT the Katana response shape.
output object: Do NOT send an output object for ordinary integrations. This is for internal/special use only.
Build the JSON payload in a temp file (required for large payloads and to avoid secrets in process listings):
import json
import tempfile

# Request payload: a single video generation item.
payload = {"requests": [{"type": "video", "model": "seedance-2-0-fast", "prompt": "<prompt>", "duration_seconds": 5, "aspect_ratio": "16:9"}]}
# delete=False keeps the file on disk so that curl can read it after this script exits.
with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as f:
    json.dump(payload, f)
    tmpfile = f.name
print(tmpfile)
Write auth headers to a temp file to keep secrets out of /proc/*/cmdline. Source credentials at the start of each command chain.
Image/Video requests (X-API-Key + X-API-Secret):
. "${KATANA_SECRETS_FILE:-$HOME/.openclaw/secrets/katana.env}" && _H=$(mktemp) && chmod 600 "$_H" && printf 'Content-Type: application/json\nX-API-Key: %s\nX-API-Secret: %s\n' "$KATANA_API_KEY" "$KATANA_API_SECRET" > "$_H" && curl -s -X POST "https://kat.imgnai.com/v1/generation-requests?wait=false" -H @"$_H" -d @"$tmpfile" && rm -f "$_H" && rm -f "$tmpfile"
The printf format string writes two separate header lines: X-API-Key with the key value, and X-API-Secret with the secret value. Two %s format specifiers consume the two shell variable arguments. No extra literal text appears in the header values.
Text/LLM requests (Bearer auth):
. "${KATANA_SECRETS_FILE:-$HOME/.openclaw/secrets/katana.env}" && _H=$(mktemp) && chmod 600 "$_H" && printf 'Content-Type: application/json\nAuthorization: Bearer %s:%s\n' "$KATANA_API_KEY" "$KATANA_API_SECRET" > "$_H" && curl -s -X POST "https://kat.imgnai.com/v1/chat/completions" -H @"$_H" -d @"$tmpfile" && rm -f "$_H" && rm -f "$tmpfile"
Parse the JSON response. Extract request_id. Deliver confirmation to the user (model, cost, request_id).
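Parsing the submission response and applying the terminal-response rule might look like the sketch below. It assumes that `request_id`, `status` and `poll_after_seconds` are top-level fields of the submission response, which this document implies but does not spell out; `inspect_submission` is a hypothetical helper.

```python
def inspect_submission(resp: dict) -> dict:
    """Decide whether to report a terminal submission or start polling."""
    items = resp.get("responses") or []
    all_rejected = bool(items) and all(i.get("status") == "rejected" for i in items)
    if resp.get("status") in ("failed", "rejected") or all_rejected:
        # Terminal: collect item-level and top-level errors, do NOT poll.
        errors = [i["error"] for i in items if i.get("error")]
        if resp.get("error"):
            errors.append(resp["error"])
        return {"action": "report", "errors": errors}
    return {
        "action": "poll",
        "request_id": resp.get("request_id"),
        "poll_after_seconds": resp.get("poll_after_seconds"),
    }


print(inspect_submission({"request_id": "abc", "status": "processing",
                          "poll_after_seconds": 5}))
```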
- aspect_ratio: "auto" inspects the first image input and chooses the closest supported ratio. Defaults to 1:1 if no image is supplied.
- is_fast/fast_mode request lower-cost half-resolution generation (imgnAI-hosted models only).
- is_uhd/uhd_mode request UHD generation (imgnAI-hosted models only). Takes precedence over is_fast. On Pink Image, this is "Enhanced Quality" mode.
- use_assistant/prompt_assist translate natural language to tag-style prompts (tag/booru models only).
- output_format accepts png, jpeg, or webp (jpg is an alias for jpeg).
Balance check:
. "${KATANA_SECRETS_FILE:-$HOME/.openclaw/secrets/katana.env}" && _H=$(mktemp) && chmod 600 "$_H" && printf 'Authorization: Bearer %s:%s\n' "$KATANA_API_KEY" "$KATANA_API_SECRET" > "$_H" && curl -s "https://kat.imgnai.com/v1/me/balance" -H @"$_H" && rm -f "$_H"
This calls GET /v1/me/balance. The API returns credits as a decimal string; convert it to USD using the current credit rate (see {baseDir}/models.md).
Put video media inputs in video_image_data:
- first_frame_image_url: first/source frame image (HTTPS URL, data URL, or raw base64)
- mid_frame_image_url: mid-frame image (only if the model supports it)
- last_frame_image_url: last/end frame image (only if the model supports it)
- reference_image_urls: array of reference images (only if the model supports reference images). Obey maximum_reference_images.
- audio_input_urls: array of audio reference URLs (only if the model supports audio input). Obey maximum_reference_audio_files and the global cap of 4.
- video_list: array of video input clip objects. Each requires url. Optional start/ends second offsets when the model has video_offset_allowed. Only for models with supports_video_input: true. Obey maximum_reference_videos.
- Models with audio_gen_model: false produce silent output. See the {baseDir}/models.md Audio Out column and the {baseDir}/workflows/video.md Audio Output section for details.
Compatibility aliases: top-level image_url, input_image_url, input_image, input_image_b64 map to first_frame_image_url. Top-level reference_image_urls maps to video_image_data.reference_image_urls.
Rules:
- … file:// URLs, or http:// media URLs: use HTTPS URLs or Base64 data URLs
- Do not send video_list to models where supports_video_input is false/missing
- Do not send audio inputs when supports_audio_input is false
- … video_image_data object: omit missing fields entirely
- … custom_rules allow it
- Use durations from video_lengths_and_costs and aspect ratios from supported_aspects
- aspect_ratio: "auto" uses the first frame first, then the first reference image; defaults to 1:1
Always inspect each video model's custom_rules before composing requests:
| Rule | Description |
|---|---|
| audio_15s_max | Combined audio input limited to 15 seconds |
| audio_drives_duration | Video duration follows audio duration |
| audio_ff_only | Audio only with first-frame conditioning |
| audio_needs_reference_image | Audio input requires at least one reference image |
| audio_or_fflf_exclusive | Audio cannot combine with first/last frame |
| input_video_drives_length | Input video clip drives output length |
| lf_needs_ff | Last frame requires a first frame |
| reference_ff_only | Reference images may combine with first-frame only |
| reference_is_voice_timbre | Reference audio interpreted as voice timbre when images present |
| reference_no_ff_or_lf | Reference images cannot combine with first/last frame |
| video_offset_allowed | Model accepts start/ends second offsets in video_list |
| video_required | Model requires at least one video_list object |
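A few of these rules can be checked mechanically before a request is composed. The sketch below covers only five of them and assumes that a model's `custom_rules` is a list of rule-name strings, which this document does not confirm; `check_custom_rules` is a hypothetical helper.

```python
def check_custom_rules(custom_rules, video_image_data):
    """Return a list of rule violations for a video_image_data object."""
    rules = set(custom_rules)
    v = video_image_data
    has_ff = bool(v.get("first_frame_image_url"))
    has_lf = bool(v.get("last_frame_image_url"))
    has_refs = bool(v.get("reference_image_urls"))
    has_audio = bool(v.get("audio_input_urls"))
    has_video = bool(v.get("video_list"))

    problems = []
    if "lf_needs_ff" in rules and has_lf and not has_ff:
        problems.append("lf_needs_ff: last frame requires a first frame")
    if "reference_no_ff_or_lf" in rules and has_refs and (has_ff or has_lf):
        problems.append("reference_no_ff_or_lf: reference images cannot combine with first/last frame")
    if "audio_needs_reference_image" in rules and has_audio and not has_refs:
        problems.append("audio_needs_reference_image: audio input requires a reference image")
    if "audio_or_fflf_exclusive" in rules and has_audio and (has_ff or has_lf):
        problems.append("audio_or_fflf_exclusive: audio cannot combine with first/last frame")
    if "video_required" in rules and not has_video:
        problems.append("video_required: at least one video_list object is required")
    return problems


print(check_custom_rules(["lf_needs_ff"],
                         {"last_frame_image_url": "https://example.com/end.png"}))
```

An empty list means none of the checked rules is violated; rules outside this subset still need manual inspection.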
- supported_aspects
- video_lengths_and_costs
- audio_input_urls when supports_audio_input: true
- video_list when model has matching support flag
- video_required must include video_list
- start and ends in video_list objects when the model has video_offset_allowed in custom_rules
- lf_needs_ff
- custom_rules, especially reference_no_ff_or_lf
- multi_image_inputs_allowed (image) or maximum_reference_images (video)
reference_assets is an alternative to image_urls/video_image_data for providing media inputs with explicit role labels. Each asset has a kind and either url or base64_data.
Accepted image-like asset kinds:
- source_image: primary source/input image
- image: generic image input
- mask: mask for inpainting/editing
- style_reference: style transfer reference
- start_frame: starting frame for animation
Example:
{
"reference_assets": [
{"kind": "source_image", "url": "https://example.com/product.png"},
{"kind": "style_reference", "base64_data": "data:image/jpeg;base64,..."}
]
}
Image kinds for video:
- style_reference, reference_image, image: map to video reference images
Audio kinds for video:
- audio, source_audio, reference_audio, audio_reference: map to audio reference inputs
Example:
{
"reference_assets": [
{"kind": "reference_image", "url": "https://example.com/person.png"},
{"kind": "audio", "url": "https://example.com/voice.mp3"}
]
}
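Building reference_assets entries with a kind check could look like the sketch below. It assumes that the first kind list applies to image requests and the other two to video requests; `make_asset` and its `request_type` argument are hypothetical, and the HTTPS check follows the HTTPS-only rule stated earlier.

```python
IMAGE_KINDS = {"source_image", "image", "mask", "style_reference", "start_frame"}
VIDEO_IMAGE_KINDS = {"style_reference", "reference_image", "image"}
VIDEO_AUDIO_KINDS = {"audio", "source_audio", "reference_audio", "audio_reference"}


def make_asset(kind, *, url=None, base64_data=None, request_type="image"):
    """Build one reference_assets entry with a kind and either url or base64_data."""
    allowed = IMAGE_KINDS if request_type == "image" else VIDEO_IMAGE_KINDS | VIDEO_AUDIO_KINDS
    if kind not in allowed:
        raise ValueError(f"kind {kind!r} is not accepted for {request_type} requests")
    if (url is None) == (base64_data is None):
        raise ValueError("provide exactly one of url or base64_data")
    if url is not None:
        if not url.startswith("https://"):
            raise ValueError("media URLs must use HTTPS")
        return {"kind": kind, "url": url}
    return {"kind": kind, "base64_data": base64_data}


payload_fragment = {"reference_assets": [
    make_asset("reference_image", url="https://example.com/person.png", request_type="video"),
    make_asset("audio", url="https://example.com/voice.mp3", request_type="video"),
]}
print(payload_fragment)
```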
This skill was built from the Katana API llms.txt reference document.
Last synced: 2026-06-08
llms.txt URL: https://kat.imgnai.com/llms.txt
Stored checksum: 09a695f3958a6d9f17d4139179e2323600292c929be5c494253ae7df9d1410b3
Before submitting ANY generation request, check if the llms.txt checksum has been verified in the last 24 hours. If stale:
- Fetch: curl -s https://kat.imgnai.com/llms.txt
- Hash the result with sha256sum (Linux) or shasum -a 256 (macOS) and compare it with the stored checksum.
When llms.txt changes, compare old vs new holistically. Diff the full documents; do not limit the review to a predefined checklist. Document ALL changes found and update all affected skill files accordingly: models.md, SKILL.md, workflow files.
DO NOT auto-update without user confirmation.
Explicit approval rule: During the llms.txt update process, always summarise ALL changes found and ask the user for explicit permission before updating any skill files (models.md, SKILL.md, workflow files). Do not auto-update without confirmation.
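The staleness and checksum checks can also be done without shell tools. This is a sketch: `verification_stale` and `llms_txt_changed` are hypothetical helpers, the fetched bytes are assumed to come from the llms.txt URL above, and a detected change only triggers the review and approval process, never an automatic update.

```python
import hashlib

STORED_SHA256 = "09a695f3958a6d9f17d4139179e2323600292c929be5c494253ae7df9d1410b3"
MAX_AGE_SECONDS = 24 * 60 * 60  # verification is considered stale after 24 hours


def verification_stale(last_verified_at, now):
    """True when the last checksum verification is older than 24 hours."""
    return (now - last_verified_at) > MAX_AGE_SECONDS


def llms_txt_changed(fetched: bytes, stored_hex: str = STORED_SHA256) -> bool:
    """True when the SHA-256 of the fetched document differs from the stored one."""
    return hashlib.sha256(fetched).hexdigest() != stored_hex.lower()


print(verification_stale(last_verified_at=0, now=90000))
```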
Deliver the generated media to the user via your agent's messaging/file capability. Include: model name, resolution/dimensions, credits, dollar cost, description, and the full-res URL (original_data_url).
ALL image and video deliveries MUST include the full download URL (original_data_url) as clickable text in the delivery message — not just the inline media attachment.
Users need the URL to:
Include ALL URLs returned — original_data_url, thumbnail_image_url, final_frame_image_url, thumbnail_silent_video_mp4_url — any URL the API returns for the asset. Do not assume the user only wants one.
Example:
MEDIA:https://k.imgnai.com/abc123.mp4
🔗 Full-res: https://k.imgnai.com/abc123.mp4
🖼️ Thumbnail: https://k.imgnai.com/def456.jpg
🎞️ Silent preview: https://k.imgnai.com/ghi789.mp4
⏰ Expires: Sat 16 May 2026, 14:00 BST
ALL image and video generation summaries MUST include:
- The expiry time, taken from responses[].output_assets[].expires_at in the completed poll response; convert it to the user's local timezone for display.
Example format:
⏰ Expires: Sat 16 May 2026, 14:00 BST. Download before expiry if you need it long-term.
Do NOT calculate expiry manually. The API provides expires_at in the poll response. Use it directly. The 72h retention window may change server-side; expires_at is always authoritative.
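Converting expires_at for display might look like the sketch below. It assumes the value is an ISO 8601 timestamp (the format is not stated in this document), and `format_expiry` is a hypothetical helper; the fixed BST offset stands in for the user's real timezone.

```python
from datetime import datetime, timedelta, timezone


def format_expiry(expires_at: str, tz: timezone) -> str:
    """Render an ISO 8601 expires_at value in the given timezone."""
    # Older Python versions reject a trailing "Z", so normalise it first.
    dt = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive values are UTC
    local = dt.astimezone(tz)
    return f"⏰ Expires: {local.strftime('%a %d %b %Y, %H:%M %Z')}"


BST = timezone(timedelta(hours=1), "BST")
print(format_expiry("2026-05-16T13:00:00Z", BST))
```

The timestamp is only reformatted, never recomputed, so the API value stays authoritative.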
For text/LLM: return the model's response verbatim. Then send a separate follow-up message with a cost summary per the "Cost Reporting" section above. Text completions do not require an expiry warning (no media URL to expire).