This Skill consolidates six multimodal media capabilities into reusable workflows and implementation templates, all routed through SkillBoss API Hub (https://api.skillbossai.com/v1/pilot):
> Convention: All API calls go through SkillBoss API Hub /v1/pilot, which automatically routes to the optimal underlying model. Authentication uses a single SKILLBOSS_API_KEY.
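1) Do you need to produce images? Use type: "image" (generation or editing).
2) Do you need to understand images? Use type: "chat" with image content.
3) Do you need to produce video? Use type: "video".
4) Do you need to understand video? Use type: "chat" with video content.
5) Do you need to read text aloud? Use type: "tts".
6) Do you need to understand audio? Use type: "stt" for transcription, or type: "chat" with audio content for description.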
HTTP client: fetch, which is built into Node.js 18+, so no extra install is needed. For older environments you can use: npm install node-fetch
Authentication: set the SKILLBOSS_API_KEY environment variable; every request sends it as Authorization: Bearer $SKILLBOSS_API_KEY.
All examples below use this shared pilot() helper:
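const SKILLBOSS_API_KEY = process.env.SKILLBOSS_API_KEY;
const API_BASE = "https://api.skillbossai.com/v1";
if (!SKILLBOSS_API_KEY) throw new Error("SKILLBOSS_API_KEY is not set");

// POST a request body to /v1/pilot and return the parsed JSON response.
async function pilot(body) {
  const r = await fetch(`${API_BASE}/pilot`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${SKILLBOSS_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  // Surface HTTP errors instead of silently parsing an error payload
  if (!r.ok) throw new Error(`pilot request failed: ${r.status} ${await r.text()}`);
  return r.json();
}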
Response fields: images at result.image_url (URL) or result.images[0].url; audio at result.audio_url; video at result.video_url (long-running tasks may require polling).
> SkillBoss API Hub /v1/pilot automatically routes to the optimal underlying model. Use prefer to control the trade-off:
> - "quality" — best output quality
> - "price" — lowest cost
> - "balanced" — balanced quality/cost (default)
No need to specify model names manually. The hub selects the best available model for the requested capability.
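A minimal sketch of assembling request bodies with the prefer values listed above. buildPilotBody is a hypothetical helper, not part of the SkillBoss API: it only checks prefer against the three documented values and omits the field entirely when you want the hub's default ("balanced").

```javascript
// Hypothetical helper: assemble a /v1/pilot request body and reject unknown prefer values.
const PREFER_VALUES = new Set(["quality", "price", "balanced"]);

function buildPilotBody(type, inputs, prefer) {
  const body = { type, inputs };
  if (prefer !== undefined) {
    if (!PREFER_VALUES.has(prefer)) {
      throw new Error(`unknown prefer value: ${prefer}`);
    }
    body.prefer = prefer;
  }
  // When prefer is omitted, the hub applies its default ("balanced")
  return body;
}

// Usage: the returned object can be passed straight to pilot()
const body = buildPilotBody("image", { prompt: "A bowl of ramen" }, "price");
console.log(JSON.stringify(body));
```

Catching a typo such as "cheap" locally is cheaper than discovering it from an API error.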
Node.js minimal template
import * as fs from "node:fs";
const result = await pilot({
type: "image",
inputs: {
prompt: "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme",
},
prefer: "quality",
});
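// The hub may return a single URL or an images array
const imageUrl = result.result?.image_url ?? result.result?.images?.[0]?.url;
if (!imageUrl) throw new Error(`no image URL in response: ${JSON.stringify(result)}`);
console.log("Image URL:", imageUrl);
// Download and save the image
const imgResponse = await fetch(imageUrl);
if (!imgResponse.ok) throw new Error(`download failed: ${imgResponse.status}`);
const buffer = Buffer.from(await imgResponse.arrayBuffer());
fs.writeFileSync("out.png", buffer);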
REST (curl) minimal template
curl -s -X POST "https://api.skillbossai.com/v1/pilot" \
-H "Authorization: Bearer $SKILLBOSS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"type": "image",
"inputs": {
"prompt": "Create a picture of a nano banana dish in a fancy restaurant",
"aspect_ratio": "16:9"
},
"prefer": "quality"
}'
# Image URL is at: .result.image_url
Use case: given an image, add/remove/modify elements, change style, color grading, etc.
Node.js minimal template
import * as fs from "node:fs";
const imageBase64 = fs.readFileSync("input.png").toString("base64");
const result = await pilot({
type: "image",
inputs: {
prompt: "Add a nano banana on the table, keep lighting consistent, cinematic tone.",
image_data: imageBase64,
image_mime_type: "image/png",
},
prefer: "quality",
});
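// The hub may return a single URL or an images array
const imageUrl = result.result?.image_url ?? result.result?.images?.[0]?.url;
if (!imageUrl) throw new Error(`no image URL in response: ${JSON.stringify(result)}`);
const imgResponse = await fetch(imageUrl);
if (!imgResponse.ok) throw new Error(`download failed: ${imgResponse.status}`);
const buffer = Buffer.from(await imgResponse.arrayBuffer());
fs.writeFileSync("edited.png", buffer);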
Best practice: use multiple sequential calls with the previous output fed back as image_data for continuous iteration (e.g., generate first, then "only edit a specific region/element", then "make variants in the same style").
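The feed-the-output-back pattern described above can be sketched as a loop. iterateEdits is a hypothetical helper: it takes the pilot() helper as a parameter (so it can be exercised with stubs), assumes every round returns a PNG because image_mime_type is hard-coded, and downloads each output URL to re-encode it as Base64 for the next call.

```javascript
// Hypothetical helper: run a chain of edit prompts, feeding each output image
// back in as image_data for the next round.
// pilotFn is the shared pilot() helper; fetchFn defaults to the global fetch.
async function iterateEdits(pilotFn, initialBase64, prompts, fetchFn = fetch) {
  let currentBase64 = initialBase64;
  const urls = [];
  for (const prompt of prompts) {
    const result = await pilotFn({
      type: "image",
      inputs: {
        prompt,
        image_data: currentBase64,
        image_mime_type: "image/png", // assumption: each round's output is PNG
      },
      prefer: "quality",
    });
    const url = result?.result?.image_url ?? result?.result?.images?.[0]?.url;
    if (!url) throw new Error(`no image returned for prompt: ${prompt}`);
    urls.push(url);
    // Download the new image and re-encode it as Base64 for the next round
    const res = await fetchFn(url);
    if (!res.ok) throw new Error(`download failed: ${res.status}`);
    currentBase64 = Buffer.from(await res.arrayBuffer()).toString("base64");
  }
  return { finalBase64: currentBase64, urls };
}
```

In a real script you would call it as `await iterateEdits(pilot, fs.readFileSync("input.png").toString("base64"), ["Add a nano banana on the table", "Make the banana glossy; change nothing else"])` and write `Buffer.from(finalBase64, "base64")` to disk. Keeping each prompt narrow ("only edit X") tends to preserve the rest of the image between rounds.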
Pass these in the inputs object:
- aspect_ratio: e.g. "16:9", "1:1"
- size: e.g. "1024x1024", "1024x576" (16:9)

import * as fs from "node:fs";
const imageBase64 = fs.readFileSync("image.jpg").toString("base64");
const result = await pilot({
type: "chat",
inputs: {
messages: [
{
role: "user",
content: [
{
type: "image_url",
image_url: { url: `data:image/jpeg;base64,${imageBase64}` },
},
{
type: "text",
text: "Caption this image, and list any visible brands.",
},
],
},
],
},
prefer: "balanced",
});
const text = result["result"]["choices"][0]["message"]["content"];
console.log(text);
const result = await pilot({
type: "chat",
inputs: {
messages: [
{
role: "user",
content: [
{
type: "image_url",
image_url: { url: "https://example.com/image.jpg" },
},
{ type: "text", text: "Caption this image." },
],
},
],
},
prefer: "balanced",
});
const text = result["result"]["choices"][0]["message"]["content"];
console.log(text);
Append multiple images as multiple entries in the content array; you can mix URLs and inline Base64 bytes.
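A minimal sketch of a multi-image message in the content-array shape used by the examples above. buildImageMessage is a hypothetical helper; each image is given either as `{ url }` or as `{ base64, mimeType }`, and the Base64 string below is a truncated placeholder (just a PNG signature), not a real image.

```javascript
// Hypothetical helper: build one user message that mixes image URLs, inline
// Base64 images (as data: URLs), and a final text question.
function buildImageMessage(images, question) {
  const content = images.map((img) => {
    const url = img.url ?? `data:${img.mimeType};base64,${img.base64}`;
    return { type: "image_url", image_url: { url } };
  });
  content.push({ type: "text", text: question });
  return { role: "user", content };
}

const message = buildImageMessage(
  [
    { url: "https://example.com/a.jpg" },
    { base64: "iVBORw0KGgo=", mimeType: "image/png" }, // placeholder bytes
  ],
  "What differs between these two images?",
);
console.log(message.content.length); // 3
```

The message is then sent as `await pilot({ type: "chat", inputs: { messages: [message] }, prefer: "balanced" })`. Putting the question after the images mirrors the ordering in the single-image examples.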
import * as fs from "node:fs";
const result = await pilot({
type: "video",
inputs: {
prompt: "A cinematic shot of a cat astronaut walking on the moon. Include subtle wind ambience.",
duration: 8,
aspect_ratio: "16:9",
resolution: "1080p",
},
prefer: "quality",
});
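const videoUrl = result.result?.video_url;
if (!videoUrl) throw new Error("no video_url yet; long-running tasks may require polling");
console.log("Video URL:", videoUrl);
// Download and save
const videoResponse = await fetch(videoUrl);
if (!videoResponse.ok) throw new Error(`download failed: ${videoResponse.status}`);
const buffer = Buffer.from(await videoResponse.arrayBuffer());
fs.writeFileSync("out.mp4", buffer);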
Pass these in the inputs object:
- aspect_ratio: "16:9" or "9:16"
- resolution: "720p" | "1080p" | "4k"
- duration: duration in seconds (default 8)

Retry with timeout
const deadline = Date.now() + 300_000; // give up after 5 minutes
let videoUrl = null;
while (Date.now() < deadline) {
  try {
    const result = await pilot({
      type: "video",
      inputs: { prompt: "...", duration: 8 },
      prefer: "quality",
    });
    videoUrl = result?.result?.video_url;
    if (videoUrl) break;
  } catch (e) {
    console.warn("video request failed, retrying:", e.message);
  }
  // Wait 5 s before the next attempt, whether the call failed or returned no URL
  await new Promise((resolve) => setTimeout(resolve, 5000));
}
if (!videoUrl) throw new Error("video generation timed out");
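Because a video call is slow and may be costly, it can help to validate the inputs listed above before sending them. buildVideoInputs is a hypothetical helper: it checks only the documented aspect_ratio and resolution values (case-sensitively, so "4K" is rejected), requires a positive duration, and leaves any unset field out so the hub's defaults apply.

```javascript
// Hypothetical helper: validate video inputs against the documented options
// and map camelCase options onto the snake_case field names the API uses.
const VIDEO_ASPECT_RATIOS = ["16:9", "9:16"];
const VIDEO_RESOLUTIONS = ["720p", "1080p", "4k"];

function buildVideoInputs({ prompt, aspectRatio, resolution, duration }) {
  if (!prompt) throw new Error("prompt is required");
  const inputs = { prompt };
  if (aspectRatio !== undefined) {
    if (!VIDEO_ASPECT_RATIOS.includes(aspectRatio)) {
      throw new Error(`unsupported aspect_ratio: ${aspectRatio}`);
    }
    inputs.aspect_ratio = aspectRatio;
  }
  if (resolution !== undefined) {
    if (!VIDEO_RESOLUTIONS.includes(resolution)) {
      throw new Error(`unsupported resolution: ${resolution}`);
    }
    inputs.resolution = resolution;
  }
  if (duration !== undefined) {
    if (!Number.isFinite(duration) || duration <= 0) {
      throw new Error(`duration must be a positive number of seconds: ${duration}`);
    }
    inputs.duration = duration;
  }
  // Fields left unset fall back to the hub's defaults (duration defaults to 8 s)
  return inputs;
}

const videoInputs = buildVideoInputs({
  prompt: "A paper boat drifting down a rainy street",
  aspectRatio: "9:16",
  duration: 8,
});
console.log(videoInputs);
```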
const result = await pilot({
type: "chat",
inputs: {
messages: [
{
role: "user",
content: [
{
type: "video_url",
video_url: { url: "https://example.com/sample.mp4" },
},
{
type: "text",
text: "Summarize this video. Provide timestamps for key events.",
},
],
},
],
},
prefer: "balanced",
});
const text = result["result"]["choices"][0]["message"]["content"];
console.log(text);
import * as fs from "node:fs";
const result = await pilot({
type: "tts",
inputs: {
text: "Say cheerfully: Have a wonderful day!",
voice: "Kore",
},
prefer: "balanced",
});
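const audioUrl = result.result?.audio_url;
if (!audioUrl) throw new Error(`no audio_url in response: ${JSON.stringify(result)}`);
console.log("Audio URL:", audioUrl);
// Download and save
const audioResponse = await fetch(audioUrl);
if (!audioResponse.ok) throw new Error(`download failed: ${audioResponse.status}`);
const buffer = Buffer.from(await audioResponse.arrayBuffer());
fs.writeFileSync("out.mp3", buffer);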
Pass multiple text segments with speaker labels in the text field, using a structured format like "[Speaker1]: Hello\n[Speaker2]: Hi there".
The voice field supports named voices (e.g., "alloy", "Kore", "Zephyr", "Puck").
Prefix the text with style directions, e.g.: "Speak in a calm, professional tone: [your content here]".
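The speaker-label format and the style prefix described above can be combined in one request body. buildTtsBody is a hypothetical helper; it places the style direction on its own line before the script, and it sets a single voice for the whole request, since this document does not say how per-speaker voices are assigned.

```javascript
// Hypothetical helper: format dialogue turns as "[Speaker]: line" text,
// optionally preceded by a director-style instruction, for type: "tts".
function buildTtsBody(turns, { voice, style } = {}) {
  const script = turns.map(({ speaker, line }) => `[${speaker}]: ${line}`).join("\n");
  const text = style ? `${style}\n${script}` : script;
  const inputs = { text };
  if (voice) inputs.voice = voice;
  return { type: "tts", inputs, prefer: "balanced" };
}

const ttsBody = buildTtsBody(
  [
    { speaker: "Speaker1", line: "Hello" },
    { speaker: "Speaker2", line: "Hi there" },
  ],
  { voice: "Kore", style: "Speak in a calm, professional tone:" },
);
console.log(ttsBody.inputs.text);
```

Pass `ttsBody` to pilot() and read result.audio_url exactly as in the single-speaker template.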
import * as fs from "node:fs";
import { Buffer } from "node:buffer";
const audioB64 = fs.readFileSync("sample.mp3").toString("base64");
const result = await pilot({
type: "stt",
inputs: {
audio_data: audioB64,
filename: "sample.mp3",
},
});
const transcript = result["result"]["text"];
console.log(transcript);
import * as fs from "node:fs";
// Uses the shared pilot() helper defined above
const audioB64 = fs.readFileSync("sample.mp3").toString("base64");
const result = await pilot({
type: "chat",
inputs: {
messages: [
{
role: "user",
content: [
{
type: "audio_url",
audio_url: { url: `data:audio/mp3;base64,${audioB64}` },
},
{ type: "text", text: "Describe this audio clip." },
],
},
],
},
prefer: "balanced",
});
const text = result["result"]["choices"][0]["message"]["content"];
console.log(text);
1) Generate product images via type: "image" (specify negative space and consistent lighting in the prompt).
2) Use type: "chat" with image understanding for self-check: verify text clarity, brand spelling, and unsafe elements.
3) If not satisfied, feed the generated image into editing and iterate.
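The three steps above can be sketched as one loop. generateWithSelfCheck is hypothetical: pilot() is injected as pilotFn so the loop can be tested offline, the "reply PASS" review convention is an assumption of this sketch rather than an API feature, and the edited image is assumed to be PNG. maxRounds caps the number of reviews; if the last review still fails, the latest image is returned with passed: false.

```javascript
// Hypothetical workflow sketch: generate, self-check with chat, then edit and re-check.
async function generateWithSelfCheck(pilotFn, prompt, { maxRounds = 3, fetchFn = fetch } = {}) {
  const urlOf = (r) => r?.result?.image_url ?? r?.result?.images?.[0]?.url;
  // Step 1: generate the initial image
  let imageUrl = urlOf(await pilotFn({ type: "image", inputs: { prompt }, prefer: "quality" }));

  for (let round = 1; round <= maxRounds; round++) {
    if (!imageUrl) throw new Error("image step returned no URL");
    // Step 2: ask a vision-capable chat model to review the current image
    const review = await pilotFn({
      type: "chat",
      inputs: {
        messages: [{
          role: "user",
          content: [
            { type: "image_url", image_url: { url: imageUrl } },
            {
              type: "text",
              text: "Check text clarity, brand spelling, and unsafe elements. " +
                "Reply PASS if there are no problems; otherwise list the fixes needed.",
            },
          ],
        }],
      },
      prefer: "balanced",
    });
    const verdict = review.result.choices[0].message.content.trim();
    if (verdict.startsWith("PASS")) return { imageUrl, passed: true, rounds: round };
    if (round === maxRounds) break; // out of review rounds; last image stays unverified

    // Step 3: download the image and feed it back into editing with the review notes
    const res = await fetchFn(imageUrl);
    if (!res.ok) throw new Error(`download failed: ${res.status}`);
    const imageData = Buffer.from(await res.arrayBuffer()).toString("base64");
    imageUrl = urlOf(await pilotFn({
      type: "image",
      inputs: {
        prompt: `Fix only these issues and keep everything else unchanged: ${verdict}`,
        image_data: imageData,
        image_mime_type: "image/png", // assumption: generated images are PNG
      },
      prefer: "quality",
    }));
  }
  return { imageUrl, passed: false, rounds: maxRounds };
}
```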
1) Generate a short video with type: "video" (include dialogue or SFX in the prompt).
2) Download and save the video.
3) Use type: "chat" with video to produce a storyboard + timestamps + narration copy; then feed the copy to type: "tts".
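A minimal sketch of chaining steps 1 and 3 above (the download in step 2 is omitted; reuse the video template's download code). videoWithNarration is hypothetical: pilot() is injected as pilotFn, and asking the model to mark its narration with a "NARRATION:" line is an assumption of this sketch so the copy can be split from the storyboard before it goes to TTS.

```javascript
// Hypothetical workflow sketch: generate a video, have chat write a storyboard
// plus narration for it, then voice the narration.
async function videoWithNarration(pilotFn, videoPrompt, voice = "Kore") {
  const video = await pilotFn({ type: "video", inputs: { prompt: videoPrompt, duration: 8 }, prefer: "quality" });
  const videoUrl = video?.result?.video_url;
  if (!videoUrl) throw new Error("no video_url yet; the task may need polling");

  const storyboard = await pilotFn({
    type: "chat",
    inputs: {
      messages: [{
        role: "user",
        content: [
          { type: "video_url", video_url: { url: videoUrl } },
          {
            type: "text",
            text: "Write a storyboard with timestamps, then a line starting with " +
              "'NARRATION:' followed by the narration copy.",
          },
        ],
      }],
    },
    prefer: "balanced",
  });
  const reply = storyboard.result.choices[0].message.content;
  // Voice everything after the NARRATION: marker; fall back to the whole reply
  const marker = reply.indexOf("NARRATION:");
  const narration = marker >= 0 ? reply.slice(marker + "NARRATION:".length).trim() : reply.trim();

  const speech = await pilotFn({ type: "tts", inputs: { text: narration, voice }, prefer: "balanced" });
  return { videoUrl, storyboard: reply, narration, audioUrl: speech?.result?.audio_url };
}
```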
1) Upload meeting audio and transcribe with type: "stt".
2) Use type: "chat" to summarize or extract specific time ranges.
3) Use type: "tts" to generate a "broadcast" version of the summary.
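The meeting workflow above can be sketched as three chained calls. meetingBroadcast is hypothetical: pilot() is injected as pilotFn, the summary prompt wording is illustrative, and the "calm, professional tone" prefix reuses the director-style convention from the TTS section.

```javascript
// Hypothetical workflow sketch: transcribe meeting audio, summarize it with chat,
// then produce a spoken "broadcast" version of the summary.
async function meetingBroadcast(pilotFn, audioBase64, filename) {
  const stt = await pilotFn({ type: "stt", inputs: { audio_data: audioBase64, filename } });
  const transcript = stt?.result?.text;
  if (!transcript) throw new Error("transcription returned no text");

  const chat = await pilotFn({
    type: "chat",
    inputs: {
      messages: [{
        role: "user",
        content: [{
          type: "text",
          text: `Summarize this meeting transcript as a short spoken briefing:\n\n${transcript}`,
        }],
      }],
    },
    prefer: "balanced",
  });
  const summary = chat.result.choices[0].message.content;

  const tts = await pilotFn({
    type: "tts",
    inputs: { text: `Speak in a calm, professional tone: ${summary}` },
    prefer: "balanced",
  });
  return { transcript, summary, audioUrl: tts?.result?.audio_url };
}
```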
- Authentication: set the SKILLBOSS_API_KEY environment variable.
- type: "image" for image generation, "chat" for understanding tasks, "video" for video generation, "tts" for speech, "stt" for transcription.
- prefer: "quality" for best results, "balanced" for cost efficiency.
- Response fields: image → result.image_url; audio → result.audio_url; video → result.video_url; chat → result.choices[0].message.content; stt → result.text.
- Video: set aspect_ratio / resolution, and download promptly.
- TTS: set the voice name in inputs; use a director-style prefix for tone control.
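The per-type response paths summarized above can be collected into one accessor. extractOutput is a hypothetical helper; it returns undefined when the expected field is missing (for example, a video that is still rendering) and throws only for an unknown type.

```javascript
// Hypothetical helper: pull the primary output out of a /v1/pilot response,
// following the per-type field paths summarized above.
function extractOutput(type, response) {
  const r = response?.result ?? {};
  switch (type) {
    case "image":
      return r.image_url ?? r.images?.[0]?.url;
    case "tts":
      return r.audio_url;
    case "video":
      return r.video_url;
    case "chat":
      return r.choices?.[0]?.message?.content;
    case "stt":
      return r.text;
    default:
      throw new Error(`unknown type: ${type}`);
  }
}

console.log(extractOutput("image", { result: { images: [{ url: "https://example.com/x.png" }] } }));
// prints https://example.com/x.png
```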