
Model Migrate FlagOS

Migrate a model from the latest vLLM upstream repository into the vllm-plugin-FL project (pinned at vLLM v0.13.0). Use this skill whenever someone wants to add support for a new model to vllm-plugin-FL, port model code from upstream vLLM, or backport a newly released model. Trigger when the user says things like "migrate X model", "add X model support", "port X from upstream vLLM", "make X work with the FL plugin", or simply "/model-migrate-flagos model_name". The model_name argument uses snake_
FlagOS Community (众智FlagOS社区)
Uncategorized · community · v1.1.0 · 3 versions · 99200 · Key: not required

Overview

FL Plugin — Model Migration Skill

Usage

/model-migrate-flagos <model_name> [upstream_folder] [plugin_folder]
| Argument | Required | Default |
|---|---|---|
| `model_name` | Yes | |
| `upstream_folder` | No | `/tmp/vllm-upstream-ref` |
| `plugin_folder` | No | current working directory |

Execution

Step 1: Parse arguments and validate paths

Extract from user input:

  • {{model_name}} = first argument (required, snake_case)
  • {{upstream_folder}} = second argument or /tmp/vllm-upstream-ref
  • {{plugin_folder}} = third argument or current working directory

If {{upstream_folder}} doesn't exist, ask user whether to clone it. If {{plugin_folder}} doesn't exist, error out.

→ Tell user: Confirm parsed model name and paths.
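
The parsing and validation rules in Step 1 can be sketched in a few lines of Python. This is an illustration only: `parse_migration_args`, `MigrationArgs`, and the snake_case regular expression are hypothetical names introduced here, not part of the skill's scripts.

```python
import re
from pathlib import Path
from typing import NamedTuple, Optional, Sequence

DEFAULT_UPSTREAM = Path("/tmp/vllm-upstream-ref")
# Assumed definition of snake_case: lowercase words and digits joined by "_".
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


class MigrationArgs(NamedTuple):
    model_name: str
    upstream_folder: Path
    plugin_folder: Path
    upstream_exists: bool  # False means: ask the user whether to clone


def parse_migration_args(
    argv: Sequence[str], cwd: Optional[Path] = None
) -> MigrationArgs:
    """Hypothetical helper mirroring Step 1 (not one of the skill's scripts)."""
    if not argv:
        raise ValueError("model_name is required")
    model_name = argv[0]
    if not SNAKE_CASE.match(model_name):
        raise ValueError(f"model_name must be snake_case, got {model_name!r}")
    upstream = Path(argv[1]) if len(argv) > 1 else DEFAULT_UPSTREAM
    plugin = Path(argv[2]) if len(argv) > 2 else (cwd or Path.cwd())
    if not plugin.is_dir():
        # A missing plugin folder is a hard error.
        raise FileNotFoundError(f"plugin_folder does not exist: {plugin}")
    # A missing upstream folder is not an error; the caller should offer to clone.
    return MigrationArgs(model_name, upstream, plugin, upstream.is_dir())
```

The asymmetry is deliberate: a missing upstream folder is recoverable by cloning, while a missing plugin folder means there is nothing to migrate into.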

Step 2: Load references and resolve placeholders

Read these files (relative to this SKILL.md):

  • references/procedure.md — step-by-step migration procedure
  • references/compatibility-patches.md — 0.13.0 patch catalog
  • references/operational-rules.md — communication, TaskList, bash rules, resilience

The procedure references executable scripts in scripts/:

  • scripts/validate_migration.py — automated code review (Step 6)
  • scripts/benchmark.sh — benchmark verification (Step 9)
  • scripts/serve.sh — serve model locally (Step 10.1, also used for E2E)
  • scripts/request.sh — test request (Step 10.2)
  • scripts/e2e_eval.py — E2E correctness verification (Step 11)
  • scripts/e2e_test_prompts.json — test prompts for E2E (5 text + 5 multimodal)
  • scripts/e2e_config.template.json — E2E config template (copy to e2e_config.json and fill in)
  • scripts/e2e_remote_serve.sh — manage GT server on remote machine via SSH

Then investigate upstream source + HuggingFace to resolve all placeholders:

| Placeholder | How to derive |
|---|---|
| `{{model_name}}` | Direct from argument |
| `{{model_name_lower}}` | Lowercase of `model_name` (usually identical, e.g. `qwen3_5`); used in file paths |
| `{{MODEL_DISPLAY_NAME}}` | From upstream code or HF model card |
| `{{ModelClassName}}` | From upstream model class (PascalCase) |
| `{{model_type}}` | From HF `config.json` `model_type` field |
| `{{ConfigClassName}}` | From upstream or derive from `model_type` |
| `{{skill_root}}` | Absolute path to this skill's folder (the directory containing this SKILL.md) |

Naming conventions vary per model — always verify from actual source, never guess.

→ Tell user: Present all resolved values. Use AskUserQuestion if anything is ambiguous.
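
As a rough sketch of what resolving placeholders can look like, the snippet below reads `model_type` from a HuggingFace `config.json` and substitutes `{{name}}` markers in a string. The helper names `read_model_type` and `render` are assumptions made for this example. The strict failure on unresolved names reflects the "never guess" rule above.

```python
import json
import re
from pathlib import Path
from typing import Mapping

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def read_model_type(config_path: Path) -> str:
    """Read the model_type field from a HuggingFace config.json."""
    with config_path.open(encoding="utf-8") as f:
        return json.load(f)["model_type"]


def render(template: str, values: Mapping[str, str]) -> str:
    """Substitute every {{name}}; fail loudly if any name is unresolved."""
    missing = sorted({m for m in PLACEHOLDER.findall(template) if m not in values})
    if missing:
        raise KeyError(f"unresolved placeholders: {missing}")
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


# Example: build the upstream model path from a resolved placeholder.
values = {"model_name": "qwen3_5", "model_name_lower": "qwen3_5"}
path = render("vllm/model_executor/models/{{model_name_lower}}.py", values)
```

Raising on an unresolved placeholder, rather than leaving the marker in place, keeps a half-resolved template from being written into the plugin tree.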

Step 3: Execute procedure

With placeholders resolved, execute every step in procedure.md sequentially. Apply patches from compatibility-patches.md during the copy-then-patch step. Follow operational-rules.md throughout.

→ Tell user: Before starting, output a numbered plan. Report progress at each step boundary.

Scripts Reference

| Script | Step | Description |
|---|---|---|
| `validate_migration.py` | 6 | Automated import/API/registration checks |
| `benchmark.sh` | 9 | `vllm bench throughput` with dummy weights |
| `serve.sh` | 10, 11 | Start local vLLM server (port 8122, `VLLM_FL_PREFER_ENABLED=false`) |
| `request.sh` | 10 | Quick smoke-test request |
| `e2e_eval.py` | 11 | Token-level comparison vs upstream GT server |
| `e2e_test_prompts.json` | 11 | 5 text + 5 multimodal test prompts |
| `e2e_config.template.json` | 11 | Config template (GT machine, local port, eval params) |
| `e2e_remote_serve.sh` | 11 | SSH-based GT server lifecycle (start/stop/status/logs) |
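
To make "token-level comparison" concrete, here is a minimal sketch of locating the first token at which the local output departs from the GT output. It does not reproduce `e2e_eval.py`, whose actual logic is not shown in this document. The function names are hypothetical, and the thresholds (first 5 tokens, token #15 and later) are borrowed from the Troubleshooting table.

```python
from typing import Optional, Sequence


def first_divergence(
    local: Sequence[int], reference: Sequence[int]
) -> Optional[int]:
    """Return the 0-based index of the first differing token, or None if equal."""
    for i, (a, b) in enumerate(zip(local, reference)):
        if a != b:
            return i
    if len(local) != len(reference):
        # One sequence is a strict prefix of the other.
        return min(len(local), len(reference))
    return None


def classify(index: Optional[int], early: int = 5, late: int = 15) -> str:
    """Label a divergence using the thresholds from the Troubleshooting table."""
    if index is None:
        return "match"
    position = index + 1  # 1-based token number
    if position <= early:
        return "early"  # suspect weight loading or TP sharding
    if position >= late:
        return "late"  # usually numerical noise; document it
    return "unclear"  # between the two thresholds; needs manual inspection
```

Positions between the two thresholds are left as "unclear" because the source gives no guidance for them.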

Examples

Example 1: Typical new model

User says: "/model-migrate-flagos kimi_k25"
Actions:
  1. Parse → model_name=kimi_k25, defaults for upstream/plugin paths
  2. Clone upstream, find vllm/model_executor/models/kimi_k25.py
  3. Discover it wraps DeepseekV2 → follow kimi_k25 (wrapper) pattern
  4. Copy file, apply P1+P2 patches, create config bridge
  5. Register, validate, test, benchmark, serve+request
  6. E2E verification against upstream GT
Result: kimi_k25 fully working in plugin, all 11 steps passed

Example 2: Re-run after upstream update

User says: "migrate qwen3_5 again, upstream updated"
Actions:
  1. Idempotent re-run — overwrite existing files with fresh upstream copy
  2. Re-apply patches, re-validate, re-test
  3. Re-run E2E to confirm no regression
Result: qwen3_5 updated to match latest upstream, no regressions

Troubleshooting

General principle: When any runtime error occurs, first compare vLLM upstream code against both the plugin adaptation and the installed 0.13.0 environment. The diff is the fastest path to root cause. See operational-rules.md § Debugging Priority: Upstream-First for the full protocol.
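
One way to produce the diff this principle calls for is Python's standard `difflib`. The sketch below compares two source texts; `diff_sources` is a hypothetical helper, and the full protocol in operational-rules.md may prescribe a different tool.

```python
import difflib
from typing import List


def diff_sources(
    upstream: str,
    other: str,
    upstream_label: str = "upstream",
    other_label: str = "plugin",
) -> List[str]:
    """Return a unified diff (one string per line) between two source texts."""
    return list(
        difflib.unified_diff(
            upstream.splitlines(),
            other.splitlines(),
            fromfile=upstream_label,
            tofile=other_label,
            lineterm="",  # inputs carry no newlines, so emit none
            n=1,  # one line of context keeps the output short
        )
    )


# Usage sketch: read both files with pathlib.Path(...).read_text() and run the
# comparison twice, once against the plugin copy and once against the file
# shipped in the installed 0.13.0 environment.
```

An empty result means the two texts are identical, so the cause lies elsewhere.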

| Problem | Typical Cause | Fix |
|---|---|---|
| `ImportError` after copy-then-patch | Missing P1 fix (relative→absolute imports) | Verify all `from .xxx` converted to `from vllm.` or `from vllm_fl.` |
| `AttributeError: module 'vllm' has no attribute X` | API doesn't exist in 0.13.0 | Check P3 in compatibility-patches.md; stub or remove |
| Config not recognized by vLLM | `model_type` mismatch or config bridge missing | Verify `_CONFIG_REGISTRY[model_type]` matches HF `config.json` exactly |
| Registration has no effect | Class name or import path typo | Compare with existing registrations in `__init__.py` |
| Benchmark `KeyError` on config field | Config bridge missing a field | Compare upstream config class vs bridge; add missing fields with defaults |
| Benchmark/Serve fails with OOM or "insufficient memory" | GPUs occupied by other processes | Kill GPU processes: `nvidia-smi --query-compute-apps=pid --format=csv,noheader \| xargs -r kill -9` then retry. Never skip these steps. |
| Model outputs garbled/gibberish text | `ColumnParallelLinear` used for merged projections with different sub-dimensions (TP sharding mismatch) | Override `__init__` to use `MergedColumnParallelLinear(output_sizes=[...])`. See P8 in compatibility-patches.md |
| `AssertionError: Duplicate op name` | Child class imports custom op from different module path than parent | Use same import path as parent module (e.g. `vllm_fl.ops.fla` not `vllm_fl.models.fla_ops`). See P11 |
| `AttributeError` on `fused_recurrent_*` during CUDA graph warmup | `__init__` override with `nn.Module.__init__(self)` missed attributes used by inherited `_forward_core` | Create ALL attributes from parent's `__init__`, especially custom ops. See P12 |
| E2E: local server not reachable | `serve.sh` port doesn't match `e2e_config.json` local port | Ensure both use same port (default 8122) |
| E2E: GT server not reachable | GT machine down or docker/conda env wrong | Check `e2e_remote_serve.sh status` or SSH manually |
| E2E: early token divergence (first 5 tokens) | Weight loading bug, TP sharding error | Check `load_weights`, `stacked_params_mapping`, `MergedColumnParallelLinear` |
| E2E: late minor divergence (token #15+) | Numerical noise from different op implementations | Usually acceptable; document in report |
| `resolve_op` fails with `VLLM_FL_PREFER_ENABLED=false` | Op not registered in dispatch, no fallback | Add try/except fallback to `flag_gems` in op import code |
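
The first row's fix (verifying that no relative imports survive the P1 patch) can be checked mechanically. The sketch below uses the standard `ast` module and is independent of `validate_migration.py`, whose implementation is not shown here. `find_relative_imports` is a name made up for this example.

```python
import ast
from typing import List, Tuple


def find_relative_imports(source: str) -> List[Tuple[int, str]]:
    """Return (line number, description) for every relative import in source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # ImportFrom.level counts the leading dots; 0 means an absolute import.
        if isinstance(node, ast.ImportFrom) and node.level > 0:
            module = "." * node.level + (node.module or "")
            hits.append((node.lineno, f"from {module} import ..."))
    return sorted(hits)


migrated = (
    "from .utils import helper\n"
    "from vllm.config import VllmConfig\n"
    "from .. import layers\n"
)
leftovers = find_relative_imports(migrated)
```

Because the source is parsed rather than imported, the check runs without vLLM installed. An empty list means the P1 conversion is complete for that file.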

Version History

2 versions in total

  • v1.1.0 v1.1.0 from flagos-ai/skills (current)
    2026-05-19 15:21 Safe Safe
  • v1.0.1 Initial release
    2026-04-03 10:41 Safe Safe

Security Checks

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
