← 返回
AI智能 Key 中文

Free Scaling

$0 test-time scaling with online learning. Classify, generate, and verify using free model ensembles. Models self-select via ELO scoring + A/B testing from d...
零成本测试时扩展结合在线学习。利用免费模型集成进行分类、生成与验证。模型通过ELO评分和A/B测试自动选择。
isotrivial
AI智能 clawhub v3.3.1 1 版本 99855.7 Key: 需要
★ 0
Stars
📥 692
下载
💾 18
安装
1
版本
#latest

概述

Free Scaling

$0 test-time scaling infrastructure using NVIDIA NIM free tier.

Three patterns, one API key:

from free_scaling import scale, generate, health

# Classify — vote on labels
result = scale("Is this safe?", context=code, k=3,
               answer_patterns=["SAFE", "VULNERABLE"])

# Generate — best-of-k with cross-evaluation
result = generate("Summarize this paper.", context=paper, k=3)

# Verify — just scale() with source+output as context
check = scale("Any hallucinated claims?",
              context=f"Source:\n{src}\n\nOutput:\n{draft}",
              k=3, answer_patterns=["YES", "NO"])

Setup

  1. Get a free API key at build.nvidia.com
  2. export NVIDIA_API_KEY="nvapi-..."
  3. No pip install — stdlib only (Python 3.10+)

Core API

scale(question, context, k, answer_patterns) → CascadeResult

Classification via ensemble voting. Ask k models, majority wins.

result = scale(
    "Is this email urgent? Answer URGENT, NORMAL, or IGNORE.",
    context=email_body,
    k=3,
    answer_patterns=["URGENT", "NORMAL", "IGNORE"]
)
result.answer       # "NORMAL"
result.confidence   # 1.0
result.calls_made   # 3
result.elapsed_s    # 1.8

Parameters:

  • question — what to judge (should end with "Answer X or Y")
  • context — material to evaluate (placed in system message)
  • k — models to query: 1, 3, 5, or "auto" (smart cascade)
  • answer_patterns — expected answers (e.g. ["YES", "NO"])
  • models — override model selection (list of aliases)

generate(question, context, k) → GenerateResult

Best-of-k generation with cross-evaluation. Round 1: k models generate. Round 2: k different models judge which is best.

result = generate(
    "Summarize this email in 2 sentences.",
    context=email_text,
    k=3,
    max_tokens=200,
)
result.output          # winning summary
result.all_outputs     # all 3 summaries
result.winner_model    # "llama-3.3"
result.judge_votes     # ["2", "2", "2"]
result.total_calls     # 6 (3 gen + 3 judge)

scale_batch(items, k) / generate_batch(items, k)

Parallel batch versions. Each item is a dict with question, context, answer_patterns.

results = scale_batch([
    {"question": "Urgent?", "context": e, "answer_patterns": ["YES", "NO"]}
    for e in emails
], k=3)

health(models=None) → dict

Probe models. Returns status per model (ok/dead/slow/error + latency).

status = health()  # all models
status = health(models=["llama-3.3", "gemma-27b"])  # specific

Dead models are auto-skipped in subsequent calls and retried after 5 minutes.

Online Learning (v3.3)

Models self-select through deployment data. No manual benchmarking needed.

from free_scaling import elo, feedback
from free_scaling.evolve import evolve, report

# Every scale() call automatically:
# 1. Logs votes to ELO tracker
# 2. Runs 1 shadow challenger for A/B data
# 3. Logs result for user feedback resolution

# Check current rankings
print(elo.summary())

# User feedback (4× stronger than consensus signal)
feedback.resolve_by_reaction("discord-msg-id", "👍")   # confirm
feedback.resolve_by_reaction("discord-msg-id", "🅱️")   # Panel B wins
feedback.resolve_by_reaction("discord-msg-id", "🔴")   # override to URGENT

# Weekly panel evolution
result = evolve(dry_run=True)   # check if panel should change
result = evolve(dry_run=False)  # apply the change

How it works:

  • Consensus: models that agree with majority get +ELO (K=16)
  • Override: user feedback is 4× stronger (K=64)
  • Shadow challenger: 1 extra model per call for free A/B data
  • Evolution: top-3 by ELO become champion panel (requires 30+ calls/model)

Smart Features

  • Online learning: ELO-based model scoring from deployment data (see above)
  • A/B testing: shadow challengers run alongside panel for competitive signal
  • Auto-heal: 404/410 models get marked dead, substituted with same-tier alternatives, retried after 5min TTL
  • Context routing: context goes in system message, question stays in user message
  • Parallel short-circuit: submits all k models in parallel, cancels remaining when first 2 agree
  • Task classification: k="auto" classifies the question type and routes to the best expert
  • Copilot integration: cp-* aliases route automatically through GitHub Copilot API
  • User feedback loop: Discord reaction → ELO update (👍 confirm, 🅰️🅱️ A/B, 🔴🟡⚪ override)
  • Error isolation: batch functions catch per-item failures without killing the batch

13 Models Included

TierModelsLatency
-----------------------
Fastllama-3.3 70B, gemma-27b, nemotron-super-49b, dracarys-70b, jamba-mini<1s
Mediummistral-large 675B, kimi-k2, qwen-397b, llama-405b, mistral-medium1-3s
Thinkingdeepseek-v3.1, minimax-m2.5 🧠, kimi-k2.5 🧠3s+

All free via NVIDIA NIM. One API key covers everything.

CLI

python3 -m nim_ensemble.cli scale "Is this safe?" -k 3 --answers "SAFE,VULNERABLE"
python3 -m nim_ensemble.cli models     # list available models
python3 -m nim_ensemble.cli panels     # list panels

Capability Profiling (optional)

Profile models on your tasks for data-driven routing:

python3 -m nim_ensemble.capability_map --models llama-3.3 gemma-27b mistral-large --trials 3

Generates capability_map.json — the cascade loads it automatically.

Architecture

nim_ensemble/
├── __init__.py       # Exports: scale, generate, health, scale_batch, generate_batch
├── cascade.py        # scale(), scale_batch(), smart cascade
├── generate.py       # generate(), generate_batch(), best-of-k
├── voter.py          # Core voting engine, NIM + Copilot backends
├── health.py         # Model probing, dead-model tracking, substitution
├── models.py         # Model registry, panels
├── parser.py         # Answer extraction (thinking models, negation, word boundaries)
├── elo.py            # Online ELO scoring, model ranking
├── feedback.py       # User feedback loop (reactions → ELO updates)
├── evolve.py         # Weekly panel evolution (promote/demote by ELO)
├── cli.py            # CLI interface
├── benchmark.py      # Single-trial profiling
└── capability_map.py # Multi-trial profiling with error correlation

Requirements

  • NVIDIA_API_KEY environment variable (free at build.nvidia.com)
  • Python 3.10+ (stdlib only, no pip dependencies)
  • Optional: GitHub Copilot token for cp-* model aliases

版本历史

共 1 个版本

  • v3.3.1 当前
    2026-03-19 05:01 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

ai-intelligence

ontology

oswalpalash
类型化知识图谱,用于结构化智能体记忆与可组合技能。支持创建/查询实体(人员、项目、任务、事件、文档)及关联...
★ 712 📥 243,841
ai-intelligence

Proactive Agent

halthelobster
将AI智能体从任务执行者升级为主动预判需求、持续优化的智能伙伴。集成WAL协议、工作缓冲区、自主定时任务及实战验证模式。Hal Stack核心组件 🦞
★ 836 📥 213,144

Google Colab GPU Runtime

isotrivial
在Google Colab GPU运行环境(T4/L4/A100/H100)上执行代码,并通过Google Drive管理持久化存储。适用于需要GPU计算的任务(机器学习训练、推理等)。
★ 0 📥 643