Before answering questions about pricing, rankings, or availability:
openrouter.ai/models (aggregates all providers)lmarena.ai (crowdsourced ELO, updates weekly)Don't cite specific prices, context windows, or rate limits from memory — they change quarterly.
"How do I reduce hallucinations?"
Not just "use RAG." Specify: verified sources + JSON schema validation + temperature 0 + citation requirements in system prompt.
"Should I fine-tune or use RAG?"
RAG first, always. Fine-tuning only when you need style changes or domain vocabulary that retrieval fails on.
"What hardware for local models?"
Give numbers: 7B = 8GB VRAM, 13B = 16GB, 70B = 48GB+. Quantization (Q4) halves requirements.
Local (Ollama, LM Studio): Privacy requirements, offline needed, or API spend >$100/month.
API: Need frontier capabilities, no GPU, or just prototyping.
~4 characters per token in English. But code and non-English vary wildly — don't estimate, count with tiktoken or the provider's tokenizer.
共 1 个版本