chore: publish from staged

This commit is contained in:
github-actions[bot]
2026-05-04 04:22:49 +00:00
parent 252f342650
commit c135d1c5aa
536 changed files with 116819 additions and 294 deletions

# Model Selection
Error analysis first, model changes last.
## Decision Tree
```
Performance issue?
└─ Does error analysis suggest a model problem?
   ├─ NO  → Fix prompts, retrieval, tools
   └─ YES → Is it a capability gap?
            ├─ YES → Consider a model change
            └─ NO  → Fix the actual problem
```
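The tree above can be sketched as a small helper. This is an illustrative sketch, not part of the Phoenix API; `next_action` and its arguments are hypothetical names.

```python
def next_action(model_problem: bool, capability_gap: bool) -> str:
    """Walk the decision tree: error analysis first, model changes last."""
    if not model_problem:
        # Error analysis points elsewhere: the model is not the bottleneck
        return "fix prompts, retrieval, tools"
    if capability_gap:
        # The model genuinely cannot do the task (reasoning, math, code)
        return "consider model change"
    # A model-related symptom that is still fixable without swapping models
    return "fix the actual problem"
```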
## Judge Model Selection
| Principle | Action |
| --------- | ------ |
| Start capable | Use gpt-4o first |
| Optimize later | Test a cheaper model once criteria are stable |
| Same model OK | The judge performs a different task than the app |
```python
# Start with a capable model
judge = ClassificationEvaluator(
    llm=LLM(provider="openai", model="gpt-4o"),
    ...
)

# After validation, test a cheaper model
judge_cheap = ClassificationEvaluator(
    llm=LLM(provider="openai", model="gpt-4o-mini"),
    ...
)

# Compare TPR/TNR on the same test set
```
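The TPR/TNR comparison can be done with a few lines of plain Python; no Phoenix API is needed. A minimal sketch, assuming binary labels where human-annotated ground truth is the reference and each judge's verdicts are the predictions:

```python
def tpr_tnr(truth: list[int], preds: list[int]) -> tuple[float, float]:
    """True positive rate and true negative rate of judge verdicts
    against human ground-truth labels (1 = positive, 0 = negative)."""
    tp = sum(1 for y, p in zip(truth, preds) if y and p)
    fn = sum(1 for y, p in zip(truth, preds) if y and not p)
    tn = sum(1 for y, p in zip(truth, preds) if not y and not p)
    fp = sum(1 for y, p in zip(truth, preds) if not y and p)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return tpr, tnr
```

Run both judges over the same labeled test set and keep the cheaper one only if its TPR/TNR stays close to the capable model's.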
## Don't Model Shop
```python
from phoenix.client import Client

client = Client()

# BAD: model shopping — rerun the experiment per model and pick a winner
for model in ["gpt-4o", "claude-3", "gemini-pro"]:
    results = client.experiments.run_experiment(
        dataset=dataset,
        task=lambda input, _model=model: task(input, model=_model),
        evaluators=evaluators,
    )

# GOOD: analyze failures first, then decide what actually needs fixing
failures = analyze_errors(results)
# "Ignores context" → fix the prompt
# "Can't do math"   → maybe try a stronger model
```
## When Model Change Is Warranted
- Failures persist after prompt optimization
- Capability gaps (reasoning, math, code)
- Error analysis confirms model limitation
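The checklist above can be encoded as a gate in an eval pipeline. A sketch under stated assumptions: `model_change_warranted`, its failure-mode tags, and the `prompt_optimized` flag are all hypothetical names, not Phoenix API.

```python
# Failure modes that indicate a capability gap rather than a fixable prompt issue
CAPABILITY_GAPS = {"reasoning", "math", "code"}

def model_change_warranted(failure_modes: list[str], prompt_optimized: bool) -> bool:
    """Recommend a model change only when prompts have already been
    optimized and error analysis tags the remaining failures as
    capability gaps."""
    if not prompt_optimized:
        # Failures must persist after prompt optimization first
        return False
    return any(mode in CAPABILITY_GAPS for mode in failure_modes)
```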