chore: sync Arize skills from arize-skills@597d609bfe5f07fd7d24acfdb408a082911b18fc and phoenix@746247cbb07b0dc7803b87c69dd8c77811c33f59 (#1583)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Jim Bennett
2026-05-03 18:05:44 -07:00
committed by GitHub
parent 82b58047e0
commit c7b2aecb94
40 changed files with 1316 additions and 423 deletions

@@ -14,6 +14,10 @@ CI/CD evals vs production monitoring - complementary approaches.
 ## CI/CD Evaluations
 
 ```python
+from phoenix.client import Client
+
+client = Client()
+
 # Fast, deterministic checks
 ci_evaluators = [
     has_required_format,
@@ -23,7 +27,7 @@ ci_evaluators = [
 ]
 
 # Small but representative dataset (~100 examples)
-run_experiment(ci_dataset, task, ci_evaluators)
+client.experiments.run_experiment(dataset=ci_dataset, task=task, evaluators=ci_evaluators)
 ```
 
 Set thresholds: regression=0.95, safety=1.0, format=0.98.
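The CI/CD section above pairs fast, deterministic evaluators with per-category thresholds. The sketch below shows one way such a gate could work in plain Python, assuming each check returns a pass/fail boolean per example and that each threshold is the minimum pass rate for its category. `has_required_format` here is a hypothetical stand-in (valid JSON object with an `answer` key), and `pass_rate`, `gate`, `main`, and the sample outputs are illustrative names; none of this is the Phoenix client API.

```python
import json

# Minimum pass rate per check category, taken from the thresholds above (assumed semantics).
THRESHOLDS = {"regression": 0.95, "safety": 1.0, "format": 0.98}


def has_required_format(output: str) -> bool:
    """Hypothetical deterministic check: output must be a JSON object with an "answer" key."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and "answer" in parsed


def pass_rate(results: list[bool]) -> float:
    """Fraction of examples that passed; an empty list counts as 0.0."""
    return sum(results) / len(results) if results else 0.0


def gate(results_by_category: dict[str, list[bool]]) -> list[str]:
    """Return one message per category below its threshold; empty means the gate passes."""
    failures = []
    for category, threshold in THRESHOLDS.items():
        rate = pass_rate(results_by_category.get(category, []))
        if rate < threshold:
            failures.append(f"{category}: {rate:.3f} < {threshold}")
    return failures


def main(results_by_category: dict[str, list[bool]]) -> int:
    """Print failures and return a process exit code (0 = pass, 1 = fail)."""
    failures = gate(results_by_category)
    for failure in failures:
        print("FAIL", failure)
    return 1 if failures else 0


# Hypothetical task outputs for a tiny dataset; a real CI run would use ~100 examples.
outputs = ['{"answer": "42"}', '{"answer": "yes"}', "not json"]
results = {
    "format": [has_required_format(o) for o in outputs],  # 2 of 3 pass
    "safety": [True, True, True],
    "regression": [True, True, True],
}
exit_code = main(results)  # prints: FAIL format: 0.667 < 0.98
# In a CI job, finish with: sys.exit(exit_code)
```

Returning an exit code from `main` instead of calling `sys.exit` inside it keeps the gate easy to unit-test; a CI job would end with `sys.exit(exit_code)` so that any category below its threshold fails the build. Treating a missing category as a 0.0 pass rate is a deliberate choice here: a check that silently did not run should fail the gate rather than pass it.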