---
name: phoenix-cli
description: Debug LLM applications using the Phoenix CLI. Fetch traces, analyze errors, structure trace review with open coding and axial coding, inspect datasets, review experiments, query annotation configs, and use the GraphQL API. Use whenever the user is analyzing traces or spans, investigating LLM/agent failures, deciding what to do after instrumenting an app, building failure taxonomies, choosing what evals to write, or asking "what's going wrong", "what kinds of mistakes", or "where do I focus", even without naming a technique.
license: Apache-2.0
compatibility: Requires Node.js (for npx) or global install of @arizeai/phoenix-cli. Optionally requires jq for JSON processing.
metadata:
  author: arize-ai
  version: "3.3.0"
---

# Phoenix CLI

## Invocation

```bash
px                        # if installed globally
npx @arizeai/phoenix-cli  # no install required
```

The CLI uses singular resource commands with subcommands like `list` and `get`:

```bash
px trace list
px trace get
px trace annotate
px trace add-note
px trace-annotations delete
px span list
px span annotate
px span add-note
px span-annotations delete
px session list
px session get
px session annotate
px session add-note
px session-annotations delete
px dataset list
px dataset get
px project list
px project get
px annotation-config list
px auth status
px profile list
px profile show [name]
px profile create
px profile use
px profile edit
px profile delete
```

## Setup

```bash
export PHOENIX_HOST=http://localhost:6006
export PHOENIX_PROJECT=my-project
export PHOENIX_API_KEY=your-api-key  # if auth is enabled
```

Always use `--format raw --no-progress` when piping to `jq`.
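The `--format raw --no-progress` rule is easy to forget in a long pipeline, so a tiny shell wrapper can append it every time. This is a minimal POSIX-shell sketch, not part of the CLI: `pxr` is a hypothetical name, it falls back to `npx @arizeai/phoenix-cli` when `px` is not on the `PATH`, and it assumes the CLI accepts options after positional arguments, as the examples in this document do (e.g. `px project get my-project --format raw --no-progress`).

```bash
# Hypothetical wrapper (not a px subcommand): forward every argument to the
# Phoenix CLI and append the flags that keep stdout machine-readable for jq.
pxr() {
  if command -v px >/dev/null 2>&1; then
    # Global install found on PATH (or px defined as a shell function).
    px "$@" --format raw --no-progress
  else
    # No global install: run the published package through npx instead.
    npx @arizeai/phoenix-cli "$@" --format raw --no-progress
  fi
}
```

For example, `pxr trace list --limit 20 | jq '.[] | select(.status == "ERROR")'` runs the same query as the Traces examples with both flags appended.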
## Quick Reference

| Task | Files |
| ---- | ----- |
| Look at sampled traces, spans, or sessions and write specific notes about what went wrong (no taxonomy yet) | [references/open-coding](references/open-coding.md) |
| Group those notes into a structured failure taxonomy and quantify what matters | [references/axial-coding](references/axial-coding.md) |

Both stages tag every artifact with one shared **coding annotation identifier** (descriptive shape, e.g. `coding-run:chatbot-context-loss-2026-05-06`) so the run is queryable, reversible, and viewable as a unit. Pass `--identifier ` explicitly on every `px` call; shell inheritance is unreliable across agent harnesses.

Open coding writes notes via `px ... add-note` and records a small local JSONL sidecar at `.px/coding/.jsonl`; axial coding reads that sidecar as the deterministic handoff and records labels in `.px/coding/-axial.jsonl`. Pick the identifier once per run (see [references/open-coding.md](references/open-coding.md#coding-annotation-identifier-pick-this-first)), then share the Phoenix UI link from the wrap-up section. Revert is opt-in: it runs three identifier-bound DELETEs (trace, span, and session annotations) only after explicit user confirmation.

> **Workflow term vs. server annotation name.** The skill prose calls this value the **coding annotation identifier** (shell-variable hint: `CODING_ANNOTATION_IDENTIFIER`). The server-side annotation NAME used for the UI filter is unchanged (`coding_session_id`) for data compatibility with rows already written by previous runs. Don't try to rename the server-side annotation; treat the asymmetry as load-bearing.

## Workflows

**"What do I do after instrumenting?" / "Where do I focus?" / "What's going wrong?"**
[open-coding](references/open-coding.md) → [axial-coding](references/axial-coding.md) → build evals for the top categories.
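The descriptive identifier shape from the Quick Reference can be generated consistently with a small helper. This is a sketch under assumptions: `coding_run_id` is hypothetical, and the `coding-run:TOPIC-YYYY-MM-DD` pattern is inferred from the single example above, not a format the CLI is known to enforce.

```bash
# Hypothetical helper (not part of px): print a coding annotation identifier
# shaped like coding-run:TOPIC-YYYY-MM-DD. The date defaults to today.
coding_run_id() {
  local slug day
  day=${2:-$(date +%F)}
  # Lowercase the topic, collapse every run of other characters into a
  # single hyphen, then trim hyphens from both ends.
  slug=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '-')
  slug=${slug#-}
  slug=${slug%-}
  if [ -z "$slug" ]; then
    echo "usage: coding_run_id TOPIC [YYYY-MM-DD]" >&2
    return 2
  fi
  printf 'coding-run:%s-%s\n' "$slug" "$day"
}
```

Because shell state may not survive between agent tool calls, run it once, then paste the printed literal into every `--identifier` flag instead of relying on a variable.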
## Reference Categories

| Prefix | Description |
| ------ | ----------- |
| `references/open-coding` | Free-form notes against sampled traces, spans, or sessions; reach for it whenever the user wants to make sense of LLM traffic but has no failure categories yet. Includes a unit-of-analysis diagnostic so the workflow runs at the level where the failure modes actually live (trace for stateless single-shot calls, session for multi-turn agents, span for mechanical/in-isolation failures). |
| `references/axial-coding` | Inductive grouping of notes into a MECE taxonomy with counts; reach for it whenever the user has observations and needs categories or eval targets. |

## Auth

```bash
px auth status  # check connection and authentication
px auth status --endpoint http://other:6006  # check a specific endpoint
px auth status --profile staging  # check a named profile's connection
```

## Profiles

Named profiles let you switch between multiple Phoenix instances (local, staging, cloud) without juggling environment variables. Profiles are stored in `~/.px/settings.json` (or `$XDG_CONFIG_HOME/px/settings.json`).

Configuration priority (highest to lowest): CLI flags > env vars > active profile > built-in defaults.

```bash
px profile list  # list all profiles (shows active profile)
px profile show  # show the active profile's settings
px profile show staging  # show a named profile's settings
px profile create prod --endpoint https://app.phoenix.arize.com --api-key --activate
px profile create local --endpoint http://localhost:6006 --project my-app
px profile use prod  # switch the active profile
px profile edit prod  # open profile JSON in $EDITOR (validates on save)
px profile delete prod --yes  # delete a profile (--yes skips confirmation)
```

Use `--profile ` on any command to target a specific profile without changing the active one:

```bash
px trace list --profile staging --limit 10 --format raw --no-progress | jq .
px auth status --profile prod
```

`px profile create` options: `--endpoint `, `--project `, `--api-key `, `--header ` (repeatable), `--activate`.

## Projects

```bash
px project list  # list all projects (table view)
px project list --format raw --no-progress | jq '.[].name'  # project names as JSON
px project get my-project --format raw --no-progress  # single record by exact name
px project get my-project --format raw --no-progress | jq -r '.id'  # extract project id
```

`project get` exits with `ExitCode.FAILURE` (1) when no project has that exact name; in `--format json|raw` it also writes a `StructuredError` `{error, code: "FAILURE", hint}` to stderr.

## Traces

```bash
px trace list --limit 20 --format raw --no-progress | jq .
px trace list --last-n-minutes 60 --limit 20 --format raw --no-progress | jq '.[] | select(.status == "ERROR")'
px trace list --since 2025-01-15T00:00:00Z --limit 50 --format raw --no-progress | jq .
px trace list --format raw --no-progress | jq 'sort_by(-.duration) | .[0:5]'
px trace list --include-notes --format raw --no-progress | jq '.[].notes'
px trace get --format raw | jq .
px trace get --format raw | jq '.spans[] | select(.status_code != "OK")'
px trace get --include-notes --format raw | jq '.notes'
px trace annotate --name reviewer --label pass
px trace annotate --name reviewer --score 0.9 --format raw --no-progress
px trace annotate --name reviewer --label pass --identifier ""  # tag with a coding annotation identifier
px trace add-note --text "needs follow-up"
px trace add-note --text "needs follow-up" --identifier ""  # tag + upsert on identifier
px trace-annotations delete --identifier "" --all -y  # nuke every annotation tied to this coding annotation identifier
```

`px -annotations delete` requires `--all` or both `--start-time` and `--end-time` and emits `{deleted: true, target, filter}` on success.
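The revert step from the Quick Reference (three identifier-bound DELETEs, run only after explicit user confirmation) can be wrapped so that one confirmation gates all three calls. This is a minimal POSIX-shell sketch: `revert_coding_run` is a hypothetical helper, not a `px` subcommand, and it assumes the `trace-annotations`, `span-annotations`, and `session-annotations` delete commands take `--identifier`, `--all`, and `-y` exactly as shown in this reference.

```bash
# Hypothetical helper (not a px subcommand): after one explicit confirmation,
# delete every trace, span, and session annotation tagged with IDENTIFIER.
revert_coding_run() {
  local answer resource
  if [ -z "$1" ]; then
    echo "usage: revert_coding_run IDENTIFIER" >&2
    return 2
  fi
  # Prompt on stderr so stdout only carries what px itself prints.
  printf "Delete every trace, span, and session annotation tagged '%s'? [y/N] " "$1" >&2
  read -r answer
  case $answer in
    y | Y | yes | YES) ;;
    *)
      echo "aborted; nothing deleted" >&2
      return 1
      ;;
  esac
  # -y is passed as in the examples above (assumed to skip px's own prompt,
  # since the user has already confirmed here). Stop at the first failure.
  for resource in trace span session; do
    px "${resource}-annotations" delete --identifier "$1" --all -y || return 1
  done
}
```

Anything other than an explicit yes aborts before the first DELETE, which keeps revert opt-in even when the helper is called from a script.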
### Trace JSON shape

```
Trace
  traceId, status ("OK"|"ERROR"), duration (ms), startTime, endTime
  annotations[] (with --include-annotations, excludes note)
    name, result { score, label, explanation }
  notes[] (with --include-notes)
    name="note", result { explanation }
  rootSpan: top-level span (parent_id: null)
  spans[]
    name, span_kind ("LLM"|"CHAIN"|"TOOL"|"RETRIEVER"|"EMBEDDING"|"AGENT"|"RERANKER"|"GUARDRAIL"|"EVALUATOR"|"UNKNOWN")
    status_code ("OK"|"ERROR"|"UNSET"), parent_id, context.span_id
    notes[] (with --include-notes)
      name="note", result { explanation }
    attributes
      input.value, output.value: raw input/output
      llm.model_name, llm.provider
      llm.token_count.prompt/completion/total
      llm.token_count.prompt_details.cache_read
      llm.token_count.completion_details.reasoning
      llm.input_messages.{N}.message.role/content
      llm.output_messages.{N}.message.role/content
      llm.invocation_parameters: JSON string (temperature, etc.)
      exception.message: set if span errored
```

## Spans

```bash
px span list --limit 20  # recent spans (table view)
px span list --last-n-minutes 60 --limit 50  # spans from the last hour
px span list --since 2025-01-15T00:00:00Z --limit 50  # spans since a timestamp
px span list --span-kind LLM --limit 10  # only LLM spans
px span list --status-code ERROR --limit 20  # only errored spans
px span list --name chat_completion --limit 10  # filter by span name
px span list --trace-id --format raw --no-progress | jq .  # all spans for a trace
px span list --parent-id null --limit 10  # only root spans
px span list --parent-id --limit 10  # only children of a span
px span list --include-annotations --limit 10  # include annotation scores
px span list --include-notes --limit 10  # include span notes
px span list --attribute llm.model_name:gpt-4 --limit 10  # filter by string attribute
px span list --attribute llm.token_count.total:500 --limit 10  # filter by numeric attribute
px span list --attribute 'user.id:"12345"' --limit 10  # force string match for numeric-looking value
px span list --attribute session.id:sess:abc:123 --limit 20  # colon in value OK (split on first colon only)
px span list --attribute llm.model_name:gpt-4 --attribute session.id:abc --limit 10  # AND multiple filters
px span list output.json --limit 100  # save to JSON file
px span list --format raw --no-progress | jq '.[] | select(.status_code == "ERROR")'
px span annotate --name reviewer --label pass
px span annotate --name checker --score 1 --annotator-kind CODE
px span annotate --name reviewer --label pass --identifier ""  # tag with a coding annotation identifier
px span add-note --text "verified by agent"
px span add-note --text "verified by agent" --identifier ""  # tag + upsert on identifier
px span-annotations delete --identifier "" --all -y  # nuke every annotation tied to this coding annotation identifier
```

### Span JSON shape

```
Span
  name, span_kind ("LLM"|"CHAIN"|"TOOL"|"RETRIEVER"|"EMBEDDING"|"AGENT"|"RERANKER"|"GUARDRAIL"|"EVALUATOR"|"UNKNOWN")
  status_code ("OK"|"ERROR"|"UNSET"), status_message
  context.span_id, context.trace_id, parent_id
  start_time, end_time
  attributes
    input.value, output.value: raw input/output
    llm.model_name, llm.provider
    llm.token_count.prompt/completion/total
    llm.input_messages.{N}.message.role/content
    llm.output_messages.{N}.message.role/content
    llm.invocation_parameters: JSON string (temperature, etc.)
    exception.message: set if span errored
  annotations[] (with --include-annotations, excludes note)
    name, result { score, label, explanation }
  notes[] (with --include-notes)
    name="note", result { explanation }
```

## Sessions

```bash
px session list --limit 10 --format raw --no-progress | jq .
px session list --order asc --format raw --no-progress | jq '.[].session_id'
px session list --include-annotations --include-notes --format raw --no-progress | jq '.[].notes'
px session get --format raw | jq .
px session get --include-annotations --format raw | jq '.session.annotations'
px session get --include-notes --format raw | jq '.session.notes'
px session annotate --name reviewer --label pass
px session annotate --name reviewer --score 0.9 --format raw --no-progress
px session annotate --name reviewer --label pass --identifier ""  # tag with a coding annotation identifier
px session add-note --text "verified by agent"
px session add-note --text "verified by agent" --identifier ""  # tag + upsert on identifier
px session-annotations delete --identifier "" --all -y  # nuke every annotation tied to this coding annotation identifier
```

### Session JSON shape

```
SessionData
  id, session_id, project_id
  start_time, end_time
  token_count_prompt, token_count_completion, token_count_total: cumulative across all LLM spans in the session (int, default 0)
  annotations[] (with --include-annotations, excludes note)
    name, result { score, label, explanation }
  notes[] (with --include-notes)
    name="note", result { explanation }
  traces[]
    id, trace_id, start_time, end_time
```

## Datasets / Experiments / Prompts

```bash
px dataset list --format raw --no-progress | jq '.[].name'
px dataset get --format raw | jq '.examples[] | {input, output: .expected_output}'
px dataset get --split train --format raw | jq .  # filter by split
px dataset get --version --format raw | jq .
px experiment list --dataset --format raw --no-progress | jq '.[] | {id, name, failed_run_count}'
px experiment get --format raw --no-progress | jq '.[] | select(.error != null) | {input, error}'
px prompt list --format raw --no-progress | jq '.[].name'
px prompt get --format text --no-progress  # plain text, ideal for piping to AI
```

## Annotation Configs

```bash
px annotation-config list  # list all configs (table view)
px annotation-config list --format raw --no-progress | jq '.[].name'  # config names as JSON
```

## GraphQL

Use `px api graphql` for ad-hoc queries that the commands above don't cover. Output is wrapped as `{"data": {...}}`.

```bash
px api graphql '{ projectCount datasetCount promptCount evaluatorCount }'
px api graphql '{ projects { edges { node { name traceCount tokenCountTotal } } } }' | jq '.data.projects.edges[].node'
px api graphql '{ datasets { edges { node { name exampleCount experimentCount } } } }' | jq '.data.datasets.edges[].node'
px api graphql '{ evaluators { edges { node { name kind } } } }' | jq '.data.evaluators.edges[].node'

# Introspect any type
px api graphql '{ __type(name: "Project") { fields { name type { name } } } }' | jq '.data.__type.fields[]'
```

Key root fields: `projects`, `datasets`, `prompts`, `evaluators`, `projectCount`, `datasetCount`, `promptCount`, `evaluatorCount`, `viewer`.

## Docs

Download Phoenix documentation markdown for local use by coding agents.

```bash
px docs fetch  # fetch default workflow docs to .px/docs
px docs fetch --workflow tracing  # fetch only tracing docs
px docs fetch --workflow tracing --workflow evaluation
px docs fetch --dry-run  # preview what would be downloaded
px docs fetch --refresh  # clear .px/docs and re-download
px docs fetch --output-dir ./my-docs  # custom output directory
```

Key options: `--workflow` (repeatable, values: `tracing`, `evaluation`, `datasets`, `prompts`, `integrations`, `sdk`, `self-hosting`, `all`), `--dry-run`, `--refresh`, `--output-dir` (default `.px/docs`), `--workers` (default 10).
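Because `--workflow` is repeatable, a small wrapper can expand a plain list of names into repeated flags and reject a typo before any download starts. This is a POSIX-shell sketch: `fetch_phoenix_docs` is hypothetical, and its allow-list simply mirrors the values listed above; the CLI's own validation remains authoritative if that list changes.

```bash
# Hypothetical wrapper (not part of px): validate workflow names against the
# values documented above, then pass each one as its own --workflow flag.
fetch_phoenix_docs() {
  local wf
  for wf in "$@"; do
    case $wf in
      tracing | evaluation | datasets | prompts | integrations | sdk | self-hosting | all) ;;
      *)
        echo "unknown workflow: $wf" >&2
        return 2
        ;;
    esac
  done
  # Rotate the positional parameters: drop each name from the front and
  # append "--workflow NAME" at the back, preserving the original order.
  for wf in "$@"; do
    shift
    set -- "$@" --workflow "$wf"
  done
  px docs fetch "$@"
}
```

With no arguments it runs plain `px docs fetch` (the default workflow docs); `fetch_phoenix_docs tracing evaluation` runs `px docs fetch --workflow tracing --workflow evaluation`.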