# Step 2a: Instrument with `wrap`

> For the full `wrap()` API reference, see `wrap-api.md`.

**Goal**: Add `wrap()` calls at data boundaries so the eval harness can (1) inject controlled inputs in place of real external dependencies, and (2) capture outputs for scoring.

---

## Data-flow analysis

Starting from LLM call sites, trace backwards and forwards through the code to find:

- **Dependency input**: data from external systems (databases, APIs, caches, file systems, network fetches)
- **App output**: data going out to users or external systems
- **Intermediate state**: internal decisions relevant to evaluation (routing, tool calls)

You do **not** need to wrap LLM call arguments or responses — those are already captured by OpenInference auto-instrumentation.

## Adding `wrap()` calls

For each data point found, add a `wrap()` call in the application code:

```python
import pixie

# External dependency data — function form (prevents the real call in eval mode)
profile = pixie.wrap(db.get_profile, purpose="input", name="customer_profile",
    description="Customer profile fetched from database")(user_id)

# External dependency data (cache lookup), function form again
history = pixie.wrap(redis.get_history, purpose="input", name="conversation_history",
    description="Conversation history from Redis")(session_id)

# App output — what the user receives
response = pixie.wrap(response_text, purpose="output", name="response",
    description="The assistant's response to the user")

# Intermediate state — internal decision relevant to evaluation
selected_agent = pixie.wrap(selected_agent, purpose="state", name="routing_decision",
    description="Which agent was selected to handle this request")
```

### Value vs. function wrapping

```python
# Value form: wrap a data value (result already computed)
profile = pixie.wrap(db.get_profile(user_id), purpose="input", name="customer_profile")

# Function form: wrap the callable — in eval mode the original function is
# NOT called; the registry value is returned instead.
profile = pixie.wrap(db.get_profile, purpose="input", name="customer_profile")(user_id)
```

**CRITICAL: Always use function form for `purpose="input"` wraps on external calls** — HTTP requests, database queries, API calls, file reads, cache lookups. Function form prevents the real call from executing in eval mode, so the dataset value is returned directly without making a live network request or database query. Value form still executes the real call first and only replaces the result afterwards — this wastes time, creates flaky tests, and makes evals dependent on external service availability.

The only case where value form is acceptable for `purpose="input"` is when the wrapped value is a local computation (no I/O, no side effects) that is cheap to recompute.
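The difference between the two forms comes down to Python's evaluation order: in value form the argument expression runs before `wrap()` is ever entered. The sketch below illustrates this with `toy_wrap`, a hypothetical stand-in written for this example. It is not pixie's implementation, and `EVAL_MODE`, `REGISTRY`, and `get_profile` are likewise assumptions.

```python
from typing import Any

# Hypothetical stand-in for illustration only; NOT pixie's implementation.
EVAL_MODE = True
REGISTRY = {"customer_profile": {"name": "Ada", "tier": "gold"}}  # dataset values by wrap name


def toy_wrap(target: Any, *, purpose: str, name: str) -> Any:
    """Mimic the documented behaviour of input wraps in eval mode."""
    if purpose != "input" or not EVAL_MODE:
        return target  # pass through unchanged
    if callable(target):
        # Function form: hand back a replacement that never calls the original.
        return lambda *args, **kwargs: REGISTRY[name]
    # Value form: the caller already computed the value; only the result is swapped.
    return REGISTRY[name]


calls = []


def get_profile(user_id: str) -> dict:
    """Pretend database query that records every real invocation."""
    calls.append(user_id)
    return {"name": "live row", "tier": "unknown"}


# Value form: get_profile("u1") is evaluated first, so the real query runs.
value_result = toy_wrap(get_profile("u1"), purpose="input", name="customer_profile")
calls_after_value_form = len(calls)

# Function form: the callable is swapped out, so the real query is skipped.
func_result = toy_wrap(get_profile, purpose="input", name="customer_profile")("u1")
calls_after_function_form = len(calls)
```

Both forms return the same dataset value, but only the value form touched the pretend database.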

### Placement rules

1. **Wrap at the data boundary** — where data enters or exits the application, not deep inside utility functions.
2. **Names must be unique** across the entire application (used as registry keys and dataset field names).
3. **Use `lower_snake_case`** for names.
4. **Don't change the function's interface** — `wrap()` is purely additive, returns the same type.
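
Rules 2 and 3 are easy to check mechanically once the wrap names are collected in a list. The helper below is a minimal sketch using only the standard library; `check_wrap_names` is a hypothetical name, and how the names are gathered (grep, AST walk, manual list) is left open.

```python
import re
from collections import Counter

# lower_snake_case: lowercase words or digits joined by single underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def check_wrap_names(names: list[str]) -> list[str]:
    """Return a human-readable problem for each duplicated or badly cased name."""
    problems = []
    for name, count in Counter(names).items():
        if count > 1:
            problems.append(f"duplicate name: {name} ({count} uses)")
        if not SNAKE_CASE.match(name):
            problems.append(f"not lower_snake_case: {name}")
    return problems


problems = check_wrap_names(
    ["customer_profile", "routingDecision", "response", "response"]
)
```

An empty list means the collected names satisfy both rules.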

### Placement by purpose

#### `purpose="input"` — where external data enters

Place input wraps at the **boundary where external data enters the app**, not at intermediate processing stages. In a pipeline architecture (fetch → process → extract → format):

- **Correct**: `wrap(fetch_page, purpose="input", name="fetched_page")(url)` using **function form** at the HTTP fetch boundary — in eval mode, the fetch is skipped entirely and the dataset value is returned; in trace mode, the real fetch runs and the result is captured.
- **Incorrect**: `wrap(html_content, purpose="input", name="fetched_page")` using value form — the HTTP fetch still runs in eval mode (wasting time and creating flaky tests), and only the result is replaced afterwards.
- **Incorrect**: `wrap(processed_chunks, purpose="input", name="chunks")` after parsing. Eval mode then bypasses parsing and chunking entirely, so that internal logic is never exercised by the eval.

**Principle**: `wrap(purpose="input")` replaces the _minimum external dependency_ while exercising the _maximum internal logic_. Push the boundary as far upstream as possible. **Always use function form** for input wraps on external calls — this prevents the real call from executing in eval mode.
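
To make the principle concrete, the sketch below wraps a two-stage pipeline at the fetch boundary and records which stages actually run in eval mode. `toy_wrap` is again a hypothetical stand-in, not pixie's implementation, and `fetch_page`, `chunk`, and the registry contents are invented for the example.

```python
# Hypothetical stand-in for illustration only; NOT pixie's implementation.
EVAL_MODE = True
REGISTRY = {"fetched_page": "<p>alpha</p><p>beta</p>"}  # dataset value for the wrap
executed = []  # records which pipeline stages really ran


def toy_wrap(func, *, purpose, name):
    """Function-form input wrap: in eval mode, skip func and return the dataset value."""
    if EVAL_MODE and purpose == "input":
        return lambda *args, **kwargs: REGISTRY[name]
    return func


def fetch_page(url):
    """External dependency; would hit the network in a real app."""
    executed.append("fetch")
    raise RuntimeError("no network in this sketch")


def chunk(html):
    """Internal logic the eval should still exercise."""
    executed.append("chunk")
    return [part.replace("</p>", "") for part in html.split("<p>") if part]


def run(url):
    # Wrap at the fetch boundary so only the external call is replaced.
    html = toy_wrap(fetch_page, purpose="input", name="fetched_page")(url)
    return chunk(html)


chunks = run("https://example.com/doc")
```

Here `fetch_page` never runs, while `chunk` still processes the injected HTML. Wrapping `chunks` instead would have skipped both stages.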

#### `purpose="output"` — where processed data exits

Trace **downstream** from the LLM response to find where data leaves the app: sent to the user, written to storage, rendered in the UI, or passed to an external system. Wrap at that exit boundary.

- Don't wrap raw LLM responses — those are already captured by OpenInference auto-instrumentation as `llm_span` entries.
- Wrap the app's **final processed result** — after any post-processing, formatting, or transformation the app applies to the LLM output.
- If the app has multiple output channels (e.g., a response to the user AND a side-effect write to a database), wrap each one separately.

```python
# Final response after the app's formatting pipeline
response = pixie.wrap(formatted_response, purpose="output", name="response",
    description="Final response sent to the user")

# Side-effect output — data written to external storage
pixie.wrap(saved_record, purpose="output", name="saved_summary",
    description="Summary record saved to the database")
```

**Principle**: output wraps are observation-only — they capture what the app produced so evaluators can score it. They are never mocked or injected during eval runs.
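
As a rough mental model of "observation-only", the sketch below uses a hypothetical `toy_observe` helper that records a value under its name and returns it unchanged. It is an assumption made for illustration, not pixie's implementation; `format_reply` and `handle` are invented as well.

```python
captured = {}  # stands in for whatever the harness records


def toy_observe(value, *, purpose, name):
    """Record the value for scoring and return it untouched."""
    captured[name] = value
    return value


def format_reply(raw):
    """The app's post-processing of the raw LLM text."""
    return raw.strip().capitalize()


def handle(raw_llm_text):
    formatted = format_reply(raw_llm_text)
    # Wrap the final processed result, not the raw LLM text.
    return toy_observe(formatted, purpose="output", name="response")


reply = handle("  your order has shipped.  ")
```

The caller receives exactly what it would have received without the wrap, and the evaluator sees the same formatted string.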

#### `purpose="state"` — internal decisions relevant to evaluation

Some eval criteria need to judge the app's internal reasoning — not just what went in or came out, but _how_ the app made decisions. Wrap internal state when an eval criterion requires it and the data isn't visible in inputs or outputs.

Common examples:

- **Agent routing**: which sub-agent or tool was selected to handle a request
- **Plan/step decisions**: what steps the agent chose to execute
- **Memory updates**: what the agent added to or removed from its working memory
- **Retrieval results**: which documents/chunks were retrieved before being fed to the LLM

```python
# Agent routing decision
selected_agent = pixie.wrap(selected_agent, purpose="state", name="routing_decision",
    description="Which agent was selected to handle this request")

# Retrieved context fed to LLM
pixie.wrap(retrieved_chunks, purpose="state", name="retrieved_context",
    description="Document chunks retrieved by RAG before LLM call")
```

**Principle**: only wrap state that an eval criterion actually needs. Don't wrap every variable — state wraps are for internal data that evaluators must see but that doesn't appear in the app's inputs or outputs.

### Coverage check

After adding all `wrap()` calls, go through each eval criterion from `pixie_qa/02-eval-criteria.md` and verify:

1. Every criterion that judges **what went in** has a corresponding `input` or `entry` wrap.
2. Every criterion that judges **what came out** has a corresponding `output` wrap.
3. Every criterion that judges **how the app decided** has a corresponding `state` wrap.

If a criterion needs data that isn't captured, add the wrap now — don't defer.
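
The checklist above can be run as a small script once the criteria and wraps are written down by hand. The sketch below assumes a hypothetical record shape (`id`, `judges`, `needs`) for criteria and a plain name-to-purpose mapping for wraps; neither is a format defined by pixie or by `02-eval-criteria.md`.

```python
# Purposes that can satisfy each kind of criterion, taken from the checklist above.
ALLOWED_PURPOSES = {
    "in": {"input", "entry"},
    "out": {"output"},
    "decision": {"state"},
}


def find_gaps(criteria, wraps):
    """Return ids of criteria whose required wrap is missing or has the wrong purpose.

    criteria: list of dicts with hypothetical keys "id", "judges", "needs"
    wraps:    mapping of wrap name -> purpose, collected from the instrumented code
    """
    gaps = []
    for criterion in criteria:
        purpose = wraps.get(criterion["needs"])  # None when the wrap does not exist
        if purpose not in ALLOWED_PURPOSES[criterion["judges"]]:
            gaps.append(criterion["id"])
    return gaps


wraps = {"customer_profile": "input", "response": "output"}
criteria = [
    {"id": "uses_profile_tier", "judges": "in", "needs": "customer_profile"},
    {"id": "polite_tone", "judges": "out", "needs": "response"},
    {"id": "routes_billing_correctly", "judges": "decision", "needs": "routing_decision"},
]
gaps = find_gaps(criteria, wraps)
```

Each id in `gaps` points at a criterion that still needs a wrap before moving on.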

---

## Output

Modified application source files with `wrap()` calls at data boundaries.