# Sampling

Sampling lets a tool call the LLM through the client instead of bringing its own model. The server says "summarise this for me" and the client routes the request to whatever model the user has configured (Claude, GPT, local model, anything). Costs and rate limits live with the client, not the server.

## When to use sampling

- The tool needs an LLM step (summarise, classify, draft, extract) and you don't want to ship/configure your own model in the server.
- You want to respect the user's model choice, key, and cost preferences.
- You're building a "meta" tool that orchestrates LLM work as part of its job (e.g. multi-step agents).

If you already have a deterministic algorithm, don't add a sampling call "for flavour" — it adds latency and cost.

## Prerequisite: stateful transport

Like elicitation, sampling needs the server to call back to the client. STDIO always works; HTTP needs `options.Stateless = false`.
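
As a rough illustration only (the exact host wiring belongs in the transport references; `WithHttpTransport` and `HttpServerTransportOptions` are the assumed ModelContextProtocol.AspNetCore entry points), a stateful Streamable HTTP server looks roughly like this:

```csharp
// Sketch, assuming the ModelContextProtocol.AspNetCore package.
var builder = WebApplication.CreateBuilder(args);

builder.Services
    .AddMcpServer()
    .WithHttpTransport(options => options.Stateless = false) // keep sessions so the server can call back
    .WithToolsFromAssembly();

var app = builder.Build();
app.MapMcp();
app.Run();
```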

The cleanest API wraps the sampling channel as `Microsoft.Extensions.AI.IChatClient`, so you write code that looks like normal LLM-calling .NET:

```csharp
using System.ComponentModel;
using Microsoft.Extensions.AI;
using ModelContextProtocol.Server;

[McpServerToolType]
public class SummaryTools
{
    [McpServerTool(Name = "SummarizeContent"), Description("Summarises arbitrary text using the client's LLM.")]
    public static async Task<string> Summarize(
        IMcpServer server,
        [Description("The text to summarize")] string text,
        CancellationToken cancellationToken)
    {
        ChatMessage[] messages =
        [
            new(ChatRole.User, "Briefly summarize the following content:"),
            new(ChatRole.User, text),
        ];

        var options = new ChatOptions
        {
            MaxOutputTokens = 256,
            Temperature = 0.3f,
        };

        var response = await server.AsSamplingChatClient()
            .GetResponseAsync(messages, options, cancellationToken);

        return $"Summary: {response}";
    }
}
```

Why this is nice:

- Same `IChatClient` API the rest of the .NET AI ecosystem uses.
- Works with Microsoft.Extensions.AI middleware (rate limiting, retries, telemetry, function calling); a sketch follows below.
- You can swap to a direct provider in tests by injecting a different `IChatClient`.
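
For example, the sampling channel composes with the same middleware pipeline as any other `IChatClient`. A minimal sketch, assuming the Microsoft.Extensions.AI package for `AsBuilder`/`UseLogging` and an `ILoggerFactory` injected into the tool:

```csharp
// Hypothetical variation on the Summarize tool above: wrap the sampling-backed
// IChatClient with logging middleware before issuing the request.
IChatClient chatClient = server.AsSamplingChatClient()
    .AsBuilder()
    .UseLogging(loggerFactory)   // loggerFactory is an assumed injected dependency
    .Build();

var response = await chatClient.GetResponseAsync(messages, options, cancellationToken);
```

In unit tests, the same tool logic can run against any other `IChatClient` implementation, so the LLM step stays testable without a live MCP client.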

## Lower-level: `SampleAsync`

When you need full control over the request shape:

```csharp
using ModelContextProtocol.Protocol;

CreateMessageResult result = await server.SampleAsync(
    new CreateMessageRequestParams
    {
        Messages =
        [
            new SamplingMessage
            {
                Role = Role.User,
                Content = [new TextContentBlock { Text = "What is 2 + 2?" }]
            }
        ],
        MaxTokens = 100,
        Temperature = 0.0f,
        SystemPrompt = "You are a precise calculator.",
        // ModelPreferences, StopSequences, IncludeContext...
    },
    cancellationToken);

string answer = result.Content
    .OfType<TextContentBlock>()
    .FirstOrDefault()?.Text ?? string.Empty;
```

`ModelPreferences` lets you hint at model selection (cost vs. speed vs. intelligence priority); the client decides the actual model.

```csharp
ModelPreferences = new ModelPreferences
{
    Hints = [new ModelHint { Name = "claude" }],   // soft preference
    CostPriority = 0.2f,         // 0..1
    SpeedPriority = 0.4f,
    IntelligencePriority = 0.9f,
}
```

## `IncludeContext`

Sampling requests can ask the client to include context from the current conversation:

```csharp
IncludeContext = ContextInclusion.ThisServer   // include this server's prior messages
// or AllServers, or None (default)
```

Useful when you need the LLM to consider what's happened in the chat so far without you re-supplying it.

## Capability check

Always confirm the client supports sampling — many do not:

```csharp
if (server.ClientCapabilities?.Sampling is null)
    throw new McpException(
        "This client does not support sampling. " +
        "Configure a model in the host or use a different MCP client.");
```

## Performance notes

- Sampling calls are network round-trips (client → its provider → back). Expect anywhere from ~100 ms to multiple seconds. Don't loop tightly.
- Token costs are paid by the user (their API key/quota). Be conservative with `MaxTokens`.
- Cancellation propagates: if the user kills the tool call, the sampling request is cancelled too.

## Sampling vs. doing it server-side

| Sampling (via client) | Direct LLM call (server-side) |
| --- | --- |
| Uses the user's model + key | Uses your service's key |
| Respects the user's policy/quota | Your responsibility to bill/track |
| Works in any host the user has | Locked to the model you ship with |
| Higher latency (extra hop) | Lower latency, direct |
| No secrets to manage | You manage the API key |

For "smart" servers shipped to many users, prefer sampling. For internal corporate servers where you want consistent behaviour and you're already paying for the model, direct is fine.