Conformance Coverage

Per-SEP coverage matrix and open gaps for mcpkit.

MCP Conformance

This report records mcpkit’s posture against the MCP conformance suite and the per-SEP traceability manifest. It is refreshed before each tagged release, and CI keeps the rendered file in sync on pull requests.

This file has two parts:

  • Overview (hand-edited, preserved across regenerations): what the report covers, what it does not cover, and how to read it.
  • Generated block (below the begin-marker): rebuilt from the output of npx @modelcontextprotocol/conformance tier-check --output json plus src/seps/traceability.json. Do not hand-edit it; scripts/refresh-conformance.sh overwrites any changes.

How to read this

  • Conformance Summary counts wire-level scenarios from the upstream conformance suite, run against cmd/testserver. It reports aggregate pass/total at the scenario level, plus check-level pass/fail totals.
  • SEP Coverage is sourced from upstream’s traceability manifest, which maps each SEP to its declared requirements. A row’s Status says whether upstream has emitted a check ID for every declared requirement, not whether mcpkit passes those checks. Scenario-level pass/fail per SEP lives in conformance/UPSTREAM_AUDIT.md, which grades mcpkit against every scenario upstream currently ships.
  • Open Gaps lists failing scenarios and traceability rows with no emitted check. Tracking links and one-line context come from conformance/known-gaps.yaml.

What this report is not

The renderer omits tier-check checks that depend on live GitHub state (labels, triage SLA, P0 resolution, stable release, policy signals, spec tracking). These are useful tier-1 signals, but they change daily regardless of the code, so including them would break the CI staleness gate. For the full tier-check scorecard and a point-in-time tier judgement, run npx @modelcontextprotocol/conformance tier-check --repo panyam/mcpkit --output markdown directly.

Regenerate locally

bash scripts/refresh-conformance.sh

Needs Node.js 22+ and a clone of modelcontextprotocol/conformance at ../conf-upstream-main (override with MCPCONFORMANCE_BASE_PATH=...). Output is deterministic — re-running on unchanged input produces a byte-identical file.


Conformance Summary

Surface | Scenarios (pass/total) | Checks (pass/fail)
Server | 30/30 | 42/0
Client | 0/0 | 0/0

mcpkit-local Conformance Suites

These suites exercise SEP-specific behavior beyond what upstream’s tier-check covers. Each is wired into make testall as a separate stage and may show as PASS, FAIL, INFO (informational, not gating), or SKIP. INFO typically means “work in flight” — see the Tracking column. The Source column links to the branch the scenarios live on; per-suite env vars and default checkout paths are listed below the table.

Suite | Covers | Stage | Status | Source | Tracking
testconf-tasks-v2 | SEP-2663 Tasks v2 | 8d | PASS | panyam/mcpconformance@fix/tasks-mrtr-error-codes | -
testconf-mrtr | SEP-2322 MRTR | 8e | PASS | panyam/mcpconformance@fix/tasks-mrtr-error-codes | -
testconf-file-inputs | SEP-2356 File inputs | 8f | PASS | panyam/mcpconformance@pending | -
testconf-auth-server | MCP authz 2025-11-25 | 8g | PASS | panyam/mcpconformance@pending | -
testconf-stateless | SEP-2575 Stateless wire | - | PASS [1] | modelcontextprotocol/conformance@main | -
testconf-skills | SEP-2640 Skills | 8h | INFO [2] | panyam/mcpconformance@chore/sep-2640-yaml | mcpkit 567

[1] 25 pass / 1 known upstream-test failure (array-vs-object requiredCapabilities).
[2] The fixture spawns and runs cleanly. Fork-side Scenario classes are blocked on WG iteration of sep-2640.yaml in panyam/mcpconformance PR 330.

Setup — clone the right worktree per suite

Each suite’s Makefile target reads MCPCONFORMANCE_*_PATH to find its scenario worktree. Defaults assume sibling clones of the source repo at the relative path shown. Override per-invocation when the worktree lives elsewhere.

Suite | Env var | Default path | Clone command
testconf-tasks-v2 | MCPCONFORMANCE_TASKS_V2_PATH | ../conf-tasks-mrtr | git clone -b fix/tasks-mrtr-error-codes https://github.com/panyam/mcpconformance.git ../conf-tasks-mrtr
testconf-mrtr | MCPCONFORMANCE_MRTR_PATH | ../conf-tasks-mrtr | git clone -b fix/tasks-mrtr-error-codes https://github.com/panyam/mcpconformance.git ../conf-tasks-mrtr
testconf-file-inputs | MCPCONFORMANCE_FILE_INPUTS_PATH | ../conf-pending | git clone -b pending https://github.com/panyam/mcpconformance.git ../conf-pending
testconf-auth-server | MCPCONFORMANCE_AUTH_PATH | ../conf-pending | git clone -b pending https://github.com/panyam/mcpconformance.git ../conf-pending
testconf-stateless | MCPCONFORMANCE_STATELESS_PATH | ../conf-upstream-main | git clone -b main https://github.com/modelcontextprotocol/conformance.git ../conf-upstream-main
testconf-skills | MCPCONFORMANCE_SKILLS_PATH | ../conf-skills | git clone -b chore/sep-2640-yaml https://github.com/panyam/mcpconformance.git ../conf-skills

SEP Coverage

SEP | Tested reqs | Excluded | Untested | Status
SEP-837 | 1 | 4 | 0 | pass
SEP-2106 | 1 | 4 | 0 | pass
SEP-2164 | 2 | 1 | 0 | pass
SEP-2207 | 1 | 3 | 0 | pass
SEP-2243 | 18 | 4 | 2 | partial
SEP-2260 | 0 | 12 | 0 | untested
SEP-2322 | 17 | 16 | 0 | pass
SEP-2350 | 1 | 2 | 0 | pass
SEP-2352 | 3 | 3 | 0 | pass
SEP-2468 | 6 | 3 | 0 | pass
SEP-2549 | 7 | 13 | 0 | pass
SEP-2575 | 22 | 13 | 0 | pass

Numeric cells correspond to the per-SEP lists under SEP Detail below. Status reflects upstream-declared requirements only, because tier-check JSON does not currently expose which scenarios exercise which SEP. The column answers “does upstream have a check ID for this SEP requirement?”, not “does mcpkit pass it?”. Per-SEP scenario pass/fail lives in conformance/UPSTREAM_AUDIT.md.
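This Go sketch makes the Status semantics concrete by reproducing the column from the counts. The rule is inferred from the rows above rather than taken from the renderer's source. For example, SEP-2260 has no tested requirements and shows untested, and SEP-2243 has two untested requirements and shows partial. sepStatus is a hypothetical name. Excluded requirements never affect the result.

```go
package main

// sepStatus derives the Status column from the Tested and Untested counts.
// Inferred from the table, not copied from scripts/refresh-conformance.sh.
func sepStatus(tested, untested int) string {
	switch {
	case tested == 0:
		return "untested" // upstream emits no check ID for this SEP at all
	case untested > 0:
		return "partial" // some declared requirements still lack a check ID
	default:
		return "pass" // every non-excluded requirement has a check ID
	}
}
```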

SEP Detail

Per-SEP breakdown of upstream traceability — what is exercised, what is intentionally excluded, and what is declared but not yet exercised. Useful for auditing whether each exclusion still makes sense as upstream evolves. Check IDs link to their definition in the upstream SEP YAML.

SEP-837

Tested (1)

Excluded (4)

Requirement | Upstream reason
Native applications (desktop applications, mobile apps, CLI tools, and locally-hosted web applications accessed via localhost) SHOULD use application_type: “native”. | harness cannot determine the client-under-test application class (native vs web) out-of-band; only presence and value validity are wire-observable
Web applications (remote browser-based applications served from a non-local host) SHOULD use application_type: “web”. | harness cannot determine the client-under-test application class (native vs web) out-of-band; only presence and value validity are wire-observable
MCP clients MUST be prepared to handle registration failures due to redirect URI constraints when authorization servers implement OIDC. | robustness requirement with no defined wire-level success criterion
When a registration request is rejected, clients SHOULD surface a meaningful error to the user or developer. | UI/DX behavior, not protocol-observable

Untested (0)

None.

SEP-2106

Tested (1)

Excluded (4)

Requirement | Upstream reason
SDK maintainers SHOULD: Document the migration in SDK release notes; Where ergonomic, provide typed helpers (e.g. generics over a tool’s outputSchema) so consumers do not need to write narrowing guards by hand | Migration/deprecation guidance for SDK maintainers; about SDK source code, not protocol-observable wire behavior
JSON Schema validation already handles type checking, value constraints, and required field validation, and implementations MUST continue to validate all inputs and outputs against declared schemas | Restates pre-existing schema-validation behavior and is too broad to attribute to a specific observable SEP-2106 check; input/output validation overlaps existing tool scenarios
An opt-in mode that fetches non-local $refs SHOULD enforce an allowlist of hosts (or at minimum reject loopback, link-local, and private network addresses), apply timeouts and size limits, and log dereferenced URIs | Applies only when the non-default opt-in network-$ref fetch mode is enabled; not observable in default conformance runs
Implementations SHOULD apply reasonable bounds (for example, a maximum schema depth, a cap on the total number of subschemas, or a per-validation time budget) to prevent a malicious tool definition from acting as a CPU DoS vector against the validator | Internal validator resource limits (max schema depth, subschema cap, per-validation time budget); a defensive measure not observable on the wire

Untested (0)

None.

SEP-2164

Tested (2)

Excluded (1)

Requirement | Upstream reason
clients SHOULD also accept -32002 as a resource not found error | Client-side error handling is implementation-defined; not protocol-observable

Untested (0)

None.

SEP-2207

Tested (1)

Excluded (3)

Requirement | Upstream reason
MCP Servers (Protected Resources) SHOULD NOT include offline_access in WWW-Authenticate scope or Protected Resource Metadata scopes_supported, as refresh tokens are not a resource requirement | The server suite does not yet exercise the SDK server as an OAuth protected resource (no Protected Resource Metadata or WWW-Authenticate probing); revisit once server-side authorization scenarios exist (https://github.com/modelcontextprotocol/conformance/issues/116)
MCP Clients that desire refresh tokens MUST keep refresh tokens confidential in transit and storage as specified in OAuth 2.1 Section 4.3 | Confidentiality of refresh tokens in storage is client-internal state, and in-transit (TLS) confidentiality is not exercised by the harness over localhost HTTP; not protocol-observable
MCP Clients that desire refresh tokens MUST NOT assume refresh tokens will be issued; the AS retains discretion | Whether a client “assumes” refresh tokens will be issued is internal state; it only manifests as general authorization-flow completion, which other checks already cover; not directly protocol-observable

Untested (0)

None.

SEP-2243

Tested (18)

Excluded (4)

Requirement | Upstream reason
Clients SHOULD log a warning when rejecting a tool definition due to invalid x-mcp-header, including the tool name and the reason. | Log output is not wire-observable.
Server developers SHOULD NOT mark sensitive parameters (such as passwords, API keys, tokens, or PII) with x-mcp-header. | Design guidance to humans; not protocol-observable.
Intermediaries MUST return an appropriate HTTP error status for validation failures. | Intermediary requirement; conformance harness tests clients and servers, not intermediaries.
Intermediate servers that do not recognize an Mcp-Param-{Name} header MUST forward it and otherwise ignore it. | Intermediary requirement; conformance harness tests clients and servers, not intermediaries.

Untested (2)

SEP-2260

Tested (0)

None.

Excluded (12)

Requirement | Upstream reason
roots/list, sampling/createMessage, and elicitation/create requests MUST NOT be sent on standalone streams. | No longer needed; SEP-2322 (MRTR) makes this behavior the default.
Clients MUST return standard JSON-RPC errors for common failure cases: Server sends an elicitation/create request with no associated client-to-server request: -32602 (Invalid params) | No longer needed; SEP-2322 (MRTR) makes this behavior the default.
Clients SHOULD return standard JSON-RPC errors for common failure cases: Server sends a roots/list request with no associated client-to-server request: -32602 (Invalid params) | No longer needed; SEP-2322 (MRTR) makes this behavior the default.
Clients SHOULD return errors for common failure cases: Sampling request not associated with a client-to-server request: -32602 (Invalid params) | No longer needed; SEP-2322 (MRTR) makes this behavior the default.
These messages MUST relate to the originating client request. | Semantic association (“relate to”) is not protocol-observable. The harness cannot determine whether a server request conceptually relates to the client request without understanding application logic.
Implementations SHOULD prefer transport-level SSE keepalive mechanisms for idle-connection maintenance. | Implementation preference for keepalive mechanism choice; not observable on the wire.
Servers MUST send server-to-client requests (such as roots/list, sampling/createMessage, or elicitation/create) only in association with an originating client request (e.g., during tools/call, resources/read, or prompts/get processing). | No longer needed; SEP-2322 (MRTR) makes this behavior the default.
Standalone server-initiated requests of these types on independent communication streams (unrelated to any client request) are not supported and MUST NOT be implemented. | No longer needed; SEP-2322 (MRTR) makes this behavior the default.
Servers MUST send sampling/createMessage requests only in association with an originating client request (e.g., during tools/call, resources/read, or prompts/get processing). | No longer needed; SEP-2322 (MRTR) makes this behavior the default.
Standalone server-initiated sampling on independent communication streams (unrelated to any client request) is not supported and MUST NOT be implemented. | No longer needed; SEP-2322 (MRTR) makes this behavior the default.
Servers MUST send server-to-client requests (such as roots/list, sampling/createMessage, or elicitation/create) only in association with an originating client request (e.g., during tools/call, resources/read, or prompts/get processing). | No longer needed; SEP-2322 (MRTR) makes this behavior the default.
Standalone server-initiated requests of these types on independent communication streams (unrelated to any client request) are not supported and MUST NOT be implemented. | No longer needed; SEP-2322 (MRTR) makes this behavior the default.

Untested (0)

None.

SEP-2322

Tested (17)

Excluded (16)

Requirement | Upstream reason
inputRequests keys are server assigned identifiers and MUST be unique within the scope of the request. | inputRequests is a JSON object; duplicate keys are collapsed by JSON parsing before the harness can observe them, so key uniqueness is not testable at the protocol level
Servers MUST send server-to-client requests (such as roots/list, sampling/createMessage, or elicitation/create) using the MRTR pattern. | Architectural migration statement; tested indirectly through all MRTR scenarios
servers MUST treat requestState as an attacker-controlled input | Internal security posture; not observable at protocol level
servers MUST protect its integrity (e.g. HMAC or AEAD) | Internal implementation choice about encryption/signing; not observable at protocol level
servers SHOULD include the authenticated principal, a short expiry (TTL), and an identifier for the originating request inside the integrity-protected requestState payload and verify each on receipt | Internal requestState format; not observable at protocol level
Servers for which a given requestState must be consumed at most once MUST enforce that invariant server-side | Internal enforcement policy; conformance harness cannot determine which servers require single-use semantics
Servers MUST NOT assume that clients will fulfill the inputRequests or retry the original request | Server-internal robustness assumption; not observable at protocol level
Servers MUST validate request state as described in the server requirements above. | Duplicates integrity-protection requirements above; internal security detail
Servers MUST include an inputRequests field in the tasks/result response when the task is in status input_required. | Tasks moved to an extension as of SEP-2663; no longer part of core conformance
inputRequests keys are server assigned identifiers and MUST be unique within the scope of a Task. | Tasks moved to an extension as of SEP-2663; no longer part of core conformance
When tasks/get shows status input_required, clients MUST call tasks/result to get the inputRequests and optional requestState. | Tasks moved to an extension as of SEP-2663; no longer part of core conformance
Clients SHOULD construct the results of those requests and call tasks/input_response with the inputResponses & requestState (if present). | Tasks moved to an extension as of SEP-2663; no longer part of core conformance
Receivers MUST reject tasks/input_response requests for tasks that are not in input_required status with error code -32602 (Invalid params). | Tasks moved to an extension as of SEP-2663; no longer part of core conformance
When a receiver receives a tasks/result request for a task in working status, it MUST block the response until the task reaches a terminal status or input_required status. | Tasks moved to an extension as of SEP-2663; no longer part of core conformance
When a receiver receives a tasks/result request for a task in input_required status, it MUST return an InputRequiredResult containing the inputRequests that the requestor must fulfill. | Tasks moved to an extension as of SEP-2663; no longer part of core conformance
After sending tasks/input_response, the requestor SHOULD resume polling via tasks/get. | Tasks moved to an extension as of SEP-2663; no longer part of core conformance

Untested (0)

None.

SEP-2350

Tested (1)

Excluded (2)

Requirement | Upstream reason
Regardless of the approach chosen, servers SHOULD include all scopes required for the current operation in a single challenge. | “All scopes required for the current operation” has no harness-observable ground truth; the challenge is the only place the server declares its requirements. Detecting the negative (incremental challenging) would need server-side auth scenarios that do not exist yet and would test the example app’s scope config rather than SDK behavior; the spec also permits dynamic per-request scope determination, so a second challenge is not conclusively non-conformant.
When responding with insufficient scope errors, servers SHOULD include the scopes needed to satisfy the current operation in the scope parameter, consistent with RFC 6750 Section 3.1. | Rewording of a pre-existing requirement (request → operation, plus an RFC 6750 citation); no normative delta; harness already emits scope= in WWW-Authenticate

Untested (0)

None.

SEP-2352

Tested (3)

Excluded (3)

Requirement | Upstream reason
Clients MUST maintain separate registration state (client credentials, tokens) per authorization server. | internal storage requirement; not directly observable on the wire
Clients that use pre-registered credentials, or persist client credentials obtained via Dynamic Client Registration, MUST associate those credentials with the specific authorization server that issued them, keyed by the authorization server issuer identifier. | internal state-keying requirement; not protocol-observable
If the authorization server indicated by protected resource metadata no longer matches the one the credentials were registered with, clients SHOULD surface an error rather than silently attempting to use mismatched credentials. | UI behavior; the negative half (do not send mismatched credentials) is covered by sep-2352-no-reuse-on-as-change
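The per-issuer keying in the first two rows amounts to a map from authorization server issuer identifier to credentials. The Go sketch below makes that assumption. credStore and clientCreds are hypothetical types, and real storage would also hold tokens and persist across runs.

```go
package main

import "fmt"

// clientCreds stands in for the registration state tied to one AS.
type clientCreds struct {
	ClientID     string
	ClientSecret string
}

// credStore keys registration state by AS issuer identifier.
type credStore struct {
	byIssuer map[string]clientCreds
}

func newCredStore() *credStore { return &credStore{byIssuer: map[string]clientCreds{}} }

// save records credentials under the issuer that issued them.
func (s *credStore) save(issuer string, c clientCreds) { s.byIssuer[issuer] = c }

// forServer returns credentials only for the issuer that protected resource
// metadata currently names. If the AS has changed, credentials registered with
// another issuer are never returned; the caller gets an error to surface
// (or a cue to register afresh) instead.
func (s *credStore) forServer(currentIssuer string) (clientCreds, error) {
	c, ok := s.byIssuer[currentIssuer]
	if !ok {
		return clientCreds{}, fmt.Errorf("no client registration for authorization server %q", currentIssuer)
	}
	return c, nil
}
```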

Untested (0)

None.

SEP-2468

Tested (6)

Excluded (3)

Requirement | Upstream reason
MCP authorization servers SHOULD include the iss parameter in authorization responses, including error responses, as defined in RFC9207 Section 2. | Targets the authorization server under test; observing iss in an authorization response requires driving an Authorization Code Grant against the AS, which the authorization-server suite does not implement yet (it only probes the metadata endpoint). (https://github.com/modelcontextprotocol/conformance/issues/208)
Authorization servers that include the iss parameter MUST advertise this by setting authorization_response_iss_parameter_supported to true in their metadata (RFC9207 Section 2.3). | Conditional on the AS actually including iss in an authorization response, which requires driving an Authorization Code Grant against the AS under test; the authorization-server suite does not implement that yet. (https://github.com/modelcontextprotocol/conformance/issues/208)
This validation applies equally to error responses: on mismatch the client MUST NOT act on or display error, error_description, or error_uri. | “Display” is UI-facing; “act on” has no protocol-observable signal beyond the existing reject-on-mismatch checks

Untested (0)

None.

SEP-2549

Tested (7)

Excluded (13)

Requirement | Upstream reason
If ttlMs is 0, the response SHOULD be considered immediately stale. | Client-side caching behavior; not observable at the protocol level
If ttlMs is positive, the client SHOULD consider the result fresh for that many milliseconds after receiving the response. | Client-side caching behavior; not observable at the protocol level
If ttlMs is absent, clients SHOULD assume a default of 0 (immediately stale) and rely on their own caching heuristics or notifications. | Client-side caching behavior; not observable at the protocol level
If ttlMs is negative, clients SHOULD ignore it and treat it as 0. | Client-side caching behavior; not observable at the protocol level
Once the TTL expires, the response is stale and the client SHOULD re-fetch on next access. | Client-side caching behavior; not observable at the protocol level
Clients SHOULD NOT treat TTL as a polling interval that triggers automatic background refetches. | Client-side caching behavior; not observable at the protocol level
Implementations that do choose to poll MUST apply jitter and backoff. | Client-side polling behavior; not observable at the protocol level
Cached responses MAY be reused for the same authorization context. Caches MUST NOT be shared across authorization contexts (e.g. a different access token requires a different cache). | Client/cache-side behavior; not observable at the protocol level
When a cached page expires, the client SHOULD re-fetch that page using its cursor. | Client-side caching behavior; not observable at the protocol level
Clients that require a consistent snapshot of the full list SHOULD re-fetch from the beginning (without a cursor). | Client-side caching behavior; not observable at the protocol level
If a cursor becomes invalid (e.g., the server returns an error for a previously valid cursor), the client SHOULD discard all cached pages and re-fetch from the beginning. | Client-side caching behavior; not observable at the protocol level
Servers MUST be aware that responses with a “public” cacheScope may be shared between callers even if the Result is coming from an authenticated endpoint. | Server-side awareness requirement; implementation guidance not testable via protocol messages
Server implementors MUST apply appropriate per-primitive access controls, and MUST NOT rely on cacheScope alone to prevent unauthorized access to primitives. | Server-side access control; implementation guidance not testable via protocol messages

Untested (0)

None.

SEP-2575

Tested (22)

Excluded (13)

Requirement | Upstream reason
A server MUST NOT treat connection or process identity as a proxy for conversation or session continuity. / Servers MUST NOT rely on prior requests over the same connection to establish context (e.g., capabilities, protocol version, client identity). | internal server state, not directly wire-observable; the observable consequence (rejecting requests with incomplete _meta rather than falling back to remembered state) is covered by sep-2575-request-meta-invalid-*; see https://github.com/modelcontextprotocol/conformance/issues/296
Servers MUST NOT require that a client reuse the same connection to perform related operations. | not observable from a black-box harness; every harness request already arrives on an independent connection; see https://github.com/modelcontextprotocol/conformance/issues/296
Closing the SSE response stream MUST be treated by the server as cancellation of that request. | “treated as cancellation” is internal server state; once the stream is closed there is no channel left on which to observe the effect; see https://github.com/modelcontextprotocol/conformance/issues/296
The server SHOULD stop work on the cancelled request as soon as practical and MUST NOT send any further messages for it [HTTP]. | “stop work as soon as practical” is unobservable from a black-box harness, and “no further messages” cannot be verified once the response stream is closed; see https://github.com/modelcontextprotocol/conformance/issues/296
State that needs to span multiple requests (e.g., long-running tasks, application-level handles) MUST be referenced by an explicit identifier the client passes on each request. | architectural guidance, observable only via subscriptionId/task-id rows already listed
To distinguish notifications belonging to different concurrent subscriptions, clients MUST correlate notifications using the io.modelcontextprotocol/subscriptionId field carried in _meta. | client-internal demux; not observable on the wire from the harness
The client SHOULD check the acknowledged filter against what it requested and handle any unsupported types gracefully. | internal comparison; “gracefully” has no wire-observable definition
Because there is no per-request status code to drive fallback, a client that supports both eras SHOULD probe with server/discover first [stdio backward compatibility]. | stdio client harness not implemented; see https://github.com/modelcontextprotocol/conformance/issues/258
To cancel an in-flight request [on stdio], the client MUST send a notifications/cancelled notification referencing the request ID. | stdio client harness not implemented; see https://github.com/modelcontextprotocol/conformance/issues/258
Servers SHOULD stop work on a cancelled request as soon as practical and MUST NOT send any further messages for it [stdio]. | stdio client harness not implemented; see https://github.com/modelcontextprotocol/conformance/issues/258
If the server process exits unexpectedly, the client SHOULD restart it. | stdio client harness not implemented; see https://github.com/modelcontextprotocol/conformance/issues/258
If the server returns UnsupportedProtocolVersionError, [the stdio client] SHOULD retry using one of the advertised supportedVersions rather than falling back to initialize. | stdio client harness not implemented; see https://github.com/modelcontextprotocol/conformance/issues/258
On stdio, if the connection is terminated and then re-established, the client MUST re-send subscriptions/listen to re-establish its subscriptions. | stdio client harness not implemented; see https://github.com/modelcontextprotocol/conformance/issues/258
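The io.modelcontextprotocol/subscriptionId row asks clients to correlate notifications by subscription. The Go sketch below shows one way to do that. It assumes _meta sits under params, as elsewhere in MCP, and that the ID is a JSON string. routeNotification is a hypothetical helper, and a client would use the returned ID to pick the consumer for that subscription.

```go
package main

import (
	"encoding/json"
	"errors"
)

const subscriptionIDKey = "io.modelcontextprotocol/subscriptionId"

// notification models only the fields needed for demultiplexing.
type notification struct {
	Method string `json:"method"`
	Params struct {
		Meta map[string]json.RawMessage `json:"_meta"`
	} `json:"params"`
}

// routeNotification extracts the subscription ID and method from a raw
// JSON-RPC notification so it can be delivered to the right subscriber.
func routeNotification(raw []byte) (string, string, error) {
	var n notification
	if err := json.Unmarshal(raw, &n); err != nil {
		return "", "", err
	}
	idRaw, ok := n.Params.Meta[subscriptionIDKey]
	if !ok {
		return "", n.Method, errors.New("notification has no subscriptionId in _meta")
	}
	var subID string
	if err := json.Unmarshal(idRaw, &subID); err != nil {
		return "", n.Method, errors.New("subscriptionId is not a JSON string")
	}
	return subID, n.Method, nil
}
```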

Untested (0)

None.

Open Gaps

Declared requirements with no emitted check

SEP | Check ID | Tracking
SEP-2243 | sep-2243-server-not-expect-null | MCP-Name fail-closed semantics on tasks/*; covered locally by server/middleware test
SEP-2243 | sep-2243-server-reject-missing-required | MCP-Name fail-closed semantics on tasks/*; covered locally by server/middleware test