MCP Gateway — Live End-to-End Test Report¶

Date: 2026-04-24 Branch: feat/gateway-toolkit-338 Author: end-to-end live test conducted in response to "all just code and simple unit tests" pushback on the unverified PR.

TL;DR¶

End-to-end live test surfaced three bugs that all the existing unit and integration tests had passed. Two were fixed inline (audit attribution and OAuth-token-survives-restart) with verifying tests added; two are filed as known issues for follow-up.

After the fixes, the full path — create connection via admin API → upstream tool discovery → portal-style PKCE OAuth flow → encrypted token persistence → process restart → token reload → proxied tool call → audit row with correct connection attribution → forced token expiry → automatic refresh-token rotation — works against a real running platform with a real Postgres backend.

Test environment¶

make dev — Docker Compose with Postgres (acme-dev-postgres) and SeaweedFS
mcp-data-platform binary built from feat/gateway-toolkit-338 HEAD, run with dev/platform.yaml
Mock OAuth provider at :9180 (/tmp/livetest/) — single-file Go program implementing /authorize and /token (authorization_code, refresh_token, client_credentials) with 10-second access TTL and refresh-token rotation
Mock MCP upstream at :9181 — go-sdk MCP server with three tools (echo, add, now) and bearer-auth middleware
Real MCP client (separate /tmp/livetest/client/) using go-sdk StreamableClientTransport against the platform's /mcp endpoint with the dev API key

The test was run twice: once to find bugs, once to verify the fixes. The mock servers are deterministic enough to reproduce both runs.

What was actually verified end-to-end¶

Path 1 — bearer-auth proxy (clean)¶

Step	Evidence
Create connection via `PUT /api/v1/admin/connection-instances/mcp/mockup` with `auth_mode=bearer`	`200` + saved row, `credential` field stored as `enc:...` ciphertext
Platform discovers upstream tools	`gateway: upstream connected connection=mockup tools=3` in platform log
MCP client sees `mockup__echo`, `mockup__add`, `mockup__now` in `tools/list`	All three appear next to native `trino_`, `datahub_`, `s3_`, `memory_`
Call `mockup__add a=7 b=35`	Returns `42`
Call `mockup__echo "hello-from-livetest"`	Returns `echo: hello-from-livetest`
Audit row written	`tool_name=mockup__add toolkit_kind=mcp connection=mockup success=t duration_ms=7 parameters={"a":7,"b":35}`

Path 2 — OAuth authorization_code + PKCE (full dance)¶

Step	Evidence
Create OAuth connection with grant `authorization_code`	row saved; `oauth_client_secret` stored as `enc:...`
Connection without token → "awaiting reauth" placeholder	platform log: `gateway: oauth authorization_code connection awaiting reauth`
`POST /api/v1/admin/gateway/connections/oauthup/oauth-start`	returns `authorization_url` containing `response_type=code`, `code_challenge_method=S256`, valid `state`, `redirect_uri=/api/v1/admin/oauth/callback`
Browser-style follow of the auth URL → mock `/authorize` → callback	mock log shows `/authorize → redirect to .../callback?code=auth-code-1`; platform log shows `gateway: upstream connected connection=oauthup tools=3`
Token row in DB	`gateway_oauth_tokens` has `access_token=enc:...`, `refresh_token=enc:...`, `expires_at=` (UTC), `scope='api refresh_token'`
Call `oauthup__echo`	Returns `echo: hello-from-livetest`; audit row has `connection=oauthup` (after fix #1)
Wait > 10s for access TTL to expire, call again	mock OAuth log shows new `/token refresh_token rotated old=ref-N → access=acc-N+1 refresh=ref-N+1`; tool call still succeeds
Restart the platform binary (kill + relaunch)	startup logs: `awaiting reauth → upstream connected` for `oauthup` within ~37ms; first call after restart succeeds

Bugs found¶

Bug A (CRITICAL, FIXED) — OAuth tokens didn't survive process restart¶

Symptom: After kill + restart of the platform binary, oauthup__echo returned unknown tool. The encrypted token was in the DB, but the connection stayed in "awaiting reauth" indefinitely.

Root cause: Wiring order in cmd/mcp-data-platform/main.go. Platform construction calls toolkit.AddConnection(...) for every persisted mcp connection BEFORE wireGatewayTokenStore(p) runs. So when an authorization_code connection's addParsedConnection ran, t.tokenStore was nil, the OAuth Token() had no persisted token to load, the upstream dial failed, and the connection landed as a dead "awaiting reauth" placeholder. SetTokenStore simply attached the store for future calls — it never retried existing placeholders.

This entire feature was non-functional across restarts despite all unit + sqlmock tests passing, because no test exercised "AddConnection with no store, then SetTokenStore" in that order.

Fix: pkg/toolkits/gateway/toolkit.go:SetTokenStore now collects every authorization_code placeholder under the lock, removes them from the connection map, and re-runs addParsedConnection for each outside the lock so the new store is consulted. Order-independent now.

Verification: TestSetTokenStore_RetriesAuthorizationCodePlaceholders (pkg/toolkits/gateway/oauth_test.go) — adds a connection without a store (placeholder created), wires a pre-seeded store, asserts tools are now discovered. Plus the live restart-and-call test passes.

Bug B (HIGH, FIXED) — Audit rows for proxied tools had empty `connection`¶

Symptom: Audit DB rows for mockup__echo/oauthup__echo had toolkit_kind=mcp populated but connection='' — making per-upstream auditing impossible despite the data being right there.

Root cause: pkg/registry/registry.go:GetToolkitForTool always returned toolkit.Connection(), which for the gateway is just the toolkit's default name. Per-tool routing to per-upstream is unique to multi-connection toolkits and the registry didn't have a way to ask the toolkit which connection a specific tool belongs to.

Fix: - New optional interface registry.ConnectionResolver { ConnectionForTool(toolName string) string }. - Gateway toolkit implements it by walking its connections map and matching tool name → connection. - Registry uses the resolver when present, falls back to Connection() when absent or empty.

Verification: - TestGetToolkitForTool_MultiConnectionResolverWins + TestGetToolkitForTool_FallbackWhenResolverReturnsEmpty (pkg/registry/registry_test.go) - TestConnectionForTool_ResolvesPerToolToConnection (pkg/toolkits/gateway/toolkit_test.go) - Live: mockup__add audit row → connection=mockup, oauthup__echo audit row → connection=oauthup.

Bug C — was a TEST ARTIFACT, not a bug. Resolved by inspection, no code change.¶

Symptom (during test): Right after the initial authorization_code exchange minted acc-2/ref-1, the mock OAuth log showed four extra refresh_token rotated calls in the same second, before the gateway had been used for any tool call.

Root cause (after re-reading the code): The mock OAuth provider issues access tokens with a 10-second TTL (deliberately short, to make refresh testing fast). The platform's oauthTokenSource.Token() (pkg/toolkits/gateway/oauth.go:109) refreshes whenever time.Until(ExpiresAt) <= expiryBuffer, and expiryBuffer = 30 * time.Second (pkg/toolkits/gateway/oauth.go:18). With expiryBuffer (30s) > access TTL (10s), every Token() call sees the token as "about to expire" and triggers a refresh.

In a real deployment, OAuth providers issue access tokens with TTLs measured in minutes-to-hours (Salesforce defaults to 2 hours). The 30-second buffer is correct: it triggers refresh in the last 30 seconds of a 2-hour token's life, which is exactly the desired behavior. The "storm" was an artifact of the mock's deliberately-aggressive expiry, not a defect in the platform.

Why no code change: The behavior is correct. Tightening the buffer to (say) 5 seconds would help mock testing but would risk real-world race conditions where a request's full round-trip burns more than the buffer's worth of clock time and the upstream rejects the now-expired token. 30s is the right balance.

Test improvement (deferred): The mock could be reconfigured with ACCESS_TTL_SECONDS=3600 for the steady-state subset of the test (where we don't want to exercise refresh) and then re-run with a short TTL only for the refresh path. Out of scope for this PR.

Bug D (HIGH, FIXED) — In-memory PKCE state, multi-replica unsafe¶

Symptom: pkg/admin/gateway_oauth_handler.go originally used a process-global globalPKCEStore for state → code_verifier. Two platform replicas behind a load balancer would fail every callback whose initial oauth-start landed on a different replica.

Fix: - New PKCEStore interface (pkg/admin/pkce_store.go) with two implementations: - memoryPKCEStore — in-process map with a background GC ticker (single-replica default). - PostgresPKCEStore (pkg/admin/pkce_store_postgres.go) — oauth_pkce_states table (migration 000036), DELETE … RETURNING for atomic single-shot take, periodic sweeper. - Handler.Deps.PKCEStore field; main.go wires the Postgres store when a database is available, falls through to in-memory otherwise. - Removed the globalPKCEStore global.

Verification: pkg/admin/pkce_store_test.go covers Put/Take/idempotent Close, opportunistic GC, background GC sweep, and Postgres path via sqlmock. The handler tests continue to pass against the in-memory default.

Findings about the gateway implementation that are NOT bugs¶

Tokens are encrypted at rest correctly. gateway_oauth_tokens rows show enc:-prefixed AES-256-GCM ciphertext for access_token and refresh_token when ENCRYPTION_KEY is set; the dev environment provided one and round-trip works.
Refresh-on-expiry works. With a 10s access TTL, a tool call > 10s after the previous one triggers a refresh_token grant against the upstream and succeeds.
Bearer auth path is correct. Static-token connections work without any of the OAuth machinery — the auth round-tripper injects the Authorization: Bearer ... header on every outbound request.
Connection mutation is dynamic. PUT on /api/v1/admin/connection-instances/mcp/<name> causes the toolkit to re-discover and re-register tools without a platform restart. Verified live for both mockup and oauthup.
Audit middleware reaches proxied tools. The audit pipeline that wraps native tool calls also captures proxied calls — every tool call we made appeared in audit_logs with correct tool_name, parameters, success, duration_ms. After Bug B fix, also connection.

What was NOT live-tested¶

Honesty section. This live test does NOT prove:

Behavior against a real Salesforce Hosted MCP. The mock provider is RFC-7636-compliant but Salesforce-specific quirks (cookie-based session attachment, lock-screen popups, IdP-side token revocation, scope-string formatting) are unverified.
Behavior against any other named vendor MCP. Same caveat.
Persona/glob filtering of proxied tools. The MCP client used the dev admin API key which has full persona access. A test where a non-admin persona is denied oauthup__* would round out the persona path.
Cross-enrichment rules firing on proxied tool calls. Engine has unit tests; no live rule was authored against oauthup__* and exercised.
Concurrent OAuth flows from multiple admin sessions on a single replica. PKCE state is in-memory but global; behavior is "race-free for one user, unknown for two simultaneous Connect clicks". (The DB-back fix in Bug D resolves this.)

Files changed during the live test¶

File	Change
`pkg/toolkits/gateway/toolkit.go`	`SetTokenStore` now retries authorization_code placeholders; new `ConnectionForTool` method
`pkg/registry/toolkit.go`	New `ConnectionResolver` optional interface
`pkg/registry/registry.go`	`GetToolkitForTool` uses `ConnectionResolver` when present
`pkg/registry/registry_test.go`	Two tests for the new interface (resolver-wins + empty-fallback)
`pkg/toolkits/gateway/toolkit_test.go`	`TestConnectionForTool_ResolvesPerToolToConnection`
`pkg/toolkits/gateway/oauth_test.go`	`TestSetTokenStore_RetriesAuthorizationCodePlaceholders`
`dev/platform.yaml`	Enable `mcp:` toolkit kind for local dev
`docs/research/mcp-gateway-livetest-2026.md`	This file

make verify passes after all changes.

Recommendation update for #338¶

Re-running our prior decision: still ship as-is, with the bug A and bug B fixes folded into the PR. They were genuine v1 blockers — without them, the authorization_code-grant story is a marketing claim, not a working feature.

The two known issues (refresh storm, in-memory PKCE) are filed as non-blocking follow-ups. The refresh storm wastes upstream quota but is functionally correct. The in-memory PKCE is a single-replica-only constraint we should disclose in the docs and resolve before any HA deployment.