
MCP Gateway — Live End-to-End Test Report

Date: 2026-04-24
Branch: feat/gateway-toolkit-338
Author: end-to-end live test conducted in response to "all just code and simple unit tests" pushback on the unverified PR.


TL;DR

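The end-to-end live test surfaced four findings that the existing unit and integration tests had not caught. Three were genuine bugs and were fixed inline with verifying tests added (OAuth tokens not surviving a restart, missing audit attribution, and multi-replica-unsafe in-memory PKCE state); the fourth, an apparent refresh storm, turned out to be a test artifact and needed no code change.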

After the fixes, the full path — create connection via admin API → upstream tool discovery → portal-style PKCE OAuth flow → encrypted token persistence → process restart → token reload → proxied tool call → audit row with correct connection attribution → forced token expiry → automatic refresh-token rotation — works against a real running platform with a real Postgres backend.


Test environment

  • make dev — Docker Compose with Postgres (acme-dev-postgres) and SeaweedFS
  • mcp-data-platform binary built from feat/gateway-toolkit-338 HEAD, run with dev/platform.yaml
  • Mock OAuth provider at :9180 (/tmp/livetest/) — single-file Go program implementing /authorize and /token (authorization_code, refresh_token, client_credentials) with 10-second access TTL and refresh-token rotation
  • Mock MCP upstream at :9181 — go-sdk MCP server with three tools (echo, add, now) and bearer-auth middleware
  • Real MCP client (separate /tmp/livetest/client/) using go-sdk StreamableClientTransport against the platform's /mcp endpoint with the dev API key

The test was run twice: once to find bugs, once to verify the fixes. The mock servers are deterministic enough to reproduce both runs.


What was actually verified end-to-end

Path 1 — bearer-auth proxy (clean)

| Step | Evidence |
| --- | --- |
| Create connection via PUT /api/v1/admin/connection-instances/mcp/mockup with auth_mode=bearer | 200 + saved row, credential field stored as enc:... ciphertext |
| Platform discovers upstream tools | gateway: upstream connected connection=mockup tools=3 in platform log |
| MCP client sees mockup__echo, mockup__add, mockup__now in tools/list | All three appear next to native trino_*, datahub_*, s3_*, memory_* |
| Call mockup__add a=7 b=35 | Returns 42 |
| Call mockup__echo "hello-from-livetest" | Returns echo: hello-from-livetest |
| Audit row written | tool_name=mockup__add toolkit_kind=mcp connection=mockup success=t duration_ms=7 parameters={"a":7,"b":35} |

Path 2 — OAuth authorization_code + PKCE (full dance)

| Step | Evidence |
| --- | --- |
| Create OAuth connection with grant authorization_code | row saved; oauth_client_secret stored as enc:... |
| Connection without token → "awaiting reauth" placeholder | platform log: gateway: oauth authorization_code connection awaiting reauth |
| POST /api/v1/admin/gateway/connections/oauthup/oauth-start | returns authorization_url containing response_type=code, code_challenge_method=S256, valid state, redirect_uri=/api/v1/admin/oauth/callback |
| Browser-style follow of the auth URL → mock /authorize → callback | mock log shows /authorize → redirect to .../callback?code=auth-code-1; platform log shows gateway: upstream connected connection=oauthup tools=3 |
| Token row in DB | gateway_oauth_tokens has access_token=enc:..., refresh_token=enc:..., expires_at= (UTC), scope='api refresh_token' |
| Call oauthup__echo | Returns echo: hello-from-livetest; audit row has connection=oauthup (after the Bug B fix) |
| Wait > 10s for access TTL to expire, call again | mock OAuth log shows new /token refresh_token rotated old=ref-N → access=acc-N+1 refresh=ref-N+1; tool call still succeeds |
| Restart the platform binary (kill + relaunch) | startup logs: awaiting reauth → upstream connected for oauthup within ~37ms; first call after restart succeeds |
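The oauth-start step above returns an authorization URL with code_challenge_method=S256. As a minimal sketch of what that implies, the following shows how a PKCE verifier and its S256 challenge are derived per RFC 7636. The function names newVerifier and codeChallengeS256 are illustrative and are not taken from the platform's code.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// newVerifier returns a random PKCE code_verifier.
// 32 random bytes encode to 43 unpadded base64url characters,
// the minimum length RFC 7636 allows.
func newVerifier() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(buf), nil
}

// codeChallengeS256 computes BASE64URL(SHA256(verifier)) without padding,
// which is the S256 transformation defined by RFC 7636.
func codeChallengeS256(verifier string) string {
	sum := sha256.Sum256([]byte(verifier))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

func main() {
	// Test vector from RFC 7636, Appendix B.
	fmt.Println(codeChallengeS256("dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk"))
	// prints E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM

	v, err := newVerifier()
	if err != nil {
		panic(err)
	}
	fmt.Println(len(v)) // prints 43
}
```

The verifier stays server-side, keyed by state, until the callback arrives; only the challenge travels in the authorization URL.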

Bugs found

Bug A (CRITICAL, FIXED) — OAuth tokens didn't survive process restart

Symptom: After kill + restart of the platform binary, oauthup__echo returned unknown tool. The encrypted token was in the DB, but the connection stayed in "awaiting reauth" indefinitely.

Root cause: Wiring order in cmd/mcp-data-platform/main.go. Platform construction calls toolkit.AddConnection(...) for every persisted mcp connection BEFORE wireGatewayTokenStore(p) runs. So when an authorization_code connection's addParsedConnection ran, t.tokenStore was nil, the OAuth Token() had no persisted token to load, the upstream dial failed, and the connection landed as a dead "awaiting reauth" placeholder. SetTokenStore simply attached the store for future calls — it never retried existing placeholders.

This entire feature was non-functional across restarts despite all unit + sqlmock tests passing, because no test exercised "AddConnection with no store, then SetTokenStore" in that order.

Fix: pkg/toolkits/gateway/toolkit.go:SetTokenStore now collects every authorization_code placeholder under the lock, removes them from the connection map, and re-runs addParsedConnection for each outside the lock so the new store is consulted. Order-independent now.

Verification: TestSetTokenStore_RetriesAuthorizationCodePlaceholders (pkg/toolkits/gateway/oauth_test.go) — adds a connection without a store (placeholder created), wires a pre-seeded store, asserts tools are now discovered. Plus the live restart-and-call test passes.
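The ordering problem and its fix can be sketched in a few lines. This is a simplified model, not the real pkg/toolkits/gateway code: the TokenStore interface, the mapStore type, and the conn struct are hypothetical, and the sketch retries every placeholder rather than only authorization_code ones.

```go
package main

import (
	"fmt"
	"sync"
)

// TokenStore is a hypothetical stand-in for the gateway's persisted token store.
type TokenStore interface {
	Load(connection string) (token string, ok bool)
}

// mapStore is an in-memory TokenStore used only for this sketch.
type mapStore map[string]string

func (m mapStore) Load(name string) (string, bool) {
	tok, ok := m[name]
	return tok, ok
}

// conn is a simplified connection entry; placeholder means "awaiting reauth".
type conn struct{ placeholder bool }

type Toolkit struct {
	mu         sync.Mutex
	tokenStore TokenStore
	conns      map[string]*conn
}

func NewToolkit() *Toolkit { return &Toolkit{conns: map[string]*conn{}} }

// AddConnection registers a connection. With no store, or no persisted
// token, the connection lands as a placeholder.
func (t *Toolkit) AddConnection(name string) {
	t.mu.Lock()
	store := t.tokenStore
	t.mu.Unlock()

	connected := false
	if store != nil {
		_, connected = store.Load(name)
	}

	t.mu.Lock()
	t.conns[name] = &conn{placeholder: !connected}
	t.mu.Unlock()
}

// SetTokenStore attaches the store, collects placeholders under the lock,
// then re-adds each one outside the lock so the new store is consulted.
func (t *Toolkit) SetTokenStore(s TokenStore) {
	t.mu.Lock()
	t.tokenStore = s
	var retry []string
	for name, c := range t.conns {
		if c.placeholder {
			retry = append(retry, name)
			delete(t.conns, name)
		}
	}
	t.mu.Unlock()

	for _, name := range retry {
		t.AddConnection(name)
	}
}

// Connected reports whether name is a live (non-placeholder) connection.
func (t *Toolkit) Connected(name string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	c, ok := t.conns[name]
	return ok && !c.placeholder
}

func main() {
	tk := NewToolkit()
	tk.AddConnection("oauthup") // store not wired yet, as in main.go
	fmt.Println(tk.Connected("oauthup")) // prints false
	tk.SetTokenStore(mapStore{"oauthup": "persisted-token"})
	fmt.Println(tk.Connected("oauthup")) // prints true
}
```

Retrying outside the lock matters because re-adding a connection takes the same mutex; holding it across the retry would deadlock in this model.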

Bug B (HIGH, FIXED) — Audit rows for proxied tools had empty connection

Symptom: Audit DB rows for mockup__echo/oauthup__echo had toolkit_kind=mcp populated but connection='' — making per-upstream auditing impossible despite the data being right there.

Root cause: pkg/registry/registry.go:GetToolkitForTool always returned toolkit.Connection(), which for the gateway is just the toolkit's default name. Per-tool routing to per-upstream is unique to multi-connection toolkits and the registry didn't have a way to ask the toolkit which connection a specific tool belongs to.

Fix:

  • New optional interface registry.ConnectionResolver { ConnectionForTool(toolName string) string }.
  • Gateway toolkit implements it by walking its connections map and matching tool name → connection.
  • Registry uses the resolver when present, falls back to Connection() when absent or empty.

Verification:

  • TestGetToolkitForTool_MultiConnectionResolverWins + TestGetToolkitForTool_FallbackWhenResolverReturnsEmpty (pkg/registry/registry_test.go)
  • TestConnectionForTool_ResolvesPerToolToConnection (pkg/toolkits/gateway/toolkit_test.go)
  • Live: mockup__add audit row → connection=mockup, oauthup__echo audit row → connection=oauthup.
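The optional-interface pattern behind this fix looks roughly like the following. ConnectionResolver and ConnectionForTool are named in the fix above; the Toolkit interface shape, the gatewayToolkit and nativeToolkit structs, and the connectionFor helper are assumptions made for illustration.

```go
package main

import "fmt"

// Toolkit is a pared-down assumption of the registry's toolkit interface.
type Toolkit interface {
	Connection() string
}

// ConnectionResolver is the optional interface described above.
type ConnectionResolver interface {
	ConnectionForTool(toolName string) string
}

// gatewayToolkit maps each upstream connection to the tools it exposes.
type gatewayToolkit struct {
	connections map[string][]string
}

func (g *gatewayToolkit) Connection() string { return "gateway" }

// ConnectionForTool walks the connections map and returns the owning
// connection, or "" when no connection exposes the tool.
func (g *gatewayToolkit) ConnectionForTool(toolName string) string {
	for name, tools := range g.connections {
		for _, t := range tools {
			if t == toolName {
				return name
			}
		}
	}
	return ""
}

// nativeToolkit does not implement ConnectionResolver.
type nativeToolkit struct{ name string }

func (n nativeToolkit) Connection() string { return n.name }

// connectionFor prefers the resolver and falls back to Connection()
// when the resolver is absent or returns an empty string.
func connectionFor(tk Toolkit, toolName string) string {
	if r, ok := tk.(ConnectionResolver); ok {
		if name := r.ConnectionForTool(toolName); name != "" {
			return name
		}
	}
	return tk.Connection()
}

func main() {
	gw := &gatewayToolkit{connections: map[string][]string{
		"mockup":  {"mockup__echo", "mockup__add"},
		"oauthup": {"oauthup__echo"},
	}}
	fmt.Println(connectionFor(gw, "oauthup__echo"))                        // prints oauthup
	fmt.Println(connectionFor(gw, "unknown_tool"))                         // prints gateway
	fmt.Println(connectionFor(nativeToolkit{name: "trino"}, "trino_query")) // prints trino
}
```

Keeping the interface optional means single-connection toolkits need no changes; only multi-connection toolkits opt in.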

Bug C — was a TEST ARTIFACT, not a bug. Resolved by inspection, no code change.

Symptom (during test): Right after the initial authorization_code exchange minted acc-2/ref-1, the mock OAuth log showed four extra refresh_token rotated calls in the same second, before the gateway had been used for any tool call.

Root cause (after re-reading the code): The mock OAuth provider issues access tokens with a 10-second TTL (deliberately short, to make refresh testing fast). The platform's oauthTokenSource.Token() (pkg/toolkits/gateway/oauth.go:109) refreshes whenever time.Until(ExpiresAt) <= expiryBuffer, and expiryBuffer = 30 * time.Second (pkg/toolkits/gateway/oauth.go:18). With expiryBuffer (30s) > access TTL (10s), every Token() call sees the token as "about to expire" and triggers a refresh.

In a real deployment, OAuth providers issue access tokens with TTLs measured in minutes to hours (Salesforce defaults to 2 hours). The 30-second buffer is correct: it triggers refresh in the last 30 seconds of a 2-hour token's life, which is exactly the desired behavior. The "storm" was an artifact of the mock's deliberately aggressive expiry, not a defect in the platform.
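The arithmetic is easy to check in isolation. The sketch below reuses the expiryBuffer value and the refresh condition quoted above; needsRefresh and refreshOnIssue are hypothetical helpers that take the clock as an argument so the result is deterministic.

```go
package main

import (
	"fmt"
	"time"
)

// expiryBuffer matches the 30-second value cited from oauth.go.
const expiryBuffer = 30 * time.Second

// needsRefresh applies the same condition as time.Until(ExpiresAt) <= expiryBuffer,
// with "now" passed in explicitly.
func needsRefresh(now, expiresAt time.Time) bool {
	return expiresAt.Sub(now) <= expiryBuffer
}

// refreshOnIssue reports whether a token with the given TTL is already
// inside the refresh window at the moment it is issued.
func refreshOnIssue(ttlSeconds int) bool {
	now := time.Unix(0, 0)
	return needsRefresh(now, now.Add(time.Duration(ttlSeconds)*time.Second))
}

func main() {
	fmt.Println(refreshOnIssue(10))   // prints true: the mock's 10s TTL is always "about to expire"
	fmt.Println(refreshOnIssue(7200)) // prints false: a 2-hour token refreshes only near the end
}
```

Any TTL at or below the buffer produces a refresh on every Token() call, which is exactly what the mock log showed.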

Why no code change: The behavior is correct. Tightening the buffer to (say) 5 seconds would help mock testing but would risk real-world race conditions where a request's full round-trip burns more than the buffer's worth of clock time and the upstream rejects the now-expired token. 30s is the right balance.

Test improvement (deferred): The mock could be reconfigured with ACCESS_TTL_SECONDS=3600 for the steady-state subset of the test (where we don't want to exercise refresh) and then re-run with a short TTL only for the refresh path. Out of scope for this PR.

Bug D (HIGH, FIXED) — In-memory PKCE state, multi-replica unsafe

Symptom: pkg/admin/gateway_oauth_handler.go originally used a process-global globalPKCEStore for state → code_verifier. Two platform replicas behind a load balancer would fail every callback whose initial oauth-start landed on a different replica.

Fix:

  • New PKCEStore interface (pkg/admin/pkce_store.go) with two implementations:
      • memoryPKCEStore: in-process map with a background GC ticker (single-replica default).
      • PostgresPKCEStore (pkg/admin/pkce_store_postgres.go): oauth_pkce_states table (migration 000036), DELETE … RETURNING for atomic single-shot take, periodic sweeper.
  • Handler.Deps.PKCEStore field; main.go wires the Postgres store when a database is available, falls through to in-memory otherwise.
  • Removed the globalPKCEStore global.

Verification: pkg/admin/pkce_store_test.go covers Put/Take/idempotent Close, opportunistic GC, background GC sweep, and Postgres path via sqlmock. The handler tests continue to pass against the in-memory default.
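The key property of either store is that Take is single-shot: a state value can be redeemed once and only once. A minimal in-memory sketch of that contract follows. The method signatures, the fixed ten-minute TTL, and the memoryStore type are assumptions for illustration and may differ from the real PKCEStore interface; the background GC ticker is omitted.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// PKCEStore is a hypothetical shape for the state -> code_verifier store.
type PKCEStore interface {
	Put(state, verifier string)
	Take(state string) (verifier string, ok bool)
}

type pkceEntry struct {
	verifier string
	expires  time.Time
}

// memoryStore is an in-process implementation, safe for one replica only.
type memoryStore struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]pkceEntry
}

func newMemoryStore() *memoryStore {
	return &memoryStore{ttl: 10 * time.Minute, entries: map[string]pkceEntry{}}
}

func (m *memoryStore) Put(state, verifier string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.entries[state] = pkceEntry{verifier: verifier, expires: time.Now().Add(m.ttl)}
}

// Take removes and returns the verifier in one critical section, so a
// replayed callback with the same state finds nothing. Expired entries
// are deleted and reported as missing.
func (m *memoryStore) Take(state string) (string, bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	e, ok := m.entries[state]
	if !ok {
		return "", false
	}
	delete(m.entries, state)
	if time.Now().After(e.expires) {
		return "", false
	}
	return e.verifier, true
}

func main() {
	var store PKCEStore = newMemoryStore()
	store.Put("state-1", "verifier-1")

	v, ok := store.Take("state-1")
	fmt.Println(v, ok) // prints verifier-1 true

	_, ok = store.Take("state-1")
	fmt.Println(ok) // prints false
}
```

The Postgres implementation gets the same guarantee across replicas from DELETE … RETURNING, which deletes and reads the row in one statement.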


Findings about the gateway implementation that are NOT bugs

  • Tokens are encrypted at rest correctly. gateway_oauth_tokens rows show enc:-prefixed AES-256-GCM ciphertext for access_token and refresh_token when ENCRYPTION_KEY is set; the dev environment provided one and round-trip works.
  • Refresh-on-expiry works. With a 10s access TTL, a tool call > 10s after the previous one triggers a refresh_token grant against the upstream and succeeds.
  • Bearer auth path is correct. Static-token connections work without any of the OAuth machinery — the auth round-tripper injects the Authorization: Bearer ... header on every outbound request.
  • Connection mutation is dynamic. PUT on /api/v1/admin/connection-instances/mcp/<name> causes the toolkit to re-discover and re-register tools without a platform restart. Verified live for both mockup and oauthup.
  • Audit middleware reaches proxied tools. The audit pipeline that wraps native tool calls also captures proxied calls — every tool call we made appeared in audit_logs with correct tool_name, parameters, success, duration_ms. After Bug B fix, also connection.
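For readers unfamiliar with the auth round-tripper mentioned in the bearer-auth finding, here is a minimal sketch of the pattern using the standard net/http RoundTripper interface. The bearerTransport type, the roundTripFunc adapter, and the observedAuthHeader helper are illustrative names, not the platform's own; the helper uses a fake base transport so nothing touches the network.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// bearerTransport injects a static bearer token on every outbound request.
type bearerTransport struct {
	token string
	base  http.RoundTripper
}

func (b *bearerTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	// RoundTrippers must not mutate the caller's request, so clone first.
	clone := req.Clone(req.Context())
	clone.Header.Set("Authorization", "Bearer "+b.token)
	base := b.base
	if base == nil {
		base = http.DefaultTransport
	}
	return base.RoundTrip(clone)
}

// roundTripFunc adapts a function to http.RoundTripper.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// observedAuthHeader sends one request through bearerTransport and returns
// the Authorization header that the (fake) upstream would have received.
func observedAuthHeader(token string) (string, error) {
	var seen string
	fake := roundTripFunc(func(r *http.Request) (*http.Response, error) {
		seen = r.Header.Get("Authorization")
		return httptest.NewRecorder().Result(), nil
	})
	client := &http.Client{Transport: &bearerTransport{token: token, base: fake}}
	resp, err := client.Get("http://upstream.invalid/mcp")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	return seen, nil
}

func main() {
	h, err := observedAuthHeader("dev-token")
	if err != nil {
		panic(err)
	}
	fmt.Println(h) // prints Bearer dev-token
}
```

An OAuth-backed connection can reuse the same wrapper shape by fetching the token from a token source inside RoundTrip instead of reading a static field.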

What was NOT live-tested

Honesty section. This live test does NOT prove:

  • Behavior against a real Salesforce Hosted MCP. The mock provider is RFC-7636-compliant but Salesforce-specific quirks (cookie-based session attachment, lock-screen popups, IdP-side token revocation, scope-string formatting) are unverified.
  • Behavior against any other named vendor MCP. Same caveat.
  • Persona/glob filtering of proxied tools. The MCP client used the dev admin API key, which has full persona access. A test where a non-admin persona is denied oauthup__* would round out the persona path.
  • Cross-enrichment rules firing on proxied tool calls. Engine has unit tests; no live rule was authored against oauthup__* and exercised.
  • Concurrent OAuth flows from multiple admin sessions on a single replica. At the time of the live test, PKCE state was in-memory and process-global; behavior was "race-free for one user, unknown for two simultaneous Connect clicks". (The DB-backed store from the Bug D fix resolves this.)

Files changed during the live test

| File | Change |
| --- | --- |
| pkg/toolkits/gateway/toolkit.go | SetTokenStore now retries authorization_code placeholders; new ConnectionForTool method |
| pkg/registry/toolkit.go | New ConnectionResolver optional interface |
| pkg/registry/registry.go | GetToolkitForTool uses ConnectionResolver when present |
| pkg/registry/registry_test.go | Two tests for the new interface (resolver-wins + empty-fallback) |
| pkg/toolkits/gateway/toolkit_test.go | TestConnectionForTool_ResolvesPerToolToConnection |
| pkg/toolkits/gateway/oauth_test.go | TestSetTokenStore_RetriesAuthorizationCodePlaceholders |
| dev/platform.yaml | Enable mcp: toolkit kind for local dev |
| docs/research/mcp-gateway-livetest-2026.md | This file |

make verify passes after all changes.


Recommendation update for #338

Re-running our prior decision: still ship as-is, with the Bug A and Bug B fixes folded into the PR. They were genuine v1 blockers: without them, the authorization_code-grant story is a marketing claim, not a working feature.

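The two items originally filed as non-blocking follow-ups (refresh storm, in-memory PKCE) have since been addressed. The refresh storm turned out to be an artifact of the mock's 10-second access TTL rather than a platform defect (Bug C). The in-memory PKCE store now has a Postgres-backed alternative (Bug D); the in-memory fallback, used when no database is available, remains a single-replica-only constraint that we should disclose in the docs.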