# MCP Gateway — Live End-to-End Test Report
Date: 2026-04-24
Branch: feat/gateway-toolkit-338
Context: end-to-end live test conducted in response to the "all just code and simple unit tests" pushback on the unverified PR.
## TL;DR
End-to-end live testing surfaced four findings that every existing unit and integration test had missed. Three were genuine bugs, all fixed inline with verifying tests added (OAuth-token-survives-restart, audit attribution, and in-memory PKCE state); the fourth turned out to be a test artifact of the mock's deliberately short token TTL, with a mock-configuration improvement deferred as follow-up.
After the fixes, the full path — create connection via admin API → upstream tool discovery → portal-style PKCE OAuth flow → encrypted token persistence → process restart → token reload → proxied tool call → audit row with correct connection attribution → forced token expiry → automatic refresh-token rotation — works against a real running platform with a real Postgres backend.
## Test environment
- `make dev` — Docker Compose with Postgres (acme-dev-postgres) and SeaweedFS
- `mcp-data-platform` binary built from `feat/gateway-toolkit-338` HEAD, run with `dev/platform.yaml`
- Mock OAuth provider at :9180 (`/tmp/livetest/`) — single-file Go program implementing `/authorize` and `/token` (authorization_code, refresh_token, client_credentials) with a 10-second access TTL and refresh-token rotation
- Mock MCP upstream at :9181 — go-sdk MCP server with three tools (`echo`, `add`, `now`) and bearer-auth middleware
- Real MCP client (separate `/tmp/livetest/client/`) using the go-sdk `StreamableClientTransport` against the platform's `/mcp` endpoint with the dev API key
The test was run twice: once to find bugs, once to verify the fixes. The mock servers are deterministic enough to reproduce both runs.
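For reference, the refresh-token rotation behavior the mock provider implements can be sketched in a few lines of Go. This is an illustrative reconstruction, not the actual `/tmp/livetest/` source: the `tokenIssuer` type is ours, and the `acc-N`/`ref-N` naming is chosen to match the token names quoted in the logs later in this report.

```go
package main

import (
	"fmt"
	"sync"
)

// tokenIssuer mimics the mock provider's refresh-token rotation: each
// refresh_token grant invalidates the presented refresh token and mints
// a fresh access/refresh pair (acc-N / ref-N).
type tokenIssuer struct {
	mu      sync.Mutex
	counter int
	valid   map[string]bool // currently-valid refresh tokens
}

func newTokenIssuer() *tokenIssuer {
	return &tokenIssuer{valid: map[string]bool{}}
}

// issue mints the first pair, e.g. after an authorization_code exchange.
func (t *tokenIssuer) issue() (access, refresh string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.counter++
	refresh = fmt.Sprintf("ref-%d", t.counter)
	t.valid[refresh] = true
	return fmt.Sprintf("acc-%d", t.counter), refresh
}

// rotate implements the refresh_token grant: the presented token is
// single-use; a replay returns an error.
func (t *tokenIssuer) rotate(oldRefresh string) (access, refresh string, err error) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if !t.valid[oldRefresh] {
		return "", "", fmt.Errorf("invalid_grant: refresh token %q already used", oldRefresh)
	}
	delete(t.valid, oldRefresh)
	t.counter++
	refresh = fmt.Sprintf("ref-%d", t.counter)
	t.valid[refresh] = true
	return fmt.Sprintf("acc-%d", t.counter), refresh, nil
}

func main() {
	iss := newTokenIssuer()
	_, r1 := iss.issue()        // acc-1 / ref-1
	a2, r2, _ := iss.rotate(r1) // rotation mints the next pair
	fmt.Println(a2, r2)         // prints: acc-2 ref-2
	if _, _, err := iss.rotate(r1); err == nil {
		panic("replayed refresh token must be rejected")
	}
}
```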
## What was actually verified end-to-end
### Path 1 — bearer-auth proxy (clean)
| Step | Evidence |
|---|---|
| Create connection via `PUT /api/v1/admin/connection-instances/mcp/mockup` with `auth_mode=bearer` | 200 + saved row, credential field stored as `enc:...` ciphertext |
| Platform discovers upstream tools | `gateway: upstream connected connection=mockup tools=3` in platform log |
| MCP client sees `mockup__echo`, `mockup__add`, `mockup__now` in `tools/list` | All three appear next to native `trino_*`, `datahub_*`, `s3_*`, `memory_*` |
| Call `mockup__add` with `a=7 b=35` | Returns `42` |
| Call `mockup__echo "hello-from-livetest"` | Returns `echo: hello-from-livetest` |
| Audit row written | `tool_name=mockup__add toolkit_kind=mcp connection=mockup success=t duration_ms=7 parameters={"a":7,"b":35}` |
### Path 2 — OAuth authorization_code + PKCE (full dance)
| Step | Evidence |
|---|---|
| Create OAuth connection with grant `authorization_code` | row saved; `oauth_client_secret` stored as `enc:...` |
| Connection without token → "awaiting reauth" placeholder | platform log: `gateway: oauth authorization_code connection awaiting reauth` |
| `POST /api/v1/admin/gateway/connections/oauthup/oauth-start` | returns `authorization_url` containing `response_type=code`, `code_challenge_method=S256`, valid `state`, `redirect_uri=/api/v1/admin/oauth/callback` |
| Browser-style follow of the auth URL → mock `/authorize` → callback | mock log shows `/authorize` → redirect to `.../callback?code=auth-code-1`; platform log shows `gateway: upstream connected connection=oauthup tools=3` |
| Token row in DB | `gateway_oauth_tokens` has `access_token=enc:...`, `refresh_token=enc:...`, `expires_at=` (UTC), `scope='api refresh_token'` |
| Call `oauthup__echo` | Returns `echo: hello-from-livetest`; audit row has `connection=oauthup` (after the Bug B audit-attribution fix) |
| Wait > 10s for access TTL to expire, call again | mock OAuth log shows new `/token` `refresh_token` grant: rotated `old=ref-N` → `access=acc-N+1 refresh=ref-N+1`; tool call still succeeds |
| Restart the platform binary (kill + relaunch) | startup logs: `awaiting reauth` → `upstream connected` for `oauthup` within ~37ms; first call after restart succeeds |
## Bugs found
### Bug A (CRITICAL, FIXED) — OAuth tokens didn't survive process restart
Symptom: After kill + restart of the platform binary, oauthup__echo returned unknown tool. The encrypted token was in the DB, but the connection stayed in "awaiting reauth" indefinitely.
Root cause: Wiring order in cmd/mcp-data-platform/main.go. Platform construction calls toolkit.AddConnection(...) for every persisted mcp connection BEFORE wireGatewayTokenStore(p) runs. So when an authorization_code connection's addParsedConnection ran, t.tokenStore was nil, the OAuth Token() had no persisted token to load, the upstream dial failed, and the connection landed as a dead "awaiting reauth" placeholder. SetTokenStore simply attached the store for future calls — it never retried existing placeholders.
This entire feature was non-functional across restarts despite all unit + sqlmock tests passing, because no test exercised "AddConnection with no store, then SetTokenStore" in that order.
Fix: pkg/toolkits/gateway/toolkit.go:SetTokenStore now collects every authorization_code placeholder under the lock, removes them from the connection map, and re-runs addParsedConnection for each outside the lock so the new store is consulted. Order-independent now.
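The retry logic can be sketched with simplified stand-ins — `toolkit`, `connection`, and `tokenStore` below are illustrative reductions of the real gateway types, not the actual code:

```go
package main

import (
	"fmt"
	"sync"
)

type connection struct {
	grant       string
	placeholder bool // true = "awaiting reauth", upstream never dialed
}

type tokenStore interface {
	Load(name string) (token string, ok bool)
}

type memStore struct{}

func (memStore) Load(string) (string, bool) { return "persisted-token", true }

type toolkit struct {
	mu          sync.Mutex
	store       tokenStore
	connections map[string]*connection
}

// addConnection dials the upstream; with no store wired, an
// authorization_code connection can only become a dead placeholder
// (the pre-fix failure mode).
func (t *toolkit) addConnection(name, grant string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	c := &connection{grant: grant}
	if grant == "authorization_code" && t.store == nil {
		c.placeholder = true
	}
	t.connections[name] = c
}

// SetTokenStore attaches the store, then re-runs the add path for every
// placeholder so wiring order no longer matters (the Bug A fix).
func (t *toolkit) SetTokenStore(s tokenStore) {
	t.mu.Lock()
	t.store = s
	var retry []string
	for name, c := range t.connections {
		if c.placeholder && c.grant == "authorization_code" {
			retry = append(retry, name)
			delete(t.connections, name) // drop the dead placeholder
		}
	}
	t.mu.Unlock()
	for _, name := range retry { // outside the lock, like the real fix
		t.addConnection(name, "authorization_code")
	}
}

func main() {
	t := &toolkit{connections: map[string]*connection{}}
	t.addConnection("oauthup", "authorization_code")
	fmt.Println("placeholder before:", t.connections["oauthup"].placeholder) // true
	t.SetTokenStore(memStore{})
	fmt.Println("placeholder after:", t.connections["oauthup"].placeholder) // false
}
```

Collecting under the lock and re-adding outside it avoids re-entrancy: the add path may itself need the lock while dialing.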
Verification: TestSetTokenStore_RetriesAuthorizationCodePlaceholders (pkg/toolkits/gateway/oauth_test.go) — adds a connection without a store (placeholder created), wires a pre-seeded store, asserts tools are now discovered. Plus the live restart-and-call test passes.
### Bug B (HIGH, FIXED) — Audit rows for proxied tools had empty connection
Symptom: Audit DB rows for mockup__echo/oauthup__echo had toolkit_kind=mcp populated but connection='' — making per-upstream auditing impossible despite the data being right there.
Root cause: pkg/registry/registry.go:GetToolkitForTool always returned toolkit.Connection(), which for the gateway is just the toolkit's default name. Per-tool routing to per-upstream is unique to multi-connection toolkits and the registry didn't have a way to ask the toolkit which connection a specific tool belongs to.
Fix:
- New optional interface registry.ConnectionResolver { ConnectionForTool(toolName string) string }.
- Gateway toolkit implements it by walking its connections map and matching tool name → connection.
- Registry uses the resolver when present, falls back to Connection() when absent or empty.
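The resolver mechanics can be sketched with simplified stand-ins — the optional-interface pattern below mirrors the fix, but the surrounding types are illustrative, not the registry's real code:

```go
package main

import "fmt"

// toolkitIface is a stand-in for the registry's toolkit interface.
type toolkitIface interface {
	Connection() string
}

// connectionResolver mirrors the new optional interface from the fix.
type connectionResolver interface {
	ConnectionForTool(tool string) string
}

// gateway is a stand-in multi-connection toolkit: it knows which
// upstream connection each proxied tool belongs to.
type gateway struct {
	tools map[string]string // tool name -> connection name
}

func (g *gateway) Connection() string { return "gateway" } // default name

func (g *gateway) ConnectionForTool(tool string) string {
	return g.tools[tool] // "" when the tool is unknown
}

// connectionFor mirrors the registry logic: prefer the resolver when the
// toolkit implements it, fall back to Connection() when it is absent or
// returns empty.
func connectionFor(tk toolkitIface, tool string) string {
	if r, ok := tk.(connectionResolver); ok {
		if c := r.ConnectionForTool(tool); c != "" {
			return c
		}
	}
	return tk.Connection()
}

func main() {
	g := &gateway{tools: map[string]string{
		"mockup__add":   "mockup",
		"oauthup__echo": "oauthup",
	}}
	fmt.Println(connectionFor(g, "mockup__add"))   // mockup
	fmt.Println(connectionFor(g, "unknown__tool")) // gateway (fallback)
}
```

The type assertion keeps single-connection toolkits untouched: only toolkits that opt into the interface pay for per-tool resolution.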
Verification:
- TestGetToolkitForTool_MultiConnectionResolverWins + TestGetToolkitForTool_FallbackWhenResolverReturnsEmpty (pkg/registry/registry_test.go)
- TestConnectionForTool_ResolvesPerToolToConnection (pkg/toolkits/gateway/toolkit_test.go)
- Live: mockup__add audit row → connection=mockup, oauthup__echo audit row → connection=oauthup.
### Bug C — test artifact, not a bug (resolved by inspection, no code change)
Symptom (during test): Right after the initial authorization_code exchange minted acc-2/ref-1, the mock OAuth log showed four extra refresh_token rotated calls in the same second, before the gateway had been used for any tool call.
Root cause (after re-reading the code): The mock OAuth provider issues access tokens with a 10-second TTL (deliberately short, to make refresh testing fast). The platform's oauthTokenSource.Token() (pkg/toolkits/gateway/oauth.go:109) refreshes whenever time.Until(ExpiresAt) <= expiryBuffer, and expiryBuffer = 30 * time.Second (pkg/toolkits/gateway/oauth.go:18). With expiryBuffer (30s) > access TTL (10s), every Token() call sees the token as "about to expire" and triggers a refresh.
In a real deployment, OAuth providers issue access tokens with TTLs measured in minutes-to-hours (Salesforce defaults to 2 hours). The 30-second buffer is correct: it triggers refresh in the last 30 seconds of a 2-hour token's life, which is exactly the desired behavior. The "storm" was an artifact of the mock's deliberately-aggressive expiry, not a defect in the platform.
Why no code change: The behavior is correct. Tightening the buffer to (say) 5 seconds would help mock testing but would risk real-world race conditions where a request's full round-trip burns more than the buffer's worth of clock time and the upstream rejects the now-expired token. 30s is the right balance.
Test improvement (deferred): The mock could be reconfigured with ACCESS_TTL_SECONDS=3600 for the steady-state subset of the test (where we don't want to exercise refresh) and then re-run with a short TTL only for the refresh path. Out of scope for this PR.
### Bug D (HIGH, FIXED) — In-memory PKCE state, multi-replica unsafe
Symptom: pkg/admin/gateway_oauth_handler.go originally used a process-global globalPKCEStore for state → code_verifier. Two platform replicas behind a load balancer would fail every callback whose initial oauth-start landed on a different replica.
Fix:
- New `PKCEStore` interface (`pkg/admin/pkce_store.go`) with two implementations:
  - `memoryPKCEStore` — in-process map with a background GC ticker (single-replica default).
  - `PostgresPKCEStore` (`pkg/admin/pkce_store_postgres.go`) — `oauth_pkce_states` table (migration 000036), `DELETE … RETURNING` for atomic single-shot take, periodic sweeper.
- `Handler.Deps.PKCEStore` field; `main.go` wires the Postgres store when a database is available, falls through to in-memory otherwise.
- Removed the `globalPKCEStore` global.
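The single-shot take contract is the essential part of the interface. An in-memory sketch with illustrative names — the real `PostgresPKCEStore` gets the same atomicity from `DELETE … RETURNING` instead of a mutex:

```go
package main

import (
	"fmt"
	"sync"
)

// pkceStore sketches the Put/Take contract: a state key can be redeemed
// exactly once, so a replayed OAuth callback fails cleanly.
type pkceStore struct {
	mu sync.Mutex
	m  map[string]string // state -> code_verifier
}

func (s *pkceStore) Put(state, verifier string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[state] = verifier
}

// Take removes and returns the verifier atomically — the in-memory
// analogue of:
//   DELETE FROM oauth_pkce_states WHERE state = $1 RETURNING code_verifier
func (s *pkceStore) Take(state string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.m[state]
	if ok {
		delete(s.m, state)
	}
	return v, ok
}

func main() {
	s := &pkceStore{m: map[string]string{}}
	s.Put("state-1", "verifier-1")
	v, ok := s.Take("state-1")
	fmt.Println(v, ok) // verifier-1 true
	_, ok = s.Take("state-1")
	fmt.Println(ok) // false — single-shot
}
```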
Verification: pkg/admin/pkce_store_test.go covers Put/Take/idempotent Close, opportunistic GC, background GC sweep, and Postgres path via sqlmock. The handler tests continue to pass against the in-memory default.
## Findings about the gateway implementation that are NOT bugs
- Tokens are encrypted at rest correctly. `gateway_oauth_tokens` rows show `enc:`-prefixed AES-256-GCM ciphertext for `access_token` and `refresh_token` when `ENCRYPTION_KEY` is set; the dev environment provided one and the round-trip works.
- Refresh-on-expiry works. With a 10s access TTL, a tool call more than 10s after the previous one triggers a `refresh_token` grant against the upstream and succeeds.
- Bearer auth path is correct. Static-token connections work without any of the OAuth machinery — the auth round-tripper injects the `Authorization: Bearer ...` header on every outbound request.
- Connection mutation is dynamic. `PUT` on `/api/v1/admin/connection-instances/mcp/<name>` causes the toolkit to re-discover and re-register tools without a platform restart. Verified live for both `mockup` and `oauthup`.
- Audit middleware reaches proxied tools. The audit pipeline that wraps native tool calls also captures proxied calls — every tool call we made appeared in `audit_logs` with correct `tool_name`, `parameters`, `success`, `duration_ms`. After the Bug B fix, also `connection`.
## What was NOT live-tested
Honesty section. This live test does NOT prove:
- Behavior against a real Salesforce Hosted MCP. The mock provider is RFC-7636-compliant but Salesforce-specific quirks (cookie-based session attachment, lock-screen popups, IdP-side token revocation, scope-string formatting) are unverified.
- Behavior against any other named vendor MCP. Same caveat.
- Persona/glob filtering of proxied tools. The MCP client used the dev admin API key, which has full persona access. A test where a non-admin persona is denied `oauthup__*` would round out the persona path.
- Cross-enrichment rules firing on proxied tool calls. The engine has unit tests, but no live rule was authored against `oauthup__*` and exercised.
- Concurrent OAuth flows from multiple admin sessions on a single replica. PKCE state was in-memory but global; behavior was "race-free for one user, unknown for two simultaneous Connect clicks". (The DB-backed store from the Bug D fix resolves this.)
## Files changed during the live test
| File | Change |
|---|---|
| `pkg/toolkits/gateway/toolkit.go` | `SetTokenStore` now retries authorization_code placeholders; new `ConnectionForTool` method |
| `pkg/registry/toolkit.go` | New `ConnectionResolver` optional interface |
| `pkg/registry/registry.go` | `GetToolkitForTool` uses `ConnectionResolver` when present |
| `pkg/registry/registry_test.go` | Two tests for the new interface (resolver-wins + empty-fallback) |
| `pkg/toolkits/gateway/toolkit_test.go` | `TestConnectionForTool_ResolvesPerToolToConnection` |
| `pkg/toolkits/gateway/oauth_test.go` | `TestSetTokenStore_RetriesAuthorizationCodePlaceholders` |
| `pkg/admin/gateway_oauth_handler.go` | `Deps.PKCEStore` field replaces the `globalPKCEStore` global |
| `pkg/admin/pkce_store.go` | New `PKCEStore` interface + `memoryPKCEStore` |
| `pkg/admin/pkce_store_postgres.go` | `PostgresPKCEStore` backed by `oauth_pkce_states` (migration 000036) |
| `pkg/admin/pkce_store_test.go` | Put/Take/Close, GC, and Postgres-path (sqlmock) tests |
| `cmd/mcp-data-platform/main.go` | Wires the Postgres PKCE store when a database is available |
| `dev/platform.yaml` | Enable `mcp:` toolkit kind for local dev |
| `docs/research/mcp-gateway-livetest-2026.md` | This file |
`make verify` passes after all changes.
## Recommendation update for #338
Re-running our prior decision: still ship, with the Bug A, Bug B, and Bug D fixes folded into the PR. Bugs A and B were genuine v1 blockers — without them, the authorization_code-grant story is a marketing claim, not a working feature.
The apparent refresh storm (Bug C) is a mock-TTL test artifact, not a platform defect; the only follow-up is the deferred short-TTL mock reconfiguration. The in-memory PKCE constraint is removed by the Postgres-backed store from Bug D, but the in-memory fallback remains single-replica-only and should be disclosed in the docs for deployments without a database.