mcp-data-platform: composable MCP data platform

v1.x · project mcp-data-platform · part of txn2 · est. 2025 · pkg.go.dev · Apache 2.0 · semantic first

mcp-data-platform: semantic platform for AI.

mcp-data-platform is the composable MCP server platform for AI assistants that need data with context. Compose mcp-datahub, mcp-trino, and mcp-s3 behind one endpoint with bidirectional cross-enrichment. Query a table and get its meaning, owners, quality scores, and deprecation warnings in the same response. Semantic first: every data response includes business context from the semantic layer.

Built in: OAuth 2.1 for inbound clients, OAuth 2.1 for outbound calls to upstream MCP servers, OIDC, API keys, role-based personas, audit logging, an admin portal, and a gateway toolkit that re-exposes any third-party MCP server through the same auth pipeline. Run it as a single binary, or import it as a Go library to compose your own platform. Apache 2.0.

§ 01 / install · run

Two ways to use it.

Run mcp-data-platform as a standalone server with a YAML config and wire it to Claude, Cursor, or any MCP client. Or import the Go packages and compose a custom platform with your own toolkits, providers, and middleware. Same orchestration, two surfaces.

SRV-001 · go · server docs

standalone platform

One binary, one YAML. Composes DataHub, Trino, S3 behind one MCP endpoint.

Install with go install or Docker, or download a release. Point it at DataHub in the YAML config, then add Trino for SQL and S3 for objects when you are ready. Cross-enrichment is on by default. Run over stdio for Claude Desktop, or over HTTP with OAuth 2.1 for hosted MCP clients.

$ go install github.com/txn2/mcp-data-platform/cmd/mcp-data-platform@latest

$ claude mcp add data-platform \
    -e DATAHUB_URL=https://datahub.example.com/api/graphql \
    -e DATAHUB_TOKEN=$TOKEN \
    -- mcp-data-platform --config platform.yaml
  added: data-platform (datahub + trino + s3, semantic on)

$ claude
# ask: describe the orders table with owners and quality

LIB-002 · go · library docs

go library

Compose a custom platform. Bring your own toolkits and middleware.

platform.New returns a configured platform built from a YAML config or functional options. The ToolkitRegistry loads DataHub, Trino, S3, and any custom toolkits. Register middleware with Use for redaction, audit, or transformation, then call Run to start the MCP server. No forking required.

$ go get github.com/txn2/mcp-data-platform

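// main.go (excerpt: ctx is a context.Context, myToolkit is your Toolkit implementation)
p, err := platform.New(ctx,
    platform.WithConfigFile("platform.yaml"), // load platform.yaml
    platform.WithToolkit(myToolkit),          // register a custom toolkit
)
if err != nil {
    log.Fatalf("platform.New: %v", err)
}

// Middleware for audit logging and redaction.
p.Use(audit.Middleware)
p.Use(redact.Middleware)

// Start serving MCP.
p.Run(ctx)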

§ 02 / what it does

A platform of tools. Plus context.

mcp-data-platform composes the txn2 MCP toolkits behind one server, then wraps them in the operational layer enterprises need: auth, personas, audit, knowledge, and a gateway for third-party MCPs. Semantic context flows automatically between services so AI responses arrive with meaning, not just rows.

  1. 001
    semantic cross-enrichment

    Trino results include DataHub metadata (owners, tags, glossary terms, quality scores, deprecation). DataHub searches show query availability and sample SQL. S3 operations include semantic context. Lineage inheritance fills in column metadata from upstream datasets. Session-level deduplication avoids resending context already delivered in the same session.

    enrichment
  2. 002
    composable toolkits

    DataHub is the only required dependency (it provides the semantic layer). Add mcp-trino for federated SQL and mcp-s3 for object storage when you need them. Each service supports multiple instances, selected at runtime, with isolated failure domains. Add custom toolkits via the Toolkit interface.

    compose
  3. 003
    oauth 2.1, oidc, api keys

    OAuth 2.1 server for inbound connections (Claude Desktop authenticates to the platform over OAuth). OAuth 2.1 client for outbound calls to upstream MCP servers through the gateway, with encrypted refresh tokens persisted across restarts. OIDC discovery for Keycloak, Auth0, Okta, and Azure AD. API keys for service accounts. Fail-closed by default.

    auth
  4. 004
    personas & tool filtering

    Map OIDC roles to personas. Each persona has allow/deny patterns for tools and connections: analysts get read access, admins get everything. Connection-level filtering restricts which toolkit instances a persona can use. Per-persona description overrides steer the model differently for different audiences.

    authz
  5. 005
    audit & observability

    Every tool call is logged to PostgreSQL with user, persona, tool, sanitized parameters, duration, enrichment metrics, and a result hash. Retention is SOC 2- and HIPAA-friendly. Events are searchable from the admin portal with event-detail drawers, and activity dashboards cover personas, tools, and connections. Workflow gating enforces discovery before query, with escalation.

    audit
  6. 006
    gateway toolkit

    Re-expose any well-behaved third-party MCP server through the platform's auth, persona, and audit pipeline. Connections are authored in the admin portal and stored in the database with encrypted credentials. Tools surface as <connection>__<remote_tool>. Supports the OAuth 2.1 client_credentials and authorization_code + PKCE grants. Optional declarative cross-enrichment joins proxied responses with Trino or DataHub.

    gateway
  7. 007
    admin & user portal

    Built-in web dashboard. Admins manage connections, personas, API keys, configuration entries, audit, knowledge, and gateway connections. Users see their activity, prompts, assets, collections, resources, and shared work. Configurable branding (light/dark logos, implementor brand, public-share notices). Markdown asset viewer with Mermaid diagrams, GFM tables, and code highlighting.

    portal
  8. 008
    knowledge & memory

    Capture tribal knowledge during AI sessions and write it back to DataHub through a human-in-the-loop governance workflow with changeset rollback. Persistent memory layer (PostgreSQL + pgvector) accumulates preferences, corrections, and domain knowledge across sessions, with staleness detection when referenced entities change.

    knowledge
  9. ···
    part of the txn2 mcp data platform

    Sister projects: mcp-datahub for metadata catalogs, mcp-trino for federated SQL, mcp-s3 for object storage. mcp-data-platform composes all three plus the operational layer that turns them into a platform.

    + ecosystem
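The cross-enrichment and session deduplication from 001 can be sketched in a few lines. This assumes hypothetical Metadata, Enriched, and Session types and an in-memory catalog map standing in for live DataHub lookups; context for a table is attached the first time it appears in a session and omitted afterward.

```go
package main

import "fmt"

// Metadata is hypothetical semantic context for one dataset.
type Metadata struct {
	Owners     []string
	Deprecated bool
}

// Enriched is a query result with semantic context attached per table.
type Enriched struct {
	Rows    [][]string
	Context map[string]Metadata // keyed by table; omitted if already sent
}

// Session remembers which tables' context was already delivered.
type Session struct {
	sent map[string]bool
}

func NewSession() *Session { return &Session{sent: map[string]bool{}} }

// Enrich attaches catalog metadata for each table not yet sent in this session.
func (s *Session) Enrich(rows [][]string, tables []string, catalog map[string]Metadata) Enriched {
	out := Enriched{Rows: rows, Context: map[string]Metadata{}}
	for _, t := range tables {
		md, ok := catalog[t]
		if !ok || s.sent[t] {
			continue // unknown table, or context already delivered
		}
		out.Context[t] = md
		s.sent[t] = true
	}
	return out
}

func main() {
	catalog := map[string]Metadata{
		"sales.orders": {Owners: []string{"data-eng"}},
	}
	s := NewSession()
	first := s.Enrich([][]string{{"1", "shipped"}}, []string{"sales.orders"}, catalog)
	second := s.Enrich([][]string{{"2", "pending"}}, []string{"sales.orders"}, catalog)
	fmt.Println(len(first.Context), len(second.Context)) // 1 0
}
```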
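The multi-instance toolkits in 002 can be pictured as a registry keyed by kind and instance name. The Toolkit interface below is a deliberately minimal, hypothetical stand-in, not the platform's actual interface; the sketch shows only runtime instance selection and an explicit error for an unknown instance.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Toolkit is a hypothetical, minimal stand-in for a toolkit interface.
type Toolkit interface {
	Kind() string    // e.g. "trino"
	Tools() []string // tool names this instance exposes
}

type staticToolkit struct {
	kind  string
	tools []string
}

func (s staticToolkit) Kind() string    { return s.kind }
func (s staticToolkit) Tools() []string { return s.tools }

// Registry holds named instances per kind, e.g. trino/prod and trino/dev.
type Registry struct {
	instances map[string]map[string]Toolkit
}

func NewRegistry() *Registry {
	return &Registry{instances: map[string]map[string]Toolkit{}}
}

// Register adds a toolkit instance under its kind and the given name.
func (r *Registry) Register(name string, tk Toolkit) {
	if r.instances[tk.Kind()] == nil {
		r.instances[tk.Kind()] = map[string]Toolkit{}
	}
	r.instances[tk.Kind()][name] = tk
}

// Instance selects one instance at runtime; an unknown instance is an
// error for this lookup only and leaves other instances untouched.
func (r *Registry) Instance(kind, name string) (Toolkit, error) {
	tk, ok := r.instances[kind][name]
	if !ok {
		return nil, errors.New("unknown instance " + kind + "/" + name)
	}
	return tk, nil
}

// Names lists the registered instance names for a kind, sorted.
func (r *Registry) Names(kind string) []string {
	var names []string
	for n := range r.instances[kind] {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

func main() {
	r := NewRegistry()
	r.Register("prod", staticToolkit{kind: "trino", tools: []string{"trino_query"}})
	r.Register("dev", staticToolkit{kind: "trino", tools: []string{"trino_query"}})
	fmt.Println(r.Names("trino")) // [dev prod]
	_, err := r.Instance("s3", "archive")
	fmt.Println(err) // unknown instance s3/archive
}
```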
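The persona filtering in 004 reduces to pattern matching. This sketch assumes glob patterns evaluated with Go's path.Match and a deny-over-allow rule in which a tool that matches no allow pattern stays hidden; the Persona type, patterns, and tool names are hypothetical.

```go
package main

import (
	"fmt"
	"path"
)

// Persona is a hypothetical persona with glob-style tool patterns.
type Persona struct {
	Name  string
	Allow []string
	Deny  []string
}

// matchesAny reports whether tool matches any pattern.
// Malformed patterns are treated as non-matching.
func matchesAny(patterns []string, tool string) bool {
	for _, p := range patterns {
		if ok, err := path.Match(p, tool); err == nil && ok {
			return true
		}
	}
	return false
}

// Allowed applies deny-over-allow: visible only if an allow pattern
// matches and no deny pattern does. No match at all means no access.
func (p Persona) Allowed(tool string) bool {
	return matchesAny(p.Allow, tool) && !matchesAny(p.Deny, tool)
}

func main() {
	analyst := Persona{
		Name:  "analyst",
		Allow: []string{"trino_*", "datahub_*"},
		Deny:  []string{"*_delete*", "*_write*"},
	}
	for _, tool := range []string{"trino_query", "datahub_search", "s3_put_object", "datahub_write_description"} {
		fmt.Println(tool, analyst.Allowed(tool))
	}
	// trino_query true, datahub_search true, s3_put_object false,
	// datahub_write_description false (one per line)
}
```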
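Two pieces of the audit record in 005, parameter sanitization and the result hash, sketched with the standard library. The AuditEvent shape and the list of sensitive key fragments are assumptions, and writing the event to PostgreSQL is left out.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
	"time"
)

// AuditEvent is a hypothetical shape for one logged tool call.
type AuditEvent struct {
	User, Persona, Tool string
	Params              map[string]string
	Duration            time.Duration
	ResultHash          string
}

// sensitive lists parameter-name fragments to mask (an illustrative choice).
var sensitive = []string{"password", "token", "secret", "key"}

// Sanitize returns a copy of params with sensitive values masked.
func Sanitize(params map[string]string) map[string]string {
	out := make(map[string]string, len(params))
	for k, v := range params {
		out[k] = v
		lk := strings.ToLower(k)
		for _, s := range sensitive {
			if strings.Contains(lk, s) {
				out[k] = "***"
				break
			}
		}
	}
	return out
}

// HashResult fingerprints a result so the log can show what was returned
// without storing the payload itself.
func HashResult(result []byte) string {
	sum := sha256.Sum256(result)
	return hex.EncodeToString(sum[:])
}

func main() {
	start := time.Now()
	result := []byte(`[{"order_id":1}]`)
	ev := AuditEvent{
		User: "alice", Persona: "analyst", Tool: "trino_query",
		Params:     Sanitize(map[string]string{"sql": "SELECT 1", "api_token": "abc"}),
		Duration:   time.Since(start),
		ResultHash: HashResult(result),
	}
	fmt.Println(ev.Params["api_token"], len(ev.ResultHash)) // *** 64
}
```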
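The gateway naming convention from 006, `<connection>__<remote_tool>`, as a pair of helpers. ExposedName and SplitExposed are hypothetical names; splitting on the first double underscore assumes connection names never contain one, while remote tool names may.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

const sep = "__"

// ExposedName namespaces a remote tool under its gateway connection.
func ExposedName(connection, remoteTool string) string {
	return connection + sep + remoteTool
}

// SplitExposed recovers the connection and remote tool by splitting on
// the first "__", so remote tool names may themselves contain "__".
func SplitExposed(name string) (string, string, error) {
	conn, tool, ok := strings.Cut(name, sep)
	if !ok || conn == "" || tool == "" {
		return "", "", errors.New("not a gateway tool name: " + name)
	}
	return conn, tool, nil
}

func main() {
	name := ExposedName("jira", "search_issues")
	fmt.Println(name) // jira__search_issues
	conn, tool, _ := SplitExposed(name)
	fmt.Println(conn, tool) // jira search_issues
}
```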
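The staleness detection in 008 can be sketched by recording an entity version when a memory is captured and comparing it later. The integer version counter is an assumption (a modification timestamp or content hash would work the same way), the Memory type is hypothetical, and the pgvector similarity search is not shown.

```go
package main

import (
	"fmt"
	"time"
)

// Memory is a hypothetical stored note tied to a catalog entity.
type Memory struct {
	Text          string
	EntityURN     string
	EntityVersion int // entity version at capture time
	CapturedAt    time.Time
}

// Stale reports whether the referenced entity changed (or disappeared)
// since capture, given current versions from the catalog.
func (m Memory) Stale(currentVersion map[string]int) bool {
	v, ok := currentVersion[m.EntityURN]
	return !ok || v != m.EntityVersion
}

func main() {
	urn := "urn:li:dataset:(urn:li:dataPlatform:trino,sales.orders,PROD)"
	m := Memory{
		Text:          "status 'X' means cancelled",
		EntityURN:     urn,
		EntityVersion: 3,
		CapturedAt:    time.Now(),
	}
	fmt.Println(m.Stale(map[string]int{urn: 3})) // false
	fmt.Println(m.Stale(map[string]int{urn: 4})) // true
}
```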

// open source

mcp-data-platform is one of several open source components by Craig Johnston, sponsored by Deasil Works, Inc. Released under the Apache 2.0 license. Built to give AI assistants a safe, composable, semantic-first bridge to your data infrastructure.