
Configuration

mcp-data-platform is configured with a YAML file that supports environment variable expansion. Each reference of the form ${VAR_NAME} is replaced with the value of that environment variable when the file is loaded.
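To make the expansion rule concrete, here is a minimal Python sketch of the substitution. It is an illustration, not the platform's loader: the function name expand_env is hypothetical, and the handling of unset variables (expanded to an empty string here) is an assumption that the platform may not share.

```python
import os
import re

# Matches ${VAR_NAME}; names follow the usual shell identifier rules.
_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(text, env=None):
    """Replace each ${VAR_NAME} in text with its value from env."""
    env = os.environ if env is None else env
    # Assumption: unset variables expand to an empty string.
    return _VAR_PATTERN.sub(lambda m: env.get(m.group(1), ""), text)


config_line = "password: ${TRINO_PASSWORD}"
print(expand_env(config_line, {"TRINO_PASSWORD": "s3cret"}))  # password: s3cret
```

Because substitution happens on the raw text, secrets stay out of the file itself and only need to exist in the process environment at startup.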

Configuration File

Create a platform.yaml file:

server:
  name: mcp-data-platform
  transport: stdio

toolkits:
  trino:
    primary:
      host: trino.example.com
      port: 443
      user: ${TRINO_USER}
      password: ${TRINO_PASSWORD}
      ssl: true
      catalog: hive
      schema: default

  datahub:
    primary:
      url: https://datahub.example.com
      token: ${DATAHUB_TOKEN}

  s3:
    primary:
      region: us-east-1
      access_key_id: ${AWS_ACCESS_KEY_ID}
      secret_access_key: ${AWS_SECRET_ACCESS_KEY}

injection:
  trino_semantic_enrichment: true
  datahub_query_enrichment: true
  s3_semantic_enrichment: true

Server Configuration

server:
  name: mcp-data-platform      # Server name reported to clients
  transport: stdio             # stdio or http
  address: ":8080"             # Listen address for HTTP transports
  tls:
    enabled: false
    cert_file: ""
    key_file: ""

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | mcp-data-platform | Server name in MCP handshake |
| transport | string | stdio | Transport protocol: stdio or http (sse accepted for backward compatibility) |
| address | string | :8080 | Listen address for HTTP transports |
| tls.enabled | bool | false | Enable TLS for HTTP transport |
| tls.cert_file | string | - | Path to TLS certificate |
| tls.key_file | string | - | Path to TLS private key |

HTTP Transport Security

The server logs a warning when the HTTP transport runs without TLS. In production deployments, always enable TLS so that credentials are encrypted in transit.
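A similar check can be run before deployment. The Python sketch below inspects an already-parsed server mapping and returns warnings. The function name transport_warnings is hypothetical, and the second check (TLS enabled but no certificate or key path) is an extra sanity check added for illustration, not documented platform behavior.

```python
def transport_warnings(server):
    """Return warnings for a parsed `server` config mapping."""
    warnings = []
    transport = server.get("transport", "stdio")
    tls = server.get("tls") or {}
    # Mirrors the documented warning: HTTP transport without TLS.
    if transport in ("http", "sse") and not tls.get("enabled", False):
        warnings.append("HTTP transport is running without TLS")
    # Illustrative extra check (an assumption, not platform behavior).
    if tls.get("enabled") and not (tls.get("cert_file") and tls.get("key_file")):
        warnings.append("tls.enabled is true but cert_file or key_file is empty")
    return warnings


print(transport_warnings({"transport": "http", "address": ":8080"}))
```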

Streamable HTTP Configuration

The HTTP transport serves both legacy SSE (/sse, /message) and Streamable HTTP (/) endpoints. Streamable HTTP session behavior is configured under server.streamable:

server:
  streamable:
    session_timeout: 30m
    stateless: false

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| session_timeout | duration | 30m | How long an idle session persists before cleanup |
| stateless | bool | false | Disable session tracking (no Mcp-Session-Id validation) |
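The idea behind session_timeout can be sketched as an idle-expiry store. This is a simplified Python illustration under stated assumptions: the class SessionStore and its methods are hypothetical, and the platform's real session handling is not shown in this document.

```python
import time


class SessionStore:
    """Tracks last-activity times and drops sessions idle past the timeout."""

    def __init__(self, session_timeout_seconds=30 * 60, clock=time.monotonic):
        self.timeout = session_timeout_seconds
        self.clock = clock
        self.last_seen = {}

    def touch(self, session_id):
        # Record activity; called on every request carrying this session ID.
        self.last_seen[session_id] = self.clock()

    def is_valid(self, session_id):
        seen = self.last_seen.get(session_id)
        return seen is not None and self.clock() - seen <= self.timeout

    def sweep(self):
        # Remove every session whose idle time exceeds the timeout.
        now = self.clock()
        expired = [s for s, t in self.last_seen.items() if now - t > self.timeout]
        for session_id in expired:
            del self.last_seen[session_id]
        return expired
```

With stateless: true there would be no such store at all, since session IDs are not validated.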

Authentication Configuration

auth:
  allow_anonymous: false       # Require authentication (default)
  oidc:
    enabled: true
    issuer: "https://auth.example.com/realms/platform"
    client_id: "mcp-data-platform"
    audience: "mcp-data-platform"
    role_claim_path: "realm_access.roles"
    role_prefix: "dp_"
    clock_skew_seconds: 30     # Allowed clock drift
    max_token_age: 24h         # Reject tokens older than this
  api_keys:
    enabled: true
    keys:
      - key: ${API_KEY_ADMIN}
        name: "admin"
        roles: ["admin"]

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| allow_anonymous | bool | false | Allow unauthenticated requests |
| oidc.enabled | bool | false | Enable OIDC authentication |
| oidc.issuer | string | - | OIDC issuer URL |
| oidc.client_id | string | - | OAuth client ID |
| oidc.audience | string | - | Expected token audience |
| oidc.role_claim_path | string | roles | Path to roles in token claims |
| oidc.role_prefix | string | - | Filter roles to those with this prefix |
| oidc.clock_skew_seconds | int | 30 | Allowed clock skew for time claims |
| oidc.max_token_age | duration | 0 | Max token age (0 = no limit) |
| api_keys.enabled | bool | false | Enable API key authentication |
| api_keys.keys | array | - | List of API key configurations |
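The interplay of role_claim_path and role_prefix may be easier to see in code. The Python sketch below walks a dotted path through decoded token claims and keeps only roles with the configured prefix. The function name extract_roles is hypothetical, and the sketch assumes matching roles are kept unchanged (the prefix is not stripped), which the platform may handle differently.

```python
def extract_roles(claims, role_claim_path="roles", role_prefix=""):
    """Walk a dotted path through the claims and keep roles with the prefix."""
    node = claims
    for key in role_claim_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return []  # Path not present: no roles.
        node = node[key]
    if not isinstance(node, list):
        return []
    # Assumption: matching roles are kept unchanged (prefix not stripped).
    return [r for r in node if isinstance(r, str) and r.startswith(role_prefix)]


claims = {"sub": "u1", "realm_access": {"roles": ["dp_analyst", "offline_access"]}}
print(extract_roles(claims, "realm_access.roles", "dp_"))  # ['dp_analyst']
```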

Fail-Closed Security

Authentication follows a fail-closed model: a missing token, an invalid signature, an expired token, or a missing required claim (sub, exp) results in denied access.
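A fail-closed check can be sketched as a function that returns True only when every test passes. The Python example below covers the claim checks only; signature verification is out of scope here. The function name is hypothetical, and measuring token age from the iat claim is an assumption about how max_token_age is applied.

```python
def is_token_acceptable(claims, now, clock_skew_seconds=30, max_token_age_seconds=0):
    """Return True only if every check passes; any doubt means deny."""
    if not claims:
        return False  # Missing token.
    sub, exp = claims.get("sub"), claims.get("exp")
    if not sub or not isinstance(exp, (int, float)):
        return False  # Required claims sub/exp missing or malformed.
    if now > exp + clock_skew_seconds:
        return False  # Expired, even allowing for clock skew.
    if max_token_age_seconds > 0:
        iat = claims.get("iat")
        # Assumption: token age is measured from iat; no iat means deny.
        if not isinstance(iat, (int, float)):
            return False
        if now - iat > max_token_age_seconds + clock_skew_seconds:
            return False
    return True
```

Note that there is no branch that grants access on error: every early return is a denial, which is the essence of the fail-closed model.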

Toolkit Configuration

Trino

toolkits:
  trino:
    primary:                   # Instance name (can be any identifier)
      host: trino.example.com
      port: 443
      user: analyst
      password: ${TRINO_PASSWORD}
      catalog: hive
      schema: default
      ssl: true
      ssl_verify: true
      timeout: 120s
      default_limit: 1000
      max_limit: 10000
      read_only: false
      connection_name: primary

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| host | string | required | Trino coordinator hostname |
| port | int | 8080 (443 if SSL) | Trino coordinator port |
| user | string | required | Trino username |
| password | string | - | Trino password (if auth enabled) |
| catalog | string | - | Default catalog |
| schema | string | - | Default schema |
| ssl | bool | false | Enable SSL/TLS |
| ssl_verify | bool | true | Verify SSL certificates |
| timeout | duration | 120s | Query timeout |
| default_limit | int | 1000 | Default row limit for queries |
| max_limit | int | 10000 | Maximum allowed row limit |
| read_only | bool | false | Restrict to read-only queries |
| connection_name | string | instance name | Display name for this connection |
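The relationship between default_limit and max_limit can be sketched as a simple clamp. This Python example is illustrative only: the function name effective_limit is hypothetical, and treating a missing or non-positive requested limit as "use the default" is an assumption.

```python
def effective_limit(requested=None, default_limit=1000, max_limit=10000):
    """Pick the row limit to apply to a query."""
    if requested is None or requested <= 0:
        return default_limit  # No usable limit supplied by the caller.
    return min(requested, max_limit)  # Never exceed the configured ceiling.


print(effective_limit())       # 1000
print(effective_limit(50))     # 50
print(effective_limit(50000))  # 10000
```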

DataHub

toolkits:
  datahub:
    primary:
      url: https://datahub.example.com
      token: ${DATAHUB_TOKEN}
      timeout: 30s
      default_limit: 10
      max_limit: 100
      max_lineage_depth: 5
      connection_name: primary

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| url | string | required | DataHub GMS URL |
| token | string | - | DataHub access token |
| timeout | duration | 30s | API request timeout |
| default_limit | int | 10 | Default search result limit |
| max_limit | int | 100 | Maximum search result limit |
| max_lineage_depth | int | 5 | Maximum lineage traversal depth |
| connection_name | string | instance name | Display name for this connection |

S3

toolkits:
  s3:
    primary:
      region: us-east-1
      endpoint: ""                    # Custom endpoint for MinIO, etc.
      access_key_id: ${AWS_ACCESS_KEY_ID}
      secret_access_key: ${AWS_SECRET_ACCESS_KEY}
      session_token: ""
      profile: ""                     # AWS profile name
      use_path_style: false           # Use path-style URLs
      timeout: 30s
      disable_ssl: false
      read_only: true                 # Restrict to read operations
      max_get_size: 10485760          # 10MB
      max_put_size: 104857600         # 100MB
      connection_name: primary
      bucket_prefix: ""               # Filter to buckets with this prefix

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| region | string | us-east-1 | AWS region |
| endpoint | string | - | Custom S3 endpoint (for MinIO, etc.) |
| access_key_id | string | - | AWS access key ID |
| secret_access_key | string | - | AWS secret access key |
| session_token | string | - | AWS session token (for temporary creds) |
| profile | string | - | AWS credentials profile name |
| use_path_style | bool | false | Use path-style S3 URLs |
| timeout | duration | 30s | Request timeout |
| disable_ssl | bool | false | Disable SSL (for local testing) |
| read_only | bool | false | Restrict to read operations |
| max_get_size | int64 | 10485760 | Max bytes to read from objects |
| max_put_size | int64 | 104857600 | Max bytes to write to objects |
| connection_name | string | instance name | Display name for this connection |
| bucket_prefix | string | - | Only show buckets with this prefix |
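The size limits are plain byte counts, so custom values need to be computed by hand. A small Python helper (hypothetical name mib_to_bytes) shows how the documented defaults are derived, assuming binary megabytes of 1024 * 1024 bytes:

```python
MIB = 1024 * 1024  # Bytes per binary megabyte.


def mib_to_bytes(mib):
    """Convert a size in MiB to the byte count used by max_get_size/max_put_size."""
    return mib * MIB


print(mib_to_bytes(10))   # 10485760
print(mib_to_bytes(100))  # 104857600
```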

Cross-Injection Configuration

injection:
  trino_semantic_enrichment: true    # Add DataHub context to Trino results
  datahub_query_enrichment: true     # Add Trino availability to DataHub results
  s3_semantic_enrichment: true       # Add DataHub context to S3 results
  datahub_storage_enrichment: true   # Add S3 availability to DataHub results

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| trino_semantic_enrichment | bool | false | Enrich Trino results with DataHub metadata |
| datahub_query_enrichment | bool | false | Add query availability to DataHub search results |
| s3_semantic_enrichment | bool | false | Enrich S3 results with DataHub metadata |
| datahub_storage_enrichment | bool | false | Add S3 availability to DataHub results |

Semantic and Query Provider Configuration

Specify which toolkit instance provides semantic metadata, query execution, and storage access:

semantic:
  provider: datahub           # Provider type: datahub or noop
  instance: primary           # Which DataHub instance to use
  cache:
    enabled: true
    ttl: 5m

query:
  provider: trino             # Provider type: trino or noop
  instance: primary           # Which Trino instance to use

storage:
  provider: s3                # Provider type: s3 or noop
  instance: primary           # Which S3 instance to use

Environment Variables

Common environment variables:

| Variable | Description |
| --- | --- |
| TRINO_USER | Trino username |
| TRINO_PASSWORD | Trino password |
| DATAHUB_TOKEN | DataHub access token |
| AWS_ACCESS_KEY_ID | AWS access key |
| AWS_SECRET_ACCESS_KEY | AWS secret key |
| AWS_SESSION_TOKEN | AWS session token |
| DATABASE_URL | PostgreSQL connection string (for audit/OAuth) |

Complete Example

server:
  name: mcp-data-platform
  transport: stdio

toolkits:
  trino:
    primary:
      host: trino.example.com
      port: 443
      user: ${TRINO_USER}
      password: ${TRINO_PASSWORD}
      ssl: true
      catalog: hive
      schema: default
      default_limit: 1000
      max_limit: 10000

  datahub:
    primary:
      url: https://datahub.example.com
      token: ${DATAHUB_TOKEN}
      default_limit: 10
      max_limit: 100

  s3:
    primary:
      region: us-east-1
      read_only: true

semantic:
  provider: datahub
  instance: primary
  cache:
    enabled: true
    ttl: 5m

query:
  provider: trino
  instance: primary

storage:
  provider: s3
  instance: primary

injection:
  trino_semantic_enrichment: true
  datahub_query_enrichment: true
  s3_semantic_enrichment: true

audit:
  enabled: true
  log_tool_calls: true
  retention_days: 90

database:
  dsn: ${DATABASE_URL}

personas:
  definitions:
    analyst:
      display_name: "Data Analyst"
      roles: ["analyst"]
      tools:
        allow: ["trino_query", "trino_explain", "datahub_*"]
        deny: ["*_delete_*"]
  default_persona: analyst

Persona Configuration

Personas define tool access based on user roles. The security model follows a default-deny approach.

personas:
  definitions:
    analyst:
      display_name: "Data Analyst"
      roles: ["analyst", "data_engineer"]
      tools:
        allow: ["trino_*", "datahub_*"]
        deny: ["*_delete_*", "*_drop_*"]
    admin:
      display_name: "Administrator"
      roles: ["admin"]
      tools:
        allow: ["*"]
  default_persona: analyst

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| definitions | map | - | Named persona configurations |
| definitions.<name>.display_name | string | - | Human-readable name |
| definitions.<name>.roles | array | - | Roles that map to this persona |
| definitions.<name>.tools.allow | array | [] | Allowed tool patterns |
| definitions.<name>.tools.deny | array | [] | Denied tool patterns |
| default_persona | string | - | Persona for users without role match |

Default-Deny Security

Users without a resolved persona have no tool access. The built-in default persona denies all tools. You must define explicit personas with tool access for your users.
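The default-deny evaluation can be sketched in Python as follows. This is an illustration under several assumptions, not the platform's implementation: patterns are treated as shell-style globs, deny patterns override allow patterns, and the first persona whose roles overlap the user's roles wins. The function names, and the tool names datahub_delete_entity and s3_get_object used in the usage note, are hypothetical.

```python
from fnmatch import fnmatchcase


def resolve_persona(user_roles, definitions, default_persona=None):
    """Return the first persona whose roles overlap the user's roles."""
    for name, persona in definitions.items():
        if set(persona.get("roles", [])) & set(user_roles):
            return name
    return default_persona  # May be None: no persona, no tools.


def is_tool_allowed(tool, persona):
    """Default-deny: a tool must match an allow pattern and no deny pattern."""
    if persona is None:
        return False
    tools = persona.get("tools", {})
    if any(fnmatchcase(tool, p) for p in tools.get("deny", [])):
        return False  # Assumption: deny patterns override allow patterns.
    return any(fnmatchcase(tool, p) for p in tools.get("allow", []))


definitions = {
    "analyst": {
        "roles": ["analyst", "data_engineer"],
        "tools": {"allow": ["trino_*", "datahub_*"], "deny": ["*_delete_*", "*_drop_*"]},
    },
    "admin": {"roles": ["admin"], "tools": {"allow": ["*"]}},
}
```

Under these assumptions, a user with the data_engineer role resolves to the analyst persona and can call trino_query, but not datahub_delete_entity (denied by pattern) or s3_get_object (never allowed).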

MCP Apps Configuration

MCP Apps provide interactive UI components that enhance tool results. The platform provides the infrastructure; you provide the HTML/JS/CSS apps.

mcpapps:
  enabled: true
  apps:
    query_results:
      enabled: true
      assets_path: "/etc/mcp-apps/query-results"
      tools:
        - trino_query
      csp:
        resource_domains:
          - "https://cdn.jsdelivr.net"

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | false | Enable MCP Apps infrastructure |
| apps | map | - | Named app configurations |
| apps.<name>.enabled | bool | true | Enable this app |
| apps.<name>.assets_path | string | required | Absolute path to app directory |
| apps.<name>.tools | array | required | Tools this app enhances |
| apps.<name>.csp.resource_domains | array | - | Allowed CDN origins |

See MCP Apps Configuration for complete options.

Next Steps