Deployment Guide¶
This guide covers deploying mcp-data-platform in various environments, from local development to production Kubernetes clusters.
Deployment Options¶
| Environment | Best For | Complexity |
|---|---|---|
| Docker Compose | Development, small teams, testing | Low |
| Kubernetes/Helm | Production, multi-user, enterprise | Medium |
Docker Compose (Development/Small Teams)¶
A full-stack deployment including DataHub, Trino, mcp-data-platform, Keycloak, and PostgreSQL.
Prerequisites¶
- Docker 24.0+
- Docker Compose 2.20+
- 16GB RAM minimum (DataHub requires significant memory)
- 20GB free disk space
Full-Stack Example¶
Create a docker-compose.yml:
services:
# PostgreSQL for metadata storage
postgres:
image: postgres:16-alpine@sha256:acf5271bce6b4b62e352341e3b175c2b1e9e0b6f6e3f2e7e3b7f8c9d0e1f2a3b
environment:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-postgres}
POSTGRES_MULTIPLE_DATABASES: datahub,keycloak,audit
volumes:
- postgres_data:/var/lib/postgresql/data
- ./init-multiple-dbs.sh:/docker-entrypoint-initdb.d/init-multiple-dbs.sh
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 10s
timeout: 5s
retries: 5
# Keycloak for authentication
keycloak:
image: quay.io/keycloak/keycloak:24.0@sha256:b3c4a5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4
command: start-dev --import-realm
environment:
KC_DB: postgres
KC_DB_URL: jdbc:postgresql://postgres:5432/keycloak
KC_DB_USERNAME: postgres
KC_DB_PASSWORD: ${POSTGRES_PASSWORD:-postgres}
KEYCLOAK_ADMIN: admin
KEYCLOAK_ADMIN_PASSWORD: ${KEYCLOAK_ADMIN_PASSWORD:-admin}
volumes:
- ./keycloak-realm.json:/opt/keycloak/data/import/realm.json
ports:
- "8180:8080"
depends_on:
postgres:
condition: service_healthy
# DataHub GMS (Metadata Service)
datahub-gms:
image: acryldata/datahub-gms:v0.13.0@sha256:c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2
environment:
DATAHUB_GMS_HOST: datahub-gms
DATAHUB_GMS_PORT: 8080
EBEAN_DATASOURCE_HOST: postgres:5432
EBEAN_DATASOURCE_USERNAME: postgres
EBEAN_DATASOURCE_PASSWORD: ${POSTGRES_PASSWORD:-postgres}
ELASTICSEARCH_HOST: elasticsearch
ELASTICSEARCH_PORT: 9200
KAFKA_BOOTSTRAP_SERVER: kafka:9092
KAFKA_SCHEMAREGISTRY_URL: http://schema-registry:8081
depends_on:
postgres:
condition: service_healthy
elasticsearch:
condition: service_healthy
kafka:
condition: service_healthy
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
interval: 30s
timeout: 10s
retries: 5
# Elasticsearch for DataHub search
elasticsearch:
image: elasticsearch:7.17.18@sha256:a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2
environment:
- discovery.type=single-node
- xpack.security.enabled=false
- ES_JAVA_OPTS=-Xms512m -Xmx512m
volumes:
- elasticsearch_data:/usr/share/elasticsearch/data
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:9200/_cluster/health"]
interval: 10s
timeout: 5s
retries: 10
# Kafka for DataHub events
kafka:
image: confluentinc/cp-kafka:7.6.0@sha256:b1c2d3e4f5a6b7c8d9e0f1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b1c2
environment:
KAFKA_BROKER_ID: 1
KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
depends_on:
- zookeeper
healthcheck:
test: ["CMD", "kafka-topics", "--bootstrap-server", "kafka:9092", "--list"]
interval: 30s
timeout: 10s
retries: 5
# Zookeeper for Kafka
zookeeper:
image: confluentinc/cp-zookeeper:7.6.0@sha256:a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b1c2d3e4f5a6b7c8d9e0f1a2b3
environment:
ZOOKEEPER_CLIENT_PORT: 2181
ZOOKEEPER_TICK_TIME: 2000
# Schema Registry for Kafka
schema-registry:
image: confluentinc/cp-schema-registry:7.6.0@sha256:c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4
environment:
SCHEMA_REGISTRY_HOST_NAME: schema-registry
SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: kafka:9092
depends_on:
kafka:
condition: service_healthy
# Trino for SQL queries
trino:
image: trinodb/trino:440@sha256:d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5
ports:
- "8081:8080"
volumes:
- ./trino-catalog:/etc/trino/catalog
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/v1/info"]
interval: 10s
timeout: 5s
retries: 10
# MCP Data Platform
mcp-data-platform:
image: ghcr.io/txn2/mcp-data-platform:latest
environment:
DATAHUB_TOKEN: ${DATAHUB_TOKEN}
DATABASE_URL: postgres://postgres:${POSTGRES_PASSWORD:-postgres}@postgres:5432/audit
OAUTH_SIGNING_KEY: ${OAUTH_SIGNING_KEY}
KEYCLOAK_CLIENT_SECRET: ${KEYCLOAK_CLIENT_SECRET}
volumes:
- ./platform.yaml:/etc/mcp/platform.yaml:ro
command: ["--config", "/etc/mcp/platform.yaml", "--transport", "http", "--address", ":8080"]
ports:
- "8080:8080"
depends_on:
datahub-gms:
condition: service_healthy
trino:
condition: service_healthy
keycloak:
condition: service_started
volumes:
postgres_data:
elasticsearch_data:
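Database Init Script¶
POSTGRES_MULTIPLE_DATABASES is not built into the official postgres image; it only takes effect because the compose file mounts init-multiple-dbs.sh into /docker-entrypoint-initdb.d. A minimal sketch of that script (the parsing is an assumption; make the file executable before starting the stack):
#!/bin/bash
# init-multiple-dbs.sh -- runs once on first container start via
# /docker-entrypoint-initdb.d and creates one database per comma-separated
# entry in POSTGRES_MULTIPLE_DATABASES (here: datahub, keycloak, audit).
set -e
for db in $(echo "$POSTGRES_MULTIPLE_DATABASES" | tr ',' ' '); do
  echo "Creating database: $db"
  psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" -c "CREATE DATABASE $db;"
done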
Platform Configuration¶
Create platform.yaml:
server:
name: mcp-data-platform
transport: http
address: ":8080"
toolkits:
datahub:
primary:
url: http://datahub-gms:8080
token: ${DATAHUB_TOKEN}
trino:
primary:
host: trino
port: 8080
user: trino
catalog: memory
ssl: false
oauth:
enabled: true
issuer: "http://localhost:8080"
signing_key: ${OAUTH_SIGNING_KEY}
clients:
- id: "claude-desktop"
secret: "claude-secret"
redirect_uris:
- "http://localhost"
- "http://127.0.0.1"
upstream:
issuer: "http://keycloak:8080/realms/mcp"
client_id: "mcp-data-platform"
client_secret: ${KEYCLOAK_CLIENT_SECRET}
redirect_uri: "http://localhost:8080/oauth/callback"
personas:
definitions:
analyst:
display_name: "Data Analyst"
roles: ["analyst"]
tools:
allow: ["trino_*", "datahub_*"]
deny: ["*_delete_*"]
admin:
display_name: "Administrator"
roles: ["admin"]
tools:
allow: ["*"]
default_persona: analyst
injection:
trino_semantic_enrichment: true
datahub_query_enrichment: true
audit:
enabled: true
log_tool_calls: true
database:
dsn: ${DATABASE_URL}
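A note on the persona tool filters: assuming deny patterns take precedence over allow (as the example implies), a user resolved to the analyst persona can call any trino_* or datahub_* tool, while anything matching *_delete_* stays blocked; a hypothetical datahub_delete_entity tool would be denied even though datahub_* is allowed.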
Start the Stack¶
# Generate secrets
export POSTGRES_PASSWORD=$(openssl rand -base64 32)
export OAUTH_SIGNING_KEY=$(openssl rand -base64 32)
export KEYCLOAK_CLIENT_SECRET=$(openssl rand -base64 32)
export DATAHUB_TOKEN="your-datahub-token"
# Start all services
docker compose up -d
# Wait for services to be healthy
docker compose ps
# View logs
docker compose logs -f mcp-data-platform
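Once docker compose ps reports the services healthy, a quick sanity check against the published platform port (the /health path matches the endpoint the Kubernetes probes later in this guide use):
# Verify the platform responds
curl -f http://localhost:8080/health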
Local Development Workflow¶
For rapid iteration during development:
# Start dependencies only
docker compose up -d postgres elasticsearch kafka zookeeper schema-registry datahub-gms trino keycloak
# Run mcp-data-platform locally
go run ./cmd/mcp-data-platform --config platform.yaml --transport http --address :8080
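Note that platform.yaml references compose service hostnames (datahub-gms, trino, postgres) that do not resolve from the host, and the compose file above publishes neither Postgres nor DataHub GMS. For a local run, publish those ports and point a copy of the config (say, a hypothetical platform-local.yaml) at localhost; the binary reads the same environment the container would:
# Same secrets the container receives, pointed at localhost-published ports
export DATAHUB_TOKEN="your-datahub-token"
export OAUTH_SIGNING_KEY=$(openssl rand -base64 32)
export KEYCLOAK_CLIENT_SECRET=$(openssl rand -base64 32)
export DATABASE_URL="postgres://postgres:${POSTGRES_PASSWORD:-postgres}@localhost:5432/audit"
Then run the go run command above with --config platform-local.yaml.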
Kubernetes/Helm (Production)¶
A production deployment using a Helm chart, with baseline practices for security, scaling, and monitoring.
Prerequisites¶
- Kubernetes 1.28+
- Helm 3.12+
- kubectl configured for your cluster
- TLS certificates (cert-manager recommended)
Helm Chart Structure¶
Create a Helm chart at charts/mcp-data-platform/:
charts/mcp-data-platform/
├── Chart.yaml
├── values.yaml
├── templates/
│ ├── _helpers.tpl
│ ├── deployment.yaml
│ ├── service.yaml
│ ├── configmap.yaml
│ ├── secret.yaml
│ ├── ingress.yaml
│ ├── hpa.yaml
│ ├── pdb.yaml
│ └── serviceaccount.yaml
Chart.yaml¶
apiVersion: v2
name: mcp-data-platform
description: Semantic data platform MCP server
type: application
version: 1.0.0
appVersion: "0.1.0"
values.yaml¶
replicaCount: 2
image:
repository: ghcr.io/txn2/mcp-data-platform
pullPolicy: IfNotPresent
tag: "latest"
serviceAccount:
create: true
annotations: {}
name: ""
podSecurityContext:
runAsNonRoot: true
runAsUser: 65534
runAsGroup: 65534
fsGroup: 65534
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
service:
type: ClusterIP
port: 8080
ingress:
enabled: true
className: nginx
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
nginx.ingress.kubernetes.io/proxy-body-size: "10m"
nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
hosts:
- host: mcp.example.com
paths:
- path: /
pathType: Prefix
tls:
- secretName: mcp-data-platform-tls
hosts:
- mcp.example.com
resources:
limits:
cpu: 1000m
memory: 512Mi
requests:
cpu: 100m
memory: 128Mi
autoscaling:
enabled: true
minReplicas: 2
maxReplicas: 10
targetCPUUtilizationPercentage: 70
targetMemoryUtilizationPercentage: 80
pdb:
enabled: true
minAvailable: 1
# Platform configuration
config:
server:
name: mcp-data-platform
transport: http
address: ":8080"
tls:
enabled: false # TLS terminates at ingress
toolkits:
datahub:
primary:
url: http://datahub-gms.datahub:8080
trino:
primary:
host: trino.trino
port: 8080
user: mcp-platform
catalog: hive
ssl: false
injection:
trino_semantic_enrichment: true
datahub_query_enrichment: true
audit:
enabled: true
log_tool_calls: true
# External secrets (use external-secrets operator or sealed-secrets in production)
secrets:
datahubToken: ""
oauthSigningKey: ""
keycloakClientSecret: ""
databaseUrl: ""
# Prometheus metrics
metrics:
enabled: true
port: 9090
path: /metrics
# Health checks
probes:
liveness:
httpGet:
path: /health
port: http
initialDelaySeconds: 10
periodSeconds: 10
readiness:
httpGet:
path: /health
port: http
initialDelaySeconds: 5
periodSeconds: 5
templates/deployment.yaml¶
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ include "mcp-data-platform.fullname" . }}
labels:
{{- include "mcp-data-platform.labels" . | nindent 4 }}
spec:
{{- if not .Values.autoscaling.enabled }}
replicas: {{ .Values.replicaCount }}
{{- end }}
selector:
matchLabels:
{{- include "mcp-data-platform.selectorLabels" . | nindent 6 }}
template:
metadata:
annotations:
checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
checksum/secret: {{ include (print $.Template.BasePath "/secret.yaml") . | sha256sum }}
labels:
{{- include "mcp-data-platform.selectorLabels" . | nindent 8 }}
spec:
serviceAccountName: {{ include "mcp-data-platform.serviceAccountName" . }}
securityContext:
{{- toYaml .Values.podSecurityContext | nindent 8 }}
containers:
- name: {{ .Chart.Name }}
securityContext:
{{- toYaml .Values.securityContext | nindent 12 }}
image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
args:
- --config
- /etc/mcp/platform.yaml
- --transport
- http
- --address
- :8080
ports:
- name: http
containerPort: 8080
protocol: TCP
{{- if .Values.metrics.enabled }}
- name: metrics
containerPort: {{ .Values.metrics.port }}
protocol: TCP
{{- end }}
livenessProbe:
{{- toYaml .Values.probes.liveness | nindent 12 }}
readinessProbe:
{{- toYaml .Values.probes.readiness | nindent 12 }}
resources:
{{- toYaml .Values.resources | nindent 12 }}
env:
- name: DATAHUB_TOKEN
valueFrom:
secretKeyRef:
name: {{ include "mcp-data-platform.fullname" . }}
key: datahub-token
- name: OAUTH_SIGNING_KEY
valueFrom:
secretKeyRef:
name: {{ include "mcp-data-platform.fullname" . }}
key: oauth-signing-key
- name: KEYCLOAK_CLIENT_SECRET
valueFrom:
secretKeyRef:
name: {{ include "mcp-data-platform.fullname" . }}
key: keycloak-client-secret
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: {{ include "mcp-data-platform.fullname" . }}
key: database-url
volumeMounts:
- name: config
mountPath: /etc/mcp
readOnly: true
- name: tmp
mountPath: /tmp
volumes:
- name: config
configMap:
name: {{ include "mcp-data-platform.fullname" . }}
- name: tmp
emptyDir: {}
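templates/configmap.yaml and templates/secret.yaml¶
Both files appear in the chart tree but are not shown above: the deployment mounts the ConfigMap at /etc/mcp and reads each secret key from a Secret named after the chart fullname. Minimal sketches, assuming .Values.config renders verbatim as platform.yaml and the Secret is only templated when values are supplied (so an externally managed secret of the same name can take its place):
# templates/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "mcp-data-platform.fullname" . }}
  labels:
    {{- include "mcp-data-platform.labels" . | nindent 4 }}
data:
  platform.yaml: |
    {{- toYaml .Values.config | nindent 4 }}

# templates/secret.yaml
{{- if .Values.secrets.datahubToken }}
apiVersion: v1
kind: Secret
metadata:
  name: {{ include "mcp-data-platform.fullname" . }}
  labels:
    {{- include "mcp-data-platform.labels" . | nindent 4 }}
type: Opaque
stringData:
  datahub-token: {{ .Values.secrets.datahubToken | quote }}
  oauth-signing-key: {{ .Values.secrets.oauthSigningKey | quote }}
  keycloak-client-secret: {{ .Values.secrets.keycloakClientSecret | quote }}
  database-url: {{ .Values.secrets.databaseUrl | quote }}
{{- end }}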
templates/hpa.yaml¶
{{- if .Values.autoscaling.enabled }}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: {{ include "mcp-data-platform.fullname" . }}
labels:
{{- include "mcp-data-platform.labels" . | nindent 4 }}
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: {{ include "mcp-data-platform.fullname" . }}
minReplicas: {{ .Values.autoscaling.minReplicas }}
maxReplicas: {{ .Values.autoscaling.maxReplicas }}
metrics:
{{- if .Values.autoscaling.targetCPUUtilizationPercentage }}
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: {{ .Values.autoscaling.targetCPUUtilizationPercentage }}
{{- end }}
{{- if .Values.autoscaling.targetMemoryUtilizationPercentage }}
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: {{ .Values.autoscaling.targetMemoryUtilizationPercentage }}
{{- end }}
{{- end }}
templates/pdb.yaml¶
{{- if .Values.pdb.enabled }}
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: {{ include "mcp-data-platform.fullname" . }}
labels:
{{- include "mcp-data-platform.labels" . | nindent 4 }}
spec:
minAvailable: {{ .Values.pdb.minAvailable }}
selector:
matchLabels:
{{- include "mcp-data-platform.selectorLabels" . | nindent 6 }}
{{- end }}
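templates/_helpers.tpl¶
The templates above rely on the name, fullname, label, and service-account helpers listed in the chart tree. A minimal _helpers.tpl sketch, close to what helm create generates:
{{- define "mcp-data-platform.name" -}}
{{- .Chart.Name | trunc 63 | trimSuffix "-" }}
{{- end }}

{{- define "mcp-data-platform.fullname" -}}
{{- if contains .Chart.Name .Release.Name }}
{{- .Release.Name | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- printf "%s-%s" .Release.Name .Chart.Name | trunc 63 | trimSuffix "-" }}
{{- end }}
{{- end }}

{{- define "mcp-data-platform.labels" -}}
helm.sh/chart: {{ printf "%s-%s" .Chart.Name .Chart.Version }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{ include "mcp-data-platform.selectorLabels" . }}
{{- end }}

{{- define "mcp-data-platform.selectorLabels" -}}
app.kubernetes.io/name: {{ include "mcp-data-platform.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}

{{- define "mcp-data-platform.serviceAccountName" -}}
{{- if .Values.serviceAccount.create }}
{{- default (include "mcp-data-platform.fullname" .) .Values.serviceAccount.name }}
{{- else }}
{{- default "default" .Values.serviceAccount.name }}
{{- end }}
{{- end }}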
Deploy to Kubernetes¶
# Create namespace
kubectl create namespace mcp-data-platform
# Create secrets (use external-secrets or sealed-secrets in production).
# The name must match the secret the deployment's secretKeyRef expects:
# the chart fullname (here, the release name mcp-data-platform).
kubectl create secret generic mcp-data-platform \
--namespace mcp-data-platform \
--from-literal=datahub-token="$DATAHUB_TOKEN" \
--from-literal=oauth-signing-key="$OAUTH_SIGNING_KEY" \
--from-literal=keycloak-client-secret="$KEYCLOAK_CLIENT_SECRET" \
--from-literal=database-url="$DATABASE_URL"
# Install the chart
helm upgrade --install mcp-data-platform ./charts/mcp-data-platform \
--namespace mcp-data-platform \
--values values-production.yaml
# Verify deployment
kubectl get pods -n mcp-data-platform
kubectl get hpa -n mcp-data-platform
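With the release named mcp-data-platform as above, the Service shares that name; a quick smoke test:
# Port-forward the service and hit the health endpoint
kubectl -n mcp-data-platform port-forward svc/mcp-data-platform 8080:8080 &
curl -f http://localhost:8080/health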
Production Checklist¶
Security¶
- TLS enabled for all external endpoints
- Secrets stored in external secrets manager (Vault, AWS Secrets Manager)
- Network policies restrict pod-to-pod communication (sketch after this list)
- Pod security context configured (non-root, read-only filesystem)
- Resource limits set for all containers
- OIDC configured with production identity provider
- API keys rotated regularly
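For the network-policy item above, a default-deny-style sketch that admits only traffic from the ingress controller namespace (the ingress-nginx namespace label is an assumption; match your controller):
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mcp-data-platform
  namespace: mcp-data-platform
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: mcp-data-platform
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8080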
High Availability¶
- Multiple replicas deployed (minimum 2)
- PodDisruptionBudget configured
- Anti-affinity rules spread pods across nodes (sketch after this list)
- Health checks configured for liveness and readiness
- HPA configured for automatic scaling
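A soft anti-affinity sketch for the deployment's pod spec (the weight is arbitrary; a requiredDuringScheduling variant trades availability for stricter spreading):
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: mcp-data-platform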
Monitoring¶
- Prometheus metrics enabled and scraped
- Grafana dashboards deployed
- Alerting rules configured (PrometheusRule sketch after this list)
- Log aggregation set up (ELK, Loki)
- Distributed tracing enabled (Jaeger, Zipkin)
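For the alerting item above, a PrometheusRule sketch built on the request metrics listed under Grafana Dashboard below (the 5% threshold and 10m window are arbitrary starting points):
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: mcp-data-platform
  namespace: mcp-data-platform
spec:
  groups:
    - name: mcp-data-platform
      rules:
        - alert: MCPHighErrorRate
          expr: |
            sum(rate(mcp_requests_total{status="error"}[5m]))
              / sum(rate(mcp_requests_total[5m])) > 0.05
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: mcp-data-platform error rate above 5% for 10 minutes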
Operations¶
- Backup strategy for PostgreSQL audit logs
- Disaster recovery plan documented
- Runbooks for common issues
- On-call rotation established
Monitoring Setup¶
Prometheus ServiceMonitor¶
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: mcp-data-platform
namespace: mcp-data-platform
spec:
selector:
matchLabels:
app.kubernetes.io/name: mcp-data-platform
endpoints:
- port: metrics
interval: 30s
path: /metrics
Grafana Dashboard¶
Key metrics to monitor:
- Request rate: `sum(rate(mcp_requests_total[5m]))`
- Error rate: `sum(rate(mcp_requests_total{status="error"}[5m]))`
- Latency (p99): `histogram_quantile(0.99, rate(mcp_request_duration_seconds_bucket[5m]))`
- Enrichment latency (p99): `histogram_quantile(0.99, rate(mcp_enrichment_duration_seconds_bucket[5m]))`
- Active connections: `mcp_active_connections`
Scaling Considerations¶
Horizontal Scaling¶
mcp-data-platform is stateless and scales horizontally. Key considerations:
- Connection pooling: Each replica maintains its own connections to DataHub/Trino
- Cache coordination: Semantic cache is per-instance; consider Redis for shared caching at scale
- Load balancing: Use sticky sessions for SSE connections (annotation sketch below)
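For the sticky-session point, cookie-based affinity on the NGINX ingress is one approach; these are standard ingress-nginx annotations:
nginx.ingress.kubernetes.io/affinity: "cookie"
nginx.ingress.kubernetes.io/session-cookie-name: "mcp-session"
nginx.ingress.kubernetes.io/session-cookie-max-age: "3600"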
Vertical Scaling¶
Increase resources for:
- High query volume: More CPU for request processing
- Large result sets: More memory for enrichment processing
- Many concurrent connections: More memory for connection state
Bottleneck Analysis¶
Common bottlenecks and solutions:
| Bottleneck | Symptom | Solution |
|---|---|---|
| DataHub API | High enrichment latency | Enable caching, increase DataHub resources |
| Trino queries | Timeout errors | Tune Trino cluster, add query limits |
| PostgreSQL audit | Write latency | Use async writes, add replicas |
| Network | Connection timeouts | Deploy closer to data sources |