Troubleshooting¶
This guide helps you diagnose and resolve common issues with mcp-data-platform. If you don't find your issue here, check the GitHub Issues or open a new one.
Quick Diagnosis¶
Start here to quickly identify your issue category.
| Symptom | Likely Cause | Jump To |
|---|---|---|
| Server exits immediately | Configuration error | Server Won't Start |
| 401 Unauthorized | Invalid credentials | Authentication Issues |
| 403 Forbidden | Persona/tool mismatch | Persona Issues |
| No enrichment data | Injection misconfigured | Enrichment Issues |
| Slow responses | Performance bottleneck | Performance Issues |
| Connection refused | Service unreachable | Connection Issues |
Server Won't Start¶
Symptom: Server exits immediately¶
Check configuration syntax:
# Validate YAML syntax
python3 -c "import yaml; yaml.safe_load(open('platform.yaml'))"
# Or with yq
yq eval platform.yaml
Check for missing environment variables:
# The server logs which variables are missing
mcp-data-platform --config platform.yaml 2>&1 | grep -i "missing\|undefined\|required"
Common configuration errors:
# WRONG: Missing quotes around URL with special characters
toolkits:
  datahub:
    primary:
      url: https://datahub.example.com:8080  # Might be parsed incorrectly

# CORRECT: Quote URLs
toolkits:
  datahub:
    primary:
      url: "https://datahub.example.com:8080"

# WRONG: Environment variable without braces
auth:
  api_keys:
    keys:
      - key: $API_KEY  # Won't expand

# CORRECT: Use ${} syntax
auth:
  api_keys:
    keys:
      - key: ${API_KEY}
Symptom: Server hangs on startup¶
Likely cause: Cannot connect to a required service (DataHub, Trino, PostgreSQL).
Debug steps:
# Check DataHub connectivity
curl -v -H "Authorization: Bearer $DATAHUB_TOKEN" \
https://datahub.example.com/openapi/v2/entity/dataset
# Check Trino connectivity
curl -v https://trino.example.com:443/v1/info
# Check PostgreSQL connectivity
psql $DATABASE_URL -c "SELECT 1"
Symptom: Port already in use¶
# Find what's using the port
lsof -i :8080
# Kill the process if needed
kill -9 <PID>
# Or use a different port
mcp-data-platform --transport http --address :8081
Connection Issues¶
Cannot connect to Trino¶
Symptom: trino_query returns connection errors.
Debug steps:
Step 1: Verify network connectivity
# DNS resolution
nslookup trino.example.com
# TCP connectivity
nc -zv trino.example.com 443
# HTTP connectivity
curl -v https://trino.example.com:443/v1/info
Step 2: Check SSL configuration
toolkits:
  trino:
    primary:
      host: trino.example.com
      port: 443
      ssl: true
      ssl_verify: true  # Try false for self-signed certs
      # For custom CA certificates
      ssl_ca_file: /path/to/ca.crt
Step 3: Verify credentials
# Test with Trino CLI
trino --server https://trino.example.com:443 \
--user $TRINO_USER \
--password \
--execute "SELECT 1"
Cannot connect to DataHub¶
Symptom: datahub_search returns connection errors.
Debug steps:
Step 1: Verify URL format
toolkits:
  datahub:
    primary:
      # CORRECT: GMS URL (metadata service)
      url: "https://datahub-gms.example.com"
      # WRONG: Frontend URL
      # url: "https://datahub.example.com"  # This is the UI
Step 2: Test token validity
# Test the token
curl -H "Authorization: Bearer $DATAHUB_TOKEN" \
"https://datahub-gms.example.com/openapi/v2/entity/dataset?count=1"
# Check token expiration (if it's a JWT)
echo $DATAHUB_TOKEN | cut -d. -f2 | base64 -d 2>/dev/null | jq '.exp | todate'
Step 3: Verify token permissions
DataHub tokens need these permissions:
- Read Metadata: required for all operations
- Manage Metadata: required for write operations (if enabled)
Cannot connect to S3¶
Symptom: s3_list_buckets returns credential errors.
Debug output example:
Error: operation error S3: ListBuckets, https response error StatusCode: 403,
RequestID: ABC123, api error AccessDenied: Access Denied
Step 1: Verify credentials
# Test with AWS CLI
AWS_ACCESS_KEY_ID=$KEY AWS_SECRET_ACCESS_KEY=$SECRET \
aws s3 ls --region us-east-1
Step 2: Check endpoint configuration for MinIO/custom S3
toolkits:
  s3:
    minio:
      endpoint: "http://minio.local:9000"  # Include protocol
      use_path_style: true   # Required for MinIO
      disable_ssl: true      # For non-TLS endpoints
      region: "us-east-1"    # Still required
Step 3: Verify bucket permissions
# Check bucket policy
aws s3api get-bucket-policy --bucket your-bucket
# Check IAM permissions
aws sts get-caller-identity
Authentication Issues¶
OIDC token rejected (401)¶
Debug steps:
Step 1: Decode and inspect the token
# Decode JWT payload
echo $TOKEN | cut -d. -f2 | base64 -d 2>/dev/null | jq
# Check key claims
echo $TOKEN | cut -d. -f2 | base64 -d 2>/dev/null | jq '{iss, aud, exp, sub}'
Step 2: Verify issuer URL matches exactly
auth:
  oidc:
    # Must match the token's "iss" claim EXACTLY (including trailing slash if present)
    issuer: "https://auth.example.com/realms/myrealm"
    # Common mistake: missing /realms/<name> for Keycloak
    # issuer: "https://auth.example.com"  # WRONG
Step 3: Check token expiration and clock skew
# Check server time
date -u
# Check token expiration
echo $TOKEN | cut -d. -f2 | base64 -d 2>/dev/null | jq '.exp | todate'
Step 4: Verify audience claim
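If audience validation is enabled, the configured value must match the token's "aud" claim exactly. A minimal sketch (the audience key name is an assumption; check your configuration reference):
auth:
  oidc:
    issuer: "https://auth.example.com/realms/myrealm"
    audience: "mcp-data-platform"  # Assumed key: must match the token's "aud" claim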
API key rejected (401)¶
Debug steps:
Step 1: Check for invisible characters
# Print key with visible whitespace
echo "Key: [$API_KEY_ADMIN]" | cat -A
# Remove any trailing whitespace
export API_KEY_ADMIN=$(echo "$API_KEY_ADMIN" | tr -d '[:space:]')
Step 2: Verify configuration matches
auth:
  api_keys:
    enabled: true  # Must be enabled
    keys:
      - key: ${API_KEY_ADMIN}  # Variable name must match exactly
        name: "admin"
        roles: ["admin"]
Step 3: Test the key directly
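A sketch, assuming API keys are presented as a Bearer token over the HTTP transport (adjust the header and endpoint to whatever scheme your deployment uses):
# Send the key directly; a 401 here rules out client-side issues
curl -v -H "Authorization: Bearer $API_KEY_ADMIN" \
  https://mcp.example.com/mcp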
OAuth flow fails¶
Debug steps:
Step 1: Check redirect URI configuration
oauth:
  clients:
    - id: "claude-desktop"
      redirect_uris:
        - "http://localhost"   # Claude Desktop uses these
        - "http://127.0.0.1"
      # Must match exactly what the client sends
Step 2: Verify upstream IdP configuration
oauth:
  upstream:
    issuer: "https://keycloak.example.com/realms/your-realm"
    client_id: "mcp-data-platform"
    client_secret: ${KEYCLOAK_CLIENT_SECRET}
    # This must be registered as a valid redirect URI in Keycloak
    redirect_uri: "https://mcp.example.com/oauth/callback"
Step 3: Check browser console for CORS errors
If using a web-based client, check for CORS issues in the browser console.
Persona Issues¶
User gets wrong persona¶
Debug steps:
Step 1: Check what roles are in the token
# Decode token and show roles
echo $TOKEN | cut -d. -f2 | base64 -d 2>/dev/null | jq '.realm_access.roles'
Step 2: Verify role_claim_path matches token structure
auth:
  oidc:
    # For Keycloak with realm roles
    role_claim_path: "realm_access.roles"
    # For Keycloak with client roles
    # role_claim_path: "resource_access.mcp-data-platform.roles"
    # For Auth0
    # role_claim_path: "https://example.com/roles"
Step 3: Check role prefix filtering
If a role prefix is configured, only token roles carrying that prefix are kept, and the prefix is stripped before persona matching. For example, with a prefix of dp_:
Token roles:    ["dp_analyst", "user", "admin"]
Filtered roles: ["analyst"]  (prefix stripped)
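A sketch of the corresponding configuration (the role_prefix key name is an assumption; check your configuration reference):
auth:
  oidc:
    role_claim_path: "realm_access.roles"
    role_prefix: "dp_"  # Assumed key: keeps only dp_* roles and strips the prefix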
Step 4: Verify persona role matching
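The resolved persona is the definition that lists one of the filtered roles. A minimal sketch (the roles key under a persona definition is an assumption):
personas:
  definitions:
    analyst:
      roles: ["analyst"]  # Assumed key: must contain one of the filtered token roles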
Tool denied unexpectedly (403)¶
Debug steps:
Step 1: Check allow/deny patterns
personas:
  definitions:
    analyst:
      tools:
        # Deny patterns are checked FIRST
        deny: ["trino_query"]  # This blocks trino_query
        allow: ["trino_*"]     # This would allow it, but deny wins
Step 2: Verify wildcard pattern matching
| Pattern | Matches | Doesn't Match |
|---|---|---|
| `trino_*` | `trino_query`, `trino_list_tables` | `datahub_search` |
| `*_delete_*` | `s3_delete_object`, `trino_delete_row` | `s3_list_buckets` |
| `*` | Everything | Nothing |
Step 3: List available tools
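You can list the tools your persona actually sees with an MCP tools/list request. A sketch assuming the HTTP transport is mounted at /mcp (the path and session handling vary; some transports require an initialize call first):
curl -s -X POST https://mcp.example.com/mcp \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'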
Enrichment Issues¶
No semantic context in Trino results¶
Debug steps:
Step 1: Verify injection is enabled
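A minimal sketch (key names are assumptions; check your configuration reference):
semantic:
  injection:
    enabled: true  # Assumed key: enrichment is skipped entirely when false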
Step 2: Check semantic provider is configured
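The semantic provider must point at a configured DataHub toolkit, e.g. (assumed key names):
semantic:
  provider: datahub  # Assumed key: must reference a toolkit defined under toolkits.datahub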
Step 3: Verify table exists in DataHub
# Search for the table in DataHub
curl -H "Authorization: Bearer $DATAHUB_TOKEN" \
"https://datahub-gms.example.com/openapi/v2/search?query=your_table&entity=dataset"
Step 4: Check URN format
DataHub uses URNs to identify entities. The platform must construct the correct URN:
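For a Trino table, a dataset URN typically looks like the following (the platform name, qualified table name, and environment may differ in your ingestion setup):
urn:li:dataset:(urn:li:dataPlatform:trino,hive.sales.orders,PROD)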
Step 5: Check cache status
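With LOG_LEVEL=debug, cache activity shows up alongside the enrichment log lines captured in debug.log:
grep -i "enrichment\|cache" debug.log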
Enrichment errors in logs¶
Warnings that enrichment failed for a given table are expected when the table exists in Trino but isn't cataloged in DataHub. The platform returns the original result without enrichment.
To suppress these warnings:
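One option is to catalog the missing tables in DataHub so the lookups succeed. Alternatively, a configuration sketch (the warn_on_miss key is hypothetical; check your configuration reference for the real knob):
semantic:
  injection:
    warn_on_miss: false  # Hypothetical setting: don't warn when a table has no DataHub entry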
Performance Issues¶
Slow queries¶
Step 1: Check query timeout configuration
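A sketch (the query_timeout key name is an assumption; check your configuration reference):
toolkits:
  trino:
    primary:
      query_timeout: 60s  # Assumed key: raise it for legitimately long-running queries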
Step 2: Verify row limits
toolkits:
  trino:
    primary:
      default_limit: 1000  # Default rows returned
      max_limit: 10000     # Maximum allowed
Step 3: Profile the query in Trino
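If the query itself is slow, EXPLAIN ANALYZE in the Trino CLI shows where the time goes:
-- Runs the query and reports per-stage timing and row counts
EXPLAIN ANALYZE SELECT count(*) FROM catalog.schema.your_table;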
Step 4: Check network latency
# Time a simple query
time curl -H "Authorization: Bearer $TOKEN" \
"https://trino.example.com:443/v1/statement" \
-d "SELECT 1"
Slow enrichment¶
Step 1: Enable and tune caching
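The cache settings shown under High memory usage work in the other direction here: enable the cache and give entries a longer life (the enabled key is an assumption; max_entries and ttl appear elsewhere in this guide):
semantic:
  cache:
    enabled: true      # Assumed key
    max_entries: 10000
    ttl: 5m            # Longer TTL = fewer DataHub round trips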
Step 2: Check DataHub response time
# Time a DataHub API call
time curl -H "Authorization: Bearer $DATAHUB_TOKEN" \
"https://datahub-gms.example.com/openapi/v2/entity/dataset/urn:li:dataset:..."
Step 3: Temporarily disable enrichment to isolate
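If responses speed up with enrichment off, the bottleneck is DataHub or the cache, not Trino. A sketch (assumed key names, matching the injection sketch above):
semantic:
  injection:
    enabled: false  # Temporary: isolates enrichment as the bottleneck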
High memory usage¶
Likely causes:
- Large query results being held in memory
- Too many cached entries
- Connection pool too large
Solutions:
toolkits:
  trino:
    primary:
      max_limit: 10000  # Reduce maximum result size

semantic:
  cache:
    max_entries: 5000  # Reduce cache size
    ttl: 1m            # Shorter TTL

database:
  max_open_conns: 10  # Reduce connection pool
  max_idle_conns: 5
Debugging Guide¶
Enable verbose logging¶
# Set log level
export LOG_LEVEL=debug
# Run with verbose output
mcp-data-platform --config platform.yaml 2>&1 | tee debug.log
Log format¶
2024-01-15T10:30:45.123Z INFO server started address=:8080 transport=http
2024-01-15T10:30:46.456Z DEBUG auth middleware: validating token
2024-01-15T10:30:46.457Z DEBUG persona middleware: resolved persona=analyst
2024-01-15T10:30:46.458Z INFO tool call tool=trino_query [email protected] persona=analyst
2024-01-15T10:30:46.789Z DEBUG enrichment: fetching semantic context table=orders
2024-01-15T10:30:47.012Z INFO tool call complete tool=trino_query duration=554ms
Request tracing¶
Each request is assigned a unique ID that is attached to every log line it produces, so entries can be correlated across middleware and toolkits.
Find all logs for a request:
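A sketch, assuming the ID appears as a request_id field (the field name may differ in your log format):
# Substitute the ID from the log line you're investigating
grep "request_id=abc123" debug.log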
Audit log queries¶
If audit logging is enabled, query the database:
-- Recent tool calls
SELECT * FROM audit_logs
ORDER BY created_at DESC
LIMIT 100;
-- Failed requests
SELECT * FROM audit_logs
WHERE status = 'error'
ORDER BY created_at DESC;
-- Requests by user
SELECT tool_name, COUNT(*) as count
FROM audit_logs
WHERE user_id = '[email protected]'
GROUP BY tool_name;
Common Error Codes¶
| Code | Meaning | Solution |
|---|---|---|
| `AUTH_ERROR` | Authentication failed | Check credentials, token expiration |
| `AUTHZ_ERROR` | Authorization failed | Check persona tool rules |
| `TOOLKIT_ERROR` | Toolkit operation failed | Check service connectivity |
| `PROVIDER_ERROR` | Provider operation failed | Check DataHub/Trino config |
| `CONFIG_ERROR` | Configuration invalid | Validate YAML, check env vars |
| `TIMEOUT_ERROR` | Operation timed out | Increase timeout, check service |
| `RATE_LIMIT_ERROR` | Too many requests | Wait and retry, increase limits |
Getting Help¶
1. Search existing issues: GitHub Issues
2. Report a bug, including:
   - Platform version (mcp-data-platform --version)
   - Configuration (redact secrets)
   - Full error message
   - Steps to reproduce
   - Relevant logs
3. Community support: GitHub Discussions
4. Security issues: email security@txn2.com (do not open public issues for security vulnerabilities)