Skip to content

Audit Logging

Every tool call flows through the audit logging middleware, which records who called what, when, how long it took, and whether it succeeded. Audit logs are stored in PostgreSQL and automatically cleaned up based on a configurable retention period.

Prerequisites

Audit logging requires:

  1. A PostgreSQL database (version 13+)
  2. The database.dsn configuration set
  3. Both audit.enabled and audit.log_tool_calls set to true

Database migrations run automatically on startup and create the audit_logs table.

Configuration

database:
  dsn: "${DATABASE_DSN}"

audit:
  enabled: true
  log_tool_calls: true
  retention_days: 90
Option Type Default Description
database.dsn string - PostgreSQL connection string. Required for audit logging.
audit.enabled bool false Master switch for audit logging.
audit.log_tool_calls bool false Log every tools/call request. Both this and enabled must be true.
audit.retention_days int 90 Days to keep audit logs before automatic cleanup.

If audit.enabled is true but no database is configured, the platform logs a warning and falls back to a no-op logger.

What Gets Logged

Every successful or failed tool call produces one row in audit_logs:

{
  "id": "3sK2DlE7x9mPqR4vNw8bYA",
  "timestamp": "2026-02-04T10:30:00.123Z",
  "duration_ms": 152,
  "request_id": "req-a1b2c3d4e5f67890abcdef1234567890",
  "user_id": "[email protected]",
  "user_email": "[email protected]",
  "persona": "analyst",
  "tool_name": "trino_query",
  "toolkit_kind": "trino",
  "toolkit_name": "production",
  "connection": "prod-trino",
  "parameters": {"sql": "SELECT count(*) FROM orders"},
  "success": true,
  "error_message": "",
  "event_kind": "mcp_tool_call",
  "created_date": "2026-02-04"
}

Field Reference

Field Type Description
id VARCHAR(32) Cryptographically random event ID (base64url-encoded, 16 bytes).
timestamp TIMESTAMPTZ When the tool call started.
duration_ms INTEGER Wall-clock time from request start to handler completion, in milliseconds. Includes rule enforcement and semantic enrichment time.
request_id VARCHAR(255) Unique request ID generated by the auth middleware (hex-encoded, 16 bytes).
user_id VARCHAR(255) Authenticated user identity. From OIDC sub claim, API key name, or OAuth token subject.
user_email VARCHAR(255) User email from OIDC claims, if available.
persona VARCHAR(100) Resolved persona name (e.g., analyst, admin). Set by the authorizer based on the user's roles.
tool_name VARCHAR(255) MCP tool name (e.g., trino_query, datahub_search, s3_list_buckets). For gateway-proxied tools, <connection>__<remote_tool> (e.g., vendor__list_contacts).
toolkit_kind VARCHAR(100) Toolkit type: trino, datahub, s3, or mcp (proxied through the gateway toolkit).
toolkit_name VARCHAR(100) Toolkit instance name from configuration (e.g., production, staging). For gateway proxied tools, the gateway toolkit's own name (typically primary).
connection VARCHAR(100) Connection name. For native toolkits, the connection used to route the call (e.g., prod-trino). For gateway proxied tools, the upstream MCP connection name (e.g., vendor) — populated via the registry's ConnectionResolver interface so per-upstream auditing is accurate without relying on caller-supplied args.
parameters JSONB Tool call arguments with sensitive values redacted. See Parameter Sanitization.
success BOOLEAN true if the tool handler returned without error and IsError was not set.
error_message TEXT Error description if success is false.
session_id VARCHAR(255) MCP session ID. Links tool calls within the same session for pattern analysis.
response_chars INTEGER Character count of the tool response.
content_blocks INTEGER Number of content blocks in the tool response.
request_chars INTEGER Character count of the tool request parameters.
transport VARCHAR(50) Transport type: stdio or http.
source VARCHAR(50) Caller class. mcp = agent over a real MCP transport. rest = external automation through the gateway REST shim (NiFi, cronjobs, integrations). admin = portal-driven tool execution via the admin REST API. See Caller class via source.
enrichment_applied BOOLEAN Whether semantic enrichment was applied to this tool call's response.
authorized BOOLEAN Whether the tool call was authorized by the persona system.
enrichment_tokens_full INTEGER Estimated tokens for the full (non-dedup) enrichment content. Uses chars / 4 approximation.
enrichment_tokens_dedup INTEGER Estimated tokens for the dedup enrichment content. 0 when full enrichment was sent.
enrichment_mode VARCHAR(20) Enrichment mode used: full, summary, reference, none, or empty (not enriched).
event_kind VARCHAR(64) High-level event category: apigateway_invoke for HTTP API calls through the apigateway toolkit, mcp_tool_call for every other toolkit. Lets the Activity view split gateway traffic from MCP tool calls. See Event kind.
created_date DATE Partition key derived from timestamp. Used for retention cleanup.

Caller class via source

Tools on this platform are reachable through three entry points, all of which fire the same MCP audit middleware. The source field on each audit row records which path was used so operators can separate the populations without having to know which user IDs belong to which class of caller.

source What it means Typical caller
mcp Real MCP transport (stdio or HTTP/SSE) Claude, other interactive MCP agents
rest Gateway REST shim at POST /api/v1/gateway/{connection}/invoke Apache NiFi, cronjobs, integrations, anything HTTP that wraps the platform
admin Admin REST API tool execution at POST /api/v1/admin/tools/call Portal UI "test this tool" buttons, ops scripts

Both the gateway REST shim (pkg/gatewayhttp/handler.go) and the admin tool runner (pkg/admin/tools.go) open an in-memory MCP session against the assembled server and call the same api_invoke_endpoint (or other) tool that an agent would call. The handlers tag the context with middleware.WithSource before opening that session so the audit middleware records the originating caller class, not just "mcp".

Filter by source in the admin API:

GET /api/v1/admin/audit/events?source=mcp     # agents only
GET /api/v1/admin/audit/events?source=rest    # NiFi-class only
GET /api/v1/admin/audit/events?source=admin   # portal-driven only

Or in the portal UI, use the All Sources dropdown on the Audit Log page. The dropdown lists every source value seen in the current time window.

Event kind: MCP vs API gateway

The event_kind field separates two classes of audited activity that otherwise share the same row shape:

event_kind What it means
mcp_tool_call A tool routed through one of the MCP toolkits (trino, datahub, s3, or the MCP gateway).
apigateway_invoke An HTTP API call proxied through the apigateway toolkit (api_invoke_endpoint, api_export, and the other api_* tools).

The kind is derived at write time from the toolkit kind, so it does not depend on tool-name string matching. A high-traffic API gateway can produce many rows per agent turn; the split lets the MCP Activity view exclude that traffic by default while a dedicated gateway view includes it.

Filter by event kind in the admin API. The filter is accepted on the event list, stats, and every metrics endpoint (timeseries, breakdown, overview, performance, enrichment, discovery):

GET /api/v1/admin/audit/events?event_kind=mcp_tool_call        # MCP tool calls only
GET /api/v1/admin/audit/events?event_kind=apigateway_invoke    # API gateway calls only
GET /api/v1/admin/audit/metrics/timeseries?event_kind=mcp_tool_call

Parameter Sanitization

Tool call arguments are logged for debugging and compliance, but sensitive values are automatically redacted before storage. The following parameter keys are replaced with [REDACTED]:

  • password
  • secret
  • token
  • api_key
  • authorization
  • credentials

Matching is case-sensitive and exact. A parameter named user_password would not be redacted (only password is matched).

Database Schema

The audit_logs table is partitioned by created_date for efficient retention management:

CREATE TABLE audit_logs (
    id              VARCHAR(32) NOT NULL,
    timestamp       TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    duration_ms     INTEGER,
    request_id      VARCHAR(255),
    user_id         VARCHAR(255),
    user_email      VARCHAR(255),
    persona         VARCHAR(100),
    tool_name       VARCHAR(255) NOT NULL,
    toolkit_kind    VARCHAR(100),
    toolkit_name    VARCHAR(100),
    connection      VARCHAR(100),
    parameters      JSONB,
    success         BOOLEAN NOT NULL,
    error_message   TEXT,
    created_date    DATE NOT NULL DEFAULT CURRENT_DATE,
    PRIMARY KEY (id, created_date)
) PARTITION BY RANGE (created_date);

Indexes

Index Column Use Case
idx_audit_logs_timestamp timestamp Time-range queries
idx_audit_logs_user_id user_id Per-user audit trails
idx_audit_logs_tool_name tool_name Tool usage analytics
idx_audit_logs_success success Failure investigation
idx_audit_logs_created_date created_date Partition pruning, retention cleanup

Querying Audit Logs

Recent tool calls by user

SELECT timestamp, tool_name, toolkit_kind, success, duration_ms
FROM audit_logs
WHERE user_id = '[email protected]'
ORDER BY timestamp DESC
LIMIT 20;

Failed calls in the last 24 hours

SELECT timestamp, user_id, tool_name, error_message, duration_ms
FROM audit_logs
WHERE success = false
  AND timestamp > NOW() - INTERVAL '24 hours'
ORDER BY timestamp DESC;

Tool usage by toolkit

SELECT toolkit_kind, tool_name, COUNT(*) as calls,
       AVG(duration_ms) as avg_ms, SUM(CASE WHEN success THEN 0 ELSE 1 END) as failures
FROM audit_logs
WHERE timestamp > NOW() - INTERVAL '7 days'
GROUP BY toolkit_kind, tool_name
ORDER BY calls DESC;

Persona activity

SELECT persona, COUNT(*) as calls, COUNT(DISTINCT user_id) as users
FROM audit_logs
WHERE timestamp > NOW() - INTERVAL '30 days'
GROUP BY persona
ORDER BY calls DESC;

Slow queries (> 5 seconds)

SELECT timestamp, user_id, tool_name, duration_ms,
       parameters->>'sql' as query
FROM audit_logs
WHERE duration_ms > 5000
  AND toolkit_kind = 'trino'
ORDER BY duration_ms DESC
LIMIT 10;

Enrichment token savings

SELECT enrichment_mode,
       COUNT(*) AS calls,
       SUM(enrichment_tokens_full) AS tokens_full,
       SUM(enrichment_tokens_dedup) AS tokens_dedup,
       SUM(enrichment_tokens_full) - SUM(enrichment_tokens_dedup) AS tokens_saved
FROM audit_logs
WHERE enrichment_applied = true
  AND timestamp > NOW() - INTERVAL '7 days'
GROUP BY enrichment_mode
ORDER BY calls DESC;

Discovery-before-query patterns

WITH session_tools AS (
    SELECT session_id, toolkit_kind,
           MIN(timestamp) AS first_call
    FROM audit_logs
    WHERE timestamp > NOW() - INTERVAL '7 days'
    GROUP BY session_id, toolkit_kind
),
session_patterns AS (
    SELECT session_id,
           BOOL_OR(toolkit_kind = 'datahub') AS has_discovery,
           BOOL_OR(toolkit_kind = 'trino') AS has_query,
           MIN(CASE WHEN toolkit_kind = 'datahub' THEN first_call END) AS first_discovery,
           MIN(CASE WHEN toolkit_kind = 'trino' THEN first_call END) AS first_query
    FROM session_tools
    GROUP BY session_id
)
SELECT
    COUNT(*) AS total_sessions,
    COUNT(*) FILTER (WHERE has_discovery AND has_query AND first_discovery < first_query) AS discovery_first,
    COUNT(*) FILTER (WHERE has_query AND NOT has_discovery) AS query_without_discovery
FROM session_patterns;

Retention, partition rotation, and cleanup

A background maintenance routine runs every 24 hours and performs three steps in order:

  1. Ensure upcoming partitions. Create named monthly partitions for the next two months (audit_logs_YYYY_MM). The current month is intentionally skipped on brownfield deployments so existing rows in audit_logs_default do not conflict with a new named partition over the same date range. This step is idempotent (CREATE TABLE IF NOT EXISTS).
  2. Delete expired rows. Remove rows where timestamp < NOW() - INTERVAL '<retention_days> days'. PostgreSQL prunes the DELETE to only the partitions that overlap the retention window.
  3. Drop fully-expired partitions. Drop any audit_logs_YYYY_MM whose entire date range ends at or before the retention cutoff. DROP TABLE on a partition is effectively constant time and reclaims storage immediately, unlike row-level DELETE.
-- step 1 (illustrative)
CREATE TABLE IF NOT EXISTS audit_logs_2026_06 PARTITION OF audit_logs
    FOR VALUES FROM ('2026-06-01') TO ('2026-07-01');

-- step 2
DELETE FROM audit_logs WHERE timestamp < NOW() - INTERVAL '90 days';

-- step 3 (illustrative, when audit_logs_2025_12 is fully past retention)
DROP TABLE IF EXISTS audit_logs_2025_12;

The maintenance routine starts automatically when the audit store is initialized. The eager pre-tick partition creation also runs once at startup so rows written between startup and the first tick land in a named partition when their month is covered.

Failures in any one step are logged and isolated: a transient failure to create a partition does not skip the retention DELETE on the same tick.

Multi-replica safety

Every replica runs its own 24h ticker, but the maintenance work itself is guarded by a PostgreSQL advisory lock (pg_try_advisory_lock). Each tick:

  1. Each pod opens a dedicated connection and calls pg_try_advisory_lock on a stable lock key.
  2. Exactly one pod acquires the lock; the rest get false back and exit silently to wait for the next tick.
  3. The winning pod runs ensure → DELETE → drop, then pg_advisory_unlocks.

This means the DELETE scan, partition CREATE, and partition DROP each run exactly once per tick across the cluster, regardless of replica count. The advisory lock is session-scoped to the dedicated connection so it cannot leak across pods, and IF NOT EXISTS / IF EXISTS on CREATE/DROP means even a degenerate dual-acquire (which the lock prevents) would still be idempotent.

For deployments running at high volume (the canonical motivating case is Apache NiFi calling the gateway REST shim at order-of-magnitude-per-second), monthly partition rotation keeps the working DELETE bounded to recent partitions and lets old data be bulk-dropped as whole partitions rather than scanned row-by-row.

How It Works

Audit logging is implemented as MCP protocol-level middleware. The middleware chain execution order is:

Auth/Authz -> Audit -> Rules -> Enrichment -> Tool Handler
  1. MCPToolCallMiddleware (outermost) authenticates the user, resolves the persona, looks up toolkit metadata, and stores everything in a PlatformContext on the request context.
  2. MCPAuditMiddleware (inner to auth) receives the context with PlatformContext already set. It records the start time, calls the next handler, measures duration, then reads all fields from PlatformContext to build the audit event.
  3. The audit event is written asynchronously in a goroutine to avoid blocking the tool response. If the database write fails, the error is logged via slog.Error but the tool call still succeeds.

Unauthorized requests are rejected by MCPToolCallMiddleware before reaching the audit middleware, so they are not logged. Only authenticated, authorized tool calls appear in audit logs.

Troubleshooting

No audit logs appearing

  1. Verify both audit.enabled: true AND audit.log_tool_calls: true are set in your config.
  2. Verify database.dsn is configured and the database is reachable.
  3. Check platform startup logs for "audit logging enabled". If you see "using noop logger", the database connection failed.
  4. Check for slog.Error messages containing "failed to log audit event", which indicate database write failures.

Audit logs have empty fields

If user_id or other identity fields are empty, check your authentication configuration. API key authentication populates user_id with the key name. OIDC populates it from the sub claim.

If toolkit_kind, toolkit_name, or connection are empty, the tool was not found in any registered toolkit's tool list. This can happen with platform-level tools like platform_info that are not part of a toolkit.