Audit Logging¶
Every tool call flows through the audit logging middleware, which records who called what, when, how long it took, and whether it succeeded. Audit logs are stored in PostgreSQL and automatically cleaned up based on a configurable retention period.
Prerequisites¶
Audit logging requires:
- A PostgreSQL database (version 13+)
- The
database.dsnconfiguration set - Both
audit.enabledandaudit.log_tool_callsset totrue
Database migrations run automatically on startup and create the audit_logs table.
Configuration¶
| Option | Type | Default | Description |
|---|---|---|---|
database.dsn |
string | - | PostgreSQL connection string. Required for audit logging. |
audit.enabled |
bool | false |
Master switch for audit logging. |
audit.log_tool_calls |
bool | false |
Log every tools/call request. Both this and enabled must be true. |
audit.retention_days |
int | 90 |
Days to keep audit logs before automatic cleanup. |
If audit.enabled is true but no database is configured, the platform logs a warning and falls back to a no-op logger.
What Gets Logged¶
Every successful or failed tool call produces one row in audit_logs:
{
"id": "3sK2DlE7x9mPqR4vNw8bYA",
"timestamp": "2026-02-04T10:30:00.123Z",
"duration_ms": 152,
"request_id": "req-a1b2c3d4e5f67890abcdef1234567890",
"user_id": "[email protected]",
"user_email": "[email protected]",
"persona": "analyst",
"tool_name": "trino_query",
"toolkit_kind": "trino",
"toolkit_name": "production",
"connection": "prod-trino",
"parameters": {"sql": "SELECT count(*) FROM orders"},
"success": true,
"error_message": "",
"event_kind": "mcp_tool_call",
"created_date": "2026-02-04"
}
Field Reference¶
| Field | Type | Description |
|---|---|---|
id |
VARCHAR(32) | Cryptographically random event ID (base64url-encoded, 16 bytes). |
timestamp |
TIMESTAMPTZ | When the tool call started. |
duration_ms |
INTEGER | Wall-clock time from request start to handler completion, in milliseconds. Includes rule enforcement and semantic enrichment time. |
request_id |
VARCHAR(255) | Unique request ID generated by the auth middleware (hex-encoded, 16 bytes). |
user_id |
VARCHAR(255) | Authenticated user identity. From OIDC sub claim, API key name, or OAuth token subject. |
user_email |
VARCHAR(255) | User email from OIDC claims, if available. |
persona |
VARCHAR(100) | Resolved persona name (e.g., analyst, admin). Set by the authorizer based on the user's roles. |
tool_name |
VARCHAR(255) | MCP tool name (e.g., trino_query, datahub_search, s3_list_buckets). For gateway-proxied tools, <connection>__<remote_tool> (e.g., vendor__list_contacts). |
toolkit_kind |
VARCHAR(100) | Toolkit type: trino, datahub, s3, or mcp (proxied through the gateway toolkit). |
toolkit_name |
VARCHAR(100) | Toolkit instance name from configuration (e.g., production, staging). For gateway proxied tools, the gateway toolkit's own name (typically primary). |
connection |
VARCHAR(100) | Connection name. For native toolkits, the connection used to route the call (e.g., prod-trino). For gateway proxied tools, the upstream MCP connection name (e.g., vendor) — populated via the registry's ConnectionResolver interface so per-upstream auditing is accurate without relying on caller-supplied args. |
parameters |
JSONB | Tool call arguments with sensitive values redacted. See Parameter Sanitization. |
success |
BOOLEAN | true if the tool handler returned without error and IsError was not set. |
error_message |
TEXT | Error description if success is false. |
session_id |
VARCHAR(255) | MCP session ID. Links tool calls within the same session for pattern analysis. |
response_chars |
INTEGER | Character count of the tool response. |
content_blocks |
INTEGER | Number of content blocks in the tool response. |
request_chars |
INTEGER | Character count of the tool request parameters. |
transport |
VARCHAR(50) | Transport type: stdio or http. |
source |
VARCHAR(50) | Caller class. mcp = agent over a real MCP transport. rest = external automation through the gateway REST shim (NiFi, cronjobs, integrations). admin = portal-driven tool execution via the admin REST API. See Caller class via source. |
enrichment_applied |
BOOLEAN | Whether semantic enrichment was applied to this tool call's response. |
authorized |
BOOLEAN | Whether the tool call was authorized by the persona system. |
enrichment_tokens_full |
INTEGER | Estimated tokens for the full (non-dedup) enrichment content. Uses chars / 4 approximation. |
enrichment_tokens_dedup |
INTEGER | Estimated tokens for the dedup enrichment content. 0 when full enrichment was sent. |
enrichment_mode |
VARCHAR(20) | Enrichment mode used: full, summary, reference, none, or empty (not enriched). |
event_kind |
VARCHAR(64) | High-level event category: apigateway_invoke for HTTP API calls through the apigateway toolkit, mcp_tool_call for every other toolkit. Lets the Activity view split gateway traffic from MCP tool calls. See Event kind. |
created_date |
DATE | Partition key derived from timestamp. Used for retention cleanup. |
Caller class via source¶
Tools on this platform are reachable through three entry points, all of which fire the same MCP audit middleware. The source field on each audit row records which path was used so operators can separate the populations without having to know which user IDs belong to which class of caller.
source |
What it means | Typical caller |
|---|---|---|
mcp |
Real MCP transport (stdio or HTTP/SSE) | Claude, other interactive MCP agents |
rest |
Gateway REST shim at POST /api/v1/gateway/{connection}/invoke |
Apache NiFi, cronjobs, integrations, anything HTTP that wraps the platform |
admin |
Admin REST API tool execution at POST /api/v1/admin/tools/call |
Portal UI "test this tool" buttons, ops scripts |
Both the gateway REST shim (pkg/gatewayhttp/handler.go) and the admin tool runner (pkg/admin/tools.go) open an in-memory MCP session against the assembled server and call the same api_invoke_endpoint (or other) tool that an agent would call. The handlers tag the context with middleware.WithSource before opening that session so the audit middleware records the originating caller class, not just "mcp".
Filter by source in the admin API:
GET /api/v1/admin/audit/events?source=mcp # agents only
GET /api/v1/admin/audit/events?source=rest # NiFi-class only
GET /api/v1/admin/audit/events?source=admin # portal-driven only
Or in the portal UI, use the All Sources dropdown on the Audit Log page. The dropdown lists every source value seen in the current time window.
Event kind: MCP vs API gateway¶
The event_kind field separates two classes of audited activity that otherwise share the same row shape:
event_kind |
What it means |
|---|---|
mcp_tool_call |
A tool routed through one of the MCP toolkits (trino, datahub, s3, or the MCP gateway). |
apigateway_invoke |
An HTTP API call proxied through the apigateway toolkit (api_invoke_endpoint, api_export, and the other api_* tools). |
The kind is derived at write time from the toolkit kind, so it does not depend on tool-name string matching. A high-traffic API gateway can produce many rows per agent turn; the split lets the MCP Activity view exclude that traffic by default while a dedicated gateway view includes it.
Filter by event kind in the admin API. The filter is accepted on the event list, stats, and every metrics endpoint (timeseries, breakdown, overview, performance, enrichment, discovery):
GET /api/v1/admin/audit/events?event_kind=mcp_tool_call # MCP tool calls only
GET /api/v1/admin/audit/events?event_kind=apigateway_invoke # API gateway calls only
GET /api/v1/admin/audit/metrics/timeseries?event_kind=mcp_tool_call
Parameter Sanitization¶
Tool call arguments are logged for debugging and compliance, but sensitive values are automatically redacted before storage. The following parameter keys are replaced with [REDACTED]:
passwordsecrettokenapi_keyauthorizationcredentials
Matching is case-sensitive and exact. A parameter named user_password would not be redacted (only password is matched).
Database Schema¶
The audit_logs table is partitioned by created_date for efficient retention management:
CREATE TABLE audit_logs (
id VARCHAR(32) NOT NULL,
timestamp TIMESTAMPTZ NOT NULL DEFAULT NOW(),
duration_ms INTEGER,
request_id VARCHAR(255),
user_id VARCHAR(255),
user_email VARCHAR(255),
persona VARCHAR(100),
tool_name VARCHAR(255) NOT NULL,
toolkit_kind VARCHAR(100),
toolkit_name VARCHAR(100),
connection VARCHAR(100),
parameters JSONB,
success BOOLEAN NOT NULL,
error_message TEXT,
created_date DATE NOT NULL DEFAULT CURRENT_DATE,
PRIMARY KEY (id, created_date)
) PARTITION BY RANGE (created_date);
Indexes¶
| Index | Column | Use Case |
|---|---|---|
idx_audit_logs_timestamp |
timestamp |
Time-range queries |
idx_audit_logs_user_id |
user_id |
Per-user audit trails |
idx_audit_logs_tool_name |
tool_name |
Tool usage analytics |
idx_audit_logs_success |
success |
Failure investigation |
idx_audit_logs_created_date |
created_date |
Partition pruning, retention cleanup |
Querying Audit Logs¶
Recent tool calls by user¶
SELECT timestamp, tool_name, toolkit_kind, success, duration_ms
FROM audit_logs
WHERE user_id = '[email protected]'
ORDER BY timestamp DESC
LIMIT 20;
Failed calls in the last 24 hours¶
SELECT timestamp, user_id, tool_name, error_message, duration_ms
FROM audit_logs
WHERE success = false
AND timestamp > NOW() - INTERVAL '24 hours'
ORDER BY timestamp DESC;
Tool usage by toolkit¶
SELECT toolkit_kind, tool_name, COUNT(*) as calls,
AVG(duration_ms) as avg_ms, SUM(CASE WHEN success THEN 0 ELSE 1 END) as failures
FROM audit_logs
WHERE timestamp > NOW() - INTERVAL '7 days'
GROUP BY toolkit_kind, tool_name
ORDER BY calls DESC;
Persona activity¶
SELECT persona, COUNT(*) as calls, COUNT(DISTINCT user_id) as users
FROM audit_logs
WHERE timestamp > NOW() - INTERVAL '30 days'
GROUP BY persona
ORDER BY calls DESC;
Slow queries (> 5 seconds)¶
SELECT timestamp, user_id, tool_name, duration_ms,
parameters->>'sql' as query
FROM audit_logs
WHERE duration_ms > 5000
AND toolkit_kind = 'trino'
ORDER BY duration_ms DESC
LIMIT 10;
Enrichment token savings¶
SELECT enrichment_mode,
COUNT(*) AS calls,
SUM(enrichment_tokens_full) AS tokens_full,
SUM(enrichment_tokens_dedup) AS tokens_dedup,
SUM(enrichment_tokens_full) - SUM(enrichment_tokens_dedup) AS tokens_saved
FROM audit_logs
WHERE enrichment_applied = true
AND timestamp > NOW() - INTERVAL '7 days'
GROUP BY enrichment_mode
ORDER BY calls DESC;
Discovery-before-query patterns¶
WITH session_tools AS (
SELECT session_id, toolkit_kind,
MIN(timestamp) AS first_call
FROM audit_logs
WHERE timestamp > NOW() - INTERVAL '7 days'
GROUP BY session_id, toolkit_kind
),
session_patterns AS (
SELECT session_id,
BOOL_OR(toolkit_kind = 'datahub') AS has_discovery,
BOOL_OR(toolkit_kind = 'trino') AS has_query,
MIN(CASE WHEN toolkit_kind = 'datahub' THEN first_call END) AS first_discovery,
MIN(CASE WHEN toolkit_kind = 'trino' THEN first_call END) AS first_query
FROM session_tools
GROUP BY session_id
)
SELECT
COUNT(*) AS total_sessions,
COUNT(*) FILTER (WHERE has_discovery AND has_query AND first_discovery < first_query) AS discovery_first,
COUNT(*) FILTER (WHERE has_query AND NOT has_discovery) AS query_without_discovery
FROM session_patterns;
Retention, partition rotation, and cleanup¶
A background maintenance routine runs every 24 hours and performs three steps in order:
- Ensure upcoming partitions. Create named monthly partitions for the next two months (
audit_logs_YYYY_MM). The current month is intentionally skipped on brownfield deployments so existing rows inaudit_logs_defaultdo not conflict with a new named partition over the same date range. This step is idempotent (CREATE TABLE IF NOT EXISTS). - Delete expired rows. Remove rows where
timestamp < NOW() - INTERVAL '<retention_days> days'. PostgreSQL prunes the DELETE to only the partitions that overlap the retention window. - Drop fully-expired partitions. Drop any
audit_logs_YYYY_MMwhose entire date range ends at or before the retention cutoff.DROP TABLEon a partition is effectively constant time and reclaims storage immediately, unlike row-level DELETE.
-- step 1 (illustrative)
CREATE TABLE IF NOT EXISTS audit_logs_2026_06 PARTITION OF audit_logs
FOR VALUES FROM ('2026-06-01') TO ('2026-07-01');
-- step 2
DELETE FROM audit_logs WHERE timestamp < NOW() - INTERVAL '90 days';
-- step 3 (illustrative, when audit_logs_2025_12 is fully past retention)
DROP TABLE IF EXISTS audit_logs_2025_12;
The maintenance routine starts automatically when the audit store is initialized. The eager pre-tick partition creation also runs once at startup so rows written between startup and the first tick land in a named partition when their month is covered.
Failures in any one step are logged and isolated: a transient failure to create a partition does not skip the retention DELETE on the same tick.
Multi-replica safety¶
Every replica runs its own 24h ticker, but the maintenance work itself is guarded by a PostgreSQL advisory lock (pg_try_advisory_lock). Each tick:
- Each pod opens a dedicated connection and calls
pg_try_advisory_lockon a stable lock key. - Exactly one pod acquires the lock; the rest get
falseback and exit silently to wait for the next tick. - The winning pod runs ensure → DELETE → drop, then
pg_advisory_unlocks.
This means the DELETE scan, partition CREATE, and partition DROP each run exactly once per tick across the cluster, regardless of replica count. The advisory lock is session-scoped to the dedicated connection so it cannot leak across pods, and IF NOT EXISTS / IF EXISTS on CREATE/DROP means even a degenerate dual-acquire (which the lock prevents) would still be idempotent.
For deployments running at high volume (the canonical motivating case is Apache NiFi calling the gateway REST shim at order-of-magnitude-per-second), monthly partition rotation keeps the working DELETE bounded to recent partitions and lets old data be bulk-dropped as whole partitions rather than scanned row-by-row.
How It Works¶
Audit logging is implemented as MCP protocol-level middleware. The middleware chain execution order is:
- MCPToolCallMiddleware (outermost) authenticates the user, resolves the persona, looks up toolkit metadata, and stores everything in a
PlatformContexton the request context. - MCPAuditMiddleware (inner to auth) receives the context with
PlatformContextalready set. It records the start time, calls the next handler, measures duration, then reads all fields fromPlatformContextto build the audit event. - The audit event is written asynchronously in a goroutine to avoid blocking the tool response. If the database write fails, the error is logged via
slog.Errorbut the tool call still succeeds.
Unauthorized requests are rejected by MCPToolCallMiddleware before reaching the audit middleware, so they are not logged. Only authenticated, authorized tool calls appear in audit logs.
Troubleshooting¶
No audit logs appearing¶
- Verify both
audit.enabled: trueANDaudit.log_tool_calls: trueare set in your config. - Verify
database.dsnis configured and the database is reachable. - Check platform startup logs for
"audit logging enabled". If you see"using noop logger", the database connection failed. - Check for
slog.Errormessages containing"failed to log audit event", which indicate database write failures.
Audit logs have empty fields¶
If user_id or other identity fields are empty, check your authentication configuration. API key authentication populates user_id with the key name. OIDC populates it from the sub claim.
If toolkit_kind, toolkit_name, or connection are empty, the tool was not found in any registered toolkit's tool list. This can happen with platform-level tools like platform_info that are not part of a toolkit.