Audit Logging¶

Every tool call flows through the audit logging middleware, which records who called what, when, how long it took, and whether it succeeded. Audit logs are stored in PostgreSQL and automatically cleaned up based on a configurable retention period.

Prerequisites¶

Audit logging requires:

A PostgreSQL database (version 13+)
The database.dsn configuration set
Both audit.enabled and audit.log_tool_calls set to true

Database migrations run automatically on startup and create the audit_logs table.

Configuration¶

database:
  dsn: "${DATABASE_DSN}"

audit:
  enabled: true
  log_tool_calls: true
  retention_days: 90

Option	Type	Default	Description
`database.dsn`	string	-	PostgreSQL connection string. Required for audit logging.
`audit.enabled`	bool	`false`	Master switch for audit logging.
`audit.log_tool_calls`	bool	`false`	Log every `tools/call` request. Both this and `enabled` must be `true`.
`audit.retention_days`	int	`90`	Days to keep audit logs before automatic cleanup.

If audit.enabled is true but no database is configured, the platform logs a warning and falls back to a no-op logger.

What Gets Logged¶

Every successful or failed tool call produces one row in audit_logs:

{
  "id": "3sK2DlE7x9mPqR4vNw8bYA",
  "timestamp": "2026-02-04T10:30:00.123Z",
  "duration_ms": 152,
  "request_id": "req-a1b2c3d4e5f67890abcdef1234567890",
  "user_id": "[email protected]",
  "user_email": "[email protected]",
  "persona": "analyst",
  "tool_name": "trino_query",
  "toolkit_kind": "trino",
  "toolkit_name": "production",
  "connection": "prod-trino",
  "parameters": {"sql": "SELECT count(*) FROM orders"},
  "success": true,
  "error_message": "",
  "event_kind": "mcp_tool_call",
  "created_date": "2026-02-04"
}

Field Reference¶

Field	Type	Description
`id`	VARCHAR(32)	Cryptographically random event ID (base64url-encoded, 16 bytes).
`timestamp`	TIMESTAMPTZ	When the tool call started.
`duration_ms`	INTEGER	Wall-clock time from request start to handler completion, in milliseconds. Includes rule enforcement and semantic enrichment time.
`request_id`	VARCHAR(255)	Unique request ID generated by the auth middleware (hex-encoded, 16 bytes).
`user_id`	VARCHAR(255)	Authenticated user identity. From OIDC `sub` claim, API key name, or OAuth token subject.
`user_email`	VARCHAR(255)	User email from OIDC claims, if available.
`persona`	VARCHAR(100)	Resolved persona name (e.g., `analyst`, `admin`). Set by the authorizer based on the user's roles.
`tool_name`	VARCHAR(255)	MCP tool name (e.g., `trino_query`, `datahub_search`, `s3_list_buckets`). For gateway-proxied tools, `<connection>__<remote_tool>` (e.g., `vendor__list_contacts`).
`toolkit_kind`	VARCHAR(100)	Toolkit type: `trino`, `datahub`, `s3`, or `mcp` (proxied through the gateway toolkit).
`toolkit_name`	VARCHAR(100)	Toolkit instance name from configuration (e.g., `production`, `staging`). For gateway proxied tools, the gateway toolkit's own name (typically `primary`).
`connection`	VARCHAR(100)	Connection name. For native toolkits, the connection used to route the call (e.g., `prod-trino`). For gateway proxied tools, the upstream MCP connection name (e.g., `vendor`) — populated via the registry's `ConnectionResolver` interface so per-upstream auditing is accurate without relying on caller-supplied args.
`parameters`	JSONB	Tool call arguments with sensitive values redacted. See Parameter Sanitization.
`success`	BOOLEAN	`true` if the tool handler returned without error and `IsError` was not set.
`error_message`	TEXT	Error description if `success` is `false`.
`session_id`	VARCHAR(255)	MCP session ID. Links tool calls within the same session for pattern analysis.
`response_chars`	INTEGER	Character count of the tool response.
`content_blocks`	INTEGER	Number of content blocks in the tool response.
`request_chars`	INTEGER	Character count of the tool request parameters.
`transport`	VARCHAR(50)	Transport type: `stdio` or `http`.
`source`	VARCHAR(50)	Caller class. `mcp` = agent over a real MCP transport. `rest` = external automation through the gateway REST shim (NiFi, cronjobs, integrations). `admin` = portal-driven tool execution via the admin REST API. See Caller class via source.
`enrichment_applied`	BOOLEAN	Whether semantic enrichment was applied to this tool call's response.
`authorized`	BOOLEAN	Whether the tool call was authorized by the persona system.
`enrichment_tokens_full`	INTEGER	Estimated tokens for the full (non-dedup) enrichment content. Uses `chars / 4` approximation.
`enrichment_tokens_dedup`	INTEGER	Estimated tokens for the dedup enrichment content. `0` when full enrichment was sent.
`enrichment_mode`	VARCHAR(20)	Enrichment mode used: `full`, `summary`, `reference`, `none`, or empty (not enriched).
`event_kind`	VARCHAR(64)	High-level event category: `apigateway_invoke` for HTTP API calls through the apigateway toolkit, `mcp_tool_call` for every other toolkit. Lets the Activity view split gateway traffic from MCP tool calls. See Event kind.
`created_date`	DATE	Partition key derived from `timestamp`. Used for retention cleanup.

Caller class via source¶

Tools on this platform are reachable through three entry points, all of which fire the same MCP audit middleware. The source field on each audit row records which path was used so operators can separate the populations without having to know which user IDs belong to which class of caller.

`source`	What it means	Typical caller
`mcp`	Real MCP transport (stdio or HTTP/SSE)	Claude, other interactive MCP agents
`rest`	Gateway REST shim at `POST /api/v1/gateway/{connection}/invoke`	Apache NiFi, cronjobs, integrations, anything HTTP that wraps the platform
`admin`	Admin REST API tool execution at `POST /api/v1/admin/tools/call`	Portal UI "test this tool" buttons, ops scripts

Both the gateway REST shim (pkg/gatewayhttp/handler.go) and the admin tool runner (pkg/admin/tools.go) open an in-memory MCP session against the assembled server and call the same api_invoke_endpoint (or other) tool that an agent would call. The handlers tag the context with middleware.WithSource before opening that session so the audit middleware records the originating caller class, not just "mcp".

Filter by source in the admin API:

GET /api/v1/admin/audit/events?source=mcp     # agents only
GET /api/v1/admin/audit/events?source=rest    # NiFi-class only
GET /api/v1/admin/audit/events?source=admin   # portal-driven only

Or in the portal UI, use the All Sources dropdown on the Audit Log page. The dropdown lists every source value seen in the current time window.

Event kind: MCP vs API gateway¶

The event_kind field separates two classes of audited activity that otherwise share the same row shape:

`event_kind`	What it means
`mcp_tool_call`	A tool routed through one of the MCP toolkits (`trino`, `datahub`, `s3`, or the MCP gateway).
`apigateway_invoke`	An HTTP API call proxied through the apigateway toolkit (`api_invoke_endpoint`, `api_export`, and the other `api_*` tools).

The kind is derived at write time from the toolkit kind, so it does not depend on tool-name string matching. A high-traffic API gateway can produce many rows per agent turn; the split lets the MCP Activity view exclude that traffic by default while a dedicated gateway view includes it.

Filter by event kind in the admin API. The filter is accepted on the event list, stats, and every metrics endpoint (timeseries, breakdown, overview, performance, enrichment, discovery):

GET /api/v1/admin/audit/events?event_kind=mcp_tool_call        # MCP tool calls only
GET /api/v1/admin/audit/events?event_kind=apigateway_invoke    # API gateway calls only
GET /api/v1/admin/audit/metrics/timeseries?event_kind=mcp_tool_call

Parameter Sanitization¶

Tool call arguments are logged for debugging and compliance, but sensitive values are automatically redacted before storage. The following parameter keys are replaced with [REDACTED]:

password
secret
token
api_key
authorization
credentials

Matching is case-sensitive and exact. A parameter named user_password would not be redacted (only password is matched).

Database Schema¶

The audit_logs table is partitioned by created_date for efficient retention management:

CREATE TABLE audit_logs (
    id              VARCHAR(32) NOT NULL,
    timestamp       TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    duration_ms     INTEGER,
    request_id      VARCHAR(255),
    user_id         VARCHAR(255),
    user_email      VARCHAR(255),
    persona         VARCHAR(100),
    tool_name       VARCHAR(255) NOT NULL,
    toolkit_kind    VARCHAR(100),
    toolkit_name    VARCHAR(100),
    connection      VARCHAR(100),
    parameters      JSONB,
    success         BOOLEAN NOT NULL,
    error_message   TEXT,
    created_date    DATE NOT NULL DEFAULT CURRENT_DATE,
    PRIMARY KEY (id, created_date)
) PARTITION BY RANGE (created_date);

Indexes¶

Index	Column	Use Case
`idx_audit_logs_timestamp`	`timestamp`	Time-range queries
`idx_audit_logs_user_id`	`user_id`	Per-user audit trails
`idx_audit_logs_tool_name`	`tool_name`	Tool usage analytics
`idx_audit_logs_success`	`success`	Failure investigation
`idx_audit_logs_created_date`	`created_date`	Partition pruning, retention cleanup

Querying Audit Logs¶

Recent tool calls by user¶

SELECT timestamp, tool_name, toolkit_kind, success, duration_ms
FROM audit_logs
WHERE user_id = '[email protected]'
ORDER BY timestamp DESC
LIMIT 20;

Failed calls in the last 24 hours¶

SELECT timestamp, user_id, tool_name, error_message, duration_ms
FROM audit_logs
WHERE success = false
  AND timestamp > NOW() - INTERVAL '24 hours'
ORDER BY timestamp DESC;

Tool usage by toolkit¶

SELECT toolkit_kind, tool_name, COUNT(*) as calls,
       AVG(duration_ms) as avg_ms, SUM(CASE WHEN success THEN 0 ELSE 1 END) as failures
FROM audit_logs
WHERE timestamp > NOW() - INTERVAL '7 days'
GROUP BY toolkit_kind, tool_name
ORDER BY calls DESC;

Persona activity¶

SELECT persona, COUNT(*) as calls, COUNT(DISTINCT user_id) as users
FROM audit_logs
WHERE timestamp > NOW() - INTERVAL '30 days'
GROUP BY persona
ORDER BY calls DESC;

Slow queries (> 5 seconds)¶

SELECT timestamp, user_id, tool_name, duration_ms,
       parameters->>'sql' as query
FROM audit_logs
WHERE duration_ms > 5000
  AND toolkit_kind = 'trino'
ORDER BY duration_ms DESC
LIMIT 10;

Enrichment token savings¶

SELECT enrichment_mode,
       COUNT(*) AS calls,
       SUM(enrichment_tokens_full) AS tokens_full,
       SUM(enrichment_tokens_dedup) AS tokens_dedup,
       SUM(enrichment_tokens_full) - SUM(enrichment_tokens_dedup) AS tokens_saved
FROM audit_logs
WHERE enrichment_applied = true
  AND timestamp > NOW() - INTERVAL '7 days'
GROUP BY enrichment_mode
ORDER BY calls DESC;

Discovery-before-query patterns¶

WITH session_tools AS (
    SELECT session_id, toolkit_kind,
           MIN(timestamp) AS first_call
    FROM audit_logs
    WHERE timestamp > NOW() - INTERVAL '7 days'
    GROUP BY session_id, toolkit_kind
),
session_patterns AS (
    SELECT session_id,
           BOOL_OR(toolkit_kind = 'datahub') AS has_discovery,
           BOOL_OR(toolkit_kind = 'trino') AS has_query,
           MIN(CASE WHEN toolkit_kind = 'datahub' THEN first_call END) AS first_discovery,
           MIN(CASE WHEN toolkit_kind = 'trino' THEN first_call END) AS first_query
    FROM session_tools
    GROUP BY session_id
)
SELECT
    COUNT(*) AS total_sessions,
    COUNT(*) FILTER (WHERE has_discovery AND has_query AND first_discovery < first_query) AS discovery_first,
    COUNT(*) FILTER (WHERE has_query AND NOT has_discovery) AS query_without_discovery
FROM session_patterns;

Retention, partition rotation, and cleanup¶

A background maintenance routine runs every 24 hours and performs three steps in order:

Ensure upcoming partitions. Create named monthly partitions for the next two months (audit_logs_YYYY_MM). The current month is intentionally skipped on brownfield deployments so existing rows in audit_logs_default do not conflict with a new named partition over the same date range. This step is idempotent (CREATE TABLE IF NOT EXISTS).
Delete expired rows. Remove rows where timestamp < NOW() - INTERVAL '<retention_days> days'. PostgreSQL prunes the DELETE to only the partitions that overlap the retention window.
Drop fully-expired partitions. Drop any audit_logs_YYYY_MM whose entire date range ends at or before the retention cutoff. DROP TABLE on a partition is effectively constant time and reclaims storage immediately, unlike row-level DELETE.

-- step 1 (illustrative)
CREATE TABLE IF NOT EXISTS audit_logs_2026_06 PARTITION OF audit_logs
    FOR VALUES FROM ('2026-06-01') TO ('2026-07-01');

-- step 2
DELETE FROM audit_logs WHERE timestamp < NOW() - INTERVAL '90 days';

-- step 3 (illustrative, when audit_logs_2025_12 is fully past retention)
DROP TABLE IF EXISTS audit_logs_2025_12;

The maintenance routine starts automatically when the audit store is initialized. The eager pre-tick partition creation also runs once at startup so rows written between startup and the first tick land in a named partition when their month is covered.

Failures in any one step are logged and isolated: a transient failure to create a partition does not skip the retention DELETE on the same tick.

Multi-replica safety¶

Every replica runs its own 24h ticker, but the maintenance work itself is guarded by a PostgreSQL advisory lock (pg_try_advisory_lock). Each tick:

Each pod opens a dedicated connection and calls pg_try_advisory_lock on a stable lock key.
Exactly one pod acquires the lock; the rest get false back and exit silently to wait for the next tick.
The winning pod runs ensure → DELETE → drop, then pg_advisory_unlocks.

This means the DELETE scan, partition CREATE, and partition DROP each run exactly once per tick across the cluster, regardless of replica count. The advisory lock is session-scoped to the dedicated connection so it cannot leak across pods, and IF NOT EXISTS / IF EXISTS on CREATE/DROP means even a degenerate dual-acquire (which the lock prevents) would still be idempotent.

For deployments running at high volume (the canonical motivating case is Apache NiFi calling the gateway REST shim at order-of-magnitude-per-second), monthly partition rotation keeps the working DELETE bounded to recent partitions and lets old data be bulk-dropped as whole partitions rather than scanned row-by-row.

How It Works¶

Audit logging is implemented as MCP protocol-level middleware. The middleware chain execution order is:

Auth/Authz -> Audit -> Rules -> Enrichment -> Tool Handler

MCPToolCallMiddleware (outermost) authenticates the user, resolves the persona, looks up toolkit metadata, and stores everything in a PlatformContext on the request context.
MCPAuditMiddleware (inner to auth) receives the context with PlatformContext already set. It records the start time, calls the next handler, measures duration, then reads all fields from PlatformContext to build the audit event.
The audit event is written asynchronously in a goroutine to avoid blocking the tool response. If the database write fails, the error is logged via slog.Error but the tool call still succeeds.

Unauthorized requests are rejected by MCPToolCallMiddleware before reaching the audit middleware, so they are not logged. Only authenticated, authorized tool calls appear in audit logs.

Troubleshooting¶

No audit logs appearing¶

Verify both audit.enabled: true AND audit.log_tool_calls: true are set in your config.
Verify database.dsn is configured and the database is reachable.
Check platform startup logs for "audit logging enabled". If you see "using noop logger", the database connection failed.
Check for slog.Error messages containing "failed to log audit event", which indicate database write failures.

Audit logs have empty fields¶

If user_id or other identity fields are empty, check your authentication configuration. API key authentication populates user_id with the key name. OIDC populates it from the sub claim.

If toolkit_kind, toolkit_name, or connection are empty, the tool was not found in any registered toolkit's tool list. This can happen with platform-level tools like platform_info that are not part of a toolkit.