Tools API Reference

Complete specification for all MCP tools provided by mcp-data-platform.

Error contract

Every failed tool call returns a uniform, self-describing error so an agent can tell a correctable mistake from a platform problem and act on it. A failure sets isError: true and carries both a human-readable text message and a machine-readable structuredContent.error object:

{
  "isError": true,
  "content": [
    { "type": "text", "text": "the \"asset_id\" parameter is required (code: missing_required_parameter) Hint: Supply \"asset_id\" and retry. This is a problem with the call's arguments, not a platform fault." }
  ],
  "structuredContent": {
    "error": {
      "code": "missing_required_parameter",
      "category": "client_input",
      "message": "the \"asset_id\" parameter is required",
      "hint": "Supply \"asset_id\" and retry. This is a problem with the call's arguments, not a platform fault."
    }
  }
}

Field	Meaning
`code`	Stable, machine-readable identifier the agent may branch on (for example `missing_required_parameter`, `not_found`, `unauthorized`, `setup_required`, `internal_error`).
`category`	Broad class (see below) telling the agent whose fault the failure is.
`message`	The specific failure.
`hint`	The corrective action, when the caller can take one.

Categories

Category	Whose fault	What to do
`client_input`	The call	Fix the arguments and retry.
`not_found`	The call	The named resource does not exist; correct the reference.
`authentication_failed`	The caller's identity	Provide valid credentials.
`authorization_denied`	The caller's identity	The persona is not permitted; request access.
`user_declined`	The user	A consent prompt was declined.
`setup_required`	Session state	Call the required setup tool first.
`feature_unavailable`	Deployment config	The feature is not enabled on this deployment; do not present it as an outage.
`internal`	The platform	Not the caller's fault; do not retry with modified input.
`tool_error`	Unclassified	A tool failure that has not been given a finer category; the message is still descriptive.

The contract is uniform by construction: a normalization layer guarantees every error result carries this envelope even when an individual tool returns only a bare message, so an agent never receives an opaque, undifferentiated string. The category is also recorded on the audit log (error_category) for operators.
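
As a minimal sketch of how an agent might branch on this envelope, the Python below maps each documented category to a next step. The step labels (`fix_and_retry`, `report`, and so on) and the function name are hypothetical illustrations, not values the platform defines.

```python
from typing import Any

# Categories come from the table above. The step labels on the right are
# hypothetical names chosen for this sketch, not platform values.
NEXT_STEP = {
    "client_input": "fix_and_retry",
    "not_found": "fix_and_retry",
    "authentication_failed": "reauthenticate",
    "authorization_denied": "request_access",
    "user_declined": "stop",
    "setup_required": "run_setup",
    "feature_unavailable": "stop",
    "internal": "report",
    "tool_error": "report",
}


def next_step(result: dict[str, Any]) -> str:
    """Return an illustrative next step for a tool call result."""
    if not result.get("isError"):
        return "ok"
    error = (result.get("structuredContent") or {}).get("error") or {}
    # Unknown or missing categories fall back to the most conservative step.
    return NEXT_STEP.get(error.get("category"), "report")
```

Branching on `category` rather than parsing the text message keeps the caller independent of message wording; `code` can be used the same way when finer distinctions are needed.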

Trino Tools

trino_query

Execute a read-only SQL query against the Trino cluster. Write operations (INSERT, UPDATE, DELETE, CREATE, DROP, etc.) are rejected before reaching Trino. The tool is annotated with ReadOnlyHint: true so MCP clients can auto-approve it.

Parameters:

Parameter	Type	Required	Default	Description
`query`	string	Yes	-	SQL query to execute (read-only)
`limit`	integer	No	1000	Maximum rows to return (capped by max_limit config)
`connection`	string	No	first configured	Trino connection name

Response Schema:

{
  "columns": [
    {"name": "column_name", "type": "varchar"}
  ],
  "rows": [
    ["value1", "value2"]
  ],
  "row_count": 100,
  "execution_time_ms": 250,
  "query_id": "20240115_123456_00001_xxxxx"
}
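
The response pairs a `columns` list with positional `rows`. A small sketch of turning that into one dictionary per row follows; it assumes each row lists its values in the same order as `columns`, which the schema suggests but does not state outright. The function name is hypothetical.

```python
def rows_to_dicts(response: dict) -> list[dict]:
    """Convert a trino_query response into a list of {column_name: value} dicts.

    Assumption: values in each row are ordered like the "columns" array.
    """
    names = [column["name"] for column in response["columns"]]
    return [dict(zip(names, row)) for row in response["rows"]]
```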

Enrichment (when enabled):

{
  "semantic_context": {
    "description": "Table description from DataHub",
    "owners": [{"name": "Team Name", "type": "group"}],
    "tags": ["tag1", "tag2"],
    "domain": {"name": "Domain Name"},
    "quality_score": 0.95,
    "deprecation": null
  }
}

Error Codes:

Code	Cause
`SYNTAX_ERROR`	Invalid SQL syntax
`TABLE_NOT_FOUND`	Referenced table doesn't exist
`PERMISSION_DENIED`	Insufficient privileges
`TIMEOUT`	Query exceeded timeout
`WRITE_REJECTED`	Write SQL rejected (use `trino_execute` instead)

trino_execute

Execute any SQL against the Trino cluster, including write operations (INSERT, UPDATE, DELETE, CREATE, DROP, ALTER, etc.). Annotated with DestructiveHint: true so MCP clients prompt for confirmation.

When read_only: true is configured at the instance level, write operations are blocked.

Parameters:

Parameter	Type	Required	Default	Description
`query`	string	Yes	-	SQL query to execute
`limit`	integer	No	1000	Maximum rows to return (capped by max_limit config)
`connection`	string	No	first configured	Trino connection name

Response Schema: Same as trino_query.

trino_explain

Get the execution plan for a SQL query.

Parameters:

Parameter	Type	Required	Default	Description
`query`	string	Yes	-	SQL query to explain
`connection`	string	No	first configured	Trino connection name

Response Schema:

{
  "plan": "Query Plan\n- TableScan[table = ...]\n  ...",
  "format": "text"
}

trino_browse

Browse the Trino catalog hierarchy. Omit all parameters to list catalogs. Provide catalog to list schemas. Provide catalog and schema to list tables.

Parameters:

Parameter	Type	Required	Default	Description
`catalog`	string	No	-	Catalog name. Omit to list all catalogs
`schema`	string	No	-	Schema name. Requires `catalog`. Omit to list schemas
`pattern`	string	No	-	LIKE pattern to filter tables (only when listing tables)
`connection`	string	No	first configured	Trino connection name

Response Schema (list catalogs):

{
  "catalogs": ["hive", "iceberg", "memory"]
}

Response Schema (list schemas):

{
  "catalog": "hive",
  "schemas": ["default", "sales", "marketing"]
}

Response Schema (list tables):

{
  "catalog": "hive",
  "schema": "sales",
  "tables": [
    {"name": "orders", "type": "TABLE"},
    {"name": "customers", "type": "TABLE"},
    {"name": "daily_revenue", "type": "VIEW"}
  ]
}
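
The three browse levels depend only on which parameters are present. The sketch below builds the argument object for each level and checks the documented dependencies (`schema` requires `catalog`; `pattern` applies only when listing tables). The helper name is hypothetical, and whether the server itself rejects these combinations is not stated here, so the checks are a client-side assumption.

```python
from typing import Optional


def browse_args(catalog: Optional[str] = None,
                schema: Optional[str] = None,
                pattern: Optional[str] = None) -> dict:
    """Build trino_browse arguments, omitting parameters that are not set."""
    if schema is not None and catalog is None:
        raise ValueError("schema requires catalog")
    if pattern is not None and schema is None:
        raise ValueError("pattern applies only when listing tables")
    args = {"catalog": catalog, "schema": schema, "pattern": pattern}
    return {name: value for name, value in args.items() if value is not None}
```

Calling `browse_args()` yields an empty object (list catalogs), `browse_args("hive")` lists schemas, and `browse_args("hive", "sales", "ord%")` lists matching tables.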

trino_describe_table

Get table schema and metadata.

Parameters:

Parameter	Type	Required	Default	Description
`table`	string	Yes	-	Table name (can be `catalog.schema.table`)
`connection`	string	No	first configured	Trino connection name

Response Schema:

{
  "table": {
    "catalog": "hive",
    "schema": "sales",
    "name": "orders"
  },
  "columns": [
    {
      "name": "order_id",
      "type": "bigint",
      "nullable": false,
      "comment": "Unique order identifier"
    }
  ],
  "partitioning": ["order_date"],
  "properties": {
    "format": "PARQUET"
  }
}

trino_list_connections

List configured Trino connections.

Parameters: None

Response Schema:

{
  "connections": [
    {
      "name": "primary",
      "display_name": "Production",
      "host": "trino.example.com",
      "catalog": "hive",
      "schema": "default"
    }
  ]
}

DataHub Tools

datahub_search

Search for entities in the catalog.

Parameters:

Parameter	Type	Required	Default	Description
`query`	string	Yes	-	Search query
`type`	string	No	-	Entity type: `dataset`, `dashboard`, `chart`, `dataflow`
`platform`	string	No	-	Platform filter: `trino`, `snowflake`, `s3`, etc.
`limit`	integer	No	10	Maximum results (capped by max_limit config)
`connection`	string	No	first configured	DataHub connection name

Response Schema:

{
  "results": [
    {
      "urn": "urn:li:dataset:(urn:li:dataPlatform:trino,hive.sales.orders,PROD)",
      "name": "orders",
      "description": "Customer orders",
      "platform": "trino",
      "type": "dataset",
      "owners": ["Data Team"],
      "tags": ["pii", "financial"]
    }
  ],
  "total": 150,
  "has_more": true
}

Enrichment (when enabled):

{
  "query_context": {
    "urn:li:dataset:...": {
      "queryable": true,
      "connection": "primary",
      "table_identifier": {
        "catalog": "hive",
        "schema": "sales",
        "table": "orders"
      },
      "sample_query": "SELECT * FROM hive.sales.orders LIMIT 10"
    }
  }
}
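
One way to use the `query_context` enrichment is to collect the fully qualified table name for every result that can be queried. This sketch assumes entries that cannot be queried carry `queryable: false` (or omit the flag); that behavior and the function name are assumptions, not documented above.

```python
def queryable_tables(enrichment: dict) -> dict[str, str]:
    """Map each queryable URN to its "catalog.schema.table" name."""
    tables = {}
    for urn, context in enrichment.get("query_context", {}).items():
        # Assumption: a falsy or missing "queryable" flag means "skip this entry".
        if context.get("queryable"):
            ident = context["table_identifier"]
            tables[urn] = f'{ident["catalog"]}.{ident["schema"]}.{ident["table"]}'
    return tables
```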

datahub_get_entity

Get detailed entity information.

Parameters:

Parameter	Type	Required	Default	Description
`urn`	string	Yes	-	Entity URN
`connection`	string	No	first configured	DataHub connection name

Response Schema:

{
  "urn": "urn:li:dataset:...",
  "type": "dataset",
  "name": "orders",
  "description": "Customer orders from e-commerce platform",
  "platform": "trino",
  "created": "2024-01-01T00:00:00Z",
  "modified": "2024-01-15T12:00:00Z",
  "owners": [
    {"name": "Data Team", "type": "group", "email": "[email protected]"}
  ],
  "tags": ["pii", "financial"],
  "glossary_terms": ["Order", "Transaction"],
  "domain": {
    "urn": "urn:li:domain:sales",
    "name": "Sales"
  },
  "deprecation": null,
  "custom_properties": {
    "refresh_schedule": "daily"
  }
}

datahub_get_schema

Get dataset schema.

Parameters:

Parameter	Type	Required	Default	Description
`urn`	string	Yes	-	Dataset URN
`connection`	string	No	first configured	DataHub connection name

Response Schema:

{
  "urn": "urn:li:dataset:...",
  "fields": [
    {
      "name": "order_id",
      "type": "NUMBER",
      "native_type": "bigint",
      "nullable": false,
      "description": "Unique order identifier",
      "tags": ["pii"],
      "glossary_terms": ["Order ID"]
    }
  ],
  "primary_keys": ["order_id"],
  "foreign_keys": []
}

datahub_get_lineage

Get upstream or downstream lineage for an entity. Set level=column for column-level lineage, which shows the upstream columns that feed each downstream column. The default, level=dataset, returns dataset-level relationships; direction and depth apply only at this level.

Parameters:

Parameter	Type	Required	Default	Description
`urn`	string	Yes	-	Entity URN
`level`	string	No	`dataset`	Granularity: `dataset` or `column`
`direction`	string	No	`downstream`	`upstream` or `downstream` (dataset level only)
`depth`	integer	No	3	Maximum traversal depth, max 5 (dataset level only)
`connection`	string	No	first configured	DataHub connection name

Response Schema (dataset level):

{
  "root": "urn:li:dataset:...",
  "direction": "downstream",
  "entities": [
    {
      "urn": "urn:li:dataset:...",
      "name": "daily_orders_agg",
      "type": "dataset",
      "depth": 1
    }
  ],
  "relationships": [
    {
      "source": "urn:li:dataset:orders",
      "target": "urn:li:dataset:daily_orders_agg",
      "type": "TRANSFORMED"
    }
  ]
}

Response Schema (column level):

{
  "root": "urn:li:dataset:...",
  "column_lineage": [
    {
      "downstream": {
        "urn": "urn:li:dataset:daily_orders_agg",
        "column": "total_revenue"
      },
      "upstreams": [
        {
          "urn": "urn:li:dataset:orders",
          "column": "total_amount"
        }
      ]
    }
  ]
}
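
A column-level response can be flattened into a lookup from each downstream column to the upstream columns that feed it. The sketch below does this with plain dictionaries; the function name is hypothetical and the field names are taken from the sample above.

```python
def upstream_columns(response: dict) -> dict[tuple, list[tuple]]:
    """Index column lineage as (urn, column) -> [(upstream_urn, upstream_column), ...]."""
    index: dict[tuple, list[tuple]] = {}
    for entry in response.get("column_lineage", []):
        down = entry["downstream"]
        key = (down["urn"], down["column"])
        # Merge entries in case the same downstream column appears more than once.
        index.setdefault(key, []).extend(
            (up["urn"], up["column"]) for up in entry["upstreams"]
        )
    return index
```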

datahub_get_queries

Get popular queries for a dataset.

Parameters:

Parameter	Type	Required	Default	Description
`urn`	string	Yes	-	Dataset URN
`limit`	integer	No	10	Maximum queries to return
`connection`	string	No	first configured	DataHub connection name

Response Schema:

{
  "urn": "urn:li:dataset:...",
  "queries": [
    {
      "query": "SELECT * FROM orders WHERE status = 'completed'",
      "user": "[email protected]",
      "executed_at": "2024-01-15T10:00:00Z",
      "execution_count": 150
    }
  ]
}

datahub_get_glossary_term

Get glossary term details.

Parameters:

Parameter	Type	Required	Default	Description
`urn`	string	Yes	-	Glossary term URN
`connection`	string	No	first configured	DataHub connection name

Response Schema:

{
  "urn": "urn:li:glossaryTerm:Revenue",
  "name": "Revenue",
  "description": "Total monetary value from sales transactions",
  "parent": "urn:li:glossaryTerm:FinancialMetrics",
  "related_terms": ["Gross Revenue", "Net Revenue"],
  "custom_properties": {
    "calculation": "SUM(line_item_amount)"
  }
}

datahub_browse

Browse the DataHub catalog by category. Set what=tags to list tags, what=domains to list data domains, or what=data_products to list data products.

Parameters:

Parameter	Type	Required	Default	Description
`what`	string	Yes	-	What to browse: `tags`, `domains`, or `data_products`
`filter`	string	No	-	Optional filter string (tags only)
`connection`	string	No	first configured	DataHub connection name

Response Schema (tags):

{
  "tags": [
    {"urn": "urn:li:tag:pii", "name": "pii", "description": "Contains PII"},
    {"urn": "urn:li:tag:financial", "name": "financial", "description": "Financial data"}
  ]
}

Response Schema (domains):

{
  "domains": [
    {
      "urn": "urn:li:domain:sales",
      "name": "Sales",
      "description": "Sales and revenue data",
      "entity_count": 45
    }
  ]
}

Response Schema (data_products):

{
  "data_products": [
    {
      "urn": "urn:li:dataProduct:customer360",
      "name": "Customer 360",
      "description": "Unified customer view",
      "domain": "urn:li:domain:marketing",
      "assets": 12
    }
  ]
}

datahub_get_data_product

Get data product details.

Parameters:

Parameter	Type	Required	Default	Description
`urn`	string	Yes	-	Data product URN
`connection`	string	No	first configured	DataHub connection name

Response Schema:

{
  "urn": "urn:li:dataProduct:customer360",
  "name": "Customer 360",
  "description": "Unified customer view combining all customer data sources",
  "domain": {
    "urn": "urn:li:domain:marketing",
    "name": "Marketing"
  },
  "owners": ["Marketing Data Team"],
  "assets": [
    {"urn": "urn:li:dataset:customers", "name": "customers", "type": "dataset"},
    {"urn": "urn:li:dataset:customer_events", "name": "customer_events", "type": "dataset"}
  ],
  "custom_properties": {
    "sla": "99.9%",
    "refresh": "hourly"
  }
}

datahub_list_connections

List configured DataHub connections.

Parameters: None

Response Schema:

{
  "connections": [
    {
      "name": "primary",
      "display_name": "Primary Catalog",
      "url": "https://datahub.example.com"
    }
  ]
}

datahub_create

Create a new entity or resource in DataHub. Uses the what discriminator to select the entity type. Only available when read_only: false.

Annotated with DestructiveHint: false, IdempotentHint: false, OpenWorldHint: true.

Parameters:

Parameter	Type	Required	Default	Description
`what`	string	Yes	-	Entity type to create (see table below)
`name`	string	Varies	-	Entity name (required for most types)
`connection`	string	No	first configured	DataHub connection name

Additional parameters vary by what value — see the mcp-datahub documentation for full parameter details per entity type.

`what`	Creates	Key fields
`tag`	Tag	`name`
`domain`	Domain	`name`
`glossary_term`	Glossary term	`name`
`data_product`	Data product	`name`, `domain_urn`
`document`	Context document (1.4.x+)	`name`
`application`	Application	`name`
`query`	Saved query	`value` (SQL)
`incident`	Incident	`name`, `incident_type`, `entity_urns`
`structured_property`	Structured property	`qualified_name`, `value_type`, `entity_types`
`data_contract`	Data contract	`dataset_urns`

Response Schema:

{
  "urn": "urn:li:tag:new-tag",
  "message": "Created tag 'new-tag'"
}

datahub_update

Update metadata on an existing DataHub entity. Uses the what discriminator to select what to update, with an optional action for add/remove operations. Only available when read_only: false.

Annotated with DestructiveHint: false, IdempotentHint: true, OpenWorldHint: true.

Parameters:

Parameter	Type	Required	Default	Description
`what`	string	Yes	-	What to update (see table below)
`urn`	string	Varies	-	Entity URN to update
`action`	string	Varies	-	`add` or `remove` (required for tags, glossary terms, links, owners)
`connection`	string	No	first configured	DataHub connection name

Additional parameters vary by what value — see the mcp-datahub documentation for full parameter details.

`what`	`action`	Description
`description`	—	Set entity description
`column_description`	—	Set schema field description
`tag`	add/remove	Add or remove a tag
`glossary_term`	add/remove	Add or remove a glossary term
`link`	add/remove	Add or remove a link
`owner`	add/remove	Add or remove an owner
`domain`	set/remove	Set or remove domain assignment
`structured_properties`	set/remove	Set or remove structured property values
`structured_property`	—	Update a structured property definition
`incident_status`	—	Update incident status
`incident`	—	Update incident details
`query`	—	Update query properties
`document_contents`	—	Update document title/text (1.4.x+)
`document_status`	—	Update document status (1.4.x+)
`document_related_entities`	—	Update document related entities (1.4.x+)
`document_sub_type`	—	Update document sub-type (1.4.x+)
`data_contract`	—	Upsert a data contract

Response Schema:

{
  "urn": "urn:li:dataset:...",
  "message": "Updated description on urn:li:dataset:..."
}

datahub_delete

Delete an entity or resource from DataHub. Uses the what discriminator to select the entity type. Only available when read_only: false.

Annotated with DestructiveHint: true, IdempotentHint: true, OpenWorldHint: true.

Parameters:

Parameter	Type	Required	Default	Description
`what`	string	Yes	-	Entity type to delete (see below)
`urn`	string	Yes	-	Entity URN to delete
`connection`	string	No	first configured	DataHub connection name

Supported what values: query, tag, domain, glossary_entity, data_product, application, document, structured_property.

Response Schema:

{
  "urn": "urn:li:tag:old-tag",
  "message": "Deleted tag 'old-tag'"
}

S3 Tools

s3_list_buckets

List available S3 buckets.

Parameters:

Parameter	Type	Required	Default	Description
`connection`	string	No	first configured	S3 connection name

Response Schema:

{
  "buckets": [
    {
      "name": "data-lake",
      "creation_date": "2024-01-01T00:00:00Z",
      "region": "us-east-1"
    }
  ]
}

s3_list_objects

List objects in a bucket.

Parameters:

Parameter	Type	Required	Default	Description
`bucket`	string	Yes	-	Bucket name
`prefix`	string	No	-	Key prefix filter
`delimiter`	string	No	-	Delimiter for hierarchy (typically `/`)
`max_keys`	integer	No	1000	Maximum objects to return
`connection`	string	No	first configured	S3 connection name

Response Schema:

{
  "bucket": "data-lake",
  "prefix": "sales/orders/",
  "objects": [
    {
      "key": "sales/orders/2024/01/data.parquet",
      "size": 52428800,
      "last_modified": "2024-01-15T10:30:00Z",
      "storage_class": "STANDARD"
    }
  ],
  "common_prefixes": ["sales/orders/2024/02/"],
  "is_truncated": false
}

s3_get_object

Get object contents.

Parameters:

Parameter	Type	Required	Default	Description
`bucket`	string	Yes	-	Bucket name
`key`	string	Yes	-	Object key
`connection`	string	No	first configured	S3 connection name

Response Schema:

{
  "bucket": "data-lake",
  "key": "config/settings.json",
  "content": "{\"setting\": \"value\"}",
  "content_type": "application/json",
  "size": 25,
  "last_modified": "2024-01-15T10:30:00Z"
}

Note: The size of the returned content is limited by the max_get_size configuration setting.

s3_get_object_metadata

Get object metadata without downloading content.

Parameters:

Parameter	Type	Required	Default	Description
`bucket`	string	Yes	-	Bucket name
`key`	string	Yes	-	Object key
`connection`	string	No	first configured	S3 connection name

Response Schema:

{
  "bucket": "data-lake",
  "key": "sales/orders/data.parquet",
  "size": 52428800,
  "content_type": "application/octet-stream",
  "last_modified": "2024-01-15T10:30:00Z",
  "etag": "\"d41d8cd98f00b204e9800998ecf8427e\"",
  "metadata": {
    "x-amz-meta-created-by": "etl-pipeline"
  }
}

s3_presign_url

Generate a pre-signed URL.

Parameters:

Parameter	Type	Required	Default	Description
`bucket`	string	Yes	-	Bucket name
`key`	string	Yes	-	Object key
`expires`	string	No	`15m`	URL expiration (e.g., `1h`, `30m`)
`connection`	string	No	first configured	S3 connection name

Response Schema:

{
  "url": "https://bucket.s3.amazonaws.com/key?X-Amz-...",
  "expires_at": "2024-01-15T11:00:00Z"
}
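
To relate the `expires` parameter to the returned `expires_at`, the sketch below parses single-unit durations such as `15m`, `30m`, or `1h` and adds them to an issue time. The accepted grammar is an assumption: only the forms shown in the parameter table are handled here, and the server may accept more. Both function names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumption: only a whole number followed by s, m, or h is handled in this sketch.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_expires(value: str) -> timedelta:
    """Parse a duration such as "15m" or "1h" into a timedelta."""
    match = re.fullmatch(r"(\d+)([smh])", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})


def expected_expiry(issued_at: datetime, expires: str = "15m") -> datetime:
    """Estimate when a URL issued at issued_at would expire."""
    return issued_at + parse_expires(expires)
```

A caller can compare this estimate with `expires_at` to decide whether a cached URL is still worth using.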

s3_list_connections

List configured S3 connections.

Parameters: None

Response Schema:

{
  "connections": [
    {
      "name": "primary",
      "display_name": "Data Lake",
      "region": "us-east-1",
      "read_only": true
    }
  ]
}

s3_put_object

Upload an object. Only available when read_only: false.

Parameters:

Parameter	Type	Required	Default	Description
`bucket`	string	Yes	-	Bucket name
`key`	string	Yes	-	Object key
`content`	string	Yes	-	Object content
`content_type`	string	No	`application/octet-stream`	MIME type
`connection`	string	No	first configured	S3 connection name

Response Schema:

{
  "bucket": "data-lake",
  "key": "uploads/file.json",
  "etag": "\"d41d8cd98f00b204e9800998ecf8427e\"",
  "size": 1024
}

s3_delete_object

Delete an object. Only available when read_only: false.

Parameters:

Parameter	Type	Required	Default	Description
`bucket`	string	Yes	-	Bucket name
`key`	string	Yes	-	Object key
`connection`	string	No	first configured	S3 connection name

Response Schema:

{
  "bucket": "data-lake",
  "key": "uploads/file.json",
  "deleted": true
}

s3_copy_object

Copy an object. Only available when read_only: false.

Parameters:

Parameter	Type	Required	Default	Description
`source_bucket`	string	Yes	-	Source bucket name
`source_key`	string	Yes	-	Source object key
`dest_bucket`	string	Yes	-	Destination bucket name
`dest_key`	string	Yes	-	Destination object key
`connection`	string	No	first configured	S3 connection name

Response Schema:

{
  "source": {
    "bucket": "data-lake",
    "key": "original/file.json"
  },
  "destination": {
    "bucket": "data-lake",
    "key": "backup/file.json"
  },
  "copied": true
}

Knowledge Tools

For the full governance workflow, see Knowledge Capture.

memory_capture

Record domain knowledge shared during a session (memory toolkit). Available to all personas when the memory layer is enabled. Memory is on by default when a database is configured; setting memory.enabled: false disables capture.

Parameters:

Parameter	Type	Required	Default	Description
`type`	string	Yes	-	Sink-class. Live: `personal_preference`, `episodic_event`. Reviewed (creates a pending insight): `business_knowledge`, `schema_entity`, `operational_rule`
`content`	string	Yes	-	The knowledge to record (10-4000 characters)
`confidence`	string	No	`medium`	Confidence level: `high`, `medium`, `low`
`entity_urns`	array	No	`[]`	DataHub URNs this knowledge relates to (max 10)
`related_columns`	array	No	`[]`	Columns related to this knowledge (max 20)
`suggested_actions`	array	No	`[]`	Proposed catalog changes (max 5)

Suggested Action Schema:

Field	Type	Required	Description
`action_type`	string	Yes	One of: `update_description`, `add_tag`, `add_glossary_term`, `flag_quality_issue`, `add_documentation`, `add_curated_query`
`target`	string	Yes	Target of the change (entity name, column name, or URL)
`detail`	string	Yes	Change detail (new description, tag name, term name, query name, etc.)
`query_sql`	string	Conditional	SQL statement (required for `add_curated_query`)
`query_description`	string	No	Optional description for `add_curated_query`

Response Schema:

{
  "insight_id": "a1b2c3d4e5f67890a1b2c3d4e5f67890",
  "status": "pending",
  "message": "Insight captured. It will be reviewed by a data catalog administrator."
}
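
The limits in the two tables above can be checked before a call is sent. The sketch below is a client-side pre-check only; it assumes the character limit counts Python string length, which may differ from how the server measures content. The function name and the wording of the returned problems are hypothetical.

```python
LIVE_TYPES = {"personal_preference", "episodic_event"}
REVIEWED_TYPES = {"business_knowledge", "schema_entity", "operational_rule"}
LIST_CAPS = {"entity_urns": 10, "related_columns": 20, "suggested_actions": 5}


def check_capture(args: dict) -> list[str]:
    """Return a list of problems found in memory_capture arguments (empty if none)."""
    problems = []
    if args.get("type") not in LIVE_TYPES | REVIEWED_TYPES:
        problems.append("type: unknown sink class")
    # Assumption: the 10-4000 limit is measured in Python characters.
    if not 10 <= len(args.get("content", "")) <= 4000:
        problems.append("content: must be 10-4000 characters")
    if args.get("confidence", "medium") not in {"high", "medium", "low"}:
        problems.append("confidence: must be high, medium, or low")
    for field, cap in LIST_CAPS.items():
        if len(args.get(field, [])) > cap:
            problems.append(f"{field}: at most {cap} items")
    for action in args.get("suggested_actions", []):
        if action.get("action_type") == "add_curated_query" and not action.get("query_sql"):
            problems.append("suggested_actions: add_curated_query needs query_sql")
    return problems
```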

apply_knowledge

Review, synthesize, and apply captured insights to the data catalog. Admin-only. Requires knowledge.apply.enabled: true.

Parameters:

Parameter	Type	Required	Description
`action`	string	Yes	One of: `bulk_review`, `review`, `synthesize`, `apply`, `approve`, `reject`, `rollback`, `list_changesets`
`entity_urn`	string	Conditional	Required for `review`, `synthesize`, `apply`, `list_changesets`; optional for `rollback` (validates the changeset belongs to this entity)
`insight_ids`	array	Conditional	Required for `approve`, `reject`; optional for `synthesize`, `apply`
`changes`	array	Conditional	Required for `apply`
`changeset_id`	string	Conditional	Required for `rollback`
`confirm`	bool	Conditional	Required when `require_confirmation` is enabled (for `apply` and `rollback`)
`review_notes`	string	No	Notes for `approve`/`reject` actions
`itemize`	bool	No	With `bulk_review`, also return the pending insights themselves (each with `captured_by`, `sink_class`), paginated by `offset`/`limit`
`limit`	int	No	Page size for itemized `bulk_review` (default 20, max 100)
`offset`	int	No	Page start for itemized `bulk_review`; pass the previous `next_offset` to continue

Change Schema (for apply action):

Field	Type	Required	Description
`change_type`	string	Yes	One of: `update_description`, `add_tag`, `remove_tag`, `add_glossary_term`, `flag_quality_issue`, `add_documentation`, `add_curated_query`, `set_structured_property`, `remove_structured_property`, `raise_incident`, `resolve_incident`, `add_context_document`, `update_context_document`, `remove_context_document`
`target`	string	Yes	Target of the change (see below)
`detail`	string	Yes	Change detail (see below)
`query_sql`	string	Conditional	SQL statement (required for `add_curated_query`). For `update_context_document`, the new title
`query_description`	string	No	Optional description for `add_curated_query`. For `add_context_document`/`update_context_document`, the document category

Target and detail by change type:

Change Type	Target	Detail
`update_description`	`column:<fieldPath>` for column-level, empty for entity-level	Description text
`add_tag` / `remove_tag`	Ignored	Tag name or URN
`add_glossary_term`	Ignored	Term name or URN
`flag_quality_issue`	Ignored	Quality issue description (sets the `QualityIssue` tag; with a detail, also raises a DataHub incident carrying it)
`add_documentation`	URL	Link description
`add_curated_query`	Ignored	Query name
`set_structured_property`	Property qualified name or URN	Value or JSON array
`remove_structured_property`	Property qualified name or URN	Removal reason
`raise_incident`	Incident title	Description
`resolve_incident`	Incident URN	Resolution message
`add_context_document`	Document title	Document content
`update_context_document`	Document ID	New content
`remove_context_document`	Document ID	Ignored

Actions:

Action	Description	Required Params
`bulk_review`	Counts of all pending insights; pass optional `itemize: true` (with `limit`/`offset`) to enumerate the queue, each with `captured_by` and `sink_class`	None
`review`	Insights for a specific entity with current DataHub metadata	`entity_urn`
`approve`	Transition insights to approved status	`insight_ids`
`reject`	Transition insights to rejected status	`insight_ids`
`synthesize`	Structured change proposals from approved insights	`entity_urn`
`apply`	Write changes to DataHub with changeset tracking	`entity_urn`, `changes`
`list_changesets`	List an entity's changesets (id, timestamp, actor, change type, rollback status)	`entity_urn`
`rollback`	Revert a changeset's changes to their before-image	`changeset_id`, `confirm`
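
The Required Params column above can be expressed as a lookup that reports what a call is missing before it is sent. This is a sketch that mirrors the Actions table as written; note that `confirm` is only needed for `apply` when `require_confirmation` is enabled, which this table-driven check does not model. The helper name is hypothetical.

```python
# Mirrors the "Required Params" column of the Actions table.
REQUIRED = {
    "bulk_review": [],
    "review": ["entity_urn"],
    "approve": ["insight_ids"],
    "reject": ["insight_ids"],
    "synthesize": ["entity_urn"],
    "apply": ["entity_urn", "changes"],
    "list_changesets": ["entity_urn"],
    "rollback": ["changeset_id", "confirm"],
}


def missing_params(args: dict) -> list[str]:
    """Return the required parameters that are absent or empty for args["action"]."""
    action = args.get("action")
    if action not in REQUIRED:
        raise ValueError(f"unknown action: {action!r}")
    # None, empty strings, and empty lists all count as "not supplied".
    return [name for name in REQUIRED[action] if args.get(name) in (None, "", [])]
```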

rollback reverts the changes an apply made: it removes added tags/glossary terms/documentation links (keeping any that pre-existed in the before-image), restores a changed description, transitions the source insights to rolled_back, and marks the changeset rolled back. It is refused if the changeset is already rolled back, if a newer changeset has since modified the same aspect, or if the changeset touched change types whose prior state was not captured (column descriptions, structured properties, incidents, curated queries, context documents, prompts).

Response Schema (apply):

{
  "changeset_id": "cs_x1y2z3a4b5c6d7e8f9a0b1c2d3e4f5a6",
  "entity_urn": "urn:li:dataset:(urn:li:dataPlatform:trino,hive.sales.orders,PROD)",
  "changes_applied": 2,
  "insights_marked_applied": 1,
  "resulting_state": {
    "description": "Order records with gross margin amounts (before returns)",
    "tags": ["urn:li:tag:gross-margin"],
    "glossary_terms": [],
    "owners": []
  },
  "message": "Changes applied to DataHub. Roll back with action=rollback changeset_id=cs_x1y2z3a4b5c6d7e8f9a0b1c2d3e4f5a6. changes_applied counts requested changes; verify against resulting_state below."
}

See Governance Workflow for detailed examples of each action.

Portal Tools

The portal toolkit persists AI-generated artifacts to S3 with PostgreSQL metadata. Requires portal.enabled: true.

save_artifact

Save an AI-generated artifact (JSX dashboard, HTML report, SVG chart, etc.) to the asset portal. Automatically captures provenance — which tool calls in the current session produced this artifact.

Parameters:

Parameter	Type	Required	Default	Description
`name`	string	Yes	-	Display name (max 255 chars)
`content`	string	Yes	-	Artifact content
`content_type`	string	Yes	-	MIME type: `text/html`, `text/jsx`, `image/svg+xml`, `text/markdown`, `application/json`, `text/csv`
`description`	string	No	`""`	Description (max 2000 chars)
`tags`	array	No	`[]`	Tags for categorization (max 20 tags, each max 100 chars)

Response Schema:

{
  "asset_id": "a1b2c3d4e5f67890a1b2c3d4e5f67890",
  "portal_url": "https://portal.example.com/portal/assets/a1b2c3d4e5f67890a1b2c3d4e5f67890",
  "message": "Artifact saved successfully.",
  "provenance_captured": true,
  "tool_calls_recorded": 5
}

Storage layout:

Content is stored in S3 at {s3_prefix}{user_id}/{asset_id}/content.{ext} where the extension is derived from the content type.
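
The key template can be illustrated with a short sketch. Only the `text/html` to `html` mapping is visible in this reference (in the `s3_key` sample under manage_artifact); the other extensions in the table below are assumptions made for illustration, as is the function name.

```python
# Only "text/html" -> "html" appears in this reference; the rest are assumed.
EXTENSIONS = {
    "text/html": "html",
    "text/jsx": "jsx",
    "image/svg+xml": "svg",
    "text/markdown": "md",
    "application/json": "json",
    "text/csv": "csv",
}


def content_key(s3_prefix: str, user_id: str, asset_id: str, content_type: str) -> str:
    """Build the object key following {s3_prefix}{user_id}/{asset_id}/content.{ext}."""
    ext = EXTENSIONS[content_type]  # raises KeyError for unlisted content types
    return f"{s3_prefix}{user_id}/{asset_id}/content.{ext}"
```

Note that the template has no separator between `{s3_prefix}` and `{user_id}`, so a prefix such as `artifacts/` needs its own trailing slash.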

manage_artifact

List, retrieve, update, or delete saved artifacts. All mutations enforce ownership — users can only modify their own artifacts.

Parameters:

Parameter	Type	Required	Default	Description
`action`	string	Yes	-	One of: `list`, `get`, `update`, `delete`, `search`
`asset_id`	string	Conditional	-	Required for `get`, `update`, `delete`
`content`	string	No	-	New content (for `update` — replaces S3 object)
`name`	string	No	-	New name (for `update`)
`description`	string	No	-	New description (for `update`)
`tags`	array	No	-	New tags (for `update`)
`content_type`	string	No	-	New content type (for `update`, only when replacing content)
`query`	string	Conditional	-	Free-text relevance query (required for `search`)
`limit`	integer	No	50	Max results for `list` (max 200); ranked `search` defaults to 20 (max 100)

Actions:

Action	Description	Required Params
`list`	Show current user's artifacts	None
`get`	Retrieve full asset metadata	`asset_id`
`update`	Change metadata or replace content	`asset_id`
`delete`	Soft-delete an artifact	`asset_id`
`search`	Rank the caller's own assets by relevance to `query` (hybrid vector + lexical, lexical-only fallback). Returns each match with a `score` plus a `ranking` field; scoped server-side to the caller's own assets by `owner_id` (the library's ownership key) and fails closed without an identity.	`query`
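
Because `list` and `search` have different defaults and ceilings for `limit`, a caller may want to normalize the value before sending it. The sketch below clamps on the client side; whether the server clamps or rejects an out-of-range `limit` is not stated here, so treat the clamping as an assumption. The function name is hypothetical.

```python
from typing import Optional

# (default, maximum) per action, taken from the `limit` parameter description.
LIMITS = {"list": (50, 200), "search": (20, 100)}


def effective_limit(action: str, requested: Optional[int] = None) -> int:
    """Return the limit to send for `list` or `search`, kept within documented bounds."""
    default, maximum = LIMITS[action]
    if requested is None:
        return default
    return max(1, min(requested, maximum))
```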

Response Schema (list):

{
  "assets": [
    {
      "id": "a1b2c3d4e5f67890a1b2c3d4e5f67890",
      "owner_id": "[email protected]",
      "name": "Revenue Dashboard",
      "description": "Monthly revenue breakdown",
      "content_type": "text/html",
      "s3_bucket": "portal-artifacts",
      "s3_key": "artifacts/user/asset-id/content.html",
      "size_bytes": 4096,
      "tags": ["dashboard", "revenue"],
      "provenance": {
        "user_id": "[email protected]",
        "session_id": "sess123",
        "tool_calls": [
          {"tool_name": "trino_query", "timestamp": "2024-01-15T10:00:00Z", "summary": "SELECT ..."}
        ]
      },
      "created_at": "2024-01-15T10:05:00Z",
      "updated_at": "2024-01-15T10:05:00Z"
    }
  ],
  "total": 1
}

Response Schema (update/delete):

{
  "asset_id": "a1b2c3d4e5f67890a1b2c3d4e5f67890",
  "message": "Asset updated successfully."
}

Error Messages:

Condition	Error Message
Missing asset_id	`asset_id is required for {action} action`
Asset not found	`asset not found: ...`
Wrong owner	`you can only {action} your own artifacts`
Invalid action	`invalid action "...": must be one of: list, get, update, delete`