
Governance Workflow

Knowledge capture turns catalog metadata into something that evolves with use. Instead of metadata being a static artifact maintained by a central team, it improves continuously as domain experts share what they know during their normal work. This is active metadata management: the catalog gets better every time someone uses it.

This page covers the admin-side workflow for reviewing captured insights and writing approved changes back to DataHub.

The Review Process

All captured insights start with status pending. Admins review them through the apply_knowledge tool or the Admin REST API.

Bulk Review

Get an overview of all pending insights:

{"action": "bulk_review"}

Response:

{
  "total_pending": 7,
  "by_entity": [
    {
      "entity_urn": "urn:li:dataset:(urn:li:dataPlatform:trino,hive.sales.orders,PROD)",
      "count": 3,
      "categories": ["correction", "business_context"],
      "latest_at": "2025-01-15T14:30:00Z"
    },
    {
      "entity_urn": "urn:li:dataset:(urn:li:dataPlatform:trino,hive.finance.revenue,PROD)",
      "count": 4,
      "categories": ["data_quality", "usage_guidance", "enhancement"],
      "latest_at": "2025-01-15T16:45:00Z"
    }
  ],
  "by_category": {
    "correction": 2,
    "business_context": 1,
    "data_quality": 2,
    "usage_guidance": 1,
    "enhancement": 1
  },
  "by_confidence": {
    "high": 3,
    "medium": 4
  }
}
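As a rough illustration of how an admin script might triage this response, the sketch below orders entities by pending count, breaking ties by most recent activity. The field names come from the response above; the `triage` helper and the ordering rule are assumptions for illustration, not part of the platform.

```python
import json
from datetime import datetime

# Trimmed copy of the bulk_review response shown above.
BULK_REVIEW = json.loads("""
{
  "total_pending": 7,
  "by_entity": [
    {"entity_urn": "urn:li:dataset:(urn:li:dataPlatform:trino,hive.sales.orders,PROD)",
     "count": 3, "latest_at": "2025-01-15T14:30:00Z"},
    {"entity_urn": "urn:li:dataset:(urn:li:dataPlatform:trino,hive.finance.revenue,PROD)",
     "count": 4, "latest_at": "2025-01-15T16:45:00Z"}
  ]
}
""")


def parse_ts(value: str) -> datetime:
    # Older Python versions reject a trailing "Z" in fromisoformat(), so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def triage(response: dict) -> list:
    """Hypothetical helper: entity URNs, largest backlog first, newest activity as tie-break."""
    ranked = sorted(
        response["by_entity"],
        key=lambda entity: (entity["count"], parse_ts(entity["latest_at"])),
        reverse=True,
    )
    return [entity["entity_urn"] for entity in ranked]


review_order = triage(BULK_REVIEW)
```

Each URN in `review_order` can then be passed to the review action described next.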

Review by Entity

Drill into a specific entity to see its pending insights alongside current DataHub metadata:

{
  "action": "review",
  "entity_urn": "urn:li:dataset:(urn:li:dataPlatform:trino,hive.sales.orders,PROD)"
}

Response includes:

  • Current DataHub metadata (description, tags, glossary terms, owners)
  • All insights for this entity with their categories, text, confidence, and suggested actions

This side-by-side view helps admins assess whether an insight adds value compared to what's already in the catalog.

Approve or Reject

Approve or reject insights by ID, optionally attaching review notes:

{
  "action": "approve",
  "insight_ids": ["a1b2c3d4e5f6", "f6e5d4c3b2a1"],
  "review_notes": "Verified with data engineering team"
}

{
  "action": "reject",
  "insight_ids": ["deadbeef1234"],
  "review_notes": "Already documented in the column description"
}

Response:

{
  "action": "approve",
  "updated": 2,
  "total": 2
}

If any IDs are invalid or a status transition is not allowed, those failures are reported in an errors array. The valid IDs in the same request are still processed.
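Because a request can partially succeed, a caller should compare updated against total rather than assume success. The sketch below is one way to do that. It assumes that total counts every submitted ID and that errors is omitted when nothing failed; the shape of each errors entry is not documented here, so the helper treats entries as opaque. `summarize_transition` is a hypothetical name.

```python
def summarize_transition(response: dict) -> dict:
    """Hypothetical helper: condense an approve/reject response for logging."""
    # Assumption: "errors" is absent (or empty) when every ID was updated.
    errors = response.get("errors", [])
    return {
        "action": response["action"],
        "updated": response["updated"],
        "failed": response["total"] - response["updated"],
        "errors": errors,  # entries are passed through untouched
        "complete": response["updated"] == response["total"] and not errors,
    }


# The documented all-success response from above.
summary = summarize_transition({"action": "approve", "updated": 2, "total": 2})
```

When `complete` is false, the entries in `errors` indicate which IDs need another look before moving on to synthesis.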

Synthesizing Changes

Once insights are approved, the synthesize action gathers them and builds structured change proposals:

{
  "action": "synthesize",
  "entity_urn": "urn:li:dataset:(urn:li:dataPlatform:trino,hive.sales.orders,PROD)"
}

Response:

{
  "entity_urn": "urn:li:dataset:(urn:li:dataPlatform:trino,hive.sales.orders,PROD)",
  "current_metadata": {
    "description": "Order records",
    "tags": ["financial"],
    "glossary_terms": [],
    "owners": ["Data Platform Team"]
  },
  "approved_insights": [
    {
      "id": "a1b2c3d4e5f6",
      "category": "correction",
      "insight_text": "The amount column represents gross margin before returns, not revenue.",
      "suggested_actions": [
        {"action_type": "update_description", "target": "entity", "detail": "Order records with gross margin amounts (before returns)"}
      ]
    }
  ],
  "proposed_changes": [
    {
      "change_type": "update_description",
      "target": "entity",
      "current_value": "Order records",
      "suggested_value": "Order records with gross margin amounts (before returns)",
      "source_insight_ids": ["a1b2c3d4e5f6"]
    }
  ]
}

The synthesis output shows current values alongside proposed changes, so admins can see exactly what will change. The source_insight_ids field traces each change back to the insight that proposed it.
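The proposals line up closely with the payload the apply action expects (see Applying Changes). A minimal sketch of that conversion follows, assuming suggested_value maps directly onto the detail field, which matches the examples on this page but is not stated as a rule. `build_apply_request` is a hypothetical helper, and an admin would normally edit the proposals before sending them.

```python
def build_apply_request(synthesis: dict, confirm: bool = False) -> dict:
    """Hypothetical helper: turn a synthesize response into an apply payload."""
    changes = []
    insight_ids = []
    for proposal in synthesis["proposed_changes"]:
        changes.append({
            "change_type": proposal["change_type"],
            "target": proposal["target"],
            # Assumption: the proposed value is used verbatim as the change detail.
            "detail": proposal["suggested_value"],
        })
        # Collect source insights once each, preserving first-seen order.
        for insight_id in proposal["source_insight_ids"]:
            if insight_id not in insight_ids:
                insight_ids.append(insight_id)
    return {
        "action": "apply",
        "entity_urn": synthesis["entity_urn"],
        "changes": changes,
        "insight_ids": insight_ids,
        "confirm": confirm,
    }


# Trimmed copy of the synthesize response shown above.
synthesis = {
    "entity_urn": "urn:li:dataset:(urn:li:dataPlatform:trino,hive.sales.orders,PROD)",
    "proposed_changes": [{
        "change_type": "update_description",
        "target": "entity",
        "current_value": "Order records",
        "suggested_value": "Order records with gross margin amounts (before returns)",
        "source_insight_ids": ["a1b2c3d4e5f6"],
    }],
}
apply_request = build_apply_request(synthesis)
```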

Applying Changes

The apply action writes changes to DataHub and records a changeset. This is the data catalog write-back step.

{
  "action": "apply",
  "entity_urn": "urn:li:dataset:(urn:li:dataPlatform:trino,hive.sales.orders,PROD)",
  "changes": [
    {
      "change_type": "update_description",
      "target": "entity",
      "detail": "Order records with gross margin amounts (before returns)"
    },
    {
      "change_type": "add_tag",
      "target": "entity",
      "detail": "gross-margin"
    }
  ],
  "insight_ids": ["a1b2c3d4e5f6"],
  "confirm": true
}

When require_confirmation is enabled in configuration and confirm is not true, the tool returns a confirmation prompt instead of applying:

{
  "confirmation_required": true,
  "entity_urn": "urn:li:dataset:(urn:li:dataPlatform:trino,hive.sales.orders,PROD)",
  "changes_count": 2,
  "message": "Set confirm: true to apply these changes."
}
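A client can treat this as a two-step handshake: send the request without confirmation, show the prompt to a human, and resend with confirm set to true. The sketch below assumes a hypothetical `call_tool` callable that sends an apply_knowledge payload and returns the parsed response; how the tool is actually invoked depends on the MCP client in use.

```python
from typing import Callable, Optional


def apply_with_confirmation(
    call_tool: Callable[[dict], dict],
    request: dict,
    approve: Callable[[dict], bool],
) -> Optional[dict]:
    """Hypothetical helper: apply changes, asking a human first when prompted."""
    first = call_tool({**request, "confirm": False})
    if not first.get("confirmation_required"):
        # require_confirmation is disabled, so the changes were already applied.
        return first
    if not approve(first):
        return None  # the human declined; nothing was written
    return call_tool({**request, "confirm": True})
```

Note that when require_confirmation is disabled, the first call already writes to DataHub, so this pattern is only a safeguard when the setting is on.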

Supported Change Types

| Change Type | Description | Target Field | Detail Field |
|---|---|---|---|
| update_description | Update entity or column description | column:<fieldPath> for columns, empty for dataset-level | New description text |
| add_tag | Add a tag to the entity | (ignored) | Tag name or URN (e.g., pii or urn:li:tag:pii) |
| remove_tag | Remove a tag from the entity | (ignored) | Tag name or URN to remove |
| add_glossary_term | Associate a glossary term | (ignored) | Glossary term name or URN |
| flag_quality_issue | Add fixed QualityIssue tag to the entity | (ignored) | Issue description (stored as context in the knowledge store) |
| add_documentation | Add a documentation link | URL of the documentation | Link description |
| add_curated_query | Create a reusable Query entity linked to the dataset | (ignored) | Query name. Also requires query_sql (SQL statement) and optionally query_description |

Tag names and glossary term names are automatically normalized to full DataHub URNs (e.g., pii becomes urn:li:tag:pii).
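The normalization can be pictured with a small sketch. This is not the platform's implementation: it assumes a bare name is simply prefixed with the URN scheme for its entity type (`tag` or `glossaryTerm`), and the platform may do more, such as resolving glossary terms by lookup. `to_urn` is a hypothetical helper.

```python
def to_urn(name_or_urn: str, entity_type: str) -> str:
    """Hypothetical helper: leave full URNs alone, prefix bare names."""
    if name_or_urn.startswith("urn:li:"):
        return name_or_urn
    return f"urn:li:{entity_type}:{name_or_urn}"


tag_urn = to_urn("pii", "tag")
unchanged = to_urn("urn:li:tag:pii", "tag")
```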

Apply Response

{
  "changeset_id": "cs_x1y2z3a4b5c6",
  "entity_urn": "urn:li:dataset:(urn:li:dataPlatform:trino,hive.sales.orders,PROD)",
  "changes_applied": 2,
  "insights_marked_applied": 1,
  "message": "Changes applied to DataHub. Changeset cs_x1y2z3a4b5c6 recorded for rollback."
}

Source insights move to applied status with a reference to the changeset.

Changeset Tracking

Every apply action creates a changeset record in the knowledge_changesets table:

| Field | Description |
|---|---|
| id | Unique changeset identifier |
| target_urn | The DataHub entity that was modified |
| change_type | Summary of change types applied (or multiple) |
| previous_value | Entity metadata before changes (description, tags, glossary terms, owners) |
| new_value | Changes that were applied |
| source_insight_ids | Insights that produced this changeset |
| applied_by | User who applied the changes |
| rolled_back | Whether this changeset has been reverted |
| rolled_back_by | Who reverted the changes |
| rolled_back_at | When the changes were reverted |

The previous_value field captures the entity's metadata at the time of application, which is used for rollback.

Rollback

Changesets can be rolled back through the Admin REST API:

curl -X POST \
  https://mcp.example.com/api/v1/admin/knowledge/changesets/cs_x1y2z3a4b5c6/rollback \
  -H "Authorization: Bearer $ADMIN_TOKEN"

Rollback restores the previous_value metadata to the DataHub entity. The changeset is then marked as rolled back, and the user and timestamp are recorded in rolled_back_by and rolled_back_at.
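For scripts that prefer Python over curl, the same request can be built with the standard library. The URL path and header mirror the curl command above; the `build_rollback_request` name and the choice to read the token from an `ADMIN_TOKEN` environment variable are assumptions. The sketch only builds the request and does not send it.

```python
import os
import urllib.request


def build_rollback_request(base_url: str, changeset_id: str, token: str) -> urllib.request.Request:
    """Hypothetical helper: POST request equivalent to the curl command above."""
    url = f"{base_url}/api/v1/admin/knowledge/changesets/{changeset_id}/rollback"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


request = build_rollback_request(
    "https://mcp.example.com",
    "cs_x1y2z3a4b5c6",
    os.environ.get("ADMIN_TOKEN", ""),
)
# Passing `request` to urllib.request.urlopen() would send it; omitted here
# so the sketch has no side effects.
```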

Limitations:

  • Rollback restores the metadata snapshot from the time of application. If the entity's metadata was modified by other means after the apply, the rollback overwrites those changes too.
  • A changeset can only be rolled back once.
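Both limitations follow from rollback being a snapshot restore. The toy model below uses plain dictionaries in place of DataHub and the knowledge_changesets table to show the effect; it is an illustration of the semantics described above, not the platform's code, and the function names are made up.

```python
import copy


def apply_description(entity: dict, new_description: str) -> dict:
    """Toy apply: snapshot the entity, change it, and return the changeset."""
    changeset = {"previous_value": copy.deepcopy(entity), "rolled_back": False}
    entity["description"] = new_description
    return changeset


def rollback(entity: dict, changeset: dict) -> None:
    """Toy rollback: restore the whole snapshot, at most once."""
    if changeset["rolled_back"]:
        raise ValueError("changeset already rolled back")
    entity.clear()
    entity.update(copy.deepcopy(changeset["previous_value"]))
    changeset["rolled_back"] = True


entity = {"description": "Order records", "tags": ["financial"]}
changeset = apply_description(entity, "Order records with gross margin amounts (before returns)")

# Someone adds a tag through another channel after the apply...
entity["tags"].append("pii")

# ...and the rollback discards it along with the description change.
rollback(entity, changeset)
```

After the rollback, `entity` matches the snapshot exactly, so the later tag is gone. If other edits may have landed since the apply, review the entity's current metadata before rolling back.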

Complete Workflow Example

An analyst discovers that the amount column in the orders table represents gross margin, not revenue. Here's how that knowledge flows through the system:

sequenceDiagram
    participant Analyst
    participant AI as AI Assistant
    participant Platform as mcp-data-platform
    participant DB as PostgreSQL
    participant Admin
    participant DH as DataHub

    Note over Analyst,AI: Discovery phase
    Analyst->>AI: What does the amount column mean?
    AI->>Platform: trino_describe_table(orders)
    Platform-->>AI: amount: DECIMAL (no description)
    Analyst->>AI: That's gross margin before returns,<br/>not revenue like the name suggests

    Note over AI,Platform: Capture
    AI->>Platform: capture_insight(<br/>category: correction,<br/>source: user,<br/>entity_urns: [urn:li:dataset:...orders...],<br/>insight_text: "amount column is gross margin<br/>before returns, not revenue",<br/>confidence: high,<br/>suggested_actions: [{<br/>  action_type: update_description,<br/>  target: amount,<br/>  detail: "Gross margin before returns"<br/>}])
    Platform->>DB: INSERT (status: pending)
    Platform-->>AI: Insight captured: a1b2c3

    Note over Admin,DH: Review phase (later)
    Admin->>Platform: apply_knowledge(action: bulk_review)
    Platform-->>Admin: 1 pending insight for orders

    Admin->>Platform: apply_knowledge(action: review,<br/>entity_urn: ...orders...)
    Platform->>DH: Get current metadata
    Platform-->>Admin: Current: no description<br/>Insight: "amount is gross margin"

    Admin->>Platform: apply_knowledge(action: approve,<br/>insight_ids: [a1b2c3])
    Platform->>DB: UPDATE status = approved

    Note over Admin,DH: Apply phase
    Admin->>Platform: apply_knowledge(action: synthesize,<br/>entity_urn: ...orders...)
    Platform-->>Admin: Proposed: update amount description

    Admin->>Platform: apply_knowledge(action: apply,<br/>changes: [{update_description,<br/>amount, "Gross margin before returns"}],<br/>insight_ids: [a1b2c3],<br/>confirm: true)
    Platform->>DH: Update column description
    Platform->>DB: INSERT changeset (previous_value saved)
    Platform->>DB: UPDATE insight status = applied
    Platform-->>Admin: Changeset cs_x1y2 recorded

The next time anyone queries this table, the enriched response includes the corrected description.

Next Steps