Column Tagging
Column tags in the Catalog describe what a column contains — for example PII, CONFIDENTIAL, or INTERNAL. Tags are the link between the metadata layer (Datahub) and the access control layer (Ranger): once a tag is applied to a column in Datahub, it becomes available in Ranger for tag-based restriction and masking policies.
The Tag-to-Policy Flow
Step by Step
1. Apply a tag in the Catalog UI
A user or admin navigates to a table in the Cogrion Catalog, selects a column, and applies a Datahub tag. Tags are free-form strings (e.g. PII, SENSITIVE) and can be applied to individual columns.
2. Datahub commits the change
The BFF forwards the tag write to Datahub GMS. Datahub stores the tag as a GlobalTag attached to the column's schema field entity and emits a MetadataChangeLog (MCL) event to the Kafka topic MetadataChangeLog_Versioned_v1.
3. ranger-tag-sync picks up the event
The ranger-tag-sync service (deployed as part of the aws/datahub bundle) consumes the MCL event from Kafka. It extracts the tag name and registers it in Ranger's tag store against the trino Ranger service.
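The extraction in steps 2 and 3 can be sketched as follows. This is a minimal illustration, not the actual ranger-tag-sync code. It assumes the MCL event has already been decoded into a Python dict (on the Kafka topic, events are normally Avro-encoded) and that the aspect payload is a JSON string holding a list of tag URNs of the form `urn:li:tag:<name>`. The function name `extract_tag_names` and the example dataset, column, and event values are hypothetical.

```python
import json

TAG_URN_PREFIX = "urn:li:tag:"


def extract_tag_names(mcl_event: dict) -> list[str]:
    """Return the tag names carried by a simplified, already-decoded MCL event.

    Assumption: the event is a plain dict and its aspect value is a JSON
    string shaped like {"tags": [{"tag": "urn:li:tag:PII"}]}.
    """
    # Only tag changes are relevant; ignore every other aspect.
    if mcl_event.get("aspectName") != "globalTags":
        return []

    aspect = json.loads(mcl_event["aspect"]["value"])
    names = []
    for association in aspect.get("tags", []):
        urn = association.get("tag", "")
        if urn.startswith(TAG_URN_PREFIX):
            # Strip the URN prefix to get the bare tag name.
            names.append(urn[len(TAG_URN_PREFIX):])
    return names


# Hypothetical event for a column named "email"; all values are illustrative.
example_event = {
    "entityType": "schemaField",
    "entityUrn": (
        "urn:li:schemaField:(urn:li:dataset:"
        "(urn:li:dataPlatform:trino,sales.public.customers,PROD),email)"
    ),
    "aspectName": "globalTags",
    "changeType": "UPSERT",
    "aspect": {
        "contentType": "application/json",
        "value": json.dumps({"tags": [{"tag": "urn:li:tag:PII"}]}),
    },
}

print(extract_tag_names(example_event))  # → ['PII']
```

A real consumer would subscribe to MetadataChangeLog_Versioned_v1, deserialize each message, and then pass the extracted names to Ranger. Those parts are omitted here because they depend on the deployment.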
4. Tag is available in Ranger
The tag now exists in Ranger. It can be selected when creating a column restriction policy or a data masking policy via the Data Access Management UI.
Where Tags Come From
Tags on columns have two origins:
| Source | How it works |
|---|---|
| Manual tagging | An admin applies a tag directly in the Cogrion Catalog UI — the flow described on this page |
| PII scanning | The PII scanning Spark job scans table data on a schedule and writes its results back to Datahub as structured properties |
Important: Tag Scope
A tag-based policy applies to every column carrying that tag, across all tables and schemas. Removing a tag from a column in Datahub triggers a new MCL event, and ranger-tag-sync updates Ranger accordingly. Existing Ranger policies that reference the tag, however, are not removed automatically.
When removing a tag from a column, verify that the Ranger policy intent still makes sense for the remaining tagged columns.
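The scope behaviour can be illustrated with a small in-memory model. This is only a sketch of the semantics described above, not how Ranger stores tags or policies. The `TagStore` class, the `columns_covered` helper, and the table and column names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TagStore:
    """Toy stand-in for a tag store: tag name -> set of (table, column)."""
    columns_by_tag: dict = field(default_factory=dict)

    def add(self, tag: str, table: str, column: str) -> None:
        self.columns_by_tag.setdefault(tag, set()).add((table, column))

    def remove(self, tag: str, table: str, column: str) -> None:
        self.columns_by_tag.get(tag, set()).discard((table, column))


def columns_covered(policy_tag: str, store: TagStore) -> list:
    """A tag-based policy covers every column that currently carries the tag."""
    return sorted(store.columns_by_tag.get(policy_tag, set()))


store = TagStore()
store.add("PII", "sales.customers", "email")
store.add("PII", "hr.employees", "ssn")

# Tags referenced by existing policies (illustrative).
policy_tags = ["PII"]

# One policy, two columns in different schemas.
print(columns_covered("PII", store))

# Untagging one column shrinks the coverage, but the policy itself remains.
store.remove("PII", "sales.customers", "email")
print(columns_covered("PII", store))
print(policy_tags)
```

After the removal, the policy still references PII and still applies to `hr.employees.ssn`, which is why the policy intent should be reviewed whenever a tag is removed.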
Go Deeper
- Data Access Management — creating restriction and masking policies using tags
- Datahub — the ranger-tag-sync component and ingestion pipelines
- SQL Auth: Superset, Trino & Ranger — how tag policies are enforced at query time