🔍 Understanding system.access
Tables in Unity Catalog (Azure Databricks)
Databricks’ Unity Catalog provides a powerful centralized data governance solution. One of its key features is the system.access
schema — a collection of tables that allow administrators and data engineers to audit access, track lineage, and monitor security and compliance.
In this blog, we’ll break down each table in system.access
, share example queries, and explain how you can use the data effectively.
📘 1. assistant_events
Purpose: Logs user interactions with the Databricks AI assistant.
Example Query:
SELECT event_time, event_id, user_agent
FROM system.access.assistant_events
ORDER BY event_time DESC
LIMIT 5;
Use Case:
- Monitor usage patterns of AI assistant
- Track which users are leveraging built-in intelligence features
📘 2. audit
Purpose: Tracks detailed user actions such as SELECT, INSERT, or GRANT on data assets.
Example Query:
SELECT event_time, user_identity.email, action_name, event_id
FROM system.access.audit
--WHERE action_name = 'SELECT'
ORDER BY event_time DESC
LIMIT 5;
Use Case:
- Security audits and compliance
- Identify who accessed sensitive data
- Troubleshoot unauthorized changes
📘 3. clean_room_events
Purpose: Logs secure data collaboration events within clean rooms.
Example Query:
SELECT event_time, user_identity.email, action_name, object_name
FROM system.access.clean_room_events
ORDER BY event_time DESC
LIMIT 5;
Use Case:
- Monitor B2B data sharing under compliance boundaries
- Track external collaborators’ access to shared datasets
📘 4. column_lineage
Purpose: Tracks column-level data lineage across transformations and notebooks.
Example Query:
SELECT source_column_name, target_column_name, entity_type
FROM system.access.column_lineage
LIMIT 5;
Use Case:
- Understand impact of source column changes
- Enable column-level audit trails
- Power lineage visualizations
📘 5. outbound_network
Purpose: Logs all outbound HTTP/S requests from Databricks jobs or notebooks.
Example Query:
SELECT event_time, account_id, destination AS target_url
FROM system.access.outbound_network
ORDER BY event_time DESC
LIMIT 5;
Use Case:
- Detect unauthorized external communication
- Enforce egress policies and network boundaries
- Support SOC/ISO audits
📘 6. table_lineage
Purpose: Shows table-level data flow relationships — e.g., joins, inserts, transformations.
Example Query:
SELECT source_table_name, target_table_name, event_id
FROM system.access.table_lineage
LIMIT 5;
Use Case:
- Trace full pipeline from bronze → silver → gold tables
- Understand dependencies between systems
- Help in data impact analysis
📘 7. workspaces_latest
Purpose: Provides a snapshot of all Databricks workspaces including names and IDs.
Example Query:
SELECT workspace_id, workspace_name, account_id
FROM system.access.workspaces_latest
LIMIT 5;
Use Case:
- Map audit logs (
workspace_id
) to friendly names - Workspace inventory for multi-region accounts
- Merge with
system.access.audit
for more clarity
🧠 Pro Tip: Joining audit
with workspaces_latest
You can enrich your audit logs with workspace names like this:
SELECT
wl.workspace_name,
a.user_identity.email,
a.action_name,
a.event_time
FROM system.access.audit a
JOIN system.access.workspaces_latest wl
ON a.workspace_id = wl.workspace_id
WHERE a.action_name = 'SELECT'
ORDER BY a.event_time DESC
LIMIT 10;
SELECT
wl.account_id,
wl.workspace_id,
wl.workspace_name,
a.user_identity.email AS user_email,
a.action_name,
a.event_time
FROM system.access.audit AS a
JOIN system.access.workspaces_latest AS wl
ON a.workspace_id = wl.workspace_id
WHERE a.action_name = 'SELECT'
ORDER BY a.event_time DESC
LIMIT 100;
Output Columns You’ll Get:
account_id | workspace_id | workspace_name | user_email | action_name | event_time |
---|---|---|---|---|---|
abc-123… | xyz-456… | my-dev-ws | user@org.com | SELECT | 2025-07-02T… |
You’re seeing UUIDs ‘84a3f19c-a964-4ae4-a8f0-c51b23fc696c'
instead of actual email addresses in the email
column from system.access.audit
(or a join with it). This usually means the action was performed by:
- a service principal, or
- a non-interactive identity (like a job, notebook run, or API token)
🔍 How to resolve UUID → User Identity
✅ 1. Check in Unity Catalog system.identity_metadata
Sometimes, the table contains additional metadata like:
sqlCopyEditSELECT DISTINCT
identity_metadata.run_by,
identity_metadata.run_as,
user_identity.subject_name,
user_identity.email
FROM system.access.audit
WHERE user_identity.email = '84a3f19c-a964-4ae4-a8f0-c51b23fc696c';
✅ 2. Look in Admin Console → Service Principals
- Go to Azure Databricks Admin Console
- Click Service Principals or Access Control
- Look for the ID
84a3f19c-a964-4ae4-a8f0-c51b23fc696c
This ID might map to a:
- Service Principal used for pipeline runs
- Azure AD app registration
- Token-based access like Terraform/Databricks CLI
🔍 1. Who Ran What (by Workspace & User)?
SSELECT
wl.workspace_name,
a.user_identity.email AS user_email,
a.action_name,
a.event_time
FROM system.access.audit a
JOIN system.access.workspaces_latest wl
ON a.workspace_id = wl.workspace_id
ORDER BY a.event_time DESC
LIMIT 100;
➡️ Use this to review all user actions across workspaces.
🔐 2. All Failed Actions or Errors
SELECT
wl.workspace_name,
a.user_identity.email AS user_email,
a.action_name,
a.response.status_code,
a.response.error_message,
a.event_time
FROM system.access.audit a
JOIN system.access.workspaces_latest wl
ON a.workspace_id = wl.workspace_id
WHERE a.response.status_code != 200
ORDER BY a.event_time DESC;
➡️ Great for debugging failed queries, notebook errors, or pipeline issues.
🕵️♂️ 3. Table Access (SELECT or INSERT) by Workspace
SELECT
wl.workspace_name,
a.user_identity.email,
a.action_name,
a.event_time
FROM system.access.audit a
JOIN system.access.workspaces_latest wl
ON a.workspace_id = wl.workspace_id
WHERE a.action_name IN ('SELECT', 'INSERT')
ORDER BY a.event_time DESC;
➡️ Use this for table-level access auditing.
🧑💻 4. Identify Service Principal Usage (UUIDs)
SELECT
wl.workspace_name,
a.user_identity.email AS user_or_id,
COUNT(*) AS activity_count
FROM system.access.audit a
JOIN system.access.workspaces_latest wl
ON a.workspace_id = wl.workspace_id
GROUP BY wl.workspace_name, a.user_identity.email
HAVING user_or_id NOT LIKE '%@%' -- likely UUIDs
ORDER BY activity_count DESC;
➡️ Helps you find non-human actors (e.g., automation or service principals).
📆 5. Most Active Users per Workspace (Last 7 Days)
SELECT
wl.workspace_name,
a.user_identity.email AS user_email,
COUNT(*) AS activity_count
FROM system.access.audit a
JOIN system.access.workspaces_latest wl
ON a.workspace_id = wl.workspace_id
WHERE a.event_time >= current_date() - INTERVAL 7 DAYS
GROUP BY wl.workspace_name, a.user_identity.email
ORDER BY activity_count DESC;
➡️ Use this to identify power users or outliers in activity.
🧹 6. Rare Actions (Potentially Suspicious)
SELECT
DISTINCT a.action_name
FROM system.access.audit a
WHERE a.action_name NOT IN ('SELECT', 'INSERT', 'UPDATE', 'DELETE', 'runCommand');
➡️ Helps you uncover less common actions (like permission changes or object drops).
🧾 7. Audit Grant/Permission Changes
SELECT
wl.workspace_name,
a.user_identity.email,
a.action_name,
a.object_name,
a.event_time
FROM system.access.audit a
JOIN system.access.workspaces_latest wl
ON a.workspace_id = wl.workspace_id
WHERE a.action_name IN ('GRANT', 'REVOKE')
ORDER BY a.event_time DESC;
➡️ Use this to track changes in access control.
🚀 Conclusion
The system.access
schema is a goldmine for security, compliance, and observability. Whether you’re building an internal audit dashboard or debugging unexpected table access, these system tables empower you with the transparency you need.
Start with simple queries, and consider integrating this into scheduled reports, alerting workflows, or even Power BI dashboards.
Leave a Reply