,

Understanding system.access Tables in Unity Catalog (Azure Databricks)

Posted by


🔍 Understanding system.access Tables in Unity Catalog (Azure Databricks)

Databricks’ Unity Catalog provides a powerful centralized data governance solution. One of its key features is the system.access schema — a collection of tables that allow administrators and data engineers to audit access, track lineage, and monitor security and compliance.

In this blog, we’ll break down each table in system.access, share example queries, and explain how you can use the data effectively.


📘 1. assistant_events

Purpose: Logs user interactions with the Databricks AI assistant.

Example Query:

SELECT event_time, event_id, user_agent
FROM system.access.assistant_events
ORDER BY event_time DESC
LIMIT 5;

Use Case:

  • Monitor usage patterns of AI assistant
  • Track which users are leveraging built-in intelligence features

📘 2. audit

Purpose: Tracks detailed user actions such as SELECT, INSERT, or GRANT on data assets.

Example Query:

SELECT event_time, user_identity.email, action_name, event_id 
FROM system.access.audit 
--WHERE action_name = 'SELECT' 
ORDER BY event_time DESC 
LIMIT 5;

Use Case:

  • Security audits and compliance
  • Identify who accessed sensitive data
  • Troubleshoot unauthorized changes

📘 3. clean_room_events

Purpose: Logs secure data collaboration events within clean rooms.

Example Query:

SELECT event_time, user_identity.email, action_name, object_name 
FROM system.access.clean_room_events 
ORDER BY event_time DESC 
LIMIT 5;

Use Case:

  • Monitor B2B data sharing under compliance boundaries
  • Track external collaborators’ access to shared datasets

📘 4. column_lineage

Purpose: Tracks column-level data lineage across transformations and notebooks.

Example Query:

SELECT source_column_name, target_column_name, entity_type 
FROM system.access.column_lineage 
LIMIT 5;

Use Case:

  • Understand impact of source column changes
  • Enable column-level audit trails
  • Power lineage visualizations

📘 5. outbound_network

Purpose: Logs all outbound HTTP/S requests from Databricks jobs or notebooks.

Example Query:

SELECT event_time, account_id, destination AS target_url
FROM system.access.outbound_network
ORDER BY event_time DESC
LIMIT 5;

Use Case:

  • Detect unauthorized external communication
  • Enforce egress policies and network boundaries
  • Support SOC/ISO audits

📘 6. table_lineage

Purpose: Shows table-level data flow relationships — e.g., joins, inserts, transformations.

Example Query:

SELECT source_table_name, target_table_name, event_id 
FROM system.access.table_lineage 
LIMIT 5;

Use Case:

  • Trace full pipeline from bronze → silver → gold tables
  • Understand dependencies between systems
  • Help in data impact analysis

📘 7. workspaces_latest

Purpose: Provides a snapshot of all Databricks workspaces including names and IDs.

Example Query:

SELECT workspace_id, workspace_name, account_id 
FROM system.access.workspaces_latest 
LIMIT 5;

Use Case:

  • Map audit logs (workspace_id) to friendly names
  • Workspace inventory for multi-region accounts
  • Merge with system.access.audit for more clarity

🧠 Pro Tip: Joining audit with workspaces_latest

You can enrich your audit logs with workspace names like this:

SELECT 
  wl.workspace_name,
  a.user_identity.email,
  a.action_name,
  a.event_time
FROM system.access.audit a
JOIN system.access.workspaces_latest wl
  ON a.workspace_id = wl.workspace_id
WHERE a.action_name = 'SELECT'
ORDER BY a.event_time DESC
LIMIT 10;

SELECT
wl.account_id,
wl.workspace_id,
wl.workspace_name,
a.user_identity.email AS user_email,
a.action_name,
a.event_time
FROM system.access.audit AS a
JOIN system.access.workspaces_latest AS wl
ON a.workspace_id = wl.workspace_id
WHERE a.action_name = 'SELECT'
ORDER BY a.event_time DESC
LIMIT 100;

Output Columns You’ll Get:

account_idworkspace_idworkspace_nameuser_emailaction_nameevent_time
abc-123…xyz-456…my-dev-wsuser@org.comSELECT2025-07-02T…

You’re seeing UUIDs ‘84a3f19c-a964-4ae4-a8f0-c51b23fc696c' instead of actual email addresses in the email column from system.access.audit (or a join with it). This usually means the action was performed by:

  • a service principal, or
  • a non-interactive identity (like a job, notebook run, or API token)

🔍 How to resolve UUID → User Identity

✅ 1. Check in Unity Catalog system.identity_metadata

Sometimes, the table contains additional metadata like:

sqlCopyEditSELECT DISTINCT 
  identity_metadata.run_by, 
  identity_metadata.run_as,
  user_identity.subject_name,
  user_identity.email
FROM system.access.audit
WHERE user_identity.email = '84a3f19c-a964-4ae4-a8f0-c51b23fc696c';

✅ 2. Look in Admin Console → Service Principals

  1. Go to Azure Databricks Admin Console
  2. Click Service Principals or Access Control
  3. Look for the ID 84a3f19c-a964-4ae4-a8f0-c51b23fc696c

This ID might map to a:

  • Service Principal used for pipeline runs
  • Azure AD app registration
  • Token-based access like Terraform/Databricks CLI


🔍 1. Who Ran What (by Workspace & User)?

SSELECT 
  wl.workspace_name,
  a.user_identity.email AS user_email,
  a.action_name,
  a.event_time
FROM system.access.audit a
JOIN system.access.workspaces_latest wl
  ON a.workspace_id = wl.workspace_id
ORDER BY a.event_time DESC
LIMIT 100;

➡️ Use this to review all user actions across workspaces.


🔐 2. All Failed Actions or Errors

SELECT 
  wl.workspace_name,
  a.user_identity.email AS user_email,
  a.action_name,
  a.response.status_code,
  a.response.error_message,
  a.event_time
FROM system.access.audit a
JOIN system.access.workspaces_latest wl
  ON a.workspace_id = wl.workspace_id
WHERE a.response.status_code != 200
ORDER BY a.event_time DESC;

➡️ Great for debugging failed queries, notebook errors, or pipeline issues.


🕵️‍♂️ 3. Table Access (SELECT or INSERT) by Workspace


SELECT 
  wl.workspace_name,
  a.user_identity.email,
  a.action_name,
  a.event_time
FROM system.access.audit a
JOIN system.access.workspaces_latest wl
  ON a.workspace_id = wl.workspace_id
WHERE a.action_name IN ('SELECT', 'INSERT')
ORDER BY a.event_time DESC;

➡️ Use this for table-level access auditing.


🧑‍💻 4. Identify Service Principal Usage (UUIDs)

SELECT 
  wl.workspace_name,
  a.user_identity.email AS user_or_id,
  COUNT(*) AS activity_count
FROM system.access.audit a
JOIN system.access.workspaces_latest wl
  ON a.workspace_id = wl.workspace_id
GROUP BY wl.workspace_name, a.user_identity.email
HAVING user_or_id NOT LIKE '%@%'  -- likely UUIDs
ORDER BY activity_count DESC;

➡️ Helps you find non-human actors (e.g., automation or service principals).


📆 5. Most Active Users per Workspace (Last 7 Days)

SELECT 
  wl.workspace_name,
  a.user_identity.email AS user_email,
  COUNT(*) AS activity_count
FROM system.access.audit a
JOIN system.access.workspaces_latest wl
  ON a.workspace_id = wl.workspace_id
WHERE a.event_time >= current_date() - INTERVAL 7 DAYS
GROUP BY wl.workspace_name, a.user_identity.email
ORDER BY activity_count DESC;

➡️ Use this to identify power users or outliers in activity.


🧹 6. Rare Actions (Potentially Suspicious)

SELECT 
  DISTINCT a.action_name
FROM system.access.audit a
WHERE a.action_name NOT IN ('SELECT', 'INSERT', 'UPDATE', 'DELETE', 'runCommand');

➡️ Helps you uncover less common actions (like permission changes or object drops).


🧾 7. Audit Grant/Permission Changes

SELECT 
  wl.workspace_name,
  a.user_identity.email,
  a.action_name,
  a.object_name,
  a.event_time
FROM system.access.audit a
JOIN system.access.workspaces_latest wl
  ON a.workspace_id = wl.workspace_id
WHERE a.action_name IN ('GRANT', 'REVOKE')
ORDER BY a.event_time DESC;

➡️ Use this to track changes in access control.


🚀 Conclusion

The system.access schema is a goldmine for security, compliance, and observability. Whether you’re building an internal audit dashboard or debugging unexpected table access, these system tables empower you with the transparency you need.

Start with simple queries, and consider integrating this into scheduled reports, alerting workflows, or even Power BI dashboards.


Leave a Reply

Your email address will not be published. Required fields are marked *

0
Would love your thoughts, please comment.x
()
x