Auditing User Access in Databricks with System Tables: From Basics to Advanced

Posted by


🔍 Auditing User Access in Databricks with System Tables: From Basics to Advanced

Managing and auditing user access is a critical part of maintaining a secure and compliant data platform. With the rise of multi-tenant workspaces and collaborative environments, Databricks’ Unity Catalog combined with system tables provides a powerful and scalable way to track and audit user activities and access.

In this blog, we’ll walk through the end-to-end process of auditing user access in Databricks, starting from the basics and advancing into more granular and complex techniques using system tables.


📘 Table of Contents

  1. Introduction to Unity Catalog & System Tables
  2. Key System Schemas for Auditing
  3. Getting Started: Viewing Grants and Permissions
  4. Tracking Access Patterns and Activity
  5. Advanced Auditing with Join Scenarios
  6. Visualizing Audit Data
  7. Security Best Practices for Access Audits
  8. Conclusion

1. 🧠 Introduction to Unity Catalog & System Tables

Unity Catalog is Databricks’ unified governance solution for data and AI assets. It enables centralized access control, auditing, and lineage tracking across all Databricks workspaces.

To support auditing, system tables were introduced. These are built-in tables automatically populated by Databricks and can be queried like regular tables using SQL.

Why System Tables Matter:

  • Centralized logging of user actions
  • SQL-based access to audit data
  • Ready integration with BI tools
  • Real-time monitoring capabilities

2. 🗃️ Key System Schemas for Auditing

Schema NamePurpose
system.accessTracks access events, successful and failed
system.computeLogs cluster and compute usage events
system.billingTracks usage and cost details
system.information_schemaShows grants, users, and privileges

3. 🧾 Getting Started: Viewing Grants and Permissions

Start with the information_schema.table_privileges table to understand who has access to what.

🔍 Example 1: List all table-level privileges

SELECT *
FROM system.information_schema.table_privileges
WHERE grantee LIKE '%@%';  -- filters for users

🔍 Example 2: View privileges granted to groups

SELECT *
FROM system.information_schema.table_privileges
WHERE grantee_type = 'GROUP';

🔍 Example 3: Check privileges for a specific schema or user

SELECT *
FROM system.information_schema.schema_privileges
WHERE grantee = 'analyst_group';

These tables show:

  • Object names
  • Privilege types (SELECT, MODIFY, etc.)
  • Grantor/grantee details

4. 📊 Tracking Access Patterns and Activity

Use the system.access.audit table for detailed logs on access events. This table logs actions such as read/write access, permission changes, and login attempts.

🔍 Example 4: Track who accessed which table

SELECT user_identity.email, object_name, action_name, event_time
FROM system.access.audit
WHERE action_name = 'SELECT'
  AND object_type = 'TABLE'
  AND event_time >= current_date() - INTERVAL 7 days
ORDER BY event_time DESC;

🔍 Example 5: Detect failed access attempts

SELECT *
FROM system.access.audit
WHERE status != 'SUCCESS'
ORDER BY event_time DESC;

5. 🔬 Advanced Auditing with Join Scenarios

You can join multiple system tables to enrich your access audit reports.

🔍 Example 6: Join access logs with cluster metadata

SELECT a.user_identity.email, a.action_name, a.object_name, c.cluster_id, c.cluster_name
FROM system.access.audit a
JOIN system.compute.clusters c
  ON a.request_params:cluster_id = c.cluster_id
WHERE a.event_time >= current_date() - INTERVAL 30 days;

🔍 Example 7: Map user actions to workspace names

If you manage multiple workspaces, you can tag audit entries with workspace identifiers via metadata or custom tagging (using custom_tags on clusters or jobs).


6. 📈 Visualizing Audit Data

Databricks system tables can be connected to Power BI, Tableau, or Databricks SQL Dashboards for visual insights.

Dashboard Ideas:

  • Weekly user access heatmaps
  • Top 10 most accessed datasets
  • Anomalous access detection (e.g., SELECTs on rarely used tables)
  • Failed access attempt trends

7. 🔐 Security Best Practices for Access Audits

  • Enable Unity Catalog across all workspaces to centralize control
  • Use Groups over individual users for easier privilege management
  • Set up alerts on unusual activity (e.g., many failed attempts)
  • Regularly review information_schema grants for overprovisioned access
  • Use column-level lineage to understand downstream impacts of data reads

8. ✅ Conclusion

Auditing user access in Databricks has never been easier thanks to system tables and Unity Catalog. Whether you’re doing a quick compliance check or building a full-fledged monitoring solution, these tools offer deep insights with simple SQL.

By following the steps outlined above—from basic queries to complex joins and dashboards—you can build a robust and automated access auditing framework.


🔗 Resources


Leave a Reply

Your email address will not be published. Required fields are marked *

0
Would love your thoughts, please comment.x
()
x