Data Masking Is Not Applying Correctly in Databricks

Introduction

Data masking in Databricks Unity Catalog secures sensitive data, supports compliance with privacy regulations, and enforces access control policies. In Unity Catalog a mask is a SQL user-defined function (UDF) attached to a column with ALTER TABLE ... SET MASK; this guide refers to that function as a masking policy. When masking is not applied correctly, the common symptoms are:

  • Masked columns still show raw data for unauthorized users.
  • Data masking rules do not take effect on queries.
  • Different users see different masking behaviors unexpectedly.
  • Performance issues with masked queries in Unity Catalog.

This guide covers troubleshooting steps and fixes to ensure that data masking applies correctly in Unity Catalog.


1. Verify That Column Masking Policies Are Defined Correctly

Symptoms:

  • Unauthorized users see unmasked data.
  • Some columns apply masking while others do not.

Causes:

  • Masking policies were not created or applied correctly.
  • Policy syntax errors prevent execution.
  • The wrong policy is assigned to the column.

Fix:

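Ensure the masking function is created correctly. Unity Catalog has no CREATE MASKING POLICY statement; a column mask is a SQL UDF whose first parameter receives the column value:

CREATE OR REPLACE FUNCTION my_catalog.my_schema.mask_ssn(val STRING)
RETURNS STRING
RETURN CASE
    WHEN is_account_group_member('finance') THEN val
    ELSE 'XXX-XX-XXXX'
END;

Apply the masking function to the correct column:

ALTER TABLE my_catalog.my_schema.customers
ALTER COLUMN ssn SET MASK my_catalog.my_schema.mask_ssn;

Check that the mask is attached to the column:

SELECT column_name, mask_catalog, mask_schema, mask_name
FROM my_catalog.information_schema.column_masks
WHERE table_schema = 'my_schema' AND table_name = 'customers';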
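
If the check shows the wrong function on the column, one way to swap it is to drop the existing mask and attach the intended one. This is a minimal sketch that assumes mask_ssn exists as a SQL UDF in my_catalog.my_schema; adjust the names to your environment.

```sql
-- Remove whatever mask is currently attached to the column.
ALTER TABLE my_catalog.my_schema.customers
ALTER COLUMN ssn DROP MASK;

-- Attach the intended mask function (assumed to exist already).
ALTER TABLE my_catalog.my_schema.customers
ALTER COLUMN ssn SET MASK my_catalog.my_schema.mask_ssn;
```

Between the two statements the column is unmasked, so it is safer to run them when no one else is querying the table.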

Test the policy by querying the table as different users:

SELECT ssn FROM my_catalog.my_schema.customers;
  • A finance user should see actual values.
  • Other users should see masked values (XXX-XX-XXXX).

2. Check If Users Have the Correct Permissions

Symptoms:

  • Masking policy is applied, but some users see raw data.
  • Certain users cannot query masked columns.

Causes:

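  • The user belongs to a group that the mask function exempts, so the function returns the raw value.
  • The user lacks SELECT on the table (or USE CATALOG / USE SCHEMA on its parents), so the query fails before masking is evaluated.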

Fix:

Unity Catalog has no UNMASK privilege, so there is nothing to revoke; who sees raw values is decided by the logic inside the mask function. Inspect the function body to confirm which groups it exempts:

DESCRIBE FUNCTION EXTENDED my_catalog.my_schema.mask_ssn;

Verify which users can query the table at all:

SHOW GRANTS ON TABLE my_catalog.my_schema.customers;

Check the group membership that the function sees by running this as the affected user (for example, user@example.com):

SELECT current_user() AS user_name, is_account_group_member('finance') AS in_finance;

3. Check If the Policy Works on All Query Types

Symptoms:

  • Masking applies to SELECT queries but fails on JOINs or Aggregations.
  • Users can extract unmasked data through workarounds.

Causes:

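  • Masks are evaluated when the column is read, so JOINs and aggregations operate on masked values; a masked user may get NULL or meaningless results rather than an error.
  • A mask whose return type does not match the column type can fail or behave unexpectedly once the column is used in an aggregation or JOIN.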

Fix:

Design the mask so that its return type matches the column type. A mask on a DOUBLE column that returns a string such as 'Confidential' cannot be cast back to DOUBLE, so return NULL (or a sentinel number) instead:

CREATE OR REPLACE FUNCTION my_catalog.my_schema.mask_salary(val DOUBLE)
RETURNS DOUBLE
RETURN CASE
    WHEN is_account_group_member('executive') THEN val
    ELSE NULL
END;

Apply the mask to columns used in aggregations and JOINs:

ALTER TABLE my_catalog.my_schema.payroll
ALTER COLUMN salary SET MASK my_catalog.my_schema.mask_salary;

Test different queries to ensure the mask is applied:

SELECT salary FROM my_catalog.my_schema.payroll;
SELECT AVG(salary) FROM my_catalog.my_schema.payroll;
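
Beyond a plain SELECT and AVG, it can help to check filters and grouping, since these are the places where masked values tend to surprise people. The queries below are a sketch against the payroll table used above; the 100000 threshold is an arbitrary example value. Run them once as a member of the exempted group and once as a regular user, then compare the results.

```sql
-- Filter on the masked column: a masked user should not be able to
-- infer raw salaries from which rows match.
SELECT COUNT(*) AS high_earners
FROM my_catalog.my_schema.payroll
WHERE salary > 100000;

-- Group by the masked column: with a mask that returns a constant,
-- a masked user should see a single group, not one per real salary.
SELECT salary, COUNT(*) AS n
FROM my_catalog.my_schema.payroll
GROUP BY salary;
```

If a regular user gets the same results as an exempted user, the mask is probably not attached to the column being queried.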

4. Masking Policies Are Not Applying on Delta Tables

Symptoms:

  • Masking works on standard tables but not on Delta tables.
  • Users can still read raw data from Delta tables.

Causes:

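  • The table is registered in the legacy hive_metastore catalog; column masks are enforced only on tables registered in Unity Catalog.
  • The data is read directly by storage path instead of through the Unity Catalog table name, so the mask attached to the table is never evaluated.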

Fix:

Ensure the Delta table is registered in Unity Catalog. No table property switches this on; the table must be addressable by a three-level name (catalog.schema.table) in a Unity Catalog catalog. Check the table's catalog, type, and location in the output of:

DESCRIBE FORMATTED my_catalog.my_schema.sales;

  • If the table lives in hive_metastore or exists only as files at a storage path, create a Unity Catalog table from it before applying a mask.

Create a managed Unity Catalog table from path-based Delta data (sales_managed is a placeholder name chosen to avoid clashing with an existing table):

CREATE TABLE my_catalog.my_schema.sales_managed
AS SELECT * FROM delta.`s3://my-bucket/sales-data/`;

Then set the mask on the Unity Catalog table's columns (mask_credit_card is assumed to be a SQL UDF created like mask_ssn in section 1):

ALTER TABLE my_catalog.my_schema.sales
ALTER COLUMN credit_card_number SET MASK my_catalog.my_schema.mask_credit_card;
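
As a quicker alternative to reading DESCRIBE output, the table type can be looked up in the catalog's information schema. This is a sketch that assumes the sales table from this section; table_type is expected to be a value such as MANAGED or EXTERNAL.

```sql
-- Look up how Unity Catalog has registered the table.
SELECT table_catalog, table_schema, table_name, table_type
FROM my_catalog.information_schema.tables
WHERE table_schema = 'my_schema'
  AND table_name = 'sales';
```

If the query returns no rows, the table is not registered in my_catalog under that name, which would explain why no mask is enforced on it.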

5. Masking Is Not Enforced in Notebooks or Jobs

Symptoms:

  • Masking works in SQL editor but not in notebooks.
  • Jobs running on clusters see unmasked data.

Causes:

  • The cluster's access mode or Databricks Runtime version does not support Unity Catalog column masks.
  • The notebook or job runs as an identity that belongs to a group exempted by the mask function.

Fix:

Ensure the cluster is Unity Catalog-enabled:

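  • Go to Compute, open the cluster, and confirm that its access mode supports Unity Catalog (Standard, formerly Shared, or Dedicated, formerly Single user). There is no separate "Enable Unity Catalog" checkbox, and clusters in No isolation shared mode cannot use Unity Catalog.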
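
A quick way to sanity-check the compute from inside a notebook is to ask which metastore and identity the session is using. This is a hedged sketch: it assumes the current_metastore() function is available on your runtime, and the exact behavior on compute without Unity Catalog may vary (typically an error).

```sql
-- On Unity Catalog-enabled compute this should return a metastore ID
-- and the identity that masks will be evaluated against.
SELECT current_metastore() AS metastore_id,
       current_user()      AS session_user;
```

If the statement fails or the identity is not the one you expected, fix the compute configuration before debugging the mask function itself.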

Run the query using a Unity Catalog-enabled SQL Warehouse:

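  • SQL warehouses in a Unity Catalog-enabled workspace evaluate column masks, which makes them a reliable baseline: if masking works there but not on a cluster, the cluster configuration is the likely problem.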

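Ensure job service principals are not members of a group that the mask function exempts (Unity Catalog has no UNMASK privilege to revoke). Run this inside the job to see the identity and membership it actually uses:

SELECT current_user() AS job_identity, is_account_group_member('finance') AS in_finance;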

6. Check If Masking Applies to Views and Derived Tables

Symptoms:

  • Data masking works on base tables but not on views.
  • Users can bypass masking by querying a derived view.

Causes:

  • Column masks cannot be attached to views; a view is protected only if its base table is masked or the view applies masking logic itself.
  • Views may reference unmasked base tables.

Fix:

Apply the masking function inside the view definition (this works when mask_ssn is a SQL UDF):

CREATE VIEW my_catalog.my_schema.masked_customers AS
SELECT id, my_catalog.my_schema.mask_ssn(ssn) AS ssn, name FROM my_catalog.my_schema.customers;

Column masks cannot be attached to a view with ALTER VIEW. To protect every view at once, set the mask on the base table column instead:

ALTER TABLE my_catalog.my_schema.customers
ALTER COLUMN ssn SET MASK my_catalog.my_schema.mask_ssn;

Check the view definition to ensure masking is not bypassed:

SHOW CREATE TABLE my_catalog.my_schema.masked_customers;
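
Where a view needs different masking from the base table, the logic can also be written inline as a dynamic view. The sketch below is illustrative only: the view name customers_partial_ssn and the choice to reveal the last four digits are assumptions, not part of the original setup.

```sql
-- Dynamic view: the CASE expression is evaluated for whoever queries the view.
CREATE OR REPLACE VIEW my_catalog.my_schema.customers_partial_ssn AS
SELECT
    id,
    name,
    CASE
        WHEN is_account_group_member('finance') THEN ssn
        ELSE concat('XXX-XX-', right(ssn, 4))  -- keep only the last 4 digits
    END AS ssn
FROM my_catalog.my_schema.customers;
```

This only protects data if users are granted access to the view and not to the unmasked base table. If the base table column already has a mask, the view receives the masked value and the inline logic operates on that.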

7. Masking Performance Issues on Large Datasets

Symptoms:

  • Masked queries run slower than expected.
  • Masking adds noticeable overhead in analytics queries.

Causes:

  • Masking functions add computational overhead on large datasets.
  • Complex CASE statements slow down execution.

Fix:

Optimize masking logic for performance by keeping the function to a single, cheap expression:

CREATE OR REPLACE FUNCTION my_catalog.my_schema.optimized_masking(val STRING)
RETURNS STRING
RETURN CASE WHEN is_account_group_member('finance') THEN val ELSE repeat('*', length(val)) END;

Apply masking only where necessary (avoid masking columns that are used heavily as join or filter keys).

Use caching for frequently queried masked columns:

CACHE SELECT id, masked_column FROM my_catalog.my_schema.sales;

Step-by-Step Troubleshooting Checklist

1. Verify That the Masking Policy Is Defined and Applied

SELECT column_name, mask_catalog, mask_schema, mask_name
FROM my_catalog.information_schema.column_masks
WHERE table_schema = 'my_schema' AND table_name = 'customers';

2. Check If Users Have the Correct Privileges

SHOW GRANTS ON TABLE my_catalog.my_schema.customers;

3. Ensure the Masking Policy Works on Aggregations and Joins

SELECT AVG(salary) FROM my_catalog.my_schema.payroll;

4. Verify That the Table Is a Unity Catalog Managed Table

DESCRIBE FORMATTED my_catalog.my_schema.sales;

5. Ensure Notebooks and Jobs Use Unity Catalog-Enabled Clusters

  • Check that the cluster supports Unity Catalog.
  • Ensure jobs do not bypass security policies.

Conclusion

If data masking is not applying correctly in Databricks, ensure that:

  • Masking functions are properly defined and attached to the correct columns.
  • Users who should see masked data are not members of a group that the mask function exempts (there is no UNMASK privilege to revoke).
  • Delta tables are registered in Unity Catalog so that masks can be enforced.
  • Queries, joins, views, and aggregations still respect the mask.
  • Unity Catalog-enabled clusters and SQL warehouses are used for enforcement.

By following this guide, you can successfully apply and enforce data masking policies in Databricks!
