Introduction
The VACUUM command in Delta Lake deletes data files that are no longer referenced by the transaction log and are older than the retention threshold, which reduces storage costs and keeps table maintenance manageable. However, when VACUUM is misconfigured or run at the wrong time, it can fail, run very slowly, or cause accidental data loss.
🚨 Common issues with VACUUM in Delta tables:
- VACUUM takes too long or fails to complete.
- Files are not deleted as expected, leading to storage bloat.
- Accidental data loss due to improper retention settings.
- Errors due to Delta log corruption or missing references.
This guide explores the common causes of Delta VACUUM issues, troubleshooting steps, and best practices to ensure smooth execution.
How Delta VACUUM Works
What Does VACUUM Do?
- Deletes older data files that are no longer referenced in the Delta log.
- Helps reduce storage costs by cleaning up unnecessary files.
- Works alongside OPTIMIZE to improve query performance.
Delta VACUUM Syntax
VACUUM delta.`/mnt/delta/table/` RETAIN 168 HOURS; -- Keeps last 7 days of history
💡 Default retention period = 7 days (168 hours).
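The same cleanup can be triggered from Python with the Delta Lake API. A minimal sketch, assuming a Databricks notebook (where spark is the active SparkSession and the Delta library is preinstalled) and the illustrative path /mnt/delta/table/:

from delta.tables import DeltaTable

# Bind to the Delta table by path (path is illustrative)
table = DeltaTable.forPath(spark, "/mnt/delta/table/")

# Delete unreferenced data files older than 168 hours (the 7-day default)
table.vacuum(168)

Calling vacuum() with no argument uses the table's configured retention, which defaults to 7 days.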
Common Delta Table VACUUM Issues and Fixes
1. VACUUM Does Not Delete Old Files
Symptoms:
- Storage usage does not decrease even after running VACUUM.
- Old files still exist in cloud storage (S3, ADLS, GCS) after VACUUM execution.
- No errors in logs, but the expected files remain in place.
Causes:
- Files are still within the retention window: only files older than the retention threshold (7 days by default on Databricks) are eligible for deletion.
- Active time travel snapshots prevent deletion.
- Files are still referenced in Delta logs, so they cannot be removed.
Fix:
✅ Review the table history before running VACUUM to see which versions and operations still exist:
DESCRIBE HISTORY delta.`/mnt/delta/table/`
✅ Set the retention period explicitly; only files older than this threshold are eligible for deletion (168 hours matches the 7-day default):
VACUUM delta.`/mnt/delta/table/` RETAIN 168 HOURS;
💡 A retention shorter than 7 days is blocked unless spark.databricks.delta.retentionDurationCheck.enabled is set to false, and it shrinks the time travel window.
✅ Preview which files are eligible for deletion with a dry run (see the notebook sketch after this list):
VACUUM delta.`/mnt/delta/table/` DRY RUN;
✅ Ensure that old snapshots are not referenced by time travel queries.
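Before deleting anything, a dry run shows which files VACUUM considers eligible, and DESCRIBE DETAIL exposes any retention-related table properties. A minimal notebook sketch, assuming the same illustrative path:

# List the files VACUUM would delete (a bounded sample), without deleting them
spark.sql("VACUUM delta.`/mnt/delta/table/` RETAIN 168 HOURS DRY RUN").show(truncate=False)

# Inspect table properties such as delta.deletedFileRetentionDuration (if set)
spark.sql("DESCRIBE DETAIL delta.`/mnt/delta/table/`").select("properties").show(truncate=False)

If the dry run returns no files, nothing is old enough to be deleted yet, which explains why storage usage does not drop.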
2. VACUUM Is Extremely Slow or Hangs
Symptoms:
- VACUUM takes hours to complete or never finishes.
- High compute usage without significant file deletions.
- Databricks cluster becomes unresponsive during VACUUM execution.
Causes:
- Too many small files, increasing metadata overhead.
- Large Delta log files requiring excessive processing.
- High concurrent activity on the Delta table (writes/reads blocking VACUUM).
Fix:
✅ Run OPTIMIZE before VACUUM to reduce small files:
OPTIMIZE delta.`/mnt/delta/table/` ZORDER BY (primary_column);
✅ Check Delta log size before running VACUUM:
du -sh /dbfs/mnt/delta/table/_delta_log/
✅ Increase Databricks cluster resources if necessary:
- Use larger clusters with more memory and CPUs for VACUUM execution.
- Enable auto-scaling to dynamically allocate more resources.
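The du command above relies on the /dbfs FUSE mount; the same checks can be run directly in a notebook with dbutils and the DataFrame API. A sketch under the same path assumption:

# Approximate size of the transaction log
log_files = dbutils.fs.ls("dbfs:/mnt/delta/table/_delta_log/")
log_mb = sum(f.size for f in log_files) / (1024 * 1024)
print(f"_delta_log: {len(log_files)} files, {log_mb:.1f} MB")

# Number of data files in the current snapshot; a very large count of tiny files
# suggests running OPTIMIZE before VACUUM
data_files = spark.read.format("delta").load("/mnt/delta/table/").inputFiles()
print(f"current snapshot references {len(data_files)} data files")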
3. VACUUM Fails with “Cannot delete active files” Error
Symptoms:
- Error: “Cannot delete active files that are referenced by Delta Log.”
- Some files remain undeleted even after running VACUUM.
- MERGE, UPDATE, or DELETE operations on the table fail afterward.
Causes:
- Delta log corruption or missing metadata prevents file cleanup.
- Concurrent modifications (MERGE, DELETE, UPDATE) while VACUUM is running.
- Files are still referenced by active Delta transactions.
Fix:
✅ Check active transactions before running VACUUM:
DESCRIBE HISTORY delta.`/mnt/delta/table/`
✅ Never delete the _delta_log directory; removing the transaction log destroys the table. If the log references files that are missing from storage, remove the dangling entries instead (preview sketch after this list):
FSCK REPAIR TABLE delta.`/mnt/delta/table/`;
✅ Ensure that no writes or schema changes occur during VACUUM.
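Running FSCK REPAIR TABLE with DRY RUN first previews which missing-file references would be dropped before anything is changed. A sketch, assuming the path-based identifier is accepted on your runtime (a registered table name works as well):

# Preview which missing-file references would be removed from the log (no changes made)
spark.sql("FSCK REPAIR TABLE delta.`/mnt/delta/table/` DRY RUN").show(truncate=False)

# Remove the dangling references once the preview looks right
spark.sql("FSCK REPAIR TABLE delta.`/mnt/delta/table/`")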
4. Accidental Data Loss After Running VACUUM
Symptoms:
- Cannot access previous versions of the Delta table.
- Rollback via Time Travel fails (“Version not found” error).
- Unexpected data deletion or missing records after VACUUM.
Causes:
- Retention period set too low, leading to premature file deletion.
- VACUUM removed files that were still referenced by old queries.
- No proper backups before running VACUUM.
Fix:
✅ Check the available versions before running VACUUM:
DESCRIBE HISTORY delta.`/mnt/delta/table/`
✅ Increase the retention period to preserve old versions:
VACUUM delta.`/mnt/delta/table/` RETAIN 168 HOURS;
✅ If data is lost, restore from cloud storage backups (S3, ADLS, GCS); if the required files were not yet vacuumed, an earlier table version can be rolled back instead (see the sketch below).
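When the files for an earlier version are still present, the table can be rolled back with RESTORE rather than reloading a backup. A sketch, assuming a Databricks Runtime with Delta RESTORE support, the same illustrative path, and an illustrative version number:

# Find a version worth rolling back to
spark.sql("DESCRIBE HISTORY delta.`/mnt/delta/table/`").select("version", "timestamp", "operation").show(truncate=False)

# Roll the table back to that version (only works if its data files still exist)
spark.sql("RESTORE TABLE delta.`/mnt/delta/table/` TO VERSION AS OF 42")  # 42 is illustrative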
Step-by-Step Troubleshooting Guide
1. Verify Which Files Are Still Referenced vs. Eligible for Deletion
VACUUM delta.`/mnt/delta/table/` DRY RUN;
2. Check the Delta Log for Missing or Unusually Large Entries
ls -lh /dbfs/mnt/delta/table/_delta_log/
3. Identify and Optimize Small Files Before Running VACUUM
OPTIMIZE delta.`/mnt/delta/table/` ZORDER BY (primary_column);
4. Validate Whether Active Time Travel Queries Are Preventing Deletion
DESCRIBE HISTORY delta.`/mnt/delta/table/`;
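A quick way to confirm step 4 is to probe an old version directly: if time travel to that version still works, its files have not been removed and are still protected by the retention window. A sketch with an illustrative version number and path:

# Probe whether version 42 (illustrative) is still readable via time travel
try:
    spark.read.format("delta").option("versionAsOf", 42).load("/mnt/delta/table/").limit(1).collect()
    print("version 42 is still readable; its files have not been removed")
except Exception as err:
    print(f"version 42 is no longer readable: {err}")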
Best Practices to Prevent VACUUM Issues
✅ Always Run OPTIMIZE Before VACUUM
- Reduces small files and improves performance.
✅ Keep a Safe Retention Period (At Least 7 Days)
- Prevents accidental deletion of recent data versions.
✅ Check for Active Queries Before Running VACUUM
- Ensure that no long-running queries are referencing old data.
✅ Enable Auto Compaction and Optimize Write
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "true")
spark.conf.set("spark.databricks.delta.autoOptimize.enabled", "true")
✅ Use Cloud Storage Backups for Safety
- Keep snapshot backups of Delta tables before running VACUUM.
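The spark.conf settings above apply only to the current cluster or session; the same behavior can be pinned to the table itself through table properties so that every writer benefits. A sketch, assuming the path-based identifier is accepted on your runtime:

spark.sql("""
  ALTER TABLE delta.`/mnt/delta/table/`
  SET TBLPROPERTIES (
    'delta.autoOptimize.optimizeWrite' = 'true',
    'delta.autoOptimize.autoCompact'   = 'true'
  )
""")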
Real-World Example: Fixing a Slow Delta Table VACUUM Execution
Scenario:
A Delta table with 100 million rows was running VACUUM for over 5 hours and failing intermittently.
Root Cause:
- The table had too many small files, causing metadata overload.
- Delta log had grown too large, slowing down VACUUM execution.
- Concurrent writes were preventing file deletion.
Solution:
1. Ran OPTIMIZE to merge small files before VACUUM:
OPTIMIZE delta.`/mnt/delta/table/` ZORDER BY (customer_id);
2. Checked active queries to ensure no conflicting reads/writes.
3. Executed VACUUM with a safe retention period:
VACUUM delta.`/mnt/delta/table/` RETAIN 168 HOURS;
4. Monitored performance in Spark UI and logs.
✅ Result: VACUUM completed successfully in 20 minutes instead of 5 hours.
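For reference, the remediation above can be scripted as a single notebook cell so the same sequence is reproducible on future runs. A sketch using the same illustrative path and column:

path = "/mnt/delta/table/"  # illustrative

spark.sql(f"OPTIMIZE delta.`{path}` ZORDER BY (customer_id)")           # 1. merge small files
spark.sql(f"DESCRIBE HISTORY delta.`{path}`").show(5, truncate=False)   # 2. confirm no conflicting writes
spark.sql(f"VACUUM delta.`{path}` RETAIN 168 HOURS")                    # 3. clean up with a safe retention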
Conclusion
Delta Table VACUUM is essential for storage optimization and performance improvement, but it can cause slow execution, failures, and accidental data loss if not used correctly. By following best practices, monitoring retention settings, and optimizing Delta tables, teams can safely and efficiently manage storage in Databricks.