In real-world data ingestion scenarios, it’s common to deal with inconsistent or malformed data. When you’re enforcing a schema in Databricks, these mismatches can cause ingestion failures or data loss unless you use a feature called the rescued data column.
This post explains what the rescued data column is, how it works, and why it’s useful.

What is the Rescued Data Column?
The rescued data column is a special column in Databricks (commonly named _rescued_data) that automatically captures any fields from incoming data that do not match the expected schema.
Instead of discarding bad or unexpected data, Databricks stores it as a JSON-formatted string, so you can inspect, clean, and reprocess it later.
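When you supply an explicit schema, the column is enabled by naming it via the `rescuedDataColumn` reader option (with inferred schemas, Auto Loader adds it automatically). A minimal sketch, assuming a Databricks cluster where `spark` is predefined; the path and schema are placeholders:

```python
# Sketch: enabling the rescued data column on a Databricks JSON read.
# The input path and schema below are illustrative, not real values.
df = (
    spark.read
    .format("json")
    .schema("users STRING, cost BIGINT")
    .option("rescuedDataColumn", "_rescued_data")  # capture schema mismatches here
    .load("/path/to/raw/")
)
```

This is a configuration sketch for a Databricks runtime, not code runnable locally.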
How it Works: An Example

Imagine you’re ingesting data from a file into a Bronze table with the following schema:
| Column | Data Type |
|---|---|
| users | STRING |
| cost | BIGINT |
Incoming Data
| users | cost |
|---|---|
| peter | $100 |
| zebi | 300 |
What Happens During Ingestion
- Row 1 (Peter)
  - The value `"$100"` is not a valid BIGINT.
  - The `cost` column stores `null` for this row.
  - The original malformed data is captured in the `_rescued_data` column as `{"cost": "$100", "_file_path": "<file_path>"}`.
- Row 2 (Zebi)
  - The value `300` is a valid BIGINT.
  - It’s stored directly in the `cost` column.
  - `_rescued_data` remains `null`.
Resulting Bronze Table
| users | cost | _rescued_data |
|---|---|---|
| peter | null | {"cost":"$100","_file_path":"<file_path>"} |
| zebi | 300 | null |
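The rescue rule behind this table can be simulated in plain Python to make it concrete (this only mimics the behavior; it is not the Databricks implementation, and the file path argument is a placeholder):

```python
import json

# Expected schema: maps each column name to a casting function.
SCHEMA = {"users": str, "cost": int}

def ingest_row(row, file_path):
    """Cast each field to its declared type; rescue mismatches as a JSON string."""
    result = {}
    rescued = {}
    for column, cast in SCHEMA.items():
        try:
            result[column] = cast(row[column])
        except (ValueError, TypeError):
            result[column] = None          # the typed column becomes null
            rescued[column] = row[column]  # the raw value is preserved
    if rescued:
        rescued["_file_path"] = file_path
        result["_rescued_data"] = json.dumps(rescued)
    else:
        result["_rescued_data"] = None
    return result
```

Calling `ingest_row({"users": "peter", "cost": "$100"}, ...)` yields a `null` cost with the raw `"$100"` preserved in `_rescued_data`, matching the table above.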
Why This is Useful
- Prevents Data Loss – You don’t lose records just because a field doesn’t match the schema.
- Easier Debugging – You can track the original malformed values and the file they came from.
- Flexible Cleaning – You can later parse `_rescued_data` to fix and reprocess problematic fields.
- Perfect for the Bronze Layer – This approach preserves raw data for auditing while still allowing schema enforcement.
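For instance, a rescued `cost` of `"$100"` can be repaired by parsing the JSON string and stripping the currency symbol (a minimal sketch; the right cleanup rule depends on your data):

```python
import json

# A rescued value as stored in the _rescued_data column.
rescued = '{"cost": "$100", "_file_path": "<file_path>"}'

fields = json.loads(rescued)
# Strip the "$" prefix and cast to int to recover a valid BIGINT value.
fixed_cost = int(fields["cost"].lstrip("$"))
print(fixed_cost)  # 100
```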
Best Practices
- Enable the Rescued Data Column for Bronze layer tables where raw ingestion happens.
- Periodically review `_rescued_data` contents to detect data quality issues early.
- Automate cleanup by parsing `_rescued_data` and applying transformations before moving data to the Silver/Gold layers.
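One way to sketch that automated cleanup in PySpark, assuming a Databricks runtime, a Bronze DataFrame named `bronze_df`, and the `$`-stripping rule from the example above (all assumptions, not a definitive pipeline):

```python
from pyspark.sql import functions as F

# Pull the raw "cost" value out of the rescued JSON, strip a leading "$",
# and fall back to it only where the typed column is null.
recovered = F.regexp_replace(
    F.get_json_object(F.col("_rescued_data"), "$.cost"), r"^\$", ""
).cast("bigint")

silver_df = (
    bronze_df
    .withColumn("cost", F.coalesce(F.col("cost"), recovered))
    .drop("_rescued_data")  # drop once the rescued values have been applied
)
```

This is a sketch for a Spark session, not code runnable standalone; validate the recovered values before dropping `_rescued_data` in a real pipeline.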