
Track and Optimize Your Databricks Costs with system.billing Tables in Unity Catalog


Databricks has become a core component of many modern data stacks — but with flexibility and scale comes the challenge of cost visibility. Fortunately, Databricks’ Unity Catalog includes a system.billing schema designed to give you full transparency into usage and pricing.

In this blog, we’ll explore the two billing tables — list_prices and usage — and how to query them for cost optimization, budgeting, and chargeback.


🧾 What is system.billing?

The system.billing schema exposes two critical tables:

  • list_prices: Reference pricing for compute resources
  • usage: Actual usage data, such as DBUs consumed, job runs, workspace-level costs
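As a quick sanity check, you can list the billing tables directly before writing any reports (a minimal sketch; it assumes you have been granted SELECT access on the system catalog, which an account admin controls):

```sql
-- List the tables exposed by the billing schema
SHOW TABLES IN system.billing;

-- Inspect the schema of either table before querying it
DESCRIBE TABLE system.billing.usage;
```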

📊 1. system.billing.usage

🔍 Purpose:

This table records granular usage of Databricks resources, including:

  • Jobs and cluster runs
  • SQL warehouses
  • DBUs (Databricks Units)
  • Workspace, user, and SKU (pricing tier)
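To get a feel for the data before building dashboards, a quick peek at recent rows works well (a minimal sketch; the selected columns follow the schema documented later in this post):

```sql
-- Preview the most recent usage records
SELECT workspace_id, sku_name, usage_date, usage_quantity, usage_unit
FROM system.billing.usage
ORDER BY usage_start_time DESC
LIMIT 10;
```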


🧾 2. system.billing.list_prices

🔍 Purpose:

Stores per-unit pricing information used to calculate DBU cost by compute type, instance family, cloud region, and pricing tier (e.g., Standard, Premium).

Below is the field documentation for the two Unity Catalog tables:
🔹 system.billing.list_prices
🔹 system.billing.usage

Both are useful references for technical documentation, dashboards, and audits.


📘 system.billing.list_prices – Price Reference Table

| Column | Data Type | Description |
|---|---|---|
| account_id | string | ID of the account this pricing record belongs to |
| cloud | string | Cloud provider (e.g., AWS, AZURE, or GCP) |
| currency_code | string | Currency for the pricing (e.g., USD) |
| price_start_time | timestamp | When the price became effective (UTC) |
| price_end_time | timestamp | When the price stopped being effective (UTC) |
| sku_name | string | Name of the SKU (e.g., compute type, storage tier) |
| usage_unit | string | Unit of monetization (e.g., DBU) |
| pricing | struct | Nested pricing info with subfields: default (default price), promotional (temporary discount), effective_list (final price used for billing) |

🧠 How to query nested pricing:

SELECT 
  sku_name,
  pricing.default AS default_price,
  pricing.promotional.default AS promotional_price,
  pricing.effective_list.default AS effective_price
FROM system.billing.list_prices;

📗 system.billing.usage – Actual Usage Record Table

| Column | Data Type | Description |
|---|---|---|
| account_id | string | ID of the billing account |
| workspace_id | string | Databricks workspace ID |
| record_id | string | Unique identifier for this usage record |
| sku_name | string | The SKU billed (e.g., Jobs Compute - Premium) |
| cloud | string | Cloud used (AWS, AZURE, or GCP) |
| usage_start_time | timestamp | Start time of the usage (UTC) |
| usage_end_time | timestamp | End time of the usage (UTC) |
| usage_date | date | Calendar date of usage (for easier aggregation) |
| custom_tags | map<string,string> | Tags applied by users (e.g., job name, project ID) |
| usage_unit | string | Measurement unit (usually DBUs) |
| usage_quantity | decimal(38,18) | Number of DBUs or other units consumed |
| usage_metadata | struct | Metadata about compute jobs (e.g., cluster ID, job ID) |
| identity_metadata | struct | Metadata about who triggered the usage |
| record_type | string | Indicates if this is an ORIGINAL, RETRACTION, or RESTATEMENT |
| ingestion_date | date | When the usage record was loaded |
| billing_origin_product | string | Source product or feature triggering usage |
| product_features | struct | Specific product features involved (e.g., DBFS, Photon) |
| usage_type | string | Billing basis (e.g., COMPUTE_TIME, STORAGE_SPACE, GPU_TIME, API_CALLS) |
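Because record_type can mark a row as a RETRACTION or RESTATEMENT rather than an ORIGINAL, it is worth checking how often corrections appear in your account before aggregating costs (a minimal sketch):

```sql
-- See how many correction records exist per type
SELECT record_type, COUNT(*) AS records
FROM system.billing.usage
GROUP BY record_type;
```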

🔄 Sample Join of usage with list_prices

While the tables have no foreign-key relationship, you can match on sku_name + cloud + usage_unit to enrich usage with prices. For historical accuracy, also constrain the usage window to fall between price_start_time and price_end_time, since a SKU can have multiple price records over time:

SELECT 
  u.workspace_id,
  u.sku_name,
  u.usage_quantity,
  p.pricing.effective_list.default AS unit_price,
  u.usage_quantity * p.pricing.effective_list.default AS estimated_cost
FROM system.billing.usage u
JOIN system.billing.list_prices p
  ON u.sku_name = p.sku_name 
  AND u.cloud = p.cloud 
  AND u.usage_unit = p.usage_unit
WHERE u.usage_date = current_date() - 1;

✅ Summary

| Table | Purpose | Key Fields |
|---|---|---|
| list_prices | Published DBU pricing | sku_name, cloud, pricing (struct) |
| usage | Actual usage events | usage_quantity, usage_unit, sku_name, workspace_id |

Cost visibility is crucial when operating scalable data workloads in Databricks. Whether you’re managing multiple workspaces, sharing environments across teams, or trying to optimize spend, Unity Catalog’s billing tables give you fine-grained cost tracking.

In the sections below, we’ll explore practical SQL queries that combine system.billing.usage with system.billing.list_prices to help you:

  • 🔍 Understand DBU pricing per SKU
  • 📊 Analyze user- and workspace-level spend
  • 💡 Optimize your compute resources

📘 1. Query Pricing Details per SKU

Databricks stores its prices in a nested JSON format (struct) inside the pricing column. This query extracts all price types:

SELECT 
  sku_name,
  usage_unit,
  pricing.default AS default_price,
  pricing.promotional.default AS promo_price,
  pricing.effective_list.default AS effective_price
FROM system.billing.list_prices
LIMIT 50;

🧠 Use case: Compare list price, temporary discounts, and effective billing rate for any compute type.

✅ Joining system.billing.list_prices with system.billing.usage and system.access.workspaces_latest

This gives you the workspace name alongside the prices that applied to that usage.

✅ Full Query: Price + Usage + Workspace Name

SELECT
wl.workspace_name,
u.workspace_id,
u.sku_name,
u.usage_unit,
p.pricing.default AS default_price,
p.pricing.promotional.default AS promo_price,
p.pricing.effective_list.default AS effective_price
FROM system.billing.usage u
JOIN system.billing.list_prices p
ON u.sku_name = p.sku_name
AND u.cloud = p.cloud
AND u.usage_unit = p.usage_unit
JOIN system.access.workspaces_latest wl
ON u.workspace_id = wl.workspace_id
LIMIT 50;

🧠 What This Gives You:

| workspace_name | sku_name | usage_unit | default_price | promo_price | effective_price |
|---|---|---|---|---|---|
| dev-analytics | Jobs Compute – Premium | DBU | 0.55 | 0.30 | 0.30 |
| prod-ws | SQL Compute – Enterprise | DBU | 0.65 | null | 0.65 |

📅 2. Daily Estimated Workspace Usage Cost

Track workspace-level costs by multiplying usage quantity by effective DBU price:

SELECT 
  u.workspace_id,
  u.usage_date,
  ROUND(SUM(u.usage_quantity * p.pricing.effective_list.default), 2) AS estimated_cost
FROM system.billing.usage u
JOIN system.billing.list_prices p
  ON u.sku_name = p.sku_name 
  AND u.cloud = p.cloud 
  AND u.usage_unit = p.usage_unit
WHERE u.usage_date >= current_date() - 7
GROUP BY u.workspace_id, u.usage_date
ORDER BY u.usage_date DESC;

🧠 Use case: Build a cost dashboard that breaks down usage by date and workspace.

Query with workspace_name Included:

SELECT 
wl.workspace_name,
u.workspace_id,
u.usage_date,
ROUND(SUM(u.usage_quantity * p.pricing.effective_list.default), 2) AS estimated_cost
FROM system.billing.usage u
JOIN system.billing.list_prices p
ON u.sku_name = p.sku_name
AND u.cloud = p.cloud
AND u.usage_unit = p.usage_unit
JOIN system.access.workspaces_latest wl
ON u.workspace_id = wl.workspace_id
WHERE u.usage_date >= current_date() - 7
GROUP BY wl.workspace_name, u.workspace_id, u.usage_date
ORDER BY u.usage_date DESC;

🧾 Sample Output:

| workspace_name | workspace_id | usage_date | estimated_cost |
|---|---|---|---|
| prod-workspace | abc-123 | 2025-06-30 | 418.74 |
| dev-workspace | def-456 | 2025-06-30 | 113.05 |
| staging-workspace | ghi-789 | 2025-06-29 | 256.92 |

This version gives you:

  • Human-readable workspace names
  • Daily cost per workspace over the last 7 days

💰 3. Identify High-Cost SKUs

Want to know which compute types cost more than expected? Use this:

SELECT 
  sku_name,
  usage_unit,
  pricing.default AS default_price
FROM system.billing.list_prices
WHERE pricing.default > 0.5
ORDER BY default_price DESC;

🧠 Use case: Identify expensive SKU configurations that may be candidates for cost optimization.

🏷️ What is sku_name?

SKU stands for Stock Keeping Unit; in Databricks billing, sku_name identifies the specific product, feature, or compute type being billed.


🔍 Examples of sku_name values:

Here are typical values you might see in sku_name:

| sku_name Example | Meaning |
|---|---|
| Jobs Compute - Premium | DBUs consumed by automated jobs on Premium tier |
| SQL Compute - Enterprise | Usage from SQL warehouses on Enterprise tier |
| Photon Compute - Serverless | Usage from Photon-based clusters |
| Model Serving Compute | ML model inference compute |
| Delta Live Tables - Pro | DLT jobs with Pro features |
| Serverless Interactive Compute | Serverless notebooks or dashboards |

🧠 Why It Matters:

The sku_name tells you what you’re being billed for, and is key for:

  • Cost attribution (e.g., SQL vs. Jobs vs. Model Serving)
  • Pricing comparison (e.g., Enterprise vs. Premium)
  • Filtering by workload type in dashboards or reports
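Since sku_name is a free-form label rather than an enum, pattern matching is a simple way to bucket spend by workload family (a sketch; the LIKE patterns below are assumptions and should be checked against the actual SKU names appearing in your account):

```sql
-- Total DBUs by broad workload family, bucketed by sku_name pattern
SELECT
  CASE
    WHEN upper(sku_name) LIKE '%JOBS%' THEN 'Jobs'
    WHEN upper(sku_name) LIKE '%SQL%' THEN 'SQL'
    WHEN upper(sku_name) LIKE '%SERVERLESS%' THEN 'Serverless'
    ELSE 'Other'
  END AS workload_family,
  SUM(usage_quantity) AS total_dbus
FROM system.billing.usage
GROUP BY 1;
```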

💡 If You Want to See sku_name by Workspace

The list_prices table does not contain workspace info because prices are global.

To get which workspaces are using which SKUs, join with system.billing.usage like this:

SELECT 
wl.workspace_name,
u.sku_name,
COUNT(*) AS usage_records,
ROUND(SUM(u.usage_quantity * p.pricing.effective_list.default), 2) AS estimated_cost
FROM system.billing.usage u
JOIN system.billing.list_prices p
ON u.sku_name = p.sku_name AND u.cloud = p.cloud AND u.usage_unit = p.usage_unit
JOIN system.access.workspaces_latest wl
ON u.workspace_id = wl.workspace_id
WHERE p.pricing.default > 0.5
GROUP BY wl.workspace_name, u.sku_name
ORDER BY estimated_cost DESC;

👥 4. User-Level Usage Cost (Last 7 Days)

This query shows which users are consuming the most DBUs over the past week:

SELECT 
  u.identity_metadata.created_by AS user_email,
  ROUND(SUM(u.usage_quantity * p.pricing.effective_list.default), 2) AS user_cost
FROM system.billing.usage u
JOIN system.billing.list_prices p
  ON u.sku_name = p.sku_name 
  AND u.cloud = p.cloud 
  AND u.usage_unit = p.usage_unit
WHERE u.usage_date >= current_date() - 7
GROUP BY u.identity_metadata.created_by
ORDER BY user_cost DESC;

🧠 Use case: Build chargeback reports or identify training needs for high-cost users.

✅ Query: User-Level Cost with Workspace Name

SELECT
wl.workspace_name,
u.workspace_id,
u.identity_metadata.created_by AS user_email,
ROUND(SUM(u.usage_quantity * p.pricing.effective_list.default), 2) AS user_cost
FROM system.billing.usage u
JOIN system.billing.list_prices p
ON u.sku_name = p.sku_name
AND u.cloud = p.cloud
AND u.usage_unit = p.usage_unit
JOIN system.access.workspaces_latest wl
ON u.workspace_id = wl.workspace_id
WHERE u.usage_date >= current_date() - 7
GROUP BY wl.workspace_name, u.workspace_id, u.identity_metadata.created_by
ORDER BY user_cost DESC;

🧾 Sample Output:

| workspace_name | workspace_id | user_email | user_cost |
|---|---|---|---|
| prod-ws | abc-123 | user1@databricks.com | 451.23 |
| dev-ws | def-456 | service-account@app | 120.78 |

🧾 5. Total Cost by SKU

Summarize cost across all compute and product SKUs:

SELECT 
  u.sku_name,
  ROUND(SUM(u.usage_quantity * p.pricing.effective_list.default), 2) AS sku_cost
FROM system.billing.usage u
JOIN system.billing.list_prices p
  ON u.sku_name = p.sku_name 
  AND u.cloud = p.cloud 
  AND u.usage_unit = p.usage_unit
GROUP BY u.sku_name
ORDER BY sku_cost DESC;

🧠 Use case: Understand which SKUs drive most of your Databricks bill.

✅ Query: SKU-Level Cost with Workspace Name

SELECT
wl.workspace_name,
u.workspace_id,
u.sku_name,
ROUND(SUM(u.usage_quantity * p.pricing.effective_list.default), 2) AS sku_cost
FROM system.billing.usage u
JOIN system.billing.list_prices p
ON u.sku_name = p.sku_name
AND u.cloud = p.cloud
AND u.usage_unit = p.usage_unit
JOIN system.access.workspaces_latest wl
ON u.workspace_id = wl.workspace_id
GROUP BY wl.workspace_name, u.workspace_id, u.sku_name
ORDER BY sku_cost DESC;


🔗 Bonus Tip: Filter by Product Features or Tags

If your organization applies custom tags (like job names, project IDs), you can filter usage by custom_tags or product_features for even deeper cost attribution.

Example:

SELECT
  u.custom_tags['project'] AS project_name,
  SUM(u.usage_quantity * p.pricing.effective_list.default) AS cost
FROM system.billing.usage u
JOIN system.billing.list_prices p
  ON u.sku_name = p.sku_name AND u.cloud = p.cloud AND u.usage_unit = p.usage_unit
WHERE u.custom_tags['project'] IS NOT NULL
GROUP BY project_name
ORDER BY cost DESC;

✅ Suggested Enhancements for the Tag Query

1. Add workspace_name (for context across environments)

JOIN system.access.workspaces_latest wl
ON u.workspace_id = wl.workspace_id

2. Add usage_date (for time-based trends)

Group by or filter by recent usage:

u.usage_date >= current_date() - 30

✅ Enhanced Query

SELECT
wl.workspace_name,
u.workspace_id,
u.custom_tags['project'] AS project_name,
u.sku_name,
u.identity_metadata.created_by AS created_by_user,
u.usage_date,
ROUND(SUM(u.usage_quantity * p.pricing.effective_list.default), 2) AS total_cost
FROM system.billing.usage u
JOIN system.billing.list_prices p
ON u.sku_name = p.sku_name
AND u.cloud = p.cloud
AND u.usage_unit = p.usage_unit
JOIN system.access.workspaces_latest wl
ON u.workspace_id = wl.workspace_id
WHERE u.custom_tags['project'] IS NOT NULL
AND u.usage_date >= current_date() - 30
GROUP BY
wl.workspace_name, u.workspace_id, project_name, u.sku_name, created_by_user, u.usage_date
ORDER BY total_cost DESC;

🧾 What You’ll Get:

| workspace_name | project_name | sku_name | created_by_user | usage_date | total_cost |
|---|---|---|---|---|---|
| prod-ws | billing-app | Jobs Compute – Premium | user@app.com | 2025-06-30 | 115.23 |
| dev-lab | data-sync | SQL Compute – Pro | svc@datateam | 2025-06-29 | 78.40 |

✅ Conclusion

With just a few SQL queries, you can turn Databricks’ system billing tables into a complete cost observability platform:

| Metric | Query |
|---|---|
| Cost by workspace | Query #2 |
| Cost by user | Query #4 |
| Cost by SKU | Query #5 |
| Pricing breakdown | Query #1 and #3 |
| Tag-level tracking | Bonus tip |

These queries are easily embedded into Databricks dashboards, alerting systems, or exported to BI tools like Power BI or Tableau.

