💰 Track and Optimize Your Databricks Costs with `system.billing` Tables in Unity Catalog
Databricks has become a core component of many modern data stacks, but with flexibility and scale comes the challenge of cost visibility. Fortunately, Databricks' Unity Catalog includes a `system.billing` schema designed to give you full transparency into usage and pricing.
In this blog, we'll explore the two billing tables, `list_prices` and `usage`, and how to query them for cost optimization, budgeting, and chargeback.
🧾 What is `system.billing`?
The `system.billing` schema exposes two critical tables:
- `list_prices`: Reference pricing for compute resources
- `usage`: Actual usage data, such as DBUs consumed, job runs, and workspace-level costs
📊 1. `system.billing.usage`
🔍 Purpose:
This table logs real-time usage of Databricks resources such as:
- Jobs and cluster runs
- SQL warehouses
- DBUs (Databricks Units)
- Workspace, user, and SKU (pricing tier)
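Before building anything elaborate, it helps to eyeball a few raw rows. A minimal sketch (assuming you have SELECT access on the `system` catalog):

```sql
-- Preview yesterday's raw usage records
SELECT
  workspace_id,
  sku_name,
  usage_date,
  usage_unit,
  usage_quantity
FROM system.billing.usage
WHERE usage_date = current_date() - 1
LIMIT 20;
```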
🧾 2. `system.billing.list_prices`
🔍 Purpose:
Stores per-unit pricing information used to calculate DBU cost by compute type, instance family, cloud region, and pricing tier (e.g., Standard, Premium).
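To get a feel for the data, you can preview the pricing rows directly. A quick sketch (this assumes currently effective prices carry a null `price_end_time`):

```sql
-- Inspect currently effective pricing rows
SELECT
  sku_name,
  cloud,
  currency_code,
  usage_unit,
  pricing.effective_list.default AS effective_price
FROM system.billing.list_prices
WHERE price_end_time IS NULL
LIMIT 20;
```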
📘 `system.billing.list_prices` – Price Reference Table
| Column | Data Type | Description |
|---|---|---|
| `account_id` | string | ID of the account this pricing record belongs to |
| `cloud` | string | Cloud provider (e.g., `AWS`, `AZURE`, or `GCP`) |
| `currency_code` | string | Currency for the pricing (e.g., `USD`) |
| `price_start_time` | timestamp | When the price became effective (UTC) |
| `price_end_time` | timestamp | When the price stopped being effective (UTC); null if still in effect |
| `sku_name` | string | Name of the SKU (e.g., compute type, storage tier) |
| `usage_unit` | string | Unit of monetization (e.g., `DBU`) |
| `pricing` | struct | Nested pricing info with subfields: `default` (default price), `promotional` (temporary discount), `effective_list` (final price used for billing) |
🧠 How to query nested `pricing`:
```sql
SELECT
  sku_name,
  pricing.default AS default_price,
  pricing.promotional.default AS promotional_price,
  pricing.effective_list.default AS effective_price
FROM system.billing.list_prices;
```
📗 `system.billing.usage` – Actual Usage Record Table
| Column | Data Type | Description |
|---|---|---|
| `account_id` | string | ID of the billing account |
| `workspace_id` | string | Databricks workspace ID |
| `record_id` | string | Unique identifier for this usage record |
| `sku_name` | string | The SKU billed (e.g., `Jobs Compute - Premium`) |
| `cloud` | string | Cloud used (`AWS`, `AZURE`, or `GCP`) |
| `usage_start_time` | timestamp | Start time of the usage (UTC) |
| `usage_end_time` | timestamp | End time of the usage (UTC) |
| `usage_date` | date | Calendar date of usage (for easier aggregation) |
| `custom_tags` | map<string,string> | Tags applied by users (e.g., job name, project ID) |
| `usage_unit` | string | Measurement unit (usually DBUs) |
| `usage_quantity` | decimal(38,18) | Number of DBUs or other units consumed |
| `usage_metadata` | struct | Metadata about the compute that generated the usage (e.g., cluster ID, job ID) |
| `identity_metadata` | struct | Metadata about who triggered the usage |
| `record_type` | string | Indicates whether this is an `ORIGINAL`, `RETRACTION`, or `RESTATEMENT` record |
| `ingestion_date` | date | When the usage record was loaded |
| `billing_origin_product` | string | Source product or feature triggering the usage |
| `product_features` | struct | Specific product features involved (e.g., DBFS, Photon) |
| `usage_type` | string | Billing basis (e.g., `COMPUTE_TIME`, `STORAGE_SPACE`, `GPU_TIME`, `API_CALLS`) |
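Two of these fields deserve attention in practice: `usage_metadata` lets you attribute DBUs to a specific job or cluster, and `record_type` matters for accurate aggregation, since retractions and restatements can otherwise skew totals. A hedged sketch (keeping only `ORIGINAL` records is one simple approach; verify struct fields such as `usage_metadata.job_id` against your workspace's schema):

```sql
-- Attribute last week's DBUs to individual jobs,
-- counting only ORIGINAL records
SELECT
  usage_metadata.job_id AS job_id,
  SUM(usage_quantity) AS total_dbus
FROM system.billing.usage
WHERE usage_date >= current_date() - 7
  AND record_type = 'ORIGINAL'
  AND usage_metadata.job_id IS NOT NULL
GROUP BY usage_metadata.job_id
ORDER BY total_dbus DESC;
```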
🔄 Sample Join of `usage` with `list_prices`
While they don't have a foreign key relationship, you can match on `sku_name` + `cloud` + `usage_unit` to enrich usage with prices (ideally also constraining on the price validity window, since `list_prices` keeps historical rows):
```sql
SELECT
  u.workspace_id,
  u.sku_name,
  u.usage_quantity,
  p.pricing.effective_list.default AS unit_price,
  u.usage_quantity * p.pricing.effective_list.default AS estimated_cost
FROM system.billing.usage u
JOIN system.billing.list_prices p
  ON u.sku_name = p.sku_name
  AND u.cloud = p.cloud
  AND u.usage_unit = p.usage_unit
  -- match the price that was in effect when the usage started,
  -- so historical pricing rows don't duplicate usage records
  AND u.usage_start_time >= p.price_start_time
  AND (p.price_end_time IS NULL OR u.usage_start_time < p.price_end_time)
WHERE u.usage_date = current_date() - 1;
```
✅ Summary

| Table | Purpose | Key Fields |
|---|---|---|
| `list_prices` | Published DBU pricing | `sku_name`, `cloud`, `pricing` (struct) |
| `usage` | Actual usage events | `usage_quantity`, `usage_unit`, `sku_name`, `workspace_id` |
Cost visibility is crucial when operating scalable data workloads in Databricks. Whether you're managing multiple workspaces, sharing environments across teams, or trying to optimize spend, Unity Catalog's billing tables give you fine-grained cost tracking.
In the sections below, we'll walk through practical SQL queries that combine `system.billing.usage` with `system.billing.list_prices` to help you:
- 🔍 Understand DBU pricing per SKU
- 📊 Analyze user- and workspace-level spend
- 💡 Optimize your compute resources
📘 1. Query Pricing Details per SKU
Databricks stores its prices in a nested struct inside the `pricing` column. This query extracts all price types:
```sql
SELECT
  sku_name,
  usage_unit,
  pricing.default AS default_price,
  pricing.promotional.default AS promo_price,
  pricing.effective_list.default AS effective_price
FROM system.billing.list_prices
LIMIT 50;
```
🧠 Use case: Compare list price, temporary discounts, and effective billing rate for any compute type.
✅ Join `system.billing.list_prices` with `system.billing.usage` and `system.access.workspaces_latest`
This gives you the workspace name alongside the prices applied during that usage.
✅ Full Query: Price + Usage + Workspace Name
```sql
SELECT
  wl.workspace_name,
  u.workspace_id,
  u.sku_name,
  u.usage_unit,
  p.pricing.default AS default_price,
  p.pricing.promotional.default AS promo_price,
  p.pricing.effective_list.default AS effective_price
FROM system.billing.usage u
JOIN system.billing.list_prices p
  ON u.sku_name = p.sku_name
  AND u.cloud = p.cloud
  AND u.usage_unit = p.usage_unit
JOIN system.access.workspaces_latest wl
  ON u.workspace_id = wl.workspace_id
LIMIT 50;
```
🧠 What This Gives You:
workspace_name | sku_name | usage_unit | default_price | promo_price | effective_price |
---|---|---|---|---|---|
dev-analytics | Jobs Compute – Premium | DBU | 0.55 | 0.30 | 0.30 |
prod-ws | SQL Compute – Enterprise | DBU | 0.65 | null | 0.65 |
📅 2. Daily Estimated Workspace Usage Cost
Track workspace-level costs by multiplying usage quantity by effective DBU price:
```sql
SELECT
  u.workspace_id,
  u.usage_date,
  ROUND(SUM(u.usage_quantity * p.pricing.effective_list.default), 2) AS estimated_cost
FROM system.billing.usage u
JOIN system.billing.list_prices p
  ON u.sku_name = p.sku_name
  AND u.cloud = p.cloud
  AND u.usage_unit = p.usage_unit
WHERE u.usage_date >= current_date() - 7
GROUP BY u.workspace_id, u.usage_date
ORDER BY u.usage_date DESC;
```
🧠 Use case: Build a cost dashboard that breaks down usage by date and workspace.
Query with `workspace_name` included:

```sql
SELECT
  wl.workspace_name,
  u.workspace_id,
  u.usage_date,
  ROUND(SUM(u.usage_quantity * p.pricing.effective_list.default), 2) AS estimated_cost
FROM system.billing.usage u
JOIN system.billing.list_prices p
  ON u.sku_name = p.sku_name
  AND u.cloud = p.cloud
  AND u.usage_unit = p.usage_unit
JOIN system.access.workspaces_latest wl
  ON u.workspace_id = wl.workspace_id
WHERE u.usage_date >= current_date() - 7
GROUP BY wl.workspace_name, u.workspace_id, u.usage_date
ORDER BY u.usage_date DESC;
```
🧾 Sample Output:
workspace_name | workspace_id | usage_date | estimated_cost |
---|---|---|---|
prod-workspace | abc-123 | 2025-06-30 | 418.74 |
dev-workspace | def-456 | 2025-06-30 | 113.05 |
staging-workspace | ghi-789 | 2025-06-29 | 256.92 |
This version gives you:
- Human-readable workspace names
- Daily cost per workspace over the last 7 days
💰 3. Identify High-Cost SKUs
Want to know which compute types cost more than expected? Use this:
```sql
SELECT
  sku_name,
  usage_unit,
  pricing.default AS default_price
FROM system.billing.list_prices
WHERE pricing.default > 0.5
ORDER BY default_price DESC;
```
🧠 Use case: Identify expensive SKU configurations that may be candidates for cost optimization.
🏷️ What is `sku_name`?
`sku_name` stands for Stock Keeping Unit; in Databricks billing, it identifies the specific product, feature, or compute type being billed.
🔍 Here are typical values you might see in `sku_name`:
sku_name Example | Meaning |
---|---|
Jobs Compute - Premium | DBUs consumed by automated jobs on Premium tier |
SQL Compute - Enterprise | Usage from SQL warehouses on Enterprise tier |
Photon Compute - Serverless | Usage from Photon-based clusters |
Model Serving Compute | ML model inference compute |
Delta Live Tables - Pro | DLT jobs with Pro features |
Serverless Interactive Compute | Serverless notebooks or dashboards |
🧠 Why It Matters:
The `sku_name` tells you what you're being billed for, and is key for:
- Cost attribution (e.g., SQL vs. Jobs vs. Model Serving)
- Pricing comparison (e.g., Enterprise vs. Premium)
- Filtering by workload type in dashboards or reports
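SKU names vary by cloud, tier, and the features enabled in your account, so the most reliable reference is your own usage data. A quick way to list the SKUs you are actually billed for:

```sql
-- List SKUs billed in this account over the last 30 days
SELECT DISTINCT sku_name, usage_unit
FROM system.billing.usage
WHERE usage_date >= current_date() - 30
ORDER BY sku_name;
```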
💡 If You Want to See `sku_name` by Workspace
The `list_prices` table does not contain workspace info because prices are global. To see which workspaces are using which SKUs, join with `system.billing.usage` like this:
```sql
SELECT
  wl.workspace_name,
  u.sku_name,
  COUNT(*) AS usage_records,
  ROUND(SUM(u.usage_quantity * p.pricing.effective_list.default), 2) AS estimated_cost
FROM system.billing.usage u
JOIN system.billing.list_prices p
  ON u.sku_name = p.sku_name
  AND u.cloud = p.cloud
  AND u.usage_unit = p.usage_unit
JOIN system.access.workspaces_latest wl
  ON u.workspace_id = wl.workspace_id
WHERE p.pricing.default > 0.5
GROUP BY wl.workspace_name, u.sku_name
ORDER BY estimated_cost DESC;
```
👥 4. User-Level Usage Cost (Last 7 Days)
This query shows which users are consuming the most DBUs over the past week:
```sql
SELECT
  u.identity_metadata.created_by AS user_email,
  ROUND(SUM(u.usage_quantity * p.pricing.effective_list.default), 2) AS user_cost
FROM system.billing.usage u
JOIN system.billing.list_prices p
  ON u.sku_name = p.sku_name
  AND u.cloud = p.cloud
  AND u.usage_unit = p.usage_unit
WHERE u.usage_date >= current_date() - 7
GROUP BY u.identity_metadata.created_by
ORDER BY user_cost DESC;
```
🧠 Use case: Build chargeback reports or identify training needs for high-cost users.
✅ Query: User-Level Cost with Workspace Name
```sql
SELECT
  wl.workspace_name,
  u.workspace_id,
  u.identity_metadata.created_by AS user_email,
  ROUND(SUM(u.usage_quantity * p.pricing.effective_list.default), 2) AS user_cost
FROM system.billing.usage u
JOIN system.billing.list_prices p
  ON u.sku_name = p.sku_name
  AND u.cloud = p.cloud
  AND u.usage_unit = p.usage_unit
JOIN system.access.workspaces_latest wl
  ON u.workspace_id = wl.workspace_id
WHERE u.usage_date >= current_date() - 7
GROUP BY wl.workspace_name, u.workspace_id, u.identity_metadata.created_by
ORDER BY user_cost DESC;
```
🧾 Sample Output:
workspace_name | workspace_id | user_email | user_cost |
---|---|---|---|
prod-ws | abc-123 | user1@databricks.com | 451.23 |
dev-ws | def-456 | service-account@app | 120.78 |
🧾 5. Total Cost by SKU
Summarize cost across all compute and product SKUs:
```sql
SELECT
  u.sku_name,
  ROUND(SUM(u.usage_quantity * p.pricing.effective_list.default), 2) AS sku_cost
FROM system.billing.usage u
JOIN system.billing.list_prices p
  ON u.sku_name = p.sku_name
  AND u.cloud = p.cloud
  AND u.usage_unit = p.usage_unit
GROUP BY u.sku_name
ORDER BY sku_cost DESC;
```
🧠 Use case: Understand which SKUs drive most of your Databricks bill.
✅ Query: SKU-Level Cost with Workspace Name
```sql
SELECT
  wl.workspace_name,
  u.workspace_id,
  u.sku_name,
  ROUND(SUM(u.usage_quantity * p.pricing.effective_list.default), 2) AS sku_cost
FROM system.billing.usage u
JOIN system.billing.list_prices p
  ON u.sku_name = p.sku_name
  AND u.cloud = p.cloud
  AND u.usage_unit = p.usage_unit
JOIN system.access.workspaces_latest wl
  ON u.workspace_id = wl.workspace_id
GROUP BY wl.workspace_name, u.workspace_id, u.sku_name
ORDER BY sku_cost DESC;
```
🔗 Bonus Tip: Filter by Product Features or Tags
If your organization applies custom tags (like job names or project IDs), you can filter usage by `custom_tags` or `product_features` for even deeper cost attribution.
Example:
```sql
SELECT
  u.custom_tags['project'] AS project_name,
  SUM(u.usage_quantity * p.pricing.effective_list.default) AS cost
FROM system.billing.usage u
JOIN system.billing.list_prices p
  ON u.sku_name = p.sku_name
  AND u.cloud = p.cloud
  AND u.usage_unit = p.usage_unit
WHERE u.custom_tags['project'] IS NOT NULL
GROUP BY project_name
ORDER BY cost DESC;
```
✅ Suggested Enhancements for Your Query
1. Add `workspace_name` (for context across environments):

```sql
JOIN system.access.workspaces_latest wl
  ON u.workspace_id = wl.workspace_id
```

2. Add `usage_date` (for time-based trends). Group by or filter on recent usage:

```sql
WHERE u.usage_date >= current_date() - 30
```
✅ Enhanced Query
```sql
SELECT
  wl.workspace_name,
  u.workspace_id,
  u.custom_tags['project'] AS project_name,
  u.sku_name,
  u.identity_metadata.created_by AS created_by_user,
  u.usage_date,
  ROUND(SUM(u.usage_quantity * p.pricing.effective_list.default), 2) AS total_cost
FROM system.billing.usage u
JOIN system.billing.list_prices p
  ON u.sku_name = p.sku_name
  AND u.cloud = p.cloud
  AND u.usage_unit = p.usage_unit
JOIN system.access.workspaces_latest wl
  ON u.workspace_id = wl.workspace_id
WHERE u.custom_tags['project'] IS NOT NULL
  AND u.usage_date >= current_date() - 30
GROUP BY
  wl.workspace_name, u.workspace_id, project_name, u.sku_name, created_by_user, u.usage_date
ORDER BY total_cost DESC;
```
🧾 What You’ll Get:
workspace_name | project_name | sku_name | created_by_user | usage_date | total_cost |
---|---|---|---|---|---|
prod-ws | billing-app | Jobs Compute – Premium | user@app.com | 2025-06-30 | 115.23 |
dev-lab | data-sync | SQL Compute – Pro | svc@datateam | 2025-06-29 | 78.40 |
✅ Conclusion
With just a few SQL queries, you can turn Databricks’ system billing tables into a complete cost observability platform:
Metric | Query |
---|---|
Cost by workspace | Query #2 |
Cost by user | Query #4 |
Cost by SKU | Query #5 |
Pricing breakdown | Query #1 and #3 |
Tag-level tracking | Bonus tip |
These queries are easily embedded into Databricks dashboards, alerting systems, or exported to BI tools like Power BI or Tableau.