
Track and Optimize Your Databricks Costs with system.billing Tables in Unity Catalog


Databricks has become a core component of many modern data stacks — but with flexibility and scale comes the challenge of cost visibility. Fortunately, Databricks’ Unity Catalog includes a system.billing schema designed to give you full transparency into usage and pricing.

In this blog, we’ll explore the two billing tables — list_prices and usage — and how to query them for cost optimization, budgeting, and chargeback.


🧾 What is system.billing?

The system.billing schema exposes two critical tables:

  • list_prices: Reference pricing for compute resources
  • usage: Actual usage data, such as DBUs consumed, job runs, workspace-level costs
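As a quick sanity check, you can list the billing tables directly before writing any reports (a minimal sketch; it assumes you have been granted SELECT access on the system catalog, which an account admin controls):

```sql
-- List the tables exposed by the billing schema
SHOW TABLES IN system.billing;

-- Inspect the schema of either table before querying it
DESCRIBE TABLE system.billing.usage;
```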

📊 1. system.billing.usage

🔍 Purpose:

This table records granular usage of Databricks resources, including:

  • Jobs and cluster runs
  • SQL warehouses
  • DBUs (Databricks Units)
  • Workspace, user, and SKU (pricing tier)
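To get a feel for the data before building dashboards, a quick peek at recent rows works well (a minimal sketch; the selected columns follow the schema documented later in this post):

```sql
-- Preview the most recent usage records
SELECT workspace_id, sku_name, usage_date, usage_quantity, usage_unit
FROM system.billing.usage
ORDER BY usage_start_time DESC
LIMIT 10;
```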


🧾 2. system.billing.list_prices

🔍 Purpose:

Stores per-unit pricing information used to calculate DBU cost by compute type, instance family, cloud region, and pricing tier (e.g., Standard, Premium).

Below is the field documentation for the two Unity Catalog tables:
🔹 system.billing.list_prices
🔹 system.billing.usage

Both are useful references for technical documentation, dashboards, and audits.


📘 system.billing.list_prices – Price Reference Table

| Column | Data Type | Description |
|---|---|---|
| account_id | string | ID of the account this pricing record belongs to |
| cloud | string | Cloud provider (e.g., AWS, AZURE, or GCP) |
| currency_code | string | Currency for the pricing (e.g., USD) |
| price_start_time | timestamp | When the price became effective (UTC) |
| price_end_time | timestamp | When the price stopped being effective (UTC) |
| sku_name | string | Name of the SKU (e.g., compute type, storage tier) |
| usage_unit | string | Unit of monetization (e.g., DBU) |
| pricing | struct | Nested pricing info with subfields: default (default price), promotional (temporary discount), effective_list (final price used for billing) |

🧠 How to query nested pricing:

SELECT 
  sku_name,
  pricing.default AS default_price,
  pricing.promotional.default AS promotional_price,
  pricing.effective_list.default AS effective_price
FROM system.billing.list_prices;

📗 system.billing.usage – Actual Usage Record Table

| Column | Data Type | Description |
|---|---|---|
| account_id | string | ID of the billing account |
| workspace_id | string | Databricks workspace ID |
| record_id | string | Unique identifier for this usage record |
| sku_name | string | The SKU billed (e.g., Jobs Compute - Premium) |
| cloud | string | Cloud used (AWS, AZURE, or GCP) |
| usage_start_time | timestamp | Start time of the usage (UTC) |
| usage_end_time | timestamp | End time of the usage (UTC) |
| usage_date | date | Calendar date of usage (for easier aggregation) |
| custom_tags | map<string,string> | Tags applied by users (e.g., job name, project ID) |
| usage_unit | string | Measurement unit (usually DBUs) |
| usage_quantity | decimal(38,18) | Number of DBUs or other units consumed |
| usage_metadata | struct | Metadata about compute jobs (e.g., cluster ID, job ID) |
| identity_metadata | struct | Metadata about who triggered the usage |
| record_type | string | Indicates if this is an ORIGINAL, RETRACTION, or RESTATEMENT |
| ingestion_date | date | When the usage record was loaded |
| billing_origin_product | string | Source product or feature triggering usage |
| product_features | struct | Specific product features involved (e.g., DBFS, Photon) |
| usage_type | string | Billing basis (e.g., COMPUTE_TIME, STORAGE_SPACE, GPU_TIME, API_CALLS) |
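Because record_type can mark a row as a RETRACTION or RESTATEMENT rather than an ORIGINAL, it is worth checking how often corrections appear in your account before aggregating costs (a minimal sketch):

```sql
-- See how many correction records exist per type
SELECT record_type, COUNT(*) AS records
FROM system.billing.usage
GROUP BY record_type;
```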

🔄 Sample Join of usage with list_prices

While the tables have no foreign-key relationship, you can match on sku_name + cloud + usage_unit to enrich usage with prices. For historical accuracy, also constrain the usage window to fall between price_start_time and price_end_time, since a SKU can have multiple price records over time:

SELECT 
  u.workspace_id,
  u.sku_name,
  u.usage_quantity,
  p.pricing.effective_list.default AS unit_price,
  u.usage_quantity * p.pricing.effective_list.default AS estimated_cost
FROM system.billing.usage u
JOIN system.billing.list_prices p
  ON u.sku_name = p.sku_name 
  AND u.cloud = p.cloud 
  AND u.usage_unit = p.usage_unit
WHERE u.usage_date = current_date() - 1;

✅ Summary

| Table | Purpose | Key Fields |
|---|---|---|
| list_prices | Published DBU pricing | sku_name, cloud, pricing (struct) |
| usage | Actual usage events | usage_quantity, usage_unit, sku_name, workspace_id |

Cost visibility is crucial when operating scalable data workloads in Databricks. Whether you’re managing multiple workspaces, sharing environments across teams, or trying to optimize spend, Unity Catalog’s billing tables give you fine-grained cost tracking.

In the sections below, we’ll explore practical SQL queries that combine system.billing.usage with system.billing.list_prices to help you:

  • 🔍 Understand DBU pricing per SKU
  • 📊 Analyze user- and workspace-level spend
  • 💡 Optimize your compute resources

📘 1. Query Pricing Details per SKU

Databricks stores its prices in a nested JSON format (struct) inside the pricing column. This query extracts all price types:

SELECT 
  sku_name,
  usage_unit,
  pricing.default AS default_price,
  pricing.promotional.default AS promo_price,
  pricing.effective_list.default AS effective_price
FROM system.billing.list_prices
LIMIT 50;

🧠 Use case: Compare list price, temporary discounts, and effective billing rate for any compute type.

✅ Joining system.billing.list_prices with system.billing.usage and system.access.workspaces_latest

This gives you the workspace name alongside the prices that applied to that usage.

✅ Full Query: Price + Usage + Workspace Name

SELECT
wl.workspace_name,
u.workspace_id,
u.sku_name,
u.usage_unit,
p.pricing.default AS default_price,
p.pricing.promotional.default AS promo_price,
p.pricing.effective_list.default AS effective_price
FROM system.billing.usage u
JOIN system.billing.list_prices p
ON u.sku_name = p.sku_name
AND u.cloud = p.cloud
AND u.usage_unit = p.usage_unit
JOIN system.access.workspaces_latest wl
ON u.workspace_id = wl.workspace_id
LIMIT 50;

🧠 What This Gives You:

| workspace_name | sku_name | usage_unit | default_price | promo_price | effective_price |
|---|---|---|---|---|---|
| dev-analytics | Jobs Compute – Premium | DBU | 0.55 | 0.30 | 0.30 |
| prod-ws | SQL Compute – Enterprise | DBU | 0.65 | null | 0.65 |

📅 2. Daily Estimated Workspace Usage Cost

Track workspace-level costs by multiplying usage quantity by effective DBU price:

SELECT 
  u.workspace_id,
  u.usage_date,
  ROUND(SUM(u.usage_quantity * p.pricing.effective_list.default), 2) AS estimated_cost
FROM system.billing.usage u
JOIN system.billing.list_prices p
  ON u.sku_name = p.sku_name 
  AND u.cloud = p.cloud 
  AND u.usage_unit = p.usage_unit
WHERE u.usage_date >= current_date() - 7
GROUP BY u.workspace_id, u.usage_date
ORDER BY u.usage_date DESC;

🧠 Use case: Build a cost dashboard that breaks down usage by date and workspace.

Query with workspace_name Included:

SELECT 
wl.workspace_name,
u.workspace_id,
u.usage_date,
ROUND(SUM(u.usage_quantity * p.pricing.effective_list.default), 2) AS estimated_cost
FROM system.billing.usage u
JOIN system.billing.list_prices p
ON u.sku_name = p.sku_name
AND u.cloud = p.cloud
AND u.usage_unit = p.usage_unit
JOIN system.access.workspaces_latest wl
ON u.workspace_id = wl.workspace_id
WHERE u.usage_date >= current_date() - 7
GROUP BY wl.workspace_name, u.workspace_id, u.usage_date
ORDER BY u.usage_date DESC;

🧾 Sample Output:

| workspace_name | workspace_id | usage_date | estimated_cost |
|---|---|---|---|
| prod-workspace | abc-123 | 2025-06-30 | 418.74 |
| dev-workspace | def-456 | 2025-06-30 | 113.05 |
| staging-workspace | ghi-789 | 2025-06-29 | 256.92 |

This version gives you:

  • Human-readable workspace names
  • Daily cost per workspace over the last 7 days

💰 3. Identify High-Cost SKUs

Want to know which compute types cost more than expected? Use this:

SELECT 
  sku_name,
  usage_unit,
  pricing.default AS default_price
FROM system.billing.list_prices
WHERE pricing.default > 0.5
ORDER BY default_price DESC;

🧠 Use case: Identify expensive SKU configurations that may be candidates for cost optimization.

🏷️ What is sku_name?

SKU stands for Stock Keeping Unit; in Databricks billing, sku_name identifies the specific product, feature, or compute type being billed.


🔍 Examples of sku_name values:

Here are typical values you might see in sku_name:

| sku_name Example | Meaning |
|---|---|
| Jobs Compute - Premium | DBUs consumed by automated jobs on Premium tier |
| SQL Compute - Enterprise | Usage from SQL warehouses on Enterprise tier |
| Photon Compute - Serverless | Usage from Photon-based clusters |
| Model Serving Compute | ML model inference compute |
| Delta Live Tables - Pro | DLT jobs with Pro features |
| Serverless Interactive Compute | Serverless notebooks or dashboards |

🧠 Why It Matters:

The sku_name tells you what you’re being billed for, and is key for:

  • Cost attribution (e.g., SQL vs. Jobs vs. Model Serving)
  • Pricing comparison (e.g., Enterprise vs. Premium)
  • Filtering by workload type in dashboards or reports
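Since sku_name is a free-form label rather than an enum, pattern matching is a simple way to bucket spend by workload family (a sketch; the LIKE patterns below are assumptions and should be checked against the actual SKU names appearing in your account):

```sql
-- Total DBUs by broad workload family, bucketed by sku_name pattern
SELECT
  CASE
    WHEN upper(sku_name) LIKE '%JOBS%' THEN 'Jobs'
    WHEN upper(sku_name) LIKE '%SQL%' THEN 'SQL'
    WHEN upper(sku_name) LIKE '%SERVERLESS%' THEN 'Serverless'
    ELSE 'Other'
  END AS workload_family,
  SUM(usage_quantity) AS total_dbus
FROM system.billing.usage
GROUP BY 1;
```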

💡 If You Want to See sku_name by Workspace

The list_prices table does not contain workspace info because prices are global.

To get which workspaces are using which SKUs, join with system.billing.usage like this:

SELECT 
wl.workspace_name,
u.sku_name,
COUNT(*) AS usage_records,
ROUND(SUM(u.usage_quantity * p.pricing.effective_list.default), 2) AS estimated_cost
FROM system.billing.usage u
JOIN system.billing.list_prices p
ON u.sku_name = p.sku_name AND u.cloud = p.cloud AND u.usage_unit = p.usage_unit
JOIN system.access.workspaces_latest wl
ON u.workspace_id = wl.workspace_id
WHERE p.pricing.default > 0.5
GROUP BY wl.workspace_name, u.sku_name
ORDER BY estimated_cost DESC;

👥 4. User-Level Usage Cost (Last 7 Days)

This query shows which users are consuming the most DBUs over the past week:

SELECT 
  u.identity_metadata.created_by AS user_email,
  ROUND(SUM(u.usage_quantity * p.pricing.effective_list.default), 2) AS user_cost
FROM system.billing.usage u
JOIN system.billing.list_prices p
  ON u.sku_name = p.sku_name 
  AND u.cloud = p.cloud 
  AND u.usage_unit = p.usage_unit
WHERE u.usage_date >= current_date() - 7
GROUP BY u.identity_metadata.created_by
ORDER BY user_cost DESC;

🧠 Use case: Build chargeback reports or identify training needs for high-cost users.

✅ Query: User-Level Cost with Workspace Name

SELECT
wl.workspace_name,
u.workspace_id,
u.identity_metadata.created_by AS user_email,
ROUND(SUM(u.usage_quantity * p.pricing.effective_list.default), 2) AS user_cost
FROM system.billing.usage u
JOIN system.billing.list_prices p
ON u.sku_name = p.sku_name
AND u.cloud = p.cloud
AND u.usage_unit = p.usage_unit
JOIN system.access.workspaces_latest wl
ON u.workspace_id = wl.workspace_id
WHERE u.usage_date >= current_date() - 7
GROUP BY wl.workspace_name, u.workspace_id, u.identity_metadata.created_by
ORDER BY user_cost DESC;

🧾 Sample Output:

| workspace_name | workspace_id | user_email | user_cost |
|---|---|---|---|
| prod-ws | abc-123 | user1@databricks.com | 451.23 |
| dev-ws | def-456 | service-account@app | 120.78 |

🧾 5. Total Cost by SKU

Summarize cost across all compute and product SKUs:

SELECT 
  u.sku_name,
  ROUND(SUM(u.usage_quantity * p.pricing.effective_list.default), 2) AS sku_cost
FROM system.billing.usage u
JOIN system.billing.list_prices p
  ON u.sku_name = p.sku_name 
  AND u.cloud = p.cloud 
  AND u.usage_unit = p.usage_unit
GROUP BY u.sku_name
ORDER BY sku_cost DESC;

🧠 Use case: Understand which SKUs drive most of your Databricks bill.

✅ Query: SKU-Level Cost with Workspace Name

SELECT
wl.workspace_name,
u.workspace_id,
u.sku_name,
ROUND(SUM(u.usage_quantity * p.pricing.effective_list.default), 2) AS sku_cost
FROM system.billing.usage u
JOIN system.billing.list_prices p
ON u.sku_name = p.sku_name
AND u.cloud = p.cloud
AND u.usage_unit = p.usage_unit
JOIN system.access.workspaces_latest wl
ON u.workspace_id = wl.workspace_id
GROUP BY wl.workspace_name, u.workspace_id, u.sku_name
ORDER BY sku_cost DESC;


🔗 Bonus Tip: Filter by Product Features or Tags

If your organization applies custom tags (like job names, project IDs), you can filter usage by custom_tags or product_features for even deeper cost attribution.

Example:

SELECT
  u.custom_tags['project'] AS project_name,
  SUM(u.usage_quantity * p.pricing.effective_list.default) AS cost
FROM system.billing.usage u
JOIN system.billing.list_prices p
  ON u.sku_name = p.sku_name AND u.cloud = p.cloud AND u.usage_unit = p.usage_unit
WHERE u.custom_tags['project'] IS NOT NULL
GROUP BY project_name
ORDER BY cost DESC;

✅ Suggested Enhancements for the Tag Query

1. Add workspace_name (for context across environments)

JOIN system.access.workspaces_latest wl
ON u.workspace_id = wl.workspace_id

2. Add usage_date (for time-based trends)

Group by or filter by recent usage:

u.usage_date >= current_date() - 30

✅ Enhanced Query

SELECT
wl.workspace_name,
u.workspace_id,
u.custom_tags['project'] AS project_name,
u.sku_name,
u.identity_metadata.created_by AS created_by_user,
u.usage_date,
ROUND(SUM(u.usage_quantity * p.pricing.effective_list.default), 2) AS total_cost
FROM system.billing.usage u
JOIN system.billing.list_prices p
ON u.sku_name = p.sku_name
AND u.cloud = p.cloud
AND u.usage_unit = p.usage_unit
JOIN system.access.workspaces_latest wl
ON u.workspace_id = wl.workspace_id
WHERE u.custom_tags['project'] IS NOT NULL
AND u.usage_date >= current_date() - 30
GROUP BY
wl.workspace_name, u.workspace_id, project_name, u.sku_name, created_by_user, u.usage_date
ORDER BY total_cost DESC;

🧾 What You’ll Get:

| workspace_name | project_name | sku_name | created_by_user | usage_date | total_cost |
|---|---|---|---|---|---|
| prod-ws | billing-app | Jobs Compute – Premium | user@app.com | 2025-06-30 | 115.23 |
| dev-lab | data-sync | SQL Compute – Pro | svc@datateam | 2025-06-29 | 78.40 |

✅ Conclusion

With just a few SQL queries, you can turn Databricks’ system billing tables into a complete cost observability platform:

| Metric | Query |
|---|---|
| Cost by workspace | Query #2 |
| Cost by user | Query #4 |
| Cost by SKU | Query #5 |
| Pricing breakdown | Query #1 and #3 |
| Tag-level tracking | Bonus tip |

These queries are easily embedded into Databricks dashboards, alerting systems, or exported to BI tools like Power BI or Tableau.

