What are DBSQL Alerts?
Databricks SQL alerts run a SQL query on a cadence, check a condition (threshold/boolean), and notify destinations (email, Slack, Teams, PagerDuty, webhooks) when the condition is met. An alert has its own schedule; it does not reuse the query’s schedule. (Databricks Documentation)
The moving parts (2-minute mental model)
- Query: a SQL statement returning the metric/flag to evaluate (often 1 row).
- Alert: binds to the query, defines the condition, the schedule, and the destinations.
- Compute: a SQL warehouse that executes each run.
- Destinations: where the message goes (email, Slack, Teams, PagerDuty, webhook). Admins configure these centrally. (Databricks Documentation)
When you save a schedule, Databricks wires up a managed run behind the scenes so each interval executes the query and evaluates the condition. You can even jump from the alert to the underlying run history for observability. (Medium)
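The loop each scheduled run performs can be sketched in a few lines of Python (all names here are illustrative; the real execution and evaluation happen inside Databricks, not in your code):

```python
# Illustrative sketch of one scheduled alert cycle: run the query,
# check the condition, notify destinations. Hypothetical names throughout.
OPERATORS = {
    ">":  lambda v, t: v > t,
    ">=": lambda v, t: v >= t,
    "<":  lambda v, t: v < t,
    "<=": lambda v, t: v <= t,
    "==": lambda v, t: v == t,
}

def evaluate_alert(metric_value, op, threshold):
    """Return True when the alert condition is met (i.e., should notify)."""
    return OPERATORS[op](metric_value, threshold)

def run_alert_cycle(fetch_metric, op, threshold, notify):
    """One scheduled cycle: execute the query, evaluate, fan out."""
    value = fetch_metric()   # stands in for running the saved SQL on a warehouse
    if evaluate_alert(value, op, threshold):
        notify(value)        # stands in for each configured destination
        return "TRIGGERED"
    return "OK"
```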
Create your first alert (click-ops)
- Write the query (example: daily revenue must be ≥ target).
SELECT date_trunc('day', order_ts) AS d, SUM(amount) AS revenue
FROM prod.dw.fact_sales
WHERE order_ts >= current_date() - INTERVAL 1 DAY
GROUP BY 1
- Save the query in DBSQL.
- Click Create alert → pick the saved query.
- Set Condition (e.g., revenue < 100000).
- Choose a SQL warehouse to run on and set the schedule (for example, every 15 minutes in UTC).
- Add Destinations (team email, Slack webhook, etc.). (Databricks Documentation, Microsoft Learn)
Note: An alert’s schedule always runs its query and checks the condition independently of any schedule set on the query itself. (Databricks Documentation)
Typical alert patterns (copy-paste)
SLO / KPI breach
-- If P95 latency (ms) > 2000 then alert
SELECT percentile_approx(latency_ms, 0.95) AS p95
FROM prod.ops.request_metrics
WHERE event_ts >= now() - interval 5 minutes;
-- Condition: p95 > 2000
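Before wiring up the alert, the 2000 ms threshold can be sanity-checked locally; `percentile_approx(latency_ms, 0.95)` is roughly mirrored here with the standard library (sample data below is made up):

```python
import statistics

P95_THRESHOLD_MS = 2000  # same threshold as the alert condition

def p95(latencies_ms):
    """Rough local stand-in for percentile_approx(latency_ms, 0.95)."""
    # quantiles(n=100) yields 99 cut points; index 94 is the 95th percentile
    return statistics.quantiles(latencies_ms, n=100, method="inclusive")[94]

def breaches_slo(latencies_ms):
    """True when the P95 latency exceeds the SLO threshold."""
    return p95(latencies_ms) > P95_THRESHOLD_MS
```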
Nulls / data quality
SELECT COUNT(*) AS bad_rows
FROM prod.silver.orders
WHERE order_id IS NULL OR amount IS NULL;
-- Condition: bad_rows > 0
Freshness / staleness
SELECT
(unix_timestamp() - unix_timestamp(MAX(ingested_at))) AS seconds_since_last
FROM prod.bronze.source_feed;
-- Condition: seconds_since_last > 600
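The same staleness arithmetic, mirrored in Python for local testing (the 600-second threshold matches the condition above):

```python
from datetime import datetime, timezone

STALENESS_THRESHOLD_S = 600  # alert when the feed is >10 minutes behind

def seconds_since_last(ingested_at, now=None):
    """Mirror of unix_timestamp() - unix_timestamp(MAX(ingested_at))."""
    now = now or datetime.now(timezone.utc)
    return (now - ingested_at).total_seconds()

def is_stale(ingested_at, now=None):
    """True when the source feed has breached the freshness threshold."""
    return seconds_since_last(ingested_at, now) > STALENESS_THRESHOLD_S
```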
Destinations (Slack, Teams, PagerDuty, webhooks)
Workspace admins register destinations once; alert creators just pick them from a list. Supported: email, Slack, Microsoft Teams, PagerDuty, webhooks (anything that accepts HTTPS). (Databricks Documentation, Microsoft Learn, Terraform Registry)
Scheduling tips
- Use EVERY N MINUTES/HOURS/DAYS/WEEKS AT TIME ZONE '…'.
- Pick a warehouse size aligned to the query’s cost; alerts are lightweight, but give them stable compute.
- Keep schedules near data freshness (e.g., 5–15 min for near-real-time, hourly for batch). (Databricks Documentation)
Permissions you actually need
- To query data: normal Unity Catalog privileges (USE CATALOG, USE SCHEMA, and SELECT on referenced objects). (Databricks Documentation, Microsoft Learn)
- To send to Slack/Teams/PagerDuty: a registered notification destination (admin-managed). (Databricks Documentation)
- To manage alerts via CLI/IaC: appropriate workspace perms + SQL access; see resources below. (Azure Documentation, Terraform Registry)
Automate alerts (CLI & IaC)
CLI (alerts command group)
# Create (JSON payload defines query_id, condition, schedule, destinations)
databricks alerts create --json @alert.json
databricks alerts list
databricks alerts get <alert-id>
databricks alerts delete <alert-id>
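The `alert.json` payload for the create command can be generated with the standard library; the field names below are a sketch following the legacy Alerts API shape, so verify the exact schema with `databricks alerts create --help` for your CLI version:

```python
import json

# Hypothetical payload sketch; verify field names against your CLI/API version.
payload = {
    "name": "DQ: Nulls in orders",
    "query_id": "<saved-query-id>",  # placeholder: copy from the saved query
    "options": {
        "column": "bad_rows",  # column the condition inspects
        "op": ">",
        "value": "0",
    },
    "rearm": 900,  # seconds before a triggered alert can notify again
}

with open("alert.json", "w") as f:
    json.dump(payload, f, indent=2)
```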
Terraform (destinations + alerts)
# Field names follow the Databricks Terraform provider's destination and
# legacy SQL alert resources; verify against the provider version you pin.
resource "databricks_notification_destination" "slack" {
  display_name = "alerts-slack"
  config {
    slack {
      url = var.slack_webhook
    }
  }
}

resource "databricks_sql_alert" "dq_nulls" {
  name     = "DQ: Nulls in orders"
  query_id = databricks_sql_query.orders_nulls.id

  # Condition: trigger when bad_rows > 0
  options {
    column = "bad_rows"
    op     = ">"
    value  = "0"
  }

  rearm = 900 # seconds before the alert can re-trigger
}

# Note: depending on provider version, the alert's schedule and destination
# subscriptions may be managed via the newer Alerts API or the UI rather
# than as fields on this legacy resource.
Operations & troubleshooting
- No notifications? Verify the destination exists and test it (Slack/Teams webhooks often fail on missing perms or firewalls). (Microsoft Learn)
- “Alert didn’t run”: Check alert run history; an alert schedule creates a backing run you can inspect (logs, duration). (Medium)
- Query permissions: if the alert can’t read the table, fix UC grants (USE + SELECT). (Databricks Documentation, Microsoft Learn)
- Prefer materialized views? For heavy aggregates, precompute with an MV and alert on the MV; this trades freshness for speed. (MV scheduling works similarly.) (Databricks Documentation)
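When a Slack destination stays silent, posting a minimal payload straight to the webhook narrows the problem to Databricks vs. the webhook itself. A stdlib-only smoke test (the URL is whatever your admin registered; function names are illustrative):

```python
import json
import urllib.request

def build_slack_payload(text):
    """Minimal Slack-compatible message body."""
    return json.dumps({"text": text}).encode("utf-8")

def post_test_message(webhook_url, text="DBSQL alert smoke test"):
    """POST the payload to the webhook; a 200 means the webhook is healthy."""
    req = urllib.request.Request(
        webhook_url,
        data=build_slack_payload(text),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

If this POST succeeds but the alert still sends nothing, the issue is on the Databricks side (destination registration or alert configuration), not the webhook.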
Quick checklist (copy to your runbook)
- Create SQL warehouse for alerts.
- Admin registers notification destinations (Slack/Teams/webhook). (Databricks Documentation)
- Write & save the query (returns one row/metric).
- Create alert → set condition, schedule, destinations. (Databricks Documentation)
- Validate with a forced breach (e.g., set very low/high thresholds).
- Add CLI/Terraform for repeatable environments. (Azure Documentation, Terraform Registry)
Sources
- Databricks docs: SQL alerts, schedules, and destinations. (Databricks Documentation)
- Azure doc mirrors for alerts & destinations (same concepts). (Microsoft Learn)
- Engineering write-up on alert scheduling & run visibility. (Medium)
- CLI & Terraform references for automation. (Azure Documentation, Terraform Registry)