Z-ORDER in Databricks: Magic or Myth?
When it helps, when it hurts, how to pick columns, and what KPIs to measure. Introduction: Why the Hype Around Z-ORDER? If you’ve spent time in Databricks, you’ve heard: “Just…
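The intuition behind Z-ORDER can be sketched outside Databricks: interleaving the bits of several column values yields a single sort key (a Morton code) that keeps rows that are close on *any* of the interleaved columns physically close in the sort order. A minimal illustration of the principle — not Databricks' actual implementation:

```python
def morton_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of two column values into one Z-order (Morton) key."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # even bit positions come from x
        z |= ((y >> i) & 1) << (2 * i + 1)  # odd bit positions come from y
    return z

# Sorting rows by the interleaved key clusters them on BOTH columns at once,
# which is what lets Delta skip whole files when either column is filtered.
rows = [(3, 1), (0, 0), (1, 3), (2, 2)]
zsorted = sorted(rows, key=lambda r: morton_key(*r))
```

This is why Z-ORDER helps range filters on several columns at once, while a plain sort only helps the leading column.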
Optimal file sizes, compaction strategies, and how to keep your Delta tables lightning-fast Why Small Files Are a Big Problem Delta Lake is powerful, but it inherits one common “data…
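The compaction levers discussed here boil down to a couple of statements; a hedged sketch, where the table name and the choice to enable both auto-optimize properties are placeholders for your own tables and write patterns:

```sql
-- Compact existing small files into larger ones
OPTIMIZE events;

-- Let Delta write better-sized files going forward
ALTER TABLE events SET TBLPROPERTIES (
  'delta.autoOptimize.optimizeWrite' = 'true',
  'delta.autoOptimize.autoCompact'   = 'true'
);
```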
Here’s a simple line-by-line summary of the Serverless compute plane networking (08/04/2025): General idea Serverless egress control (outbound connections) Network Connectivity Configuration (NCC) What NCC enables: Extra note 👉 In…
Here’s a simple, line-by-line summary of the Serverless compute limitations (09/29/2025): General limitations Streaming limitations Machine learning limitations Notebook limitations Job limitations Compute-specific limitations Caching limitations Hive limitations Supported data…
Best practices for serverless compute Big picture Before you migrate Ingesting data (getting data in) Querying external data (without moving it) Spark configurations Watch your costs Quick checklist (copy/paste) Mohammad…
Here’s a simple line-by-line summary of the important points from the Serverless Compute release notes (09/24/2025), plus a one-page cheat sheet table for the Serverless Compute release notes…
How to detect skew, leverage AQE, use repartitioning patterns, and tune the shuffle service for blazing-fast jobs Why Shuffle Is a Big Deal In Spark (and therefore in Databricks), shuffle…
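A quick way to see whether a stage suffers from skew is to compare the largest shuffle partition to the median one. A small self-contained sketch — the 5x threshold is an arbitrary assumption for illustration, not a Spark default:

```python
import statistics

def skew_ratio(partition_sizes: list[int]) -> float:
    """Largest partition divided by the median partition size."""
    return max(partition_sizes) / statistics.median(partition_sizes)

def is_skewed(partition_sizes: list[int], threshold: float = 5.0) -> bool:
    # One partition several times larger than the median is the classic
    # symptom of a hot key dominating a shuffle.
    return skew_ratio(partition_sizes) >= threshold

# In Spark itself, AQE can split such partitions automatically:
#   spark.sql.adaptive.enabled = true
#   spark.sql.adaptive.skewJoin.enabled = true
```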
Benchmarking SQL/Delta workloads, common pitfalls, and a practical migration checklist Why Photon Exists Databricks introduced Photon, a vectorized query engine built in C++ and tightly integrated with Delta Lake and…
How to set min/max nodes, use termination settings, mix spot/preemptible nodes, and avoid “yo-yo” scaling Why Autoscaling Is Tricky Autoscaling is one of Databricks’ most powerful features—but many teams misuse…
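The knobs mentioned above map onto a cluster spec roughly like this (values are illustrative, not recommendations; `SPOT_WITH_FALLBACK` is the AWS flavor of mixing spot with on-demand nodes):

```json
{
  "autoscale": { "min_workers": 2, "max_workers": 8 },
  "autotermination_minutes": 30,
  "aws_attributes": {
    "first_on_demand": 1,
    "availability": "SPOT_WITH_FALLBACK"
  }
}
```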
How to pick nodes, cores, memory, and disk for ETL vs. ML vs. SQL—and when to scale up vs. out Why Cluster Right-Sizing Matters Databricks gives us the power of…
Lakehouse Federation lets you query external databases directly from Databricks—without copying the data into your lake. You point Unity Catalog at a source (PostgreSQL, SQL Server, Redshift, Snowflake, BigQuery, another…
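The setup is a two-step dance; a sketch for PostgreSQL, where the connection name, host, secret scope, and table names are all placeholders:

```sql
-- 1) Describe how to reach the external database
CREATE CONNECTION pg_conn TYPE postgresql
OPTIONS (
  host 'pg.example.com',
  port '5432',
  user 'reader',
  password secret('my_scope', 'pg_password')
);

-- 2) Surface it in Unity Catalog as a read-only foreign catalog
CREATE FOREIGN CATALOG pg_sales
USING CONNECTION pg_conn
OPTIONS (database 'sales');

-- Query it like any other catalog; no data is copied into the lake
SELECT * FROM pg_sales.public.orders LIMIT 10;
```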
What is a streaming table in Databricks? A streaming table is a Delta table that Databricks keeps up-to-date automatically as new data arrives. It uses Structured Streaming under the hood…
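In Databricks SQL the declaration looks roughly like this (the table name and source path are placeholders):

```sql
CREATE OR REFRESH STREAMING TABLE raw_events
AS SELECT *
FROM STREAM read_files(
  '/Volumes/main/landing/events/',
  format => 'json'
);
```

Each refresh picks up only the files that arrived since the last run — the incremental bookkeeping is handled for you.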
| Topic | View | Materialized View |
| --- | --- | --- |
| What it is | A stored query; results are computed each time you query it | A precomputed table of the query’s results, stored on disk |
| Performance | Same… | |
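The contrast in one sketch (table and column names are placeholders):

```sql
-- View: the aggregation runs on every query
CREATE OR REPLACE VIEW sales_by_region AS
SELECT region, SUM(amount) AS total_amount
FROM sales
GROUP BY region;

-- Materialized view: results are precomputed and stored
CREATE OR REPLACE MATERIALIZED VIEW sales_by_region_mv AS
SELECT region, SUM(amount) AS total_amount
FROM sales
GROUP BY region;

-- Bring the precomputed results up to date on demand
REFRESH MATERIALIZED VIEW sales_by_region_mv;
```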
What is Databricks SQL (DBSQL)? Databricks SQL is the lakehouse analytics experience for running SQL, building dashboards, alerts, and jobs on Delta tables. You connect to a SQL warehouse—a managed…
Delta Sharing lets you publish live, governed data to other teams or external partners—without copying files or building custom APIs. Recipients can query the latest data directly from their tools…
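On the provider side the whole flow is a handful of statements; a sketch with placeholder share, table, and recipient names:

```sql
-- Bundle the assets you want to publish
CREATE SHARE sales_share COMMENT 'Live sales data for partners';
ALTER SHARE sales_share ADD TABLE main.sales.orders;

-- Register who may consume it, then grant access
CREATE RECIPIENT partner_co;
GRANT SELECT ON SHARE sales_share TO RECIPIENT partner_co;
```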
Why workspace‑catalog binding? By default, any workspace attached to the same Unity Catalog metastore can see and access catalogs (subject to object‑level grants). Workspace‑catalog binding lets you restrict which workspaces…
What is column‑level masking? Column masks hide or transform sensitive values (PII/PCI/PHI) at query time. Every read of a masked column is replaced by the output of a masking function,…
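A sketch of the two pieces — a masking function plus the column binding — where the group, table, and function names are placeholders:

```sql
-- Masking function: only members of 'pii_admins' see the raw value
CREATE OR REPLACE FUNCTION mask_ssn(ssn STRING)
RETURN CASE
  WHEN is_account_group_member('pii_admins') THEN ssn
  ELSE '***-**-****'
END;

-- Attach it: every read of customers.ssn now goes through mask_ssn
ALTER TABLE customers ALTER COLUMN ssn SET MASK mask_ssn;
```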
What you’ll learn Prereqs: Unity Catalog enabled; DBR 12.2 LTS+ for reads (see notes for dedicated compute), and a SQL warehouse or UC‑enabled cluster. Concepts in 30 seconds When to…
What you’ll learn Prereqs: Unity Catalog enabled, a catalog/schema you can write to, and a SQL Warehouse or notebook attached to a UC‑enabled cluster. Quick definitions Create your first scalar…
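A first scalar function can be as small as this (the three-level `main.util` namespace is a placeholder for a catalog/schema you can write to):

```sql
CREATE OR REPLACE FUNCTION main.util.fahrenheit_to_celsius(f DOUBLE)
RETURNS DOUBLE
COMMENT 'Convert Fahrenheit to Celsius'
RETURN (f - 32) * 5.0 / 9.0;

SELECT main.util.fahrenheit_to_celsius(212.0);  -- 100.0
```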
Secrets (passwords, keys, tokens) should never live in notebooks or job configs. Databricks gives you a built-in, governed place to put them—secret scopes—and safe ways to read them at runtime.…
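The read side at runtime is a one-liner; the scope and key names here are placeholders, the scope must already exist (created via the Databricks CLI or REST API), and `dbutils.secrets.get` only works inside a Databricks runtime:

```python
# Fetch the secret at runtime; Databricks redacts the value in notebook output
token = dbutils.secrets.get(scope="my_scope", key="api_token")

# Use it, e.g. as an API header -- never hard-code it in the notebook
headers = {"Authorization": f"Bearer {token}"}
```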