Z-ORDER in Databricks: Magic or Myth?
When it helps, when it hurts, how to pick columns, and what KPIs to measure. Introduction: Why the Hype Around Z-ORDER? If you’ve spent time in Databricks, you’ve heard: “Just…
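The intuition behind Z-ORDER can be sketched outside Databricks: interleaving the bits of several column values yields a single sort key (a Morton code) that keeps rows that are close on *any* of the interleaved columns physically close in the sort order. A minimal illustration of the principle — not Databricks' actual implementation:

```python
def morton_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of two column values into one Z-order (Morton) key."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # even bit positions come from x
        z |= ((y >> i) & 1) << (2 * i + 1)  # odd bit positions come from y
    return z

# Sorting rows by the interleaved key clusters them on BOTH columns at once,
# which is what lets Delta skip whole files when either column is filtered.
rows = [(3, 1), (0, 0), (1, 3), (2, 2)]
zsorted = sorted(rows, key=lambda r: morton_key(*r))
```

This is why Z-ORDER helps range filters on several columns at once, while a plain sort only helps the leading column.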
Optimal file sizes, compaction strategies, and how to keep your Delta tables lightning-fast Why Small Files Are a Big Problem Delta Lake is powerful, but it inherits one common “data…
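The compaction levers discussed here boil down to a couple of statements; a hedged sketch, where the table name and the choice to enable both auto-optimize properties are placeholders for your own tables and write patterns:

```sql
-- Compact existing small files into larger ones
OPTIMIZE events;

-- Let Delta write better-sized files going forward
ALTER TABLE events SET TBLPROPERTIES (
  'delta.autoOptimize.optimizeWrite' = 'true',
  'delta.autoOptimize.autoCompact'   = 'true'
);
```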
Here’s a simple line-by-line summary of the Serverless compute plane networking (08/04/2025): General idea Serverless egress control (outbound connections) Network Connectivity Configuration (NCC) What NCC enables: Extra note 👉 In…
Here’s a simple, line-by-line summary of the Serverless compute limitations (09/29/2025): General limitations Streaming limitations Machine learning limitations Notebook limitations Job limitations Compute-specific limitations Caching limitations Hive limitations Supported data…
Best practices for serverless compute Big picture Before you migrate Ingesting data (getting data in) Querying external data (without moving it) Spark configurations Watch your costs Quick checklist (copy/paste) Mohammad…
Here’s a simple line-by-line summary of the important points from the Serverless Compute release notes (09/24/2025), plus a one-page cheat sheet table for the Serverless Compute release notes…
How to detect skew, leverage AQE, use repartitioning patterns, and tune the shuffle service for blazing-fast jobs Why Shuffle Is a Big Deal In Spark (and therefore in Databricks), shuffle…
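A quick way to see whether a stage suffers from skew is to compare the largest shuffle partition to the median one. A small self-contained sketch — the 5x threshold is an arbitrary assumption for illustration, not a Spark default:

```python
import statistics

def skew_ratio(partition_sizes: list[int]) -> float:
    """Largest partition divided by the median partition size."""
    return max(partition_sizes) / statistics.median(partition_sizes)

def is_skewed(partition_sizes: list[int], threshold: float = 5.0) -> bool:
    # One partition several times larger than the median is the classic
    # symptom of a hot key dominating a shuffle.
    return skew_ratio(partition_sizes) >= threshold

# In Spark itself, AQE can split such partitions automatically:
#   spark.sql.adaptive.enabled = true
#   spark.sql.adaptive.skewJoin.enabled = true
```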
Benchmarking SQL/Delta workloads, common pitfalls, and a practical migration checklist Why Photon Exists Databricks introduced Photon, a vectorized query engine built in C++ and tightly integrated with Delta Lake and…
How to set min/max nodes, use termination settings, mix spot/preemptible nodes, and avoid “yo-yo” scaling Why Autoscaling Is Tricky Autoscaling is one of Databricks’ most powerful features—but many teams misuse…
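The knobs mentioned above map onto a cluster spec roughly like this (values are illustrative, not recommendations; `SPOT_WITH_FALLBACK` is the AWS flavor of mixing spot with on-demand nodes):

```json
{
  "autoscale": { "min_workers": 2, "max_workers": 8 },
  "autotermination_minutes": 30,
  "aws_attributes": {
    "first_on_demand": 1,
    "availability": "SPOT_WITH_FALLBACK"
  }
}
```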
How to pick nodes, cores, memory, and disk for ETL vs. ML vs. SQL—and when to scale up vs. out Why Cluster Right-Sizing Matters Databricks gives us the power of…
Lakehouse Federation lets you query external databases directly from Databricks—without copying the data into your lake. You point Unity Catalog at a source (PostgreSQL, SQL Server, Redshift, Snowflake, BigQuery, another…
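The setup is a two-step dance; a sketch for PostgreSQL, where the connection name, host, secret scope, and table names are all placeholders:

```sql
-- 1) Describe how to reach the external database
CREATE CONNECTION pg_conn TYPE postgresql
OPTIONS (
  host 'pg.example.com',
  port '5432',
  user 'reader',
  password secret('my_scope', 'pg_password')
);

-- 2) Surface it in Unity Catalog as a read-only foreign catalog
CREATE FOREIGN CATALOG pg_sales
USING CONNECTION pg_conn
OPTIONS (database 'sales');

-- Query it like any other catalog; no data is copied into the lake
SELECT * FROM pg_sales.public.orders LIMIT 10;
```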
What is a streaming table in Databricks? A streaming table is a Delta table that Databricks keeps up-to-date automatically as new data arrives. It uses Structured Streaming under the hood…
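In Databricks SQL the declaration looks roughly like this (the table name and source path are placeholders):

```sql
CREATE OR REFRESH STREAMING TABLE raw_events
AS SELECT *
FROM STREAM read_files(
  '/Volumes/main/landing/events/',
  format => 'json'
);
```

Each refresh picks up only the files that arrived since the last run — the incremental bookkeeping is handled for you.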
| Topic | View | Materialized View |
| --- | --- | --- |
| What it is | A stored query; results are computed each time you query it | A precomputed table of the query’s results, stored on disk |
| Performance | Same… | |
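The contrast in one sketch (table and column names are placeholders):

```sql
-- View: the aggregation runs on every query
CREATE OR REPLACE VIEW sales_by_region AS
SELECT region, SUM(amount) AS total_amount
FROM sales
GROUP BY region;

-- Materialized view: results are precomputed and stored
CREATE OR REPLACE MATERIALIZED VIEW sales_by_region_mv AS
SELECT region, SUM(amount) AS total_amount
FROM sales
GROUP BY region;

-- Bring the precomputed results up to date on demand
REFRESH MATERIALIZED VIEW sales_by_region_mv;
```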
What is Databricks SQL (DBSQL)? Databricks SQL is the lakehouse analytics experience for running SQL, building dashboards, alerts, and jobs on Delta tables. You connect to a SQL warehouse—a managed…
Delta Sharing lets you publish live, governed data to other teams or external partners—without copying files or building custom APIs. Recipients can query the latest data directly from their tools…
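On the provider side the whole flow is a handful of statements; a sketch with placeholder share, table, and recipient names:

```sql
-- Bundle the assets you want to publish
CREATE SHARE sales_share COMMENT 'Live sales data for partners';
ALTER SHARE sales_share ADD TABLE main.sales.orders;

-- Register who may consume it, then grant access
CREATE RECIPIENT partner_co;
GRANT SELECT ON SHARE sales_share TO RECIPIENT partner_co;
```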
Why workspace‑catalog binding? By default, any workspace attached to the same Unity Catalog metastore can see and access catalogs (subject to object‑level grants). Workspace‑catalog binding lets you restrict which workspaces…
What is column‑level masking? Column masks hide or transform sensitive values (PII/PCI/PHI) at query time. Every read of a masked column is replaced by the output of a masking function,…
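A sketch of the two pieces — a masking function plus the column binding — where the group, table, and function names are placeholders:

```sql
-- Masking function: only members of 'pii_admins' see the raw value
CREATE OR REPLACE FUNCTION mask_ssn(ssn STRING)
RETURN CASE
  WHEN is_account_group_member('pii_admins') THEN ssn
  ELSE '***-**-****'
END;

-- Attach it: every read of customers.ssn now goes through mask_ssn
ALTER TABLE customers ALTER COLUMN ssn SET MASK mask_ssn;
```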
What you’ll learn Prereqs: Unity Catalog enabled; DBR 12.2 LTS+ for reads (see notes for dedicated compute), and a SQL warehouse or UC‑enabled cluster. Concepts in 30 seconds When to…
What you’ll learn Prereqs: Unity Catalog enabled, a catalog/schema you can write to, and a SQL Warehouse or notebook attached to a UC‑enabled cluster. Quick definitions Create your first scalar…
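A first scalar function can be as small as this (the three-level `main.util` namespace is a placeholder for a catalog/schema you can write to):

```sql
CREATE OR REPLACE FUNCTION main.util.fahrenheit_to_celsius(f DOUBLE)
RETURNS DOUBLE
COMMENT 'Convert Fahrenheit to Celsius'
RETURN (f - 32) * 5.0 / 9.0;

SELECT main.util.fahrenheit_to_celsius(212.0);  -- 100.0
```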
Secrets (passwords, keys, tokens) should never live in notebooks or job configs. Databricks gives you a built-in, governed place to put them—secret scopes—and safe ways to read them at runtime.…
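The read side at runtime is a one-liner; the scope and key names here are placeholders, the scope must already exist (created via the Databricks CLI or REST API), and `dbutils.secrets.get` only works inside a Databricks runtime:

```python
# Fetch the secret at runtime; Databricks redacts the value in notebook output
token = dbutils.secrets.get(scope="my_scope", key="api_token")

# Use it, e.g. as an API header -- never hard-code it in the notebook
headers = {"Authorization": f"Bearer {token}"}
```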