Databricks Archives

Mohammad Gufran Jahangir August 7, 2025

Salting, Repartitioning, and Broadcast Joins in Spark Databricks

A clear, structured explanation of salting, repartitioning, and broadcast joins in Spark, including how they work and when to use them, with simple examples. 🔹 1.…

Mohammad Gufran Jahangir August 7, 2025

What is spark.sql.shuffle.partitions?

spark.sql.shuffle.partitions is a Spark SQL configuration parameter that controls the number of output partitions created during shuffle operations. 🧠 Why is it important? Shuffling…

Mohammad Gufran Jahangir August 7, 2025

What is Dynamic Allocation in Spark (Databricks)

Dynamic Allocation is a feature that automatically adjusts the number of executors (the processes that run tasks on worker nodes) based on your job’s needs. Instead of using…

Mohammad Gufran Jahangir August 7, 2025

What is Garbage Collection in Databricks

GC stands for Garbage Collection: a process in the Java Virtual Machine (JVM), which Apache Spark runs on, that automatically frees memory by removing data (objects) that…

Mohammad Gufran Jahangir August 5, 2025

How to reduce the number of shuffle partitions in Databricks

in Databricks to reduce the number of shuffle partitions during wide transformations (like groupBy, join, distinct, repartition) so the driver and executors don’t get overwhelmed with too many small shuffle…

Mohammad Gufran Jahangir August 5, 2025

Fixing Databricks Error: “Driver is up but is not responsive, likely due to GC”

When running a notebook or scheduled job in Databricks, you might encounter an error like this…

Mohammad Gufran Jahangir August 4, 2025

50 interview questions and answers for an Azure Databricks Platform Engineer

Here are 50 interview questions and answers for an Azure Databricks Platform Engineer, divided by skill level and covering key areas (core, advanced, and scenario-based): 🔹 Essential Level (Core –…

Mohammad Gufran Jahangir August 3, 2025

Databricks Unity Catalog: Volume – Full Explanation

🔹 What is a Volume? A Volume in Databricks Unity Catalog is a secure, governed folder used to store non-tabular data like:…

Mohammad Gufran Jahangir July 18, 2025

Create a Job Compute Cluster in Databricks

Creating job compute (also called a job cluster) in Databricks lets you define a dedicated compute environment that is spun up only when your job runs, and…

Mohammad Gufran Jahangir July 18, 2025

Understanding Databricks Compute Options: Serverless, Pro, Classic & SQL Warehouses

As more teams migrate data workloads to Databricks Unity Catalog, one question frequently arises: “What’s the difference between Serverless,…

Mohammad Gufran Jahangir July 18, 2025

What Is SCIM? A Beginner’s Guide to User & Group Syncing in the Cloud

In today’s cloud-first world, managing user access across tools like Databricks, Azure, Slack, and Zoom…

Mohammad Gufran Jahangir July 17, 2025

How to Fetch User and Group Assignments Across Unity Catalog Workspaces in Databricks

📌 Introduction: As organizations move toward centralized data governance with Databricks Unity Catalog (UC), understanding user…

Mohammad Gufran Jahangir July 17, 2025

What Is a Service Principal in Databricks?

A service principal in Databricks represents a non-human identity (such as an application, automation tool, or CI/CD pipeline) used to securely…

Mohammad Gufran Jahangir July 16, 2025

How to Train and Track ML Models with MLflow in Databricks (Beginner to Advanced Guide)

MLflow is an open-source platform for managing the end-to-end machine learning lifecycle, and Databricks integrates…

Mohammad Gufran Jahangir July 13, 2025

Auditing User Access in Databricks with System Tables: From Basics to Advanced

Managing and auditing user access is a critical part of maintaining a secure and compliant data platform.…

Mohammad Gufran Jahangir July 13, 2025

How to Manage Data Governance with Unity Catalog and Privilege Models

As organizations scale their data platforms, managing data access, security, and compliance becomes increasingly complex. Databricks’ Unity Catalog offers…

Mohammad Gufran Jahangir July 7, 2025

Getting Started with Unity Catalog: A Complete Guide

Databricks Unity Catalog is a unified governance solution for all data and AI assets in the Lakehouse. Whether you’re an administrator, data…

Mohammad Gufran Jahangir July 7, 2025

Optimize, ZORDER, and Vacuum in Databricks: What You Must Know

In the world of big data, performance is everything. Databricks, with its Delta Lake engine, offers three key features: Optimize,…

Mohammad Gufran Jahangir July 3, 2025

Track and Optimize Your Databricks Costs with system.billing Tables in Unity Catalog

Databricks has become a core component of many modern data stacks, but with flexibility and scale…

Mohammad Gufran Jahangir July 3, 2025

Understanding system.access Tables in Unity Catalog (Azure Databricks)

Databricks’ Unity Catalog provides a centralized data governance solution. One of its key features is the system.access schema, a…
