What is spark.sql.shuffle.partitions?
spark.sql.shuffle.partitions is a Spark SQL configuration parameter that controls the number of output partitions created during shuffling operations, such as:
- Joins
- Aggregations (e.g., groupBy)
- Window functions
- distinct
- Sorting
Why is it important?
Shuffling is one of the most expensive operations in Spark, involving:
- Reading data across partitions
- Writing to disk
- Sending over the network
- Repartitioning data for processing
So, the number of shuffle partitions has a direct impact on:
| Aspect | Impact of Shuffle Partitions |
|---|---|
| Performance | Too many → scheduling/task overhead; too few → oversized partitions run slowly |
| Memory usage | Too few → large partitions can spill to disk or cause OOM |
| Parallelism | More partitions → better parallelism, if the cluster has enough cores |
| Disk + GC pressure | Too many → more shuffle files, possible GC pressure |
| Resource utilization | Must match your executor/core settings for efficiency |
How does it work?
Let's say you run a groupBy or a join:
df.groupBy("country").count()
This triggers a shuffle, and Spark will use the number set in spark.sql.shuffle.partitions to divide the intermediate output.
- Default is 200 (in most Spark/Databricks setups; note that with Adaptive Query Execution enabled, Spark can coalesce shuffle partitions automatically at runtime)
- Each output partition becomes a task in the next stage
Example:
If set to 200:
- Spark will create 200 output partitions from the shuffle stage
- Next stage will process these 200 tasks
- If cluster has 16 cores, 16 tasks run in parallel
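The arithmetic in this example can be sketched in plain Python (shuffle_waves is an illustrative helper, not a Spark API):

```python
import math

def shuffle_waves(shuffle_partitions: int, total_cores: int) -> int:
    """How many scheduling 'waves' the post-shuffle stage needs:
    tasks run total_cores at a time until all partitions are processed."""
    return math.ceil(shuffle_partitions / total_cores)

# 200 shuffle partitions on a 16-core cluster:
print(shuffle_waves(200, 16))  # -> 13 waves of up to 16 parallel tasks
```

If the last wave has only a few tasks, those cores sit idle, which is one reason partition counts are often chosen as a multiple of the core count.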
When and How to Tune It
| Scenario | Recommendation |
|---|---|
| Small dataset (a few MB/GB) | Reduce partitions (e.g., spark.conf.set("spark.sql.shuffle.partitions", 10)) to avoid many empty/small tasks |
| Large dataset (100+ GB or TB scale) | Increase partitions (e.g., 500-1000) so no partition is too large |
| Job has skewed performance | Tune this alongside salting, repartition, or broadcast join strategies |
| Long-running tasks (especially post-shuffle) | Could be too few partitions → increase |
| High scheduling delay or many pending tasks | Could be too many partitions → decrease or match to core count |
Rule of Thumb
- Shuffle partitions ≈ 2x to 4x the total number of executor cores
- For 20 executor cores → set it to 40-80
- Don't blindly increase to 1000+ unless you're handling very large data
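As a sketch, the rule of thumb above can be written as a tiny helper (recommended_shuffle_partitions is illustrative, not part of Spark):

```python
def recommended_shuffle_partitions(total_executor_cores: int, factor: float = 2.0) -> int:
    """Rule of thumb from the text: 2x to 4x the total executor cores."""
    if not 2.0 <= factor <= 4.0:
        raise ValueError("factor should stay in the 2x-4x range")
    return int(total_executor_cores * factor)

print(recommended_shuffle_partitions(20))       # -> 40
print(recommended_shuffle_partitions(20, 4.0))  # -> 80
```

Treat the result as a starting point; data size and skew matter more than any fixed multiplier.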
How to Check and Set It
# Check current value
spark.conf.get("spark.sql.shuffle.partitions")
# Set a new value (effective for current Spark session)
spark.conf.set("spark.sql.shuffle.partitions", 100)
To set permanently for a cluster, define in:
- Cluster Spark Config
- Init script
- Pipeline settings
Example Before/After Optimization
| Stage | Before Tuning | After Tuning |
|---|---|---|
| Shuffle partitions | 200 | 50 |
| Execution time | 10 min | 4 min |
| GC time | 2 min | 0.5 min |
| Failed tasks | 6 | 0 |
Related Parameters
- spark.default.parallelism: used for RDD operations
- spark.sql.files.maxPartitionBytes: controls the maximum partition size when reading files (before any shuffle)
- spark.dynamicAllocation.*: can help tune executor count
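Combined in a spark-defaults-style cluster config fragment, these settings might look like the following (values are purely illustrative, not recommendations):

```
spark.sql.shuffle.partitions       200
spark.default.parallelism          200
spark.sql.files.maxPartitionBytes  134217728
spark.dynamicAllocation.enabled    true
```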
spark.sql.shuffle.partitions can be defined at multiple levels:
1. Notebook Level (Session Scope)
You can define it within a notebook using the Spark configuration API. This change will apply only to the current Spark session (i.e., until the cluster restarts or notebook ends).
Example:
spark.conf.set("spark.sql.shuffle.partitions", 50)
To verify:
spark.conf.get("spark.sql.shuffle.partitions")
2. Job/Script Level (e.g., PySpark or Scala jobs)
If you’re submitting a job via a pipeline or job scheduler, you can set this in your job script before any operation that triggers a shuffle.
Python:
from pyspark.sql import SparkSession
spark = SparkSession.builder \
.appName("MyApp") \
.config("spark.sql.shuffle.partitions", "100") \
.getOrCreate()
3. Cluster Level (Default for All Jobs/Notebooks on the Cluster)
Set this in the cluster configuration to make it the default for all notebooks and jobs running on the cluster.
On Databricks:
- Go to Compute → choose your cluster → Configuration
- In Spark Config, add: spark.sql.shuffle.partitions 200
- Restart the cluster
4. Job Cluster / Job Settings (Databricks Jobs)
If you are scheduling a job (via UI or REST API), you can include this config under the spark_conf block in the job definition.
{
"spark_conf": {
"spark.sql.shuffle.partitions": "300"
}
}
5. Init Script Level (Advanced)
For production-level tuning, you can use init scripts to set this during cluster startup. This ensures every new cluster created from a pool or job template has the setting.
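A minimal sketch of such an init script, assuming a spark-defaults.conf-style file read from $SPARK_CONF_DIR (the exact path and mechanism vary by platform, so adjust for your environment):

```shell
#!/bin/bash
# Illustrative init script: append a default shuffle-partition setting
# to spark-defaults.conf. The conf path is an assumption, not a fixed API.
CONF_DIR="${SPARK_CONF_DIR:-/tmp}"
echo "spark.sql.shuffle.partitions 300" >> "${CONF_DIR}/spark-defaults.conf"
```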
Summary Table
| Scope | Where to Set | Applies To |
|---|---|---|
| Notebook | spark.conf.set(...) | That notebook only (session) |
| Script/Code | .config(...) in SparkSession builder | That specific job or script |
| Cluster | Spark Config section in Databricks UI | All jobs & notebooks on that cluster |
| Job Definition | spark_conf in job config | That specific scheduled job |
| Init Script | Startup script for cluster | All clusters using that init script |