📌 What is Shuffle Read & Shuffle Write?
▶️ Shuffle is Spark’s mechanism to redistribute data across partitions, typically during wide transformations like:
`groupBy()`, `join()`, `distinct()`, `reduceByKey()`

🔵 Shuffle Read:
- Amount of data an executor reads from other executors’ shuffle outputs across the network.
- Indicates how much inter-node data transfer was needed.
- High values suggest expensive operations such as large joins or groupBy aggregations.
🔴 Shuffle Write:
- Amount of data an executor writes out for other executors to read during a shuffle.
- Happens when Spark has to rearrange data for joins, aggregations, etc.
❓ When is it GOOD or BAD?
| Metric | Good Sign | Bad Sign / When to Investigate |
|---|---|---|
| Shuffle Read | Low and evenly distributed across executors | One executor has most of the read (data skew) |
| Shuffle Write | Low and balanced | High & unbalanced → possible data skew or large joins |
| GC Time | Low % (GC < 10–15% of task time) | GC > 20% of task time → consider memory tuning |
| Total Tasks | Evenly distributed | One executor does a lot more → load imbalance |
🔧 How to Investigate Shuffle Problems
- Go to the Spark UI → Stages tab:
  - Look for stages with a high “Shuffle Read Size” or long durations.
  - Hover over the task distribution to check for skewed partitions.
- Use `.explain()` or the Spark UI SQL / DAG tab:
  - Identify whether a `join`, `groupBy`, or similar operation triggered the shuffle.
- Apply fixes such as:
  - `broadcast()` small tables
  - `repartition()` or salting for skewed keys
📌 What is stdout and stderr?
| Log Type | Description |
|---|---|
| stdout | Standard output: All print() or log statements written in the notebook or your code (normal logs). |
| stderr | Standard error: Logs for warnings, stack traces, and errors (e.g., Python exceptions, Spark warnings). |
Click these links to view the executor logs for debugging failed or slow tasks.
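The split between the two streams can be demonstrated in plain Python, the same way it shows up in executor logs: `print()` lands in stdout, while tracebacks and warnings land in stderr. (The captured-stream setup below is only for self-contained demonstration.)

```python
import contextlib
import io
import sys
import traceback

out, err = io.StringIO(), io.StringIO()
with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
    print("progress: stage 1 done")            # normal log line -> stdout
    try:
        1 / 0                                   # simulated task failure
    except ZeroDivisionError:
        traceback.print_exc(file=sys.stderr)    # stack trace -> stderr

print("stdout captured:", out.getvalue().strip())
print("stderr has traceback:", "ZeroDivisionError" in err.getvalue())
```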
📌 From Your Screenshot
| Executor | Input | Shuffle Read | Shuffle Write | GC Time |
|---|---|---|---|---|
| 0 | 80.7 GiB | 48.4 GiB | 46.5 GiB | 17 min |
| 1 | 71.7 GiB | 43.7 GiB | 40.2 GiB | 21 min |
| 2 | 84.3 GiB | 56.4 GiB | 51.8 GiB | 17 min |
✔️ Observation:
- Shuffle is relatively evenly distributed → good sign (no obvious skew).
- GC Time is within limits (~2–3% of task time) → healthy memory use.
- Total Tasks are also fairly balanced.
📌 Summary
| Term | Meaning | Healthy Sign |
|---|---|---|
| Shuffle Read | Data read from other executors | Balanced and minimal |
| Shuffle Write | Data written for shuffle | Balanced and not excessive |
| stdout | Debug / print logs | Used for progress/debug info |
| stderr | Errors and warnings | Should be reviewed if job fails |
Here is a cheat sheet in a normal table format to help you understand and monitor Spark Executor metrics in Databricks:
| Metric | Description | What to Check / Action |
|---|---|---|
| Executor ID | Unique identifier for each executor (driver is separate) | Identify which executor is the driver vs. workers |
| Address | IP and port of the executor | Use for identifying node location or debugging IP-specific issues |
| Status | Executor state (Active/Dead) | Investigate dead executors: possible memory or disk issues |
| RDD Blocks | Number of RDD blocks cached | High number = memory pressure, consider checkpointing or persisting with storage level |
| Storage Memory | Memory used vs. allocated | If usage is close to max, consider increasing executor memory |
| Disk Used | Temporary disk storage used | Investigate high usage, especially with spills or shuffles |
| Cores | Number of cores allocated to executor | Too low = less parallelism; adjust based on workload |
| Active Tasks | Tasks currently running on executor | Uneven distribution = possible skew |
| Failed Tasks | Count of failed tasks | High failure = investigate logs, GC, or data issues |
| Completed Tasks | Number of tasks successfully completed | Use for performance trend analysis |
| Task Time (GC Time) | Time spent on tasks, with GC (Garbage Collection) duration | High GC = memory pressure; consider tuning memory or caching strategy |
| Input / Output | Input size, shuffle read/write, output | Imbalance may indicate skew or inefficient transformations |
| Shuffle Read/Write | Data read/written across nodes during shuffle | High = expensive joins/repartitioning; consider broadcast join or reduce shuffle partitions |
| Logs (stdout/stderr) | Standard output and error logs per executor | Use to debug stack traces, memory errors, etc. |
| Thread Dump | Capture of current threads running | Use to diagnose hanging tasks, driver not responding, etc. |
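All of the metrics in this table are also exposed as JSON by Spark's monitoring REST API (`GET /api/v1/applications/<app-id>/executors`), which is convenient for automated health checks instead of eyeballing the UI. A hedged sketch, assuming a Spark UI reachable at `http://localhost:4040` and the 20% GC threshold from the table above:

```python
import json
import urllib.request


def gc_fraction(executor: dict) -> float:
    """Fraction of total task time this executor spent in garbage collection."""
    duration = executor.get("totalDuration", 0)  # task time in ms
    return executor["totalGCTime"] / duration if duration else 0.0


def check_executors(base_url: str, app_id: str, gc_limit: float = 0.2) -> None:
    """Print shuffle and GC stats per executor, flagging GC-heavy ones."""
    url = f"{base_url}/api/v1/applications/{app_id}/executors"
    with urllib.request.urlopen(url) as resp:
        executors = json.load(resp)
    for e in executors:
        frac = gc_fraction(e)
        flag = "  <-- investigate (GC > 20% of task time)" if frac > gc_limit else ""
        print(f"executor {e['id']}: shuffle read {e['totalShuffleRead']} B, "
              f"shuffle write {e['totalShuffleWrite']} B, GC {frac:.0%}{flag}")


# check_executors("http://localhost:4040", "<app-id>")  # uncomment against a live app
```

The helper name `check_executors` and the URL are illustrative; the endpoint and the `totalShuffleRead` / `totalShuffleWrite` / `totalGCTime` / `totalDuration` fields come from Spark's documented ExecutorSummary payload.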