
Job Failures with External Libraries in Databricks: Causes and Solutions

Introduction

Databricks allows users to install external libraries (JARs, Python wheels, PyPI packages) to extend functionality in notebooks and jobs. However, job failures due to library issues are common and can be caused by dependency conflicts, network connectivity issues, incorrect library versions, or missing permissions.

In this guide, we’ll explore the common causes of external library-related job failures, how to diagnose these issues, and best practices to ensure smooth execution.


How External Libraries Work in Databricks

Databricks supports various types of external libraries:

  • PyPI Packages (e.g., pandas, numpy, scikit-learn)
  • Maven or JAR Dependencies (e.g., spark-avro, delta-core_2.12)
  • Custom Python Wheels (.whl files)
  • Custom JAR Files uploaded to DBFS or cloud storage

💡 Libraries can be installed via:

  • Databricks UI (Cluster Libraries)
  • Notebook-scoped %pip install commands
  • Databricks Libraries API (POST /api/2.0/libraries/install)
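As an illustration of the API route, here is a minimal Python sketch that builds the JSON body for the Libraries API's /api/2.0/libraries/install endpoint; the host, token, and cluster ID are placeholders you would replace with your own workspace values:

```python
import json
from urllib import request

def build_install_payload(cluster_id, pypi_packages):
    """Build the JSON body for the Databricks Libraries API
    /api/2.0/libraries/install endpoint."""
    return {
        "cluster_id": cluster_id,
        "libraries": [{"pypi": {"package": pkg}} for pkg in pypi_packages],
    }

def install_libraries(host, token, cluster_id, pypi_packages):
    """POST the install request; host, token, and cluster_id are
    placeholder values, not real credentials."""
    body = json.dumps(build_install_payload(cluster_id, pypi_packages)).encode()
    req = request.Request(
        f"{host}/api/2.0/libraries/install",
        data=body,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )
    return request.urlopen(req)

# Example payload for two PyPI packages:
payload = build_install_payload("0123-456789-abcde", ["pandas==1.3.3", "requests"])
```

The same payload shape works for Maven libraries by swapping the `pypi` key for `maven` with a `coordinates` field.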

🚨 Common issues include version mismatches, missing dependencies, and incompatible environments.


Common Causes of External Library Job Failures

1. Version Conflicts Between Installed Libraries

Symptoms:

  • Error: “ModuleNotFoundError: No module named ‘package_name’”
  • Error: “ImportError: cannot import name ‘XYZ’ from ‘package_name’”
  • Unexpected behavior due to different versions of installed libraries

Causes:

  • Conflicting package versions installed at cluster-level vs. notebook-level
  • Incompatible versions between Databricks Runtime and library requirements
  • Automatic package resolution installing unexpected versions

Fix:
Use Notebook-Scoped Libraries (%pip install instead of UI installation)

%pip install pandas==1.3.3

Manually resolve dependency conflicts using pip check:

!pip check

Use a conda environment.yml file for consistent dependencies across environments

name: myenv
dependencies:
  - python=3.9
  - pandas=1.3.3
  - numpy=1.21
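To catch drift between pinned and installed versions early, a small check can run at the top of a job using the standard library's importlib.metadata (Python 3.8+); check_pins is a hypothetical helper, not a Databricks API:

```python
from importlib import metadata

def check_pins(pins):
    """Compare pinned versions with what is actually installed.
    Returns (mismatched, missing); `pins` maps package name -> version."""
    mismatched, missing = [], []
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
            continue
        if installed != wanted:
            mismatched.append((name, wanted, installed))
    return mismatched, missing

# A package that is not installed is reported as missing:
_, missing = check_pins({"no-such-package-xyz": "1.0"})
```

Failing fast on a mismatch at job start produces a much clearer error than a downstream ImportError deep in the pipeline.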

2. Network Connectivity Issues Preventing Library Installation

Symptoms:

  • Error: “Connection timed out while installing library”
  • Error: “Could not find a version that satisfies the requirement”
  • Jobs fail intermittently due to package retrieval failures

Causes:

  • No internet access on Databricks clusters (firewalled environment)
  • Private PyPI or Maven repositories not accessible
  • Cloud VPC/VNet blocking outbound traffic to package repositories

Fix:
Enable cloud networking to allow package downloads (AWS VPC, Azure Private Link)
Use a private PyPI repository instead of internet-based sources

%pip config set global.index-url https://pypi.yourcompany.com/simple/

Preinstall libraries in a custom container image using Databricks Container Services
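Before blaming pip itself, a quick probe from a notebook cell can confirm whether the cluster can reach the repository host at all; repo_reachable is a hypothetical helper using only the standard library:

```python
import socket

def repo_reachable(host, port=443, timeout=3):
    """Return True if a TCP connection to the package repository
    succeeds within `timeout` seconds; False on DNS or connect errors."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. repo_reachable("pypi.org") run on the cluster shows whether
# outbound traffic to public PyPI is allowed at all.
unreachable = repo_reachable("host.invalid")
```

A False result points at VPC/VNet egress rules or DNS rather than at the package itself.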


3. Missing Required Libraries or Incorrect Import Paths

Symptoms:

  • Error: “ModuleNotFoundError: No module named ‘xyz’”
  • Libraries work in interactive notebooks but fail in scheduled jobs

Causes:

  • Library is installed only in notebook scope (%pip install) and not available in the job
  • Incorrect Python environment paths in scheduled jobs
  • Missing dependencies not automatically installed

Fix:
Ensure libraries are installed at the correct scope:

  • Use Cluster Libraries for persistent installation
  • Use %pip install inside the job script for runtime installs

📌 Example: Installing in a Job Notebook Before Execution

%pip install requests
import requests
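For jobs that may run on clusters with or without the library pre-attached, an import guard is a common defensive pattern; ensure_package below is a hypothetical helper that installs only when the module is genuinely absent:

```python
import importlib.util
import subprocess
import sys

def ensure_package(name, pip_spec=None):
    """Import guard for scheduled jobs: install `name` at runtime only
    if it is not already importable on the cluster."""
    if importlib.util.find_spec(name) is not None:
        return True  # already available (cluster library or earlier run)
    # Fall back to a runtime pip install, optionally with a pinned spec
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", pip_spec or name]
    )
    return importlib.util.find_spec(name) is not None

# Modules that are already present trigger no install at all:
ok = ensure_package("math")
```

Pinning via the second argument (e.g. `ensure_package("requests", "requests==2.28.1")`) keeps runtime installs deterministic.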

For JAR-based dependencies, attach the Maven coordinates as a cluster library (Compute → Libraries → Install new → Maven) rather than installing from the notebook; the classes are then importable in your code:

%scala
import org.apache.spark.sql.functions._

4. Job Failures Due to Incompatible Java JARs or Missing Dependencies

Symptoms:

  • Error: “ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe”
  • Error: “java.lang.NoClassDefFoundError: org.apache.spark.sql.delta.DeltaLog”
  • Spark jobs fail when interacting with Delta Lake, Hadoop, or JDBC drivers

Causes:

  • Incorrect JAR versions conflicting with Spark runtime
  • Missing JAR files in classpath
  • Conflicting versions of delta-core or hadoop-common JARs

Fix:
Install the correct JAR version by attaching its Maven coordinates as a cluster library (Compute → Libraries → Install new → Maven):

io.delta:delta-core_2.12:1.2.0

Ensure JARs are installed at the cluster level for shared execution
For custom JARs, upload them to DBFS and either attach them via the cluster's Libraries tab or copy them into the driver's jar directory from a cluster init script (copying after the JVM has started has no effect on the running classpath):

dbutils.fs.cp("dbfs:/FileStore/libs/mycustom.jar", "file:/databricks/jars/")
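To spot the "two versions of the same artifact" situation described above, one can scan a jar directory for duplicates; find_jar_conflicts is a hypothetical helper that relies on the conventional name-version.jar naming scheme:

```python
import re
import tempfile
from collections import defaultdict
from pathlib import Path

def find_jar_conflicts(jar_dir):
    """Group filenames like delta-core_2.12-1.2.0.jar by artifact name
    and report artifacts that are present in more than one version."""
    versions = defaultdict(set)
    for jar in Path(jar_dir).glob("*.jar"):
        m = re.match(r"(.+?)-(\d[\w.]*)\.jar$", jar.name)
        if m:
            versions[m.group(1)].add(m.group(2))
    return {a: sorted(v) for a, v in versions.items() if len(v) > 1}

# Demo against a throwaway directory (on a cluster you would point this
# at /databricks/jars or your DBFS library folder instead):
tmp = Path(tempfile.mkdtemp())
for name in ("delta-core_2.12-1.2.0.jar",
             "delta-core_2.12-2.1.0.jar",
             "hadoop-common-3.3.1.jar"):
    (tmp / name).touch()
conflicts = find_jar_conflicts(tmp)
```

Any artifact listed with two versions is a likely source of NoClassDefFoundError-style failures.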

5. PyPI Package Installations Failing Due to Missing System Dependencies

Symptoms:

  • Error: “OSError: libGL.so.1: cannot open shared object file”
  • Error: “Failed building wheel for XYZ”
  • Job fails but works fine in local Python execution

Causes:

  • Some Python packages require underlying OS dependencies (C++, OpenCV, TensorFlow)
  • System packages must be installed via %sh apt-get or a cluster init script; pip alone cannot provide them

Fix:
Use Databricks ML Runtime if using deep learning libraries
For complex dependencies, install via conda instead of pip

%sh
conda install -y -c conda-forge opencv
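To confirm whether a missing native library (such as libGL.so.1) is really the culprit, the standard library's ctypes.util.find_library can probe the dynamic linker; missing_shared_libs is a hypothetical wrapper:

```python
from ctypes.util import find_library

def missing_shared_libs(names):
    """Return the subset of native library names (without the 'lib'
    prefix) that the dynamic linker cannot locate, e.g. 'GL' for
    libGL.so.1."""
    return [n for n in names if find_library(n) is None]

# e.g. missing_shared_libs(["GL"]) on a cluster node shows whether
# libGL is present before attempting to import OpenCV.
missing = missing_shared_libs(["definitely-not-a-real-lib-xyz"])
```

Running this check before the failing import gives a precise list of OS dependencies to add to an init script.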

Step-by-Step Troubleshooting Guide

1. Check Installed Libraries

%pip list

2. Verify If Any Dependency Conflicts Exist

!pip check

3. Check Cluster Logs for Library Installation Errors

  • Go to Databricks UI → Clusters → Libraries → Event Log

4. Debug Library Paths for JAR Issues

import sys
print(sys.path)

5. Test Library Installation in an Interactive Notebook


import pandas as pd
print(pd.__version__)

Best Practices to Prevent Library-Related Job Failures

Use Cluster Libraries Instead of Notebook %pip install for Jobs

  • Notebook-scoped %pip install does not persist across job runs.
  • Install at cluster-level for jobs to ensure consistent availability.

Use requirements.txt or Conda for Dependency Management

name: myenv
dependencies:
  - python=3.9
  - pandas=1.3.3
  - numpy=1.21
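The same pins can also be expressed as a requirements.txt for pip-based installs (the versions here mirror the conda example above):

```text
pandas==1.3.3
numpy==1.21
```

Checking this file into version control alongside the job keeps every run reproducible.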

Use Databricks ML Runtime for Machine Learning Dependencies

  • Avoid installing large ML libraries manually (tensorflow, pytorch).
  • Use Databricks ML Runtimes that come pre-installed with ML packages.

Monitor Job and Library Logs

  • Set up Databricks Alerts for failed installations.
  • Monitor DBFS logs for missing dependency issues.

Real-World Example: Fixing a Job Failure Due to Pandas Version Conflict

Scenario:

A Databricks job running an ETL pipeline failed with a pandas version mismatch.

Root Cause:

  • The Databricks runtime used pandas 1.2, but the job required pandas 1.3.
  • A mix of cluster-scoped and notebook-scoped installations caused conflicts.

Solution:

  1. Uninstall the existing version and reinstall the correct one:
%pip uninstall pandas -y
%pip install pandas==1.3.3
  2. Update the job environment to pin the correct version.

Impact:

  • The ETL job ran successfully with consistent library versions.

Conclusion

Library-related job failures in Databricks often stem from dependency conflicts, network issues, missing system packages, or incompatible JARs. By ensuring proper package management, leveraging Databricks ML Runtimes, and using cluster-level installations, teams can prevent failures and maintain stable job execution.
