Introduction
NETWORK001 – Connection timeout in Databricks is a network-related error that occurs when Databricks cannot establish a connection to an external service, such as cloud storage (S3, ADLS, GCS), external databases, APIs, or metastore services. This error can interrupt data ingestion, job execution, and storage access, impacting workflows.
🚨 Common scenarios where NETWORK001 occurs:
- Accessing external storage like S3, Azure Data Lake, or Google Cloud Storage.
- Connecting to external databases (MySQL, PostgreSQL, etc.).
- Accessing third-party APIs from a Databricks notebook.
- Timeouts during Unity Catalog operations or metastore queries.
Common Causes and Fixes for NETWORK001 – Connection Timeout
1. Network Misconfiguration
Symptoms:
- Timeouts when accessing S3, ADLS, or GCS.
- Cannot connect to external databases or APIs.
- Error occurs immediately or after a long wait.
Causes:
- VPC/VNet misconfiguration blocking outbound traffic.
- DNS resolution failures for external endpoints.
- Firewall rules blocking traffic to external services.
Fix:
✅ Verify network connectivity from a notebook %sh cell or the cluster web terminal:
curl -I https://s3.amazonaws.com # For AWS S3
curl -I https://<your-database-endpoint> # For external databases
✅ Ensure VPC/VNet and firewall rules allow traffic:
- AWS: Configure VPC endpoints or enable public access.
- Azure: Use Azure Private Link for secure connectivity.
- GCP: Ensure firewall rules allow traffic on the required ports.
✅ Check and correct DNS resolution settings:
- Ensure external endpoints (e.g., s3.amazonaws.com) are resolvable.
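As a quick sanity check from a notebook cell, the Python standard library can confirm whether an endpoint resolves at all; this is a minimal sketch, and s3.amazonaws.com is just an example endpoint:
import socket

# Attempt to resolve an external endpoint from the cluster;
# a failure here points to DNS configuration, not the service itself.
endpoint = "s3.amazonaws.com"  # replace with your endpoint
try:
    ip = socket.gethostbyname(endpoint)
    print(f"{endpoint} resolves to {ip}")
except socket.gaierror as e:
    print(f"DNS resolution failed for {endpoint}: {e}")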
2. Cloud Storage Connectivity Issues
Symptoms:
- Cannot read or write to cloud storage (S3, ADLS, GCS).
- Jobs fail with connection timeout errors.
Causes:
- Network restrictions or firewall rules blocking cloud storage access.
- Incorrect IAM roles or credentials for accessing storage.
Fix:
✅ Verify IAM roles and permissions for cloud storage:
- AWS S3: Ensure your role has s3:GetObject, s3:PutObject, and s3:ListBucket permissions.
- Azure ADLS: Ensure your service principal has Storage Blob Data Contributor access.
- GCS: Verify that your service account has Storage Admin permissions.
✅ Test cloud storage connectivity:
dbutils.fs.ls("s3://mybucket/")
dbutils.fs.ls("abfss://my-container@myaccount.dfs.core.windows.net/")
✅ For secure connections, use AWS PrivateLink or Azure Private Endpoint.
3. Database Connection Timeout
Symptoms:
- Databricks cannot connect to external databases (MySQL, PostgreSQL, SQL Server).
- Queries fail after a timeout period.
Causes:
- Database server firewall blocks Databricks access.
- Incorrect JDBC connection string or credentials.
- Network latency or server unavailability.
Fix:
✅ Check database connectivity using JDBC:
jdbc_url = "jdbc:mysql://<hostname>:3306/<database>"
properties = {"user": "myuser", "password": "mypassword"}
df = spark.read.jdbc(jdbc_url, "my_table", properties=properties)
df.show()
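To avoid waiting on a hung connection, you can also pass driver-level timeouts in the JDBC URL. A sketch assuming MySQL Connector/J, which accepts connectTimeout and socketTimeout in milliseconds (other drivers use different parameter names); hostname, database, and credentials remain placeholders:
# Fail fast: 10s to establish the connection, 60s per socket read.
jdbc_url = (
    "jdbc:mysql://<hostname>:3306/<database>"
    "?connectTimeout=10000&socketTimeout=60000"
)
properties = {"user": "myuser", "password": "mypassword"}
df = spark.read.jdbc(jdbc_url, "my_table", properties=properties)
df.show()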
✅ Verify firewall settings to allow traffic from Databricks IP ranges.
✅ Use Private Link or VNet service endpoints for secure database access.
4. Unity Catalog or Metastore Connectivity Issues
Symptoms:
- Timeouts when querying Unity Catalog or external metastores (Hive, AWS Glue).
- Slow responses or intermittent failures.
Causes:
- Network issues between Databricks and the metastore service.
- Misconfigured metastore endpoints.
Fix:
✅ Test metastore connectivity:
SHOW DATABASES IN catalog_name;
✅ Ensure Unity Catalog is properly configured and the metastore is reachable.
- AWS Glue: Check VPC connectivity to AWS Glue endpoints.
- Azure: Use Private Link for Azure SQL or Key Vault-backed metastore.
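To distinguish a slow metastore from an unreachable one, time a lightweight catalog query. A minimal sketch using standard PySpark:
import time

# Time a lightweight metastore round trip; a hang or exception here
# indicates a connectivity problem rather than a query problem.
start = time.time()
try:
    dbs = spark.sql("SHOW DATABASES").collect()
    print(f"Metastore responded with {len(dbs)} databases in {time.time() - start:.1f}s")
except Exception as e:
    print(f"Metastore unreachable after {time.time() - start:.1f}s: {e}")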
5. Third-Party API Connection Timeout
Symptoms:
- Timeouts when accessing external APIs from Databricks notebooks.
- HTTP 504 Gateway Timeout errors.
Causes:
- API rate limits or server unavailability.
- Network restrictions in Databricks workspace.
Fix:
✅ Implement retry logic with exponential backoff for API calls:
import requests
import time

url = "https://api.example.com/data"
response = None
for i in range(5):
    try:
        # Bound each attempt so a hung connection fails fast
        response = requests.get(url, timeout=10)
        if response.status_code == 200:
            break
    except requests.exceptions.RequestException:
        pass  # retry on connection errors and timeouts
    time.sleep(2 ** i)  # exponential backoff: 1, 2, 4, 8, 16 seconds
✅ Check API rate limits and quotas.
✅ Verify network access to external APIs.
Step-by-Step Troubleshooting Guide
Step 1: Verify Network Connectivity
curl -I https://s3.amazonaws.com
curl -I https://<your-database-endpoint>
Step 2: Test Cloud Storage and Database Connections
dbutils.fs.ls("s3://mybucket/")
jdbc_url = "jdbc:mysql://<hostname>:3306/<database>"
Step 3: Check IAM Roles and Firewall Rules
- Ensure IAM roles have appropriate permissions.
- Configure firewall rules to allow traffic from Databricks.
Step 4: Enable Private Connectivity
- Use AWS PrivateLink, Azure Private Link, or GCP VPC Peering for secure access.
Best Practices to Prevent NETWORK001 Errors
✅ Ensure Proper Network Configuration
- Configure VPC endpoints and firewall rules.
✅ Use Private Connectivity for Secure Access
- Use AWS PrivateLink or Azure Private Endpoint to avoid public internet traffic.
✅ Monitor Cloud and Database Connectivity
- Use Databricks logs and cloud monitoring tools to track network issues.
✅ Implement Retry Logic for External Connections
- Prevent intermittent failures by using retries with exponential backoff.
Conclusion
NETWORK001 – Connection timeout in Databricks can occur due to network misconfigurations, storage access issues, or external API failures. By verifying network connectivity, checking IAM roles, and using private endpoints, you can prevent and resolve most connection timeouts. Retry logic and network monitoring further reduce intermittent failures and keep jobs running reliably.