Introduction
If an AWS S3 storage location cannot be accessed from your Databricks workspace, jobs may fail with errors such as:
🚨 Common Errors:
- “Access Denied (403 Forbidden)”
- “NoSuchBucket: The specified bucket does not exist”
- “S3EndpointConnectionError: Unable to reach S3”
- “Permission denied: Cannot read/write to S3”
This issue can occur due to misconfigured IAM roles, incorrect bucket policies, network restrictions, or incorrect access keys.
This guide provides step-by-step troubleshooting and fixes to resolve S3 access issues in Databricks.
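Before changing any configuration, a quick check from a notebook cell can show which of the errors above you are actually hitting. The snippet below is a minimal sketch: dbutils is only available inside a Databricks notebook, and your-bucket-name is a placeholder for your actual bucket.
# Minimal S3 connectivity check from a Databricks notebook cell.
try:
    entries = dbutils.fs.ls("s3://your-bucket-name/")
    print(f"Access OK: {len(entries)} entries at the bucket root")
except Exception as e:
    # The exception text normally contains the underlying S3 error code,
    # e.g. 403 AccessDenied, NoSuchBucket, or a connection timeout.
    print(f"S3 access failed: {e}")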
1. Verify IAM Role Permissions
Symptoms:
- 403 Forbidden – Access Denied when reading/writing to S3.
- S3 access works in some notebooks but fails in others.
- A job fails when it runs on a job cluster but succeeds when run manually on an interactive cluster.
Causes:
- The IAM role assigned to Databricks is missing required S3 permissions.
- The IAM role is not attached to the cluster.
- The S3 bucket has a restrictive bucket policy.
Fix:
✅ Check if your IAM role has the correct permissions:
- The IAM role should have at least the following permissions:
{
  "Effect": "Allow",
  "Action": [
    "s3:ListBucket",
    "s3:GetObject",
    "s3:PutObject",
    "s3:DeleteObject"
  ],
  "Resource": [
    "arn:aws:s3:::your-bucket-name",
    "arn:aws:s3:::your-bucket-name/*"
  ]
}
✅ Attach the IAM role to the Databricks cluster:
- Go to Databricks UI → Clusters
- Edit your cluster → select the Instance Profile associated with your IAM role
- Restart the cluster
✅ Test IAM permissions manually using AWS CLI:
aws s3 ls s3://your-bucket-name --profile your-profile
If you get “Access Denied”, update the IAM policy with the correct permissions.
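The same permission check can be run from a notebook with boto3 (preinstalled on Databricks runtimes), which uses the instance profile attached to the cluster. This is a sketch with a placeholder bucket name:
import boto3
from botocore.exceptions import ClientError

# boto3 picks up the credentials of the instance profile attached to the cluster.
s3 = boto3.client("s3")
try:
    resp = s3.list_objects_v2(Bucket="your-bucket-name", MaxKeys=5)
    print("ListBucket OK:", [obj["Key"] for obj in resp.get("Contents", [])])
except ClientError as e:
    # Typical error codes: AccessDenied, NoSuchBucket.
    print("ListBucket failed:", e.response["Error"]["Code"])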
2. Verify S3 Bucket Policy Settings
Symptoms:
- IAM role permissions are correct, but access to S3 still fails.
- Some users can access the bucket while others cannot.
Causes:
- The S3 bucket policy is blocking Databricks IAM role access.
- The bucket restricts access to specific VPCs or accounts.
- Public access block settings prevent access.
Fix:
✅ Check your S3 bucket policy in the AWS Console:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::YOUR_ACCOUNT_ID:role/YOUR_DATABRICKS_ROLE"
      },
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}
✅ Ensure that the bucket policy allows your Databricks role.
✅ Check for explicit “Deny” statements in the policy.
✅ Ensure that public access restrictions are not blocking access:
aws s3api get-public-access-block --bucket your-bucket-name
✅ If those settings are the cause and your use case genuinely requires it, relax the public access block settings:
aws s3api put-public-access-block --bucket your-bucket-name --public-access-block-configuration \
"BlockPublicAcls=false,BlockPublicPolicy=false,IgnorePublicAcls=false,RestrictPublicBuckets=false"
3. Verify Databricks Cluster Configuration
Symptoms:
- S3 access works on some clusters but fails on others.
- A newly created cluster cannot access S3, but old clusters can.
- Jobs fail on job clusters even though the same code succeeds in interactive notebooks.
Causes:
- The IAM role is not properly assigned to all clusters.
- Job clusters do not inherit the IAM role from interactive clusters.
Fix:
✅ Attach IAM roles to all clusters in Databricks:
- Go to Databricks UI → Clusters → Edit Cluster
- Attach the IAM role under Advanced Options
- Restart the cluster
✅ For job clusters, specify IAM roles explicitly in the job definition:
{
  "new_cluster": {
    "aws_attributes": {
      "instance_profile_arn": "arn:aws:iam::YOUR_ACCOUNT_ID:instance-profile/YOUR_DATABRICKS_ROLE"
    }
  }
}
✅ Verify the cluster can access S3 using Python:
dbutils.fs.ls("s3://your-bucket-name/")
If access is denied, review the IAM role, bucket policy, and cluster settings.
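To confirm which instance profile the cluster actually received, the driver node's EC2 instance metadata can be queried from a notebook. This is a sketch that assumes the metadata service is reachable via IMDSv1; on IMDSv2-only instances a session token request is required first.
import requests

# The EC2 instance metadata endpoint is only reachable from inside the instance.
resp = requests.get("http://169.254.169.254/latest/meta-data/iam/info", timeout=2)
# InstanceProfileArn should match the instance profile attached to the cluster.
print(resp.json().get("InstanceProfileArn"))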
4. Check AWS VPC and PrivateLink Settings
Symptoms:
- S3 access works from a local machine but not from Databricks.
- Access is blocked when using AWS PrivateLink.
- DNS resolution fails for S3 endpoints.
Causes:
- VPC settings block outbound access to S3.
- AWS PrivateLink does not allow public S3 access.
- S3 endpoint DNS is misconfigured in VPC settings.
Fix:
✅ Check if a VPC Endpoint is blocking S3 access:
aws ec2 describe-vpc-endpoints --query "VpcEndpoints[*].ServiceName"
✅ Ensure S3 VPC endpoint is configured for private access:
aws ec2 create-vpc-endpoint --vpc-id <your-vpc-id> --service-name com.amazonaws.<region>.s3
✅ If using AWS PrivateLink, confirm that S3 is still reachable by testing a small upload:
aws s3 cp test-file.txt s3://your-bucket-name/
✅ Check if Databricks has access to the internet via NAT Gateway.
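Network reachability can also be tested directly from a notebook by resolving the regional S3 endpoint and opening a TCP connection on port 443. The sketch below assumes the us-east-1 endpoint; substitute your bucket's region:
import socket

endpoint = "s3.us-east-1.amazonaws.com"  # replace with your region's S3 endpoint

try:
    ip = socket.gethostbyname(endpoint)  # DNS resolution through the VPC resolver
    print(f"{endpoint} resolves to {ip}")
    with socket.create_connection((endpoint, 443), timeout=5):
        print("TCP connection on port 443 succeeded")
except OSError as e:
    # A failure here points to VPC routing, the S3 endpoint, or security groups,
    # not to IAM permissions.
    print("Network check failed:", e)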
5. Verify Databricks Credential Passthrough (If Using User Access Mode)
Symptoms:
- S3 access works for some users but not others.
- Notebook users see “Access Denied” when accessing S3.
Causes:
- Credential Passthrough is not enabled on the cluster (or the workspace is not set up for it).
- The user’s IAM role does not have access to S3.
Fix:
✅ Enable Credential Passthrough on the cluster:
- Go to Databricks UI → Clusters → Edit Cluster → Advanced Options
- Enable credential passthrough for user-level data access
- Restart the cluster
✅ Assign IAM role permissions for each user (both the bucket ARN and the object ARN with /* are needed):
{
  "Effect": "Allow",
  "Action": "s3:*",
  "Resource": [
    "arn:aws:s3:::your-bucket-name",
    "arn:aws:s3:::your-bucket-name/*"
  ]
}
✅ Verify the user can list the S3 bucket using AWS CLI:
aws s3 ls s3://your-bucket-name/
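On a cluster with credential passthrough enabled, have each affected user run the same listing call from their own notebook; the result then reflects that user's IAM permissions rather than the cluster's instance profile. A minimal per-user check (placeholder bucket name):
# Run by each affected user on a passthrough-enabled cluster.
try:
    dbutils.fs.ls("s3://your-bucket-name/")
    print("This user's credentials can list the bucket")
except Exception as e:
    print("This user's credentials were rejected:", e)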
Step-by-Step Troubleshooting Guide
1. Verify IAM Role Permissions
aws iam get-role --role-name YOUR_DATABRICKS_ROLE
2. Check S3 Bucket Policy
aws s3api get-bucket-policy --bucket your-bucket-name
3. Test Cluster Access to S3
dbutils.fs.ls("s3://your-bucket-name/")
4. Check AWS VPC Endpoint Settings
aws ec2 describe-vpc-endpoints --filters "Name=service-name,Values=com.amazonaws.us-east-1.s3"
5. Enable Debug Logging for S3 Access Issues
aws s3 ls s3://your-bucket-name/ --debug
Best Practices to Avoid S3 Access Issues in Databricks
✅ Use IAM Roles Instead of Access Keys
- Avoid storing AWS Access Keys in notebooks.
✅ Grant Minimum Required Permissions
- Use fine-grained IAM policies to control S3 access.
✅ Use Databricks Cluster Policies for Consistency
- Attach IAM roles to all clusters that need S3 access.
✅ Monitor Access Logs in AWS CloudTrail
- Check CloudTrail logs for failed S3 requests.
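Recent failed S3 management events can also be pulled programmatically. The sketch below uses boto3 and assumes the caller is allowed to call cloudtrail:LookupEvents; note that object-level calls such as GetObject only appear if data event logging is enabled on a trail.
import json
import boto3

cloudtrail = boto3.client("cloudtrail")

# Look up recent management events recorded for the S3 service.
events = cloudtrail.lookup_events(
    LookupAttributes=[
        {"AttributeKey": "EventSource", "AttributeValue": "s3.amazonaws.com"}
    ],
    MaxResults=50,
)["Events"]

for event in events:
    detail = json.loads(event["CloudTrailEvent"])
    if "errorCode" in detail:  # e.g. AccessDenied
        print(detail["eventName"], detail["errorCode"], detail.get("errorMessage", ""))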
Conclusion
If AWS S3 storage cannot be accessed from Databricks, check:
✅ IAM role permissions for S3 access.
✅ S3 bucket policy to allow Databricks access.
✅ Cluster settings to attach the correct IAM role.
✅ VPC and network settings to ensure connectivity.
✅ Databricks Credential Passthrough settings (if using User Access Mode).
By following this guide, you can troubleshoot and resolve S3 access issues in Databricks efficiently.