AWS S3 Storage Location Cannot Be Accessed in Databricks

Introduction

If an AWS S3 storage location cannot be accessed from your Databricks workspace, jobs may fail with errors such as:

🚨 Common Errors:

  • “Access Denied (403 Forbidden)”
  • “NoSuchBucket: The specified bucket does not exist”
  • “S3EndpointConnectionError: Unable to reach S3”
  • “Permission denied: Cannot read/write to S3”

This issue can occur due to misconfigured IAM roles, incorrect bucket policies, network restrictions, or incorrect access keys.

This guide provides step-by-step troubleshooting and fixes to resolve S3 access issues in Databricks.


1. Verify IAM Role Permissions

Symptoms:

  • 403 Forbidden – Access Denied when reading/writing to S3.
  • S3 access works in some notebooks but fails in others.
  • A scheduled job fails on its cluster, but the same code succeeds when run manually in a notebook.

Causes:

  • The IAM role assigned to Databricks is missing required S3 permissions.
  • The IAM role is not attached to the cluster.
  • The S3 bucket has a restrictive bucket policy.

Fix:

Check if your IAM role has the correct permissions:

  • The IAM role should have at least the following permissions:
{
  "Effect": "Allow",
  "Action": [
    "s3:ListBucket",
    "s3:GetObject",
    "s3:PutObject",
    "s3:DeleteObject"
  ],
  "Resource": [
    "arn:aws:s3:::your-bucket-name",
    "arn:aws:s3:::your-bucket-name/*"
  ]
}
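
If the role is missing these permissions, one option is to attach them as an inline policy. Below is a minimal sketch using boto3; the role name, policy name, and bucket are placeholders, and you may prefer to manage this through the AWS Console or infrastructure-as-code instead.

import json
import boto3  # requires credentials that allow iam:PutRolePolicy

iam = boto3.client("iam")

# Placeholders -- substitute your own role and bucket names.
role_name = "YOUR_DATABRICKS_ROLE"
bucket = "your-bucket-name"

policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
            ],
            "Resource": [
                f"arn:aws:s3:::{bucket}",
                f"arn:aws:s3:::{bucket}/*",
            ],
        }
    ],
}

# Attach the statement above as an inline policy on the role.
iam.put_role_policy(
    RoleName=role_name,
    PolicyName="databricks-s3-access",
    PolicyDocument=json.dumps(policy_document),
)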

Attach the IAM role to the Databricks cluster:

  1. Go to Databricks UI → Clusters
  2. Edit your cluster → AWS IAM Role → Attach your IAM role
  3. Restart the cluster

Test IAM permissions manually using AWS CLI:

aws s3 ls s3://your-bucket-name --profile your-profile

If you get “Access Denied”, update the IAM policy with the correct permissions.
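
You can also run a quick check from a notebook on the cluster itself. This is a minimal sketch assuming boto3 is available on the cluster (it is preinstalled on most Databricks runtimes) and that the instance profile supplies the credentials; the bucket name is a placeholder.

import boto3

bucket = "your-bucket-name"  # placeholder

# Which identity is the cluster actually using?
sts = boto3.client("sts")
print("Caller identity:", sts.get_caller_identity()["Arn"])

# Can that identity list the bucket?
s3 = boto3.client("s3")
resp = s3.list_objects_v2(Bucket=bucket, MaxKeys=5)
for obj in resp.get("Contents", []):
    print(obj["Key"])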


2. Verify S3 Bucket Policy Settings

Symptoms:

  • IAM role permissions are correct, but access to S3 still fails.
  • Only certain users can access the bucket, but others cannot.

Causes:

  • The S3 bucket policy is blocking Databricks IAM role access.
  • The bucket restricts access to specific VPCs or accounts.
  • Public access block settings prevent access.

Fix:

Check your S3 bucket policy in the AWS Console:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::YOUR_ACCOUNT_ID:role/YOUR_DATABRICKS_ROLE"
      },
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}

Ensure that the bucket policy allows your Databricks role.
Check for explicit “Deny” statements in the policy.
Ensure that public access restrictions are not blocking access:

aws s3api get-public-access-block --bucket your-bucket-name

If needed, disable strict public access settings:

aws s3api put-public-access-block --bucket your-bucket-name --public-access-block-configuration \
  "BlockPublicAcls=false,BlockPublicPolicy=false,IgnorePublicAcls=false,RestrictPublicBuckets=false"

3. Verify Databricks Cluster Configuration

Symptoms:

  • S3 access works on some clusters but fails on others.
  • A newly created cluster cannot access S3, but old clusters can.
  • Jobs fail when code is moved between job clusters and interactive notebooks.

Causes:

  • The IAM role is not properly assigned to all clusters.
  • Job clusters do not inherit the IAM role from interactive clusters.

Fix:

Attach IAM roles to all clusters in Databricks:

  1. Go to Databricks UI → Clusters → Edit Cluster
  2. Attach the IAM role under Advanced Options
  3. Restart the cluster

For job clusters, specify IAM roles explicitly in the job definition:

{
  "new_cluster": {
    "aws_attributes": {
      "instance_profile_arn": "arn:aws:iam::YOUR_ACCOUNT_ID:instance-profile/YOUR_DATABRICKS_ROLE"
    }
  }
}
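
For reference, here is a hedged sketch of creating such a job through the Databricks Jobs API (2.1) with Python. The workspace URL, token, notebook path, node type, and Spark version are all placeholders; adjust them for your environment.

import requests

# Placeholders -- replace with your workspace URL, token, and resources.
host = "https://<your-workspace>.cloud.databricks.com"
token = "<your-personal-access-token>"

job_spec = {
    "name": "s3-access-check",
    "tasks": [
        {
            "task_key": "check_s3",
            "notebook_task": {"notebook_path": "/Shared/check_s3"},
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 1,
                "aws_attributes": {
                    "instance_profile_arn": "arn:aws:iam::YOUR_ACCOUNT_ID:instance-profile/YOUR_DATABRICKS_ROLE"
                },
            },
        }
    ],
}

resp = requests.post(
    f"{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=job_spec,
)
resp.raise_for_status()
print("Created job:", resp.json()["job_id"])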

Verify the cluster can access S3 using Python:

dbutils.fs.ls("s3://your-bucket-name/")

If access is denied, review the IAM role, bucket policy, and cluster settings.
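
A slightly more verbose notebook check can help distinguish permission errors from missing buckets or networking problems. This is a minimal sketch; the bucket path is a placeholder.

bucket_path = "s3://your-bucket-name/"  # placeholder

try:
    files = dbutils.fs.ls(bucket_path)
    print(f"Access OK, {len(files)} entries found")
except Exception as err:
    message = str(err)
    if "403" in message or "AccessDenied" in message:
        print("Permission problem: check the IAM role and bucket policy")
    elif "NoSuchBucket" in message or "404" in message:
        print("Bucket not found: check the bucket name and region")
    else:
        print("Other failure (possibly networking/VPC):", message)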


4. Check AWS VPC and PrivateLink Settings

Symptoms:

  • S3 access works from a local machine but not from Databricks.
  • Access is blocked when using AWS PrivateLink.
  • DNS resolution fails for S3 endpoints.

Causes:

  • VPC settings block outbound access to S3.
  • AWS PrivateLink does not allow public S3 access.
  • S3 endpoint DNS is misconfigured in VPC settings.

Fix:

Check if a VPC Endpoint is blocking S3 access:

aws ec2 describe-vpc-endpoints --query "VpcEndpoints[*].ServiceName"

Ensure an S3 VPC endpoint is configured for private access:

aws ec2 create-vpc-endpoint --vpc-id <your-vpc-id> --service-name com.amazonaws.<region>.s3

If using AWS PrivateLink, confirm that S3 is still reachable, for example by copying a test file:

aws s3 cp test-file.txt s3://your-bucket-name/

If no S3 endpoint is configured, check that the Databricks subnets have outbound access to S3 via a NAT Gateway.
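
The endpoint check can also be scripted with boto3. A minimal sketch, assuming credentials with ec2:DescribeVpcEndpoints and using a placeholder region:

import boto3

region = "us-east-1"  # placeholder -- use your workspace region
ec2 = boto3.client("ec2", region_name=region)

# Look for a gateway endpoint to S3 in the account's VPCs.
resp = ec2.describe_vpc_endpoints(
    Filters=[{"Name": "service-name", "Values": [f"com.amazonaws.{region}.s3"]}]
)
if not resp["VpcEndpoints"]:
    print("No S3 VPC endpoint found -- traffic must leave via a NAT or internet gateway")
for ep in resp["VpcEndpoints"]:
    print(ep["VpcEndpointId"], ep["VpcId"], ep["State"])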


5. Verify Databricks Credential Passthrough (If Using User Access Mode)

Symptoms:

  • S3 access works for some users but not others.
  • Notebook users see “Access Denied” when accessing S3.

Causes:

  • Databricks Credential Passthrough is not enabled for the workspace.
  • The user’s IAM role does not have access to S3.

Fix:

Enable Credential Passthrough in Databricks UI:

  1. Go to Databricks Admin Console → Advanced Settings
  2. Enable Credential Passthrough
  3. Restart clusters

Assign IAM role permissions for each user:

{
  "Effect": "Allow",
  "Action": "s3:*",
  "Resource": [
    "arn:aws:s3:::your-bucket-name",
    "arn:aws:s3:::your-bucket-name/*"
  ]
}

Verify the user can list the S3 bucket using AWS CLI:

aws s3 ls s3://your-bucket-name/
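
Each affected user can also run a quick check from their own notebook on a passthrough-enabled cluster. This is a minimal sketch; the bucket path is a placeholder.

# Run this as the affected user, on a cluster with credential passthrough enabled.
bucket_path = "s3://your-bucket-name/"  # placeholder

try:
    for entry in dbutils.fs.ls(bucket_path):
        print(entry.path)
except Exception as err:
    # A 403 here points at the user's own IAM role rather than the cluster configuration.
    print("This user's credentials cannot list the bucket:", err)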

Step-by-Step Troubleshooting Guide

1. Verify IAM Role Permissions

aws iam get-role --role-name YOUR_DATABRICKS_ROLE

2. Check S3 Bucket Policy

aws s3api get-bucket-policy --bucket your-bucket-name

3. Test Cluster Access to S3

dbutils.fs.ls("s3://your-bucket-name/")

4. Check AWS VPC Endpoint Settings

aws ec2 describe-vpc-endpoints --filters "Name=service-name,Values=com.amazonaws.us-east-1.s3"

5. Enable Debug Logging for S3 Access Issues

aws s3 ls s3://your-bucket-name/ --debug
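
If you are testing from a notebook rather than a shell, boto3 can emit equivalent debug output. A minimal sketch, with a placeholder bucket name:

import logging
import boto3

# Send botocore's debug-level logs (endpoints, signing, responses) to stdout.
boto3.set_stream_logger("botocore", level=logging.DEBUG)

s3 = boto3.client("s3")
s3.list_objects_v2(Bucket="your-bucket-name", MaxKeys=1)  # placeholder bucket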

Best Practices to Avoid S3 Access Issues in Databricks

Use IAM Roles Instead of Access Keys

  • Avoid storing AWS Access Keys in notebooks.
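
If access keys are unavoidable (for example, cross-account access without an assumable role), read them from a Databricks secret scope instead of hardcoding them. A minimal sketch, assuming a secret scope named "aws" with keys "access_key_id" and "secret_access_key" already exists:

import boto3

# Placeholders: create the scope and keys beforehand with the Databricks CLI or Secrets API.
access_key = dbutils.secrets.get(scope="aws", key="access_key_id")
secret_key = dbutils.secrets.get(scope="aws", key="secret_access_key")

# Keys stay out of the notebook source; secret values are redacted if printed.
s3 = boto3.client(
    "s3",
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
)
print(s3.list_objects_v2(Bucket="your-bucket-name", MaxKeys=1).get("KeyCount"))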

Grant Minimum Required Permissions

  • Use fine-grained IAM policies to control S3 access.

Use Databricks Cluster Policies for Consistency

  • Attach IAM roles to all clusters that need S3 access.

Monitor Access Logs in AWS CloudTrail

  • Check CloudTrail logs for failed S3 requests.

Conclusion

If AWS S3 storage cannot be accessed from Databricks, check:

  • IAM role permissions for S3 access.
  • The S3 bucket policy to allow Databricks access.
  • Cluster settings to attach the correct IAM role.
  • VPC and network settings to ensure connectivity.
  • Databricks Credential Passthrough settings (if using User Access Mode).

By following this guide, you can troubleshoot and resolve S3 access issues in Databricks efficiently.
