How to Set Up Unity Catalog in Azure Databricks

To set up Unity Catalog, you need account-level global administrator privileges for the subscription.

At the subscription level, the account should have service administration access.

The user should have Owner access to the subscription.

Then the user can set up Unity Catalog.

Before setting up Unity Catalog you need a storage account, because the metastore connects to a storage account to store its data.

Create an Azure Databricks workspace.

Create an access connector for Databricks – the access connector will have access to the storage account, and this storage account will be linked to the Databricks workspace.

Grant the Storage Blob Data Contributor role to the access connector on the storage account.

Go to the Azure Databricks workspace, launch it, and go to Manage Account.

Create the metastore. This can also be done with Terraform scripts.

Now assign the metastore to the workspace.

Then create a catalog –> create a schema inside the catalog –> create a table inside the schema.
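The catalog –> schema –> table hierarchy above can be sketched in Databricks SQL; the catalog, schema, and table names below are placeholder examples:

```sql
-- Placeholder names; run in a workspace attached to a Unity Catalog metastore.
CREATE CATALOG IF NOT EXISTS demo_catalog;
CREATE SCHEMA IF NOT EXISTS demo_catalog.demo_schema;
CREATE TABLE IF NOT EXISTS demo_catalog.demo_schema.demo_table (
  id   BIGINT,
  name STRING
);
```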

  1. UC Account Configuration: Only global account administrators are authorized to configure Unity Catalog accounts.
  2. Regional Limitation: Each region can host only one Unity Catalog metastore, ensuring distinct management per geographical location.
  3. Workspace Attachment: Each workspace is limited to attachment to only one Unity Catalog Metastore, which prevents cross-metastore configurations within the same workspace.
  4. Metastore Flexibility: A single Unity Catalog Metastore can be linked to multiple workspaces, allowing for centralized data governance across different projects or teams within the same region.
  5. Regional Restriction on Metastore Assignment: Unity Catalog Metastores are region-specific; a Metastore created in one region (Region X) cannot be assigned to a workspace in a different region (Region Y), highlighting the importance of regional data governance and storage compliance.

SQL Warehouses and Clusters:

  1. SQL Warehouses:
    • These are specialized environments within Databricks designed to run SQL workloads, which include running queries, powering dashboards, and generating visualizations.
    • SQL warehouses provide access to data stored in the Unity Catalog and can execute commands specific to the Unity Catalog by default, assuming the workspace is linked to a Unity Catalog metastore.
  2. Clusters:
    • Clusters in Databricks are used to run workloads for data science and engineering as well as for machine learning applications. These workloads can be managed through notebooks or automated jobs.
    • To access the Unity Catalog from a cluster, the cluster must be part of a workspace that is connected to a Unity Catalog metastore and configured with an access mode that supports Unity Catalog interactions. These access modes include “shared” or “single user” settings.
    • Unity Catalog enforces security by requiring that clusters are configured with these specific access modes to interact with the catalog. If a cluster is not configured correctly, it will not have access to the Unity Catalog data.

Compute where Unity Catalog is enabled

A cluster must use Single User or Shared access mode to enable Unity Catalog. It is not available in No Isolation Shared mode.

| Access Mode | Visibility | Unity Catalog Support | Supported Languages | Notes |
| --- | --- | --- | --- | --- |
| Single User | Always | Yes | Python, SQL, Scala, R | Assigned to one user. Requires SELECT on all referenced tables/views. Does not support credential pass-through. |
| Shared | Always (Premium plan required) | Yes | Python (Databricks Runtime 11.3 LTS and above), SQL | Used by multiple users with data isolation. Restrictions apply. |
| No Isolation Shared | Admins can hide this type | No | Python, SQL, Scala, R | Account-level settings affect the visibility and use of this type. |
| Custom | Hidden (for all new clusters) | No | Python, SQL, Scala, R | Visible only for existing clusters without a specified access mode. |

Create and Manage Catalog, Schema, Tables in UNITY CATALOG

Create a catalog and work with that specific catalog.

To grant access via SQL.

To grant access via the portal.

To create a table.
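As a sketch, access can be granted with Databricks SQL GRANT statements; the catalog, schema, table, and group names here are placeholders:

```sql
-- Grant a group permission to browse and query objects in the catalog.
GRANT USE CATALOG ON CATALOG demo_catalog TO `data-analysts`;
GRANT USE SCHEMA ON SCHEMA demo_catalog.demo_schema TO `data-analysts`;
GRANT SELECT ON TABLE demo_catalog.demo_schema.demo_table TO `data-analysts`;
```

The same grants can be made from the portal via Catalog Explorer's Permissions tab.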

Assign a catalog to specific workspaces in UNITY CATALOG – workspace-catalog binding

  • Workspace Isolation for Data Access: To isolate user data access, the concept of workspace-catalog binding allows organizations to restrict catalog access to specific workspaces within their account. This method is contrary to the default setting, where a catalog is accessible by all workspaces connected to the current metastore.
  • Use Cases for Catalog Binding:
    1. Production Data Access: By binding a catalog to a production workspace, you ensure that users within that workspace can only access production data. This restricts access to production data from non-production environments.
    2. Sensitive Data Processing: Ensuring that sensitive data is only processed within a dedicated workspace. This workspace is configured to handle such data, ensuring that sensitive information is not exposed in less secure or developmental environments.

The diagram in the image visually illustrates the concept, showing two workspaces (Prod ETL Workspace and Prod Analytics Workspace) tied to a ‘prod_catalog’, indicating that they have access only to production data. Conversely, the Dev Workspace is shown to have a separate binding to a ‘dev_catalog’, indicating its access is limited to development data.

This workspace-catalog binding strategy is essential for maintaining data integrity and compliance with data governance policies by controlling who can access and process different types of data based on their workspace designation.

To keep a catalog shared only within specific workspaces when many workspaces share the same metastore, you can follow the steps below:

Go to the catalog –> Workspaces –> uncheck "All workspaces have access" –> assign the catalog to the workspaces that should have access.

Create an external location and external table in Unity Catalog

By default, tables are managed tables whose data files reside in the Unity Catalog metastore's storage.

If you want to create a table whose data files reside outside the metastore storage, you can do so with external tables.

To create an external table:

When the table is dropped, the data files remain in the storage account.
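A sketch of the flow in Databricks SQL, assuming a storage credential has already been created from the access connector; all names and the ABFSS path are placeholders:

```sql
-- External location backed by the storage account (placeholder URL/credential).
CREATE EXTERNAL LOCATION IF NOT EXISTS demo_ext_location
  URL 'abfss://data@demostorage.dfs.core.windows.net/ext'
  WITH (STORAGE CREDENTIAL demo_credential);

-- External table whose data files live under that location.
CREATE TABLE demo_catalog.demo_schema.orders_ext (
  order_id BIGINT,
  amount   DECIMAL(10, 2)
)
LOCATION 'abfss://data@demostorage.dfs.core.windows.net/ext/orders';

-- DROP TABLE removes only the metadata; the files stay in the storage account.
```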

Databricks-to-Databricks Delta Sharing in unity catalog

Sharing data between different workspaces, either in the same subscription or in another subscription. Data sharing will work only if both workspaces have Unity Catalog enabled.

Recipient – the party you want to share data with.

To configure a recipient –> go to Data Share –> New recipient.

Run the command below in the recipient's workspace to get its unique identifier.

Add this identifier when creating the recipient.
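In Databricks SQL, the recipient's sharing identifier can be retrieved with:

```sql
-- Returns the global metastore ID, used as the recipient's sharing identifier.
SELECT CURRENT_METASTORE();
```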

The recipient is created.

To share data with the recipient –> create a new share.

Add the tables you want to share.

Add schemas, tables, or all tables.

Add the recipient you want to share with.

Grant the share once the recipient is added.

The recipient can see the details under "Shared with me".

Create a catalog from the share; after creating it, it will appear under Catalog.
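The same flow can be sketched in Databricks SQL; the share, recipient, provider, and identifier values are placeholders:

```sql
-- Provider side: create a share and add a table to it.
CREATE SHARE demo_share;
ALTER SHARE demo_share ADD TABLE demo_catalog.demo_schema.demo_table;

-- Create a recipient using the sharing identifier from the other workspace.
CREATE RECIPIENT demo_recipient
  USING ID 'azure:eastus:...';  -- placeholder sharing identifier

-- Grant the recipient access to the share.
GRANT SELECT ON SHARE demo_share TO RECIPIENT demo_recipient;

-- Recipient side: mount the share as a catalog.
CREATE CATALOG shared_catalog USING SHARE demo_provider.demo_share;
```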

Open Delta Sharing in Unity Catalog

How to share from a Databricks workspace to non-Databricks users.

Provider – who provides the data.

Recipient – who receives the data.

Create a recipient without an ID.

Send the activation link to the user.

Copy the activation link into a browser and hit Enter to download the credential file.
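A sketch in Databricks SQL, with a placeholder recipient name; creating a recipient without USING ID produces an open-sharing recipient with an activation link:

```sql
-- Open-sharing recipient (no sharing identifier).
CREATE RECIPIENT open_recipient;

-- The activation link appears in the recipient's details.
DESCRIBE RECIPIENT open_recipient;
```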
