To set up Unity Catalog, you need Global Administrator privileges on the account.
At the subscription level, you need Service Administrator access.
The user should have Owner access on the subscription.
Then users can set up Unity Catalog.
Before setting up Unity Catalog, you need a storage account, because the metastore must connect to a storage account to store its data.
Create an Azure Databricks workspace.
Create an Access Connector for Azure Databricks. The access connector is what accesses the storage account, and this storage account will be linked to the Databricks workspace.
Grant the Storage Blob Data Contributor role to the access connector on the storage account.
Launch the Azure Databricks workspace, then go to Manage Account.
Create the metastore. This can also be done with Terraform scripts.
Now assign the metastore to the workspace.
To create a catalog: create the catalog → create a schema inside the catalog → create a table, as sketched below.
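A minimal SQL sketch of this three-level hierarchy, run from a Unity Catalog-enabled workspace (all object names are hypothetical examples):

```sql
-- Unity Catalog three-level namespace: catalog.schema.table
CREATE CATALOG main_catalog;
CREATE SCHEMA main_catalog.sales;
CREATE TABLE main_catalog.sales.orders (order_id INT, amount DOUBLE);
```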
- UC Account Configuration: Only global account administrators are authorized to configure Unity Catalog accounts.
- Regional Limitation: Each region can host only one Unity Catalog metastore, ensuring distinct management per geographical location.
- Workspace Attachment: Each workspace is limited to attachment to only one Unity Catalog Metastore, which prevents cross-metastore configurations within the same workspace.
- Metastore Flexibility: A single Unity Catalog Metastore can be linked to multiple workspaces, allowing for centralized data governance across different projects or teams within the same region.
- Regional Restriction on Metastore Assignment: Unity Catalog Metastores are region-specific; a Metastore created in one region (Region X) cannot be assigned to a workspace in a different region (Region Y), highlighting the importance of regional data governance and storage compliance.
SQL Warehouses and Clusters:
- SQL Warehouses:
- These are specialized environments within Databricks designed to run SQL workloads, which include running queries, powering dashboards, and generating visualizations.
- SQL warehouses provide access to data stored in the Unity Catalog and can execute commands specific to the Unity Catalog by default, assuming the workspace is linked to a Unity Catalog metastore.
- Clusters:
- Clusters in Databricks are used to run workloads for data science and engineering as well as for machine learning applications. These workloads can be managed through notebooks or automated jobs.
- To access the Unity Catalog from a cluster, the cluster must be part of a workspace that is connected to a Unity Catalog metastore and configured with an access mode that supports Unity Catalog interactions. These access modes include “shared” or “single user” settings.
- Unity Catalog enforces security by requiring that clusters are configured with these specific access modes to interact with the catalog. If a cluster is not configured correctly, it will not have access to the Unity Catalog data.
Compute where Unity Catalog is enabled:
The cluster must use Single User or Shared access mode to enable Unity Catalog; it is not available in No Isolation Shared mode.
| Access Mode | Visibility | Unity Catalog Support | Supported Languages | Notes |
|---|---|---|---|---|
| Single User | Always | Yes | Python, SQL, Scala, R | Assigned to one user. Requires SELECT on all referenced tables/views. Does not support credential pass-through. |
| Shared | Always (Premium plan required) | Yes | Python (Databricks Runtime 11.3 LTS and above), SQL | Used by multiple users with data isolation. Restrictions apply. |
| No Isolation Shared | Admins can hide this type | No | Python, SQL, Scala, R | Account-level settings affect the visibility and use of this type. |
| Custom | Hidden (for all new clusters) | No | Python, SQL, Scala, R | Visible only for existing clusters without a specified access mode. |
Create and Manage Catalog, Schema, Tables in UNITY CATALOG
Create a catalog and work with a specific catalog.
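A minimal sketch (the catalog name `demo_catalog` is an assumption):

```sql
-- Create the catalog and make it the session default
CREATE CATALOG IF NOT EXISTS demo_catalog;
USE CATALOG demo_catalog;
SHOW SCHEMAS;  -- lists the schemas in the current catalog
```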
To grant access:
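For example (the group name `data_engineers` is an assumption; catalog-level grants inherit down to schemas and tables):

```sql
-- Let a group use the catalog and read any table inside it
GRANT USE CATALOG ON CATALOG demo_catalog TO `data_engineers`;
GRANT USE SCHEMA ON CATALOG demo_catalog TO `data_engineers`;
GRANT SELECT ON CATALOG demo_catalog TO `data_engineers`;
```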
To grant access via the portal: open the object in Catalog Explorer, go to the Permissions tab, and click Grant.
To create a table:
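A sketch of creating a managed table (schema and table names are assumptions):

```sql
-- Managed table: data files live in the metastore's root storage
CREATE SCHEMA IF NOT EXISTS demo_catalog.demo_schema;
CREATE TABLE demo_catalog.demo_schema.employees (
  id INT,
  name STRING,
  dept STRING
);
INSERT INTO demo_catalog.demo_schema.employees VALUES (1, 'Asha', 'Data');
SELECT * FROM demo_catalog.demo_schema.employees;
```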
Assign a catalog to specific workspaces in UNITY CATALOG – workspace-catalog binding
- Workspace Isolation for Data Access: To isolate user data access, the concept of workspace-catalog binding allows organizations to restrict catalog access to specific workspaces within their account. This method is contrary to the default setting, where a catalog is accessible by all workspaces connected to the current metastore.
- Use Cases for Catalog Binding:
- Production Data Access: By binding a catalog to a production workspace, you ensure that users within that workspace can only access production data. This restricts access to production data from non-production environments.
- Sensitive Data Processing: Ensuring that sensitive data is only processed within a dedicated workspace. This workspace is configured to handle such data, ensuring that sensitive information is not exposed in less secure or developmental environments.
The diagram illustrates the concept: two workspaces (Prod ETL Workspace and Prod Analytics Workspace) are bound to a ‘prod_catalog’, indicating that they have access only to production data. Conversely, the Dev Workspace has a separate binding to a ‘dev_catalog’, limiting its access to development data.
This workspace-catalog binding strategy is essential for maintaining data integrity and compliance with data governance policies by controlling who can access and process different types of data based on their workspace designation.
To keep a catalog shared with only specific workspaces when many workspaces share the same metastore, follow the steps below:
Go to the catalog → Workspaces tab → uncheck "All workspaces have access" → assign the workspaces that should have access.
Create external location and table in Unity Catalog
By default, tables are managed tables, whose data files reside in the Unity Catalog metastore's root storage.
If you want to create a table whose data files reside outside the metastore, you can do so with external tables.
To create an external table:
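A sketch, assuming a storage credential backed by the access connector already exists (all names and the storage path are assumptions):

```sql
-- Register the external storage path with Unity Catalog
CREATE EXTERNAL LOCATION IF NOT EXISTS ext_data
URL 'abfss://data@mystorageacct.dfs.core.windows.net/ext'
WITH (STORAGE CREDENTIAL my_access_connector_cred);

-- External table: data files stay at the given path, outside the metastore root
CREATE TABLE demo_catalog.demo_schema.ext_orders (order_id INT, amount DOUBLE)
LOCATION 'abfss://data@mystorageacct.dfs.core.windows.net/ext/orders';
```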
If the table is dropped, the data files remain in the storage account.
Databricks-to-Databricks Delta Sharing in Unity Catalog
Delta Sharing shares data between different workspaces, either in the same subscription or in another subscription. Sharing works only if both workspaces have Unity Catalog enabled.
Recipient – the party with whom you want to share data.
To configure a recipient: go to Delta Sharing → New recipient.
Run the command below in the recipient's Databricks workspace to get its unique sharing identifier.
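This is the standard Databricks SQL function for this; the identifier comes back in the form `<cloud>:<region>:<uuid>`:

```sql
-- Returns the current metastore's global sharing identifier
SELECT CURRENT_METASTORE();
```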
The recipient is now created.
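The SQL equivalent of this recipient setup is sketched below (the recipient name and identifier value are placeholders):

```sql
-- Databricks-to-Databricks recipient, keyed to the other metastore's identifier
CREATE RECIPIENT finance_team
USING ID 'azure:westus2:12345678-1234-1234-1234-123456789012';
```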
To share data with the recipient, create a new share.
Add the tables that you want to share.
Add a schema, specific tables, or all tables.
Add the recipient with whom you want to share.
Grant the share once the recipient is added.
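The same flow in SQL (share, table, and recipient names are assumptions):

```sql
-- Provider side: create a share, add a table, grant it to the recipient
CREATE SHARE sales_share COMMENT 'Sales data shared with finance';
ALTER SHARE sales_share ADD TABLE demo_catalog.demo_schema.employees;
GRANT SELECT ON SHARE sales_share TO RECIPIENT finance_team;
```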
The recipient can see the details under "Shared with me".
Create a catalog from the share; after creating it, the shared data appears under Catalog.
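On the recipient side, a sketch of mounting the share as a catalog (the provider and catalog names are assumptions):

```sql
-- Create a read-only catalog backed by the provider's share
CREATE CATALOG IF NOT EXISTS sales_shared USING SHARE provider_name.sales_share;
SELECT * FROM sales_shared.demo_schema.employees;
```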
Open Delta Sharing in Unity Catalog
How to share from a Databricks workspace to non-Databricks users:
Provider – the party providing the data.
Recipient – the party receiving the data.
Create a recipient without a sharing identifier (open sharing):
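A sketch (the recipient name is an assumption); omitting the USING ID clause creates an open-sharing recipient, and Databricks generates an activation link for it:

```sql
-- Open sharing: no metastore identifier, so an activation link is generated
CREATE RECIPIENT external_partner;
DESCRIBE RECIPIENT external_partner;  -- output includes the activation link
```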
Send the activation link to the user.
The user copies the link into a browser and opens it to download the credential file used to access the shared data.