UNITY CATALOG in Azure Databricks


Unity Catalog is used to administer access to data and to audit that access, a practice generally known as data governance. Its main benefits are:

  1. Simplifies Security and Governance: Offers a central location for managing and auditing data access, enhancing security.
  2. Unified Catalog: Catalogs all data, machine learning models, and analytics artifacts in one place, and integrates with existing data management systems such as the Hive metastore.
  3. Unified Data Access Controls: Provides a consistent permissions model across all data assets, environments, and cloud platforms, and helps safeguard personally identifiable information (PII) through attribute-based access control (ABAC).
  4. Data Isolation: Achieves data isolation at multiple levels (environment, storage location, and data objects) without compromising the ability to centrally manage access and audits.
  5. Data Auditing: Centralizes data access auditing with enhanced alerts and monitoring to ensure accountability.
  6. Data Quality Management: Implements robust data quality controls, including testing and monitoring, to ensure the integrity and utility of data for business intelligence, analytics, and machine learning workloads.
  7. Data Lineage: Provides end-to-end visibility of data flow from source to consumption, enabling better management and traceability.
  8. Data Discovery: Facilitates quick and easy discovery of data by data scientists, analysts, and engineers, helping to reduce time to value.
  9. Data Sharing: Supports secure data sharing across different clouds and platforms using Delta Sharing, allowing collaboration within and across organizations regardless of the computing platforms used.
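The Data Lineage point above is easiest to picture as a graph of "table A feeds table B" edges. The Python sketch below is illustrative only: the EDGES list and the upstream function are hypothetical, and the edges are hard-coded rather than read from the lineage Unity Catalog captures. It walks the graph backwards, breadth-first, to find every table that a given table ultimately depends on.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges (source table -> target table). Unity Catalog
# captures real lineage automatically; these are made up for illustration.
EDGES = [
    ("main.raw.orders", "main.silver.orders_clean"),
    ("main.raw.customers", "main.silver.customers_clean"),
    ("main.silver.orders_clean", "main.gold.revenue_by_customer"),
    ("main.silver.customers_clean", "main.gold.revenue_by_customer"),
]


def upstream(table: str, edges: list[tuple[str, str]]) -> set[str]:
    """Return every table that feeds `table`, directly or indirectly."""
    parents: dict[str, list[str]] = defaultdict(list)
    for src, dst in edges:
        parents[dst].append(src)

    seen: set[str] = set()
    queue = deque([table])
    # Breadth-first walk from the target back toward its sources.
    while queue:
        current = queue.popleft()
        for src in parents[current]:
            if src not in seen:
                seen.add(src)
                queue.append(src)
    return seen


if __name__ == "__main__":
    print(sorted(upstream("main.gold.revenue_by_customer", EDGES)))
```

Walking upstream answers "where did this number come from?"; reversing the edge direction would answer the opposite question, "what breaks if this source changes?".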

Unity Catalog Architecture

Unity Catalog can serve multiple Databricks workspaces.

Unity Catalog has two main components: user management and the metastore.

Metastore – stores schemas, other objects, and the metadata about those objects.

The metastore must be created manually. It can be shared by multiple workspaces, and it captures audit logs and data lineage.

User management – stores users and groups of users, and can also import users from Azure Active Directory (now Microsoft Entra ID).

When a user, Azure service principal, or managed identity tries to access a workspace, Unity Catalog's user management checks its permissions, and only after authorization can it see tables or views.
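The authorization check described above can be pictured with a small model. This is a minimal Python sketch, not Databricks code: GrantStore and can_select are hypothetical names, the grants live in an in-memory dictionary instead of the metastore, and ownership and other special rules are ignored. It does follow two real Unity Catalog rules: reading a table requires USE CATALOG on its catalog, USE SCHEMA on its schema, and SELECT on the table, and a privilege granted on a catalog or schema is inherited by the objects inside it. Grants made to a group apply to its members.

```python
from collections import defaultdict


class GrantStore:
    """Hypothetical in-memory stand-in for the grants a metastore records."""

    def __init__(self) -> None:
        # (securable name, principal) -> privileges granted on that securable
        self._grants: dict[tuple[str, str], set[str]] = defaultdict(set)

    def grant(self, privilege: str, securable: str, principal: str) -> None:
        self._grants[(securable, principal)].add(privilege)

    def holds(self, privilege: str, securables: list[str], principals: set[str]) -> bool:
        """True if any of `principals` holds `privilege` on any of `securables`."""
        return any(
            privilege in self._grants.get((securable, principal), set())
            for securable in securables
            for principal in principals
        )


def can_select(store: GrantStore, user: str, groups: set[str], full_name: str) -> bool:
    """Check whether `user` (directly or via a group) may read catalog.schema.table."""
    catalog, schema, _table = full_name.split(".")
    schema_name = f"{catalog}.{schema}"
    principals = {user, *groups}
    return (
        # USE CATALOG must be held on the catalog itself.
        store.holds("USE CATALOG", [catalog], principals)
        # USE SCHEMA may be granted on the schema or inherited from the catalog.
        and store.holds("USE SCHEMA", [schema_name, catalog], principals)
        # SELECT may be granted on the table or inherited from its schema or catalog.
        and store.holds("SELECT", [full_name, schema_name, catalog], principals)
    )


if __name__ == "__main__":
    store = GrantStore()
    store.grant("USE CATALOG", "main", "analysts")
    store.grant("USE SCHEMA", "main.sales", "analysts")
    store.grant("SELECT", "main.sales.orders", "analysts")

    print(can_select(store, "alice@example.com", {"analysts"}, "main.sales.orders"))  # True
    print(can_select(store, "bob@example.com", set(), "main.sales.orders"))  # False
```

In Databricks SQL the same grants would be written as statements such as GRANT SELECT ON TABLE main.sales.orders TO `analysts`; the catalog, schema, table, and group names here are placeholders.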

The hierarchy, from the top: metastore –> catalog –> schema –> managed tables, external tables, views, etc.

A schema can be thought of as a database.
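Because of this hierarchy, every table is addressed by a three-level name, catalog.schema.table. The hypothetical TableName helper below is a minimal Python sketch (not part of any Databricks library) that splits such a name into its parts and backtick-quotes each part, which is how Databricks SQL quotes identifiers containing special characters. It assumes an unquoted input name whose parts contain no dots.

```python
from typing import NamedTuple


class TableName(NamedTuple):
    """A three-level Unity Catalog name: catalog.schema.table."""

    catalog: str
    schema: str
    table: str

    @classmethod
    def parse(cls, full_name: str) -> "TableName":
        # Naive split: assumes the name is unquoted and no part contains a dot.
        parts = full_name.split(".")
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected catalog.schema.table, got {full_name!r}")
        return cls(*parts)

    def quoted(self) -> str:
        """Backtick-quote each part; a literal backtick is escaped by doubling it."""
        return ".".join("`" + part.replace("`", "``") + "`" for part in self)


if __name__ == "__main__":
    name = TableName.parse("main.sales.orders")
    print(name.catalog, name.schema, name.table)  # main sales orders
    print(name.quoted())  # `main`.`sales`.`orders`
```

A two-part name such as sales.orders is rejected here; in a real session it would be resolved against the current catalog, which this sketch does not model.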

Share, recipient, and provider – these objects are used to share data through Delta Sharing.

Provider – the party who shares the data. Recipient – the party who receives it.
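As a hedged sketch of the provider side, the hypothetical share_statements function below only builds Databricks SQL strings: one that creates a share, one per table added to it, one that creates a recipient, and one that grants the recipient read access to the share. On Databricks each string could be passed to spark.sql(). The share, table, and recipient names are made up, and the recipient is created without a sharing identifier (open sharing); Databricks-to-Databricks sharing would add USING ID with the recipient's identifier.

```python
import re

# Simple identifier rule used by this sketch only (letters, digits, underscores).
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check(name: str) -> str:
    """Reject names this sketch does not quote, to avoid building broken SQL."""
    for part in name.split("."):
        if not _IDENT.fullmatch(part):
            raise ValueError(f"unsupported identifier: {name!r}")
    return name


def share_statements(share: str, tables: list[str], recipient: str) -> list[str]:
    """Build provider-side Delta Sharing SQL for one share and one recipient."""
    share, recipient = _check(share), _check(recipient)
    statements = [f"CREATE SHARE IF NOT EXISTS {share}"]
    # Each table is added by its full catalog.schema.table name.
    statements += [f"ALTER SHARE {share} ADD TABLE {_check(t)}" for t in tables]
    statements.append(f"CREATE RECIPIENT IF NOT EXISTS {recipient}")
    statements.append(f"GRANT SELECT ON SHARE {share} TO RECIPIENT {recipient}")
    return statements


if __name__ == "__main__":
    for sql in share_statements("sales_share", ["main.sales.orders"], "partner_co"):
        print(sql)  # on Databricks: spark.sql(sql)
```

Keeping statement generation separate from execution makes the sequence easy to review before anything is shared outside the organization.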

External data – data or tables that reside in a different storage account. Access to them is configured through a storage credential and an external location.
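A minimal sketch of how these two objects fit together, assuming a storage credential already exists in the metastore. The hypothetical external_table_statements function only builds Databricks SQL strings: one that registers an Azure Data Lake Storage Gen2 path (an abfss:// URL) as an external location governed by that credential, and one that creates an external Delta table under that path. All names (credential, container, storage account, table) are placeholders.

```python
def external_table_statements(
    location: str,
    credential: str,
    container: str,
    account: str,
    path: str,
    table: str,
    columns: dict[str, str],
) -> list[str]:
    """Build SQL for an external location and an external Delta table under it."""
    # ADLS Gen2 URL format: abfss://<container>@<account>.dfs.core.windows.net/<path>
    base_url = f"abfss://{container}@{account}.dfs.core.windows.net/{path.strip('/')}"
    # The table's files go in a sub-folder named after the table (a choice of this sketch).
    table_url = f"{base_url}/{table.split('.')[-1]}"
    column_list = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return [
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS {location} "
        f"URL '{base_url}' WITH (STORAGE CREDENTIAL {credential})",
        f"CREATE TABLE IF NOT EXISTS {table} ({column_list}) "
        f"USING DELTA LOCATION '{table_url}'",
    ]


if __name__ == "__main__":
    for sql in external_table_statements(
        location="sales_ext",
        credential="sales_cred",
        container="raw",
        account="mystorageacct",
        path="/sales/",
        table="main.sales.orders_ext",
        columns={"order_id": "BIGINT", "amount": "DOUBLE"},
    ):
        print(sql)  # on Databricks: spark.sql(sql)
```

Because the table's LOCATION falls under the external location's URL, Unity Catalog can govern access to those files through the external location and its storage credential rather than through keys embedded in notebooks.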

Below are definitions of the terms used in Unity Catalog, which organizes data assets in a hierarchical structure:

  1. METASTORE: The top layer in the Unity Catalog, holding metadata for data assets like tables and views, including their access permissions.
  2. CATALOG: The first level of the data asset hierarchy beneath the metastore (the catalog in catalog.schema.table), grouping data assets so that users with the appropriate permissions can access them.
  3. SCHEMA: Also known as databases, schemas form the second level of the hierarchy and hold tables and views.
  4. TABLE: The most granular level in the hierarchy, where data is stored. Tables can be managed (stored in storage managed by Unity Catalog) or external (stored at a path in your own cloud storage).
  5. VIEW: A stored query that is represented as a table within schemas, providing a dynamic view of the data based on SQL queries.
  6. EXTERNAL LOCATION: An object that includes references to storage credentials and paths within cloud storage, linked in the Unity Catalog.
  7. STORAGE CREDENTIAL: An object that encapsulates credentials needed for long-term access to cloud storage, managed within the Unity Catalog.
  8. FUNCTION: A user-defined function stored within a schema that can be used to manipulate or query data.
  9. REGISTERED MODEL: An MLflow registered model that is managed within the Unity Catalog.
  10. SHARE: A logical grouping within the Unity Catalog used for organizing tables intended for sharing through Delta Sharing.
  11. RECIPIENT: Represents an organization or user group that receives data shared via Delta Sharing.
  12. PROVIDER: Represents an organization that has made data available for sharing, with these details managed within the Unity Catalog.
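To summarize where each of these objects lives, the Python sketch below records the parent container of each securable type listed above. Catalogs, schemas, and the objects inside schemas (tables, views, functions, registered models) form the catalog.schema.object namespace, while storage credentials, external locations, shares, recipients, and providers are defined directly at the metastore level. PARENT, path_to_metastore, and name_parts are hypothetical names used only for illustration.

```python
# Parent container of each securable type described above.
PARENT = {
    "CATALOG": "METASTORE",
    "SCHEMA": "CATALOG",
    "TABLE": "SCHEMA",
    "VIEW": "SCHEMA",
    "FUNCTION": "SCHEMA",
    "REGISTERED MODEL": "SCHEMA",
    "EXTERNAL LOCATION": "METASTORE",
    "STORAGE CREDENTIAL": "METASTORE",
    "SHARE": "METASTORE",
    "RECIPIENT": "METASTORE",
    "PROVIDER": "METASTORE",
}


def path_to_metastore(kind: str) -> list[str]:
    """Walk from a securable type up to the metastore (raises KeyError if unknown)."""
    chain = [kind]
    while chain[-1] != "METASTORE":
        chain.append(PARENT[chain[-1]])
    return chain


def name_parts(kind: str) -> int:
    """Number of dot-separated parts in a full name, e.g. 3 for TABLE, 1 for SHARE."""
    return len(path_to_metastore(kind)) - 1


if __name__ == "__main__":
    print(" -> ".join(path_to_metastore("VIEW")))  # VIEW -> SCHEMA -> CATALOG -> METASTORE
    print(name_parts("REGISTERED MODEL"), name_parts("EXTERNAL LOCATION"))  # 3 1
```

This is why a table is referenced as main.sales.orders while a share or an external location is referenced by a single name.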