- Microsoft Fabric is an all-in-one analytics solution for enterprises that covers everything from data movement to data science, Real-Time Analytics, and business intelligence.
- It offers a comprehensive suite of services, including data lake, data engineering, and data integration, all in one place.
- With Fabric, you don’t need to piece together different services from multiple vendors. Instead, you can enjoy a highly integrated, end-to-end, and easy-to-use product that is designed to simplify your analytics needs.
- The platform is built on a foundation of Software as a Service (SaaS), which takes simplicity and integration to a whole new level.
Microsoft Fabric has a SaaS foundation
- Microsoft Fabric brings together new and existing components from Power BI, Azure Synapse, and Azure Data Factory into a single integrated environment.
- These components are then presented in various customized user experiences.
- Fabric brings together experiences such as Data Engineering, Data Factory, Data Science, Data Warehouse, Real-Time Analytics, and Power BI onto a shared SaaS foundation.
This integration provides the following advantages:
- Access to an extensive range of deeply integrated analytics experiences.
- Shared user experiences across all workloads that are familiar and easy to learn.
- Developers can easily access and reuse all assets.
- A unified data lake that allows you to retain the data where it is while using your preferred analytics tools.
- Centralized administration and governance across all experiences.
Components of Microsoft Fabric
Fabric includes industry-leading experiences in the following categories to cover end-to-end analytical needs.
Data Engineering – The Data Engineering experience provides a world-class Spark platform with great authoring experiences, enabling data engineers to perform large-scale data transformation and democratize data through the lakehouse. Microsoft Fabric Spark’s integration with Data Factory enables notebooks and Spark jobs to be scheduled and orchestrated.
Data Factory – Data Factory in Fabric combines the simplicity of Power Query with the scale and power of Azure Data Factory. You can use more than 200 native connectors to connect to data sources on-premises and in the cloud.
Data Science – The Data Science experience enables you to build, deploy, and operationalize machine learning models seamlessly within your Fabric experience. It integrates with Azure Machine Learning to provide built-in experiment tracking and model registry. Data scientists are empowered to enrich organizational data with predictions and allow business analysts to integrate those predictions into their BI reports.
Data Warehouse – The Data Warehouse experience provides industry-leading SQL performance and scale. It fully separates compute from storage, enabling independent scaling of both components. Additionally, it natively stores data in the open Delta Lake format.
Real-Time Analytics – Observational data, which is collected from various sources such as apps, IoT devices, and human interactions, is currently the fastest-growing data category. This data is often semi-structured, in formats like JSON or text. It comes in at high volume, with shifting schemas. These characteristics make it hard for traditional data warehousing platforms to work with. Real-Time Analytics is a best-in-class engine for observational data analytics.
Power BI – Power BI is the world’s leading Business Intelligence platform. It ensures that business owners can access all the data in Fabric quickly and intuitively to make better decisions with data.
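To make the "shifting schemas" point from the Real-Time Analytics description concrete, here is a minimal Python sketch. The event payloads and field names are invented for illustration and do not come from any Fabric API. It only shows why a fixed column list struggles with semi-structured JSON events: each new event can introduce fields that earlier events never had.

```python
import json

# Hypothetical observational events; all field names are invented for illustration.
raw_events = [
    '{"device": "sensor-1", "temp_c": 21.5}',
    '{"device": "sensor-2", "temp_c": 19.0, "humidity": 0.41}',
    '{"device": "app-7", "action": "click", "meta": {"page": "/home"}}',
]


def observed_schema(lines):
    """Return the set of top-level fields seen across all JSON events."""
    fields = set()
    for line in lines:
        fields.update(json.loads(line).keys())
    return fields


def to_rows(lines, columns):
    """Project each event onto a fixed column list, filling gaps with None."""
    return [[json.loads(line).get(col) for col in columns] for line in lines]


# The column list has to be derived from the data, because no event carries every field.
columns = sorted(observed_schema(raw_events))
rows = to_rows(raw_events, columns)
```

Three events already produce five distinct columns, most of them empty in any given row, which is the pattern that makes this kind of data awkward for a rigid warehouse table.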
What are OneLake and lakehouse – the unification of lakehouses
The Microsoft Fabric platform unifies the OneLake and lakehouse architecture across the enterprise.
OneLake
- The data lake is the foundation on which all the Fabric services are built. The Microsoft Fabric data lake is known as OneLake.
- It’s built into the Fabric service and provides a unified location to store all organizational data where the experiences operate.
- OneLake is built on top of ADLS (Azure Data Lake Storage) Gen2. It provides a single SaaS experience and a tenant-wide store for data that serves both professional and citizen developers.
- The OneLake SaaS experience simplifies the experiences, eliminating the need for users to understand any infrastructure concepts such as resource groups, RBAC (Role-Based Access Control), Azure Resource Manager, redundancy, or regions.
- Additionally, it doesn’t require the user to have an Azure account.
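Because OneLake is built on top of ADLS Gen2, items in it can be addressed with ADLS-style URIs. The helper below is a minimal sketch, assuming the `abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<item>.<itemtype>/<path>` layout described in the OneLake documentation. The function name and the workspace, lakehouse, and file names are hypothetical.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_uri(workspace: str, item: str, item_type: str, path: str = "") -> str:
    """Build an ABFS-style URI for a path inside a OneLake item.

    Assumes the <workspace>@host/<item>.<itemtype>/<path> layout; the
    arguments used below are made-up example names.
    """
    base = f"abfss://{workspace}@{ONELAKE_HOST}/{item}.{item_type}"
    clean = path.strip("/")
    return f"{base}/{clean}" if clean else base


# Hypothetical workspace and lakehouse names, for illustration only.
uri = onelake_uri("SalesWorkspace", "SalesLakehouse", "Lakehouse", "Files/raw/orders.csv")
```

Note that nothing in the URI names a storage account, resource group, or region, which matches the SaaS framing above.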
OneLake:
- Concept: OneLake acts as a single logical data lake for the whole tenant. It stores Fabric data itself and, through shortcuts, also offers a unified view and governance layer over data held in other physical storage locations within Azure.
- Benefits:
  - Simplified Data Management: OneLake eliminates the need to manage and access data scattered across different Azure storage accounts.
  - Improved Security and Governance: It provides centralized access control and security policies for all your data in the Fabric environment.
  - Flexibility: OneLake can reference data in external sources through shortcuts, such as existing Azure Data Lake Storage accounts, without moving the data.
- Example: Imagine having sales data in one Azure Blob Storage account and customer data in another. OneLake provides a single entry point to access and manage both datasets within Fabric, even though they reside in separate physical locations.
Data Lake:
- Concept: A data lake is the physical storage location for your raw, semi-structured, or unstructured data within Azure. While OneLake provides a logical view, the actual data resides in underlying data lake storage.
- Storage Options:
  - Azure Data Lake Storage (ADLS) Gen2: This is the primary data lake storage used by Fabric. It offers scalability, security, and compatibility with various data processing tools.
  - Azure Blob Storage: While less common for primary data lake storage in Fabric, Blob Storage can also be used within OneLake for specific data needs.
- Example: Your sales data might be stored in a dedicated ADLS Gen2 account within Azure. This ADLS Gen2 account acts as the physical data lake where the raw data resides. OneLake then provides a logical view and access point to this data lake within the Fabric environment.
Steps involved (using the example above):
- Data Ingestion: You use Azure Data Factory or other data movement tools to transfer your sales data from its source (for example, a database) to the ADLS Gen2 account.
- OneLake Integration: A OneLake shortcut is created that points to the data location within the ADLS Gen2 account. This creates a logical representation of the data in OneLake without moving or copying it.
- Data Processing and Analysis: Tools like Azure Databricks or Azure Synapse Analytics can access the data through OneLake, even though it physically resides in the ADLS Gen2 account.
Key Takeaway:
Think of OneLake as a library catalog. It tells you what books (data) are available and where to find them (physical data lake storage like ADLS Gen2) within the Fabric ecosystem. This simplifies data management and streamlines access for various data processing and analysis tools.
The following image shows the various Fabric items where data is stored. It’s an example of how various items within Fabric would store data inside OneLake. As displayed, you can create multiple workspaces within a tenant and multiple lakehouses within each workspace. A lakehouse is a collection of files, folders, and tables that represents a database over a data lake.
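The tenant, workspace, and lakehouse hierarchy described above can be sketched as a small data model. This is an illustrative toy, not a Fabric SDK: the class names, methods, and sample names are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Lakehouse:
    """A collection of files and tables (toy model, not a Fabric type)."""
    name: str
    files: list = field(default_factory=list)
    tables: list = field(default_factory=list)


@dataclass
class Workspace:
    name: str
    lakehouses: dict = field(default_factory=dict)

    def add_lakehouse(self, name: str) -> Lakehouse:
        # Reuse the lakehouse if it already exists in this workspace.
        return self.lakehouses.setdefault(name, Lakehouse(name))


@dataclass
class Tenant:
    name: str
    workspaces: dict = field(default_factory=dict)

    def add_workspace(self, name: str) -> Workspace:
        return self.workspaces.setdefault(name, Workspace(name))

    def list_lakehouses(self) -> list:
        """Return 'workspace/lakehouse' for every lakehouse in the tenant."""
        return [
            f"{ws.name}/{lh.name}"
            for ws in self.workspaces.values()
            for lh in ws.lakehouses.values()
        ]


# Hypothetical tenant with two workspaces, each owning its own lakehouses.
tenant = Tenant("contoso")
sales = tenant.add_workspace("Sales")
sales.add_lakehouse("Orders").tables.append("orders_2024")
sales.add_lakehouse("Customers")
tenant.add_workspace("Finance").add_lakehouse("Ledger").files.append("raw/ledger.csv")

paths = tenant.list_lakehouses()
```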
Every developer and business unit in the tenant can instantly create their own workspaces in OneLake. They can ingest data into their own lakehouses, start processing, analyzing, and collaborating on the data, just like OneDrive in Office. The experiences such as Data Engineering, Data Warehouse, Data Factory, Power BI, and Real-Time Analytics use OneLake as their native store. They don’t need any extra configuration.
OneLake is designed to allow instant mounting of existing PaaS storage accounts into OneLake with the shortcut feature. There’s no need to migrate or move any of the existing data. Using shortcuts, you can access the data stored in Azure Data Lake Storage.
Additionally, shortcuts allow you to easily share data between users and applications without moving or duplicating information.
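The idea behind shortcuts, a pointer to data rather than a copy of it, can be illustrated with a toy model in plain Python. Nothing here calls a real Fabric or Azure API. The dictionaries, function names, account name, and paths are hypothetical stand-ins chosen for the example.

```python
# Stand-in for data that already lives in an existing storage account.
external_storage = {
    ("salesaccount", "raw/orders.csv"): "id,amount\n1,100\n",
}

# A shortcut is only a pointer: logical path -> (account, path in that account).
shortcuts = {}


def create_shortcut(logical_path, account, target_path):
    """Register a pointer to existing data; no bytes are copied."""
    shortcuts[logical_path] = (account, target_path)


def read(logical_path):
    """Resolve the shortcut and read the data from where it actually lives."""
    account, target_path = shortcuts[logical_path]
    return external_storage[(account, target_path)]


logical = "SalesLakehouse/Files/orders.csv"
create_shortcut(logical, "salesaccount", "raw/orders.csv")

before = read(logical)
# The source changes in place; the shortcut sees the change immediately.
external_storage[("salesaccount", "raw/orders.csv")] += "2,250\n"
after = read(logical)
```

Because the shortcut stores only a location, the update to the source is visible through the logical path with no migration or duplicate copy involved.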
How do OneLake and a data lake work together?
OneLake: The Logical Organizer
Imagine OneLake as a central library catalog in Microsoft Fabric. The analogy focuses on its role as a logical view and management layer for data stored across various Azure data lake locations: data reached through shortcuts stays where it is, and OneLake does not keep a second copy of it.
Benefits of OneLake:
- Simplified Data Management: OneLake eliminates the need to manage and access data scattered across different Azure storage accounts. You have a single point of reference for all your data within Fabric.
- Improved Security and Governance: OneLake allows you to define centralized access control and security policies for all your data in the Fabric environment. This ensures consistent data security regardless of its physical location.
- Flexibility: OneLake can reference data in external sources through shortcuts, such as existing Azure Data Lake Storage accounts. This allows you to manage and govern data from diverse sources within a unified platform.
Data Lake: The Physical Storage
A data lake, on the other hand, is the physical storage location for your raw, semi-structured, or unstructured data within Azure. This is where the actual data resides, serving as the “books” in the library analogy.
Common Data Lake Storage Options:
- Azure Data Lake Storage (ADLS) Gen2: This is the primary data lake storage used by Fabric. It offers scalability, security, and compatibility with various data processing tools. Think of it as the main bookshelf in the library.
- Azure Blob Storage: While less common for primary data lake storage in Fabric, Blob Storage can also be used within OneLake for specific data needs. Imagine it as an additional storage area in the library for specific data types.
How They Work Together (Example):
- Data Ingestion: You use Azure Data Factory (or other data movement tools) to transfer your sales data from its source (for example, a database) to a dedicated ADLS Gen2 account within Azure. This is the act of placing the book on the shelf.
- OneLake Registration: A OneLake shortcut is created that points to the data location within the ADLS Gen2 account. This creates a logical representation of the data in OneLake. It’s like adding a library catalog entry for the sales data, specifying its location on the shelf (ADLS Gen2).
- Data Processing and Analysis: Tools like Azure Databricks or Azure Synapse Analytics can now access the data through OneLake. Even though the data physically resides in the ADLS Gen2 account (the shelf), OneLake provides a convenient entry point for these tools. It’s like using the library catalog to find the sales data (book) on the shelf (ADLS Gen2).
Key Points to Remember:
- OneLake acts as a virtual layer on top of the physical data lake storage (ADLS Gen2 or Blob Storage).
- OneLake simplifies data management by providing a unified view and access point for all your data in Fabric.
- Security and governance policies are applied through OneLake, ensuring consistent data protection across diverse storage locations.
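The last point, policies applied once at the logical layer regardless of where the data physically sits, can be sketched as follows. This is a toy model under stated assumptions, not how Fabric implements security: the roles, policy table, and location strings are invented placeholders.

```python
# Hypothetical policies kept at the logical layer: path prefix -> roles allowed to read.
policies = {
    "Sales/": {"sales-analyst", "admin"},
    "HR/": {"admin"},
}

# Hypothetical mapping from logical paths to physical locations (placeholder strings).
locations = {
    "Sales/orders": "adls-gen2:account-a/raw/orders",
    "HR/salaries": "blob:account-b/hr/salaries",
}


def can_read(role, logical_path):
    """Check the central policy table; the physical location plays no part."""
    return any(
        logical_path.startswith(prefix) and role in roles
        for prefix, roles in policies.items()
    )


def resolve(role, logical_path):
    """Return the physical location only if the central policy allows it."""
    if not can_read(role, logical_path):
        raise PermissionError(f"{role} may not read {logical_path}")
    return locations[logical_path]


allowed = resolve("sales-analyst", "Sales/orders")
```

The same check protects both entries even though one points at ADLS Gen2 and the other at Blob Storage, which is the consistency the key points describe.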