Definitions of terms used in Microsoft Fabric, including terms specific to Synapse Data Warehouse, Synapse Data Engineering, Synapse Data Science, Synapse Real-Time Analytics, Data Factory, and Power BI.
General Terms in Microsoft Fabric:
- Capacity: Refers to the dedicated resources available in Fabric for processing data and performing tasks. Different services require different amounts of capacity depending on their complexity. Fabric offers various capacity options through SKUs (service tiers) or trials.
- Example: You might use a lower-capacity tier for development and testing, then scale up to a higher tier for production workloads handling large datasets.
- Experience: Represents a collection of capabilities within Fabric focused on specific data functionalities. Each experience acts as a module, offering tools and services for various data tasks. Core Fabric experiences include:
- Synapse Data Warehouse: Building and managing data warehouses for large-scale data analysis.
- Synapse Data Engineering: Moving, transforming, and orchestrating data within pipelines.
- Synapse Data Science: Exploring and analyzing data, and building machine learning models.
- Synapse Real-Time Analytics: Processing and analyzing data streams as they are generated.
- Data Factory: Orchestrating and automating data movement and transformation across sources.
- Power BI: Creating interactive reports and dashboards for data visualization and business intelligence.
- Example: If you’re building a customer data warehouse, you’d primarily use the Synapse Data Warehouse experience.
- Item: Represents a specific set of capabilities within a chosen Fabric experience. Each experience offers various item types allowing you to perform specific tasks. Users create, edit, and delete items to customize their Fabric environment and workflows. Here are some examples:
- Synapse Data Warehouse: Warehouse (the storage and SQL compute structure itself), data connection (link to an external data source).
- Synapse Data Engineering: Notebook (interactive coding environment), Spark Job Definition (definition for running Apache Spark jobs).
- Data Factory: Pipeline (sequence of data movement and transformation activities), Dataflow Gen2 (visual, low-code representation of data transformation logic).
- Example: Within Data Engineering, you might create a Notebook to explore data and a Spark Job Definition to process and transform it.
- Tenant: A single instance of Microsoft Fabric dedicated to serving a specific organization. A tenant typically maps to a Microsoft Entra ID (formerly Azure Active Directory) tenant for secure access and management. Each organization has a dedicated Fabric environment for its data and analytics projects.
- Example: Your company's Fabric tenant is entirely separate from any other organization's.
- Workspace: Provides a collaborative environment within Fabric. Users bring together different functionalities (items) from various experiences to create a centralized space for data analysis projects. Teams can work seamlessly on data pipelines, exploration, and reporting within a single workspace.
- Example: You might create a workspace for your marketing team to access customer data reports, build dashboards, and collaborate on data-driven marketing campaigns.
Deep Dive into Specific Services:
1. Synapse Data Engineering:
- Purpose: Provides tools and services to design, build, and manage data pipelines that move and transform data at scale.
- Key Features:
- Apache Spark Integration: Leverages Apache Spark, a powerful engine for distributed data processing.
- Notebooks: Offers interactive coding environments for data exploration and transformation logic development.
- Data Flows: Provides a visual interface for building data pipelines with drag-and-drop functionality.
- Trigger Management: Schedules and automates data pipeline execution.
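The notebook-then-job pattern above can be sketched in plain Python. PySpark is deliberately omitted so the example stays self-contained; the function and field names (`clean_orders`, `order_id`, `amount`) are hypothetical:

```python
# Minimal sketch of a transform you might prototype in a Fabric notebook
# and later package as a Spark Job Definition. Plain Python stands in for
# PySpark; all names are illustrative, not Fabric APIs.

def clean_orders(rows):
    """Drop malformed rows and normalize the amount column."""
    cleaned = []
    for row in rows:
        if row.get("order_id") is None:
            continue  # data-quality rule: skip rows without a key
        cleaned.append({
            "order_id": row["order_id"],
            "amount": round(float(row.get("amount", 0.0)), 2),
        })
    return cleaned

raw = [
    {"order_id": 1, "amount": "19.999"},
    {"order_id": None, "amount": "5.00"},  # malformed: no key
    {"order_id": 2},                       # missing amount defaults to 0
]
print(clean_orders(raw))
```

In a real Spark Job Definition the same logic would run as a DataFrame transformation distributed across the cluster.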
2. Data Factory:
- Purpose: Acts as an orchestration engine to automate data movement and transformation workflows across various data sources.
- Key Features:
- Data Pipelines: Defines sequences of activities for moving and transforming data.
- Connectors: Offers pre-built connectors for a wide range of data sources and destinations (databases, cloud storage, etc.).
- Scheduling and Monitoring: Schedules pipeline execution and monitors run progress for troubleshooting.
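The pipeline pattern that Data Factory expresses declaratively, an ordered list of activities run with status tracking, can be illustrated with a hypothetical plain-Python sketch (the activity names are placeholders, not Fabric APIs):

```python
# Hypothetical sketch of the pipeline pattern: ordered activities,
# executed in sequence, with per-activity status recorded for monitoring.
import time

def extract():   return [1, 2, 3]
def transform(): return "transformed"
def load():      return "loaded"

pipeline = [("extract", extract), ("transform", transform), ("load", load)]

def run_pipeline(activities):
    """Run each activity in order, recording status for troubleshooting."""
    log = []
    for name, activity in activities:
        start = time.perf_counter()
        try:
            activity()
            status = "Succeeded"
        except Exception as exc:
            status = f"Failed: {exc}"
        log.append((name, status, time.perf_counter() - start))
        if status != "Succeeded":
            break  # a failed activity halts the run, as in a real pipeline
    return log

for name, status, elapsed in run_pipeline(pipeline):
    print(f"{name}: {status} ({elapsed:.4f}s)")
```

In Fabric the same structure is built visually and the run history appears in the pipeline's monitoring view.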
3. Synapse Data Science:
- Purpose: Offers a workspace for data scientists to explore and analyze data, and to build and deploy machine learning models.
- Key Features:
- Notebooks: Provides a familiar, Jupyter-style environment for data exploration, model development, and experimentation.
- Integration with Azure Machine Learning: Enables deploying and managing machine learning models within the Fabric ecosystem.
- Collaboration Tools: Allows data scientists to share code, models, and notebooks for teamwork.
4. Synapse Data Warehouse:
- Purpose: Provides tools and services for building and managing data warehouses for large-scale data analysis.
- Key Features:
- T-SQL querying: Supports familiar T-SQL for running complex analytical queries over warehouse data.
- Capacity-based compute: Compute is provisioned and scaled automatically and billed against Fabric capacity, replacing the dedicated and serverless SQL pool model of Azure Synapse Analytics.
- Integration with other Fabric experiences: Warehouse data is stored in OneLake, so other experiences (such as Power BI) can use it directly for a unified data analysis environment.
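Warehouse queries in Fabric are written in T-SQL; the aggregation pattern can be demonstrated with sqlite3 as a stand-in so the example runs anywhere (the `sales` table and its columns are hypothetical):

```python
# sqlite3 stands in for the warehouse SQL endpoint so this runs locally;
# the table, columns, and values are illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('East', 100.0), ('East', 150.0), ('West', 200.0);
""")

# A typical warehouse-style aggregation: total sales per region.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)
conn.close()
```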
5. Synapse Real-Time Analytics:
- Purpose: Enables processing and analyzing data streams as they are generated (real-time).
- Key Features:
- KQL Databases: Store streaming data and query it with the Kusto Query Language (KQL), optimized for time-series and log analytics.
- Eventstreams: Capture, transform, and route real-time event streams without writing code.
- Integration with Azure Event Hubs: Enables receiving and processing real-time data streams from various sources.
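The core real-time pattern, aggregating an unbounded event stream over fixed (tumbling) time windows, can be sketched in plain Python; a list stands in for an event source such as Azure Event Hubs, and all names are hypothetical:

```python
# Sketch of tumbling-window aggregation over a stream of
# (timestamp_seconds, sensor_id) events. Illustrative only.
from collections import defaultdict

def tumbling_window_counts(events, window_seconds=10):
    """Count events per sensor within each fixed time window."""
    windows = defaultdict(lambda: defaultdict(int))
    for timestamp, sensor_id in events:
        window_start = timestamp - (timestamp % window_seconds)
        windows[window_start][sensor_id] += 1
    return {w: dict(counts) for w, counts in sorted(windows.items())}

stream = [(1, "a"), (4, "a"), (9, "b"), (12, "a"), (19, "b")]
print(tumbling_window_counts(stream))
```

A KQL query in Fabric would express the same idea declaratively with `summarize count() by bin(timestamp, 10s), sensor_id`.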
6. OneLake:
- Purpose: Acts as a logical data lake layer within Microsoft Fabric.
- Key Features:
- Unified View: Provides a single entry point to access and manage data stored across different Azure data lake storage locations.
- Improved Security and Governance: Allows centralized control over access and security policies for all your data in Fabric.
- Flexibility: Shortcuts let OneLake reference data in external storage, such as Azure Data Lake Storage Gen2 or Amazon S3, without copying it.
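OneLake exposes items through an ADLS-compatible endpoint, with paths of the documented `abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<item>/...` form. The helper below just assembles that string, so it runs anywhere; the workspace and item names are hypothetical:

```python
# Build an abfss:// URI for a file under a OneLake item, following the
# documented OneLake path shape. Names below are illustrative only.
def onelake_path(workspace: str, item: str, relative: str) -> str:
    """Assemble a OneLake ADLS-style URI for a file within an item."""
    return (
        f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
        f"{item}/{relative}"
    )

print(onelake_path("Marketing", "CustomerLakehouse.Lakehouse",
                   "Files/raw/orders.csv"))
```

Tools that speak the ADLS Gen2 API (Spark, azure-storage-file-datalake, and so on) can use such paths directly against OneLake.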