- A data pipeline is a logical grouping of tasks that together perform a data movement job.
- Pipelines cover common activities such as extracting, transforming, and loading data (ETL).
- Instead of running each task separately, pipelines let you orchestrate and manage them as a unit.
Create a data pipeline
- To create a new pipeline, go to your workspace and click the +New button, then select Data pipeline.
- In the New pipeline dialog, give your pipeline a name and click Create. You’ll then land in the pipeline canvas area.

- There are three options to start building your pipeline:
  - “Add pipeline activity” lets you build a pipeline from scratch using individual pipeline activities.
  - “Copy data” guides you through selecting a data source and destination, configuring options, and creating a Copy Data task.
  - “Choose a task to start” provides predefined templates for different scenarios.

- Choose “Copy data” to launch the Copy assistant.
- In the Copy data assistant, select your data source (e.g., COVID-19 Data Lake sample) and destination workspace (e.g., Data Warehouse).
- Configure the destination table by selecting the warehouse, providing a table name, and configuring column mappings.

- Review the options for staging and advanced settings without changing them.
- Review the summary of the copy activity and select “Start data transfer immediately” then click Save + Run.
- The pipeline starts running automatically, and you can monitor its status in the Output pane.
- After completion, navigate back to your warehouse to preview the copied data.

Ingest data into your Warehouse using the COPY statement
- The COPY statement is the primary way to ingest data into Warehouse tables efficiently.
- It supports high-throughput ingestion from an Azure storage account.
- Options let you control details such as the source file format and where rejected rows are written.
- The tutorial demonstrates how to use the COPY statement to load data into a Warehouse table.
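As a sketch of those rejected-row options, the following COPY statement is illustrative only: the storage account, container, and the `MAXERRORS` threshold are assumptions, and the `ERRORFILE` directory is given relative to the source container. Check the COPY (Transact-SQL) reference for the authoritative option list.

```sql
COPY INTO [dbo].[bing_covid-19_data]
FROM 'https://<storage_account>.blob.core.windows.net/<container>/covid/*.csv'
WITH (
    FILE_TYPE = 'CSV',
    FIRSTROW = 2,
    -- Write rejected rows to this folder (relative to the source container)
    -- instead of failing the whole load
    ERRORFILE = 'rejected/',
    -- Fail the load only if more than 10 rows are rejected
    MAXERRORS = 10
);
```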
Create a table
- First, you need to create the destination table before using the COPY statement.
- To create the table, follow these steps:
- Go to your Microsoft Fabric workspace and open your warehouse.
- Switch to the Home tab and click on “New SQL query.”
- To create the table used as the destination in this tutorial, run the following code.
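The tutorial's exact DDL is not reproduced here; the sketch below is a plausible definition based on the columns in the public Bing COVID-19 dataset. The column names and types are assumptions — verify them against the tutorial before running.

```sql
-- Destination table for the Bing COVID-19 sample data (column list is an assumption)
CREATE TABLE [dbo].[bing_covid-19_data]
(
    [id]                INT,
    [updated]           DATE,
    [confirmed]         INT,
    [confirmed_change]  INT,
    [deaths]            INT,
    [deaths_change]     INT,
    [recovered]         INT,
    [recovered_change]  INT,
    [latitude]          FLOAT,
    [longitude]         FLOAT,
    [iso2]              VARCHAR(2),
    [iso3]              VARCHAR(3),
    [countrycode]       VARCHAR(3),
    [country_region]    VARCHAR(50),
    [admin_region_1]    VARCHAR(100),
    [admin_region_2]    VARCHAR(100),
    [load_time]         DATETIME2(6)
);
```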

Ingest Parquet data using the COPY statement
- First, load data from a Parquet source.
- The data is in a public storage account and requires no authentication.
- Copy it by specifying the source path and the destination table.
- The only required option for this source is FILE_TYPE.
```sql
COPY INTO [dbo].[bing_covid-19_data]
FROM 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet'
WITH (
    FILE_TYPE = 'PARQUET'
);
```
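To confirm the load succeeded, you can run a quick count against the destination table. The expected count depends on the dataset snapshot, so no specific number is shown here.

```sql
-- Verify that rows were ingested
SELECT COUNT(*) AS [row_count]
FROM [dbo].[bing_covid-19_data];
```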
Ingest CSV data using the COPY statement and skipping a header row
- Setting FIRSTROW = 2 tells COPY to skip the header row of the CSV file.

```sql
COPY INTO [dbo].[bing_covid-19_data]
FROM 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.csv'
WITH (
    FILE_TYPE = 'CSV',
    FIRSTROW = 2
);
```
Ingest data into your Warehouse using Transact-SQL
Creating a new table with the result of a query by using CREATE TABLE AS SELECT (CTAS)
The CREATE TABLE AS SELECT (CTAS) statement allows you to create a new table in your warehouse from the output of a SELECT statement.
```sql
CREATE TABLE [dbo].[bing_covid-19_data_2023]
AS
SELECT *
FROM [dbo].[bing_covid-19_data]
WHERE DATEPART(YEAR, [updated]) = 2023;
```
Ingesting data into existing tables with T-SQL queries
```sql
INSERT INTO [dbo].[bing_covid-19_data_2023]
SELECT *
FROM [dbo].[bing_covid-19_data]
WHERE [updated] > '2023-02-28';
```
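After the insert, a quick check confirms the new rows landed. This query is illustrative; the actual counts depend on the dataset snapshot.

```sql
-- Count rows per month to confirm the post-February data arrived
SELECT DATEPART(MONTH, [updated]) AS [month], COUNT(*) AS [rows]
FROM [dbo].[bing_covid-19_data_2023]
GROUP BY DATEPART(MONTH, [updated])
ORDER BY [month];
```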
Ingesting data from tables on different warehouses and lakehouses
- You can use either CREATE TABLE AS SELECT or INSERT…SELECT to populate a table from the results of a SELECT statement.
- With these commands, you can reference tables that live in different warehouses or lakehouses.
- To do so, use the three-part naming convention `[warehouse_or_lakehouse_name.][schema_name.]table_name`.
- For example, if you have a lakehouse called “cases_lakehouse” and warehouses called “reference_warehouse” and “research_warehouse,” you can create a new table that combines data from all of these places using the three-part naming convention.
```sql
CREATE TABLE [research_warehouse].[dbo].[cases_by_continent]
AS
-- Select from the cases table only, so the join doesn't produce duplicate column names
SELECT cases.*
FROM [cases_lakehouse].[dbo].[bing_covid-19_data] cases
INNER JOIN [reference_warehouse].[dbo].[bing_covid-19_data] reference
    ON cases.[iso3] = reference.[countrycode];
```