
Different ways to load data into Microsoft Fabric Lakehouse

There are several ways to load data into your Lakehouse from the explorer page:

  • Local file/folder upload: Easily upload data from your local machine directly to the File Section of your Lakehouse.
  • Notebook code: Utilize available Spark libraries to connect to data sources and load data into dataframes, then save it in your Lakehouse.
  • Copy tool in pipelines: Connect to various data sources and land the data in its original format or convert it into a Delta table.
  • Dataflows Gen 2: Create dataflows to import data, transform it, and publish it into your Lakehouse.
  • Shortcuts: Create shortcuts to connect to existing data in your Lakehouse without having to copy it directly.
  • Samples: Quickly ingest sample data to jump-start your exploration of semantic models and tables.

How to use a notebook to load data into your Lakehouse

Load data with an Apache Spark API

The notebook includes a code cell with example code for reading data from a source and writing it into either the Files or Tables section of your lakehouse. You can point the read at your data using either a relative path (if the data lives in the default lakehouse attached to the notebook) or a full ABFS path (if it lives somewhere else). Both paths can be copied from the right-click menu on the data.

Copy ABFS path: returns the absolute (ABFS) path of the file.

Copy relative path for Spark: returns the relative path of the file within the default lakehouse.
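
For illustration, here is a minimal sketch of what the two path styles might look like when reading a Parquet file; the workspace, lakehouse, and file names below are placeholders, not values from this post:

# Relative path, resolved against the default lakehouse attached to the notebook
df = spark.read.parquet("Files/raw/orders.parquet")

# Absolute ABFS path, usable even when the file lives in another lakehouse or workspace
abfs_path = "abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/MyLakehouse.Lakehouse/Files/raw/orders.parquet"
df = spark.read.parquet(abfs_path)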

df = spark.read.parquet("location to read from")

# Save the dataframe as CSV files in the Files section of the default lakehouse
df.write.mode("overwrite").format("csv").save("Files/" + csv_table_name)

# Save the dataframe as Parquet files in the Files section of the default lakehouse
df.write.mode("overwrite").format("parquet").save("Files/" + parquet_table_name)

# Save the dataframe as a Delta Lake table in the Tables section of the default lakehouse
df.write.mode("overwrite").format("delta").saveAsTable(delta_table_name)

# Save the dataframe as a Delta Lake table, appending the data to an existing table
df.write.mode("append").format("delta").saveAsTable(delta_table_name)

How to copy data using the Copy activity

In Data Pipeline, you can use the Copy activity to copy data among data stores located in the cloud.

To copy data from a source to a destination, the service that runs the Copy activity performs these steps:

  1. Reads data from a source data store.
  2. Performs serialization/deserialization, compression/decompression, column mapping, and so on, based on the configured settings.
  3. Writes data to the destination data store.
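
The Copy activity itself is configured in the pipeline UI, but the three steps above can be illustrated with a rough notebook analogue; the paths, column names, and table name in this sketch are placeholders, not part of the Copy activity configuration:

# 1. Read from a source data store (placeholder path)
source_df = spark.read.format("csv").option("header", "true").load("Files/source/input.csv")

# 2. Apply column mapping / format conversion (placeholder column names)
mapped_df = source_df.withColumnRenamed("OrderID", "order_id") \
                     .withColumnRenamed("OrderDate", "order_date")

# 3. Write to the destination data store as a Delta table (placeholder table name)
mapped_df.write.mode("overwrite").format("delta").saveAsTable("copied_orders")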

Add a copy activity using the copy assistant

  1. Configure your source.
  2. Configure your destination.

Once finished, the copy activity is added to your data pipeline canvas. All settings, including advanced settings for this copy activity, are available under the tabs when it is selected.

How to create your first dataflow to get and transform data

  1. Create a dataflow: Switch to the Data Factory experience, navigate to your Microsoft Fabric workspace, select New, and then select Dataflow Gen2.
  2. Get data: Select Get data and then select More. In Choose data source, select View more. In New source, select Other > OData as the data source. Enter the URL https://services.odata.org/v4/northwind/northwind.svc/, and then select Next. Select the Orders and Customers tables, and then select Create.
  3. Apply transformations and publish: Apply a couple of transformations to bring the data into the desired shape, then publish the dataflow into your Lakehouse.
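
If you prefer code over the Dataflow designer, the same public Northwind OData feed can also be pulled from a notebook. A minimal sketch using the endpoint above; the requests/pandas approach and the target table name are assumptions, not part of the Dataflow steps:

import requests
import pandas as pd

# Fetch the Orders entity set from the public Northwind OData service
resp = requests.get("https://services.odata.org/v4/northwind/northwind.svc/Orders")
orders = resp.json()["value"]  # OData v4 wraps rows in a "value" array

# Convert to a Spark dataframe and land it as a Delta table in the lakehouse
orders_df = spark.createDataFrame(pd.DataFrame(orders))
orders_df.write.mode("overwrite").format("delta").saveAsTable("northwind_orders")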

Creating shortcuts to connect to existing data in your Lakehouse without copying it directly

To create a shortcut, open Lakehouse Explorer and select where to place the shortcut, under Tables or Files. Creating a shortcut to a Delta-formatted table under Tables in Lakehouse Explorer automatically registers it as a table, enabling data access through Spark, the SQL endpoint, and the default semantic model. Spark can also access shortcuts in Files for data science projects or for transformation into structured data.
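
Once a shortcut exists, Spark reads it like any other path or table. A short sketch with placeholder shortcut and table names:

# Read files through a shortcut placed under the Files section (placeholder name)
df_files = spark.read.parquet("Files/MyShortcut/part-0000.parquet")

# Read a Delta table exposed through a shortcut placed under the Tables section (placeholder name)
df_table = spark.read.table("my_shortcut_table")
df_table.show(5)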

