Here’s a comprehensive list of 50 Azure Data Factory (ADF) interview questions, spanning essential, advanced, and scenario-based levels, with answers to help you prepare effectively.
✅ Essential Level (Basic Concepts)
- What is Azure Data Factory?
  ADF is a cloud-based ETL and data integration service for orchestrating and automating data movement and transformation.
- What are the core components of ADF?
  Pipelines, Activities, Datasets, Linked Services, Triggers, and Integration Runtimes.
- What is a pipeline in ADF?
  A logical grouping of activities that together perform data movement and transformation.
- What is a dataset in ADF?
  Metadata that defines the schema and location of the data used by activities.
- What is a linked service?
  Similar to a connection string; it defines the connection to a data source or compute.
- What are the types of triggers?
  Schedule, Tumbling Window, and Event-based triggers.
- What is Integration Runtime (IR)?
  The compute infrastructure ADF uses for data movement and transformation.
- Difference between Azure IR and Self-hosted IR?
  Azure IR runs in the cloud; a Self-hosted IR (SHIR) is installed on-premises or on a VM to reach on-prem or private-network data.
- What is the use of parameters in ADF?
  They make pipelines dynamic and reusable.
- What is a control activity?
  An activity that controls pipeline flow, e.g., If Condition, ForEach, Until, Wait.
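The basics above can be sketched as a minimal pipeline definition: a parameter driving a ForEach control activity. All names are illustrative, and the `GetFileList` activity the expression references is assumed to exist elsewhere in the pipeline:

```json
{
  "name": "PL_ProcessFiles",
  "properties": {
    "parameters": {
      "folderPath": { "type": "String", "defaultValue": "input/" }
    },
    "activities": [
      {
        "name": "ForEachFile",
        "type": "ForEach",
        "typeProperties": {
          "isSequential": false,
          "items": {
            "value": "@activity('GetFileList').output.childItems",
            "type": "Expression"
          },
          "activities": [
            { "name": "CopyOneFile", "type": "Copy", "typeProperties": {} }
          ]
        }
      }
    ]
  }
}
```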
🔁 Intermediate to Advanced Level
- What is the difference between mapping data flows and wrangling data flows?
  Mapping data flows are visually designed transformations executed at scale on managed Spark clusters; wrangling data flows use the Power Query editor for interactive data preparation.
- How do you handle failures in pipelines?
  Retry policies, On Failure dependency paths, and activity dependency conditions.
- What is a tumbling window trigger?
  Fires a pipeline at periodic, fixed-size, non-overlapping time windows.
- How do you pass parameters between pipelines?
  Use the Execute Pipeline activity with parameter key-value pairs.
- How does ADF support CI/CD?
  Through Git integration (collaboration and publish branches) and Azure DevOps build/release pipelines.
- What is staging in Copy activity?
  An interim Blob storage hop used when copying between stores that are not directly compatible.
- What is the purpose of global parameters?
  Shared constants accessible from all pipelines in a factory.
- What is debug mode in ADF?
  Lets you test a pipeline interactively from the authoring canvas without publishing or firing a trigger; note that debug runs still execute activities against real data.
- How do you monitor pipeline executions?
  Using the Monitor tab, activity run output, or Log Analytics.
- How do you encrypt sensitive data in ADF?
  Store secrets in Azure Key Vault and mark activity inputs/outputs as secure.
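Two of the questions above (tumbling window triggers and passing parameters) combine naturally in one sketch: a tumbling window trigger that hands its window boundaries to the pipeline as parameters. The trigger, pipeline, and parameter names are illustrative:

```json
{
  "name": "TRG_HourlyWindow",
  "properties": {
    "type": "TumblingWindowTrigger",
    "typeProperties": {
      "frequency": "Hour",
      "interval": 1,
      "startTime": "2024-01-01T00:00:00Z",
      "maxConcurrency": 1
    },
    "pipeline": {
      "pipelineReference": {
        "referenceName": "PL_LoadHourly",
        "type": "PipelineReference"
      },
      "parameters": {
        "windowStart": "@trigger().outputs.windowStartTime",
        "windowEnd": "@trigger().outputs.windowEndTime"
      }
    }
  }
}
```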
💡 Advanced/Architectural Level
- ADF vs. SSIS – key differences?
  ADF is cloud-native and serverless with native connectors for modern cloud stores; SSIS is an on-premises, server-based ETL tool (though SSIS packages can be lifted into ADF via the Azure-SSIS IR).
- How do you use Git integration in ADF?
  Connect a Git repo, author in a collaboration branch, and publish to the adf_publish branch.
- Can we use stored procedures in ADF?
  Yes, via the Stored Procedure activity with parameters.
- What is the role of Data Flows in ADF?
  Scalable, code-free transformation of data executed on managed Spark clusters.
- How do you optimize ADF performance?
  Use parallel copies, partitioning, an IR located close to the data, and reduce the volume of data moved.
- What is the maximum pipeline concurrency in ADF?
  Concurrency is configured per pipeline via the Concurrency property; once the limit is reached, additional runs are queued.
- Difference between pipeline parameters, variables, and expressions?
  - Parameters: values passed in at run time (read-only inside the run)
  - Variables: mutable values set during the run (Set Variable / Append Variable)
  - Expressions: dynamic content computed at run time with the expression language
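The parameter/variable/expression distinction can be sketched in pipeline JSON (names are illustrative): a parameter is declared with a default, a variable is declared and then assigned via a Set Variable activity whose value is an expression:

```json
{
  "name": "PL_Demo",
  "properties": {
    "parameters": {
      "env": { "type": "String", "defaultValue": "dev" }
    },
    "variables": {
      "runDate": { "type": "String" }
    },
    "activities": [
      {
        "name": "SetRunDate",
        "type": "SetVariable",
        "typeProperties": {
          "variableName": "runDate",
          "value": {
            "value": "@formatDateTime(utcNow(), 'yyyy-MM-dd')",
            "type": "Expression"
          }
        }
      }
    ]
  }
}
```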
- ADF pricing model?
  Based on pipeline activity runs, data movement, and IR usage (vCore-hours for Data Flows).
- How do you handle schema drift?
  Use Data Flows’ “Allow schema drift” and auto-mapping options.
- Can we invoke REST APIs in ADF?
  Yes, using the Web activity or the REST connector in Copy/Data Flow.
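A Web activity call is a small fragment; the URL and headers below are placeholders, not a real endpoint:

```json
{
  "name": "CallStatusApi",
  "type": "WebActivity",
  "typeProperties": {
    "url": "https://api.example.com/status",
    "method": "GET",
    "headers": { "Accept": "application/json" }
  }
}
```

The activity’s response is then available to downstream activities as `@activity('CallStatusApi').output`.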
📘 Scenario-Based Questions
- How would you orchestrate ETL jobs with dependencies?
  Use control flow activities (If Condition, Wait, Until) and activity dependency conditions (success, failure, completion).
- You need to load 1,000 files in parallel. How would you do it?
  Use a ForEach activity with isSequential set to false and an appropriate batchCount.
- How do you copy data from on-prem SQL Server to Azure Data Lake?
  Use a Self-hosted IR with the Copy activity.
- How do you secure ADF pipeline secrets?
  Use an Azure Key Vault linked service for credentials and secrets.
- How do you restart only failed activities in a pipeline?
  Use the rerun-from-failed-activity option in Monitor, or implement checkpointing logic with control activities and variable flags.
- You need to implement different logic for dev, test, and prod. How?
  Use global parameters or per-environment configuration files with environment variables.
- Can you deploy ADF pipelines using Azure DevOps? How?
  Yes: use Git integration and deploy the generated ARM templates through build and release pipelines.
- A data flow is running slowly. What steps will you take?
  Enable staging, optimize source queries, tune partitioning, and increase the core count.
- You need to process files only if they land between 2 AM and 3 AM daily. How?
  Use an event trigger combined with a time-window check and validation inside the pipeline.
- How would you handle incremental loads in ADF?
  Use watermark columns and Lookup/Filter activities to track the delta.
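The watermark pattern in the last answer can be sketched as a Copy activity whose source query is built with dynamic content. The table, column, and activity names are illustrative, and a Lookup activity named `GetWatermark` is assumed to have returned the last processed value:

```json
{
  "name": "CopyDelta",
  "type": "Copy",
  "typeProperties": {
    "source": {
      "type": "AzureSqlSource",
      "sqlReaderQuery": {
        "value": "@concat('SELECT * FROM dbo.Orders WHERE ModifiedDate > ''', activity('GetWatermark').output.firstRow.WatermarkValue, '''')",
        "type": "Expression"
      }
    },
    "sink": { "type": "ParquetSink" }
  }
}
```

After the copy succeeds, a follow-up activity would update the stored watermark so the next run picks up from the new high-water mark.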
🧠 Bonus Conceptual Questions
- What’s the use of the REST connector in Copy activity?
  To extract data from RESTful web services.
- Difference between Lookup and Get Metadata activity?
  - Lookup fetches data (e.g., the output of a SQL query)
  - Get Metadata retrieves structural information (e.g., file lists, size, last modified)
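A Get Metadata activity sketch, listing folder contents via the `childItems` field (the dataset name is illustrative):

```json
{
  "name": "GetFileList",
  "type": "GetMetadata",
  "typeProperties": {
    "dataset": {
      "referenceName": "DS_InputFolder",
      "type": "DatasetReference"
    },
    "fieldList": [ "childItems", "lastModified" ]
  }
}
```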
- How do you validate a pipeline before publishing?
  Use the “Validate all” option in the authoring UI.
- What happens during pipeline publishing in Git mode?
  ADF generates ARM templates from the collaboration branch and commits them to the adf_publish branch.
- What are sink and source in Copy activity?
  - Source: where the data is read from
  - Sink: where the data is written to
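Source and sink come together in the Copy activity definition. A sketch with a delimited-text source (using a wildcard, which also answers the dynamic-file-name question below) and a Parquet sink; dataset names are illustrative:

```json
{
  "name": "CopyCsvToParquet",
  "type": "Copy",
  "inputs": [ { "referenceName": "DS_CsvFiles", "type": "DatasetReference" } ],
  "outputs": [ { "referenceName": "DS_ParquetOut", "type": "DatasetReference" } ],
  "typeProperties": {
    "source": {
      "type": "DelimitedTextSource",
      "storeSettings": {
        "type": "AzureBlobFSReadSettings",
        "wildcardFileName": "sales_*.csv"
      }
    },
    "sink": { "type": "ParquetSink" }
  }
}
```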
- Can ADF trigger Azure Functions?
  Yes, using the Azure Function activity.
- Can we schedule pipelines with dependencies?
  Yes, using tumbling window trigger dependencies or chained activities.
- How do you deal with dynamically changing file names?
  Use wildcards and dynamic content expressions.
- How do you ensure data is not processed multiple times?
  Implement watermarking, status flags, or control tables.
- How would you troubleshoot a pipeline failure?
  Use Monitor → activity run output, logs, retry settings, and debug mode.