Here’s a comprehensive list of 50 Azure Data Factory (ADF) interview questions, spanning essential, advanced, and scenario-based levels, with answers to help you prepare effectively.
✅ Essential Level (Basic Concepts)
- What is Azure Data Factory?
  ADF is a cloud-based ETL and data integration service for orchestrating and automating data movement and transformation.
- What are the core components of ADF?
  Pipelines, Activities, Datasets, Linked Services, Triggers, and Integration Runtimes.
- What is a pipeline in ADF?
  A logical grouping of activities that together perform data movement and transformation.
- What is a dataset in ADF?
  Metadata that defines the schema and location of the data used by activities.
- What is a linked service?
  Similar to a connection string; it defines the connection to a data source or compute.
- What are the types of triggers?
  Schedule, Tumbling Window, and Event-based triggers.
- What is Integration Runtime (IR)?
  The compute infrastructure ADF uses for data movement and transformation.
- Difference between Azure IR and Self-hosted IR?
  Azure IR runs in the cloud; a Self-hosted IR (SHIR) is installed on-premises or on a VM to reach on-prem or private-network data.
- What is the use of parameters in ADF?
  They make pipelines dynamic and reusable.
- What is a control activity?
  An activity that controls pipeline flow, e.g., If Condition, ForEach, Until, Wait.
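The basics above can be sketched as a minimal pipeline definition: a parameter driving a ForEach control activity. All names are illustrative, and the `GetFileList` activity the expression references is assumed to exist elsewhere in the pipeline:

```json
{
  "name": "PL_ProcessFiles",
  "properties": {
    "parameters": {
      "folderPath": { "type": "String", "defaultValue": "input/" }
    },
    "activities": [
      {
        "name": "ForEachFile",
        "type": "ForEach",
        "typeProperties": {
          "isSequential": false,
          "items": {
            "value": "@activity('GetFileList').output.childItems",
            "type": "Expression"
          },
          "activities": [
            { "name": "CopyOneFile", "type": "Copy", "typeProperties": {} }
          ]
        }
      }
    ]
  }
}
```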
🔁 Intermediate to Advanced Level
- What is the difference between mapping data flows and wrangling data flows?
  Mapping data flows are visually designed transformations executed at scale on managed Spark clusters; wrangling data flows use the Power Query editor for interactive data preparation.
- How do you handle failures in pipelines?
  Retry policies, On Failure dependency paths, and activity dependency conditions.
- What is a tumbling window trigger?
  Fires a pipeline at periodic, fixed-size, non-overlapping time windows.
- How do you pass parameters between pipelines?
  Use the Execute Pipeline activity with parameter key-value pairs.
- How does ADF support CI/CD?
  Through Git integration (collaboration and publish branches) and Azure DevOps build/release pipelines.
- What is staging in Copy activity?
  An interim Blob storage hop used when copying between stores that are not directly compatible.
- What is the purpose of global parameters?
  Shared constants accessible from all pipelines in a factory.
- What is debug mode in ADF?
  Lets you test a pipeline interactively from the authoring canvas without publishing or firing a trigger; note that debug runs still execute activities against real data.
- How do you monitor pipeline executions?
  Using the Monitor tab, activity run output, or Log Analytics.
- How do you encrypt sensitive data in ADF?
  Store secrets in Azure Key Vault and mark activity inputs/outputs as secure.
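Two of the questions above (tumbling window triggers and passing parameters) combine naturally in one sketch: a tumbling window trigger that hands its window boundaries to the pipeline as parameters. The trigger, pipeline, and parameter names are illustrative:

```json
{
  "name": "TRG_HourlyWindow",
  "properties": {
    "type": "TumblingWindowTrigger",
    "typeProperties": {
      "frequency": "Hour",
      "interval": 1,
      "startTime": "2024-01-01T00:00:00Z",
      "maxConcurrency": 1
    },
    "pipeline": {
      "pipelineReference": {
        "referenceName": "PL_LoadHourly",
        "type": "PipelineReference"
      },
      "parameters": {
        "windowStart": "@trigger().outputs.windowStartTime",
        "windowEnd": "@trigger().outputs.windowEndTime"
      }
    }
  }
}
```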
💡 Advanced/Architectural Level
- ADF vs. SSIS – key differences?
  ADF is cloud-native and serverless with native connectors for modern cloud stores; SSIS is an on-premises, server-based ETL tool (though SSIS packages can be lifted into ADF via the Azure-SSIS IR).
- How do you use Git integration in ADF?
  Connect a Git repo, author in a collaboration branch, and publish to the adf_publish branch.
- Can we use stored procedures in ADF?
  Yes, via the Stored Procedure activity with parameters.
- What is the role of Data Flows in ADF?
  Scalable, code-free transformation of data executed on managed Spark clusters.
- How do you optimize ADF performance?
  Use parallel copies, partitioning, an IR located close to the data, and reduce the volume of data moved.
- What is the maximum pipeline concurrency in ADF?
  Concurrency is configured per pipeline via the Concurrency property; once the limit is reached, additional runs are queued.
- Difference between pipeline parameters, variables, and expressions?
  - Parameters: values passed in at run time (read-only inside the run)
  - Variables: mutable values set during the run (Set Variable / Append Variable)
  - Expressions: dynamic content computed at run time with the expression language
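The parameter/variable/expression distinction can be sketched in pipeline JSON (names are illustrative): a parameter is declared with a default, a variable is declared and then assigned via a Set Variable activity whose value is an expression:

```json
{
  "name": "PL_Demo",
  "properties": {
    "parameters": {
      "env": { "type": "String", "defaultValue": "dev" }
    },
    "variables": {
      "runDate": { "type": "String" }
    },
    "activities": [
      {
        "name": "SetRunDate",
        "type": "SetVariable",
        "typeProperties": {
          "variableName": "runDate",
          "value": {
            "value": "@formatDateTime(utcNow(), 'yyyy-MM-dd')",
            "type": "Expression"
          }
        }
      }
    ]
  }
}
```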
- ADF pricing model?
  Based on pipeline activity runs, data movement, and IR usage (vCore-hours for Data Flows).
- How do you handle schema drift?
  Use Data Flows’ “Allow schema drift” and auto-mapping options.
- Can we invoke REST APIs in ADF?
  Yes, using the Web activity or the REST connector in Copy/Data Flow.
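A Web activity call is a small fragment; the URL and headers below are placeholders, not a real endpoint:

```json
{
  "name": "CallStatusApi",
  "type": "WebActivity",
  "typeProperties": {
    "url": "https://api.example.com/status",
    "method": "GET",
    "headers": { "Accept": "application/json" }
  }
}
```

The activity’s response is then available to downstream activities as `@activity('CallStatusApi').output`.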
📘 Scenario-Based Questions
- How would you orchestrate ETL jobs with dependencies?
  Use control flow activities (If Condition, Wait, Until) and activity dependency conditions (success, failure, completion).
- You need to load 1,000 files in parallel. How would you do it?
  Use a ForEach activity with isSequential set to false and an appropriate batchCount.
- How do you copy data from on-prem SQL Server to Azure Data Lake?
  Use a Self-hosted IR with the Copy activity.
- How do you secure ADF pipeline secrets?
  Use an Azure Key Vault linked service for credentials and secrets.
- How do you restart only failed activities in a pipeline?
  Use the rerun-from-failed-activity option in Monitor, or implement checkpointing logic with control activities and variable flags.
- You need to implement different logic for dev, test, and prod. How?
  Use global parameters or per-environment configuration files with environment variables.
- Can you deploy ADF pipelines using Azure DevOps? How?
  Yes: use Git integration and deploy the generated ARM templates through build and release pipelines.
- A data flow is running slowly. What steps will you take?
  Enable staging, optimize source queries, tune partitioning, and increase the core count.
- You need to process files only if they land between 2 AM and 3 AM daily. How?
  Use an event trigger combined with a time-window check and validation inside the pipeline.
- How would you handle incremental loads in ADF?
  Use watermark columns and Lookup/Filter activities to track the delta.
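The watermark pattern in the last answer can be sketched as a Copy activity whose source query is built with dynamic content. The table, column, and activity names are illustrative, and a Lookup activity named `GetWatermark` is assumed to have returned the last processed value:

```json
{
  "name": "CopyDelta",
  "type": "Copy",
  "typeProperties": {
    "source": {
      "type": "AzureSqlSource",
      "sqlReaderQuery": {
        "value": "@concat('SELECT * FROM dbo.Orders WHERE ModifiedDate > ''', activity('GetWatermark').output.firstRow.WatermarkValue, '''')",
        "type": "Expression"
      }
    },
    "sink": { "type": "ParquetSink" }
  }
}
```

After the copy succeeds, a follow-up activity would update the stored watermark so the next run picks up from the new high-water mark.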
🧠 Bonus Conceptual Questions
- What’s the use of the REST connector in Copy activity?
  To extract data from RESTful web services.
- Difference between Lookup and Get Metadata activity?
  - Lookup fetches data (e.g., the output of a SQL query)
  - Get Metadata retrieves structural information (e.g., file lists, size, last modified)
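A Get Metadata activity sketch, listing folder contents via the `childItems` field (the dataset name is illustrative):

```json
{
  "name": "GetFileList",
  "type": "GetMetadata",
  "typeProperties": {
    "dataset": {
      "referenceName": "DS_InputFolder",
      "type": "DatasetReference"
    },
    "fieldList": [ "childItems", "lastModified" ]
  }
}
```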
- How do you validate a pipeline before publishing?
  Use the “Validate all” option in the authoring UI.
- What happens during pipeline publishing in Git mode?
  ADF generates ARM templates from the collaboration branch and commits them to the adf_publish branch.
- What are sink and source in Copy activity?
  - Source: where the data is read from
  - Sink: where the data is written to
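Source and sink come together in the Copy activity definition. A sketch with a delimited-text source (using a wildcard, which also answers the dynamic-file-name question below) and a Parquet sink; dataset names are illustrative:

```json
{
  "name": "CopyCsvToParquet",
  "type": "Copy",
  "inputs": [ { "referenceName": "DS_CsvFiles", "type": "DatasetReference" } ],
  "outputs": [ { "referenceName": "DS_ParquetOut", "type": "DatasetReference" } ],
  "typeProperties": {
    "source": {
      "type": "DelimitedTextSource",
      "storeSettings": {
        "type": "AzureBlobFSReadSettings",
        "wildcardFileName": "sales_*.csv"
      }
    },
    "sink": { "type": "ParquetSink" }
  }
}
```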
- Can ADF trigger Azure Functions?
  Yes, using the Azure Function activity.
- Can we schedule pipelines with dependencies?
  Yes, using tumbling window trigger dependencies or chained activities.
- How do you deal with dynamically changing file names?
  Use wildcards and dynamic content expressions.
- How do you ensure data is not processed multiple times?
  Implement watermarking, status flags, or control tables.
- How would you troubleshoot a pipeline failure?
  Use Monitor → activity run output, logs, retry settings, and debug mode.