50 interview questions and answers for Azure Synapse SQL Dedicated Pool

Mohammad Gufran Jahangir August 4, 2025 0

List of 50 interview questions and answers for Azure Synapse SQL Dedicated Pool, categorized by level and type:

Table of Contents

🔹 Essential Level (1–15)

What is Azure Synapse SQL Dedicated Pool?
It’s a distributed MPP (massively parallel processing) system used for high-performance analytics and data warehousing.
How is data distributed in a dedicated SQL pool?
Data is distributed across 60 distributions using hash, round-robin, or replicated distribution methods.
Difference between Serverless and Dedicated Pool?
Serverless is pay-per-query; Dedicated is provisioned and billed by performance tier (DWU).
What is DWU?
Data Warehousing Unit — a measure of resources (CPU, memory, IO) allocated.
What is a Control Node vs Compute Node?
Control Node handles query orchestration; Compute Nodes process data in parallel.
What is PolyBase?
A technology to query and load data from external sources like ADLS or Blob storage.
How to load data into Synapse Dedicated Pool?
Using PolyBase, COPY INTO, or CTAS (Create Table As Select).
What is the best practice for loading large datasets?
Use PolyBase or COPY INTO with proper distribution and file format (e.g., Parquet).
How to monitor performance in Synapse?
Use DMVs like sys.dm_pdw_exec_requests, sys.dm_pdw_request_steps.
How does partitioning work in Synapse?
You can define partitioning on large tables to improve query performance.
What types of indexes exist in Synapse SQL Pool?
Clustered, non-clustered, columnstore, and heap.
What is the default recovery model?
Simple recovery — point-in-time restore is not supported.
What are statistics in Synapse?
Metadata that helps the query optimizer make informed decisions.
Can you truncate a table in Synapse?
Yes, using TRUNCATE TABLE.
How do you update statistics manually?
UPDATE STATISTICS table_name(column_name) WITH FULLSCAN.

🔸 Advanced Level (16–35)

What is CTAS and its benefit?
CTAS (Create Table As Select) is used to quickly materialize results.
When to use replicated table distribution?
For small dimension tables used in many joins — to avoid data shuffling.
What causes data movement in Synapse?
Joins on non-distributed or incompatible columns lead to data movement.
How to avoid data skew?
Choose a distribution column with high cardinality and even spread.
Best practices for distribution column?
Avoid NULLs, skewed values; prefer unique identifiers.
Difference between materialized views and tables?
Materialized views store computed results and refresh on demand; tables store raw data.
What is resource class?
Defines memory allocation per query (smallrc, largerc, xlargerc, etc.).
What is a workload group?
A logical container to define resource limits and priorities for queries.
Can you use stored procedures in Synapse SQL Pool?
Yes, but with limitations (no transaction control like rollback).
How does workload isolation work?
By assigning queries to different workload groups and classes.
How to perform incremental loads?
Use staging tables and compare with existing ones using watermark columns.
How to implement Slowly Changing Dimension (SCD)?
Use merge pattern or hashing to detect changes.
How does columnstore index compression help?
Reduces storage and improves I/O throughput by compressing similar values.
What is data skew and how to fix it?
When a few distributions hold most data. Fix by changing distribution column.
How to handle temp tables and concurrency?
Use session-specific temp tables or appropriately scoped table variables.
How to check query execution steps?
Use sys.dm_pdw_request_steps.
Can you use T-SQL features like MERGE?
Yes, but with caveats; error handling is limited.
How to enable row-level security in Synapse?
Currently limited — need to implement via custom views or external tools.
Difference between external table and PolyBase?
External tables define schema on external storage; PolyBase is the engine that accesses it.
How to reduce cost in Synapse?
Pause unused pools, optimize queries, minimize data movement.

🧩 Scenario-Based Questions (36–50)

How would you design a star schema in Synapse?
Fact tables with hash distribution, dimension tables replicated or round-robin.
You see frequent data movement in your query plan — how do you resolve?
Align distribution keys or use replicated tables for dimensions.
A query is running slow — what are your steps to optimize it?

Check for skew
Review join conditions
Review statistics
Use CTAS
Partition large tables

Business requires hourly refresh — how do you manage it?
Use ADF pipelines with incremental loads using watermark logic.
How would you design access control for Synapse?
Use RBAC + SQL security groups + Azure AD integration.
You get “Out of Memory” errors — what do you do?
Change resource class or break query into smaller steps.
How do you back up data from Synapse?
Use COPY INTO to export data to ADLS or use CTAS and export files.
A user complains their report is missing data — how to debug?
Check ETL logs, load times, data filters, and Synapse views.
How to optimize large fact table load?

Use PolyBase or COPY INTO
Partition data
Load in parallel

You have a multi-terabyte dataset. How do you maintain performance?
Distribute and partition wisely, compress with columnstore, use CTAS.
How do you migrate from SQL Server to Synapse?

Export data using Azure Data Factory
Convert schema
Optimize distribution

How to perform CDC (Change Data Capture)?
Use timestamps or surrogate keys with hash comparison.
Security audit required lineage — how do you show it?
Use Azure Purview or external metadata tagging.
How do you integrate Power BI with Synapse?
Connect directly using Synapse Dedicated Pool SQL endpoint.
How to enable monitoring?
Use Synapse Studio monitor tab + Log Analytics + sys.dm_pdw_exec_requests.

Mohammad Gufran Jahangir

Category:

Azure Synapse Analytics