List of 50 interview questions and answers for Azure Synapse SQL Dedicated Pool, categorized by level and type:
πΉ Essential Level (1β15)
- What is Azure Synapse SQL Dedicated Pool?
It’s a distributed MPP (massively parallel processing) system used for high-performance analytics and data warehousing. - How is data distributed in a dedicated SQL pool?
Data is distributed across 60 distributions using hash, round-robin, or replicated distribution methods. - Difference between Serverless and Dedicated Pool?
Serverless is pay-per-query; Dedicated is provisioned and billed by performance tier (DWU). - What is DWU?
Data Warehousing Unit β a measure of resources (CPU, memory, IO) allocated. - What is a Control Node vs Compute Node?
Control Node handles query orchestration; Compute Nodes process data in parallel. - What is PolyBase?
A technology to query and load data from external sources like ADLS or Blob storage. - How to load data into Synapse Dedicated Pool?
Using PolyBase, COPY INTO, or CTAS (Create Table As Select). - What is the best practice for loading large datasets?
Use PolyBase or COPY INTO with proper distribution and file format (e.g., Parquet). - How to monitor performance in Synapse?
Use DMVs likesys.dm_pdw_exec_requests,sys.dm_pdw_request_steps. - How does partitioning work in Synapse?
You can define partitioning on large tables to improve query performance. - What types of indexes exist in Synapse SQL Pool?
Clustered, non-clustered, columnstore, and heap. - What is the default recovery model?
Simple recovery β point-in-time restore is not supported. - What are statistics in Synapse?
Metadata that helps the query optimizer make informed decisions. - Can you truncate a table in Synapse?
Yes, usingTRUNCATE TABLE. - How do you update statistics manually?
UPDATE STATISTICS table_name(column_name) WITH FULLSCAN.
πΈ Advanced Level (16β35)
- What is CTAS and its benefit?
CTAS (Create Table As Select) is used to quickly materialize results. - When to use replicated table distribution?
For small dimension tables used in many joins β to avoid data shuffling. - What causes data movement in Synapse?
Joins on non-distributed or incompatible columns lead to data movement. - How to avoid data skew?
Choose a distribution column with high cardinality and even spread. - Best practices for distribution column?
Avoid NULLs, skewed values; prefer unique identifiers. - Difference between materialized views and tables?
Materialized views store computed results and refresh on demand; tables store raw data. - What is resource class?
Defines memory allocation per query (smallrc,largerc,xlargerc, etc.). - What is a workload group?
A logical container to define resource limits and priorities for queries. - Can you use stored procedures in Synapse SQL Pool?
Yes, but with limitations (no transaction control like rollback). - How does workload isolation work?
By assigning queries to different workload groups and classes. - How to perform incremental loads?
Use staging tables and compare with existing ones using watermark columns. - How to implement Slowly Changing Dimension (SCD)?
Use merge pattern or hashing to detect changes. - How does columnstore index compression help?
Reduces storage and improves I/O throughput by compressing similar values. - What is data skew and how to fix it?
When a few distributions hold most data. Fix by changing distribution column. - How to handle temp tables and concurrency?
Use session-specific temp tables or appropriately scoped table variables. - How to check query execution steps?
Usesys.dm_pdw_request_steps. - Can you use T-SQL features like MERGE?
Yes, but with caveats; error handling is limited. - How to enable row-level security in Synapse?
Currently limited β need to implement via custom views or external tools. - Difference between external table and PolyBase?
External tables define schema on external storage; PolyBase is the engine that accesses it. - How to reduce cost in Synapse?
Pause unused pools, optimize queries, minimize data movement.
π§© Scenario-Based Questions (36β50)
- How would you design a star schema in Synapse?
Fact tables with hash distribution, dimension tables replicated or round-robin. - You see frequent data movement in your query plan β how do you resolve?
Align distribution keys or use replicated tables for dimensions. - A query is running slow β what are your steps to optimize it?
- Check for skew
- Review join conditions
- Review statistics
- Use CTAS
- Partition large tables
- Business requires hourly refresh β how do you manage it?
Use ADF pipelines with incremental loads using watermark logic. - How would you design access control for Synapse?
Use RBAC + SQL security groups + Azure AD integration. - You get βOut of Memoryβ errors β what do you do?
Change resource class or break query into smaller steps. - How do you back up data from Synapse?
UseCOPY INTOto export data to ADLS or use CTAS and export files. - A user complains their report is missing data β how to debug?
Check ETL logs, load times, data filters, and Synapse views. - How to optimize large fact table load?
- Use PolyBase or COPY INTO
- Partition data
- Load in parallel
- You have a multi-terabyte dataset. How do you maintain performance?
Distribute and partition wisely, compress with columnstore, use CTAS. - How do you migrate from SQL Server to Synapse?
- Export data using Azure Data Factory
- Convert schema
- Optimize distribution
- How to perform CDC (Change Data Capture)?
Use timestamps or surrogate keys with hash comparison. - Security audit required lineage β how do you show it?
Use Azure Purview or external metadata tagging. - How do you integrate Power BI with Synapse?
Connect directly using Synapse Dedicated Pool SQL endpoint. - How to enable monitoring?
Use Synapse Studio monitor tab + Log Analytics +sys.dm_pdw_exec_requests.
Category: