What all Design decisions and coding techniques for Synapse SQL features in Azure Synapse Analytics

Posted by

You’ll find a list of resources for dedicated SQL pool and serverless SQL pool functions of Synapse SQL

Key design decisions

The articles below highlight concepts and design decisions for Synapse SQL development:


 Specific coding techniques, tips, and recommendations for development:

Best practices for dedicated SQL pools in Azure Synapse Analytics

Reduce cost with pause and scale

Maintain statistics

Tune query performance

  • Performance tuning with materialized views
  • Performance tuning with ordered clustered columnstore index
  • Performance tuning with result set caching

Group INSERT statements into batches

Use PolyBase to load and export data quickly

Load then query external tables

Hash distribute large tables

Do not over-partition

Minimize transaction sizes

Reduce query result sizes

Use the smallest possible column size

Use temporary heap tables for transient data

Optimize clustered columnstore tables

Use larger resource class to improve query performance

Use smaller resource class to increase concurrency

Use DMVs to monitor and optimize your queries

Dedicated SQL pools have several DMVs that can be used to monitor query execution. 

  • Monitor your workload using DMVs
  • sys.dm_exec_sessions
  • sys.dm_pdw_exec_requests
  • sys.dm_pdw_request_steps
  • sys.dm_pdw_sql_requests
  • sys.dm_pdw_dms_workers
  • sys.dm_pdw_waits

Best practices for serverless SQL pool in Azure Synapse Analytics

Use appropriate data types

Filter optimization

Optimize repeating queries

Use CETAS to enhance query performance and joins

Query Azure data

Query CSV data

Query Parquet data

Query Delta Lake

Query Azure Cosmos DB data

Query JSON data

Create views, tables, and other database objects

Copy and transform data (CETAS)

Distribution Advisor in Azure Synapse SQL


Performance tuning with materialized views using dedicated SQL pool in Azure Synapse Analytics

A standard view computes its data each time when the view is used. There’s no data stored on disk. 

A materialized view pre-computes, stores, and maintains its data in dedicated SQL pool just like a table.

Performance tuning with ordered clustered columnstore index

  • By default, for each table created without an index option, an internal component (index builder) creates a non-ordered clustered columnstore index (CCI) on it. 
  • Ordered clustered columnstore indexes by enabling efficient segment elimination, resulting in much faster performance by skipping large amounts of ordered data that don’t match the query predicate.
  • When creating an ordered CCI, the dedicated SQL pool engine sorts the existing data in memory by the order key(s) before the index builder compresses them into index segments. 

Query performance

Queries with all these patterns typically run faster with ordered CCI.

  1. The queries have equality, inequality, or range predicates
  2. The predicate columns and the ordered CCI columns are the same.

Data loading performance

The performance of data loading into an ordered CCI table is similar to a partitioned table. Loading data into an ordered CCI table can take longer than a non-ordered CCI table because of the data sorting operation

Reduce segment overlapping

The number of overlapping segments depends on the size of data to sort, the available memory, and the maximum degree of parallelism (MAXDOP) setting during ordered CCI creation.

Inline Feedbacks
View all comments
Would love your thoughts, please comment.x