In Azure SQL Server, statistics are database objects that contain information about the distribution of data in one or more columns of a table or indexed view. These statistics provide valuable information to the query optimizer, helping it make informed decisions when generating query execution plans. Here are key points about statistics in Azure SQL Server:
- Purpose: Statistics are used by the query optimizer to estimate the number of rows that will be returned by a query and to determine the most efficient way to access data.
- Automatic Creation: Azure SQL Server can automatically create and maintain statistics for columns when needed. It does so based on the workload and data distribution in your database.
- Manual Creation: You can also create statistics manually using the CREATE STATISTICS statement. This allows you to control which columns have statistics and how they are maintained.
- Column-Level: Statistics can be created on one or more columns, and they provide information about the distribution of data in those columns. This is essential for the query optimizer to choose the right query plan.
- Histogram: Statistics include a histogram, which is a data distribution profile that divides the data into steps or ranges. The histogram helps the optimizer make decisions about query plan choices.
- Cardinality Estimation: Statistics are critical for cardinality estimation, which is the process of estimating the number of rows returned by a query. Accurate cardinality estimation leads to better query performance.
- Update and Maintenance: Statistics can become stale over time due to data changes, and they may need to be updated. SQL Server automatically recompiles and updates statistics when needed, but you can also manually update them.
- Indexes: Statistics are closely related to indexes. In many cases, indexes have associated statistics to help with query optimization.
- Query Performance: Well-maintained and up-to-date statistics are essential for optimal query performance. Poor statistics can lead to suboptimal query plans and slower execution.
Benefits for SQL Server Statistics
Statistics in Azure SQL Server provide several benefits, primarily related to query optimization and performance. Here are the key benefits of using statistics in Azure SQL Server:
- Improved Query Performance: Statistics help the query optimizer make informed decisions about the best execution plan for a query. With accurate statistics, the optimizer can choose the most efficient way to access and retrieve data, resulting in improved query performance.
- Accurate Cardinality Estimation: Statistics are essential for estimating the number of rows that will be returned by a query. Accurate cardinality estimation is critical for selecting the right query plan, as it ensures that SQL Server allocates the appropriate resources for query execution.
- Index Efficiency: Statistics are closely related to indexes. When an index is created, SQL Server often creates associated statistics. These statistics are used to determine whether using the index is the most efficient way to retrieve data, further enhancing query performance.
- Data Distribution Insights: Statistics provide insights into the distribution of data in one or more columns. This information is valuable for understanding the data’s characteristics and helps the optimizer make better choices.
- Histogram Data: Statistics include a histogram that divides data into steps or ranges. The histogram allows the optimizer to understand the distribution of values within a column, which is crucial for making effective query plans.
- Query Plan Optimization: By enabling the query optimizer to choose the optimal query plan, statistics reduce the chances of suboptimal execution plans that might lead to slow query performance.
- Automatic Maintenance: Azure SQL Server can automatically create and maintain statistics for you, reducing the need for manual intervention in many cases. This ensures that statistics stay up-to-date as the data changes.
- Manual Control: You have the option to create statistics manually, allowing you to control which columns have statistics and how they are maintained. This can be particularly useful for specific scenarios.
- Resource Efficiency: With better query plans, resources are used more efficiently. This can lead to reduced query execution time and lower resource consumption, which is crucial in cloud environments.
What happen if Azure SQL Server are not collected and updated
If statistics in Azure SQL Server are not collected and updated, it can lead to several performance-related issues and query optimization problems. Here’s what can happen when statistics are not maintained:
- Stale Statistics: Over time, statistics can become outdated as data in the database changes. Outdated statistics may no longer accurately represent the data distribution in the tables. This can lead to poor query performance as the query optimizer relies on inaccurate information to create execution plans.
- Suboptimal Query Plans: The query optimizer uses statistics to estimate the number of rows that will be returned by a query. If the statistics are stale or missing, the optimizer may generate suboptimal execution plans. These plans can lead to slower query performance and resource inefficiency.
- Resource Contention: Inaccurate statistics can cause query plans that consume more resources than necessary, leading to resource contention in a shared database environment. This can affect the performance of other queries running concurrently on the same server.
- Slow Query Performance: Queries that rely on outdated statistics may experience slow performance, resulting in delayed report generation, application response times, and overall database responsiveness.
- Increased Maintenance Overhead: In situations where statistics are not updated automatically, database administrators may need to manually maintain statistics. This can introduce additional administrative overhead and the potential for human error.
- Inconsistent Query Performance: With outdated statistics, query performance may become inconsistent. Sometimes queries may perform well, while other times they may perform poorly, depending on the specific data distribution.
- Difficulty in Troubleshooting: When queries underperform, troubleshooting can become challenging. Identifying that outdated statistics are the root cause may require in-depth analysis and monitoring.
- Resource Wastage: Suboptimal query plans may consume more resources than necessary, leading to wastage of computing resources in the cloud environment, potentially incurring higher costs.
Options to view SQL Server statistics
We can view SQL Server statistics for an existing object using both SSMS and the T-SQL method.
View SQL Server Statistics from SSMS
Connect to a SQL Server instance in SSMS and expand the particular database. Expand the objectand we can view all available statistics under the STATISTICS tab.
We can get details about any particular statistics as well. Right-click on the statistics and go to properties.
It opens the statistics properties and shows the statistics columns and last update date for the particular statistics.
Click on the Details, and it shows the distribution of values and the frequency of each distinct value occurrence (histogram) for the specified object.
View SQL Server Statistics from T-SQL
We can use DMV sys.dm_db_stats_properties to view the properties of statistics for a specified object in the current database.
Execute the following query to check the statistics for any tables
SELECT sp.stats_id,
name,
filter_definition,
last_updated,
rows,
rows_sampled,
steps,
unfiltered_rows,
modification_counter
FROM sys.stats AS stat
CROSS APPLY sys.dm_db_stats_properties(stat.object_id, stat.stats_id) AS sp
WHERE stat.object_id = OBJECT_ID('audit.env_conn_config');
- Stats_ID: It is the unique ID of the statistics object
- Name: It is the statistics name
- Last_updated: It is the date and time of the last statistics update
- Rows: It shows the total number of rows at the time of the last statistics update
- Rows_sampled: It gives the total number of sample rows for the statistics
- Unfiltered_rows: In the screenshot, you can see both rows_sampled and unfiltered_rows value the same because we did not use any filter in the statistics
- Modification_counter: It is a vital column to look. We get the total number of modifications since the last statistics update
Different methods to perform SQL Server update Statistics
SQL Server provides different methods at the database level to update SQL Server Statistics.
Right-click on the database and go to properties. In the database properties, we can view statistics options under the Automatic tab.
Auto Create Statistics
In SQL Server, statistics are critical for optimizing query performance, and SQL Server provides an automatic mechanism for creating and updating statistics. Automatic statistics creation is enabled by default, but it’s essential to understand how it works and when it occurs.
- Auto Create Statistics name starts with _WA
Below Query to identify statistics auto-created by SQL Server.
SELECT sp.stats_id,
name,
filter_definition,
last_updated,
rows,
rows_sampled,
steps,
unfiltered_rows,
modification_counter
FROM sys.stats AS stat
CROSS APPLY sys.dm_db_stats_properties(stat.object_id, stat.stats_id) AS sp
WHERE stat.object_id = OBJECT_ID('audit.env_conn_config')
and name like '_WA%';
Auto-created SQL Server statistics are always single column statistics
SQL Server creates auto statistics for the columns that do not have a histogram for the existing statistics object
Auto Create Incremental Statistics
Auto Create Incremental Statistics is a feature in SQL Server that was introduced to improve query performance by reducing the overhead of updating and maintaining statistics for large tables. It’s available in SQL Server 2014 and later versions. Incremental statistics are a subset of statistics that are updated incrementally rather than entirely recomputed when data changes. This can be particularly beneficial for large tables with minimal changes between updates.
Key points about Auto Create Incremental Statistics:
- Reduced Overhead: Traditional statistics updates involve recomputing statistics for the entire table, which can be resource-intensive for large tables. With Auto Create Incremental Statistics, only the affected partitions of a table are updated when data changes occur, reducing the computational overhead.
- Partitioned Tables: Auto Create Incremental Statistics is typically used with partitioned tables. When a table is partitioned, it’s divided into smaller, manageable segments called partitions. Each partition may represent a distinct range of data (e.g., a date range), and data changes are often limited to specific partitions.
- Efficient for Large Tables: Incremental statistics are especially efficient for very large tables where the majority of data remains unchanged. Instead of updating statistics for the entire table, only the partitions with data changes are updated.
- Automatic Creation: When you enable Auto Create Incremental Statistics for a partitioned table, SQL Server automatically manages the creation and maintenance of incremental statistics.
- Query Performance: By having up-to-date statistics that are only updated where necessary, query performance can be improved. The query optimizer can make more accurate decisions about query plans based on the latest statistics.
- Manual Control: While Auto Create Incremental Statistics is automated, database administrators can also manually create and manage incremental statistics using T-SQL if needed.
To enable Auto Create Incremental Statistics for a partitioned table, you can use the following T-SQL statement:
ALTER TABLE YourTableName
ENABLE AUTO_CREATE_INCREMENTAL_STATISTICS;
Auto Update Statistics
“Auto Update Statistics” is a feature in SQL Server that helps maintain the accuracy and relevance of query optimization by automatically updating statistics when certain conditions are met. Statistics in SQL Server are metadata that describe the distribution of data in tables and indexes. These statistics are crucial for the query optimizer to make informed decisions about the most efficient query execution plans.
Key points about Auto Update Statistics:
- Automatic Process: When Auto Update Statistics is enabled for a database, SQL Server monitors the number of data modifications (inserts, updates, deletes) that occur in a table or index. If the number of changes crosses a predefined threshold, SQL Server automatically triggers an update of the statistics for that table or index.
- Threshold for Updates: The threshold for automatic statistics updates is determined by SQL Server based on the number of rows in the table. It’s designed to ensure that statistics are updated when they become out-of-date due to data changes. The exact threshold can vary based on the SQL Server version and configuration.
- Performance Maintenance: Automatic statistics updates help ensure that query optimization remains accurate and efficient. Out-of-date statistics can lead to suboptimal query plans and poor performance. By keeping statistics up-to-date, SQL Server can make better decisions about query execution plans.
- Database-Level Setting: Auto Update Statistics is enabled or disabled at the database level. It can be set as the default behavior for all tables and indexes within the database.
- Manual Update Control: While automatic updates are beneficial in most cases, database administrators can also manually update statistics using the
UPDATE STATISTICS
statement to have more control over when and how statistics are updated. - Asynchronous Process: Automatic statistics updates occur asynchronously, meaning they don’t interrupt the execution of queries. Updates happen in the background, helping to minimize the impact on concurrent database operations.
To enable or disable Auto Update Statistics for a database, you can use the following commands:
To enable Auto Update Statistics:
ALTER DATABASE YourDatabaseName SET AUTO_UPDATE_STATISTICS ON;
To disable Auto Update Statistics:
ALTER DATABASE YourDatabaseName SET AUTO_UPDATE_STATISTICS OFF;
Auto Update Statistics Asynchronously
SQL Server uses synchronous mode to update the statistics. If query optimizer finds out of date statistics, it updates the SQL Server Statistics first and then prepares the execution plan according to the recently updated statistics.
If we enable the Auto Update Statistics Asynchronously, SQL Server does not wait to update the statistics. Instead, it executes the query with existing statistics and initiates update statistics requests’ in parallel. The next executed query takes the benefit of the updated statistics. Since SQL Server does not wait for the updated statistics, we also call it Asynchronous mode statistics.
Manually Update Statistics
We can also manually update the statistics for improving the query execution plan and performance on a requirement basis. We can use the UPDATE STATISTICS or Sp_Update stored procedure to update SQL Server statistics.
Example 1: SQL Server UPDATE STATISTICS for all statistics in an object
Execute the following query to update SQL Server Statistics on HumanResources.Employee table.
Update STATISTICS HumanResources.Employee
Example 2: SQL Server UPDATE STATISTICS for specific statistics
update SQL Server statistics for statistics IX_Employee_OrganizationNode. Execute the following code.
Update STATISTICS HumanResources.Employee IX_Employee_OrganizationNode
It updates only specific statistics. In the following screenshot, we can verify this.
Example 3: SQL Server UPDATE STATISTICS with FULL Scan
We use FULL SCAN in the UPDATE STATISTICS to scan all rows of a table. In the previous examples, we did not specify the FULL SCAN parameter. Therefore, SQL Server automatically decides whether it requires FULL SCAN or not.
The following query does a full scan and updates the statistics for specific statistics in the specified object.
Update STATISTICS HumanResources.Employee IX_Employee_OrganizationNode WITH FULLSCAN
We can use the WITH SAMPLE 100 PERCENT clause instead of WITH FULLSCAN and both returns the same result.
Update STATISTICS HumanResources.Employee IX_Employee_OrganizationNode WITH SAMPLE 100 PERCENT
Example 4: UPDATE STATISTICS with SAMPLE
We can use WITH SAMPLE CLAUSE to specify the percentage or number of rows for the query optimizer to update statistics.
Update STATISTICS HumanResources.Employee IX_Employee_OrganizationNode WITH SAMPLE 10 PERCEN
We can use the following queries to do the task for all statistics on a specified object.
Update STATISTICS HumanResources.Employee WITH FULLSCAN, COLUMNS
Note:
- We should not specify 0 PERCENT or 0 Rows to update the statistics because it just updates the statistics object, but it does not contain statistics data
- We cannot use FULL SCAN and SAMPLE together
- We should use SAMPLE under specific requirements only. We might take less sample size, and it might not be suitable for the query optimizer to choose the appropriate plan
- We should not disable the auto-update statistics even we are regularly updating the SQL Server Statistics. Auto Update Statistics allows SQL Server to automatically update stats according to the predefined threshold
- Updating statistics with FULL SCAN might take longer for an extensive data set object. We should plan it and do it off business hours
Updating All Statistics with sp_updatestats
We can update use sp_updatestats to update all statistics in the database.
If the update is not required, we get the following message.
0 index(es)/statistic(s) have been updated, 1 did not require update.
If it updates the statistics, we get the following message.
1 index(es)/statistic(s) have been updated.
exec sp_updatestats
Conclusion
In conclusion, statistics in SQL Server play a vital role in query optimization and performance tuning. They provide the query optimizer with valuable information about the distribution of data in tables and indexes. Here are the key takeaways about statistics in SQL Server:
- Importance of Statistics: Statistics are critical for the query optimizer to generate efficient execution plans. They help the optimizer make informed decisions about join methods, index usage, and data access paths.
- Types of Statistics: SQL Server maintains two primary types of statistics: column statistics and index statistics. Column statistics describe the distribution of values in a single column, while index statistics are generated for index keys.
- Automatic Statistics Management: SQL Server provides mechanisms for automatic statistics creation and updates. These mechanisms ensure that statistics are up-to-date and accurate, minimizing the need for manual intervention.
- Auto Create Statistics: Automatically creates statistics for columns that are not part of an index. This helps the query optimizer make better decisions for queries involving non-indexed columns.
- Auto Update Statistics: Automatically updates statistics when data changes occur and the statistics become out-of-date. This ensures that the query optimizer has the latest information for query optimization.
- Incremental Statistics: Incremental statistics, introduced in later SQL Server versions, update only the affected partitions of a table when data changes occur. This feature is useful for large partitioned tables.
- Manual Statistics Control: Database administrators can manually create, update, or delete statistics using T-SQL statements. This allows for fine-tuned control in specific scenarios.
- Statistics Maintenance: Regularly maintaining statistics is essential to ensure optimal query performance. Stale or missing statistics can lead to suboptimal query plans and decreased performance.
- Balance Between Automation and Manual Control: While automatic statistics management is suitable for most databases, there are cases where manual control is necessary to address specific performance issues or fine-tune optimization.