SQL Server Statistics Information – How do I Get It

 

When doing maintenance on a SQL Server there are number of things that must be completed.  These include backups, index optimization, DBCC Checkdb and update statistics.  This post really isn’t about how to update statistics, it is more about how do I gather information about statistics.  This information is important when troubleshooting a performance issue as well as a way to check to see if your statistics update strategy is effective in updating the statistics properly.  Remember, if statistics are not updated properly or not created at all, this could lead to performance issues.

There are a few ways to gather statistics information, but before getting into those, let’s go over that information about statistics is important.  This is not all the items, but they are the key ones.

Important Statistics Information

  1. What statistics exist
  2. Date they were last updated
  3. Is there a filter?
  4. Number of rows and the number of rows used for the sample
  5. How many modifications since the last statistics update

These are all important in one way or another.  Here is why each of these is important to look at.

What Statistics Exist – This is important to know because sometimes what statistics we think exist may not actually exist.  If the “Auto Create Statistics” database setting is turned off, then of course statistics would not be created other than manually or by creating an index.

Date They were last updates – This will help us understand if the statistics are being updated appropriately and as expected.

Is there a filter? – This will help us understand if the statistics are for the all the rows of the table or just a subset.

Number of Rows and the number of Rows used for the Sample – We need to review this to see if SQL Server is using an appropriate number of rows to update the statistics.

Number of Modifications since the last Statistics update – Understanding how frequently the column is modified will help us determine if the statistics update is happening appropriately.  It is important to keep this in perspective.  You can’t just look at this number, it is also important to look at the date last updated and the number of rows.  If there are a high number of modifications but the statistics were last updated six months ago, there may not be an issue.  Although I might want to look to see why the statistics haven’t been updated for 6 months.

How do I get this information

There are a couple of different ways to get this information.

SQL Server Management Studio

You can simply right click on the statistics object you want to review and click Properties. This will open up the properties window.

On the first screen you will see this.  You will be able to identify the date of the last update. It is highlighted below.

The number of rows and rows used in the sample can be found in the “Details” tab, while the filter information can be found by clicking the “Filter” tab on the left.

What cannot be found by using the Properties window is the number of modifications since the last statistics update.  This is kind of limiting.

There are two ways to get all the information above.  One is to use DBCC SHOW_STATISTICS the other is to query the SYS.STATS table.

DBCC SHOW_STATISTICS

This statement will return all the above information in three data sets, the header, the density vector and the histogram.  The syntax is below.  We are not going to get into all the syntax, just the basics.  If you want more information you can go here.

This is the basic syntax.  The statement expects and table or view name and a statistics object name.  The output is below.

DBCC SHOW_STATISTICS(‘person.person’, [IX_Person_LastName_FirstName_MiddleName])

The three sessions, Header, Density Vector and Histogram are identified by the arrows. While the green box labels that date last updated, red is the number of rows and rows used as a sample. The final box is the blue box on the left, this identifies if there is a filer or not.  Notice that to the left of that there is a column named “Filter Expression”.  This will identify the expression used for the filter if one exists.

You can only return one of the sections by run one of the code snippets below.

DBCC SHOW_STATISTICS(‘production.product’,ReorderPoint)WITH STAT_HEADER

DBCC SHOW_STATISTICS(‘production.product’,ReorderPoint)WITH DENSITY_VECTOR

DBCC SHOW_STATISTICS(‘person.person’,[IX_Person_LastName_FirstName_MiddleName]) WITH HISTOGRAM

SYS.Stats

This is from Microsoft: “Contains a row for each statistics object that exists for the tables, indexes, and indexed views in the database“. In reality this really needs to be used with a few other statistics related objects.  These related objects are sys.stats_columns and sys.dm_db_stats_properties. Below is the code that can be used to gather all the information and then some.  This code can be found here.

I just took the code that was provided by Microsoft and added the sysobjects table to allow for the data set to include the column name.

Notice that the dm_db_stats_properties obect is a function that has two parameters, objectID and StatsID.

Below are the results of the above statement.  Notice that it includes all the items mentioned at the beginning of this post.

 

All the above methods return much of the information to review statistics.  What I like about the last option, I can see all the statistics in one data set.

Here are a couple of good links to learn more about SQL Server Statistics.

Microsoft

Kendra Little

dm_db_stats_properties

Thanks for visiting and I hope you learned something!

How can I tell if Statistics are created by the system

Column statistics are used by the optimizer to help create the best possible execution plan for a query.  There are a few ways these statistics get created.  The first method is when an index is created, which are updated when the index is optimized. The second method of creating statistics is by using the CREATE STATISTICS statement.  This statement allows the DBA to create statistics on a column that doesn’t have an index on it and allows the creator to provide a name of the statistics object.  The last method is to let SQL Server create the statistics when the column is read.

Using the AdventureWorks2014 database, if you run the code below you will see that there are not any statistics on the MakeFlag column in the Production.Project table.

SELECT OBJECT_NAME(object_id) AS [ObjectName]
,[name] AS [StatisticName]
,STATS_DATE([object_id], [stats_id]) AS [StatisticUpdateDate]
FROM sys.stats
WHERE OBJECT_NAME(object_id) = ‘Product’

I get the result set below.  But in looking at it, how can I tell that there really isn’t statistics on the MakeFlag column?

You can run the statement below that will list all the columns that have statistics.  This statement can be found at here on Microsoft’s web site.

SELECT s.name AS statistics_name
,c.name AS column_name
,sc.stats_column_id
FROM sys.stats AS s
INNER JOIN sys.stats_columns AS sc
ON s.object_id = sc.object_id AND s.stats_id = sc.stats_id
INNER JOIN sys.columns AS c
ON sc.object_id = c.object_id AND c.column_id = sc.column_id
WHERE s.object_id = OBJECT_ID(‘Production.product’);

Now that I have been able to check the columns, what are the objects that have the funny names?  Those are system generated statistics.  Let’s break down the name.

Using this name as an example: _WA_Sys_00000007_75A278F5

The WA really just stands for Washington…of course we know where that came from

The sys just means system

00000007 means it is the 7th column in the table. in this case the SafetyStockLevel column.

Below is a list of the column in the Production.Product table.  Notice that the SafetyStockLevel is the seventh column.

But what about this statistics object, _WA_Sys_0000000A_75A278F5.  Where is column A?  The means it is the 10th column as indicated below.  The A means that it is a hexadecimal value.

That takes care of the first three parts of the name.  The last item, 75A278F5 is a hexadecimal number of the object ID.  The image below you can see that the object ID is the same as 75A278F5 converted to a decimal number.


Now that we have broken down the name, let’s talk about how these actually get created.

If you run the statement below, it will create the statistics object on the MakeFlag column. Before showing you the results, let’s predict the name.

SELECT Name
, MakeFlag
FROM [Production].[Product]
WHERE Makeflag = 2

We know it will start with _WA_SYS_.  Since MakeFlag is the 4th column, we know the next part of the name is going to be 00000004.  This just leaves the last part, the hexadecimal version of the object id.  Using a converter at https://www.rapidtables.com/convert/number/decimal-to-hex.html.  we can take the object ID and convert it to 75A278F5.

Put all the parts together and you get _WA_Sys_00000004_75A278F5 as confirmed by the screenshot below.

After all that, the easiest way to find the system generated statistics is the run this statement:

SELECT *
FROM sys.stats

This statement will return a column named, auto_created.  This is another method of finding the system generated statistics.

Hope this helps clear up the names of system generate statistics.  Thanks for stopping by and hope you learned a little something.