Select Distinct and XML data

As you know, the DISTINCT key word can be used to eliminate duplicate rows from the result set of a query.  

For example:

If you run this query against the AdventureWorks2014 database you will get 19614 rows with many duplicates.

SELECT City
FROM [AdventureWorks2014].[Person].[Address]

As you can see in the above results set there are duplicates.  Albany appears  4 times. However, my goal is to simply see what cities the database has addresses in.

I can address this by using the DISTINCT keyword.  The code below show that you simply add it just after the SELECT keyword.  Here is a link for more information on DISTINCT.

SELECT DISTINCT City
FROM [AdventureWorks2014].[Person].[Address]
ORDER BY City

The result set below shows that Albany only appears one time, duplicates have been removed.

Well, what if the result set contains XML data?  Will the DISTINCT work the same way?  The answer is no.  

Since I was working on an example for my next performance class, I decided to use an example from that session.  In the query below the qp.* part of the column list will return a column named query_plan which is an XML data type.

SELECT DISTINCT pa.,cp.,qp.*
FROM
(SELECT plan_handle
FROM sys.dm_exec_cached_plans) cp
CROSS APPLY sys.dm_exec_query_plan (cp.plan_handle ) qp
CROSS APPLY sys.dm_exec_plan_attributes(cp.plan_handle) As pa
WHERE is_cache_key = 1 AND pa.attribute = ‘set_options’

However, when I add the DISTINCT keyword I get the error below.

So what do I do about.  You can simply use the CONVERT function to convert that datatype from XML to VARCHAR(MAX) like below.

SELECT Distinct qp.dbid
, qp.encrypted
, CONVERT(VARCHAR(MAX),qp.query_plan)
FROM
(SELECT plan_handle
FROM sys.dm_exec_cached_plans) cp
CROSS APPLY sys.dm_exec_query_plan (cp.plan_handle ) qp
CROSS APPLY sys.dm_exec_plan_attributes(cp.plan_handle) As pa
WHERE is_cache_key = 1 AND pa.attribute = ‘set_options’


Just a disclaimer, I didn’t analyze this for performance because the class is about an unrelated topic. If you plan on using this in production code, please assess the impact on performance.

I will put this into the “You learn something new everyday” category!   As always, it is always fun to learn and I hope you did as well.  Thank you for visiting my blog.