UNION vs UNION ALL

Before we get into which is better to use, let’s talk about some UNION basics.  First of all, both UNION and UNION ALL are set based operations and the purpose of both of these statements is to combine two distinctly different result sets into a single set.  Let’s say I wanted to send a invitation to both employees and customers for an event, UNION or UNION ALL would work perfectly to allow me to get the list I am looking for.

Requirements

The requirements for both UNION and UNION ALL are pretty simple.  Both queries must have the same number of columns and the columns must have similar data types.

Below is an example of a UNION query using the AdventureWorks2014 database.

SELECT Name
FROM Purchasing.Vendor
UNION
SELECT Firstname + ‘ ‘ + LastName
FROM person.person

Even though the second query is using two columns from the Person.Person table, they are being concatenated into a single column.

So what happens if the number of columns do not match?  You will get this error.

Msg 205, Level 16, State 1, Line 1
All queries combined using a UNION, INTERSECT or EXCEPT operator must have an equal number of expressions in their target lists.

This is a pretty easy error to fix, simply make sure the number of columns match.

Now on to the second requirement, the columns must be similar data types.  Notice that I didn’t say exact same data types.  If the data types are different, SQL server will use the data type precedence and do a convert to the same data types.  If you go here, you can see the data type precedence.  If you look at the image below, you will see that the BusinessEntityID is INT while the LastName is NVARCHAR.  According to the precedence list, INT is higher up the list than NVARCHAR so SQL Server will attempt to convert the LastName column data to INT resulting in this error.

Msg 245, Level 16, State 1, Line 8
Conversion failed when converting the nvarchar value ‘Abbas’ to data type int.

If you get this error, you will need to decide if you want to use the CONVERT or CAST functions to convert the column with the higher data type to the lower data type.  Obviously, the two columns mentioned is something you probably wouldn’t see in the real world, but I wanted to use it for demonstration purposes.

UNION vs UNION ALL

The differences between these two is pretty simple, when using just the UNION keyword, duplicates records will be eliminated while the UNION ALL will include any duplicates.

If you execute this statement against the AdventureWorks2014 database you will see that it will return 20,077 rows and will include duplicates.  If you run the same two queries with UNION instead of UNION ALL you will only get 19,622 rows.  Andrew Cencini will appear two times in the first query and only one time in the second.

Includes duplicates:

SELECT Name
FROM Purchasing.Vendor
UNION ALL
SELECT Firstname + ‘ ‘ + LastName
FROM person.person

Does not include duplicates:

SELECT Name
FROM Purchasing.Vendor
UNION
SELECT Firstname + ‘ ‘ + LastName
FROM person.person

How can I tell what query each row is coming from?   In the query below you will see the text in pink, this is simply a string with a column alias.  In this case the string is the name of the table the row comes from with an alias of Source.

SELECT Name, ‘Vendor’ AS ‘Source’
FROM Purchasing.Vendor
UNION ALL
SELECT Firstname + ‘ ‘ + LastName, ‘Person’ AS ‘Person’
FROM person.person

When doing this, the actual name of the column will come from the first query.  In the query above there are two aliases, Source in the first query and Person in the second.  The below image shows where the column name will come from.  Also notice the values in the column identify the source table for that row.

ORDER BY

Of course in order to guarantee that the result set is to use the ORDER BY keywords.  When using UNION or UNION ALL, the ORDER BY must be after the last query.  If you try to use it in an earlier query, you will get this error.

Msg 156, Level 15, State 1, Line 4
Incorrect syntax near the keyword ‘UNION’.

What about performance?

Generally the UNION ALL will be faster than the UNION.  Here are a few links to some great posts that give more information about the impact on performance when using these statements.

Pinal Dave

SQLPerformance.com

Thanks for visiting my blog and please contact me if you have any questions.