Lessons Learned From Three Months of Blogging

As the three-month anniversary of my first blog post approaches, I thought I would share what I have learned so far.

Here is a list of what I have learned.

  • Have someone read your post before publishing
  • Blog about what is important or of interest to you
  • It is a great way to learn about SQL Server
  • Double check or even triple check your facts and code
  • Blogging isn’t a competition
  • Blogging can be easy or difficult
  • Be yourself
  • Have fun

Have someone read your post before publishing

I have found that this is a great idea. There have been several times when I read my post the next day and found something that I felt needed to be fixed.

These issues fall into the following areas.

  1. Typos – this sadly happened far more than I would like to admit and included misspelled words and grammatical errors.
  2. Something missing – this is when I mention something in the post that I felt required some prerequisite knowledge that I didn’t mention earlier in the post. Sometimes, this also included an important detail that I simply forgot to include in the post.
  3. Out of order – This happened several times and it happened when I inadvertently placed something in the post earlier than it should have been.  This usually happened when I moved a part of the post to a new location using copy and paste.
  4. Sounded funny – There are times when I wrote something and it sounded great in my head, but when I read it after I published it, it just sounded funny or awkward.

These are all things that can be found by having someone else read your post. That person does not really need to know SQL Server to be able to help in many of these areas. When something did slip through and I found it the next day, I just hoped that I would be able to fix it before anyone read it.

Blog about what is important or of interest to you

When I first decided to start my blog, I talked to several experienced bloggers about what I should blog about.  My first thought was to blog only about topics that really haven’t been blogged about before.  I felt that with so many great bloggers out there, I didn’t think I could do it better.  However, I had several people tell me that I was looking at it wrong.  They suggested that I should pick a topic that is of interest to me and blog about that.  Even though there are many great SQL Server blogs out there, they told me that I would present a topic in a way that no one else has and someone just might connect with the way I say it.  This doesn’t mean that I would say anything better than anyone else, just in a different way.  If even one person has benefited by what I have written, it was worth the effort.  My goal is to help people become better at working with SQL Server.

It is a great way to Learn about SQL Server

Even though I have been teaching SQL Server for almost 20 years, there is always something to learn. When I started my blog, I decided to blog about SQL Server basics to work on my blogging skills. However, I found that I learned a lot more than just how to blog. As I wrote my first blog post on the order of processing of the commands in a SELECT statement, I quickly realized that it is a great idea to go back to the basics sometimes to refresh your memory. Of course, when blogging about something new, you will find that there is so much to learn. Not every blog post needs to be very long or cover a complex topic. Every day there are new SQL Server professionals who want to get better and learn more.

Double or triple check your facts and sample code

For years I found myself frustrated from time to time because of incorrect information or sample code that didn’t work.  This obviously requires knowledge of SQL Server to verify that your post is technically accurate.  Before releasing your post, my suggestion is to confirm the facts and run through your sample code to make sure it works as intended.

Blogging is not a Competition

The SQL Server community is a great community that is really about sharing and helping fellow data professionals. If you have attended PASS Summit, a SQL Saturday or a local user group, you have seen this firsthand. The idea of sharing extends to blogging as well. I believe that most, if not all, of the blogs that exist are there to help others in the community. Luckily there are many great blogs that I have used on a regular basis in my own career. My goal isn’t to compete with anyone; it is merely to put information out there and hopefully help as many people as I can. As I said earlier, if I can help just one person, it is worth the time and effort.

Blogging can be easy or difficult

The beauty of having my own blog is that I get to decide if my blog posts are easy or challenging. Some days I feel like posting something that is more basic, and other days I want to go down the more challenging route. It is my choice. What I have found is that I like to alternate between basic and more challenging topics. This allows me to post a bit more frequently. You just need to decide what is best for you to blog about. Keep in mind, there are always new people entering the SQL Server world and they could benefit from even the most basic blog posts.

Be Yourself

I feel that this is really important. I am asked quite frequently for advice on teaching. The first thing I always say is to be yourself. That advice extends to blogs as well. Your blog should represent who you are. If you are a bit more on the serious side, then let your blog show that. However, if you consider yourself to be somewhat of a comedian, then let your blog show that as well.

Have Fun!!

I have found that I really do enjoy blogging.  The thought of helping is what makes it fun.  I have been lucky enough to have had the opportunity to teach SQL Server for almost two decades and blogging is just another method of teaching!!  I really look forward to what learning opportunities future posts will offer!!!

How to Start a Blog

Hopefully I have given some good advice on what to do once you have your blog up and running. However, you may be asking yourself how to start a blog. Rather than reinvent the wheel, here is a blog post by Brent Ozar, How to Start a Blog, that gives some really good advice. It is a number of years old, but I think it still applies today.

Some of my Favorite Blogs

These are just a few that I have used over the years. The one that I almost always find myself at is Pinal Dave’s blog. His was one of the blogs I consistently used when I first started my career and still use to this day. If it weren’t for the many blogs that are out there, my career would not be where it is. For this I am grateful to the many folks that spend time sharing with us!!

Although I have learned a lot about how to blog, I still have so much more to learn!

 

 

UNION vs UNION ALL

Before we get into which is better to use, let’s talk about some UNION basics. First of all, both UNION and UNION ALL are set-based operations, and the purpose of both is to combine two different result sets into a single set. Let’s say I wanted to send an invitation to both employees and customers for an event. UNION or UNION ALL would work perfectly to get the list I am looking for.

Requirements

The requirements for both UNION and UNION ALL are pretty simple.  Both queries must have the same number of columns and the columns must have similar data types.

Below is an example of a UNION query using the AdventureWorks2014 database.

SELECT Name
FROM Purchasing.Vendor
UNION
SELECT FirstName + ' ' + LastName
FROM Person.Person

Even though the second query is using two columns from the Person.Person table, they are being concatenated into a single column.

So what happens if the number of columns does not match? You will get this error.

Msg 205, Level 16, State 1, Line 1
All queries combined using a UNION, INTERSECT or EXCEPT operator must have an equal number of expressions in their target lists.

This is a pretty easy error to fix. Simply make sure the number of columns matches.

Now on to the second requirement: the columns must have similar data types. Notice that I didn’t say the exact same data types. If the data types are different, SQL Server will use data type precedence and implicitly convert the lower-precedence type to the higher one. The data type precedence list can be found in the SQL Server documentation. If you look at the image below, you will see that the BusinessEntityID is INT while the LastName is NVARCHAR. According to the precedence list, INT is higher up the list than NVARCHAR, so SQL Server will attempt to convert the LastName column data to INT, resulting in this error.

Msg 245, Level 16, State 1, Line 8
Conversion failed when converting the nvarchar value ‘Abbas’ to data type int.

If you get this error, you will need to use the CONVERT or CAST function to convert the column with the higher data type to the lower data type. Obviously, the two columns mentioned are something you probably wouldn’t combine in the real world, but I wanted to use them for demonstration purposes.

UNION vs UNION ALL

The difference between the two is pretty simple. When using just the UNION keyword, duplicate records will be eliminated, while UNION ALL will include any duplicates.

If you execute this statement against the AdventureWorks2014 database you will see that it will return 20,077 rows and will include duplicates.  If you run the same two queries with UNION instead of UNION ALL you will only get 19,622 rows.  Andrew Cencini will appear two times in the first query and only one time in the second.

Includes duplicates:

SELECT Name
FROM Purchasing.Vendor
UNION ALL
SELECT FirstName + ' ' + LastName
FROM Person.Person

Does not include duplicates:

SELECT Name
FROM Purchasing.Vendor
UNION
SELECT FirstName + ' ' + LastName
FROM Person.Person
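If you don’t have AdventureWorks2014 handy, here is a minimal sketch of the same idea using Python’s built-in sqlite3 module as a stand-in for SQL Server. The Vendor and Person tables and their rows are made up for illustration, and SQLite concatenates strings with || where T-SQL uses +, so treat this as a demonstration of the concept rather than something to run against SQL Server.

```python
import sqlite3

# In-memory stand-in tables; the table names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Vendor (Name TEXT);
    CREATE TABLE Person (FirstName TEXT, LastName TEXT);
    INSERT INTO Vendor VALUES ('Pat Lee'), ('Contoso Bikes');
    INSERT INTO Person VALUES ('Pat', 'Lee'), ('Gail', 'Erickson');
""")

# UNION ALL keeps every row from both queries, duplicates included.
union_all = conn.execute("""
    SELECT Name FROM Vendor
    UNION ALL
    SELECT FirstName || ' ' || LastName FROM Person
""").fetchall()

# UNION removes duplicate rows from the combined result.
union = conn.execute("""
    SELECT Name FROM Vendor
    UNION
    SELECT FirstName || ' ' || LastName FROM Person
""").fetchall()

print(len(union_all))  # 4 rows: 'Pat Lee' appears twice
print(len(union))      # 3 rows: the duplicate 'Pat Lee' is gone
```

The row counts differ by exactly the number of duplicates, which is the same pattern as the 20,077 versus 19,622 rows above.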

How can I tell which query each row is coming from? In the query below, each SELECT list includes a string literal with a column alias. In this case the string is the name of the table the row comes from, with an alias of Source.

SELECT Name, 'Vendor' AS Source
FROM Purchasing.Vendor
UNION ALL
SELECT FirstName + ' ' + LastName, 'Person' AS Person
FROM Person.Person

When doing this, the actual name of the column will come from the first query.  In the query above there are two aliases, Source in the first query and Person in the second.  The below image shows where the column name will come from.  Also notice the values in the column identify the source table for that row.
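Here is a small sketch that shows where the column name comes from, again using Python’s built-in sqlite3 module as a stand-in for SQL Server. The tables and rows are hypothetical. SQLite happens to follow the same rule of taking the column names from the first query, but that is a property of SQLite I am relying on for the demonstration, not proof of how SQL Server behaves.

```python
import sqlite3

# Hypothetical stand-in tables for Purchasing.Vendor and Person.Person.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Vendor (Name TEXT);
    CREATE TABLE Person (FirstName TEXT, LastName TEXT);
    INSERT INTO Vendor VALUES ('Contoso Bikes');
    INSERT INTO Person VALUES ('Pat', 'Lee');
""")

# The two queries use different aliases for the second column.
cursor = conn.execute("""
    SELECT Name, 'Vendor' AS Source FROM Vendor
    UNION ALL
    SELECT FirstName || ' ' || LastName, 'Person' AS Person FROM Person
""")

# cursor.description holds one entry per result column; item 0 is the name.
column_names = [d[0] for d in cursor.description]
rows = cursor.fetchall()

print(column_names)  # the names come from the first query
print(rows)          # the second column identifies the source table
```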

ORDER BY

Of course, the only way to guarantee the order of the result set is to use the ORDER BY keywords. When using UNION or UNION ALL, the ORDER BY must be after the last query. If you try to use it in an earlier query, you will get this error.

Msg 156, Level 15, State 1, Line 4
Incorrect syntax near the keyword ‘UNION’.
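The placement rule can be tried out with a short sketch using Python’s built-in sqlite3 module as a stand-in for SQL Server. The tables and rows are made up, and SQLite words its error differently from the Msg 156 shown above, so only the fact that the misplaced ORDER BY is rejected carries over.

```python
import sqlite3

# Hypothetical stand-in tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Vendor (Name TEXT);
    CREATE TABLE Person (Name TEXT);
    INSERT INTO Vendor VALUES ('Zeta Cycles'), ('Alpha Parts');
    INSERT INTO Person VALUES ('Pat Lee');
""")

# ORDER BY after the last query sorts the combined result set.
ordered = conn.execute("""
    SELECT Name FROM Vendor
    UNION ALL
    SELECT Name FROM Person
    ORDER BY Name
""").fetchall()
print(ordered)

# ORDER BY inside an earlier query is rejected before anything runs.
try:
    conn.execute("""
        SELECT Name FROM Vendor
        ORDER BY Name
        UNION ALL
        SELECT Name FROM Person
    """)
    misplaced_order_by_failed = False
except sqlite3.OperationalError as exc:
    misplaced_order_by_failed = True
    print("Error:", exc)
```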

What about performance?

Generally the UNION ALL will be faster than the UNION.  Here are a few links to some great posts that give more information about the impact on performance when using these statements.

Pinal Dave

SQLPerformance.com

Thanks for visiting my blog and please contact me if you have any questions.

Anatomy of a SELECT statement – JOINS – Part 4

If your database is normalized, data points will only appear in the database one time. For example, a customer name will only appear in one table, a customer address will appear in only one table and employee names will only appear in one table. This is a nice Wikipedia article on Normalization. Well, what do you do if you have a request to create a report on sales that must contain both the sales person name and the customer name? Since the sales person name is in one table and the customer name is in another table, you will need to use the JOIN keyword. Given that, the purpose of a join is to return columns from different tables in the same result set.

There are a number of join types that can be used in SQL Server. In this post we will cover the logical joins, which are what you use in your code.

Logical Join Types

INNER
OUTER
CROSS
FULL

Behind the scenes the query engine will use physical joins. There are four of these: Nested Loop, Hash Match, Merge and, starting in SQL Server 2017, Adaptive joins. Although it is important to know these when working on query optimization, they will be covered in a future post.

Logical Join Requirements

Before getting into the types of joins, we should review the requirements of joins.

  • Column names do not need to match
  • Data types must be compatible
  • SELECT list does not need to include columns from every table
  • Although not an absolute requirement, it is highly recommended that you use table aliases when joining tables

Keep in mind, if the data types are not the same, the data will have to be converted.  This can be done explicitly using CAST or CONVERT, or it will be done with an implicit convert.

What columns To Join On

Typically the join will be on the Primary Key column in one table and a foreign key column in another.  However, there are times in which this will not be the case.

Table Alias

When using joins, it is important to give your tables an alias. When I use table aliases, I like to have them be somewhat descriptive, like e for employee or s for sales. I also like to keep them the same length whenever possible and keep them to three or fewer characters. These are just my preferences and are in no way a requirement.

If you don’t use aliases, or the columns are not qualified properly in the SELECT statement, and the column name you want to use exists in more than one table in the FROM clause, you will get an error.

In the statement below, you will see that the ProductID column does NOT use the alias as expected. Also notice the error below it, “Ambiguous column name ProductID”. This simply means that the column name exists in more than one table and SQL Server does not know which table to get the column from.
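To see the ambiguity for yourself without AdventureWorks2014, here is a minimal sketch using Python’s built-in sqlite3 module as a stand-in for SQL Server. The Product and SalesOrderDetail tables here are simplified, hypothetical versions with made-up rows, and SQLite’s error text is not the same as SQL Server’s, although it describes the same problem.

```python
import sqlite3

# Simplified, hypothetical versions of the two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE SalesOrderDetail (SalesOrderID INTEGER, ProductID INTEGER);
    INSERT INTO Product VALUES (1, 'Widget'), (2, 'Helmet');
    INSERT INTO SalesOrderDetail VALUES (100, 2);
""")

# ProductID exists in both tables, so the unqualified name is ambiguous.
try:
    conn.execute("""
        SELECT ProductID, p.Name
        FROM Product p
        INNER JOIN SalesOrderDetail s ON p.ProductID = s.ProductID
    """)
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)
print(error_message)

# Qualifying the column with the table alias removes the ambiguity.
qualified = conn.execute("""
    SELECT p.ProductID, p.Name
    FROM Product p
    INNER JOIN SalesOrderDetail s ON p.ProductID = s.ProductID
""").fetchall()
print(qualified)
```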

 

INNER Join

An INNER join would be used when you want to return data that has matching values in both tables in the join.

For example, you want to see all the products that have been sold. In order for a query to pull this data, there needs to be a matching value in both tables. Using the AdventureWorks2014 database, the product information is in the Production.Product table while the products sold are in the Sales.SalesOrderDetail table. If you look at the image below, you will see that the two columns that we need to join on are both named ProductID. However, as I stated before, that is not a requirement, but something you will see quite often. Both of these columns also have the same data type, INT. However, that is also not a requirement; they just need to be similar data types.

 

To meet the requirement of seeing all the products that have been sold, the ProductID must exist in both tables. If product ID number 1 was sold at least one time, you will see it in the ProductID column in the SalesOrderDetail table. The image below visualizes this. The green circle is the Product table while the orange circle represents the SalesOrderDetail table. The area where the two circles overlap represents the values that can be found in both tables.

 

Using the code below, we can query the AdventureWorks2014 database to find all the products that have been sold at least one time.  What makes this work is that a ProductID will be returned ONLY if it appears in at least one order.

SELECT p.productid
, p.Name
, s.ProductID
, s.SalesOrderID
FROM production.product p
INNER JOIN [Sales].[SalesOrderDetail] s
ON p.ProductID = s.ProductID
ORDER BY p.ProductID

When you run the above query, you will get the results below. Notice that the result set is ordered by ProductID in ascending order. You will also notice that ProductID 1 does not appear, despite existing in the Production.Product table. This is because ProductID 1 has not been sold.

This is confirmed by using the two queries below, one for the Product table and the other for the SalesOrderDetail table. Since this is an INNER JOIN, it is expected that productid number 1 would not be returned in the above result set.

Another small note about the INNER JOIN is that it is the default join type. In the statement below, you will notice that the INNER keyword is no longer present, yet the query returns the same rows as above (this version simply leaves out the SalesOrderID column).

SELECT p.productid
, p.Name
, s.ProductID
FROM production.product p
JOIN [Sales].[SalesOrderDetail] s
ON p.ProductID = s.ProductID
ORDER BY p.ProductID
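Here is a small sketch of both points, using Python’s built-in sqlite3 module as a stand-in for SQL Server. The tables are simplified, hypothetical versions of Production.Product and Sales.SalesOrderDetail with made-up rows, where product 1 has never been sold.

```python
import sqlite3

# Simplified, hypothetical tables: product 1 has no sales rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE SalesOrderDetail (SalesOrderID INTEGER, ProductID INTEGER);
    INSERT INTO Product VALUES (1, 'Widget'), (2, 'Helmet'), (3, 'Gloves');
    INSERT INTO SalesOrderDetail VALUES (100, 2), (101, 2), (101, 3);
""")

# With the INNER keyword: only products with a matching sales row come back.
inner_rows = conn.execute("""
    SELECT p.ProductID, p.Name, s.SalesOrderID
    FROM Product p
    INNER JOIN SalesOrderDetail s ON p.ProductID = s.ProductID
    ORDER BY p.ProductID, s.SalesOrderID
""").fetchall()

# Without the INNER keyword: JOIN on its own means the same thing.
plain_rows = conn.execute("""
    SELECT p.ProductID, p.Name, s.SalesOrderID
    FROM Product p
    JOIN SalesOrderDetail s ON p.ProductID = s.ProductID
    ORDER BY p.ProductID, s.SalesOrderID
""").fetchall()

print(inner_rows)                # product 1 is missing, product 2 appears twice
print(inner_rows == plain_rows)  # True
```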

OUTER JOIN

Let’s change the requirements just a bit. Instead of wanting to see all the products that have sold, we now want to know what products have NOT sold. The OUTER join to the rescue!

An OUTER join will return all rows from the table on one side of the join and only matching values from the other table.

Starting with the LEFT OUTER JOIN, this join returns all rows from the table on the left and only the rows with a matching value from the table on the right. We will use LEFT OUTER JOIN in this blog. The OUTER keyword is technically optional, I just prefer to always use it.

SELECT p.productid
, p.Name
, s.ProductID
FROM production.product p
LEFT OUTER JOIN [Sales].[SalesOrderDetail] s
ON p.ProductID = s.ProductID

Below you will find the results of the above statement.  Notice that there are a number of rows that have NULL in the ProductID column from the SalesOrderDetail table.  This is what you will see when using an outer join and there is not a matching value in the table.  In this case, ProductID 530 does not have a matching value in the SalesOrderDetail table.
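The NULL behavior is easy to reproduce on a small scale. This is a minimal sketch using Python’s built-in sqlite3 module as a stand-in for SQL Server, with simplified, hypothetical tables and made-up rows. Python shows a SQL NULL as None.

```python
import sqlite3

# Simplified, hypothetical tables: product 1 has no sales rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE SalesOrderDetail (SalesOrderID INTEGER, ProductID INTEGER);
    INSERT INTO Product VALUES (1, 'Widget'), (2, 'Helmet'), (3, 'Gloves');
    INSERT INTO SalesOrderDetail VALUES (100, 2), (101, 2), (101, 3);
""")

# Every Product row is returned; columns from s are NULL where nothing matches.
left_rows = conn.execute("""
    SELECT p.ProductID, p.Name, s.ProductID
    FROM Product p
    LEFT OUTER JOIN SalesOrderDetail s ON p.ProductID = s.ProductID
    ORDER BY p.ProductID, s.SalesOrderID
""").fetchall()

for row in left_rows:
    print(row)  # the row for product 1 ends in None
```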

 

Of course if we switch this from a LEFT OUTER JOIN to a RIGHT OUTER JOIN SQL Server will now pull all the rows from the SalesOrderDetail table and the matching values from the Product table.

The code below is an example of a RIGHT OUTER JOIN. All I did was change the word LEFT to RIGHT. One thing you will notice when you run this is that there are not any NULL values in the p.productid column. The reason for this is that referential integrity will not allow a product that doesn’t exist in the Production.Product table to be entered into the SalesOrderDetail table.

SELECT p.productid
, p.Name
, s.ProductID
FROM production.product p
RIGHT OUTER JOIN [Sales].[SalesOrderDetail] s
ON p.ProductID = s.ProductID

How are the LEFT and RIGHT tables determined

If you look at the two statements below, you will notice that the only difference is the last line.  In the first statement, the product table is listed first after the ON keyword, while in the second statement the SalesOrderDetail table is listed first.  Despite this change, both results had the exact same number of rows, 121,555.

SELECT p.productid
, p.Name
, s.ProductID
FROM production.product p
LEFT OUTER JOIN [Sales].[SalesOrderDetail] s
ON p.ProductID = s.ProductID

SELECT p.productid
, p.Name
, s.ProductID
FROM production.product p
LEFT OUTER JOIN [Sales].[SalesOrderDetail] s
ON s.ProductID = p.ProductID

However, if you switch the table order in the FROM clause, like in the statement below, you will see a big difference in the result set.

SELECT p.productid
, p.Name
, s.ProductID
FROM [Sales].[SalesOrderDetail] s
LEFT OUTER JOIN production.product p
ON p.ProductID = s.ProductID

This statement returns 121,317 rows. This is every row from the SalesOrderDetail table and the matching values in the Product table. With some simple math, we can confirm this. If you take the 121,555 and subtract the 121,317 you will get 238. Using the code below, we will see that there are exactly 238 products that have not been sold.

Now let’s get to the original question: what products have not been sold? In the example below we can simply add a WHERE clause and look for any column from the right side table that is null. The reason we check the right side table is that this is a left join.

SELECT p.productid
, p.Name
, s.ProductID
FROM production.product p
LEFT OUTER JOIN [Sales].[SalesOrderDetail] s
ON s.ProductID = p.ProductID
WHERE s.ProductID is null

When you run this you will see that there are 238 products that have not been sold.
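The same arithmetic can be checked on a tiny data set. This is a minimal sketch using Python’s built-in sqlite3 module as a stand-in for SQL Server. The tables are simplified and hypothetical, with three made-up products of which one has never been sold.

```python
import sqlite3

# Simplified, hypothetical tables: product 1 has no sales rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE SalesOrderDetail (SalesOrderID INTEGER, ProductID INTEGER);
    INSERT INTO Product VALUES (1, 'Widget'), (2, 'Helmet'), (3, 'Gloves');
    INSERT INTO SalesOrderDetail VALUES (100, 2), (101, 2), (101, 3);
""")

# Product on the left: every sales row plus one row per unsold product.
product_left = conn.execute("""
    SELECT p.ProductID, s.ProductID
    FROM Product p
    LEFT OUTER JOIN SalesOrderDetail s ON s.ProductID = p.ProductID
""").fetchall()

# SalesOrderDetail on the left: one row per sales row.
sales_left = conn.execute("""
    SELECT p.ProductID, s.ProductID
    FROM SalesOrderDetail s
    LEFT OUTER JOIN Product p ON p.ProductID = s.ProductID
""").fetchall()

# The WHERE clause keeps only the rows with no match on the right side.
unsold = conn.execute("""
    SELECT p.ProductID, p.Name, s.ProductID
    FROM Product p
    LEFT OUTER JOIN SalesOrderDetail s ON s.ProductID = p.ProductID
    WHERE s.ProductID IS NULL
""").fetchall()

print(len(product_left) - len(sales_left))  # 4 - 3 = 1
print(unsold)                               # the one unsold product
```

The difference between the two row counts matches the number of rows the WHERE clause returns, just like the 238 above.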

Full Join

The code for this example can be found here: FullOuterJoin.

A full join is kind of a combination of the inner and both outer joins. Below you can see the two tables we will use for the example.

SELECT a.id
, a.name
, b.id
, b.name
FROM TableA a
FULL OUTER JOIN TableB b
ON a.id = b.id

If you run the above code you will see that it returns all rows that have matching values in each table, all the rows that are in TableA and not in TableB, as well as the rows in TableB that are not in TableA.

 

CROSS JOIN

A CROSS join is also known as a Cartesian product. This type of join returns every possible combination of rows from the two tables.

SELECT a.id
, b.id
FROM TableA a
CROSS JOIN TableB b

Below are the results. The first ID column is from TableA while the second ID column is from TableB. Notice that the results pair each value from TableA with each value from TableB. That is why you see 5 ones in the second column: there are 5 values in TableA.
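Here is a small sketch of the Cartesian product using Python’s built-in sqlite3 module as a stand-in for SQL Server. TableA has five values as described above, but the three values in TableB are my own assumption for the sake of the example.

```python
import sqlite3

# TableA has five ids; the three ids in TableB are an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id INTEGER);
    CREATE TABLE TableB (id INTEGER);
    INSERT INTO TableA VALUES (1), (2), (3), (4), (5);
    INSERT INTO TableB VALUES (1), (2), (3);
""")

# No ON clause: every row in TableA is paired with every row in TableB.
cross_rows = conn.execute("""
    SELECT a.id, b.id
    FROM TableA a
    CROSS JOIN TableB b
""").fetchall()

# Count how often the value 1 from TableB shows up in the second column.
ones_in_second_column = sum(1 for _, b_id in cross_rows if b_id == 1)

print(len(cross_rows))         # 5 * 3 = 15
print(ones_in_second_column)   # 5, one for each row in TableA
```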

Joining More than One Table

There are many times where you will need to join more than two tables. In this case we want to know the date of each time a product was sold, so we need to add the SalesOrderHeader table to our query. Although there are three tables in the query, each join is only between two tables.

SELECT p.productid
, p.Name
, s.ProductID
, h.OrderDate
FROM [Sales].[SalesOrderDetail] s
INNER JOIN production.product p
ON p.ProductID = s.ProductID
INNER JOIN sales.SalesOrderHeader h
ON s.SalesOrderID = h.SalesOrderID
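Here is a minimal sketch of the three table join using Python’s built-in sqlite3 module as a stand-in for SQL Server. The tables are simplified, hypothetical versions of the AdventureWorks tables, and the rows and order dates are made up.

```python
import sqlite3

# Simplified, hypothetical tables with made-up rows and dates.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE SalesOrderHeader (SalesOrderID INTEGER PRIMARY KEY, OrderDate TEXT);
    CREATE TABLE SalesOrderDetail (SalesOrderID INTEGER, ProductID INTEGER);
    INSERT INTO Product VALUES (1, 'Widget'), (2, 'Helmet'), (3, 'Gloves');
    INSERT INTO SalesOrderHeader VALUES (100, '2014-01-05'), (101, '2014-02-10');
    INSERT INTO SalesOrderDetail VALUES (100, 2), (101, 2), (101, 3);
""")

# Two joins, each between two tables: detail to product, then detail to header.
rows = conn.execute("""
    SELECT p.ProductID, p.Name, h.OrderDate
    FROM SalesOrderDetail s
    INNER JOIN Product p ON p.ProductID = s.ProductID
    INNER JOIN SalesOrderHeader h ON s.SalesOrderID = h.SalesOrderID
    ORDER BY p.ProductID, h.OrderDate
""").fetchall()

for row in rows:
    print(row)  # one row per product per order, with the order date
```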

 

In a future post I will discuss the physical joins SQL Server will use to implement these logical joins.

Thank you for visiting my blog!!

 

 

 

 

 

Table Variable and Rollback

I am currently at PASS Summit in Seattle and learning a ton of new stuff. There are a lot of things that I have learned so far, but one thing really jumped out at me. I attended Brent Ozar’s session on Cardinality Estimates and he showed that changes made to a table variable do not get rolled back with the transaction. That is what is great about SQL Server, there is always something to learn!!

If you run this code, you will see that when the select runs, it will actually not return any rows.

CREATE TABLE #Test
(Name VARCHAR(100))

BEGIN TRAN
    INSERT #Test
    (Name)
    VALUES
    ('Dave'),
    ('Raf')

ROLLBACK

SELECT * FROM #Test

DROP TABLE #Test

However, when you run this code, you will see that the rows are still in the table variable even though the rollback command has completed.

DECLARE @TestV TABLE
(Name VARCHAR(100))

BEGIN TRAN
    INSERT @TestV
    (Name)
    VALUES
    ('Dave'),
    ('Raf')

ROLLBACK

SELECT * FROM @TestV

This is another reason why table variables should be used with caution. Table variables do not get statistics on the columns which of course can be a performance killer!!!