Anatomy of a SELECT statement part 6 – the WHERE clause

After a brief break, ok..maybe not so brief, this post continues the series of the Anatomy of a Select statement.   This post is about the WHERE clause.  The WHERE clause will provide a filter to limit the rows returned to only the rows that are needed.  As we should always do, it is important to only pull the data that is needed.  The WHERE clause is a key part of that effort.  We use the SELECT to limit the columns, while the WHERE is one of the statements that can limit the number of rows.  The sample code will be at the end of this blog post.

The first thing we should look at is exactly where does the WHERE go?  As indicated below, it follows immediately after the FROM.  Since the FROM is the first part processed, the WHERE is the second clause processed.

SELECT …..

FROM ……

WHERE….

The Expression

When utilizing the WHERE clause, you will also need to create an expression that will be used as a comparison for limiting the rows returned.  How this expression looks will depend on the data type you are comparing.  Although there are many data types, you will see a few fare more frequently then others. These will include the string based data types like VARCHAR and CHAR,  the DateTime related data types and numeric data types.

This expression can be a number of things.  First of all it can be a simple value you are attempting to compare. In the statement below you will see after the WHERE clause, there is a column name, a comparison operator and a comparison value.

Comparing to a Value

In the statement below we are comparing the SalesOrderID column values to the number 4.  The expression must evaluate to TRUE or FALSE. If it evaluates to TRUE, those records are returned. Of course the means that if the expression evaluates to FLASE that row will not be included in the result set.

FROM …..

WHERE SalesorderID = 4

When comparing a character based data type such as VARCHAR, you of course must include the single quotes. We will talk about wild card searches later in this post.

FROM ……

WHERE LastName = ‘Smith’

In this case the column name is LastName, the comparison operator is the equal sign(=) and the value to be compared is ‘Smith’.

Comparison Operator Options

We have a number of options when deciding on what comparison operator to use. This list includes, but not limited to, the list below.

Equal sign ( = )

Minus sign ( – )

Plus sign ( + )

Not Equal ( <> )

Greater Than ( > )

Less Than ( < )

IN

NOT IN

More than one Comparison

Many times we will have to complete more than one comparison to capture the rows we are looking for.  When you look below you will see that this WHERE clause is looking for anyone that has the LastName of Smith and the StateOfBirth is WI.

WHERE LastName = ‘Smith’

AND StateOfBirth = ‘WI’

You can also use the keyword OR rather than AND. In the statement below, All the records with a last name of Smith will be returned.  In addition, all records that have the StateOfBirth of WI.

SELECT ….

FROM …..

WHERE LastName = ‘Smith’

OR StateOfBirth = ‘WI’

An example would be if I have a last name of Smith, but I was born in New York, I would be returned in the results because I have the last name of Smith, even if the StateOfBirth is Ohio.

Using a List

The WHERE clause also supports the use of the IN keyword.  This can be used if there a multiple values in the date set that I would like to see.

SELECT …..

FROM …..

WHERE LastName IN (‘Smith’, ‘Jackson’, ‘Denault’)

In this case, all records that have the last name of Smith, Jackson or Denault will be returned in the result set. As you can see, this also works with numeric data types.

SELECT …..

FROM …..

WHERE EmployeeID IN (1, 2, 3, 4)

The above statement can also be written using Less Than or Equal to comparison operator.

SELECT …..

FROM …..

WHERE EmployeeID <= 4

Using the NOT Keyword or <>

Just as you can search for values that are equal to something, you can also search for something that is not equal to.  How you do this depends on if you are excluding a single value or a list of values.

If you are comparing a single value, you simply use this, <>.

SELECT …..

FROM …..

WHERE EmployeeID <> 4

This also works for character based data types as well.  The statement below will return all records that have a LastName of something other than Smith.

SELECT ….

FROM …..

WHERE LastName <> ‘Smith’

What Else Can be Used the WHERE

As mentioned before there are a number of ways to pull data from a table and limit the number of rows using the WHERE clause.  There is another option I would like to mention.  Although it may not be the best option, using a sub-query is indeed an option.

A sub-query is nothing more that a query in the SELECT statement.

SELECT *
FROM Sales.SalesOrderDetail
WHERE ProductID IN (SELECT ProductID
FROM Production.Product
WHERE MakeFlag = 1)

If you look at the above statement, the sub-query is in green.  This will work just fine.  If the sub-query returns more than on row, it will still work because of the use of the IN keyword.

A sub-query is nothing more that a query in the SELECT statement.

SELECT *
FROM Sales.SalesOrderDetail
WHERE ProductID IN (SELECT ProductID
FROM Production.Product
WHERE MakeFlag = 1)

However, if I were to change the IN keyword to an equal sign, it will fail. You will see this error.

Msg 512, Level 16, State 1, Line 45

Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.

The reason it is failing is because the equal sign expects just a single value, while we passed in a number of values.

Wildcard Searches

Sometimes when we are searching for something, we don’t know enough details about the value we are looking for.  This is where the wildcard searches come in to play.

There are a number of options where using a wildcard search, we will just focus on the use of the % character.  This the same as searching with an * to search for a file, like *.docx.

The statement below will return any state that starts with the word New.

WHERE State LIKE ‘New%’

This statement will return anything that starts with New and ends with the letter K.

WHERE State LIKE New%k’

The code below will return any state that has the work new in it.  Notice that the percent sign is also at the front of the value.  While all wildcard searches may not be the best performing, when placing the % at the beginning of the value, this will force a scan rather than a seek.  This is a potential performance problem and should be used with caution.

WHERE State LIKE ‘%New%’

Notice that these statements use the LIKE keyword.  If you use a wildcard search with the %, you must use the key word LIKE.  If you do not, SQL Server will process the string of ‘%new%’ as a literal and look for the word new and having the % both at the beginning and at the end.  It will not be processed as a wildcard.

Examples Below

These examples will work with the AdventureWorks2014 sample database.  Although it could work on a newer version of the database, I just haven’t tested it.

SELECT *
FROM Production.Product
WHERE Productid = 4

SELECT *
FROM Production.PRODUCT
WHERE Name = ‘Blade’

SELECT *
FROM Production.Product
WHERE MakeFlag = 1
AND FinishedGoodsFlag = 1

SELECT *
FROM Production.Product
WHERE MakeFlag = 1
OR FinishedGoodsFlag = 0

SELECT *
FROM Production.Product
WHERE Name IN (‘HL Road Frame – Black, 58’, ‘Headset Ball Bearings’)

SELECT *
FROM Production.Product
WHERE ProductID IN (1, 2, 3, 4)

SELECT *
FROM Production.Product
WHERE ProductID <= 4

SELECT *
FROM Production.Product
WHERE ProductID <> 4

SELECT *
FROM Sales.SalesOrderDetail
WHERE ProductID IN (SELECT ProductID
FROM Production.Product
WHERE MakeFlag = 1)

SELECT *
FROM Sales.SalesOrderDetail
WHERE ProductID = (SELECT ProductID
FROM Production.Product
WHERE MakeFlag = 1)

–This well return all records have a Name starting with Blade.

SELECT *
FROM Production.PRODUCT
WHERE Name LIKE ‘Blade%’

–This well return all records have the word name in the product name somewhere.

SELECT *
FROM Production.PRODUCT
WHERE Name LIKE ‘%Blade%’

–This well return all records have the a name that starts with the letter A and ends with the letter r.

SELECT *
FROM Production.PRODUCT
WHERE Name LIKE ‘A%r’

This may conclude my series on the Anatomy of a Select statement.  During this series of posts, I covered the processing order, SELECT, FROM, WHERE, GROUP BY\HAVING and ORDER BY.

Thanks for visiting my blog..I really hope you learned something as you read it!

 

 

 

 

 

 


 

 

 

 

Anatomy of a SELECT Statement Part 5 – GROUP BY and HAVING

After a short break from this series, I was thinking it was time to add another post to the Anatomy of the SELECT statement series. This post will focus on the GROUP BY and the HAVING clauses.  These two statements can be used together, although when using the GROUP BY a HAVING is not required.  So far in the series we have covered the SELECT, FROM, JOINS clauses as well as the order of processing.  We have also covered the PIVOT, which follows the FROM clause.

The GROUP BY can be used to get an aggregation for a group. For example it can be used to get the number of times each product has been sold.  As with most of my other posts, all examples in this post will use the AdventureWorks2014 sample database.

SELECT p.ProductID, p.name, COUNT(p.name) AS ‘NumberOfTimesSold’
FROM Production.product p
INNER JOIN Sales.SalesOrderDetail s
ON p.ProductID = s.ProductID
GROUP BY p.ProductID, p.name

In the above statement, notice the use of an aggregate function.  This is really what the GROUP BY is about.  The GROUP BY and the use of the COUNT() aggregate function work together to get the results I am seeking, the number of times each product has been sold.

Requirements of the GROUP BY

As with all other statements and clauses in TSQL, there are requirements that must be followed.  Below are the basic requirements of the GROUP BY.

  • Cannot use a column from an indexed view
  • The column used in the aggregate function does not need to appear in the SELECT list
  • Cannot group on the following data types
    • XML
    • TEXT
    • NTEXT
    • IMAGE
  • Can group on XML if the column is using in a function that outputs an acceptable datatype
  • All columns must either be in the GROUP BY or in an aggregate function

Using the GROUP BY

Now that we have covered a few of the requirements. let’s see how we can use the GROUP BY.

Using the same code as in the example above, let’s dig a little deeper.  

SELECT p.ProductID
, p.name
, p.Color
, COUNT(s.ProductID) AS ‘NumberOfTimesSold’
FROM Production.product p
INNER JOIN Sales.SalesOrderDetail s
ON p.ProductID = s.ProductID
GROUP BY p.ProductID, p.name, p.color

As you can see, the GROUP BY will follow the FROM clause and any joins.  After the GROUP BY you will simply list the columns you would like to have each group be made of.  The order of the columns listed in the GROUP BY does not matter and will not change the results or the column order in the results set, that is determined by the SELECT list. You can list them in any order you like, although I prefer to list them in the same order as they appear in the SELECT list.  Again, that is just a preference and not a requirement of TSQL.

In the example, the groups will consist of the Productid, name and color columns.  If you look at the image below, the columns in the green box make up the group, while the column in the blue box represents the aggregation, in this case it is a COUNT.

When using the GROUP  BY, all columns in the SELECT must either be in the GROUP BY or using an aggregate function.  If this requirement is not met, you well see this error:

Msg 8120, Level 16, State 1, Line 2
Column ‘Production.product.Name’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

To prevent this error, I like to do a simple test.  Look at each column and check to see if it is in the GROUP BY or using an aggregate.

SELECT p.ProductID  –Is in the Group by
, p.name   –Is in the Group by
, p.Color  –Is NOT in the Group by, will cause an error
, COUNT(s.ProductID) AS ‘NumberOfTimesSold’  –Is using an aggregate
FROM Production.product p
INNER JOIN Sales.SalesOrderDetail s
ON p.ProductID = s.ProductID
GROUP BY p.ProductID, p.name

Can Expressions be used in the GROUP BY?

Now let’s change what we are looking for.  Let’s now look for how many times each product was sold by year and month.  In order to do this, I can use an expression on the GROUP BY clause.

SELECT p.name
, MONTH(h.orderdate) AS ‘MONTH’
, YEAR(h.orderdate) AS ‘YEAR’
, COUNT(s.ProductID) AS ‘NumberOfTimesSold’
FROM Production.product p
INNER JOIN Sales.SalesOrderDetail s
ON p.ProductID = s.ProductID
INNER JOIN Sales.SalesOrderHeader h
ON s.SalesOrderID = h.SalesOrderID
GROUP BY p.name, MONTH(h.orderdate), YEAR(h.orderdate)
ORDER BY Year DESC, Month DESC

Notice in the GROUP BY the MONTH and YEAR functions are being used.  As you can see in the results below, the “Classic Vest, M” product was sold 14 times during the month of June 2014.

Final thoughts on GROUP BY

Using the GROUP BY really isn’t very complicated, you just need to do three things:

  1. Determine what columns you want to make up the groups
  2. Determine which aggregate function you would like to use
  3. Before running the statement, check all the columns in the SELECT list to make sure each column is either in the GROUP BY or using an aggregate function

Using the HAVING

Now that I have my groups and my aggregate totals, what if I want to apply a filter to the results, similar to the WHERE clause?  That is where the HAVING comes in.  You can think if this as a where clause for groups.  The HAVING clause appears after the GROUP BY and includes both the HAVING keyword and an expression.  In the case below, the expression is COUNT(s.ProductID) > 10.

SELECT p.name
, COUNT(s.ProductID) AS ‘NumberOfTimesSold’
FROM Production.product p
INNER JOIN Sales.SalesOrderDetail s
ON p.ProductID = s.ProductID
INNER JOIN Sales.SalesOrderHeader h
ON s.SalesOrderID = h.SalesOrderID
GROUP BY p.name
HAVING COUNT(s.ProductID) > 10

Using HAVING also requires the use of an aggregate function.  This aggregate really should be the same as what is in the SELECT list. If you do not include an aggregate and use something like this, HAVING s.rowguid > 10,  you will get the error below.

Msg 8121, Level 16, State 1, Line 11
Column ‘Sales.SalesOrderDetail.ProductID’ is invalid in the HAVING clause because it is not contained in either an aggregate function or the GROUP BY clause.

Conclusion

Using the GROUP BY and HAVING clauses are very commonly used and having a good understanding of them is essential for anyone who writes TSQL code.  Below you will find two links to documentation from Microsoft

GROUP BY

HAVING

Thanks for visiting my blog and I hope you learned something!

Select Distinct and XML data

As you know, the DISTINCT key word can be used to eliminate duplicate rows from the result set of a query.  

For example:

If you run this query against the AdventureWorks2014 database you will get 19614 rows with many duplicates.

SELECT City
FROM [AdventureWorks2014].[Person].[Address]

As you can see in the above results set there are duplicates.  Albany appears  4 times. However, my goal is to simply see what cities the database has addresses in.

I can address this by using the DISTINCT keyword.  The code below show that you simply add it just after the SELECT keyword.  Here is a link for more information on DISTINCT.

SELECT DISTINCT City
FROM [AdventureWorks2014].[Person].[Address]
ORDER BY City

The result set below shows that Albany only appears one time, duplicates have been removed.

Well, what if the result set contains XML data?  Will the DISTINCT work the same way?  The answer is no.  

Since I was working on an example for my next performance class, I decided to use an example from that session.  In the query below the qp.* part of the column list will return a column named query_plan which is an XML data type.

SELECT DISTINCT pa.,cp.,qp.*
FROM
(SELECT plan_handle
FROM sys.dm_exec_cached_plans) cp
CROSS APPLY sys.dm_exec_query_plan (cp.plan_handle ) qp
CROSS APPLY sys.dm_exec_plan_attributes(cp.plan_handle) As pa
WHERE is_cache_key = 1 AND pa.attribute = ‘set_options’

However, when I add the DISTINCT keyword I get the error below.

So what do I do about.  You can simply use the CONVERT function to convert that datatype from XML to VARCHAR(MAX) like below.

SELECT Distinct qp.dbid
, qp.encrypted
, CONVERT(VARCHAR(MAX),qp.query_plan)
FROM
(SELECT plan_handle
FROM sys.dm_exec_cached_plans) cp
CROSS APPLY sys.dm_exec_query_plan (cp.plan_handle ) qp
CROSS APPLY sys.dm_exec_plan_attributes(cp.plan_handle) As pa
WHERE is_cache_key = 1 AND pa.attribute = ‘set_options’


Just a disclaimer, I didn’t analyze this for performance because the class is about an unrelated topic. If you plan on using this in production code, please assess the impact on performance.

I will put this into the “You learn something new everyday” category!   As always, it is always fun to learn and I hope you did as well.  Thank you for visiting my blog.

Anatomy of a Select Statement Part 3

Welcome to part 3 of this series.  The first part covered the order of execution of the commands, while the second session went in depth about the SELECT command. This post will cover the FROM clause.  Most people would think that you will list tables in the FROM clause.  Yes, that is correct, however you can do so much more.

What can be listed in the FROM

This first step is to list what can you use in the FROM clause. The list includes the below list.

  • Table
  • View
  • Temporary Table
  • Table Variable
  • Derived table
  • Table valued functions

We will cover each of these.

Table and View

Tables and views are probably the most common objects used in the FROM, although I really don’t have any exact statistics on that. Below is an example is a simple query.

SELECT TOP (1000) [ProductID]
,[Name]
,[ProductNumber]
,[MakeFlag]
,[FinishedGoodsFlag]
,[Color]
,[SafetyStockLevel]
,[ReorderPoint]
,[StandardCost]
,[ListPrice]
FROM [AdventureWorks2014].[Production].[Product]

In the FROM clause you will see the table or view.  Notice that not only is the table name listed, but the database and schema are also specified.  Those two are not require, but I might not be a bad idea to start listing the schema as well as the table name.

By looking at this statement, we can only speculate that the object listed is a table.  If you use a view instead, they will look that same.  Really the only way to tell if the object is a table or view is to simply know what type of object it is.  You can look in the objects explorer in SSMS and look for the object.  Sometimes using the naming convention can also be used to make this determination.

Temporary Table

Temporary tables are created in the TempDB system database.  A temporary table can be identified by the start of the table name.  The name will start with either # or ##.

Phil Factor on Red-Gate’s web site gives a great description of when we should use temporary tables. Here is the link: RedGate

They are used most often to provide workspace for the intermediate results when processing data within a batch or procedure.

When a temporary table starts with # it is a “local” table.  This means that the only process that can see and use the temporary table is the one that created it. While global temporary tables will start with ##, they are not as secure because they can be used by other processes.

Since local temporary tables are used only by the process that created it, it will automatically be dropped when the connect that created it ends.  Global temporary tables could persist a bit longer because it will remain until all processe that are using it are terminated.  This post isn’t an in depth discuss about temporary tables, you can to here to get more information: TempTables.

Prior to using a temporary table, it must be created.  You can use the CREATE TABLE statement or use SELECT…INTO.

–Create the table
CREATE TABLE #test
(ProductName VARCHAR(50)
, Price MONEY)

–Populate the table
INSERT #test
SELECT Name
, ListPrice
FROM Production.Product

–Use the table
SELECT *
FROM #test

In the code above you will see three distinct sections.  The first is to create the temporary table, populating the table is the second and the third and last step is the use the table.

If you attempt to query the table and it is out of scope you will get the following error.

You receive this error because with a local temporary table, it is only available to the session that created it.  That is not the case with a global temporary table, one that starts with ## rather than # for a local table.

However if you change from a local to a global temporary table, you will not receive the above error.  You will actually get the results you are hoping for. If you look at the image below, you will see that the table was create with SPID 51 and used with SPID 60.  Also note that even though it is a valid table, SSMS is still indicating that it is an invalid object in SPID 60.  Of course that is not the case, it works just fine.

As you can see, temporary tables are used in pretty much the same way as user tables in a database. You just need to be aware of the scope of the table, is it local or global.

Table Variable

A table variable follows the same steps for creation and use.  Instead of using the CREATE TABLE statement, when using a table variable, you use the DECLARE statement to declare the variable as a TABLE datatype.

The scope of a table variable is the batch.  If you create a temporary table you can use it for the duration of the session.  While the duration of a table variable is the batch itself.

There are some differences on how SQL Server processes temporary tables and table variables.  That is not the purpose of this post.  You can go here to find more information regarding the differences.

Derived table

When I teach the Microsoft SQL Server certification course on TSQL, a derived table is in the same chapter as subqueries. A derived table is really not much more than a SELECT statement in the FROM clause instead of a table name.  As you can see from the code below, the derived table is highlighted.  The derived table has an alias of “v” and is enclosed in parenthesis.

Table Valued Function

Like temporary tables and table variables, before using a table valued function(TVF) the function must be created.

The code below shows how to create the function.  This function returns a table. This is from Microsoft’s web site. This will work on the AdventureWorks2014 database.

CREATE FUNCTION ProductsCostingMoreThan(@cost money)
RETURNS TABLE 
AS 
RETURN 
SELECT ProductID, ListPrice
FROM Production.Product
WHERE ListPrice > @cost

Once the function is created you can see in SSMS under Programmability.

One of the advantages of a TVF is that they can accept parameters.  In the code above, the parameter is for the list price. In the code below you will that the TVF is used just a user table might be.  Also notice the parameter being passed into the function.

In this case, we are looking for any products that cost more than $80.50 and then joining it on the Product table.

Each of these objects has performance advantages and disadvantages.  That is not the purpose of this post, however it is something that must be considered when determining the appropriate solution.

In the Part 4 on this topic, I will cover JOINS.

Thank for visiting my blog.  I hope you learned something.

 

 

Anatomy of a Select Statement Part 2

The Select statement is one of 4 Data Manipulation Language(DML), the others being INSERT, UPDATE and DELETE. As expected the SELECT statement is used to extract data from a relational database such as SQL Server.  This statement works in most database systems.  Although, there may be some differences in the different products. These differences are not the focus of this post, we will concentrate on the use of the SELECT statement in Microsoft SQL Server.  Over the years the SELECT statement has not changed much as versions of SQL Server are released, making this one of the statements that will work in older versions as well as the new ones. This is part 2 of the series.

The SELECT clause is just one of a few clauses in the SELECT statement.  The others are FROM, WHERE, GROUP BY, HAVING and ORDER BY.  In part 1 I discussed the order these clauses are processed.  As a reminder, the SELECT clause is either last or second to the last in the processing order.  If there is an ORDER BY, the SELECT is processed just before the ORDER BY.  If there is not an ORDER BY, then the SELECT is the last clause processed.  The focus of this post is just the SELECT clause.

The SELECT clause can easily be described as the place where you identify what columns you want to be included in the result set.  These columns can be columns from the table or view, an expression or the results of a scalar function.  In addition to what columns you want included in your result set, the SELECT clause is also where the order in which those columns appear in the result set is determined.

When deciding what columns from the tables to be included in the SELECT clause, it is important to only return the columns that are needed.  Avoid using “SELECT * FROM….”.  Sometimes it is simply easier to use SELECT *, however there are consequences to using it, mostly the potential impact on the performance of the query.   When listing the columns they need to separated by a comma.  Below you will find an example.  The “[” and “]” are used when the column name contains a space.  You will also see them if you right click on a table and click “Select Top x rows” in SSMS.

In addition to actual column names you can also use calculated columns in the SELECT clause. However, when this is done the column name in the result set will be (No Column Name) unless a column alias is used.

Result Set without a column alias:

The example below show how a column alias can be used.  The “AS” keyword isn’t required.  The single quotes surrounding “New Price” are also not required as long as the alias name does not include spaces.  Personally I also like to include them regardless if there is a space or not.  There really aren’t a lot of requirements for column aliases although I would try to avoid using keyword and actually column names.

Scalar Functions

Scalar functions can also be used in the SELECT clause.  These functions can be system supplies functions or user defined functions. Really the only requirement is that it must be a scalar function, not a table valued function. Just a reminder, a Scalar function is a function that returns a single value.  So when one is included in a SELECT statement, it will run the same number of times as there are rows returned.

Here are some examples of a few native SQL Server built-in functions. More information can be found here: More Info on Functions

LEFT RIGHT LEN
UPPER LOWER REVERSE
TRIM GETDATE DATEDIFF
UPPER LOWER TRY_CONVERT

When using these functions, many times the arguments will be populated with column names.  Although that is in no way a requirement.

Below you will find an example of using two Built-In functions. One, UPPER, requires an argument to be used while the second, GETDATE(), does not.  However, both would need a column alias.  This would be the same if using User Defined Functions(UDF).

Example using a UDF.  In this case it is the ufnGetSalesOrderStatusText UDF in the AdventureWorks database.  It accepts TINYINT value as an input parameter and will return the text description of the status.

If you run the above query without the TOP clause, you will see that this query returns over 31,000 records.  This means that the UDF will execute over 31,000 times. This is important to know in case there are performance issues.

TOP

We have discuss adding columns, column aliases as well as the use of scalar functions in the SELECT clause. Now lets talk about the TOP key word. This appears immediately after the SELECT key word.  This is used to limit the number of rows to a specific number, TOP(10) or percent of rows, TOP 10 PERCENT.  The TOP key word is usually used in conjunction with the ORDER BY clause.  If no ORDER BY clause is included, then SQL Server will per Microsoft,  “it returns the first N number of rows in an undefined order”.

In the example below, only 10 rows are return based on the oldest 10 records when sorted by the OrderData.  Remember, the default sort order is ascending.  Notice that there are not parenthesis around the number 10.  Again, not required at this point, but will be in the future.

Notice above that the OrderDate is the same.  That is because it only returned the top 10 records.  Well what if there are more than 10?  This is where WITH TIES comes in.  If you look at the example below you will see that there are 11 rows.  This is because the 11th row match the value of the 10th row.  WITH TIES will return if there are additional matches with the value in the last row.

$ROWGUID and $IDENTITY

These two functions are used with tables that have an Identify column or a column using the ROWGUIDCOL property.  With both $ROWGUID and $IDENTITY you actually don’t need to know the name of the columns.  If you attempt to use either of these on a table that does not have the proper column types will result in an invalid column error.

Notice in the example below, the first column and the last column in the result set are the same.  The $ROWGUID column, the first column, will also assume the column name of the source column as well. $Identify works in the same way.

DISTINCT

The last topic in the post is the use of the key word DISTINCT.  DISTINCT is used to eliminate duplicates in the result set.  Per Microsoft, NULLS are considered equal for the DISTINCT keyword.

If you look below you will see that the number of rows returned is 316.  If you add the keyword DISTINCT the number of rows returned drops to 89.  With this query, all duplicates based on Gender and Rate are removed.


During the post we have discussed many of the items that can be included in the SELECT clause. This includes column aliases, user defined functions, built-in functions, TOP and DISTINCTS keywords as well as a few other topics.

Hopefully you leaned something and thank you for visiting my blog. In the next post I will dissect the FROM clause.

Previous parts of the Series

Part 1 – Order of the Clauses in the SELECT statement

 

 

 

 

Anatomy of a SELECT Statement – Part 1

Well here it is, my first blog post.  In writing this I realized I can talk for hours on a topic and struggle when writing a blog post.  Writing blog posts is much harder than I had thought.  My plan is to start with something simple and work my way into more complex topics.  Luckily there are many great blogs out there in the SQL Server community that I can use as a model of how blogs should be done.

The Select statement is one of 4 Data Manipulation Language(DML), the others being INSERT, UPDATE and DELETE. As expected the SELECT statement is used to extract data from a relational database such as SQL Server.  This statement works in most database systems, although there may be some differences in different products. These differences are not the focus of this or future posts, we will concentrate on the use of the SELECT statement in Microsoft SQL Server.  Over the years the SELECT statement has not changed much as versions of SQL Server are released, making this one of the statements that will work in older versions as well as the new ones.  This is the first of several that will dissect the SELECT statement and provide information on each.  I will start with the order of the commands in a SELECT statement.

Order of the Commands in the SELECT Statement

The SELECT statement consists of multiple parts. Below you will see the order in which the statements must appear.

If they are not in the proper order, an error will be raised and the query will not be allow to process properly.

Below you will find the error you will see if the statements are not in the proper order:

Msg 156, Level 15, State 1, Line 9

Incorrect syntax near the keyword ‘FROM’.

Understanding this order is vital to understand why column aliases cannot be referenced in any statements besides the ORDER BY.  Just a quick reminder, column level aliases can be used on all columns, however calculated columns and columns that use functions will not have a column name unless an alias is used.

If you run the statement below you will receive this error:

Msg 207, Level 16, State 1, Line 2

Invalid column name ‘NewPrice’.

SELECT TOP 20 [ProductID]
,[Name]
,[ProductNumber]
,[MakeFlag]
,[FinishedGoodsFlag]
,[Color]
,ListPrice
,ListPrice * 2 AS ‘NewPrice’
FROM [AdventureWorks2014].[Production].[Product]
WHERE NewPrice > 100

The reason the error happens is because when the WHERE statement runs, the alias doesn’t exist yet.  With the above statement, the SELECT statement will actually run after the WHERE, therefore causing an error.

Well, how do I make it work you may ask?  The small snippet below shows how.  Rather than reference the alias in the WHERE clause, you will need to repeat the formula.  In this case the formula is the ListPrice Column times 2.

WHERE ListPrice * 2 > 100

However, you can reference the alias in the ORDER BY clause because it processes after the SELECT clause and the alias exists when the ORDER BY processes.

Hopefully you have made it this far!!!  As you can see in some situations, understanding the processing order of the commands in a SELECT statement is important.  I will cover the SELECT command in my next post.

Thanks for visiting my blog!!!