SQL Server Summarizing Data Part 2


This entry is part 2 of 4 in the series SQL Summarizing Data

Criteria

As we continue from our previous post, here is another example of summarizing data. We are continuing from the previous example. We only want the top 5. Notice that the 5 is in brackets and the 1000 is also in parentheses when we use the GUI of SSMS to create a select for us. We don’t have to use the parentheses but there is one occasion when we must so it is good to get into the habit of using them. The where clause has no effect in this case and it is there just for illustration of the location of the where clause. We can’t use “Initial” or “InitialTotal” (aliases) in the group by clause. Notice that we can order by either column.

select top(5) left(EmployeeLastName,1) as Initial, count(*) as InitialTotal
from tblEmployee
where 1=1
group by left(EmployeeLastName,1)
order by count(*) DESC --left(EmployeeLastName,1)

Here is what that looks like.

Having

Suppose we only want to see those letters with 70 or more. We need to add the HAVING clause. Having works on the rows after they have been aggregated whereas the where clause filters rows before they are aggregated. The where is processed before the having.

select left(EmployeeLastName,1) as Initial, count(*) as InitialTotal
from tblEmployee
where 1=1
group by left(EmployeeLastName,1)
having count(*) >= 100
order by count(*) DESC --left(EmployeeLastName,1)

Where

Now let’s put in an actual where clause that affects the output.

select left(EmployeeLastName,1) as Initial, count(*) as CountOfInitial
from tblEmployee
where DateOfBirth > '19860101'  -- cannot use alias
group by left(EmployeeLastName,1)
having count(*) >= 20 -- cannot use alias
order by CountOfInitial DESC -- the last one processed, can use alias

SFWGHO. The order of processing is important to understand here. That is listed below.

  • from – the whole table
  • where – filter rows
  • group by – put the same ones together (cannot use alias of select)
  • having – filter the grouping results (cannot use alias of select)
  • select – get the resulting rows
  • order by – finally order it (may use aliases here)

Order By

Can we order by something that is not part of the select statement. Yes we can if it part of the group by. Consider an example query.

select DATENAME(month,DateOfBirth) as MonthName, Count(*) as NumberEmployees
from tblEmployee
group by DATENAME(month,DateOfBirth), DATEPART(month,DateOfBirth)
order by DATEPART(month,DateOfBirth)

Here are some of the results. We can only ORDER BY something that is in either in the SELECT or the GROUP BY. In this case the month number and the month name occur at the same time. February is always 2. January is always 1 and so on.

Now we know that there are 99 employees born in January.

Count(*)

Count(*) counts all of the rows, even if they have NULLs in them. How do we count the number of employees that have middle names. Those that do not have middle names will have NULL in that field.

select DATENAME(month,DateOfBirth) as MonthName, Count(*) as NumberEmployees,
count(EmployeeMiddleName) as NumberOfMiddleNames
from tblEmployee
group by DATENAME(month,DateOfBirth), DATEPART(month,DateOfBirth)
order by DATEPART(month,DateOfBirth)

Does the counting of middle names (the third part of the select statement) need to also go in the group by statement? No. It is a summarizing statement and therefore does not need to be in the group by. Below are some of the results. Now we know that 57 employees that were born in January have middle names.

How many do not have middle names? We can subtract one from the other as shown below.

select DATENAME(month,DateOfBirth) as MonthName, Count(*) as NumberEmployees,
count(EmployeeMiddleName) as NumberOfMiddleNames,
count(*) - Count(EmployeeMiddleName) as NoMiddleName
from tblEmployee
group by DATENAME(month,DateOfBirth), DATEPART(month,DateOfBirth)
order by DATEPART(month,DateOfBirth)

Now we will add some more information, none of which needs to be included in the group by.

select DATENAME(month,DateOfBirth) as MonthName, Count(*) as NumberEmployees,
count(EmployeeMiddleName) as NumberOfMiddleNames,
count(*) - Count(EmployeeMiddleName) as NoMiddleName,
format(min(DateOfBirth),'MM-dd-yy') as EarliestDateOfBirth,
format(max(DateOfBirth),'D') as LatestDateOfBirth
from tblEmployee
group by DATENAME(month,DateOfBirth), DATEPART(month,DateOfBirth)
order by DATEPART(month,DateOfBirth)

Here are the results.

Series Navigation<< SQL Server Summarizing Data IntroductionSQL Server Summarizing Data Part 3 >>