Grouping and Aggregating Data

Except of row-by-row queries, you may need to summarize data to analyze it. Microsoft SQL Server provides built-in functions that can aggregate, and summarize information based on multiple rows. In this post, we will discuss about aggregate functions, GROUP BY, and HAVING clauses.

Using Aggregate Functions

Microsoft SQL Server provide aggregate functions such as SUM, MAX, MIN, and AVG to calculate multiple rows and summarize, then return a single result.

When using aggregate functions, you need to consider that the Aggregate Functions:

  • Return a single (scalar) value
  • Can be used in SELECT, HAVING, and ORDER BY clauses within SELECT statements.
  • Ignore NULLs except when using COUNT (*)
  • In SELECT list generate columns with no column name. for this purpose, you may need AS clause to provide column name.
  • In SELECT clause operate to all rows and all rows will be summarized, If there is no GROUP BY in SELECT statement.

Built-in Aggregate Functions

[table id=26 /]

In this post we only discuss common aggregate functions. For further information on other built-in aggregate functions, visit MSDN for Aggregate Functions (Transact-SQL)

Common Aggregate Functions

[table id=27 /]

Aggregate Example

[codesyntax lang=”tsql”]

SELECT	AVG(unitprice) AS avg_price,
	MIN(OrderQty)AS min_qty,
	MAX(UnitPriceDiscount) AS max_discount
FROM Sales.SalesOrderDetail;

[/codesyntax]

Consider that the above example doesn’t use GROUP BY clause. Therefore, all rows will be summarized and single row will return as result.

The Results:

When writing aggregates in SELECT clause, all SELECT list must be used aggregate functions, or be referenced in a GROUP BY clause.

The following example will return an error:

Partial Aggregate Error

[codesyntax lang=”tsql”]

SELECT SalesOrderID,
AVG(unitprice) AS avg_price,
MIN(OrderQty)AS min_qty,
MAX(UnitPriceDiscount) AS max_discount
FROM Sales.SalesOrderDetail;

[/codesyntax]

The Results:

As you see, SalesOrderID column doesn’t appear either in aggregate function or the GROUP BY clause. Therefore, it’s invalid to use it in SELECT list. Removing SalesOrderID from previous example will prevent an error.

Aggregate functions in addition to numeric, can also summarize character and date/time data. The following example returns first and last names of customers:

[codesyntax lang=”tsql”]

select MIN(LastName) AS first_customer,
MAX(LastName) AS last_customer
from Sales.Customer SC INNER JOIN Person.Person PP
ON SC.PersonID=PP.BusinessEntityID

[/codesyntax]

The Results:

And the following example use aggregate function to returns first and last order date.

[codesyntax lang=”tsql”]

SELECT MIN(OrderDate) AS [First Order],
MAX(OrderDate) AS [Last Order]
FROM Sales.SalesOrderHeader

[/codesyntax]

The Result:

Using DISTINCT with Aggregate Functions

DISTINCT removes duplicate values from input column before computing the summery value.

The following example returns count of customer who have placed orders. Although in the second column it uses distinct to remove duplicate customer before counting.

[codesyntax lang=”tsql”]

SELECT SalesPersonID,
YEAR(orderdate) AS orderyear,
COUNT(CustomerID) AS all_customers,
COUNT(DISTINCT CustomerID) AS unique_customers
FROM Sales.SalesOrderHeader
GROUP BY SalesPersonID, YEAR(orderdate);

[/codesyntax]

The Results:

Note that the difference between COUNT in column 3 and DISTINCT COUNT in column 4 is column 3 only returns count of all rows per SalesPersonID and order year except of NULL values whereas column 4 eliminates duplicate values for customers and then returns count of unique customers.

Aggregate Functions and NULLs

T-SQL aggregate functions with the exception of COUNT (*), ignore NULLs. This means, for example, that the SUM function just adds non-NULL values. NULLs do not evaluate to zero. Consider that if you want to count NULLs in your aggregate functions, you need to replace NULLs with zero before aggregation.

For example, given the following table named t1:

The following example compares the AVG functions NULLs-ignored verses replaced with 0.

[codesyntax lang=”tsql”]

SELECT AVG(c2) AS AvgWithNULLs,
AVG(COALESCE(c2,0)) AS AvgWithNULLReplace
FROM dbo.t1;

[/codesyntax]

The Results:

Using GROUP BY Clause

You may need to arrange your data into subsets before aggregation and summarizing. the GROUP BY clause let you subdivide the results into groups of rows.

GROUP BY Syntax

GROUP BY [, … n] | [, … n]

GROUP BY create groups to placed summarizing rows into each group.

The following example illustrate count of sales per salesman:

[codesyntax lang=”tsql”]

SELECT SalesPersonID, COUNT(*) AS CountOfSales FROM Sales.SalesOrderHeader
GROUP BY SalesPersonID

[/codesyntax]

The Result:

[info]When you use GROUP BY clause, you are permitted to use columns without aggregate functions in SELECT clause that appear in GROUP BY clause. Otherwise, you have to use aggregate functions with columns in SELECT clause.[/info]

The following example returns an error since OrderDate is not input of GROUP BY clause and is used in SELECT clause without any aggregate function:

[codesyntax lang=”tsql”]

SELECT SalesPersonID, Year(OrderDate), COUNT(*) AS CountOfSales
FROM Sales.SalesOrderHeader
GROUP BY SalesPersonID

[/codesyntax]

The Error Returns:

Msg 8120, Level 16, State 1, Line 15Column ‘Sales.SalesOrderHeader.OrderDate’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

If you want to amend the query to see the result per SalesPersonID and per year of OrderDate add it to the GROUP BY clause, as follow:

[codesyntax lang=”tsql”]

SELECT SalesPersonID, Year(OrderDate), COUNT(*) AS CountOfSales
FROM Sales.SalesOrderHeader
GROUP BY SalesPersonID,Year(OrderDate)

[/codesyntax]

The Results:

The following example shows the SELECT statement that lists columns and column expressions and aggregate functions with GROUP BY clause.

[codesyntax lang=”tsql”]

SELECT SalesPersonID, Year(OrderDate) OrderDate, MAX(SubTotal) HighestOrder,
MIN(SubTotal) LowestOrder, COUNT(*) AS CountOfSales
FROM Sales.SalesOrderHeader
GROUP BY SalesPersonID,Year(OrderDate)

[/codesyntax]

The Results:

GROUP BY and the Logical Order of Operations

The following table shows the logical order of clauses in SELECT statement:

[table id=28 /]

Filtering Grouped Data Using the HAVING Clause

As you see in the above table, HAVING clause is placed after WHERE and GROUP BY clause in forth phrase of logical order.

A HAVING clause is used to make a search condition on grouped results which returned by GROUP BY clause.

The following example groups all orders by customer, then returns only those who placed 10 or more orders.

GROUP BY with HAVING Clause

[codesyntax lang=”tsql”]

SELECT CustomerID, COUNT(*) AS CountOfOrders
FROM Sales.SalesOrderHeader
GROUP BY CustomerID
HAVING COUNT(*) >= 10

[/codesyntax]

The results:

Note: remember that the GROUP BY and HAVING clauses are proceed before the SELECT clause. Therefore any column aliases in SELECT list can’t be used in GROUP BY and HAVING clauses.

HAVING versus WHERE

While both HAVING and WHERE clauses filter the result. It’s important to consider that WHERE operates on results of the FROM clause, while HAVING operates on grouped results of the GROUP BY clause not detail rows.

I hope this post was informative for you, please share it with others if you think it worth to read. Stay tuned to learn more about SQL Server.

 

Share This Story

Share on facebook
Share on twitter
Share on linkedin
Share on whatsapp

Share Your Comments

About The Author

Search Articles

Categories

Follow Fard Solutions

Share Your Comments

Fard Solutions Sdn. Bhd. Company was established in 2012 in Kuala Lumpur. We are engaged in development, implementation, integration and maintenance of the most advanced database solutions and IT research & development (innovation).

(+60) 03 8994 6788

info@fard-solutions.com

Fard Technology Corp.
700-1199 West Hastings Street,
Vancouver, British Colombia,
V6E 3T5 Canada.

Fard Solutions Sdn. Bhd.
1-1C & 1-2C Incubator 1,
Technology Park Malaysia,
Bukit Jalil, 57000
Kuala Lumpur, Malaysia.

  • Copyrights © 2020 Fard Solutions Sdn. Bhd. , All rights reserved.
  • Privacy Policy
  • Terms of Service
  • A Project By Website Artisan