Except of row-by-row queries, you may need to summarize data to analyze it. Microsoft SQL Server provides built-in functions that can aggregate, and summarize information based on multiple rows. In this post, we will discuss about aggregate functions, GROUP BY, and HAVING clauses.
Using Aggregate Functions
Microsoft SQL Server provide aggregate functions such as SUM, MAX, MIN, and AVG to calculate multiple rows and summarize, then return a single result.
When using aggregate functions, you need to consider that the Aggregate Functions:
- Return a single (scalar) value
- Can be used in SELECT, HAVING, and ORDER BY clauses within SELECT statements.
- Ignore NULLs except when using COUNT (*)
- In SELECT list generate columns with no column name. for this purpose, you may need AS clause to provide column name.
- In SELECT clause operate to all rows and all rows will be summarized, If there is no GROUP BY in SELECT statement.
Built-in Aggregate Functions
[table id=26 /]
In this post we only discuss common aggregate functions. For further information on other built-in aggregate functions, visit MSDN for Aggregate Functions (Transact-SQL)
Common Aggregate Functions
[table id=27 /]
SELECT AVG(unitprice) AS avg_price, MIN(OrderQty)AS min_qty, MAX(UnitPriceDiscount) AS max_discount FROM Sales.SalesOrderDetail;
Consider that the above example doesn’t use GROUP BY clause. Therefore, all rows will be summarized and single row will return as result.
When writing aggregates in SELECT clause, all SELECT list must be used aggregate functions, or be referenced in a GROUP BY clause.
The following example will return an error:
Partial Aggregate Error
SELECT SalesOrderID, AVG(unitprice) AS avg_price, MIN(OrderQty)AS min_qty, MAX(UnitPriceDiscount) AS max_discount FROM Sales.SalesOrderDetail;
As you see, SalesOrderID column doesn’t appear either in aggregate function or the GROUP BY clause. Therefore, it’s invalid to use it in SELECT list. Removing SalesOrderID from previous example will prevent an error.
Aggregate functions in addition to numeric, can also summarize character and date/time data. The following example returns first and last names of customers:
select MIN(LastName) AS first_customer, MAX(LastName) AS last_customer from Sales.Customer SC INNER JOIN Person.Person PP ON SC.PersonID=PP.BusinessEntityID
And the following example use aggregate function to returns first and last order date.
SELECT MIN(OrderDate) AS [First Order], MAX(OrderDate) AS [Last Order] FROM Sales.SalesOrderHeader
Using DISTINCT with Aggregate Functions
DISTINCT removes duplicate values from input column before computing the summery value.
The following example returns count of customer who have placed orders. Although in the second column it uses distinct to remove duplicate customer before counting.
SELECT SalesPersonID, YEAR(orderdate) AS orderyear, COUNT(CustomerID) AS all_customers, COUNT(DISTINCT CustomerID) AS unique_customers FROM Sales.SalesOrderHeader GROUP BY SalesPersonID, YEAR(orderdate);
Note that the difference between COUNT in column 3 and DISTINCT COUNT in column 4 is column 3 only returns count of all rows per SalesPersonID and order year except of NULL values whereas column 4 eliminates duplicate values for customers and then returns count of unique customers.
Aggregate Functions and NULLs
T-SQL aggregate functions with the exception of COUNT (*), ignore NULLs. This means, for example, that the SUM function just adds non-NULL values. NULLs do not evaluate to zero. Consider that if you want to count NULLs in your aggregate functions, you need to replace NULLs with zero before aggregation.
For example, given the following table named t1:
The following example compares the AVG functions NULLs-ignored verses replaced with 0.
SELECT AVG(c2) AS AvgWithNULLs, AVG(COALESCE(c2,0)) AS AvgWithNULLReplace FROM dbo.t1;
Using GROUP BY Clause
You may need to arrange your data into subsets before aggregation and summarizing. the GROUP BY clause let you subdivide the results into groups of rows.
GROUP BY Syntax
GROUP BY create groups to placed summarizing rows into each group.
The following example illustrate count of sales per salesman:
SELECT SalesPersonID, COUNT(*) AS CountOfSales FROM Sales.SalesOrderHeader GROUP BY SalesPersonID
[info]When you use GROUP BY clause, you are permitted to use columns without aggregate functions in SELECT clause that appear in GROUP BY clause. Otherwise, you have to use aggregate functions with columns in SELECT clause.[/info]
The following example returns an error since OrderDate is not input of GROUP BY clause and is used in SELECT clause without any aggregate function:
SELECT SalesPersonID, Year(OrderDate), COUNT(*) AS CountOfSales FROM Sales.SalesOrderHeader GROUP BY SalesPersonID
The Error Returns:
Msg 8120, Level 16, State 1, Line 15Column ‘Sales.SalesOrderHeader.OrderDate’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
If you want to amend the query to see the result per SalesPersonID and per year of OrderDate add it to the GROUP BY clause, as follow:
SELECT SalesPersonID, Year(OrderDate), COUNT(*) AS CountOfSales FROM Sales.SalesOrderHeader GROUP BY SalesPersonID,Year(OrderDate)
The following example shows the SELECT statement that lists columns and column expressions and aggregate functions with GROUP BY clause.
SELECT SalesPersonID, Year(OrderDate) OrderDate, MAX(SubTotal) HighestOrder, MIN(SubTotal) LowestOrder, COUNT(*) AS CountOfSales FROM Sales.SalesOrderHeader GROUP BY SalesPersonID,Year(OrderDate)
GROUP BY and the Logical Order of Operations
The following table shows the logical order of clauses in SELECT statement:
[table id=28 /]
Filtering Grouped Data Using the HAVING Clause
As you see in the above table, HAVING clause is placed after WHERE and GROUP BY clause in forth phrase of logical order.
A HAVING clause is used to make a search condition on grouped results which returned by GROUP BY clause.
The following example groups all orders by customer, then returns only those who placed 10 or more orders.
GROUP BY with HAVING Clause
SELECT CustomerID, COUNT(*) AS CountOfOrders FROM Sales.SalesOrderHeader GROUP BY CustomerID HAVING COUNT(*) >= 10
Note: remember that the GROUP BY and HAVING clauses are proceed before the SELECT clause. Therefore any column aliases in SELECT list can’t be used in GROUP BY and HAVING clauses.
HAVING versus WHERE
While both HAVING and WHERE clauses filter the result. It’s important to consider that WHERE operates on results of the FROM clause, while HAVING operates on grouped results of the GROUP BY clause not detail rows.
I hope this post was informative for you, please share it with others if you think it worth to read. Stay tuned to learn more about SQL Server.