Aggregate functions are mostly used in our queries to find out SUM, AVG, COUNT and so on information from stored records for analytical purposes. In SQL Server there are two different physical operators regarding aggregation, those are called ‘Stream Aggregate’ and ‘Hatch Match (Aggregate)’. In this post I would like to explain what are the differences and why SQL Server Optimizer choose either one or both in a query.
Basically once the query is submitted to SQL Server, it goes through few steps such as Parsing, Binding or Algebrizer, Query Optimizer and Execution Engine. Each step has its own responsibilities (Out of the Scope), ‘Execution Plan’ generation is ‘Query Optimizer’ responsibility to find the best plan within the limited time base on statistic object(s) and total number of rows within the table(s) be aware ‘All Density’ value in statistic object is no more in use by Query Optimizer.
There is a formula to find the best physical operator for the aggregate function. Lets start with simple query such as ‘Select Count(*) from dbo.Tempdata;’, it is obvious that SQL Server Optimizer will choose ‘Stream Aggregate’ physical operator because the aggregation is over all records, the size of the table are medium or small and there is no GROUP BY function, therefore it combine all the values under a group, lets look at below execution plan.
The second query is same as previous one except that GROUP BY clause is added, ‘Select Count(*) from dbo.Tempdata Group By GroupID;’ SQL Server Optimizer will choose Stream Aggregate + Sort physical operations, keep in mind that Stream Aggregate input data needs to be sorted if there is GROUP BY clause, if there is an index on the table, Optimizer will skip the SORT operator. Stream Aggregate operator reads the first record and create a group associated to the record, then read the second record, if the second record is same as first record, it keeps the accumulative values together under same group otherwise it creates new group and assign the new accumulative values of second record to the new group.
Lets increase the number of estimated rows up to 100,000,000 to fool the SQL Server optimizer instead of inserting real records. Run the second query again, as it shows in the execution plan, SQL Server Optimizer choose Hash Match (Aggregate) instead of the Stream Aggregate due to the size of the table and the records are not sorted as well, keep in mind that Hash Match is more costlier than Stream Aggregate due to hash operation.
Lets add explicit ORDER BY clause to the second query and run it again. As you see in the execution plan, SQL Server Optimizer decides to place SORT physical operation after Hash Match operation due to lower cost instead of placing SORT physical operation after Table Scan and use the Stream Aggregate.
Note: The Exclamation mark is due to increasing the estimated # of rows to fool the SQL Server Optimizer.
Author: Hamid Jabarpour Fard