When COUNT() isn’t the only way to count


I have come across a situation a number of times in the past that seems to be one of those things that are so obvious when you see the solution, but can’t see them before the penny has dropped.

Imagine the following scenario:

You want to find the total number of orders that have the Order Status ‘A’ and the number of orders with an Order Status of ‘B’. This sounds like a simple enough request, that I’m sure you have heard of before.

Lets start off with some test data.

--Test Structure
USE master
go
IF DB_ID('Sandbox') IS NULL
BEGIN
    CREATE DATABASE Sandbox
END
GO

USE Sandbox
GO
IF OBJECT_ID('dbo.CountExample') IS NOT NULL
BEGIN
    DROP TABLE dbo.CountExample
END
GO
IF OBJECT_ID('dbo.Nums') IS NOT NULL
BEGIN
    DROP FUNCTION dbo.Nums
END
GO
-- Test Function to allow fast test data creation
CREATE FUNCTION [dbo].[Nums] (@m AS bigint)
RETURNS TABLE
AS
RETURN
WITH t0
AS (SELECT n = 1
UNION ALL
SELECT n = 1),
t1
AS (SELECT n = 1
FROM t0 AS a,
t0 AS b),
t2
AS (SELECT n = 1
FROM t1 AS a,
t1 AS b),
t3
AS (SELECT n = 1
FROM t2 AS a,
t2 AS b),
t4
AS (SELECT n = 1
FROM t3 AS a,
t3 AS b),
t5
AS (SELECT n = 1
FROM t4 AS a,
t4 AS b),
results
AS (SELECT ROW_NUMBER() OVER (ORDER BY n) AS n
FROM t5)
SELECT n
FROM results
WHERE n <= @m

GO
CREATE TABLE dbo.CountExample
(OrderId int NOT NULL,
OrderStatus char(1) NOT NULL)

GO

--Test data
INSERT INTO dbo.CountExample
(OrderId,
OrderStatus)
SELECT n,
CHAR(n % 27 + 64)
FROM dbo.Nums (1000) AS N
GO

Now that we have some test data and tables, we can take a look at what solutions are possible.

Solution 1:

The solution that I have seen come from a lot of people has been to basically run two queries, one for each Order Stautus and then collect these together returning the result.

Something along the lines of:

SELECT (SELECT COUNT(*) CountA
        FROM dbo.CountExample AS CE
        WHERE OrderStatus = 'A') CountA,
       (SELECT COUNT(*) CountB
        FROM dbo.CountExample AS CE
        WHERE OrderStatus = 'B') CountB

This delivers the correct result, but causes two separate queries to be run (one for each Order Status). There are variations of this solution, using sub-queries or CTEs, but I hope you get the idea that a separate COUNT() is required for each total that you want to calculate.

Solution 2:

The best way, that I know of, to achieve this would be to change the logic from a COUNT() to a SUM(). This sounds wrong at first, especially because the column Order Status is a char(1) and not an integer!

Take a look at how I have solved the problem with SUM():

 
SELECT SUM(CASE WHEN OrderStatus = 'A' THEN 1 ELSE 0 END) CountA,
       SUM(CASE WHEN OrderStatus = 'B' THEN 1 ELSE 0 END) CountB
FROM dbo.CountExample AS CE

Looking at the code, we can see that I have not just used SUM(), but also a CASE statement. CASE is one of my favourite constructs in T-SQL, as it allows you to perform logical processing of an entire set or only part of a set without filtering using a WHERE clause.

If you take a look at the execution plan, you will also see that the table is accessed once. This is an instant improvement over the “standard” solution of COUNT()-ing per Order Status and has the added bonus of never being noticeably more expensive, regardless of how many different Order Status totals are required.

So there you go.  COUNT() isn’t always the best way to count data in SQL Server.

Advertisements