When COUNT() isn’t the only way to count


I have come across a situation a number of times that is one of those things that seems obvious once you have seen the solution, but is hard to spot before the penny drops.

Imagine the following scenario:

You want to find the total number of orders with an Order Status of 'A' and the number of orders with an Order Status of 'B'. This is a simple enough request, and one that I am sure you have come across before.

Let's start off with some test data.

--Test Structure
USE master
go
IF DB_ID('Sandbox') IS NULL
BEGIN
    CREATE DATABASE Sandbox
END
GO

USE Sandbox
GO
IF OBJECT_ID('dbo.CountExample') IS NOT NULL
BEGIN
    DROP TABLE dbo.CountExample
END
GO
IF OBJECT_ID('dbo.Nums') IS NOT NULL
BEGIN
    DROP FUNCTION dbo.Nums
END
GO
-- Test Function to allow fast test data creation
CREATE FUNCTION [dbo].[Nums] (@m AS bigint)
RETURNS TABLE
AS
RETURN
WITH t0 AS (SELECT n = 1 UNION ALL SELECT n = 1),  -- 2 rows
     t1 AS (SELECT n = 1 FROM t0 AS a, t0 AS b),   -- 4 rows
     t2 AS (SELECT n = 1 FROM t1 AS a, t1 AS b),   -- 16 rows
     t3 AS (SELECT n = 1 FROM t2 AS a, t2 AS b),   -- 256 rows
     t4 AS (SELECT n = 1 FROM t3 AS a, t3 AS b),   -- 65,536 rows
     t5 AS (SELECT n = 1 FROM t4 AS a, t4 AS b),   -- 4,294,967,296 rows
     results AS (SELECT ROW_NUMBER() OVER (ORDER BY n) AS n
                 FROM t5)
SELECT n
FROM results
WHERE n <= @m

GO
CREATE TABLE dbo.CountExample
(OrderId int NOT NULL,
OrderStatus char(1) NOT NULL)

GO

--Test data
INSERT INTO dbo.CountExample
(OrderId,
OrderStatus)
SELECT n,
CHAR(n % 26 + 65) -- cycle through the letters 'A' to 'Z'
FROM dbo.Nums (1000) AS N
GO
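
Before moving on, a quick sanity check (my own addition, not strictly necessary) confirms how the generated statuses are spread:

--Check the generated status distribution
SELECT OrderStatus,
       COUNT(*) AS StatusCount
FROM dbo.CountExample
GROUP BY OrderStatus
ORDER BY OrderStatus
GO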

Now that we have some test data and tables, we can take a look at what solutions are possible.

Solution 1:

The solution that I have seen from a lot of people is to run two queries, one for each Order Status, and then collect the results together.

Something along the lines of:

SELECT (SELECT COUNT(*)
        FROM dbo.CountExample AS CE
        WHERE OrderStatus = 'A') CountA,
       (SELECT COUNT(*)
        FROM dbo.CountExample AS CE
        WHERE OrderStatus = 'B') CountB

This delivers the correct result, but causes two separate queries to be run (one for each Order Status). There are variations of this solution, using sub-queries or CTEs, but I hope you get the idea that a separate COUNT() is required for each total that you want to calculate.
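
One such variation, just as an illustration (this sketch is my own, and it still performs one COUNT() per Order Status), uses CTEs:

--A CTE variation of Solution 1: still one COUNT() per Order Status
WITH CountA AS
(SELECT COUNT(*) AS CountA
 FROM dbo.CountExample
 WHERE OrderStatus = 'A'),
CountB AS
(SELECT COUNT(*) AS CountB
 FROM dbo.CountExample
 WHERE OrderStatus = 'B')
SELECT CA.CountA,
       CB.CountB
FROM CountA AS CA
CROSS JOIN CountB AS CB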

Solution 2:

The best way that I know of to achieve this is to change the logic from a COUNT() to a SUM(). This sounds wrong at first, especially because the Order Status column is a char(1) and not an integer!

Take a look at how I have solved the problem with SUM():

SELECT SUM(CASE WHEN OrderStatus = 'A' THEN 1 ELSE 0 END) CountA,
       SUM(CASE WHEN OrderStatus = 'B' THEN 1 ELSE 0 END) CountB
FROM dbo.CountExample AS CE

Looking at the code, we can see that I have not just used SUM(), but also a CASE expression. CASE is one of my favourite constructs in T-SQL, as it allows you to apply logic to an entire set, or only part of a set, without filtering through a WHERE clause.
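
As an aside, the same CASE trick also works inside COUNT(), because COUNT() ignores NULLs; with no ELSE branch, non-matching rows evaluate to NULL and are simply not counted. A quick sketch of that variant:

--COUNT() with CASE: rows that don't match return NULL and aren't counted
SELECT COUNT(CASE WHEN OrderStatus = 'A' THEN 1 END) CountA,
       COUNT(CASE WHEN OrderStatus = 'B' THEN 1 END) CountB
FROM dbo.CountExample AS CE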

If you take a look at the execution plan, you will also see that the table is accessed only once. This is an instant improvement over the “standard” solution of COUNT()-ing per Order Status, and it has the added bonus that the cost stays roughly the same no matter how many different Order Status totals are required, because each extra total is just one more aggregate computed during the same single pass over the table.
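
If you prefer numbers over execution plans, a quick way to verify the difference (my own sketch, run in the Sandbox database) is SET STATISTICS IO, which should report a scan count of 2 for Solution 1 and 1 for Solution 2:

SET STATISTICS IO ON;

--Solution 1: the Messages tab should report Scan count 2 for CountExample
SELECT (SELECT COUNT(*) FROM dbo.CountExample WHERE OrderStatus = 'A') CountA,
       (SELECT COUNT(*) FROM dbo.CountExample WHERE OrderStatus = 'B') CountB;

--Solution 2: the Messages tab should report Scan count 1 for CountExample
SELECT SUM(CASE WHEN OrderStatus = 'A' THEN 1 ELSE 0 END) CountA,
       SUM(CASE WHEN OrderStatus = 'B' THEN 1 ELSE 0 END) CountB
FROM dbo.CountExample AS CE;

SET STATISTICS IO OFF;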

So there you go.  COUNT() isn’t always the best way to count data in SQL Server.


Controlling access to database objects


I have been playing around with database security recently, and a question over at ASK regarding controlling access to database objects has prompted this post.

It is regarded as a best practice to lock down your SQL Server instances against unwanted access.  As SQL Server offers so many features, there are many facets of the system that need to be adjusted to reach the goal of a watertight system.

Removing user access to your tables is one of a number of ways of guarding your data.  If users cannot access the tables directly, they face an extra roadblock before they can reach, and potentially break, your data.  Once this access is revoked, you can go about granting access to the tables through other objects: views, functions and stored procedures that provide a layer of abstraction from the data and allow tighter control over user access.

This would also allow you to build up a data access layer (DAL) that would move the database design towards the more traditional techniques applied in object oriented programming (OOP).  A DAL offers you the benefit of making structural design changes that are transparent to anyone or anything accessing the DAL, similar to interfacing between objects in OOP.
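
As a minimal sketch of that idea (the table, view and column names here are hypothetical, invented purely for illustration): the base table changes, but the view keeps its contract, so callers of the view never notice.

/* Hypothetical DAL sketch: callers only ever see dbo.CustomerDAL */
CREATE TABLE dbo.Customer
(CustFirstName varchar(50) NOT NULL
, CustLastName varchar(50) NOT NULL) ;
GO
CREATE VIEW dbo.CustomerDAL AS
SELECT CustFirstName AS FirstName, CustLastName AS LastName FROM dbo.Customer ;
GO
/* Later, a structural change: the column is renamed... */
EXEC sp_rename 'dbo.Customer.CustFirstName', 'GivenName', 'COLUMN' ;
GO
/* ...and only the view needs adjusting - its callers are unaffected */
ALTER VIEW dbo.CustomerDAL AS
SELECT GivenName AS FirstName, CustLastName AS LastName FROM dbo.Customer ;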

There are plenty of resources on this topic, but this may give someone, somewhere, a start in the right direction.  Below is a test script that will create a test user, a table and a view that accesses the table.  The test user has access rights to the table itself revoked, but is allowed to access a subset of the table's columns through the test view.

/* Create a test user without a login for this example only */
CREATE USER TestUser WITHOUT LOGIN ;

/* Create a test table */
CREATE TABLE dbo.TestTable
(Col1 int NOT NULL PRIMARY KEY CLUSTERED
, Col2 int NOT NULL
, Col3 int NOT NULL) ;

/* Deny select rights on the table to TestUser */
DENY SELECT ON dbo.TestTable TO TestUser ;
GO

/* Create a view that selects the first two columns of the test table
   (CREATE VIEW must be the only statement in its batch) */
CREATE VIEW dbo.TestView AS
SELECT Col1, Col2 FROM dbo.TestTable ;
GO

/* Grant select rights on the view to TestUser */
GRANT SELECT ON dbo.TestView TO TestUser ;

/* Impersonate TestUser to inherit his access rights */
EXECUTE AS USER = 'TestUser';

/* Try selecting from the base table - fails */
SELECT * FROM dbo.TestTable ;

/* Try selecting from the TestView - succeeds */
SELECT * FROM dbo.TestView ;

/* Revert back to your own access rights */
REVERT ;

/* Tidy up */
DROP VIEW dbo.TestView ;
DROP TABLE dbo.TestTable ;
DROP USER TestUser ;
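
If you want to double-check exactly what TestUser ends up with, the built-in fn_my_permissions function can be queried while impersonating the user (run this before the tidy up section above):

/* Optional check, before the tidy up: list TestUser's rights on the view */
EXECUTE AS USER = 'TestUser';

SELECT entity_name,
       permission_name
FROM fn_my_permissions('dbo.TestView', 'OBJECT');

REVERT;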

Let technology replace your ageing brain


After seeing a question on ASK (SQLServerCentral's version of Stack Overflow) asking for help with code to extract index metadata, I took a look into my little box of coding tricks.  I had put something together a while back to rename indexes to fit a certain naming scheme, and it almost fit the job.  I promptly posted my answer and the OP was surprised at the speed of the reply.

This proved to me again that keeping every script you ever create is really important.  If you have written it, save it somewhere permanent.  Ideally you will keep these scripts on a network share or on the web, so that you can access them any time, anywhere.  I have learned the hard way that the innocent little script you wrote and threw away is going to be needed again, normally about a month or so later.  Tools like SSMS Tools Pack, with its excellent Query Execution History, can help, but they are no substitute for a script collection.

I know that I will be updating my script collection and will post the scripts and a little note here as and when I get the time.

So remember, save your scripts and be prepared!