Reducing data footprints on test systems

Picture the scene:

You are a DBA. You have to make everyone happy, including DEVs (yes, including them too). Part of that is to ensure that your DEVs have a fresh copy of your production data so that they can test their programming with current data and structures. The issue with that is your production database is multiple hundreds of GB in size and your test system doesn’t have the capacity to store a full copy of your database. Your IT Budget will also not stretch to purchasing more storage! What can you do?

I had a similar situation recently and took a look at the possibilties and here is what I came up with:

1. Try out tools like SQL Virtual Restore to allow you to “mount” a backup from a remote server avoiding the space issue all together. This is, of course, a solution that requires a license (no budget maybe?)

2. Use the great features offered to you by SQL Server Developer Edition: data compression to the rescue!

So of course, I chose door number 2.

As you may know, Developer Edition offers the same features as Enterprise Edition with the caveat that the instance cannot be used for production purposes in any way. This means that the seriously powerful and useful data compression becomes available for your test servers! This counts even if you only use Standard Edition in production – joy of joys! The good thing being that Developer Edition only costs around $50 (or less if you have an MSDN subscription) – even more joys!

If you have never seen/used compression take a quick look over on MSDN to see what it all is (Data Compression). Basically, you can compress data in indexes and tables at the storage level avoiding any query rewrites and still profiting from the storage savings. This can be a major advantage in terms of raw storage needs, but can also benefit you in high read environments with low RAM sizes. The data remains compressed when held in the buffer pool and is only de-compressed when being accessed. This means that you can keep more data in the buffer pool and reduce hard disk I/O (obviously not for free, compressing and de-compressing costs CPU cycles).  This may be acceptable on a test system with extremely constrained storage space.

The usage in my example scenario is now an even better proposition, not only can I reduce my storage footprint, I can also potentially increase test system performance on the I/O side of the equation (who doesn’t have I/O issues, especially on a test box!).

The next hurdle is of course identifying which indexes and tables that are in the database you want to squeeze down. This is possible via SSMS’ object explorer, but only if you want to spend an eternity doing so! The best way is to take a look at the meta-data tables/views/functions to interrogate the system objects. These are really interesting on their own (at least they are to me), but after writing my Index Script Creator, I realised the potential for using the information in these system objects to allow me to automate certain tasks like this one.

Similarly to the Index Script Creator, my Index Compressor Script (uploaded as a doc  but it is a .sql file really – stupid wordpress restrictions!) runs off and investigates all tables and indexes (down to the partition level) and then generates an index rebuild command to compress any currently non-compressed partitions of indexes / tables. The resulting commands can then be run and space can be reclaimed – just be aware that rebuilding indexes does cause log activity.

After compressing your tables and indexes you should have a much smaller amount of data (depending on how well your data compresses) and be able to reclaim some disk space.

I realise that this method will mean that you have to have a large enough disk to have the database in its uncompressed state to begin with, but you can end up with a cost effective solution to a budget-constrained test environment.

Happy zipping!


When COUNT() isn’t the only way to count

I have come across a situation a number of times in the past that seems to be one of those things that are so obvious when you see the solution, but can’t see them before the penny has dropped.

Imagine the following scenario:

You want to find the total number of orders that have the Order Status ‘A’ and the number of orders with an Order Status of ‘B’. This sounds like a simple enough request, that I’m sure you have heard of before.

Lets start off with some test data.

--Test Structure
USE master
IF DB_ID('Sandbox') IS NULL

USE Sandbox
IF OBJECT_ID('dbo.CountExample') IS NOT NULL
    DROP TABLE dbo.CountExample
    DROP FUNCTION dbo.Nums
-- Test Function to allow fast test data creation
CREATE FUNCTION [dbo].[Nums] (@m AS bigint)
AS (SELECT n = 1
SELECT n = 1),
AS (SELECT n = 1
FROM t0 AS a,
t0 AS b),
AS (SELECT n = 1
FROM t1 AS a,
t1 AS b),
AS (SELECT n = 1
FROM t2 AS a,
t2 AS b),
AS (SELECT n = 1
FROM t3 AS a,
t3 AS b),
AS (SELECT n = 1
FROM t4 AS a,
t4 AS b),
FROM t5)
FROM results
WHERE n <= @m

CREATE TABLE dbo.CountExample
(OrderId int NOT NULL,
OrderStatus char(1) NOT NULL)


--Test data
INSERT INTO dbo.CountExample
CHAR(n % 27 + 64)
FROM dbo.Nums (1000) AS N

Now that we have some test data and tables, we can take a look at what solutions are possible.

Solution 1:

The solution that I have seen come from a lot of people has been to basically run two queries, one for each Order Stautus and then collect these together returning the result.

Something along the lines of:

        FROM dbo.CountExample AS CE
        WHERE OrderStatus = 'A') CountA,
       (SELECT COUNT(*) CountB
        FROM dbo.CountExample AS CE
        WHERE OrderStatus = 'B') CountB

This delivers the correct result, but causes two separate queries to be run (one for each Order Status). There are variations of this solution, using sub-queries or CTEs, but I hope you get the idea that a separate COUNT() is required for each total that you want to calculate.

Solution 2:

The best way, that I know of, to achieve this would be to change the logic from a COUNT() to a SUM(). This sounds wrong at first, especially because the column Order Status is a char(1) and not an integer!

Take a look at how I have solved the problem with SUM():

       SUM(CASE WHEN OrderStatus = 'B' THEN 1 ELSE 0 END) CountB
FROM dbo.CountExample AS CE

Looking at the code, we can see that I have not just used SUM(), but also a CASE statement. CASE is one of my favourite constructs in T-SQL, as it allows you to perform logical processing of an entire set or only part of a set without filtering using a WHERE clause.

If you take a look at the execution plan, you will also see that the table is accessed once. This is an instant improvement over the “standard” solution of COUNT()-ing per Order Status and has the added bonus of never being noticeably more expensive, regardless of how many different Order Status totals are required.

So there you go.  COUNT() isn’t always the best way to count data in SQL Server.

Be prepared!

While I was on my recent trans-global sojourn I came to the conclusion that the boy scouts really had a valid point with their motto “Be Prepared”. Of course, I am not meaning it quite in the semi-military sense that Robert Baden-Powell meant, but being prepared before the large system roll-out really saved a lot of time, nerves and money.

The roll-out was a set of reporting servers in a transactional replication setup, pushing operational data from a central server to the outlying branch offices around the world. I spent a great deal of time preparing these systems for the roll-out; standardising the installations, scripting out each step required to set up users, roles, permissions, DB objects, linked servers, jobs etc. This long preparation period was very tedious and I would often lose the motivation to keep at it.

The final payoff for this long drawn-out process has been the actual roll-out at each location. I have been able to arrive at each office, totally jet-lagged and tired from the entire travel regime and basically sit and watch the scripts run through and see the systems come to life. Had I not been prepared, I would have been in a world of pain, trying to remember what needed doing and when whilst fighting off sleep and headaches.

As a side note: A former colleague/mentor told me to save every script I ever run. If you don’t, you’ll need to repeat that script that took you an hour to write the very next day after you deleted it. This task has been made much easier to do thanks to the SSMS Toolspack provided by the more than awesome Mladen Prajdić. His SSMS extension has saved me hours of time when I have written code and accidentally thrown it away, or when SSMS has crashed. Use his tool and donate to the cause!

So, before you start any project, always keep the Scouts motto in mind – “Be prepared”. Hard disk space is so cheap as to be free, how cheap is your time?

SSMS ToolsPack – Powershell Turboboost

I have been using the rather brilliant SSMSToolsPack from Mladen Prajdić recently and love the query execution history feature.

I like to keep my history around for a while – I do a lot of work that then has to be repeated later – and this tool lets me not worry about forgetting to save that important query I ran for someone weeks ago.

I ran in to a little problem though.  All those query executions have to be stored somewhere, this is done by setting a folder to store the query text. 

SSMSToolsPack stores the query text in txt files, these can get out of control if you run enough queries.  They are then stored per day in a folder; so for today the queries would be stored in the folder “2010-12-03”.

If you want to search through the history and have a great number of files and folders, the search can get very slow (I suspect this has to do with the directory and file traversal).  I am lucky to have a small SSD on my main machine, I store source code and the execution history files/folders on there.  This speeds things up, but it seems that even then the search is sluggish (takes about 30 seconds to index on my machine).

I took a quick look at these files and saw that they were basically all the same content wise.  I tried just combining the files to see if that could improve the performance of searching and lo-and-behold search was blazingly fast!

Being lazy, I whipped up a script in Powershell to make this easier/semi-automatic and here it is:

clear host
$path = "" #Set path here!
foreach ($folder in Get-ChildItem $path)
  if ($folder.PSIsContainer -eq "False")
      foreach ($file in Get-ChildItem $folder.FullName)
         $target = $path+$folder.Name+"\output.txt" #set output file
         if ($file.Name -ne "output.txt")
          { #concatenate content of all files except output
            cat $file.FullName > $target 
            #del $file.FullName #delete the file after processing

It is nothing special, but maybe if you use SSMSToolsPack and have experienced a similar slowdown, you can use this to help.

Presenting the Index Creator Script

Born of a need to originally script out indexes in a way similar to how SSMS creates them, I created a script to do just that. The original was very quick-n-dirty as can be seen here:
I had not accounted for very much, other than the indexes as they were. All options, schema information etc. was ignored, as we have nothing special at work and I really needed those indexes quickly.

Since posting on ASK, I have tinkered on-and-off with the script for a while. I am now at a point where I think other people could really profit from it and no longer have it stuck to some hard-coded schemas etc.

Presenting the Index Creator Script v1.0! This script will go through the current database, finding all indexes (optionally system indexes too) and supply you with create index scripts.

It is clever enough to spot the difference between / usage of :

– Primary Keys
– Unique Constraints
– Clustered and Non-Clustered Indexes
– Filtered Indexes (produces the filter too)
– Included Columns
– Partitioned Tables/Indexes (although the partition schema and functions are not produced – yet!)
– Data Compression (on a partition level if used – yes this is possible!)
– Fill Factor
– Index Padding
– Locking (Row and Page)

I have supplied two versions of the code; one for SQL 2005 and one for SQL 2008 and above. This is done as SQL 2008 offers Data Compression, which is implemented in the indexes and partitions. Some of the script relies on this information and would not be backwards compatible.

It has been very interesting coding this script, as it has enlightened me on the structures in SQL server with regards to indexes. For example, as of SQL 2005, regardless of edition, SQL Server creates indexes using partitions. Although partitions cannot be used by editions lower than Enterprise/Developer Edition, all indexes are created with at least one partition. This makes sense, as that would mean there would have to be a separate structures depending upon edition. This way, regardless of edition, the storage engine works the same, you just don’t get the option of creating partitions on an edition lower than Enterprise/Developer. As soon as you migrate a database to Enterprise Edition, you get the possibility of then splitting the indexes on to multiple partitions.

Even better than that, I found out that indexes can be compressed by partition. I sort of knew this already, but in writing the script I saw this in even more clarity. Each partition of an index can use a different level of compression. This can be very interesting, especially if the costs of compression are high, but the benefits in storage are high too. Think of a CPU bound system where some partitions are accessed often and would need a lower compression to reduce CPU load, with other partitions that are accessed rarely which can benefit from the higher compression ratio.

I hope these scripts are of some benefit. If you have comments/questions/suggestions, please get in touch.

Index Creator Script – This is a zip file. Download, change file extension and open in your favourite ZIP manager (damn you wordpress!). There are 2 .sql files in there (one for 2005 and one for 2008). Disclaimer – use at your own risk, I am not responsible if it breaks your PC/Server.

In reality, this script can’t break things, but you have been warned!

UPDATE: Thanks to @Fatherjack for the quick heads-up on a syntax error. Things should look good now though! 🙂

Scripting DB Objects using Powershell

I recently had to script out all objects for our database at work so we could put it into TFS.  I am sure there are much better ways of doing this, but I wanted to learn Powershell and heard about SMO via simple-talk.

This is what I came up with (sorry about code formatting I haven’t found out how to improve that on wordpress yet):

# define parameters


$server = "",

$instance = "default",

$database = "",

$schema = "dbo",

$basefolder = "C:\temp\Powershell\$database\SchemaObjects\",

$objectoption = $(read-host "All Objects = 0, Tables = 1, Table Triggers = 2, Views = 3, Functions = 4, Stored Procedures = 5")

# trap errors

$errors = "C:\temp\Powershell\errors.txt"



"______________________" | out-file $errors -append;

"ERROR SCRIPTING TABLES" | out-file $errors -append;

get-date | out-file $errors -append;

"ERROR: " + $_ | out-file $errors -append;

"`$server = $server" | out-file $errors -append;

"`$instance = $instance" | out-file $errors -append;

"`$database = $database" | out-file $errors -append;

"`$tables = $tables" | out-file $errors -append;

"`$path = $path" | out-file $errors -append;

"`$scripts = $scripts" | out-file $errors -append;

#throw "ERROR: See $errors"


# load .NET assembly
[reflection.assembly]::LoadWithPartialName("Microsoft.SqlServer.Smo") | out-null
function MakeDirectory #Checks to see if the supplied Directory is there and creates it if not. Inside a Process to allow a Pipe to use it.
if (!(Test-Path -path $DirName))
New-Item $DirName -type directory | Out-Null
MakeDirectory ($basefolder)
# Create Server Object using SMO
$srv = new-object Microsoft.SqlServer.Management.Smo.Server($server)

# Create Database Object
$db =$srv.databases[$database]

# SMO Scripter creation and options set

$scr = New-Object "Microsoft.SqlServer.Management.Smo.Scripter"
$scr.Server = $srv
$options = New-Object "Microsoft.SqlServer.Management.SMO.ScriptingOptions"
$options.AllowSystemObjects = $false
$options.IncludeDatabaseContext = $false
$options.IncludeIfNotExists = $false
$options.ClusteredIndexes = $true
$options.NoCollation = $true
$options.Default = $true
$options.DriAll = $true
$options.Indexes = $true
$options.NonClusteredIndexes = $true
$options.IncludeHeaders = $false
$options.ToFileOnly = $true
$options.Permissions = $true
$options.ScriptDataCompression = $true
$options.ScriptDrops = $false
$options.AppendToFile = $false

#Set options for SMO.Scripter
$scr.Options = $options
if ($objectoption -eq 0 -or $objectoption -eq 1)
# script each table
foreach ($table in $db.Tables | where {$_.IsSystemObject -eq $false})
$tablefolder = $basefolder + "Tables\";
MakeDirectory ($tablefolder); #Check for folder, and create if needed
$tablefile = $tablefolder + $table.Name +".table.sql";
$options.FileName = $tablefile;
$scr.Options = $options;
if ($objectoption -eq 0 -or $objectoption -eq 2)
# Script table triggers (go into tables then triggers)
foreach ($table in $db.Tables)
foreach ($trigger in $Table.Triggers | where {$_.IsSystemObject -eq $false})
$triggerfolder = $basefolder + "Tables\Triggers\";
MakeDirectory ($triggerfolder); #Check for folder, and create if needed
$triggerfile = $triggerfolder + $ +".trigger.sql";
$options.FileName = $triggerfile;
$scr.Options = $options;
if ($objectoption -eq 0 -or $objectoption -eq 3)
# script each view
foreach ($view in $db.Views | where {$_.IsSystemObject -eq $false})
$viewfolder = $basefolder + "Views\";
MakeDirectory ($viewfolder); #Check for folder, and create if needed
$viewfile = $viewfolder + $view.Name +".view.sql";
$options.FileName = $viewfile;
$scr.Options = $options;
if ($objectoption -eq 0 -or $objectoption -eq 4)
# script each function
foreach ($function in $db.UserDefinedFunctions | where {$_.IsSystemObject -eq $false})
$functionfolder = $basefolder + "Programmability\Functions\";
MakeDirectory ($functionfolder); #Check for folder, and create if needed
$functionfile = $functionfolder + $function.Name +".function.sql";
$options.FileName = $functionfile ;
$scr.Options = $options;
if ($objectoption -eq 0 -or $objectoption -eq 5)
# script each stored procedure
foreach ($procedure in $db.StoredProcedures | where {$_.IsSystemObject -eq $false})
$procedurefolder = $basefolder + "Programmability\Stored Procedures\";
MakeDirectory ($procedurefolder); #Check for folder, and create if needed
$procedurefile = $procedurefolder + $procedure.Name +".proc.sql";
$options.FileName = $procedurefile ;
$scr.Options = $options;

Controlling access to database objects

I have been playing around with database security recently and a question over at ASK regarding controlling access to database objects has prompted this post.

It is regarded a best practice to lock down your SQL Server instances from unwanted access.  As SQL Server offers so many features, there are many facets of the system that need to be adjusted to reach the goal of a watertight system.

Removing user access to your tables is one of a number of ways of guarding your data.  If they cannot access the tables, then they have an extra roadblock in accessing and breaking your data.  Once this access is revoked, you can then go about granting access to these tables through other objects: vies, functions, stored procedures that give a layer of abstraction from the data and allow a tighter control over user access.

This would also allow you to build up a data access layer (DAL) that would move the database design towards the more traditional programming techniques applied in object oriented programming (OOP).  A DAL offers you the benefit of making structural design changes that are transparant to anyone or anything that is accessing the DAL, similar to interfacing between objects in OOP.

There are plenty of resources on this topic but this may give someone, somewhere a start off in the right direction.  Below is a test script that will create a test user, a table and a view that accesses the table.  The test user has access rights revoked to the table itself, but is allowed to access a subset of the table columns through the test view.

/* Create a test user without a login for this example only */


/* Create a test table */

CREATE TABLE dbo.TestTable


, Col2 int NOT NULL

, Col3 int NOT NULL) ;

/* Deny select rights to TestUser */

DENY SELECT ON dbo.TestTable TO TestUser ;

/* Create a view that selects the first two columns of the test table */


SELECT Col1,Col2 FROM dbo.TestTable ;

/* Grant select rights to TestUser for the TestView */

GRANT SELECT ON TestView TO TestUser ;

/* Impersonate TestUser to inherit his access rights*/


/*Try selecting from the base table – fails*/

SELECT * FROM dbo.TestTable ;

/* Try selecting from the TestView – success*/

SELECT * FROM dbo.TestView ;

/* Revert back to your access rights */


/* Tidy up */

DROP VIEW dbo.TestView;

DROP TABLE dbo.TestTable;