Reducing data footprints on test systems


Picture the scene:

You are a DBA. You have to make everyone happy, including DEVs (yes, including them too). Part of that is ensuring that your DEVs have a fresh copy of your production data so that they can test their programming against current data and structures. The issue is that your production database is several hundred GB in size and your test system doesn’t have the capacity to store a full copy of it. Your IT budget will also not stretch to purchasing more storage! What can you do?

I had a similar situation recently and took a look at the possibilities; here is what I came up with:

1. Try out tools like SQL Virtual Restore, which allow you to “mount” a backup from a remote server, avoiding the space issue altogether. This is, of course, a solution that requires a license (no budget, maybe?)

2. Use the great features offered to you by SQL Server Developer Edition: data compression to the rescue!

So of course, I chose door number 2.

As you may know, Developer Edition offers the same features as Enterprise Edition, with the caveat that the instance cannot be used for production purposes in any way. This means that the seriously powerful and useful data compression feature becomes available on your test servers! This applies even if you only use Standard Edition in production – joy of joys! Better still, Developer Edition only costs around $50 (or less if you have an MSDN subscription) – even more joys!

If you have never seen or used compression, take a quick look over on MSDN to see what it is all about (Data Compression). Basically, you can compress the data in indexes and tables at the storage level, avoiding any query rewrites while still benefiting from the storage savings. This can be a major advantage in terms of raw storage needs, but it can also help in high-read environments with limited RAM. The data remains compressed when held in the buffer pool and is only decompressed when accessed. This means that you can keep more data in the buffer pool and reduce hard disk I/O (obviously not for free – compressing and decompressing costs CPU cycles). That trade-off may well be acceptable on a test system with extremely constrained storage space.
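
To give a flavour of what that looks like in T-SQL (a minimal sketch – the object name here is a placeholder, not something from my system): sp_estimate_data_compression_savings tells you roughly what a given compression setting would save, and a single rebuild then switches the table over.

-- Estimate the saving for page compression (placeholder object name)
EXEC sys.sp_estimate_data_compression_savings
     @schema_name      = 'dbo',
     @object_name      = 'MyLargeTable',   -- placeholder table name
     @index_id         = NULL,             -- NULL = all indexes on the table
     @partition_number = NULL,             -- NULL = all partitions
     @data_compression = 'PAGE';

-- Compress the table (heap or clustered index) with page compression
ALTER TABLE dbo.MyLargeTable REBUILD WITH (DATA_COMPRESSION = PAGE);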

The usage in my example scenario is now an even better proposition: not only can I reduce my storage footprint, I can also potentially improve test system performance on the I/O side of the equation (who doesn’t have I/O issues, especially on a test box!).

The next hurdle is, of course, identifying which indexes and tables in the database you want to squeeze down. This is possible via SSMS’ Object Explorer, but only if you want to spend an eternity doing so! The better way is to interrogate the metadata tables/views/functions for the system objects. These are really interesting in their own right (at least to me), and after writing my Index Script Creator, I realised the potential for using the information in these system objects to automate tasks like this one.
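
Just to illustrate the idea (a cut-down sketch, not the full script): the compression state is stored per partition in sys.partitions, so joining it to sys.indexes, sys.tables and sys.schemas lists everything that is still uncompressed.

-- List all partitions of tables/indexes that are not yet compressed
SELECT s.name AS schema_name,
       t.name AS table_name,
       i.name AS index_name,            -- NULL for heaps
       i.type_desc,
       p.partition_number,
       p.rows,
       p.data_compression_desc
FROM   sys.partitions AS p
       JOIN sys.indexes AS i ON i.object_id = p.object_id
                            AND i.index_id  = p.index_id
       JOIN sys.tables  AS t ON t.object_id = p.object_id
       JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE  p.data_compression = 0            -- 0 = NONE, i.e. not yet compressed
ORDER  BY s.name, t.name, i.index_id, p.partition_number;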

Similar to the Index Script Creator, my Index Compressor Script (uploaded as a doc, but it is really a .sql file – stupid WordPress restrictions!) investigates all tables and indexes (down to the partition level) and then generates an index rebuild command to compress any currently non-compressed partitions of indexes and tables. The resulting commands can then be run and the space reclaimed – just be aware that rebuilding indexes does cause log activity.
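
The generated statements themselves are nothing exotic – just ALTER INDEX rebuilds with the compression option set, at the partition level where the object is partitioned. The names below are made up purely for illustration:

-- Non-partitioned index (placeholder names):
ALTER INDEX IX_Orders_OrderDate ON dbo.Orders
    REBUILD WITH (DATA_COMPRESSION = PAGE);

-- A single partition of a partitioned index (placeholder names):
ALTER INDEX IX_OrderLines_OrderId ON dbo.OrderLines
    REBUILD PARTITION = 3 WITH (DATA_COMPRESSION = PAGE);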

After compressing your tables and indexes you should have a much smaller amount of data (depending on how well your data compresses) and be able to reclaim some disk space.

I realise that this method means you still need a disk large enough to hold the database in its uncompressed state to begin with, but you can end up with a cost-effective solution for a budget-constrained test environment.

Happy zipping!

See me speak at SQL Saturday 194 in Exeter



After a long time and plenty of prodding by a few SQL people, I have finally submitted a session and been chosen to speak at SQL Saturday 194 in Exeter on the 9th of March.

My talk, “Database Replication – What, How and Why”, will be about SQL Server replication (surprise, surprise). I’ll be explaining the ins and outs of how replication works inside SQL Server and which components make up a replication setup, and I’ll finish up by explaining when and why you would use it.

I chose this topic because I work with replication on a daily basis and it seems to be a rare topic at conferences. The reason for that could be that a) no-one uses replication or b) no-one wants to talk about it. Either way, I was chosen, and that means I am now knuckling down and trying to patch together a semi-coherent presentation on the topic.

Luckily I have a few people that are going to be there offering moral support (I hope). I am also on the volunteer team so that I can busy myself before my session to hopefully keep my nerves in check.

If you haven’t already booked and you are able to make it down to Exeter, take a look at the schedule – I am sure you’ll be convinced that it would be a good use of your time. Just look at the speaker line-up: some of the biggest and best names from around the world will be speaking, so please register and come and see some excellent content… for free!

SQL Saturday 194 in Exeter

When COUNT() isn’t the only way to count


I have come across a situation a number of times in the past that seems to be one of those things that is blindingly obvious once you have seen the solution, but impossible to see before the penny drops.

Imagine the following scenario:

You want to find the total number of orders that have the Order Status ‘A’ and the number of orders with an Order Status of ‘B’. This sounds like a simple enough request, and one I’m sure you have heard before.

Let’s start off with some test data.

--Test Structure
USE master
go
IF DB_ID('Sandbox') IS NULL
BEGIN
    CREATE DATABASE Sandbox
END
GO

USE Sandbox
GO
IF OBJECT_ID('dbo.CountExample') IS NOT NULL
BEGIN
    DROP TABLE dbo.CountExample
END
GO
IF OBJECT_ID('dbo.Nums') IS NOT NULL
BEGIN
    DROP FUNCTION dbo.Nums
END
GO
-- Test Function to allow fast test data creation
CREATE FUNCTION [dbo].[Nums] (@m AS bigint)
RETURNS TABLE
AS
RETURN
WITH t0 AS (SELECT n = 1
            UNION ALL
            SELECT n = 1),                            -- 2 rows
     t1 AS (SELECT n = 1 FROM t0 AS a, t0 AS b),      -- 4 rows
     t2 AS (SELECT n = 1 FROM t1 AS a, t1 AS b),      -- 16 rows
     t3 AS (SELECT n = 1 FROM t2 AS a, t2 AS b),      -- 256 rows
     t4 AS (SELECT n = 1 FROM t3 AS a, t3 AS b),      -- 65,536 rows
     t5 AS (SELECT n = 1 FROM t4 AS a, t4 AS b),      -- ~4.3 billion rows
     results AS (SELECT ROW_NUMBER() OVER (ORDER BY n) AS n
                 FROM t5)
SELECT n
FROM results
WHERE n <= @m

GO
CREATE TABLE dbo.CountExample
    (OrderId     int     NOT NULL,
     OrderStatus char(1) NOT NULL)

GO

--Test data
INSERT INTO dbo.CountExample
    (OrderId,
     OrderStatus)
SELECT n,
       CHAR(n % 27 + 64)   -- spreads the 1000 rows across the letters @ and A-Z
FROM dbo.Nums (1000) AS N
GO

Now that we have some test data and tables, we can take a look at what solutions are possible.

Solution 1:

The solution that I have seen from a lot of people is basically to run two queries, one for each Order Status, and then collect the results together.

Something along the lines of:

SELECT (SELECT COUNT(*) CountA
        FROM dbo.CountExample AS CE
        WHERE OrderStatus = 'A') CountA,
       (SELECT COUNT(*) CountB
        FROM dbo.CountExample AS CE
        WHERE OrderStatus = 'B') CountB

This delivers the correct result, but it causes the table to be read twice (once for each Order Status). There are variations of this solution using sub-queries or CTEs, but I hope you get the idea: a separate COUNT() is required for each total that you want to calculate.

Solution 2:

The best way that I know of to achieve this is to change the logic from a COUNT() to a SUM(). This sounds wrong at first, especially because the Order Status column is a char(1) and not an integer!

Take a look at how I have solved the problem with SUM():

 
SELECT SUM(CASE WHEN OrderStatus = 'A' THEN 1 ELSE 0 END) CountA,
       SUM(CASE WHEN OrderStatus = 'B' THEN 1 ELSE 0 END) CountB
FROM dbo.CountExample AS CE

Looking at the code, we can see that I have not just used SUM(), but also a CASE expression. CASE is one of my favourite constructs in T-SQL, as it allows you to perform logical processing on an entire set, or only part of a set, without filtering via a WHERE clause.

If you take a look at the execution plan, you will also see that the table is accessed only once. This is an instant improvement over the “standard” solution of COUNT()-ing per Order Status, and it has the added bonus of never becoming noticeably more expensive, regardless of how many different Order Status totals are required.
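
For example, counting a third status and a grand total is just a matter of adding more expressions to the same single pass over the table (status ‘C’ is only there for illustration):

SELECT SUM(CASE WHEN OrderStatus = 'A' THEN 1 ELSE 0 END) CountA,
       SUM(CASE WHEN OrderStatus = 'B' THEN 1 ELSE 0 END) CountB,
       SUM(CASE WHEN OrderStatus = 'C' THEN 1 ELSE 0 END) CountC,
       COUNT(*) CountAll
FROM dbo.CountExample AS CE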

So there you go.  COUNT() isn’t always the best way to count data in SQL Server.

SQLRoadtrip, part the 6th


My #sqlroadtrip is now over (part 1, part 2, part 3, part 4 and part 5). I have had 2 1/2 weeks of offline time, where I ignored technology as far as I could and spent some much needed time with my family. During that time I had two more flights and a hotel stay, although a much more relaxed version (a week in Sardinia and not a server in sight!).

@Fatherjack (Web|twitter) requested a DBA-typical numbers breakdown of my trip, so I have attempted to oblige him here:

38 days
20 flights
15 countries
11 servers
11 hotels
5 continents
4 power cuts
1 case of food poisoning

In that time, I racked up 37229 miles (according to www.gcmap.com) and reached the “Frequent Flyer” status with Lufthansa. I had a total flight time of approximately 80 hours (according to Lufthansa’s site), with the shortest flight being 150 miles and the longest 6526 miles.

I am happy to say the rollout went well. The systems have been up and running since, and they even survived a 5-day outage in one of the offices. The office came back online and synched up within a few hours; I was most surprised at that!

I am now easing myself back into my daily work, with an eye on my next trip to the PASS Summit in October – I hope to put my new Lufthansa status to use, maybe even upgrade this time round!

P.S. A final picture update is now online, with a few sights from Rio. Really nice city, if it wasn’t for all the smog!

Be prepared!


While I was on my recent trans-global sojourn, I came to the conclusion that the Boy Scouts really had a valid point with their motto “Be Prepared”. Of course, I don’t mean it quite in the semi-military sense that Robert Baden-Powell intended, but being prepared before the large system roll-out really did save a lot of time, nerves and money.

The roll-out was a set of reporting servers in a transactional replication setup, pushing operational data from a central server to the outlying branch offices around the world. I spent a great deal of time preparing these systems for the roll-out: standardising the installations and scripting out each step required to set up users, roles, permissions, DB objects, linked servers, jobs etc. This long preparation period was very tedious and I would often lose the motivation to keep at it.

The final payoff for this long drawn-out process has been the actual roll-out at each location. I have been able to arrive at each office, totally jet-lagged and tired from the entire travel regime and basically sit and watch the scripts run through and see the systems come to life. Had I not been prepared, I would have been in a world of pain, trying to remember what needed doing and when whilst fighting off sleep and headaches.

As a side note: a former colleague/mentor told me to save every script I ever run. If you don’t, you’ll need that script that took you an hour to write the very day after you deleted it. This task has been made much easier thanks to the SSMS Tools Pack provided by the more than awesome Mladen Prajdić. His SSMS extension has saved me hours when I have written code and accidentally thrown it away, or when SSMS has crashed. Use his tool and donate to the cause!

So, before you start any project, always keep the Scouts’ motto in mind – “Be prepared”. Hard disk space is so cheap as to be almost free; how cheap is your time?

The Home Stretch


I am now on the final third of my #sqlroadtrip (part 1, part 2, part 3 and part 4).

My last post saw me in New Zealand, where I spent a great 36 hours taking in what I could of Auckland. Unfortunately I didn’t manage to meet up with Dave Dustin, as his work got in the way at short notice :( – that didn’t stop me from checking out quite a bit of Auckland, though, including the SkyTower. I got a great view of the city and then walked around town taking in the sights and sounds.

The evening flight from Auckland to Los Angeles ended up being really comfortable. Air New Zealand’s Premium Economy is a major step up from standard economy, offering miles of legroom plus the business class menu and wine list (for a snip at $150 on top). I didn’t sleep, but I was definitely comfortable for the 12-hour flight. This was followed by a 90-minute layover at LAX, where I was given priority entry into the US so that I could make my connecting flight on time.

Calgary offered up another uneventful installation, allowing me and a colleague to check out the Banff National Park. It has some great views, with some of the mountain ranges looking like the mountains from the start of my trip in Switzerland (decide for yourself by checking out my web-album – link at the bottom of the page).

As luck would have it, when I checked in for my flight from Calgary to Houston, I was offered an upgrade to Continental First Class for the wallet-friendly price of $100. I took them up on the offer and had a very comfortable 4-hour flight down to Houston. I was picked up from the airport by my data-courier colleague, who had landed in Houston a couple of hours before me. He had managed to get a nice rental car upgrade because the rental company had messed up the reservation and only had a big SUV left! :) We took off and set up the server in Houston and were treated to a relaxing evening at the local IT manager’s house. We were able to cool off in his pool and enjoyed a few ice-cold beers. This was the first real chance of relaxation since starting the trip all those weeks ago – thank you Volker! After missing the chance to meet Dave Dustin in New Zealand, I got another opportunity to meet an online acquaintance from the ASKSSC forum – Oleg Netchaev – for lunch in Houston. It was great to put a face to the name.

I then flew down to Veracruz for the next server installation. This one was a little touch-and-go, because the hardware had been stuck in customs for a few weeks (it had been sent before I even started my trip!). This was sorted out while I was in Houston, and the server was brought online just in time to get things set up before my flight down to Mexico.

I arrived and finished setting up the server in Veracruz – no problems, everything ran like clockwork. The local IT manager then took us for a quick drive round the city, showing us a few sights and letting us get an idea of what the “real” Mexico is like. We ended up eating some Mexican food (although I skipped the pigs’ brain in tortillas!) and heading down to the aquarium. While there, we were offered the opportunity to get into a glass box and feed the sharks in the big shark tank. We jumped at the chance and spent a fascinating 15 minutes in the water with an oceanologist who specialises in sharks. That was definitely an experience I will never forget.

From Veracruz I set off to Rio de Janeiro (via Houston – where I am right now) to setup the 11th and final server. This will be a short hop, arriving Thursday and leaving on Sunday, but at this point I really don’t mind. The trip has been really interesting, but extremely challenging, both physically and mentally, and I’m looking forward to going home.

Luckily, I will be arriving in Germany next Monday and taking a 3-week break from work (and possibly technology). I should be able to recharge my batteries and get back to some sort of normal daily routine. My son has been asking if I’m in the planes that fly over our house ever since the first week of my trip. He seems to be looking forward to me coming home; I was worried he wasn’t bothered or had even forgotten about me :)

So, time to sit back and relax before the second longest flight of this trip. The President’s Club is comfortable and quiet and I’ll be fine for the 4.5 hours I have till boarding.

TTFN!

P.S. I have recently updated my web-album with the pictures taken since my last blog-post, with views from Calgary, Houston and Veracruz.