Macro-optimisations

I’ve used Simple.Data in a few production projects now (and it’s doing a great job so far). It’s not often you actually get to use the software that you write, but when you do, it’s a great opportunity to see it through users’ eyes, and I’ve made a few changes and improvements over the past year as a result.

The most recent project that we’ve used it on at Dot Net Solutions is the Met Office open data project that was announced on Tuesday. That project forced me to bring forward an optimisation that had been way down my to-do list, partly because I didn’t realise just how much of an optimisation it would be.

The Met Office project involves inserting something like 8 million records a day into a SQL Azure database, which isn’t a huge amount, but it’s enough that you need to be smart about how you do it. The version of Simple.Data that was on NuGet when we started supported bulk inserts, but it wasn’t friendly to the error handling we needed, and it assumed it had to return the inserted records, doing that whole ‘select the just-inserted record’ thing, which is often completely unnecessary.

(So it turns out that when you’re handling TryInvokeMember in a DynamicObject, you can actually find out whether the return value is used by the caller, and not bother if it isn’t. But that’s another blog post.)
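
As a teaser for that other post: when the result of a dynamic call is unused, the C# compiler passes CSharpBinderFlags.ResultDiscarded to the runtime binder. Exactly how that flag surfaces on the binder instance is an internal, undocumented detail, so this minimal sketch (not the actual Simple.Data code) probes for it with reflection, and just assumes the result is wanted if it can’t find anything:

using System;
using System.Dynamic;
using System.Linq;
using System.Reflection;

class ReturnAwareObject : DynamicObject
{
    public override bool TryInvokeMember(InvokeMemberBinder binder, object[] args, out object result)
    {
        if (ResultIsDiscarded(binder))
        {
            // Nobody is listening: do the work, skip the select-back.
            result = null;
            return true;
        }
        result = "the expensively fetched-back record";
        return true;
    }

    private static bool ResultIsDiscarded(InvokeMemberBinder binder)
    {
        // Probe the binder's non-public bool properties for anything that
        // looks like the ResultDiscarded flag; if nothing matches, play it
        // safe and behave as though the caller wants the return value.
        const BindingFlags nonPublic = BindingFlags.Instance | BindingFlags.NonPublic;
        return binder.GetType()
                     .GetProperties(nonPublic)
                     .Where(p => p.PropertyType == typeof(bool) && p.Name.Contains("Discarded"))
                     .Select(p => (bool)p.GetValue(binder, null))
                     .FirstOrDefault();
    }
}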

Anyway, I tweaked a couple of things and shaved off a fraction of the time it was taking, but it was a small fraction, and things were still far too slow. So we did what we should have done in the first place, and used SqlBulkCopy.

If you haven’t used this (SqlClient-specific) method, you should read up on it and keep it in your mental list of “things that are good that I might need some day”. It lets you prepare a big batch of rows in a DataTable (turns out they’re still good for something) and then insert them in a single operation, and man, it’s quick.
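
If you haven’t seen it before, the pattern looks roughly like this. This is a sketch, assuming a Target table that matches the Item class from the comparison code further down (with the Id column assigned by the database):

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

static class BulkCopyExample
{
    public static void BulkInsert(string connectionString, IEnumerable<Item> items)
    {
        // Stage the rows in a DataTable whose columns match the target table.
        var table = new DataTable("Target");
        table.Columns.Add("Guid", typeof(Guid));
        table.Columns.Add("Text", typeof(string));

        foreach (var item in items)
        {
            table.Rows.Add(item.Guid, item.Text);
        }

        using (var bulkCopy = new SqlBulkCopy(connectionString))
        {
            bulkCopy.DestinationTableName = "Target";
            bulkCopy.ColumnMappings.Add("Guid", "Guid");
            bulkCopy.ColumnMappings.Add("Text", "Text");
            bulkCopy.WriteToServer(table); // one operation for the whole batch
        }
    }
}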

But it’s SQL Server specific, so I couldn’t support it in the generic ADO adapter code.

I’ve exposed a few interfaces in the Simple.Data.Ado assembly which providers can optionally implement if they need to do something a little differently, or can do something better. The first instance was ICustomInserter, which is implemented in the Oracle provider to handle fetch-backs in a world without IDENTITY columns. Since then I’ve added more as I’ve gone along, and IBulkInserter was one of them because, as I said earlier, I had half a mind to implement this. And now I have.
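
To give a flavour of the pattern, here’s a hypothetical, boiled-down version (the real IBulkInserter in Simple.Data.Ado has a different shape): a provider exports its implementation, and the adapter composes it via MEF, falling back to the generic insert loop when nothing turns up.

using System.Collections.Generic;
using System.ComponentModel.Composition;
using System.ComponentModel.Composition.Hosting;
using System.Reflection;

// Hypothetical stand-in for the real interface.
public interface IBulkInserterSketch
{
    void Insert(string tableName, IEnumerable<IDictionary<string, object>> rows);
}

[Export(typeof(IBulkInserterSketch))]
public class SqlBulkInserterSketch : IBulkInserterSketch
{
    public void Insert(string tableName, IEnumerable<IDictionary<string, object>> rows)
    {
        // Stage the rows in a DataTable and hand them to SqlBulkCopy,
        // as in the earlier snippet.
    }
}

public static class AdapterComposition
{
    public static IBulkInserterSketch FindBulkInserter(Assembly providerAssembly)
    {
        // The composition cost is paid once, which is why the first timed
        // run below is slower than the rest.
        var catalog = new AssemblyCatalog(providerAssembly);
        var container = new CompositionContainer(catalog);
        return container.GetExportedValueOrDefault<IBulkInserterSketch>();
    }
}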

Anyway, I’ll stop blathering now and just post the comparison code I wrote (it measures the time taken to insert 10,000 records, five times over) and the before-and-after results.

Code:

namespace BulkInsertComparison
{
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using Simple.Data;

    class Program
    {
        static void Main(string[] args)
        {
            var db = Database.OpenConnection("data source=.;initial catalog=BulkInsertTest;integrated security=true");

            for (int i = 0; i < 5; i++)
            {
                Console.WriteLine(TimeInsert(db));
            }
        }

        private static TimeSpan TimeInsert(dynamic db)
        {
            // Time a single insert of 10,000 rows. Passing an IEnumerable to
            // Insert is what triggers Simple.Data's bulk insert path.
            var stopwatch = Stopwatch.StartNew();
            db.Target.Insert(GenerateItems(10000));
            stopwatch.Stop();
            return stopwatch.Elapsed;
        }

        static IEnumerable<Item> GenerateItems(int number)
        {
            // Lazily generate the requested number of rows, each with a fresh
            // GUID and its 32-character string form as the text column.
            for (int i = 0; i < number; i++)
            {
                var guid = Guid.NewGuid();
                yield return new Item(0, guid, guid.ToString("N"));
            }
        }
    }

    class Item
    {
        private readonly int _id;
        private readonly Guid _guid;
        private readonly string _text;

        public Item(int id, Guid guid, string text)
        {
            _id = id;
            _guid = guid;
            _text = text;
        }

        public string Text { get { return _text; } }
        public Guid Guid { get { return _guid; } }
        public int Id { get { return _id; } }
    }
}

Before:

00:00:16.9799819
00:00:17.1971797
00:00:18.0744958
00:00:19.1514537
00:00:17.3798541

After:

00:00:00.4616911 (First run includes MEFing IBulkInserter)
00:00:00.2757802
00:00:00.2852119
00:00:00.2504587
00:00:00.2453277

Totally worth it. That’s roughly 17.8 seconds per 10,000 rows before, down to about 0.27 seconds once the one-off composition cost is paid: a speedup of roughly 65x.

0.12.2 on NuGet now. Mini-roadmap: eager-loading (0.14) and upserts (0.15).

Comments

  1. With eager loading, how many levels deep will you cater for? E.g. Parent -> child -> grandchild? Will you also cater for distinct hydration? Can’t wait to find out…

