Mark Rendle, Freelance Consultant

As some of you may know, for some time I have been considering starting my own venture, and I’m pleased to announce that I’m taking the first step on this journey. Starting in March 2012 I will be reducing my commitment to my current employer, Dot Net Solutions, and spending the remainder of my time working as an independent software development consultant, focusing on both Cloud Computing and Software Quality. Work on a web site is underway, which will outline the full set of services I will be offering.

This does not, however, mark in any way an end to my relationship with Dot Net Solutions. I will remain in my current position as Principal Consultant and will continue to work closely on major projects and supporting Microsoft as an active member of the VTSP (Virtual Technology Specialist Program) performing consulting and conducting ADSs (Architecture Design Sessions).

From my perspective, there are things one cannot achieve as a singleton, and this continuing relationship will enable me to pursue potential work that would otherwise be out of reach.

Dot Net Solutions has been a great company to work for, and over the last two years they have been very supportive of my community and OSS work. I have learned a great deal, and I like to think that I have made a valuable contribution too. As a company, DNS are in an excellent position to grow this year and beyond, and I am very excited to remain part of that process.

If you would like to discuss what I can do for your project, team, or company, please use the Contact Me page to get in touch.

Eager-loading RFC

There are two more features I want to get into Simple.Data before releasing 1.0beta1: upserts, and eager loading. Upserts are pretty straight-forward, but there are a few different approaches to eager loading and I want to get some input from the community before I pick one.

In case you don’t know, eager loading is a pattern for getting data from the database in an optimal fashion, with the minimum number of requests or least-possible server load. It’s the opposite of lazy loading, which Simple.Data already does.

The implementation syntax will look like this:

    IEnumerable<Customer> customers = _db.Customers.FindAllByRegion("South")
                .With(_db.Customers.Orders)                .Cast<Customer>();

This will create the customer objects and populate an ICollection<Order> Orders property in each one, assuming there is one.

There are two directions in which a class may have relationships with other classes: it may have “parent” (or “lookup”) classes, where there is at most a single instance of that parent class within the instance of the class in question; and it may have “child” classes, where there is a collection of instances of the child class. Eager loading of parent classes is trivial; you just add the necessary join(s) to the master table(s) and marshal the data into the right places in code.

Eager loading of child classes is similarly easy when there is only a single child collection to be loaded; you join to the detail table and then process the results so that a single instance of the main table row is created, and instances of the detail table rows are grouped into collections.

For example, loading user objects with a list of phone numbers:

SELECT User.Id, User.Name, Phone.Number
FROM User
LEFT JOIN Phone ON User.Id = Phone.UserId

Returned rows:

User.Id User.Name Phone.Number
1 Alice 0123456789
1 Alice 0738473284
2 Bob 0129934845
2 Bob 0729478594

From this we create a User object for Alice with her two phone numbers, and an object for Bob with his. Even if there are a lot of rows in the detail table, this is still the preferred way of handling eager loading for this scenario as it involves the fewest round-trips to the database. Often, you can handle a parent-child-child relationship using this algorithm, too, although beyond that point you’re getting a bit scary.

The complexity arises when you want to load two unrelated child collections, say, load Customers with their Invoices and Orders:

SELECT Customer.*, Invoice.*, Order.*
FROM Customer
LEFT JOIN Invoice ON Customer.Id = Invoice.CustomerId
LEFT JOIN Order ON Customer.Id = Order.CustomerId

This creates what is technically called an “outer Cartesian product”, where a row is returned for every possible combination of the rows from the two outer-joined tables. So if you’re grabbing the details for a single valuable customer who has placed a thousand orders and generated a thousand invoices, you get a million rows back from the database. I have a hunch this might not be optimal.

So this is my Request for Comments: what would you like to see done here?

The options are:

  1. Only allow one detail-table With clause to be specified per query. Throw an exception if there is more than one.
  2. Allow multiple detail-table With clauses, and use additional SELECT operations for all except the first.
  3. Add a parameter to the With method allowing the developer to specify the technique on a per-detail-table basis:
  4.     var customers = _db.Customers.FindAllByRegion("South")
                    .With(_db.Customers.Orders, WithUsing.Join)
                    .With(_db.Customers.Orders.Items, WithUsing.Join)
                    .With(_db.Customers.Invoices, WithUsing.ExtraQuery);
    

    If only one detail table is specified, or one detail with a detail below it (e.g. _db.Customers.Orders.Items), the default behaviour when no WithUsing parameter is supplied will be to use JOIN. Multiple detail tables without a WithUsing will cause the ADO adapter to throw an exception (other adapters, such as the one for MongoDB, can use the With clause as they see fit).

I think I’ve rubber ducked here a bit, as Option 3 is looking like a no-brainer, but I’ve typed it all now, so if you do have any comments or suggestions, please do leave them below. I’m planning on starting code for this on Sunday morning, so there’s plenty of time to change my mind.

Yes, Simple.Data protects against SQL Injection

Several people who weren’t following the show at the inception of Simple.Data have asked whether it does anything to prevent SQL Injection attacks. Those of you who remember the start of the project will understand why I’m kind of amused by the question.

I originally built Simple.Data as a proof-of-concept when there was a big web-scuffle over what was then called Microsoft.Data (now WebMatrix.Data). David Fowler, a developer I have since come to respect greatly, posted an example on his blog which used concatenated SQL, and everybody lined up to tear him a new one. I’m a big fan of code over cockfights, so I had a go at implementing a simple data access library which made SQL Injection impossible by eliminating SQL strings. Thus Simple.Data. You told it what you wanted, and provided the necessary values, and then it wrapped them safely up in parameters and executed them. I stuck it on GitHub, and posted about it on my blog.

(In fact, the first version did allow you to execute SQL strings, much like Dapper and Massive, because the functionality was extremely limited otherwise. As I’ve iterated on it, I’ve been able to expose nearly all the required functionality using the dynamic approach, so the only way you can execute SQL directly now is by getting an IDbConnection object from the AdoAdapter and using ADO.NET against it.)

Macro-optimisations

I’ve used Simple.Data in a few production projects now (and it’s doing a great job so far). It’s not often you actually get to use the software that you write, but when you do, it’s a great opportunity to see it through users’ eyes, and I’ve made a few changes and improvements over the past year as a result.

The most recent project that we’ve used it on at Dot Net Solutions is the Met Office open data thing that was announced on Tuesday. And that forced me to bring forward an optimisation that’s been way down my to-do list, partly because I didn’t really just how much of an optimisation it would be.

The Met Office project involves inserting something like 8 million records a day into a SQL Azure database, which isn’t a huge amount, but enough to need you to be smart about how you do it. The version of Simple.Data that was on NuGet when we started supported bulk inserts, but it wasn’t friendly to the error handling we needed and it assumed it needed to return the inserted records, doing that whole ‘select just-inserted-record’ thing, which is often completely unnecessary.

(So it turns out that when you’re handling TryInvokeMember in a DynamicObject, you can actually find out whether the return value is used by the caller, and not bother if it isn’t. But that’s another blog post.)

Anyway, I tweaked a couple of things and shaved off a fraction of the time it was taking, but it was a small fraction, and things were still far too slow. So we did what we should have done in the first place, and used SqlBulkCopy.

If you haven’t used this (SqlClient-specific) method, you should read up on it and keep it in your mental list of “things that are good that I might need some day”. It lets you prepare a big batch of rows in a DataTable (turns out they’re still good for something) and then insert them in a single operation, and man, it’s quick.

But it’s SQL Server specific, so I couldn’t support it in the generic ADO adapter code.

I’ve exposed a few interfaces in the Simple.Data.Ado assembly which providers can optionally implement if they need to do something a little differently or can do something better. The first instance was ICustomInserter, which is implemented in the Oracle provider to handle fetch-backs in a world without IDENTITY columns. Since then I’ve added more as I went along, and IBulkInserter was one of them because, as I said earlier, I had half a mind to implement this. And now I have.

Anyway, I’ll stop blathering now and just post the comparison code I wrote (measures time to insert 10,000 records, five times) and the before and after results.

Code:

namespace BulkInsertComparison
{
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using Simple.Data;

    class Program
    {
        static void Main(string[] args)
        {
            var db = Database.OpenConnection("data source=.;initial catalog=BulkInsertTest;integrated security=true");

            for (int i = 0; i < 5; i++)
            {
                Console.WriteLine(TimeInsert(db));
            }
        }

        private static TimeSpan TimeInsert(dynamic db)
        {
            var stopwatch = Stopwatch.StartNew();
            db.Target.Insert(GenerateItems(10000));
            stopwatch.Stop();
            return stopwatch.Elapsed;
        }

        static IEnumerable<Item> GenerateItems(int number)
        {
            for (int i = 0; i < number; i++)
            {
                var guid = Guid.NewGuid();
                yield return new Item(0, guid, guid.ToString("N"));
            }
        }
    }

    class Item
    {
        private readonly int _id;
        private readonly Guid _guid;
        private readonly string _text;

        public Item(int id, Guid guid, string text)
        {
            _id = id;
            _guid = guid;
            _text = text;
        }

        public string Text { get { return _text; } }
        public Guid Guid { get { return _guid; } }
        public int Id { get { return _id; } }
    }
}

Before:

00:00:16.9799819
00:00:17.1971797
00:00:18.0744958
00:00:19.1514537
00:00:17.3798541

After:

00:00:00.4616911 (First run includes MEFing IBulkInserter)
00:00:00.2757802
00:00:00.2852119
00:00:00.2504587
00:00:00.2453277

Totally worth it.

0.12.2 on NuGet now. Mini-roadmap: eager-loading (0.14) and upserts (0.15).

Simple.Data for Mono

TL;DR

I’ve got Simple.Data running on Mono 2.10.6. Most tests pass. YMMV. Download it here.

Ritalin version

Simple.Data on Mono is something that people have been asking about, and I’ve been meaning to sort out, pretty much since the project went from a proof-of-concept to an actual OSS product. One way or another I’ve never gotten round to it, but a couple of things have made it seem more relevant. Firstly, there are providers for lots of OSS databases now (MySQL, SQLite, PostgreSQL); secondly, a certain @fekberg has been very persistent in his status update requests, so I’ve taken the time this weekend to make it work.

The challenges

For the most part, I’ve been pleasantly surprised by how easy it’s been. I’ve found what I think is a bug in the Mono implementation of either dynamic or LINQ or the combination of the two, which I’m going to file with Xamarin once I’ve created a simple repro project. I’ve come up with a workaround involving old-school class-based IEnumerable/IEnumerator implementations, and the performance doesn’t seem to be affected, so that’s fine.

The hardest thing about the whole process is that MonoDevelop just isn’t Visual Studio 2010 + ReSharper. It’s not a bad IDE by any standards – I’d still take it over Eclipse any day – but I work day in, day out with VS and jumping into any other IDE just feels like somebody moved all the cheese. Add to that the fact that I’m running it under OSX, so even the standard Windows keyboard shortcuts don’t work, and it’s a bit like running in treacle. As I understand it, the Mono Tools for Visual Studio don’t support 2.10; Miguel de Icaza tells me that they’re working on some awesome new VS tooling, so I’m really looking forward to that.

The other big challenge was testing the ADO adapter against a real database. I only own the SQL Server and SQL Compact providers, so I really wanted to test with SQL Server to keep things simple and let me step-debug if necessary. I was expecting not to have fun with this, but it turned out fine. I run my Windows development environment on my MacBook using Parallels 7, so I set up the Host-Guest networking (easy) and opened inbound port 1433 in Windows Firewall (also easy) and that was it. There were a couple of failing tests, one calling a stored procedure with a DataTable and one involving a scalar function, but the rest just passed. I’m guessing that the majority of people who want to use Simple.Data on Mono will be using one of the OSS DB providers, so hopefully this won’t be a problem.

Releases

I don’t know what the Mono NuGet situation is, so I’ll be releasing Mono builds as tgz downloads from the GitHub project page. For the time being, there are differences between the Mono build of Simple.Data.SqlServer and the Microsoft .NET build, so if you want to use that provider, don’t use the NuGet version.

Right now, there’s a 0.11.4 build in the Downloads section which I hope works. However, I haven’t really exercised it to any great degree, so if you encounter any problems please raise an issue if you think it’s a bug, or ask on the mailing list if you aren’t sure.

If you are one of the developers of an adapter or provider, you might want to test against Mono. If, for any reason, that’s not an option, then let me know and I’ll try to find time to test it for you.

Going forward, I will support (in the OSS sense of the word) both Microsoft .NET and Mono for all releases.

Simple.Data 0.11

Some API changes and enhancements

After the slow-down in development caused by all that InMemoryAdapter stuff, there were a few important things I needed to address quickly. One of these will have broken third-party adapters (but not providers) so let’s talk about that one first.

Get

var db = DatabaseHelper.Open();
var user = db.Users.Get(1);
Assert.AreEqual(1, user.Id);

I’m not really sure why Get wasn’t already there, to be honest. Part of the problem is that it requires a new abstract method internally, and until adapter authors implement that method, their users are stuck on <0.11, which I try to avoid where possible.

For the ADO adapter, Get will use the table’s primary key to construct the query (once; it’s then cached internally, no worries about performance). For the MongoDB adapter, I’d expect it to use the built-in id value that Mongo assigns to all records. Somebody is working on an OData adapter, for which, e.g., Customers.Get(1001) will resolve to the /Customers(1001) URL.

Get is supported in the InMemoryAdapter, but you’ll have to configure the key(s) for each table:

var adapter = new InMemoryAdapter();
adapter.SetKeyColumn("Test", "Id");
Database.UseMockAdapter(adapter);
var db = Database.Open();
db.Test.Insert(Id: 1, Name: "Alice");
var record = db.Test.Get(1);
Assert.IsNotNull(record);
Assert.AreEqual(1, record.Id);
Assert.AreEqual("Alice", record.Name);

Trace configurability

The ADO adapter has been writing all generated SQL to the Trace output at the point of execution for a while now. While this is often very useful, I’ve had a couple of people ask if I could make it turn-off-and-on-able, so I have. You can do this in two ways:

In code:

Database.TraceLevel = TraceLevel.Off;

In config:

<?xml version="1.0" encoding="utf-8" ?> <configuration> <configSections> <sectionGroup name="simpleData"> <section name="simpleDataConfiguration"                type="Simple.Data.SimpleDataConfigurationSection, Simple.Data"/> </sectionGroup> </configSections> <simpleData> <simpleDataConfiguration traceLevel="Error"/> </simpleData> </configuration>

(Gotta love XML.)

ADO SQL output will happen with the trace level set to Info, Warning or Error.

More ADO connection control

I occasionally see how Simple.Data performs compared to other ORM/micro-ORM tools, using the PerformanceTests project from Dapper. I was running this through the other day, and I realised that Simple.Data was losing a lot of time opening and closing connections, while the other test cases were mostly using an open connection for the duration of the test. I’ve had a few comments that they’d like more control over the connection, or that Simple.Data is too aggressive in closing connections, so I decided to improve my standing in the Dapper smack-down and hopefully help some real people out too.

Start using an open connection like this:

SqlConnection connection = Program.GetOpenConnection();
((AdoAdapter) db.GetAdapter()).UseSharedConnection(connection);

And stop using it again like this:

((AdoAdapter) simpleDb.GetAdapter()).StopUsingSharedConnection();

And that’s it. In my performance test project (which is in the solution on Github), this knocked 20-30% off the runtime, with 500 FindById operations taking ~80ms, versus ~50ms using plain ADO.

I’ve also tried to tone down the aggression a little when it comes to closing connections, doing it as soon as possible, instead of (occasionally) before.

Immediate road-map

I want to get the Azure Table Service adapter done, and help with the OData adapter, and I’ve got a cool website to build using Simple.Data and Nancy, so Core will go into maintenance while I do those things for a while. I’ll try to fix any problems in a timely fashion, as usual. If you’ve got anything still outstanding as of 0.11.1 (now on Nuget) please gently remind me on Twitter.

Simple.Data 0.10

Deserves its own post, even though I’m about to write the 0.11 one.

This release took much longer than I would have liked, mainly because I bit off more than I could chew. I had a requirement to provide poly-fills that implement query operations against in-memory datasets, on behalf of adapters for data-stores that don’t support certain operations. Think aggregates, grouping and so on. At some point, it occurred to me that this would make it really easy to create an in-memory adapter which would just have all query functionality implemented by these poly-fills.

Aside: From very early on, Simple.Data has had the XmlMockAdapter which could be used for providing a quick and dirty database for test environments. At the time I wrote that, I wasn’t totally confident in the Simple.Data model, so a way to instantiate that data without using the API seemed important. Since then, of course, the API has become much more stable and I’ve gained confidence, so it seems perfectly OK to use it to set up the test data as well as consume it. Consider XmlMockAdapter deprecated.

Simple.Data.InMemoryAdapter

The InMemoryAdapter lives directly in the Simple.Data namespace in the Simple.Data.dll assembly from the Simple.Data.Core package. It is a first-class component, because testing is a first-class concern.

You can use the adapter like this:

Database.UseMockAdapter(new InMemoryAdapter());
var db = Database.Open();
db.Test.Insert(Id: 1, Name: "Alice");
var record = db.Test.FindById(1);
Assert.IsNotNull(record);
Assert.AreEqual(1, record.Id);
Assert.AreEqual("Alice", record.Name);

That UseMockAdapter method is now directly on the Database class. After calling that, all subsequent calls to any of the Database.Open methods will return a database using that adapter.

Any tables you reference in the database will be “created” if they don’t already exist. Internally, the tables are just lists of dictionaries. Any data you insert will retain its CLR type, so the adapter can handle byte arrays and so on quite happily.

Joins are supported:

var adapter = new InMemoryAdapter();
adapter.ConfigureJoin("Customer", "ID", "Orders", "Order", "CustomerID", "Customer");
Database.UseMockAdapter(adapter);
var db = Database.Open();
db.Customer.Insert(ID: 1, Name: "NASA");
db.Customer.Insert(ID: 2, Name: "ACME");
db.Order.Insert(ID: 1, CustomerID: 1, Date: new DateTime(1997, 1, 12));
db.Order.Insert(ID: 2, CustomerID: 2, Date: new DateTime(2001, 1, 1));

var customers = db.Customer    .FindAll(db.Customer.Orders.Date < new DateTime(1999, 12, 31)).ToList();
Assert.IsNotNull(customers);
Assert.AreEqual(1, customers.Count);

And most query operations work OK.

YMMV: This is probably not perfect, but it was stalling other development so I took the decision to release it. So far in my own projects it has not caused any problems; I’ve had one reported which is fixed in 0.11 (more on that in the next post).

Bug fixes and such

There were a bunch of fixes for various issues in 0.10, most of which were contributed by Richard Hopton, who has also released an ADO provider for Sybase SQL Anywhere.

I’m going 100% digital in 2012

I know it’s early for New Year resolutions, but I’m going to share this one now partly so I don’t forget in six weeks.

I buy a lot of media. I have an entire 6′x3′ bookcase full of technical books, most of which are out of date by at least one version. There’s a cubic hectare of Blu-ray and DVD boxes lying around the place, and I’m pretty sure there are boxes of CDs (remember those?) in the loft somewhere. Probably half the available storage space in my house is taken up with things which would actually fit on a few terabytes of hard disk.

It’s not like everything in the house isn’t already digital. We’ve got a 160GB iPod Classic, iPhones, iPads, Kindles, three laptops, an Xbox 360, a PS3 (with a 320GB HD), a 50Mbps internet connection, and on Saturday we bought an Apple TV.

It’s ridiculous. It has to stop. It ends, starting in January. From then on, where a digital purchase option is available, I shall avail myself of it. I will invest in a home server with a few TB of storage on mirrored drives; I’m thinking of the Lacie 5big Network 2, which will take up to 10TB. Then I can “back up” my movie collection to that, and install iTunes on it for music streaming. Paper books will go to charities and libraries; if I want to read any of them again, I’ll buy them in ebook format. Magazines through Newsstand and papers through the Times iPad app.

Now, I just need Sony and Microsoft to distribute all PS3 and Xbox games digitally, and I need never darken the door of another shop. Not that I do anyway. All this crap comes from Amazon.

Just a quick note about Tests

I’ve been fixing a few bugs in Simple.Data over the last couple of days, and I’m feeling the need to post something about the benefits of a good test suite.

For most feature development on Simple.Data, I do test-driven development. The dynamic nature of my API lends itself really well to this approach; this is one of the real reasons TDD is popular with dynamic languages like Ruby and Python. When I want to add a new Query operator or method, I can write a test that uses the syntax I’m aiming for, and the Behaviour test project will compile and run, and I’ll get a failing test, either with a failed assert or an exception. I really like the latter form of failure, since I get a stack trace and I can just dive into the code at that point and start working out what’s wrong.

So that’s great, but the thing that makes me want to write this post is my “QA” process. When someone reports a bug, like this one, it means my test suite is incomplete. So again, the first thing I do is to create a test which reproduces the bug. Then I fix the code so that test passes. Then I do a Release build, and run the full set of tests (currently 560+ including integration tests) to make sure that the fix hasn’t broken anything else. And then, and this is the really awesome part: if all the tests pass, I package the build and push it to NuGet.

I can do this because I trust those tests to be verifying all the behaviour that Simple.Data users are relying on in their applications. If the tests pass, I’m not going to break anybody’s system when they update to the new version. On the (rare) occasions when this has gone wrong, it’s been because I didn’t have the right tests, and I’ve gone back and added them (would you believe the SQL Server test project didn’t test delete’s until yesterday? Epic fail).

When people look at the Simple.Data repository, and say “wow, you’ve got lots of tests”, it’s as if they think that’s a discipline thing, that I’m just really conscientious about coverage. But that’s not it. I couldn’t do this without the tests. And neither can you, whatever you’re working on.

OK, so I think HTML/JS for Metro apps is a good idea

(Just a quick post while I wait for Anders’s talk to start. Hope he confirms destructuring assignment.)

I turned up to BUILD with a negative attitude to this whole idea of using HTML and JavaScript for creating Windows applications. And I held onto that attitude throughout Tuesday’s keynote and Big Picture sessions. I went back to my hotel that evening and threw together an app which pulls issues from Github and displays them in Metro style, and I did it in C# and XAML. And it worked.

Then I watched yesterday’s session on Using the Windows Runtime (WinRT) from C#, and my attitude changed. It’s very easy to create WinRT components in C# which can be consumed from C++ and JavaScript apps. Really, really easy. You barely have to think about it. And the support in this new Windows JavaScript for consuming WinRT components is very, very good, with binding and everything. You can effectively write an MVVM application, but use HTML and CSS instead of XAML.

This is completely brilliant.

Why?

Well, at Dot Net Solutions we do a lot of WPF and Silverlight work. And we’re cracking engineers, but we’re not designers. We get outside resource in to make our software look beautiful. And finding design studios with excellent XAML and Blend expertise is very, very difficult. But finding design studios with excellent HTML, CSS and JavaScript? Well, that’s much easier.

So my Windows app development paradigm in this brave new Win8 world is going to be this: all the internal logic, the Data Access Layer, the Models and ViewModels and so on can be built in C#, with type safety and unit tests and the performance gains. Then those ViewModels are exposed across the WinRT ABI boundary and bound to HTML views, styled with CSS and with all the UI-specific code (transitions, animations etc) implemented in JS by people whose idea of a good colour scheme isn’t “blue”.

And if there’s performance-critical code, I’m going to write it in C++ using the AMP library.

Right. I’m going to watch this talk, then I’m refactoring my Github app and do the UI in HTML.

Follow

Get every new post delivered to your Inbox.