Lightning-fast coding

Published: Aug 27, 2018 by luxagen

Modern computers are fast. Unbelievably, mind-blowingly fast.

You might not be very aware of this. I’m a programmer and even I forget it most of the time, usually because a simple website I’m using is stuffed full of unnecessary JavaScript frameworks, or the desktop software I’m using is layered on too many heavyweight APIs. Deep, inefficient technology stacks are everywhere these days.

Much of the time, though, that isn’t the whole problem; it’s the fundamentals too. When I implemented two features for Drummer - my C++-based rhythm-programming software for Windows - their sheer blazing speed reminded me how much details matter.

One feature was rendering: the process of taking all your rhythm programming and spitting out either a single mixed audio file or a parallel set of them (one per instrument) for inclusion in your DAW project. As an example, a rhythm track for a 5-minute song with 10 monophonic instruments involves generating (at a sample rate of 48 kHz) about half a gigabyte of data. Drummer can render that to a WAV file, or a set of them, in less than half a second on a modern machine with a typical cheap SATA SSD.
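To make the half-gigabyte figure concrete, here is a small sketch of the arithmetic. The function name render_bytes is hypothetical, and the 4 bytes per sample (32-bit samples) is an assumption: the sample format isn't stated above, and 16-bit samples would halve the total.

```cpp
#include <cstdint>

// Size in bytes of an uncompressed render with one mono stream per instrument.
// bytes_per_sample is an assumption; the sample format is not given in the text.
constexpr std::uint64_t render_bytes(std::uint64_t seconds,
                                     std::uint64_t sample_rate_hz,
                                     std::uint64_t instruments,
                                     std::uint64_t bytes_per_sample)
{
    return seconds * sample_rate_hz * instruments * bytes_per_sample;
}

// 5 minutes at 48 kHz, 10 monophonic instruments, assumed 32-bit samples.
constexpr std::uint64_t kExampleBytes = render_bytes(5 * 60, 48'000, 10, 4);

// 300 s * 48,000 samples/s * 10 instruments * 4 bytes = 576,000,000 bytes.
static_assert(kExampleBytes == 576'000'000, "roughly half a gigabyte");
```

Under that assumption the render comes to 576 million bytes, which is in line with "about half a gigabyte".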

The second feature was the full-text query system for finding sound samples in your collection based on their metadata. It builds the indices from scratch for 4,500 sample files in 7 seconds on the same SSD, which is pretty neat, but searching is what really shines: a results list matching a keyword string takes about 100 microseconds to generate - that’s a tenth of a millisecond.
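For readers who haven't built one, a keyword search of this kind is commonly backed by an inverted index: a map from each word to the sorted list of items containing it. The sketch below is a minimal illustration of that general technique, not Drummer's actual implementation; the class name KeywordIndex, the whitespace tokenizer, and the numeric sample IDs are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdint>
#include <iterator>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical inverted index: lower-cased keyword -> sorted list of sample IDs.
class KeywordIndex {
public:
    // Index one sample's metadata text. IDs must be added in increasing
    // order so every posting list stays sorted without a separate sort.
    void add(std::uint32_t sample_id, const std::string& metadata) {
        for (const std::string& word : tokenize(metadata)) {
            std::vector<std::uint32_t>& ids = postings_[word];
            assert(ids.empty() || ids.back() <= sample_id);
            if (ids.empty() || ids.back() != sample_id)
                ids.push_back(sample_id);  // skip repeats within one sample
        }
    }

    // IDs of samples whose metadata contains every keyword in the query.
    // An empty query, or any unknown keyword, yields an empty result.
    std::vector<std::uint32_t> search(const std::string& query) const {
        std::vector<std::uint32_t> result;
        bool first = true;
        for (const std::string& word : tokenize(query)) {
            const auto it = postings_.find(word);
            if (it == postings_.end()) return {};
            if (first) {
                result = it->second;
                first = false;
                continue;
            }
            // Both lists are sorted, so intersection is one linear merge.
            std::vector<std::uint32_t> merged;
            std::set_intersection(result.begin(), result.end(),
                                  it->second.begin(), it->second.end(),
                                  std::back_inserter(merged));
            result.swap(merged);
        }
        return result;
    }

private:
    // Split on whitespace and lower-case each word.
    static std::vector<std::string> tokenize(const std::string& text) {
        std::vector<std::string> words;
        std::istringstream in(text);
        std::string word;
        while (in >> word) {
            std::transform(word.begin(), word.end(), word.begin(),
                           [](unsigned char c) {
                               return static_cast<char>(std::tolower(c));
                           });
            words.push_back(word);
        }
        return words;
    }

    std::unordered_map<std::string, std::vector<std::uint32_t>> postings_;
};
```

All the expensive work (reading files, tokenizing metadata) happens once at index-build time; a query is then a handful of hash lookups and linear merges over short sorted lists, which is why search can be so much faster than the build. This sketch still allocates per query, so a tuned version would reuse its result buffers.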

How did I achieve this? By avoiding a few common habits:

unnecessary in-memory data copying;
on-the-fly allocation of data buffers in stream-processing loops;
direct processing of sparse data (i.e. consisting mostly of zeroes);
too many buffer scans during compound processing of data streams.
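The four points above can be sketched in a single mixing routine. This is an illustration under stated assumptions rather than Drummer's actual code: the Hit struct, the render_block function, the float source format, and the 16-bit output format are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical event: one sound triggered at a given frame of the timeline.
struct Hit {
    std::size_t start_frame;          // where the sound begins in the output
    const std::vector<float>* sound;  // borrowed sample data, never copied
    float gain;
};

// Renders timeline frames [block_start, block_start + out.size()) into `out`.
// `scratch` and `out` are owned by the caller and reused for every block,
// so nothing is allocated inside the processing loop.
void render_block(const std::vector<Hit>& hits,
                  std::size_t block_start,
                  std::vector<float>& scratch,
                  std::vector<std::int16_t>& out)
{
    assert(scratch.size() == out.size());
    const std::size_t block_len = out.size();
    const std::size_t block_end = block_start + block_len;
    std::fill(scratch.begin(), scratch.end(), 0.0f);

    // Sparse data: visit only the hits, not a mostly-silent track per instrument.
    for (const Hit& hit : hits) {
        const std::size_t hit_end = hit.start_frame + hit.sound->size();
        if (hit_end <= block_start || hit.start_frame >= block_end)
            continue;  // this hit does not overlap the current block

        const std::size_t from = std::max(hit.start_frame, block_start);
        const std::size_t to = std::min(hit_end, block_end);
        const float* src = hit.sound->data() + (from - hit.start_frame);
        float* dst = scratch.data() + (from - block_start);

        // Gain is applied while mixing, straight from the source samples:
        // no intermediate copy and no separate gain pass.
        for (std::size_t i = 0; i < to - from; ++i)
            dst[i] += src[i] * hit.gain;
    }

    // Clamp and convert to 16-bit in one scan rather than two.
    for (std::size_t i = 0; i < block_len; ++i) {
        const float clamped = std::max(-1.0f, std::min(1.0f, scratch[i]));
        out[i] = static_cast<std::int16_t>(clamped * 32767.0f);
    }
}
```

A caller would allocate scratch and out once, then call render_block repeatedly with an advancing block_start, writing each finished block to the output file. With a block size that fits comfortably in the CPU cache, the second loop reads data the first loop has only just written.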

Other languages

This might all sound very C++-centric, but it isn’t. C# and many other languages can achieve nearly the same performance, and I’ve used these principles to make similarly hair-raising improvements to speed-critical web-service code in my day-to-day contract development. The biggest pitfall is the garbage collector: it frees you from explicit ownership concerns, at least for read-only data, but places some inconvenient constraints on memory layout and allocation strategy that can cause locality to suffer. If your buffers aren’t unnecessarily huge and you’re not allocating them on the fly, though, locality matters much less thanks to the CPU cache.

While C# has unsafe code and pointers to mitigate these problems, there’s a good argument that using them can badly compromise your code’s developer-compatibility, always a concern in attrition-prone organisations. The good news is that there’s usually no need to go this low-level unless you’re squeezing out the last few percent of performance. With careful design, even vanilla “safe” C# can be much faster than you might think.