Info Wars 2: a New Hope

Published: Jan 28, 2021 by luxagen

There’s an idea that’s been kicking around in the recesses of my mind for a year or two, but I couldn’t see how to make it work. I recently realised that it’s actually possible.

First, let me backtrack. For some years now my approach to data integrity (i.e. combatting the surprisingly frequent bit rot) has revolved around md5deep. I’ve been storing logs from that tool with datasets so that I can check later, via another run and a diff, that no bits have flipped in the intervening time. The big problem with this approach is that the logs quickly go out of date owing to file/directory moves and other day-to-day management. Even with a diff in hand, I first have to match every file up across its moves before I can check that the content hashes themselves haven’t changed.

“If only”, I thought to myself, “I could store the hash information WITH the files themselves! Then all the moving/renaming in the world won’t matter.” I just didn’t see how to do that.

While trying to research support for the DOS archive attribute via Samba on Linux™ (spoiler alert: there doesn’t seem to be any), I came across the concept of extended file attributes. “Holy mackerel!” I said to myself, “This is it!”
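To make the idea concrete, here is a minimal Python sketch of keeping a file's hash in an extended attribute. It is not the tool's actual code (which is written in Perl): the attribute name `user.example.md5` and the function names are made up for illustration, and it assumes Linux with a filesystem that permits `user.*` attributes.

```python
import hashlib
import os

# Hypothetical attribute name, chosen for this sketch only. On Linux,
# unprivileged attributes must live in the "user." namespace.
HASH_ATTR = "user.example.md5"


def file_md5(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_hash(path):
    """Compute the file's hash and attach it to the file itself."""
    value = file_md5(path)
    os.setxattr(path, HASH_ATTR, value.encode("ascii"))
    return value


def verify_hash(path):
    """Return True if the content matches the stored hash, False if it
    differs, or None if no stored hash could be read."""
    try:
        stored = os.getxattr(path, HASH_ATTR).decode("ascii")
    except OSError:
        # Attribute missing, or extended attributes unsupported here.
        return None
    return stored == file_md5(path)
```

Because the attribute belongs to the file rather than to its path, renaming or moving the file within the same filesystem leaves the stored hash attached. Copying to another filesystem, or using tools that are not attribute-aware, may drop it.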

I dusted off my rusty Perl skills and bashed out a prototype with a friend in an hour or two. A week of work later, spread over two stints, I had an initial stab at a tool with real value.

Apart from the benefits of attaching the metadata to the files (no more hash logs or verification difficulties from moves/renames), I’ve quickly found the tool to be a game-changer in other ways too. In particular, by separating the long-running initial hashing job from active decision-making, it’s proved to be a killer tool for manually deduplicating and organising my files. By using the export feature to generate hash logs for trees, I can use diff to verify almost instantly that suspected duplicate directories are actually identical, and then just delete one.
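The duplicate-directory check can be sketched in Python as follows. This is an illustration rather than the tool's export feature: the function names and the log line format are invented here, and the sketch hashes file contents directly instead of reading previously stored hashes, purely to stay self-contained.

```python
import difflib
import hashlib
import os


def export_hashes(root):
    """Return one 'hash  relative/path' line per file under root,
    sorted by path so that two exports line up for diffing."""
    lines = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Reads each file whole; fine for a sketch, not for huge files.
            with open(full, "rb") as handle:
                digest = hashlib.md5(handle.read()).hexdigest()
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            lines.append(f"{digest}  {rel}")
    return sorted(lines, key=lambda line: line.split("  ", 1)[1])


def diff_trees(left, right):
    """Return unified-diff lines between the two exports.
    An empty list means the trees hold identical files at identical paths."""
    return list(
        difflib.unified_diff(
            export_hashes(left),
            export_hashes(right),
            fromfile=left,
            tofile=right,
            lineterm="",
        )
    )
```

With hashes already stored alongside the files, the export step would only need to read attributes, which is why the comparison can be near-instant even for large trees.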

I look forward to doing more work over the next couple of months to polish this product and make it as useful as possible. Watch this space!

Addendum: The tool is named RotKraken.