Published: Aug 2, 2016 by luxagen
I learned some time ago that my Roland D-70 (a keyboard synthesiser from the early ’90s that showed up on a lot of classic dance tunes during that decade) is a completely digital machine that outputs two stereo audio streams at 32 kHz and 16 bits. The schematics in the service manual also make it clear that these are output by time-multiplexing a single Burr-Brown PCM56 DAC at 128 kHz (four channels at 32 kHz each), resulting in a spurious quarter-sample delay between the left and right channels of each stereo pair.
To correct this and get pristine digital transfers, I began to research DIY projects for generating S/PDIF signals and came across this article. It was a great start, but had a problem: my Edirol FA-101 audio interface can’t run at 32 kHz, preventing lossless transfers at that rate. One solution is to follow each 32 kHz sample with two zero samples to get a 96 kHz signal; one can then simply drop the zeroes later.
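The padding trick can be sketched in a few lines of Python. This is only an illustration, not code from the project: the function names are invented for the example, and it assumes the recording starts exactly on a real sample, so that every third value is genuine data.

```python
def pad_to_96k(samples_32k):
    """Follow each 32 kHz sample with two zeroes, tripling the rate."""
    padded = []
    for sample in samples_32k:
        padded.extend((sample, 0, 0))
    return padded


def recover_32k(samples_96k):
    """Keep every third sample, discarding the zero padding."""
    return samples_96k[::3]


original = [1200, -345, 0, 32767, -32768]
padded = pad_to_96k(original)
print(padded[:6])
print(recover_32k(padded) == original)
```

Because recovery works by position rather than by value, a genuine zero sample survives the round trip. In a real recording the phase of the padding would have to be found first.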
This created a further danger for an encoder, though: running out of CPU time. Although my Propeller QuickStart board can run at 80 MHz, integration with the D-70 would require deriving its clock by multiplying the latter’s 32 MHz master clock in order to synchronously sample its audio bus, leaving 64 MHz as the maximum clock speed once integrated. Since Micah’s code from that article only had to work at 48 kHz, there was no guarantee that it would manage 96.
Let’s look at this in more detail: how many symbol bits do we need to output for each stereo S/PDIF frame? There are two channels, and each sample is encoded in 32 bits, so 64 are needed for stereo, but S/PDIF’s biphase mark coding (BMC) scheme uses two symbol bits to represent each data bit. We therefore need to output 128 bits of binary signal for each frame of stereo audio.
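For readers unfamiliar with BMC, here is a minimal bit-by-bit sketch in Python. It is a generic illustration of the coding rule (a transition at the start of every bit cell, plus a mid-cell transition for a 1), not the code from the article or from this project; the function name and the choice of a low starting level are assumptions.

```python
def bmc_encode(bits, level=0):
    """Biphase-mark encode a sequence of data bits.

    `level` is the line level just before the first bit cell.
    Returns the symbol list (two symbols per data bit) and the final level.
    """
    symbols = []
    for bit in bits:
        level ^= 1              # every bit cell starts with a transition
        symbols.append(level)
        if bit:
            level ^= 1          # a 1 adds a second transition mid-cell
        symbols.append(level)
    return symbols, level


symbols, final_level = bmc_encode([0, 1, 1, 0])
print(symbols)

# One stereo frame: 2 channels x 32 bits in, 128 symbol bits out.
frame_symbols, _ = bmc_encode([0] * 64)
print(len(frame_symbols))
```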
To quote the above-mentioned article, “I just use the fastest unrolled loop I can to perform the biphase mark encoding in two instructions per bit”. Even at only 16 bits per sample, a stereo 48 kHz signal requires 1.536 million bits per second; at two instructions per bit, that’s 3.072 million BMC-encoding instructions per second. In reality, of course, each channel’s word in an S/PDIF frame can carry 24 data bits and 4 subcode bits, all of which must be BMC-encoded, so unless we’re prepared to limit the code to 16-bit samples in exchange for a mild efficiency gain, we’re really looking at 28 × 2 × 48,000 = 2.688 million bits (5.376 million instructions) per second. Since the Propeller runs most instructions in 4 clock cycles, this means that Micah’s solution spends at least 21.504 million clocks per second just doing BMC encoding, and that’s before we allow for getting the samples from somewhere and formatting them into S/PDIF words. The Propeller’s handy video generator, which allows a binary signal to be buffered for automatic output 32 bits at a time, makes the problem of outputting the final signal tractable, but at these data rates even that simple job will eat quite a few clock cycles too.
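The arithmetic above is easy to reproduce and extend to 96 kHz. The following Python sketch uses only the figures quoted in the text (two instructions per bit, four clocks per instruction, a 64 MHz clock); the function and constant names are invented for the example.

```python
CLOCKS_PER_INSTRUCTION = 4   # typical Propeller instruction cost, per the text
INSTRUCTIONS_PER_BIT = 2     # figure quoted from the article
CLOCK_HZ = 64_000_000        # clock available once integrated with the D-70


def bits_per_second(sample_rate_hz, bits_per_sample, channels=2):
    """Data bits that must be BMC-encoded each second."""
    return sample_rate_hz * channels * bits_per_sample


def bmc_clocks_per_second(sample_rate_hz, bits_per_sample):
    """Clock cycles spent purely on bit-by-bit BMC encoding each second."""
    bits = bits_per_second(sample_rate_hz, bits_per_sample)
    return bits * INSTRUCTIONS_PER_BIT * CLOCKS_PER_INSTRUCTION


for rate in (48_000, 96_000):
    clocks = bmc_clocks_per_second(rate, 28)
    share = clocks / CLOCK_HZ
    print(f"{rate} Hz: {clocks / 1e6:.3f} M clocks/s ({share:.1%} of 64 MHz)")
```

At 96 kHz the encoding alone comes to about 43 million clocks per second, roughly two thirds of a 64 MHz clock, before any sample fetching, formatting, or output.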
All this drove me to one conclusion: there was no way Micah’s code would run at 96 kHz without severely overclocking the Propeller. Unless I wanted to start using heatsinks and fans, I’d have to start from scratch in Propeller assembly and do a much more efficient job.
Luckily, there was room for improvement: Micah points out in her article that lookup tables would greatly improve efficiency compared to bit-by-bit encoding. It would be important to put the table in cog-local memory for speed, though; with a total program+data limit of 2 kB, the most practical lookup table was therefore one to translate 8-bit bytes to 16-bit signal sequences. 256 2-byte entries would take up 512 bytes – only a quarter of the quota.
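To see why the byte-wide table is the practical choice, it helps to compare a few candidate widths against the 2 kB limit. This Python sketch is only a sizing aid; the names are invented for the example, and it ignores how entries would actually be packed into the cog's 32-bit registers.

```python
COG_RAM_BYTES = 2 * 1024  # program + data limit quoted in the text


def table_bytes(input_bits):
    """Size of a table mapping input_bits of data to 2 * input_bits of signal."""
    entries = 2 ** input_bits
    entry_bytes = (2 * input_bits + 7) // 8  # BMC doubles the bit count
    return entries * entry_bytes


for input_bits in (4, 8, 16):
    size = table_bytes(input_bits)
    verdict = "fits" if size <= COG_RAM_BYTES else "does not fit"
    print(f"{input_bits}-bit input: {size} bytes, {verdict}")
```

A 4-bit table is tiny but needs twice as many lookups per frame, while a 16-bit table is far too large for cog memory.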
In addition, her statement that “biphase mark encoding is not parallelisable” is only true in the strict sense that there are trivial sign dependencies; while it wouldn’t be worthwhile to multithread the encoding, having eight 8-bit bytes in a stereo S/PDIF frame would make it possible to write a tuned code sequence that works the sign propagation into convenient places. The final complication – that the frame preamble is not in BMC form and its endpoint lies at an inconvenient place in the frame word – could be addressed fairly elegantly through frame-preprocessing logic, allowing for pure lookup-based BMC.
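The sign dependency is small enough to show in a short Python sketch. It assumes a table built for a low starting level, with the first transmitted symbol in bit 0 of each entry; when the previous word ended high, the whole entry is simply inverted. This illustrates the idea only: it is not the project's tuned assembly, the names are invented, and it ignores the frame preamble entirely.

```python
def bmc_byte(value, level=0):
    """BMC-encode one byte, LSB first, into a 16-bit symbol word."""
    word = 0
    for i in range(8):
        level ^= 1                      # transition at the start of each cell
        word |= level << (2 * i)
        if (value >> i) & 1:
            level ^= 1                  # extra mid-cell transition for a 1
        word |= level << (2 * i + 1)
    return word


# Table assumes the line was low before each byte.
TABLE = [bmc_byte(value) for value in range(256)]


def encode_bytes(data, level=0):
    """Table-driven encoding with the line level carried between bytes."""
    words = []
    for byte in data:
        word = TABLE[byte]
        if level:
            word ^= 0xFFFF              # previous word ended high: invert
        level = word >> 15              # last symbol is the new state
        words.append(word)
    return words, level


frame = [0x01, 0x80, 0xFF, 0x00, 0x5A, 0xA5, 0x10, 0x7F]  # eight bytes
words, final_level = encode_bytes(frame)
print([f"{w:04X}" for w in words])
```

The only state carried from one byte to the next is a single bit, which is what makes it possible to place the propagation wherever it is cheapest.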
I wrote a small C++ program to generate the Propeller assembly directives for the table, not so much to save effort as to avoid the inevitable manual mistakes; there was also no need to optimise such a trivial program, so I deliberately wrote it in an inefficient and idiot-proof style, with plenty of sanity checking. Armed with this table, I wrote the Propeller assembly that would accept two 24-bit samples, add the other frame components, BMC-encode the frame, and propagate sign changes using a state variable.
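A generator in that spirit might look like the following Python sketch. It is not the C++ program from the repository: the LSB-first bit order, the low starting level, the `word $XXXX` output layout, and all function names are assumptions made for illustration. The sanity check decodes every entry back to its byte before anything is emitted.

```python
def bmc_byte(value, level=0):
    """BMC-encode one byte, LSB first, into a 16-bit symbol word."""
    word = 0
    for i in range(8):
        level ^= 1
        word |= level << (2 * i)
        if (value >> i) & 1:
            level ^= 1
        word |= level << (2 * i + 1)
    return word


def bmc_decode(word, level=0):
    """Recover the byte from a symbol word, checking every cell boundary."""
    value = 0
    for i in range(8):
        first = (word >> (2 * i)) & 1
        second = (word >> (2 * i + 1)) & 1
        if first == level:
            raise ValueError("missing transition at start of bit cell")
        if second != first:
            value |= 1 << i             # mid-cell transition means a 1
        level = second
    return value


def table_directives(per_line=8):
    """Return data directives for the 256-entry table as lines of text."""
    table = [bmc_byte(value) for value in range(256)]
    for value, word in enumerate(table):
        assert 0 <= word <= 0xFFFF
        assert bmc_decode(word) == value    # round-trip sanity check
    assert len(set(table)) == 256           # no two bytes share an encoding
    lines = []
    for start in range(0, 256, per_line):
        chunk = table[start:start + per_line]
        lines.append("        word    " + ", ".join(f"${w:04X}" for w in chunk))
    return lines


print("\n".join(table_directives()))
```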
Since the FA-101 will happily accept a 5 V signal on its front inputs without overload, and supports a 192 kHz sample rate, I installed both a TOSLINK module and an RCA S/PDIF connector on my test board. I could generate 48 kHz test signals, record the TOSLINK signal exactly, and verify the sample values in Audition; when bugs showed up, I could drop the generated signal’s sample rate to a small integer fraction of 192 kHz, record the electrical S/PDIF signal itself, and use Audition as a digital oscilloscope to diagnose the problem. I thus got the new code working despite Propeller assembly’s nonexistent debugging facilities.
After all this effort, though, there still weren’t quite enough clock cycles remaining in the loop to actually output the signal; at the high signalling rate required for 96 kHz audio, even the small number of instructions required to run the video generator would overrun what was left of the instruction budget. What to do?
I reluctantly threw the S/PDIF frame formatting out of the loop, and was able to keep the BMC encoding and output on the same cog; this meant that the data buffers supplied to the cog via shared memory would have to contain preformatted S/PDIF frames. This was annoying, but it actually made the solution more generic, allowing for things like AES, 4-channel modes, and subcode control without complicating the core code. It also didn’t take too much processing time to do this on a second cog; despite being written in Spin, my test-signal code could both generate and format the frames at the required rate.
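As an illustration of what preformatting involves, here is a Python sketch that packs one sample into a 32-bit word using the standard IEC 60958 subframe layout: a preamble slot in bits 0 to 3, the 24-bit sample in bits 4 to 27, then the validity, user, channel-status and parity bits. Whether the repository's buffers use exactly this layout is an assumption, and the function name is invented; the preamble slot is left empty here because the preamble is not BMC-coded.

```python
def format_subframe(sample, validity=0, user=0, status=0):
    """Pack a signed 24-bit sample into a 32-bit S/PDIF subframe word.

    Bits 0-3 (the preamble slot) are left as zero.
    """
    if not -(1 << 23) <= sample < (1 << 23):
        raise ValueError("sample does not fit in 24 bits")
    word = (sample & 0xFFFFFF) << 4     # two's-complement sample, LSB at bit 4
    word |= (validity & 1) << 28
    word |= (user & 1) << 29
    word |= (status & 1) << 30
    parity = bin(word >> 4).count("1") & 1
    word |= parity << 31                # even parity over bits 4 to 31
    return word


# A 16-bit sample is shifted up to occupy the top of the 24-bit field.
sample_16 = 0x1234
print(f"{format_subframe(sample_16 << 8):08X}")
```

The parity bit is chosen so that bits 4 to 31 always contain an even number of ones, which keeps the line level at the start of each preamble predictable.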
That finally did it: I had a streaming S/PDIF solution that would encode and output 24/96 audio via one or more pins – all on two cogs of a 64 MHz Propeller chip. Although the D-70 only provides 16-bit samples, I implemented 24-bit support because I could, and in case it was more useful to others.
I acquired the hardware needed to interface to the D-70’s mainboard, and succeeded in running the QuickStart board at 64 MHz by doubling the D-70’s master clock, but the project stumbled in the integration stage because I couldn’t get a clean signal from the audio bus. This was presumably because of improper termination, and since my makeshift oscilloscope didn’t have enough bandwidth to analyse a 128 kHz signal, the project went on hiatus pending access to a real oscilloscope.
Having produced a high-performance generic S/PDIF solution, though, I decided today that it was time to get the work so far online regardless. The repository is now on GitHub, licensed under an ncurses (MIT-style) licence.
I hope somebody finds this useful, and there’s plenty of reference material on the unfinished integration in there for anyone interested in equipping their own D-70 with digital output – please let me know if you manage it!