mid's site

you're logged in as loser

🌍 Go Paperful

🔗 Subscribe via RSS

The sound of Brainfuck running the ChaCha20 cipher

ChaCha20 is an electronic cipher thought to be secure against advanced adversaries. It computes a pseudo-random bitstream, the keystream, by operating on a 4x4 matrix of 32-bit integers.

The core consists of the quarter-round function, which takes in four variables and kneads them like so:

a = a plus b, d = d xor a, d = d rol 16
c = c plus d, b = b xor c, b = b rol 12
a = a plus b, d = d xor a, d = d rol 8 
c = c plus d, b = b xor c, b = b rol 7 

This quarter-round function is applied to each column of the matrix, giving you one round. The second round applies the same quarter-round function to each diagonal of the matrix. Subsequent rounds alternate between columns and diagonals.

Initially, this matrix holds the key, block counter, nonce (one-time number) and a cipher-wide constant. In ChaCha20, 20 such rounds are applied to the matrix. Because all of these rounds are reversible, the last step adds the initial matrix to the final. The matrix at the end comprises the keystream. This keystream is then XOR'd with the plaintext, producing the ciphertext.

Brainfuck needs no introduction. It's minimalism and immature name has made it known to everyone. The language is composed of a tape for memory, input & output streams and eight instructions to control them.

  • < and > scroll through the tape
  • + and - in/decrement the value at the tape head
  • . and , outputs / reads into the value at the tape head
  • [ and ] loop the code inbetween while the value at the tape head is non-zero

But what about combining the two? After writing over 140K Brainfuck instructions, I present the Brainfuck ChaCha encryptor. I recommend to start it now. The source is available here.

View

Speed

0 instructions executed

a
b
c
d

Hit start to begin

The core requires 85 octets of memory: 1 as a double-round counter, 20 for temporary data, and 64 for the state matrix. Actual encryption would require a bit more for the initial matrix.

The features of Brainfuck are enough to allow for turing-completeness, a property, which, poorly stated, means that anything computable can be found, given enough time and memory. In practice, however, turing-completeness brings not much. You do not need turing-completeness to write a ChaCha20 implementation, a pathtracer, a keyboard controller. We almost always deal with such practically finite problems.

On the other hand, turing-completeness brings with it bugs in the form of the Halting Problem: unpredictable execution, memory use, lackluster guarantees to compilers, etc. We do not need graphics accelerators that can crash.

Simply put, turing-completeness does not imply practicality. You'll see what I mean later.

ChaCha20 and its predecessor Salsa20 are what are known as ARX ciphers, which is to say that they use only addition, rotation and XOR for operation. This naturally makes ChaCha20 software-friendly, unlike AES which needs hardware acceleration to make it competitive speed-wise. Brainfuck, though, only has byte-sized increments and decrements, making it friendly to neither. I still went with ChaCha20 for its ubiquity, and because a cipher with only addition has very little security. As you will see, one full round of ChaCha20 already makes it random to the human eye, but we are only confident in it's security from 8 rounds onward.

Despite Brainfuck having increments, addition is the slowest part. I am to blame, because the algorithm is mine own. It sums the most significant octets, and then sums the next most significant with one layer of overflow, and then the third octets with two layers of overflow, and so on. Each increment needs an overflow test so that it can be carried to the next octet, and the particular test I chose to use needs copying the octet twice each time (that's the "pew.. pew pew pew" sound you'll sometimes hear). I'm sure it can be done better, but I tried to minimize outside influence.

Furthermore, for a variable input Brainfuck wildly varies in execution time. Consider the code [-]. If the tape head points at a non-zero value, we hit - and decrement said value. After that, we hit ], making it jump back to the [ if the value remains non-zero. This loops until we get zero, but it takes longer to get to 0 from 255 than from 1. For this same reason copying larger values is slower, too, even though all these values take the same amount of memory. An Brainfuck implementation might notice some patterns and optimize them, but there nevertheless remains a catch.

Those astute in InfoSec or cryptography will know what I mean. Let me confirm your hunch: basically nothing is constant-time. This theoretically makes the implementation rife with timing-based informational leaks. Dare I say there are so many, that absolutely nobody will bother with analysis? I'm joking, but it's an interesting thought. I'm no cryptographer. Anyway, such a flaw in general makes online use discouraged, like for securing the Brainfuck static HTTP server. Don't blame me for that.

In ChaCha, half of the rounds operate on the columns and half operate on the diagonals of the matrix. Here you are shown only one full round, because it would take too long otherwise. Beef needs a full 0.3 seconds - at full speed - to go through one round on my computer. Individual operations were written in a self-contained manner, so there are quite a bit of redundant copies.

The cipher core is then used in a utility program, sest.bf, that initially takes a key as input, and then continually takes in plaintext while outputting the ciphertext. Another catch is that because cells can only store 0..255, there is no pleasant way for the , instruction, which reads from the input stream, to signify EoF. Some implementations give zero, which works fine for readable text but not for binary data. Some give -1, which has other pros and cons. Some don't update the cell at all. In the end I made the utility program take in input continually, until explicitly killed. ChaCha20 does not require some "termination signal", so I decided that was the best idea.

This utility program can be standalone, but IO becomes rather unpleasant compared to the traditional Unix terminal. For that, there is also a bash script sest.bash that wraps the utility program.

A slight disclaimer is that the whole program does only eight rounds, not 20. The difference in code is minute as the number is determined by one tape cell, but should you ever use this you ought to know.

How long did it take to write this?

The first quarter-round took about two days, but once I realized that eight of the twelve statements in each quarter-round were the exact same, I managed to copy & paste a lot of it. After that I would be able to do up to one QR per day. Even so, I didn't to prevent any mental fatigue. After all of that came the bug-fixing which took about two days. So I'd say over a week, excluding breaks.

What is that sound?

That is a sine wave, the frequency of which, upon running either a + or - instruction, is set to the new value. Specifically, (1 + val / 255) * 220. This makes the wave go between the A3 and A4 notes. I considered putting in another wave that would flash upon scrolling, but decided not to. It'd be too much of a clutter.

Why did I make this?

I have a belief, that to a good worksman, the quality of his work should not depend on his choice of tools. I was recently challenged on this belief.

Yes, in other words I made this to prove someone wrong.

There's a lot of repetition in the code. Why didn't you use a Brainfuck precompiler with macro support?

If I had, I wouldn't have written a Brainfuck program, as was required. I would've perhaps made one, but I wouldn't have been the one to write it. That would end up being the job of the precompiler. If I had used a precompiler, would this have been as impressive? How much of it would've been my work? If I had written this in C and used C2BF, would this be worth even five seconds of your time, or five units of anything?

Let me exaggarate my point: imagine houses A and B, which look like this, respectively:

House A was made by a master of his trade, who placed each brick with the utmost care and patience, his sweat now within the walls, solidifying his labor and presence for centuries to come. He has fulfilled his passion, and via his humanly ways has given a family their new home. House B was made by a "House Artist", who plopped his AI robot on the ground before hitting the Start button, and walking away to get a coffee or spend the rest of his time in his smartphone.

Tell me, which is the better house? And, coming back to my last point, who is the better builder?

If you ask me, the man responsible for house B didn't build anything. He is at best a secretary, at worst a disposable nobody. Would he kill himself, no one on the planet will notice. Therefore, house B is invaluable. As in it's shit.

FYI, I still have doubts for using Ctrl+C & Ctrl+V, so give me a break.