Info

http://zach.e-worm.club

employ me so I can continue to eat: http://zach.digital

{

title: Cryptography is not Magic

description: Crypto is often treated like a black art only anointed

experts can hope to wield safely. It's not. There are

laws, and we can learn them.

}

July 2020

Don't roll your own crypto.

Broken crypto often looks like it works.

The best way to build crypto is to learn how to break it.

Use existing, vetted, well reviewed libraries.

Leave it to the experts.

*Infosec people and their followers.*

The running theme seems to be that cryptography is a kind of Dark Magic,

best left to anointed High Priests. Us mere Mortals cannot hope to

wield it safely without first becoming one of those vaunted Experts — a

futile endeavour for those of us who know their place.

While being a good first order approximation,

that kind of gate keeping is problematic on an number of levels. First,

some people back in the real world have needs that current crypto

systems don't always fill. Insulting them with "you don't know what you

are doing" does not help. Second, it has a couple perverse effects:

- It stops many reasonable people from studying the subject.
- It causes less reasonable people to charge ahead anyway.
- If cryptography is an Art, we could make things up as we go along. And invent "unbreakable" ciphers that are anything but.

I would like to advocate a different approach. Cryptography has rules,

which are both simpler and more fundamental than we might think. We

could tell people *what* they don't know, and give them some pointers:

introductory books or courses, or specific words and

concepts. Here, I will outline what you need to look for before you

roll your own crypto system. I've identified 3 broad categories:

implementing crypto, designing protocols, and designing primitives.

*Note: this is an opinion piece, so let my clarify where I came from. I
program professionally since 2007, and started to teach myself
cryptography 4 years ago. I wrote a crypto library, whose
recent audit uncovered no major flaw.*

Perhaps surprisingly, implementing cryptographic primitives & protocols

requires little cryptographic knowledge. *Choosing* what to implement

is more delicate (you need to know the state of the art), but once

you've made your choice, your only worries are side channels and

correctness.

Side channels have one rule: don't let any data flow from secrets to

that channel. Interesting side channels include timings, energy

consumption, electromagnetic emissions… Most can be ignored most of the

time (energy consumption for instance requires physical access), except

timings. Never ignore timings, they're part of most threat models.

On most CPUs, the timing side channel can be eliminated by removing all

secret dependent branches and all secret dependent indices. Some CPUs

also have variable time arithmetic operations. Watch out for

multiplications and shifts by variable amounts in particular.

Also be careful around compilers and interpreters. No mainstream

language specifies how to make constant time code, so your tools could

insert secret dependent branches where the source code had none. High

level languages in particular tend to have variable time arithmetic. In

practice, low level languages like C are fairly reasonable, but I've

seen exceptions.

Note that some primitives are more amenable to constant time software

implementations than others: Chacha20 tends to be naturally immune to

timing attacks on most platforms, while AES requires special care if you

don't have hardware support.

Then we have correctness. The slightest error may throw cryptographic

guarantees out the window, so we cannot tolerate errors. Your code must

be bug free, period. It's not easy, but it *is* simple: it's all about

*tests* and *proofs*.

Ciphers and hashes are fairly easy to test. At their core, they are

about taking an input, and mangle it so thoroughly that the slightest

change in the input will completely garble the output. In practice, the

slightest programming error tend to change the end result completely.

All you have to do is compare the output of your primitive with a

reference implementation or test vectors. Ideally, do that for all

possible input & output lengths. *Including zero (empty inputs)*.

Practically, you can stop at a couple iterations of your biggest

internal loop, to be sure you hit all data paths.

If you have an init-update-final interface, try all possible cut-off

points in the updates to make sure you get the same results. I've

caught several bugs that way.

Modular arithmetic (polynomial hashes, elliptic curves…) is more

delicate. Now we're adding or multiplying big numbers (130 bits, 255

bits…) that do not fit in a single machine register. Many libraries

don't run in constant time, and custom code is often faster. So you

end up dividing those big numbers in several *limbs*, which can fit in a

single machine register.

*(You can think of a limb as a huge digit: instead of working in base
10, we can work in base 2³², and store the corresponding "digits" in
32-bit registers.)*

Once you've applied the precautions used for ciphers and hashes, the

biggest trap is *limb overflow*. Not a problem with naive

implementations, but there is one dangerous trick that often drastically

improves performance: *delayed carry propagation*. Make your limbs

smaller than the range of the registers they are stored into (26 bits

limbs with 32 bits registers for instance), so that they can temporarily

exceed their normal range.

With this trick, some overflow bugs happen rarely enough that random

tests will not catch them. There are two ways to address the problem:

either don't delay carry propagation to begin with (it's safest), or

write a mathematical proof that shows your algorithm never triggers an

overflow. Ideally you'd have a machine verify that proof.

Elliptic curves have their own traps. If you have explicit formulas

telling you what arithmetic operation to perform in what order, no

problem (beyond comparing to a reference implementation and limb

overflow). If you want to be clever, make sure you know *exactly* what

you are doing. If you see some mathematical term you haven't seen

before (like "birational equivalence"), don't guess, *look it up*.

I broke that rule once, and it wasn't pretty.

Also note that Elliptic curves often require conditional selection,

swapping, or even lookups. There are techniques to do that in constant

time. Use them. Failing to do so has enabled actual exploits.

Constructions and protocols are generally much simpler to implement than

their underlying primitives. The same precautions applies, with a

couple differences:

- Don't forget the checks. If the protocol says you must abort upon a
certain condition, you
*absolutely must*. Failing to check something almost certainly destroys the security properties of the protocol or construction you are implementing. - Try the protocol (or construction) with corrupted inputs. Corrupt
every single byte, perhaps every single
*bit*, one by one, and test whether this causes the protocol to abort. If it doesn't, you may have forgotten a check. - That's a corollary from "no conditional branches", but it bears repeating: when you compare buffers, use constant time code. Crypto libraries generally provide the comparison routines you need.

That's where things become interesting. Where implementing crypto was

mostly tedious, designing protocols is *delicate*. One is not

necessarily harder than the other, but they're very different.

This is no longer about faithfully implementing an unambiguous

specification. This is about addressing a *threat model*. This means

being aware of what you're up against, addressing that threat, and

proving that you have done so. On top of that, the protocol should

allow, even encourage, good APIs that are hard to misuse.

We need to realistically and *precisely* define the capabilities of the

adversary. A typical threat model is the untrusted network, where the

adversary can basically do anything: watch messages, intercept

messages, forge messages, replay messages…

Some adversaries can also steal your keys. Maybe the police comes with

a warrant. Maybe your computer is being hacked. You can limit the

damage with forward secrecy (messages sent before the key was stolen

can't be decrypted), and *key compromise impersonation resistance*

(where stealing your keys don't allow the attacker to impersonate

*others* when they are talking *to you*).

Yet another threat is meta data analysis. Maybe you want to hide your

identity to outsiders. Some protocols achieve some measure of

*anonymity*, where it is hard for the attacker to determine who is

talking to whom.

Or maybe you fear *traffic analysis*, where the size and timings of the

messages reveal too much about the content. You could mitigate that by

padding your messages to some standard size, or even send a constant

stream of data, and fill the bandwidth you don't use with noise.

Once you know your threat model, you need to demonstrate that your

protocol addresses it. For instance, when the adversary is an untrusted

network, you need IND-CCA2 (indistinguishability under adaptive

chosen ciphertext attack). It's a formalisation of what you need to

combat active adversaries (man in the middle). It's defined thus:

- We allow the adversary to ask an
*Oracle*to encrypt or decrypt anything they please. - At any point, the adversary can issue a challenge, where they provide
*two*plaintexts, and the oracle responds with only*one*ciphertext. - IND-CCA2 is achieved if the adversary cannot tell which plaintext the ciphertext is from, without asking for a decryption of that particular ciphertext.

Let's try an example with *passive* adversaries, for which we only need

IND-CPA: independence under chosen *plaintext* attack. It's the same as

IND-CCA2, except we don't have the decryption oracle.

Alice needs to send messages to Bob, using a shared secret key `K0`

, and

a stream cipher `Stream`

that takes a key as input, and outputs an

arbitrarily long key stream. (Real stream ciphers also have a

nonce, but we'll make do without one for the sake of the exercise.)

To send her messages, Alice will use *key erasure*, or *ratchet*:

K1, S1 = Stream(K0)

msg1 = plaintext XOR S1

K2, S2 = Stream(K1)

msg2 = plaintext XOR S2

K3, S3 = Stream(K2)

msg3 = plaintext XOR S3

...

To send a message, we first split the key stream in two parts: a new

encryption key, and a data stream. Note that `K1`

is the first bytes of

`Stream(K0)`

, and `S1`

is the rest, so they're independent from each

other. Same for `K2`

and `S2`

, etc. We can't break that one with the

encryption oracle alone (IND-CPA). Here's why:

- For any
`n`

, if`K(n)`

is an independent random string, then so are`K(n+1)`

and`S(n+1)`

, which come from`Stream(K(n))`

`K0`

is an independent random string.- Corollary:
`S1`

,`S2`

,`S3`

… are all independent random strings. As a whole, they act like a one time pad. - Conclusion: there is no relation whatsoever between different ciphertexts. To the attacker, they all look like independent random strings. When the attacker submits its challenge, the answer will be just as random, and there will be no way to tell which plaintext it came from.

Not the most rigorous proof. That will do for this simple construction

under this simple threat model, but a more complex protocol may not

tolerate as much hand waving. And if you want a machine to check your

proof (a good thing to strive for), it'll have to be perfect.

Another limitation is my choice of the *symbolic model*: I am assuming

the stream cipher is unbreakable, and will always be indistinguishable

from a true random number generator to anyone who doesn't know the key.

Works well enough for most purposes, but keep in mind that real

cryptographic primitives aren't perfect: they can't generate a genuinely

infinite stream without repeating themselves, and a finite amount of

computing work is enough to break them. Those limits are high enough to

be ignored in many cases, but depending on the primitive you use, and

what you use them for, they may not be.

The biggest limitation of all of course is that we only proved IND-CPA

here. This construction is only immune to *passive* attackers, that

only listen and never talk. What we really want is to protect against

*active* attackers, that could perform a full Man in the Middle attack.

A chosen ciphertext attack easily destroys our construction:

- Send the challenge with the plaintexts
`p1`

and`p2`

. Get back the ciphertext`c`

. - Generate some non-null string
`r`

. - Ask the oracle to decrypt
`c XOR r`

, get the plaintext`p`

back. - Compute
`p XOR r`

. It will be equal to`p1`

or`p2`

, depending on which it came from.

We basically tricked the oracle into decrypting the challenge

ciphertext, without breaking the rules: we didn't *directly* asked to

decrypt `c`

, we only took advantage of ciphertext malleability to get

around the restriction. To avoid this, we need authentication,

which I'll leave as an exercise to the reader for brevity's sake.

The proper way to achieve IND-CCA2 is to first encrypt the plaintext,

*then* compute an authentication tag over the ciphertext.

Encrypt-then-mac. We could instead authenticate the *plaintext*, and

only then encrypt the whole thing (including the authentication

tag). Mac-then-encrypt. It can be done correctly, and we can even prove

IND-CCA2. It is also a bad idea.

Ideally, we want the recipient to verify first. This has some

advantages:

- No time lost decrypting corrupted messages.
- No memory allocated or overwritten for corrupted messages.
- No temptation to provide a detached API in two steps, where the users could forget the authentication step.

What your users need is *authenticated encryption*, where decryption

either successfully decrypts the message, or does nothing at all. That's

the only way you can be sure no information will leak because of

corrupted messages.

In fact, "success or nothing" is a rule I try generalise to all my APIs:

check the inputs, *then* act on them. If the inputs are wrong, don't do

anything, just return an error to the caller.

That's the tough one. There's generally no way to prove that a given

primitive is indeed as secure as it claims to be. There's an obvious

tension between performance and security, and designing something as

secure *and* as fast as the state of the art is not trivial to say the

least.

I have also no experience in the domain. I can only give a few

pointers, that will hopefully make you realise what it means to stare

into the abyss.

All ciphers and hashes I know are designed around a core *permutation*:

mangling the hell out of a block of data to produce a pseudo-random

result:

**Hashes**use that permutation to build a*compression function*, whose output is shorter than the sum of its inputs, then repeatedly use it to compress arbitrarily long messages into short digests.**Block ciphers**are like a compression function (with the key and a piece of data as input), except it's reversible if you have the key.**Stream ciphers**are something of a blend between hashes and block ciphers: they mix the key and some input (often a nonce and counter) to output a random piece of data, but they don't need to be reversible.**Password hashes**are their own beast. They're designed to require huge amounts of resources to make it harder to brute force passwords, so the underlying permutations tend to be very different.

Some permutations (Keccak, Xoodoo, Gimli…) are designed to rule them

all, but they don't have to. The most popular ciphers and hashes have

their own permutation, tailored just for them. Chacha20's permutation

for instance produces zero when the input is all zero. Not very random,

but we don't care, because the input is never zero: part of it is a

constant, that aims to "break symmetry".

Speaking of which: before you make a serious attempt at designing your

hash, cipher, or permutation, there are some terms and concepts you

probably should be deeply familiar with:

**Indistinguishability.**Basically what we ask of most ciphers and hashes: the output must be indistinguishable from random under some conditions (like not knowing the keys). The primitive is broken if we find a "distinguisher".**Memory hardness.**Good password hashes are optimised to require lots of resources. Our machines have lots of RAM, we want to use it so brute-force guesses are more expensive. The idea is to make honest verifications fast enough to be tolerable, and dishonest guesses as expensive as possible (ideally as expensive as a honest verification, even if the attacker can use custom hardware).**Cache hardness.**Our machines have a fast cache hierarchies. We could as well take advantage of it, make brute-force guesses even more expensive without slowing down honest attempts.**Rounds.**Permutations are generally about repeating a core permutation, the "round", several time to achieve security. Chacha20 for instance repeats its core loop 20 times. Each round is strictly speaking a permutation, but we don't call them that.**Security margin.**Cryptographic literature shows that increasing the number of rounds in a primitive makes it harder to break. So when we study a primitive, we try and break weaker versions, with fewer rounds. The security margin of a primitive is the difference between the minimum number of rounds required to thwart all known attacks, and the actual number of rounds. Chacha20 for instance has a margin of 12 rounds: as of 2020, 8 rounds stop all known attacks, and we have 20.**Confusion.**It's how many bits of input you need to define a bit of output. If I wave my hands a lot, the higher the confusion, the more effective a single round is at contributing to security.**Diffusion.**It's the ability of a permutation round to propagate a change from the input to the output. Like confusion, higher diffusion increases the security contribution of a single round (if I wave my hands a lot).**Symmetry.**I'm not familiar with the term myself. I believe it refers to algebraic properties of a permutation, which can help cryptanalysis in some conditions. It must be "broken", or otherwise be rendered inexploitable to get a secure result. Or so I gathered.**Linear cryptanalysis.**A particular way of breaking symmetric primitives. I know nothing about it.**Differential cryptanalysis.**A particular way of breaking symmetric primitives. I know nothing about it.**Cryptanalysis.**There are other methods. I've heard of "rotational" which seems to help with RAX (Rotate Add Xor) designs, "integral", and I'm sure there are others.*Pretty much every paper presenting a new cipher these days show attempts at cryptanalysis, and how many rounds were broken. Can't skip this if we are to have any hope of being secure.*

Those are an incomplete list of terms I stumbled upon. My hope is by

the time you get familiar with those, you'll have a better idea of what

*else* you should study.

The primary use case of of polynomial hashes is message authentication

codes. They're not as easy to use as regular hashes, but they're very

fast, and their security properties are provable. If you're going to

invent one, you need to produce a proper security reduction (a

mathematical proof of how secure it is, given some assumptions).

Also note that Poly1305 and GHASH already cover a lot of ground.

Poly1305 is designed to allow simple and efficient software

implementations, while GHASH is very hardware friendly. One of them may

already offer what you need.

The modern way to use the difficulty of discrete logarithm. Unlike

other primitives, I feel like elliptic curves are *found* more than they

are invented. Granted, they're not trivial. The maths involved are

definitely intimidating, and a single mistake could destroy security.

One way to find a curve is to target a security level, choose a prime

that matches that security level (generally twice as big), and follow

the SafeCurves recommendations. The first curve you find is almost

certainly the one you want.

If you want to play with extension fields instead of prime fields, I can

only point out that some extension fields are more efficient on tiny

embedded targets, and the security literature around binary extension

fields is less stable than it is for prime fields.

I also strongly advise sticking to Montgomery or (twisted) Edwards

curves. They are relatively simple, and they're *fast*. They don't

have a prime order, but that can be worked around, or even solved

entirely. Avoid short Weierstraß curves. They're not

broken, but they are slower and harder to implement securely.

I know even less than Jon Snow. Can't help you, sorry.

I won't sugar coat it, rolling your own crypto is not easy. Mistakes

are easy to make, and the stakes are often high — getting it wrong can

even get people killed. Don't rush it, and if you can, seek guidance

and feedback. Some communities are very welcoming.

Still, we *can* teach ourselves. The rules may not be easy to follow,

but they are simple:

- Make it bug free. Test, test, test. Prove what you can. Be extra-rigorous around constructions and protocols.
- Know your side channels. Make it at least constant time.
- If you're inventing something, know the state of the art, including cryptanalysis. Even if what you want does not exist, what does may help you design it.

*Discuss on
r/crypto,
r/programming,
Lobsters,
Hacker News.*

···