Fastest phone crypto

In 2001 I designed the fastest RSA cryptography for phones, specifically for the ARM7TDMI processor. It took 98 ms to do 1024-bit crypto, while the closest competitor took 150 ms.

How did I achieve that?

All the information is public, so I can tell you this. To get to second place, one needs very good programmers, of which we were two, and the methods from the book:

Handbook of Applied Cryptography
Alfred J. Menezes
Paul C. van Oorschot
Scott A. Vanstone

Since RSA public key encryption depends on modular arithmetic, such as modular multiplication and exponentiation, this arithmetic needs to be fast. The book describes methods such as Barrett reduction for speeding up modular exponentiation, along with other fast methods.
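To give an idea of what Barrett reduction does, here is a minimal sketch in Python. It follows the general shape of the algorithm in the book, not the ARM code from 2001. The function names and the choice of 32-bit words are assumptions made for illustration. The trick is to precompute one value, mu, for the modulus, so that every later reduction needs only multiplications and shifts instead of a division.

```python
WORD_BITS = 32  # assumed word size, matching a 32-bit processor


def barrett_setup(m, k):
    """Precompute mu = floor(b^(2k) / m), where b = 2^32 and m has k words."""
    return (1 << (2 * WORD_BITS * k)) // m


def barrett_reduce(x, m, mu, k):
    """Return x mod m for 0 <= x < b^(2k), without dividing by m."""
    # Estimate the quotient with shifts and one multiplication by mu.
    q = ((x >> (WORD_BITS * (k - 1))) * mu) >> (WORD_BITS * (k + 1))
    r = x - q * m
    # The estimate is at most 2 too small, so a couple of subtractions finish.
    while r >= m:
        r -= m
    return r


def modexp(base, exp, m):
    """Square-and-multiply exponentiation, reducing with Barrett each step.

    Assumes m > 1.
    """
    k = (m.bit_length() + WORD_BITS - 1) // WORD_BITS
    mu = barrett_setup(m, k)
    result = 1
    base %= m
    while exp:
        if exp & 1:
            result = barrett_reduce(result * base, m, mu, k)
        base = barrett_reduce(base * base, m, mu, k)
        exp >>= 1
    return result
```

The result can be checked against Python's built-in pow(base, exp, m). Note that almost all the time still goes into the big multiplications, both in the squarings and inside the reduction itself.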

But to get to first place, one needs:

The secret ingredient

Faster multiplications.

There are ways to speed up very long multiplications, such as Karatsuba multiplication or fast Fourier convolution methods, but 512 bits is a little short for these excellent methods. So what I did was to optimize the use of registers, minimizing memory accesses and keeping the processor busy with multiplications instead of wasting time on loading and storing results.

A way of doing this is described in my thesis, section 7.3, "Register use".
The idea is to change the order of the smaller operations that make up a big multiplication: the small multiplications and additions. By reordering them, one can keep as many intermediate results as possible in registers, and so minimize memory accesses.
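A small Python sketch can show what reordering means here. It is not the scheme from the thesis, only the well-known textbook contrast between row-wise and column-wise multiplication, and all names in it are made up for the illustration. Both functions compute the same small products. The first rewrites a result word for every small product, while the second sums a whole column in an accumulator and writes each result word once.

```python
MASK = 0xFFFFFFFF  # numbers are lists of 32-bit words, least significant first


def to_limbs(x, n):
    """Split a non-negative integer into n 32-bit words."""
    return [(x >> (32 * i)) & MASK for i in range(n)]


def from_limbs(limbs):
    """Join 32-bit words back into one integer."""
    return sum(w << (32 * i) for i, w in enumerate(limbs))


def mul_rows(a, b):
    """Row by row: every small product reads and rewrites a result word."""
    n = len(a)
    r = [0] * (2 * n)
    for i in range(n):
        carry = 0
        for j in range(n):
            t = a[i] * b[j] + r[i + j] + carry
            r[i + j] = t & MASK  # one store per small product
            carry = t >> 32
        r[i + n] = carry
    return r


def mul_columns(a, b):
    """Column by column: sum each column in an accumulator, store once."""
    n = len(a)
    r = [0] * (2 * n)
    acc = 0  # on real hardware this would have to live in a few registers
    for k in range(2 * n - 1):
        for i in range(max(0, k - n + 1), min(k, n - 1) + 1):
            acc += a[i] * b[k - i]
        r[k] = acc & MASK  # one store per result word
        acc >>= 32
    r[2 * n - 1] = acc
    return r
```

For two 512-bit numbers split into 16 words each, both functions do 256 small multiplications, but the row version stores 272 result words and the column version only 32. Python hides the registers, so this shows only the ordering, not the speed gain.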

Aftermath

I thought this would be an excellent product to sell, since the phone market was very big and there ought to be a lot to earn there, but my boss, J. Lyseggen, seemed oblivious as usual. He only talked about artificial deadlines and ignored all our arguments.

After a while the phones started using cryptographic chips for this, and they still do in 2016, even though the processors are more than fast enough. The chips have developed into digital fortresses protecting phones even against the FBI. Apple especially has done this thoroughly in the iPhone. Perhaps the NSA has a backdoor in the crypto chips.
But ARM processors are used now more than ever, especially in small networked devices such as the Raspberry Pi, and all of them would benefit from faster crypto, which I could make. But everyone uses slower open source code there, and making fast crypto again is a lot of work.