Salsa20
The Salsa quarter-round function. Four parallel copies make a round. | |
General | |
---|---|
Designers | Daniel J. Bernstein |
First published | 2007 (designed 2005)[1] |
Related to | Rumba20, ChaCha |
Certification | eSTREAM portfolio |
Cipher detail | |
Key sizes | 256 bits |
State size | 512 bits |
Structure | ARX |
Rounds | 20 |
Speed | 3.91 cpb on an Intel Core 2 Duo[2] |
Best public cryptanalysis | |
2008 cryptanalysis breaks 8 out of 20 rounds to recover the 256-bit secret key in 2251 operations, using 231 keystream pairs.[3] |
Salsa20 is a stream cipher submitted to eSTREAM by Daniel J. Bernstein. It is built on a pseudorandom function based on add-rotate-xor (ARX) operations — 32-bit addition, bitwise addition (XOR) and rotation operations. Salsa20 maps a 256-bit key, a 64-bit nonce, and a 64-bit stream position to a 512-bit block of the key stream (a version with a 128-bit key also exists). This gives Salsa20 the unusual advantage that the user can efficiently seek to any position in the key stream in constant time. It offers speeds of around 4–14 cycles per byte in software on modern x86 processors,[4] and reasonable hardware performance. It is not patented, and Bernstein has written several public domain implementations optimized for common architectures.[5]
A related cipher, ChaCha, which has similar features but a different round function, was published by Bernstein in 2008.
Structure
Internally, the cipher uses bitwise addition ⊕ (exclusive OR), 32-bit addition mod 232 ⊞, and constant-distance rotation operations (<<<) on an internal state of sixteen 32-bit words. Using only add-rotate-xor operations avoids the possibility of timing attacks in software implementations. The basic Salsa20 round primitive R(a,b,c,k)
is
b ⊕= (a ⊞ c) <<< k;
The initial state is made up of 8 words of key, 2 words of stream position, 2 words of nonce (essentially additional stream position bits), and 4 fixed words. Then the array is mixed as follows.
A quarter-round takes a four-word input and produces a four-word output. The internal 16-word state is arranged as a 4x4 matrix; even-numbered rounds apply the quarter-round operation to each of the four rows, while odd-numbered rounds apply the quarter-round operation to each of the four columns. Two consecutive rounds (a row-round and column-round) together are called a double-round.
A more precise specification appears below as pseudocode, although in this form the row/column pattern is more difficult to see. ⊞ is addition modulo 232, <<< is the left-rotate operation, and ⊕ is exclusive-or. x ⊕= y
is an abbreviation for x = x ⊕ y
.
x[ 4] ⊕= (x[ 0] ⊞ x[12])<<<7; x[ 9] ⊕= (x[ 5] ⊞ x[ 1])<<<7; x[14] ⊕= (x[10] ⊞ x[ 6])<<<7; x[ 3] ⊕= (x[15] ⊞ x[11])<<<7; x[ 8] ⊕= (x[ 4] ⊞ x[ 0])<<<9; x[13] ⊕= (x[ 9] ⊞ x[ 5])<<<9; x[ 2] ⊕= (x[14] ⊞ x[10])<<<9; x[ 7] ⊕= (x[ 3] ⊞ x[15])<<<9; x[12] ⊕= (x[ 8] ⊞ x[ 4])<<<13; x[ 1] ⊕= (x[13] ⊞ x[ 9])<<<13; x[ 6] ⊕= (x[ 2] ⊞ x[14])<<<13; x[11] ⊕= (x[ 7] ⊞ x[ 3])<<<13; x[ 0] ⊕= (x[12] ⊞ x[ 8])<<<18; x[ 5] ⊕= (x[ 1] ⊞ x[13])<<<18; x[10] ⊕= (x[ 6] ⊞ x[ 2])<<<18; x[15] ⊕= (x[11] ⊞ x[ 7])<<<18; x[ 1] ⊕= (x[ 0] ⊞ x[ 3])<<<7; x[ 6] ⊕= (x[ 5] ⊞ x[ 4])<<<7; x[11] ⊕= (x[10] ⊞ x[ 9])<<<7; x[12] ⊕= (x[15] ⊞ x[14])<<<7; x[ 2] ⊕= (x[ 1] ⊞ x[ 0])<<<9; x[ 7] ⊕= (x[ 6] ⊞ x[ 5])<<<9; x[ 8] ⊕= (x[11] ⊞ x[10])<<<9; x[13] ⊕= (x[12] ⊞ x[15])<<<9; x[ 3] ⊕= (x[ 2] ⊞ x[ 1])<<<13; x[ 4] ⊕= (x[ 7] ⊞ x[ 6])<<<13; x[ 9] ⊕= (x[ 8] ⊞ x[11])<<<13; x[14] ⊕= (x[13] ⊞ x[12])<<<13; x[ 0] ⊕= (x[ 3] ⊞ x[ 2])<<<18; x[ 5] ⊕= (x[ 4] ⊞ x[ 7])<<<18; x[10] ⊕= (x[ 9] ⊞ x[ 8])<<<18; x[15] ⊕= (x[14] ⊞ x[13])<<<18;
Finally, the mixed array is added, word by word, to the original array to obtain its 64-byte key stream block. Salsa20 performs 20 rounds of mixing on its input.[1] However, reduced round variants Salsa20/8 and Salsa20/12 using 8 and 12 rounds respectively have also been introduced. These variants were introduced to complement the original Salsa20, not to replace it, and perform even better[note 1] in the eSTREAM benchmarks than Salsa20, though with a correspondingly lower security margin.
eSTREAM selection
Salsa20 has been selected as a Phase 3 design for Profile 1 (software) by the eSTREAM project, receiving the highest weighted voting score of any Profile 1 algorithm at the end of Phase 2.[6] Salsa20 had previously been selected as Phase 2 Focus design for Profile 1 (software) and as a Phase 2 design for Profile 2 (hardware) by the eSTREAM project,[7] but was not advanced to Phase 3 for Profile 2 because eSTREAM felt that it was probably not a good candidate for extremely resource constrained hardware environments.[8]
Cryptanalysis
As of 2015, there are no published attacks on Salsa20/12 or the full Salsa20/20; the best attack known[3] breaks 8 of the 12 or 20 rounds.
In 2005, Paul Crowley reported an attack on Salsa20/5 with an estimated time complexity of 2165, and won Bernstein's US$1000 prize for "most interesting Salsa20 cryptanalysis".[9] This attack, and all subsequent attacks are based on truncated differential cryptanalysis. In 2006, Fischer, Meier, Berbain, Biasse, and Robshaw reported an attack on Salsa20/6 with estimated time complexity of 2177, and a related-key attack on Salsa20/7 with estimated time complexity of 2217.[10]
In 2007, Tsunoo et al. announced a cryptanalysis of Salsa20 which breaks 8 out of 20 rounds to recover the 256-bit secret key in 2255 operations, using 211.37 keystream pairs.[11] However, this attack does not seem to be comparative with the brute force attack.
In 2008, Aumasson, Fischer, Khazaei, Meier, and Rechberger reported a cryptanalytic attack against Salsa20/7 with a time complexity of 2153, and they reported the first attack against Salsa20/8 with an estimated time complexity of 2251. This attack makes use of the new concept of probabilistic neutral key bits for probabilistic detection of a truncated differential. The attack can be adapted to break Salsa20/7 with a 128-bit key.[3]
In 2012 the attack by Aumasson et al. was improved by Shi et al. against Salsa20/7 (128-bit key) to a time complexity of 2109 and Salsa20/8 (256-bit key) to 2250.[12]
In 2013, Mouha and Preneel published a proof[13] that 15 rounds of Salsa20 was 128-bit secure against differential cryptanalysis. (Specifically, it has no differential characteristic with higher probability than 2−130, so differential cryptanalysis would be more difficult than 128-bit key exhaustion.)
ChaCha variant
In 2008, Bernstein published the closely related "ChaCha" family of ciphers, which aim to increase the diffusion per round while achieving the same or slightly better performance.[14] The Aumasson et al. paper also attacks ChaCha, achieving one round fewer: for 256 bits ChaCha6 with complexity 2139 and ChaCha7 with complexity 2248. 128 bits ChaCha6 within 2107, but claims that the attack fails to break 128 bits ChaCha7.[3]
ChaCha replaces the basic Salsa20 round primitive R(a,b,c,k)
b ⊕= (a ⊞ c) <<< k;
with the modified computation:
b ⊞= c; a ⊕= b; a <<<= k;
The rotation amounts are also updated. A full quarter-round, QR (a,b,c,d)
becomes:
a ⊞= b; d ⊕= a; d <<<= 16; c ⊞= d; b ⊕= c; b <<<= 12; a ⊞= b; d ⊕= a; d <<<= 8; c ⊞= d; b ⊕= c; b <<<= 7;
In addition to being more efficient on 2-operand instruction sets (like x86), this updates each word twice per quarter-round.
The fact that two of the rotates are multiple of 8 allows some optimization.[15] Additionally, the input formatting is rearranged to support an efficient SSE implementation optimization discovered for Salsa20. Rather than alternating rounds down columns and across rows, they are performed down columns and along diagonals.[16] So a double round in ChaCha is
QR (0, 4, 8, 12) QR (1, 5, 9, 13) QR (2, 6, 10, 14) QR (3, 7, 11, 15) QR (0, 5, 10, 15) QR (1, 6, 11, 12) QR (2, 7, 8, 13) QR (3, 4, 9, 14)
where the numbers are the indexes of the sixteen 32-bit state words. ChaCha20 uses 10 iterations of the double round.[17]
ChaCha is the basis of the BLAKE hash function, a finalist in the NIST hash function competition, and BLAKE2 successor tuned for even higher speed. It also defines a variant using sixteen 64-bit words (1024 bits of state), with correspondingly adjusted rotation constants.
ChaCha20 adoption
Google has selected ChaCha20 along with Bernstein's Poly1305 message authentication code as a replacement for RC4 in TLS, which is used for Internet security.[18] Google's implementation secures https (TLS/SSL) traffic between the Chrome browser on Android phones and Google's websites.[19]
Shortly after Google's adoption for TLS, both the ChaCha20 and Poly1305 algorithms were also used for a new chacha20-poly1305@openssh.com cipher in OpenSSH.[20][21] Subsequently, this made it possible for OpenSSH to avoid any dependency on OpenSSL, via a compile-time option.[22]
ChaCha20 is also used for the arc4random random number generator in OpenBSD[23] and NetBSD[24] operating systems, instead of the broken RC4, and in DragonFly BSD[25] for the CSPRNG subroutine of the kernel.[26][27] Starting from version 4.8, the Linux kernel uses the ChaCha20 algorithm to generate data for the nonblocking /dev/urandom device.[28][29][30]
An implementation reference for ChaCha20 has been published in RFC 7539. The IETF's implementation modified Bernstein's published algorithm by changing 64-bit nonce and 64-bit block counter to 96-bit nonce and 32-bit block counter[31]. The name was not changed when the algorithm was modified and it could be the source of confusion for developers. Because of the reduced block counter, maximum message length that can be safely encrypted by the IETF's variant is 232-1 blocks of 64 bytes (about 256 GB). For usages where this is not enough, such as file or disk encryption, RFC 7539 proposes using the original algorithm with 64-bit nonce.
Use of ChaCha20 in IKE and IPsec have been proposed for standardization in RFC 7634. Proposed standardization of its use in TLS is published as RFC 7905.
See also
- Speck – an add-rotate-xor cipher developed by the NSA
Notes
- ↑ Since the majority of the work consists of performing the repeated rounds, the number of rounds is inversely proportional to the performance. That is, halving the number of rounds roughly doubles the performance. Reduced-round variants are thus appreciably faster.
References
- 1 2 Daniel J. Bernstein (2007-12-24). "The Salsa20 family of stream ciphers" (PDF).
- ↑ Daniel J. Bernstein (2013-05-16). "Salsa 20 speed; Salsa20 software".
- 1 2 3 4 Jean-Philippe Aumasson, Simon Fischer, Shahram Khazaei, Willi Meier, and Christian Rechberger (2008-03-14). "New Features of Latin Dances" (PDF).
- ↑ Daniel J. Bernstein (2013-05-16). "Snuffle 2005: the Salsa20 encryption function".
- ↑ "Salsa20: Software speed". 2007-05-11.
- ↑ "The eSTREAM Project: End of Phase 2". eSTREAM. 2008-04-29.
- ↑ Hongjun Wu (2007-03-30). "eSTREAM PHASE 3: End of Phase 1". eSTREAM.
- ↑ "eSTREAM: Short Report on the End of the Second Phase" (PDF). eSTREAM. 2007-03-26.
- ↑ Paul Crowley (2006-02-09). "Truncated differential cryptanalysis of five rounds of Salsa20".
- ↑ Simon Fischer, Willi Meier, Côme Berbain, Jean-François Biasse , M. J. B. Robshaw (2006). "Non-randomness in eSTREAM Candidates Salsa20 and TSC-4". Indocrypt 2006. doi:10.1007/11941378_2.
- ↑ Yukiyasu Tsunoo, Teruo Saito, Hiroyasu Kubo, Tomoyasu Suzaki and Hiroki Nakashima (2007-01-02). "Differential Cryptanalysis of Salsa20/8" (PDF).
- ↑ Zhenqing Shi, Bin Zhang, Dengguo Feng, Wenling Wu (2012). "Improved Key Recovery Attacks on Reduced-Round Salsa20 and ChaCha". ICISC'12 Proceedings of the 15th international conference on Information Security and Cryptology: 337–351. ISBN 978-3-642-37681-8. doi:10.1007/978-3-642-37682-5_24.
- ↑ Nicky Mouha; Bart Preneel (2013). "Towards Finding Optimal Differential Characteristics for ARX: Application to Salsa20" (PDF).
- ↑ Daniel J. Bernstein (2008-04-25). "The ChaCha family of stream ciphers".
- ↑ Neves, Samuel (2009-10-07), Faster ChaCha implementations for Intel processors, retrieved 2016-09-07,
two of these constants are multiples of 8; this allows for a 1 instruction rotation in Core2 and later Intel CPUs using the pshufb instruction
- ↑ Bernstein, Daniel J. (2008-01-28), ChaCha, a variant of Salsa20 (PDF), p. 4, Document ID: 4027b5256e17b9796842e6d0f68b0b5e, retrieved 2016-09-07
- ↑ Y. Nir (Check Point), A. Langley (Google Inc.) (May 2015). "ChaCha20 and Poly1305 for IETF Protocols: RFC 7539".
- ↑ A. Langley, W. Chang, N. Mavrogiannopoulos, J. Strombergson, S. Josefsson (2015-12-16). "ChaCha20-Poly1305 Cipher Suites for Transport Layer Security (TLS)". Internet Draft.
- ↑ "Google Swaps Out Crypto Ciphers in OpenSSL". InfoSecurity. 2014-04-25. Retrieved 2016-09-07.
- ↑ Miller, Damien (2016-05-03). "ssh/PROTOCOL.chacha20poly1305". Super User's BSD Cross Reference: PROTOCOL.chacha20poly1305. Retrieved 2016-09-07.
- ↑ Murenin, Constantine A. (2013-12-11). Unknown Lamer, ed. "OpenSSH Has a New Cipher — Chacha20-poly1305 — from D.J. Bernstein". Slashdot. Retrieved 2016-09-07.
- ↑ Murenin, Constantine A. (2014-04-30). Soulskill, ed. "OpenSSH No Longer Has To Depend On OpenSSL". Slashdot. Retrieved 2016-09-07.
- ↑ guenther (Philip Guenther), ed. (2015-09-13). "libc/crypt/arc4random.c". Super User's BSD Cross Reference: arc4random.c. Retrieved 2016-09-07.
ChaCha based random number generator for OpenBSD.
- ↑ riastradh (Taylor Campbell), ed. (2016-03-25). "libc/gen/arc4random.c". Super User's BSD Cross Reference: arc4random.c. Retrieved 2016-09-07.
Legacy arc4random(3) API from OpenBSD reimplemented using the ChaCha20 PRF, with per-thread state.
- ↑ "kern/subr_csprng.c". Super User's BSD Cross Reference: subr_csprng.c. 2015-11-04. Retrieved 2016-09-07.
chacha_encrypt_bytes
- ↑ "ChaCha Usage & Deployment". 2016-09-07. Retrieved 2016-09-07.
- ↑ "arc4random(3)". NetBSD Manual Pages. 2014-11-16. Retrieved 2016-09-07.
- ↑ Corbet, Jonathan. "Replacing /dev/urandom". Linux Weekly News. Retrieved 2016-09-20.
- ↑ "Merge tag 'random_for_linus' of git.kernel.org/pub/scm/linux/kernel/git/tytso/random". Linux kernel source tree. Retrieved 2016-09-20.
random: replace non-blocking pool with a Chacha20-based CRNG
- ↑ Michael Larabel (2016-07-25). "/dev/random Seeing Improvements For Linux 4.8". Phoronix. Retrieved 2016-10-03.
- ↑ "ChaCha20 and Poly1305 for IETF protocols" (PDF). Retrieved 2017-08-07.
Changes from regular ChaCha. The nonce: block sequence number split was changed from 64:64 to 96:32
External links
- Snuffle 2005: the Salsa20 encryption function
- Salsa20 specification (PDF)
- Salsa20/8 and Salsa20/12 (PDF)
- The eSTREAM Project: Salsa20
- The ChaCha family of stream ciphers
- Salsa20 Usage & Deployment