Stream ciphers, hashing (message digests), MAC ---------------------------------------------- Periodic stream cipher: key stream k repeats. Is Caesar cipher a stream cipher? (Yes) Is it periodic? Yes. (i repeats) One-time pad: stream cipher, but not periodic. RC4: stream cipher, but not periodic (or extremely large periods). (from Stallings) Uses variable length keys Period: Believed to be > 10**100 ( or 2 ** 256) bytes Key length: 8 to 2048 bits. Algorithm: for i = 0 to 255 do S[i] = i; T[i] = K[i mod keylength]; // copy K into T, repeating if necessary Use T to permute S. (Seems like we could just use the permuted S as the stream, but that would repeat after every 256 bytes). j = 0; for i = 0 to 255 do j = (j + S[i] + T[i]) mod 256; Swap(S[i], S[j]); Stream generation: You can now discard K and work with array S of 256 different bytes. Keystream is generated from S. i, j = 0; while (true) { i = i+1 mod 256; j = j + S[i] mod 256; swap(S[i], S[j]); t = (S[i] + S[j]) mod 256; k = S[t]; } Use k to increment the next byte. k will keep getting generated. RC4: susceptible to known plaintext attack. Result: should use the key k only once. Performance comparison: ----------------------- (Using openssl speed command) The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes md2 2000.58k 4250.09k 5860.69k 6436.18k 6665.93k mdc2 0.00 0.00 0.00 0.00 0.00 md4 16718.98k 56213.46k 142114.93k 237169.66k 288281.94k md5 11397.78k 41356.01k 123076.61k 244466.35k 340404.91k hmac(md5) 8777.07k 32427.71k 88665.34k 205941.76k 331164.33k sha1 14244.14k 34355.78k 80533.15k 124394.42k 147156.36k rmd160 11549.65k 30350.81k 66911.01k 93505.63k 105096.76k rc4 96304.37k 106050.07k 108503.45k 109524.36k 109830.92k des cbc 33030.84k 35314.88k 35731.86k 35992.40k 36055.76k des ede3 12944.22k 12995.66k 13108.40k 13113.04k 13101.72k idea cbc 0.00 0.00 0.00 0.00 0.00 rc2 cbc 13161.51k 13114.21k 13480.38k 13578.31k 13652.26k rc5-32/12 cbc 0.00 0.00 0.00 0.00 0.00 blowfish cbc 81783.14k 86933.24k 88271.32k 88761.49k 87504.68k cast cbc 32425.33k 33813.42k 34961.39k 35267.18k 36122.55k aes-128 cbc 51984.80k 55026.02k 55702.26k 56455.62k 56483.70k aes-192 cbc 46555.25k 48190.32k 48404.98k 49830.21k 49457.80k aes-256 cbc 41744.95k 43098.52k 44063.36k 44193.65k 44223.06k sign verify sign/s verify/s rsa 512 bits 0.0008s 0.0001s 1231.9 14428.6 rsa 1024 bits 0.0040s 0.0002s 248.3 4821.2 rsa 2048 bits 0.0239s 0.0007s 41.8 1409.9 rsa 4096 bits 0.1634s 0.0025s 6.1 399.3 sign verify sign/s verify/s dsa 512 bits 0.0007s 0.0008s 1506.9 1254.5 dsa 1024 bits 0.0020s 0.0024s 499.4 419.5 dsa 2048 bits 0.0067s 0.0082s 148.3 121.3 Note that RC4 has better throughput than block ciphers. (Also, hash functions and HMAC are also much faster than block ciphers). Say two people share a key k. You can use that to provide confidentiality. A->B: E(k, m) Only A and B should be able to read m, provided k is known only to those two. Some design principles: 1. Use a key that is large enough, given the type of data being protected 2. Use an appropriate ncryption algorithm Why is the following not such a good idea? -- using RC4 to encrypt sessions between two users using their common key (password)? 3. To guard against cryptanalysis, use the "master secrets" sparingly. In other words, don't use to encrypt data, only to generate session keys that are session-specific. A->B: E(master key, session key) B->A: E(session key, data1) ... A->B: E(session, key, data2) (This is vulnerable to attacks however -- so don't want to use this without adding freshness guarantees) 4. In practice, for the session key, often RC4 is used for web traffic. Integrity: ---------- For integrity, we could embed a value in the message and use that to authenticate. 1. E(k, M|checksum) OR 2. E(k, m) | checksum(E(k,m)) OR 3. E(k,m) | checksum(m) Problem is that encryption is often shower than what is required here. Also, perhaps we don't really care for confidentiality of M. Also, checksum may not have sufficient redundancy -- if it is a 1-bit checksum, one can get lucky and a tampered message can still verify. Is there a difference between 1, 2, and 3 from an attacker's perspective? Which one is easy to manipulate by Mallory so that the message m would validate? MAC (Message Authentication Code) MAC(k, m) = a fixed-length string that validates m. (usually 128 or 160 bits) A->B: m, MAC(k, m) B verifies m. HMAC is a common MAC function The above can be used to authenticate m. Combining with confidentiality: A->B: E(k, m), MAC(k, m) OR E(k, m || MAC(k,m)) (Authentication tied to plain-text) Could also authenticate encrypted text: A->B: E(k, m), MAC(k, E(k,m)) One-way hash functions: ----------------------- 1. H can be applied to a block of data of any size. 2. It produces a fixed-length output 3. H(x) is fast and easy to compute for any x. 4. One-way property: For any value h, it is computationally infeasible to find an x such that H(x) = h. 5. Weak-collision resistance: Given a block x, it is computationally infeasible to find a y not equal to x such that H(x) = H(y) 6. Strong-collision resistance; It is computationally infeasible to find any pair of x, y, x is not equal to y, such that H(x) = H(y) Hashing methods: One way is to use block ciphers Start with a known value (e.g., 0). Keep encrypting that value with the blocks in the message. MD5: 128-bit digest. Works on 512-bit blocks at a time in the input (it pads the last block with a fixed value). SHA (Secure Hash): Generates a 160-bit digest. Works on 512-bit blocks at a time. SHA-1: 160-bit digest SHA-256: 256-bit digest SHA-384: 384-bit digest SHA-512_ 512-bit digeset (Security: about 1/2 the # of bits due to Birthday attack) RIPEMD-160: generates 160-bit digest Application: A->B: M, H(M||S) Where S is a secret value shared between A and B. The above provides origin integrity (authenticates) that M originated from A. It turns out that there is a more robust algorithm ("trusted" algorithm) for doing a "keyed hash". It is called HMAC and works with any encryption key and any cryptographic hash function. HMAC-SHA1 with 56-bit keys: Keying using a DES key (not a good idea, since key is too small). HMAC-SHA1 with 128-bit key: E.g., keying using a AES key; SHA1 hash function HMAC-MD5: HMAC with MD5 as the underlying hash function. Key and key length must also be given. HMAC: HMAC(k,M) = H[(K+ XOR opad) || H[(K+ XOR IPAD) || M]] K+ = K padded with zeros to the left so that it results in b bits in length for the XOR ipad = 00110110 (36 Hex) repeated b/8 times opad = 01011100 (5C) repeated b/8 times. H can be MD5, SHA1, RIPEMD-160, etc. M = message with any padding required by H (so that it is a multiple of b bits). H can be MD5, SHA1, RIPEMD-160. ||: concatenation operator. b is a block size. Hashing algorithm usually dictates the block size. It is 512 bits for MD5, SHA-1, and RIPEMD-160. Using HMACs for integrity: --------------------------- A->B: m, HMAC-SHA1(k, m) B can verify that m has not been tampered with and sent by a person who knows k. This is called authenticating a message. Hence the name "MAC" or message authentication code. Hash: also known as message digest. MAC: message authentication code. Same as a keyed hash or keyed message digest. Combining integrity and confidentiality: ------------------------------------------ Two essential ways: A -> B: E(k1, m || HMAC-SHA1(k2, m)) i.e., add the MAC first to authenticate a message and then encrypt. (Could also just store message digest, rather than MAC). k1 and k2 could be the same key, though using different keys could provide a little more security. A -> B: E(k1, m) || HMAC-SHA1[k2, E(k1, m)) k1 and k2 could be the same. Here, we are encrypting the message first and then attaching a MAC of the ciphertext. Other possibilities exist also. Which one is better? Performance perspective? HMAC computation is faster than encryption/decryption typically. Which of the above formats would be easier to verify for the receiver? Security perspective? Probably both very hard to break, though there is some research that shows that 2nd alternative has an edge. Result: "ETA" principle. Encrypt, Then Authenticate. Generating hashes in practice: ------------------------------ In general, one can hash arbitrary length data. But, how do you generate the hash of a large file? Possibility one: read the entire file into a buffer. hash the buffer. Problem: buffer has to be really large. Could lead to performance problems (thrashing, etc.). Better solution in OpenSSL Library: EVP interface: initialize a "context" for hashing/encrypting data hash-update first part of data --> first part of hash hash-update 2nd part of data --> 2nd part of hash ... hash-update last part of data --> next to last part of hash "finalize" the context --> last part of hash `