HMAC: HMAC(k,M) = H[(K+ XOR opad) || H[(K+ XOR IPAD) || M]] K+ = K padded with zeros to the left so that it results in b bits in length for the XOR ipad = 00110110 (36 Hex) repeated b/8 times opad = 01011100 (5C) repeated b/8 times. H can be MD5, SHA1, RIPEMD-160, etc. M = message with any padding required by H. H can be MD5, SHA1, RIPEMD-160. Using HMACs for integrity: --------------------------- A->B: m, HMAC-SHA1(k, m) B can verify that m has not been tampered with and sent by a person who knows k. This is called authenticating a message. Hence the name "MAC" or message authentication code. Hash: also known as message digest. MAC: message authentication code. Same as a keyed hash or keyed message digest. Combining integrity and confidentiality: ------------------------------------------ Two essential ways: A -> B: E(k1, m || HMAC-SHA1(k2, m)) i.e., add the MAC first to authenticate a message and then encrypt. (Could also just store message digest, rather than MAC). k1 and k2 could be the same key, though using different keys could provide a little more security. A -> B: E(k1, m) || HMAC-SHA1[k2, E(k1, m)) k1 and k2 could be the same. Here, we are encrypting the message first and then attaching a MAC of the ciphertext. Other possibilities exist also. Which one is better? Performance perspective? HMAC computation is faster than encryption/decryption typically. Which of the above formats would be easier to verify for the receiver? Security perspective? Probably both very hard to break, though there is some research that shows that 2nd alternative has an edge. Result: "ETA" principle. Encrypt, Then Authenticate. Generating hashes in practice: ------------------------------ In general, one can hash arbitrary length data. But, how do you generate the hash of a large file? Possibility one: read the entire file into a buffer. hash the buffer. Problem: buffer has to be really large. Could lead to performance problems (thrashing, etc.). Better solution in OpenSSL Library: EVP interface: initialize a "context" for hashing/encrypting data hash-update first part of data --> first part of hash hash-update 2nd part of data --> 2nd part of hash ... hash-update last part of data --> next to last part of hash "finalize" the context --> last part of hash Works well for both hashing and encrypting. You could do HW1 using evp interface as well (but that is getting to HW2). Authentication: --------------- We figured out one way to provide origin authentication between two parties. But what if one person wants to provide authenticated data to the whole world? We need public key infrastructure. Reading Section 8.3 Use different keys for encryption and decryption. -- computationally infeasible to determine private key from public key. -- safe from chosen plaintext attack. -- easy to encrypt/decrypt using the appropriate key. E(e-key, plaintext) -> ciphertext D(d-key, ciphertext) -> plaintext (E and D may be the same function in some algorithms). e-key: public key d-key: private key But wait! In most systems, private key can also be used to encrypt data. E(secret key, plaintext): "signed plaintext" D(public key, plaintext): if plaintext, then the plaintext was "signed" by the person posessing the d-key. One algorithm: RSA The public and private keys are large: 1024 to 2048 bits. (512-bit keys are not considered secure enough). Choose p and q be two large primes. Let n = pq. (Given only n, it should be hard to find out p and q.) phi(n) = (p-1)(q-1) = # of numbers < n that are coprime with n. Result from arithmetic: x ** phi(n) = 1 mod n for any x, if x and n are co-primes. For example, if n is 2*3, phi(n) = (2-1)*(3-1) = 2. 1**2 mod 6 = 1; 5**2 mod 6= 1 (note that 1 and 5 are co-primes with 6, but not 2, 3, and 4) That means, if n is p*q: (x ** (phi(n)+1)) mod n = x for most values of x (other than p, q, and multiples of p and q). E.g., p = 2, q = 3 => n = 6. phi(n) = 2 (1, 5 are coprime to 6). 2**3 mod 6 = 2 3**3 mod 6 = 3 etc. Let's choose e and d so that e.d = k*phi(n) + 1 for some k OR ed = 1 mod phi(n) (Normally, one chooses e, and then finds d such that it is the multiplicative inverse of e, given phi(n)) Now: x**(e*d) mod n = x. Therefore, e and d can be used as encryption and decryption keys. Note that, phi(n) is easy to compute given p and q. But hard to compute knowing only n. So: RSA's public key ------------------- e and n Private key: d and n Often: e is key small (e.g., 3 or 17) and d is large. n is large. That way: public key exponentiation is fast (signature verification) though signing is slow (large d). Performance: Modular exponentation with 1024-bit numbers can be quite slow, as compared to symmetric encryption. So, public/private keys are used sparingly -- only during initial phases of verifying identities. One usually goes to symmetric keys very quickly. A -> world: wants to send a digitally signed message m to the world. One solution: A -> world: m, E(A-private, m). But m can be a large message and public key encryption is very slow. Better: A can sign the digest of m, instead of m. A -> world: m, E(A-private, H(m)) To verify, recepient decrypts the digital signature, and sees if it matches the hash of m. The above is an example of a digital signature scheme. ---------------------------------------------------------- Attacks on RSA. Suppose you used the following scheme for signing: Signing: E(A-private, m). Returns x. Send m, x to Bob Verification: M, D(A-public, x) Check if M = x Attack? The attacker can generate arbitrary number of random "signed" messages. How? Why does the following prevent this attack? Signing: M, E(E-private, hash(M)) -> (M, x) Verification: hash(M). Compute E(E-public, x) = y. If hash(M) = y, accept. Attack on this? Could the attacker generate a fake message that appears signed? --------------------------------------------------------------- Combining digital signing with confidentiality: Which one first? Generally, it is better to digitally sign the plaintext, and then encrypt it for confidentiality. E(symmetric key|recepeint's public key, m | digital signature) is better than E(k, m) | digital signature of E(k, m) It turns out that there are some subtle attacks on the latter, when RSA is used for both encryption and digital signatures. Hashing/signing multiple files (e.g., an entire file system): ------------------------------------------------------- Merkle hash trees: Could arrange files at the leaves of a tree. Hash the leaves to "blind them". Each intermediate node represents hash of the concatenation of the left and right child (which themselves are hashes). The root hash and intermediate hashes can be made public, along with root's signature. It can serve as the signature of the entire file system. Any given file (e.g., f2 in a system of size 8) can be verified individually: send h(f1), f2, h(f3,f4 group), h(f5,f6,f7,8) group, signed hash of root If a group of files need updating, changes propagate up the tree. However, only one digital signature needs to be computed at the end. (Could be a useful scheme to maintain a digitally signed file system or database at all times.) Authentication of identity -------------------------- (Bishop: Chapter 11.) Typically, services must authenticate users who are trying to access a service. At times, the users must also authenticate a service. Authentication problem: -- Proving identity (who you are) Authorization problem (access control): -- Making access control decisions based on that. Meanwhile, communication must be secured. Password-based: -------------- You present a to a service. Service looks up the password in a list. If it finds it, your identity is accepted. Risks: passwords in a list could be read by an admin or a hacker. Solution: store hashed passwords. But hash computation is generally fast (e.g., MD5 and SHA1). Unix: uses a somewhat slower hash function (based on DES). However, that itself has a problem -- DES keys are small. Here is how Unix scheme works: password: 8 characters take lowest 7-bits of each character, giving 56 bits. Use the 56-bits as a key to DES. The DES is applied repeatedly to a constant string (all zeros typically). The return value is mapped to 11 printable ASCII characters. Note: brute-force attack possible against crypt(). See crack(1) Brute-force attack with regular machines: 26 lower case, 5 characters: about 100 seconds 95 printable, 5 characters: about 50-100 hours. 26 lower case, 8 characters: about 10-30 days 96 printable, 8 characters: 1000-2000 years (H/W crypto: probably 100-1000 times faster). Risks: Rainbow tables: someone has a lready figured out hashes of many passwords and you can download them or get them on 5 DVDs or 30 CDs (640 MB each). They have been used to crack Windows NT passwords -- a few minute process. (NT LM authentication) -- even up to 15 characters consisting of A-Z, a-z, 0-9, and all the characters above the numbers. (see http://sarcaprj.wayreth.eu.org) Rainbow tables are a data structure to make the search very efficient, without storing everything in memory (with very high probability). Current Unix solution: Increase the search space by including a salt. Linux: salt: 8 characters (64-bits) uses MD5 for hashing password and salt. What is hashed: $1$8 characters of salt$22 bytes of password Using openssl: openssl passwd -1 -salt <8-character-salt> (prompts for password) OpenSSL> passwd -1 -salt abc Password:hello $1$abc$jWy0FsfCoVBK.I85XpXf30 OpenSSL> passwd -1 -salt abcdefgh Password:hello $1$abcdefgh$rwnEbRiN0agqVgZBovWNQ/ (sometimes, there are variations on the above -- systems differ in how they compute the hash) What is stored in /etc/shadow: userid:$1$salt$encrypted MD5-based hash:... userid:encrypted password:aging:expiry fields (This is using crypt() to encrypt). In Unix: a 2-character salt --> maps to 4096 hash functions. (effectively, a 12-bit salt). hash function maps password to a 11 character string. What is stored in /etc/shadow: salt|hash (13 characters) openssl passwd -salt <2-charactersalt> OpenSSL> passwd -salt ab Password:hello abl0JrMf6tlhw (13 character output that goes in /etc/shadow) By examining /etc/shadow, you can tell whether a Unix-flavor system is using crypt, MD5, or blowfish for encrypting passwords. MD5: Start with $1$ Blowfish: $2a$ crypt: no specific characteristics. Usually short, around 13 characters. (based on DES)