Stream ciphers, hashing (message digests), MAC 
	         ----------------------------------------------

	Periodic stream cipher: key stream k repeats.

	Is Caesar cipher a stream cipher?

	(Yes)


	Is it periodic?

	Yes. (i repeats)


	One-time pad: stream cipher, but  not periodic.


	RC4: stream cipher, but not periodic (or extremely large periods).
	(from Stallings)

		Uses variable length keys
		Period: Believed to be > 10**100 ( or 2 ** 256) bytes
		Key length: 8 to 2048 bits.

		Algorithm:

		for i = 0 to 255 do
			S[i] = i;
			T[i] = K[i mod keylength]; // copy K into T,
	 						repeating if necessary
		
		
		Use T to permute S.
		(Seems like we could just use the permuted S as the stream,
		but that would repeat after every 256 bytes).	

		j = 0;
		for i = 0 to 255 do
			j = (j + S[i] + T[i]) mod 256;
			Swap(S[i], S[j]);

		Stream generation:

			You can now discard K and work with array S of 256 
			different bytes.
		
			Keystream is generated from S.


		i, j = 0;

		while (true) {

			i = i+1 mod 256;
			j = j + S[i] mod 256;
			swap(S[i], S[j]);
			t = (S[i] + S[j]) mod 256;
			k = S[t];
		}

		Use k to increment the next byte.
		k will keep getting generated. 


		RC4: susceptible to known plaintext attack.

		Result: should use the key k only once.


Performance comparison:
-----------------------

(Using openssl speed command)

The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md2               2000.58k     4250.09k     5860.69k     6436.18k     6665.93k
mdc2                 0.00         0.00         0.00         0.00         0.00
md4              16718.98k    56213.46k   142114.93k   237169.66k   288281.94k
md5              11397.78k    41356.01k   123076.61k   244466.35k   340404.91k
hmac(md5)         8777.07k    32427.71k    88665.34k   205941.76k   331164.33k
sha1             14244.14k    34355.78k    80533.15k   124394.42k   147156.36k
rmd160           11549.65k    30350.81k    66911.01k    93505.63k   105096.76k
rc4              96304.37k   106050.07k   108503.45k   109524.36k   109830.92k
des cbc          33030.84k    35314.88k    35731.86k    35992.40k    36055.76k
des ede3         12944.22k    12995.66k    13108.40k    13113.04k    13101.72k
idea cbc             0.00         0.00         0.00         0.00         0.00
rc2 cbc          13161.51k    13114.21k    13480.38k    13578.31k    13652.26k
rc5-32/12 cbc        0.00         0.00         0.00         0.00         0.00
blowfish cbc     81783.14k    86933.24k    88271.32k    88761.49k    87504.68k
cast cbc         32425.33k    33813.42k    34961.39k    35267.18k    36122.55k
aes-128 cbc      51984.80k    55026.02k    55702.26k    56455.62k    56483.70k
aes-192 cbc      46555.25k    48190.32k    48404.98k    49830.21k    49457.80k
aes-256 cbc      41744.95k    43098.52k    44063.36k    44193.65k    44223.06k
                  sign    verify    sign/s verify/s
rsa  512 bits   0.0008s   0.0001s   1231.9  14428.6
rsa 1024 bits   0.0040s   0.0002s    248.3   4821.2
rsa 2048 bits   0.0239s   0.0007s     41.8   1409.9
rsa 4096 bits   0.1634s   0.0025s      6.1    399.3
                  sign    verify    sign/s verify/s
dsa  512 bits   0.0007s   0.0008s   1506.9   1254.5
dsa 1024 bits   0.0020s   0.0024s    499.4    419.5
dsa 2048 bits   0.0067s   0.0082s    148.3    121.3
  

Note that RC4 has better throughput than block ciphers.
(Also, hash functions and HMAC are also much faster than block ciphers).  


Say two people share a key k. You can use that to provide confidentiality.

A->B: E(k, m)

Only A and B should be able to read m, provided k is known only to those two. 


Some design principles:

1. Use a key that is large enough, given the type of data being protected
2. Use an appropriate ncryption algorithm 

Why is the following not such a good idea?

	-- using RC4 to encrypt sessions between two users using their
		common key (password)?


3. To guard against cryptanalysis, use the "master secrets" sparingly. 
	In other words, don't use to encrypt data, only to generate 
	session keys that are session-specific.
		

	A->B:  E(master key, session key)

	B->A: E(session key, data1)
	...
	A->B: E(session, key, data2)

	(This is vulnerable to attacks however -- so don't want to use this
	without adding freshness guarantees)

4. In practice, for the session key, often RC4 is used for web traffic.


Integrity:
----------


For integrity, we could embed a value in the message and use that to 
authenticate.

1.	E(k, M|checksum)
	OR
2.	E(k, m) | checksum(E(k,m))
	OR
3.
	E(k,m) | checksum(m)

Problem is that encryption is often shower than what is required here.
Also, perhaps we don't really care for confidentiality of M.

Also, checksum may not have sufficient redundancy -- if it is a 1-bit checksum,
one can get lucky and a tampered message can still verify.

Is there a difference between 1, 2, and 3 from an attacker's perspective?

    Which one is easy to manipulate by Mallory so that the message m 
	would validate?


MAC (Message Authentication Code)

	MAC(k, m) = a fixed-length string that validates m. (usually 128 or
						160 bits)

	A->B: m, MAC(k, m)

	B verifies m.

	HMAC is a common MAC function


	The above can be used to authenticate m.

	
	Combining with confidentiality:

	A->B: E(k, m), MAC(k, m)
		OR E(k, m || MAC(k,m))
	(Authentication tied to plain-text)

	Could also authenticate encrypted text:

	A->B: E(k, m), MAC(k, E(k,m))


One-way hash functions:
-----------------------

1. H can be applied to a block of data of any size.
2. It produces a fixed-length output
3. H(x) is fast and easy to compute for any x.
4. One-way property: For any value h, it is computationally infeasible to find an x
	such that H(x) = h.
5. Weak-collision resistance: Given a block x, it is computationally infeasible
	to find a y not equal to x such that
		H(x) = H(y)
6. Strong-collision resistance; It is computationally infeasible to 
	find any pair of x, y, x is not equal to y, such that
		H(x) = H(y)


Hashing methods:
	One way is to use block ciphers
	Start with a known value (e.g., 0). Keep encrypting that value
	with the blocks in the message.

	MD5: 128-bit digest.
		Works on 512-bit blocks at a time in the input (it
		pads the last block with a fixed value).

	SHA (Secure Hash): 
		Generates a 160-bit digest. Works on 512-bit blocks at a time.

		SHA-1: 160-bit digest
		SHA-256: 256-bit digest
		SHA-384: 384-bit digest
		SHA-512_ 512-bit digeset

		(Security: about 1/2 the # of bits due to Birthday attack)

	RIPEMD-160: generates 160-bit digest	


Application:
	A->B: M, H(M||S)

	Where S is a secret value shared between A and B.

	The above provides origin integrity (authenticates) 
	that M originated from A. 

It turns out that there is a more robust algorithm ("trusted" algorithm)
for doing a "keyed hash". It is called HMAC and works with any encryption
key and any cryptographic hash function.

HMAC-SHA1 with 56-bit keys: Keying using a DES key (not a good idea,
	  since key is too small).

HMAC-SHA1 with 128-bit key: E.g., keying using a AES key; SHA1 hash function

HMAC-MD5: HMAC with MD5 as the underlying hash function. Key and key length
	  must also be given.


HMAC:
	HMAC(k,M) = H[(K+ XOR opad) || H[(K+ XOR IPAD) || M]]
	
	K+ = K padded with zeros to the left so that it results in
		b bits in length for the XOR
	ipad = 00110110 (36 Hex) repeated b/8 times
	opad = 01011100 (5C) repeated b/8 times.

	H can be MD5, SHA1, RIPEMD-160, etc.

	M = message with any padding required by H (so that it is a multiple
	     of b bits).
	H can be MD5, SHA1, RIPEMD-160. 
	||: concatenation operator.

	b is a block size. Hashing algorithm usually dictates
	the block size. It is 512 bits for MD5, SHA-1, and RIPEMD-160.


Using HMACs for integrity:
---------------------------


A->B: m, HMAC-SHA1(k, m)

B can verify that m has not been tampered with and sent by a person
who knows k.

This is called authenticating a message. Hence the name "MAC" or message
authentication code. 

Hash: also known as message digest.
MAC: message authentication code. Same as a keyed hash or keyed message digest.


Combining integrity and confidentiality:
------------------------------------------


Two essential ways:

A -> B:  E(k1, m || HMAC-SHA1(k2, m))

i.e., add the MAC first to authenticate a message and then encrypt.
(Could also just store message digest, rather than MAC). k1 and k2
could be the same key, though using different keys could provide
a little more security.


A -> B: E(k1, m) || HMAC-SHA1[k2, E(k1, m))

k1 and k2 could be the same. Here, we are encrypting the message first and
then attaching a MAC of the ciphertext.

Other possibilities exist also.


Which one is better?


      Performance perspective?

		  HMAC computation is faster than encryption/decryption
		  typically.


		  Which of the above formats would be easier to verify for
		  the receiver?

       Security perspective?
		  Probably both very hard to break, though there is some
		  research that shows that 2nd alternative has an edge.

	Result: "ETA" principle.

	Encrypt, Then Authenticate.


Generating hashes in practice:
------------------------------


In general, one can hash arbitrary length data. But, how do you generate the 
hash of a large file?

     Possibility one: read the entire file into a buffer.
		      hash the buffer.

		      Problem: buffer has to be really large. Could
			       lead to performance problems (thrashing, etc.).


     Better solution in OpenSSL Library:

		     EVP interface:

			 initialize a "context" for hashing/encrypting data
			 hash-update first part of data	--> first part of hash
			  hash-update 2nd part of data --> 2nd part of hash
			 ...
			 hash-update last part of data --> next to last part
							   of hash
			 "finalize" the context --> last part of hash

			 
`