Doc/AEAD_API.txt - external/github.com/dlitz/pycrypto - Git at Google

 AEAD (Authenticated Encryption with Associated Data) modes of operation
 for symmetric block ciphers provide authentication and integrity in addition
 to confidentiality.

 Traditional modes like CBC or CTR only guarantee confidentiality; one
 has to combine them with cryptographic MACs in order to have assurance of
 authenticity. It is not trivial to implement such generic composition
 without introducing subtle security flaws.

 AEAD modes are designed to be more secure and often more efficient than
 ad-hoc constructions. Widespread AEAD modes are GCM, CCM, EAX, SIV, OCB.

 Most AEAD modes also allow the caller to authenticate, together with the
 plaintext, a piece of arbitrary data that will not be encrypted, for
 instance a packet header.

 In this file, that is called "associated data" (AD) even though terminology
 may vary from mode to mode.

 The term "MAC" is used to indicate the authentication tag generated as part
 of the encryption process. The MAC is consumed by the receiver to tell
 whether the message has been tampered with.

 === Goals of the AEAD API

 1. Any piece of data (AD, ciphertext, plaintext, MAC) is passed only once.
 2. It is possible to encrypt/decrypt huge files with little memory cost.
    To this end, authentication, encryption, and decryption are always
    incremental. That is, the caller may split data in any way, and the end
    result does not depend on how splitting is done.
    Data is processed immediately without being internally buffered.
 3. API is similar to a Crypto.Hash MAC object, to the point that when only
    AD is present, the object behaves exactly like a MAC.
 4. API is similar to a classic cipher object, to the point that when no AD is present,
    the object behaves exactly like a classic cipher mode (e.g. CBC).
 5. MACs are produced and handled separately from ciphertext/plaintext.
 6. The API is intuitive and hard to misuse. Exceptions are raised immediately
    in case of misuse.
 7. Caller does not have to add or remove padding.
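
 Goals 2 and 3 can be illustrated with the hmac module of the Python
 standard library, which already follows the incremental update()/digest()
 convention that the proposed API borrows. This is only a sketch of the
 splitting property, not the AEAD API itself; the key, the data, and the
 chunk size are arbitrary placeholders.

```python
import hashlib
import hmac

key = b"k" * 32                      # placeholder key, for illustration only
data = b"header-bytes" * 1000        # stands in for a large input

# One-shot: all data is passed in a single call.
one_shot = hmac.new(key, data, hashlib.sha256).digest()

# Incremental: the same data is fed in arbitrary pieces.
mac = hmac.new(key, digestmod=hashlib.sha256)
for start in range(0, len(data), 777):   # the chunk size is arbitrary
    mac.update(data[start:start + 777])
incremental = mac.digest()

# The result does not depend on how the data was split.
same = hmac.compare_digest(one_shot, incremental)
```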

 The following state diagram shows the proposed API for AEAD modes.
 In general, it is applicable to all block ciphers in the Crypto.Cipher
 module, even though certain modes may put restrictions on certain
 cipher properties (e.g. block size).

                               .--------.
                               |  START |
                               '--------'
                                   |
                                   | .new(key, mode, iv, ...)
                                   v
                            .-------------.
 .-------------.-------------| INITIALIZED |---------------+-------------.
 |             |             '-------------'               |             |
 |             |                    |                      |             |
 |             |.encrypt()          |.update()             |.decrypt()   |
 |             |                    |                      |             |
 |             |                    v                      |             |
 |             |            .--------------.___ .update()  |             |
 |             |            | CLEAR HEADER |<--'           |             |
 |             |            '--------------'               |             |
 |             |               |  |   |  |                 |             |
 |             |     .encrypt()|  |   |  |.decrypt()       |             |
 |             v               |  |   |  |                 v             |
 |        .--------------.     |  |   |  |     .--------------.          |
 |        | ENCRYPTING   |<----'  |   |  '---->|  DECRYPTING  |          |
 |        '--------------'        |   |        '--------------'          |
 |             |   ^ |            |   |             |  ^ |               |
 | hex/digest()|   | |.encrypt()* |   |             |  | |.decrypt()*    |
 |             |   '''            |   |             |  '''               |
 |             |                  /   \             |                    |
 |             |     hex/digest()/     \hex/verify()|hex/verify()        |
 |hex/digest() |                /       \           |                    |
 |             |               /         \          |        hex/verify()|
 |             v              /           \         v                    |
 |        .--------------.   /             \   .--------------.          |
 '------->| FINISHED/ENC |<--               -->| FINISHED/DEC |<---------'
          '--------------'                     '--------------'
                ^ |                                  ^ |
                | |hex/digest()                      | |hex/verify()
                '''                                  '''

  [*] this transition is not always allowed for non-online or 2-pass modes

 The following states are defined:

  - INITIALIZED. The cipher has been instantiated with a key, possibly
    a nonce (or IV), and other parameters.
    The cipher object may be used for encryption or decryption.

  - CLEAR HEADER. The cipher is authenticating the associated data.

  - ENCRYPTING. It has been decided that the cipher has to be used for
    encrypting data. At least one piece of ciphertext has been produced.

  - DECRYPTING. It has been decided that the cipher has to be used for
    decrypting data. At least one piece of candidate plaintext has
    been produced.

  - FINISHED/ENC. Authenticated encryption has completed.
    The final MAC has been produced.

  - FINISHED/DEC. Authenticated decryption has completed.
    It is now established whether the associated data together with the plaintext
    are authentic.
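
 The transitions in the diagram can be written down as a lookup table.
 The sketch below tracks the state only and performs no cryptography.
 The class name StateTracker and the choice of TypeError for misuse are
 assumptions made for illustration; this document only says that an
 exception is raised.

```python
# Allowed transitions, transcribed from the state diagram.
# Keys are (state, method); values are the resulting state.
TRANSITIONS = {
    ("INITIALIZED", "update"): "CLEAR HEADER",
    ("INITIALIZED", "encrypt"): "ENCRYPTING",
    ("INITIALIZED", "decrypt"): "DECRYPTING",
    ("INITIALIZED", "digest"): "FINISHED/ENC",
    ("INITIALIZED", "verify"): "FINISHED/DEC",
    ("CLEAR HEADER", "update"): "CLEAR HEADER",
    ("CLEAR HEADER", "encrypt"): "ENCRYPTING",
    ("CLEAR HEADER", "decrypt"): "DECRYPTING",
    ("CLEAR HEADER", "digest"): "FINISHED/ENC",
    ("CLEAR HEADER", "verify"): "FINISHED/DEC",
    ("ENCRYPTING", "encrypt"): "ENCRYPTING",   # [*] not allowed in every mode
    ("ENCRYPTING", "digest"): "FINISHED/ENC",
    ("DECRYPTING", "decrypt"): "DECRYPTING",   # [*] not allowed in every mode
    ("DECRYPTING", "verify"): "FINISHED/DEC",
    ("FINISHED/ENC", "digest"): "FINISHED/ENC",
    ("FINISHED/DEC", "verify"): "FINISHED/DEC",
}


class StateTracker:
    """Hypothetical helper: follows the diagram, does no encryption."""

    def __init__(self):
        self.state = "INITIALIZED"

    def call(self, method):
        # hexdigest()/hexverify() follow the same edges as digest()/verify().
        method = {"hexdigest": "digest", "hexverify": "verify"}.get(method, method)
        try:
            self.state = TRANSITIONS[(self.state, method)]
        except KeyError:
            # Exception type is an assumption; the state is left unchanged.
            raise TypeError("%s() not allowed in state %s"
                            % (method, self.state)) from None
        return self.state


t = StateTracker()
t.call("update")
t.call("update")
t.call("encrypt")
final = t.call("hexdigest")

try:
    t.call("update")        # AD after encryption has finished: misuse
    rejected = False
except TypeError:
    rejected = True
```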

 In addition to the existing Cipher methods new(), encrypt() and decrypt(),
 5 new methods are added to a Cipher object:

  - update() to consume all associated data. It takes as input
    a byte string, and it can be invoked multiple times.
    Ciphertext / plaintext must not be passed to update().
    For simplicity, update() can only be called before encryption
    or decryption starts. If no associated data is used, update()
    is not called.

  - digest() to output the binary MAC. It can only be called at the end
    of the encryption.

  - hexdigest() as digest(), but the output is ASCII hexadecimal.

  - verify() to evaluate if the binary MAC is correct. It can only be
    called at the end of decryption. It takes the MAC as input.

  - hexverify() as verify(), but the input is ASCII hexadecimal.

 The syntax of update(), digest(), and hexdigest() is consistent with
 the API of a Hash object.

 IMPORTANT: method copy() is not present. Since most AEAD modes require
 IV/nonce to never repeat, this method may be misused and lead to security
 holes.

 Since MAC validation (decryption mode) is subject to timing attacks,
 the (hex)verify() method internally performs comparison of received
 and computed MACs using time-independent code.
 A ValueError exception is issued if the MAC does not match.
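
 A minimal sketch of such a check, assuming the computed MAC is already
 available as a byte string. It relies on hmac.compare_digest() from the
 Python standard library for the time-independent comparison. The two
 free functions are illustrative stand-ins for the methods described
 above, not the actual implementation.

```python
import hashlib
import hmac


def verify(computed_mac, received_mac):
    """Raise ValueError if the MACs differ; comparison is constant-time."""
    if not hmac.compare_digest(computed_mac, received_mac):
        raise ValueError("MAC check failed")


def hexverify(computed_mac, received_hex):
    """As verify(), but the received MAC is an ASCII hexadecimal str."""
    verify(computed_mac, bytes.fromhex(received_hex))


key = b"k" * 32                                   # placeholder key
mac = hmac.new(key, b"message", hashlib.sha256).digest()

verify(mac, mac)                                  # matches: no exception
hexverify(mac, mac.hex())                         # matches: no exception

tampered = bytes([mac[0] ^ 1]) + mac[1:]          # flip one bit
try:
    verify(mac, tampered)
    detected = False
except ValueError:
    detected = True
```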

 === Padding

 The proposed API only supports AEAD modes that do not require padding
 of the plaintext. Adding support for that would make the API more complex
 (for instance, an external "finished" flag to pass to encrypt()).

 === Pre-processing of associated data

 The proposed API does not support pre-processing of AD.

 Sometimes, a lot of encryptions/decryptions are performed with
 the same key and the same AD, and different nonces and plaintext.

 Certain modes like GCM allow the processing of AD to be cached, and the
 result to be reused across several encryptions/decryptions.
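
 To make the idea concrete, the sketch below caches the intermediate
 state of a plain HMAC after it has consumed the AD, and resumes from
 that state for each message. HMAC merely stands in for the
 authentication part of an AEAD mode here; this is neither how GCM does
 it nor something the proposed API offers.

```python
import hashlib
import hmac

key = b"k" * 32                   # placeholder key
ad = b"fixed packet header"       # AD shared by many messages

# Process the AD once and keep the intermediate state.
ad_state = hmac.new(key, ad, hashlib.sha256)


def mac_with_cached_ad(message):
    m = ad_state.copy()           # resume from the cached state
    m.update(message)
    return m.digest()


tag1 = mac_with_cached_ad(b"first message")
tag2 = mac_with_cached_ad(b"second message")

# Same result as processing AD and message from scratch.
fresh = hmac.new(key, ad + b"first message", hashlib.sha256).digest()
```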

 === Single/2-pass modes and online/non-online modes

 The proposed API supports single-pass, online AEAD modes.

 A "single-pass" mode processes each block of AD or data only once.
 Compare that to "2-pass" modes, where each block of data (not AD)
 is processed twice: once for authentication and once for enciphering.

 An "online" mode does not need to know the size of the message before
 encryption starts. As a consequence:
  - memory usage doesn't depend on the plaintext/ciphertext size.
  - when encrypting, partial ciphertexts can be immediately sent to the receiver.
  - when decrypting, partial (unauthenticated) plaintexts can be obtained from
    the cipher.

 Most single-pass modes are patented, and only the less efficient 2-pass modes
 are in widespread use.

 That is still OK, provided that encryption can start *before* authentication is
 completed. If that's the case (e.g. GCM, EAX) encrypt()/decrypt() will also
 take care of completing the authentication over the plaintext.
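
 The following toy object sketches how a 2-pass, online mode fits the
 API: each call to encrypt() or decrypt() enciphers a chunk and also
 feeds it to the authentication pass, so nothing is buffered. The class
 ToyOnlineAEAD is hypothetical and NOT secure (hash-based keystream,
 HMAC over the ciphertext, no length binding); it only demonstrates the
 call sequence.

```python
import hashlib
import hmac


class ToyOnlineAEAD:
    """Toy 2-pass, online construction. NOT secure; illustration only."""

    def __init__(self, key, nonce):
        self._enc_key = hashlib.sha256(b"enc" + key).digest()
        mac_key = hashlib.sha256(b"mac" + key).digest()
        self._mac = hmac.new(mac_key, nonce, hashlib.sha256)
        self._nonce = nonce
        self._pos = 0                     # bytes enciphered so far

    def _keystream(self, n):
        # Return keystream bytes [pos, pos + n) built from hashed counters.
        out = bytearray()
        first = self._pos // 32
        last = (self._pos + n + 31) // 32
        for block in range(first, last):
            counter = block.to_bytes(8, "big")
            out += hashlib.sha256(self._enc_key + self._nonce + counter).digest()
        skip = self._pos % 32
        self._pos += n
        return bytes(out[skip:skip + n])

    def update(self, ad):
        self._mac.update(ad)

    def encrypt(self, plaintext):
        ks = self._keystream(len(plaintext))
        ct = bytes(p ^ k for p, k in zip(plaintext, ks))
        self._mac.update(ct)              # second pass over the same chunk
        return ct

    def decrypt(self, ciphertext):
        self._mac.update(ciphertext)      # second pass over the same chunk
        ks = self._keystream(len(ciphertext))
        return bytes(c ^ k for c, k in zip(ciphertext, ks))

    def digest(self):
        return self._mac.digest()

    def verify(self, tag):
        if not hmac.compare_digest(self._mac.digest(), tag):
            raise ValueError("MAC check failed")


key, nonce = b"k" * 16, b"n" * 12         # placeholders

enc = ToyOnlineAEAD(key, nonce)
enc.update(b"header")
ct = enc.encrypt(b"first part, ") + enc.encrypt(b"second part")
tag = enc.digest()

dec = ToyOnlineAEAD(key, nonce)
dec.update(b"header")
pt = dec.decrypt(ct[:5]) + dec.decrypt(ct[5:])   # split differently on purpose
dec.verify(tag)                                   # raises ValueError if tampered
```

 The plaintext returned by decrypt() is only a candidate until verify()
 succeeds, which matches the DECRYPTING state described earlier.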

 A similar problem arises with non-online modes (e.g. CCM): they have to wait to see
 the end of the plaintext before encryption can start. The only way to achieve
 that is to only allow encrypt()/decrypt() to be called once.

 To summarize:

 Single-pass and online:
  Fully supported by the API. The mode is most likely patented.

 Single-pass and not-online:
  encrypt()/decrypt() can only be called once. The mode is most likely patented.

 2-pass and online:
  Fully supported by the API, but only if encryption or decryption can start
  *before* authentication is finished.

 2-pass and not-online:
  Fully supported by the API, but only if encryption or decryption can start
  *before* authentication is finished. encrypt()/decrypt() can only be called once.

 === Associated Data

 Depending on the mode, the associated data AD can be defined as:
  1. a single binary string or
  2. a vector of binary strings

 For modes of the 1st type (e.g. CCM), the API allows the AD to be split
 in any number of segments, and fed to update() in multiple calls.
 The resulting AD does not depend on how splitting is performed.
 It is the responsibility of the caller to ensure that concatenation of strings
 is secure. For instance, there is no difference between:

    c.update(b"builtin")
    c.update(b"securely")

 and:

    c.update(b"built")
    c.update(b"insecurely")

 For modes of the 2nd type (e.g. SIV), the API assumes that each call
 to update() ingests one full item of the vector. The two examples above
 are now different.
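
 The difference between the two definitions can be sketched with a plain
 HMAC standing in for the mode's authentication. In the first function
 the pieces are simply concatenated; in the second, each item is
 length-prefixed so that item boundaries are authenticated too. The
 8-byte length prefix is an assumption chosen for illustration; it is
 not the encoding that SIV specifies.

```python
import hashlib
import hmac

key = b"k" * 32                   # placeholder key


def mac_single_string(pieces):
    """Type 1: pieces are concatenated; boundaries are not authenticated."""
    m = hmac.new(key, digestmod=hashlib.sha256)
    for piece in pieces:
        m.update(piece)
    return m.digest()


def mac_vector(items):
    """Type 2: each update is one item; boundaries are authenticated."""
    m = hmac.new(key, digestmod=hashlib.sha256)
    for item in items:
        m.update(len(item).to_bytes(8, "big"))   # illustrative encoding
        m.update(item)
    return m.digest()


a = [b"builtin", b"securely"]
b = [b"built", b"insecurely"]

same_as_string = mac_single_string(a) == mac_single_string(b)
same_as_vector = mac_vector(a) == mac_vector(b)
```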

 === MAC

 During encryption, the MAC is computed and returned by digest().

 During decryption, the received MAC is passed as a parameter to verify()
 at the very end.

 === CCM mode
 Number of keys required:	1
 Compatible with ciphers:	AES (may be used with others if block length
                                 is 128 bits)
 Counter/IV/nonce:		1 nonce required (length 7..13 bytes)
 Single-pass:			no
 Online:				no
 Encryption can start before
 authentication is complete:	yes
 Term for AD:			"associated data" (NIST) or "additional
                                 authenticated data" (RFC)
 Term for MAC:			part of ciphertext (NIST) or
 				"encrypted authentication value" (RFC).
 Padding required:		no
 Pre-processing of AD:		not possible

 In order to allow encrypt()/decrypt() to be called multiple times,
 and to reduce the memory usage, the new() method of the Cipher module
 optionally accepts the following parameters:

  - 'assoc_len', the total length (in bytes) of the AD

  - 'msg_len', the total length (in bytes) of the plaintext or ciphertext
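
 The reason CCM needs the lengths in advance is that the very first
 block fed to its CBC-MAC encodes the message length and whether AD is
 present. The sketch below builds that block following the layout in
 RFC 3610. The function name ccm_b0 and its parameters are hypothetical
 and not part of the proposed API.

```python
def ccm_b0(nonce, msg_len, mac_len=16, has_ad=False):
    """Build the first CBC-MAC block (B_0) of CCM, as laid out in RFC 3610."""
    if not 7 <= len(nonce) <= 13:          # range permitted by RFC 3610
        raise ValueError("nonce must be 7..13 bytes long")
    if mac_len not in (4, 6, 8, 10, 12, 14, 16):
        raise ValueError("invalid MAC length")
    q = 15 - len(nonce)                    # bytes used to encode msg_len
    if msg_len >= 1 << (8 * q):
        raise ValueError("message too long for this nonce length")
    # Flags byte: AD-present bit, encoded MAC length, encoded length-field size.
    flags = (64 if has_ad else 0) | (((mac_len - 2) // 2) << 3) | (q - 1)
    return bytes([flags]) + nonce + msg_len.to_bytes(q, "big")


# 13-byte nonce, 23-byte message, 8-byte MAC, AD present.
b0 = ccm_b0(b"\x00" * 13, msg_len=23, mac_len=8, has_ad=True)
```

 Since the message length is part of this block, authentication cannot
 begin until the length is known, either because the whole message was
 passed in one call or because 'msg_len' was given to new().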

 === GCM mode
 Number of keys required:	1
 Compatible with ciphers:	AES (may be used with others if block length
                                 is 128 bits)
 Counter/IV/nonce:		1 IV required (any length)
 Single-pass:			no
 Online:				yes
 Encryption can start before
 authentication is complete:	yes
 Term for AD:			"additional authenticated data"
 Term for MAC:			"authentication tag" or "tag"
 Padding required:		no
 Pre-processing of AD:		possible

 === EAX mode
 Number of keys required:	1
 Compatible with ciphers:	any
 Counter/IV/nonce:		1 IV required (any length)
 Single-pass:			no
 Online:				yes
 Encryption can start before
 authentication is complete:	yes
 Term for AD:			"header"
 Term for MAC:			"tag"
 Padding required:		no
 Pre-processing of AD:		possible

 === SIV
 Number of keys required:	2
 Compatible with ciphers:	AES
 Counter/IV/nonce:		1 IV required (any length)
 Single-pass:			no
 Online:				no
 Encryption can start before
 authentication is complete:	no
 Term for AD:			"associated data" (vector)
 Term for MAC:			"tag"
 Padding required:		no
 Pre-processing of AD:		possible

 AD is a vector of strings. One item in the vector is the (optional) IV.

 One property of SIV is that encryption or decryption can start only
 once the MAC is available.

 That is not a problem for encryption: since encrypt() can only be called
 once, it can internally compute the MAC first, perform the encryption,
 and return the ciphertext. The MAC is cached for the subsequent call to digest().

 The major problem is decryption: decrypt() cannot produce any output because
 the MAC is meant to be passed to verify() only later.
 To overcome this limitation, decrypt() *exceptionally* accepts the ciphertext
 concatenated with the MAC. The MAC check is still performed with verify().