Cipher Chunking
Content on this page was adapted from It was contributed by Brendan O'Brien (@b5), and modified by @expede
IPFS stores data in Merkle DAGs: directed acyclic graphs that refer to content by the hash of that content. Turning an arbitrary byte sequence into a Merkle DAG requires breaking the input sequence into sub-sequences that can be hashed and arranged into a tree. The process of breaking the input sequence into sub-sequences is called chunking.
Cipher Chunking (CCK) is a modified size-chunking strategy that passes chunked data through an encryption step before hashing into blocks.
We construct a chunker that wraps a cipher built from a single symmetric key (usually a random number 32 bytes in length). The key must be kept secret and stored separately from the data. Each chunk gets a cryptographically random initialization vector (IV) when it's constructed. The IV is often referred to as a nonce. Nonces are not secret, but must be retained and combined with the key to decrypt the ciphertext.
By convention, the nonce is prepended to the block ciphertext. This convention is common when storing encrypted data, because the nonce is required before decryption can begin. In Go, the crypto/cipher standard library package makes this straightforward: an AEAD's Seal method appends the ciphertext to a destination slice, so passing the nonce as the destination produces nonce-prefixed output. The nonce size is fixed by the cipher and its mode of operation (12 bytes for AES-GCM, for example), making the process of separating nonce from ciphertext trivial.
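The nonce-prefix convention can be sketched in Go as follows. This is a minimal illustration rather than the actual chunker code: it assumes AES-256-GCM as the cipher, and the helper names newAEAD, sealChunk, and openChunk are hypothetical.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newAEAD builds an AES-GCM cipher from a symmetric key
// (32 bytes selects AES-256). The cipher choice is an assumption.
func newAEAD(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealChunk encrypts one chunk under a fresh random nonce and
// returns nonce || ciphertext.
func sealChunk(aead cipher.AEAD, chunk []byte) ([]byte, error) {
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Seal appends to its first argument, so using the nonce as the
	// destination yields nonce-prefixed output.
	return aead.Seal(nonce, nonce, chunk, nil), nil
}

// openChunk splits the nonce off the front of a block and decrypts the rest.
func openChunk(aead cipher.AEAD, block []byte) ([]byte, error) {
	n := aead.NonceSize()
	if len(block) < n {
		return nil, errors.New("block shorter than nonce")
	}
	return aead.Open(nil, block[:n], block[n:], nil)
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	aead, err := newAEAD(key)
	if err != nil {
		panic(err)
	}
	block, err := sealChunk(aead, []byte("hello, chunk"))
	if err != nil {
		panic(err)
	}
	plain, err := openChunk(aead, block)
	if err != nil {
		panic(err)
	}
	fmt.Printf("block is %d bytes, decrypted to %q\n", len(block), plain)
}
```

Because the nonce length comes from aead.NonceSize(), the reader never needs out-of-band metadata to find where the ciphertext starts.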
CCK has a few notable properties:
  • The DAG structure is not encrypted, only the byte sequence is.
    • This is a small compromise in secrecy. Encrypted data is normally stored in an ordered sequence, where the block size is itself a secret.
      • In this case an attacker knows that each block begins with a nonce.
    • All existing IPFS code will interpret cipherchunked data as a valid UnixFS file, but the data will be illegible without the key.
    • IPFS UnixFS files support seeking to byte offsets by skipping through the DAG. Cipherchunked files inherit this property.
    • Security could be added by randomizing & encoding special instructions for walking the DAG. It's not clear this would provide much benefit.
  • CCKing is DAG-layout independent. Balanced & Trickle DAG structures can be constructed using existing DAG layout construction code.
  • Hashing the same content multiple times will produce different hashes, because nonces are randomly generated and stored within blocks. This is required for keeping encrypted data secure, regardless of how encrypted data is stored.
    • Because of this, encrypted data stored on IPFS will never "self-deduplicate" through matching hashes the way plaintext does.
  • CCKing does not support UnixFS directories. Directories must be defined in plaintext.
  • CCKing files can be stored within existing plaintext directories.
  • CCKing can be used equally with both block and streaming ciphers. It aligns byte length to the IPFS chunk size.
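The non-deduplication property in the list above can be seen in a short Go sketch. It assumes AES-256-GCM as the cipher and uses a plain SHA-256 digest as a stand-in for a block's content hash; the helpers newAEAD and sealedDigest are hypothetical names.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// newAEAD builds an AES-GCM cipher from a symmetric key (assumed cipher).
func newAEAD(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealedDigest encrypts chunk under a fresh random nonce and returns the
// SHA-256 digest of nonce || ciphertext, standing in for the block hash.
func sealedDigest(aead cipher.AEAD, chunk []byte) ([32]byte, error) {
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return [32]byte{}, err
	}
	return sha256.Sum256(aead.Seal(nonce, nonce, chunk, nil)), nil
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	aead, err := newAEAD(key)
	if err != nil {
		panic(err)
	}
	chunk := []byte("same content")

	// Plaintext hashes are deterministic, so identical chunks deduplicate.
	fmt.Printf("plain:  %x\nplain:  %x\n", sha256.Sum256(chunk), sha256.Sum256(chunk))

	// Sealed blocks embed a random nonce, so each pass hashes differently.
	for i := 0; i < 2; i++ {
		d, err := sealedDigest(aead, chunk)
		if err != nil {
			panic(err)
		}
		fmt.Printf("sealed: %x\n", d)
	}
}
```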
Data must be chunked when it is written to IPFS, and data that should be stored encrypted must also be encrypted. Cipherchunking combines these two processes into a single step. Compared with encrypting data before chunking, this reduces the number of memory allocations required, because encryption reuses the buffers the chunker already allocates.
Combining these steps carries optimizations in either chunking or encryption over into the other. A parallelized DAG constructor, when combined with a cipherchunker, will yield parallelized encryption, even if the chosen cipher doesn't support parallelized encryption.
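A rough Go sketch of the combined step is shown here. It is an illustration under stated assumptions, not the real chunker: it assumes AES-256-GCM, a 256 KiB block size in the usage example, and hypothetical helper names (newAEAD, cipherChunk, cipherChunkBytes, openBlocks). Each block gets one buffer that holds the nonce, then the plaintext, and is encrypted in place; a goroutine per block stands in for a parallelized DAG constructor.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
	"io"
	"sync"
)

// newAEAD builds an AES-GCM cipher from a symmetric key (assumed cipher).
func newAEAD(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// cipherChunk splits r into blocks of at most blockSize bytes, each laid out
// as nonce || ciphertext || tag, and encrypts the blocks concurrently.
func cipherChunk(aead cipher.AEAD, r io.Reader, blockSize int) ([][]byte, error) {
	n := aead.NonceSize()
	plainSize := blockSize - n - aead.Overhead()
	if plainSize <= 0 {
		return nil, errors.New("block size too small for nonce and tag")
	}
	var blocks [][]byte
	var wg sync.WaitGroup
	defer wg.Wait() // never return while encryption is still running
	for {
		buf := make([]byte, blockSize) // the only allocation per block
		m, err := io.ReadFull(r, buf[n:n+plainSize])
		if m > 0 {
			if _, rerr := rand.Read(buf[:n]); rerr != nil {
				return nil, rerr
			}
			blocks = append(blocks, buf[:n+m+aead.Overhead()])
			wg.Add(1)
			go func(buf []byte, m int) {
				defer wg.Done()
				// Encrypt in place: the output overwrites the plaintext
				// directly after the nonce, inside the same buffer.
				aead.Seal(buf[:n], buf[:n], buf[n:n+m], nil)
			}(buf, m)
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			return blocks, nil
		}
		if err != nil {
			return nil, err
		}
	}
}

// cipherChunkBytes is a convenience wrapper for in-memory input.
func cipherChunkBytes(aead cipher.AEAD, data []byte, blockSize int) ([][]byte, error) {
	return cipherChunk(aead, bytes.NewReader(data), blockSize)
}

// openBlocks decrypts the blocks in order and concatenates the plaintext.
func openBlocks(aead cipher.AEAD, blocks [][]byte) ([]byte, error) {
	n := aead.NonceSize()
	var out []byte
	for _, b := range blocks {
		if len(b) < n {
			return nil, errors.New("block shorter than nonce")
		}
		plain, err := aead.Open(nil, b[:n], b[n:], nil)
		if err != nil {
			return nil, err
		}
		out = append(out, plain...)
	}
	return out, nil
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	aead, err := newAEAD(key)
	if err != nil {
		panic(err)
	}
	data := bytes.Repeat([]byte("ipfs "), 200000)
	blocks, err := cipherChunkBytes(aead, data, 256*1024)
	if err != nil {
		panic(err)
	}
	plain, err := openBlocks(aead, blocks)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(blocks), "blocks, round trip ok:", bytes.Equal(plain, data))
}
```

Sizing the plaintext read as the block size minus the nonce and tag keeps every finished block within the chunk size, so the resulting blocks can be handed to an existing layout builder unchanged.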