The Collectd encrypted packet format

Yesterday, Logstash 1.4.0 was released containing many improvements, one of which was contributed by us: we’ve implemented signature verification and packet decryption in the collectd input plugin. This blog post gives an overview of how encryption and signing are used in the collectd binary protocol.

We’re currently working on deploying a Logstash infrastructure that will eventually extend our monitoring and trending capabilities. At the same time, we want to move from pull-based trending (Munin) to push-based trending (collectd). Logstash recently added a collectd input plugin, but it didn’t support decryption and signature verification of collectd packets. As we send some of this data over the public internet, we need to encrypt this traffic, so we decided to implement this ourselves.

During implementation, we discovered that the documentation was scarce and the comments in the collectd source code appeared incomplete. This post describes the collectd signed and encrypted packet formats. It assumes that you’re familiar with the collectd binary protocol.

Collectd authentication basics
Traffic that collectd signs or encrypts is secured with a per-user shared secret. This allows you to set a username/password per server or group of servers. On the receiving end, a file containing these username and password combinations is used to look up the password for a given user. The sender uses the same credentials to either sign or encrypt the packets it sends.
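As a rough illustration of the lookup on the receiving end, here is a minimal sketch. It assumes the credentials file holds one `user: password` entry per line and that lines starting with `#` are comments; `parse_auth_file` and `lookup_password` are hypothetical names that play the role of the `get_key` helper used in the snippets later in this post.

```python
def parse_auth_file(text):
    """Map usernames to passwords from 'user: password' lines (assumed format)."""
    credentials = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        user, sep, password = line.partition(":")
        if not sep:
            continue  # not a 'user: password' entry
        credentials[user.strip()] = password.strip()
    return credentials


def lookup_password(credentials, username):
    """Return the shared secret for username as bytes, or None if unknown."""
    password = credentials.get(username)
    return None if password is None else password.encode("utf-8")


# Made-up sample content, for illustration only.
example = "# sample auth file\nalice: s3cret\nbob: hunter2\n"
creds = parse_auth_file(example)
```

Returning the secret as bytes is a convenience here, since both the HMAC and the key derivation described below operate on bytes.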

Signed packet format

                    1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-------------------------------+-------------------------------+
! Type (0x0200)                 ! Length                        !
+-------------------------------+-------------------------------+
! Signature (The bytes of a SHA2-256 hash)                      \
\                                                               \
\                                                               \
\                                                               \
\                                                               \
\                                                               \
\                                                               \
\                                                               !
+---------------------------------------------------------------+
! Username                     \
+-------------------------------

The signature is just another part of a collectd packet, like the hostname and values parts. It signs everything that follows it in the packet, including the username, to prevent forgery.

The Length field is equal to 4 (header length) + 32 (SHA2-256 hash length in bytes) + username length.

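Collectd uses SHA2-256 in HMAC mode, where the shared secret is used as the key for the HMAC function. Here’s a Python (semi-pseudocode) implementation:

import hashlib
import hmac

# get_key() and packet are placeholders: get_key() returns the shared secret
# (bytes) for a user, packet holds the parsed fields of the signature part.
user_password = get_key(packet['Username'])

# HMAC-SHA-256, keyed with the shared secret, over the username plus
# everything that follows the signature part in the packet.
calculated_hash = hmac.new(user_password,
                           packet['Username'] + packet['tail'],
                           hashlib.sha256).digest()

# Compare in constant time rather than with ==.
if hmac.compare_digest(calculated_hash, packet['Signature']):
    print("Packet signature matches")
else:
    print("Digest mismatch")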
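To make the layout concrete, the following self-contained sketch builds a signed packet and verifies it again using only the standard library. It assumes network byte order for the header fields and a Length that counts the 4-byte header, the 32-byte hash and the username; `sign_packet`, `verify_packet` and the sample credentials are hypothetical names and values, not part of collectd or Logstash.

```python
import hashlib
import hmac
import struct

TYPE_SIGN_SHA256 = 0x0200
HEADER_LEN = 4   # 2-byte type + 2-byte length
HASH_LEN = 32    # SHA-256 digest size in bytes


def sign_packet(username, password, payload):
    """Prefix payload (already-encoded collectd parts) with a signature part."""
    user = username.encode("utf-8")
    mac = hmac.new(password, user + payload, hashlib.sha256).digest()
    length = HEADER_LEN + HASH_LEN + len(user)
    return struct.pack("!HH", TYPE_SIGN_SHA256, length) + mac + user + payload


def verify_packet(packet, lookup):
    """Return the payload if the signature checks out, else None."""
    if len(packet) < HEADER_LEN + HASH_LEN:
        return None
    part_type, length = struct.unpack("!HH", packet[:HEADER_LEN])
    if part_type != TYPE_SIGN_SHA256:
        return None
    if length < HEADER_LEN + HASH_LEN or length > len(packet):
        return None
    signature = packet[HEADER_LEN:HEADER_LEN + HASH_LEN]
    user = packet[HEADER_LEN + HASH_LEN:length]
    payload = packet[length:]
    password = lookup(user.decode("utf-8"))
    if password is None:
        return None
    expected = hmac.new(password, user + payload, hashlib.sha256).digest()
    # Constant-time comparison of the two digests.
    return payload if hmac.compare_digest(expected, signature) else None


secrets = {"alice": b"s3cret"}             # made-up credentials
payload = b"\x00\x00\x00\x09host\x00"      # a Host part: type 0x0000, length 9
packet = sign_packet("alice", secrets["alice"], payload)
```

Calling `verify_packet(packet, secrets.get)` returns the original payload, while changing any byte of the username or payload makes it return `None`.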

Encrypted packet format
An encrypted part wraps any number of other parts in encrypted form. Although encryption is expressed as just another part, collectd encrypts either all data in a packet or none of it.
The encrypted part has the following on-wire format:

                    1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-------------------------------+-------------------------------+
! Type (0x0210)                 ! Length                        !
+-------------------------------+-------------------------------+
! Username length in bytes      ! Username                      \
+---------------------------------------------------------------+
! Initialization Vector (IV)    ! Encrypted bytes \
+-------------------------------+------------------

The payload is encrypted with AES-256 in OFB mode. As the key size is 256 bits, the key is derived from the user’s password in the authfile by hashing it with SHA2-256.
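The header fields above can be pulled apart with a few lines of standard-library Python. This is a sketch under stated assumptions: header fields are in network byte order, Length counts the whole part including its header, and the IV is 16 bytes (one AES block) even though the diagram draws it narrower. `parse_encrypted_part` and `derive_key` are hypothetical helper names.

```python
import hashlib
import struct

TYPE_ENCR_AES256 = 0x0210
FIXED_LEN = 6   # type + length + username length, 2 bytes each
IV_LEN = 16     # AES block size in bytes (assumed IV size)


def parse_encrypted_part(packet):
    """Split an encrypted part into (username, iv, encrypted_bytes)."""
    if len(packet) < FIXED_LEN:
        raise ValueError("truncated header")
    part_type, length, user_len = struct.unpack("!HHH", packet[:FIXED_LEN])
    if part_type != TYPE_ENCR_AES256:
        raise ValueError("not an encrypted part")
    if length > len(packet) or FIXED_LEN + user_len + IV_LEN > length:
        raise ValueError("truncated encrypted part")
    user_end = FIXED_LEN + user_len
    username = packet[FIXED_LEN:user_end].decode("utf-8")
    iv = packet[user_end:user_end + IV_LEN]
    encrypted = packet[user_end + IV_LEN:length]
    return username, iv, encrypted


def derive_key(password):
    """Expand the shared secret (bytes) into a 32-byte AES-256 key."""
    return hashlib.sha256(password).digest()


# Build a fake part with placeholder ciphertext to exercise the parser.
user = b"alice"
iv = bytes(range(16))
body = b"\x00" * 24
length = FIXED_LEN + len(user) + len(iv) + len(body)
sample = struct.pack("!HHH", TYPE_ENCR_AES256, length, len(user)) + user + iv + body
parsed = parse_encrypted_part(sample)
key = derive_key(b"s3cret")
```

The returned IV and encrypted bytes, together with the derived key, are the inputs an AES-256 OFB decryption needs.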

To ensure data integrity, the first 160 bits (20 bytes) of the decrypted bytes contain the SHA1 hash of the remaining decrypted bytes. Hence, the decrypted bytes look like this:

                    1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------------------------------------------------------+
! SHA1 HASH                                                     \
\                                                               \
\                                                               \
\                                                               \
\                                                               !
+---------------------------------------------------------------+
! Decrypted Collectd parts \
+--------------------------+

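This sounds pretty complex, but it really isn’t. Here’s a Python (semi-pseudocode) implementation:

import hashlib
import hmac
from Crypto.Cipher import AES

# get_key() and packet are placeholders: get_key() returns the shared secret
# (bytes) for a user, packet holds the parsed fields of the encrypted part.
user_password = get_key(packet['Username'])

# The AES-256 key is the raw SHA2-256 digest of the password.
key = hashlib.sha256(user_password).digest()

iv = packet['IV']
cipher = AES.new(key, AES.MODE_OFB, iv)
decrypted_bytes = cipher.decrypt(packet['Encrypted_Bytes'])

# The first 20 bytes hold the SHA1 hash of everything that follows them.
packet_hash = decrypted_bytes[:20]
calculated_hash = hashlib.sha1(decrypted_bytes[20:]).digest()

# Compare in constant time rather than with ==.
if hmac.compare_digest(calculated_hash, packet_hash):
    print("Packet successfully decrypted")
else:
    print("Digest mismatch, decryption failed")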
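Once the decrypted bytes are available, the integrity check and the walk over the contained parts need nothing beyond the standard library. The sketch below assumes every part starts with a 2-byte type and a 2-byte length in network byte order, where the length includes that 4-byte header; `check_and_strip_hash` and `iter_parts` are hypothetical helper names, and the sample parts are made up.

```python
import hashlib
import hmac
import struct

SHA1_LEN = 20


def check_and_strip_hash(decrypted):
    """Return the inner parts if the leading SHA1 hash matches, else None."""
    if len(decrypted) < SHA1_LEN:
        return None
    stored, parts = decrypted[:SHA1_LEN], decrypted[SHA1_LEN:]
    calculated = hashlib.sha1(parts).digest()
    return parts if hmac.compare_digest(stored, calculated) else None


def iter_parts(data):
    """Yield (type, body) for each part in a run of collectd parts."""
    offset = 0
    while offset + 4 <= len(data):
        part_type, length = struct.unpack_from("!HH", data, offset)
        if length < 4 or offset + length > len(data):
            raise ValueError("malformed part")
        yield part_type, data[offset + 4:offset + length]
        offset += length


# Two string parts: Host (type 0x0000) and Plugin (type 0x0002).
inner = b"\x00\x00\x00\x09host\x00" + b"\x00\x02\x00\x08cpu\x00"
decrypted = hashlib.sha1(inner).digest() + inner
parts = list(iter_parts(check_and_strip_hash(decrypted)))
```

A wrong password shows up here as a hash mismatch, because decrypting with the wrong key yields bytes whose first 20 bytes no longer match the hash of the rest.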

Hopefully we’ll be able to migrate our logging and trending infrastructure to a single (albeit distributed) solution using this addition to Logstash.
