Thursday, 22 November 2012

ZeroMQ Pattern: Pub/Sub Data Access

One problem that comes up with some regularity is controlling access to a stream of rapidly changing data, particularly over multicast. For example, there may be a stream of updates which is being broadcast out to many users (possibly a tree of users with repeaters at certain points), but we would like to control which ones can see those updates independently of that data transmission. 

There are a tonne of ways of doing this, but one of my favourites is to take a note from the book of everyone's favourite copy protection technology on DVDs and Blu-Rays. Rather than trying to restrict each user's access to the data, we encrypt and freely share the data, but share the decryption key only with our approved readers.

The publisher holds a list of keys which are shared between it and individual consumers. It generates a data encryption key which will be used to symmetrically encrypt the messages as they are sent. The publisher encrypts this key under each of the consumer keys, and sends out one bundle which everyone receives. Each consumer can pull out their own code, and decrypt it with their consumer key to get the data key. 

If the publisher needs to revoke access, it simply generates a new code and sends it out encrypted for all the users except the one who should no longer receive the data. This is particularly convenient for PGM transports as it means that the publisher really can push data out without worrying too much about who is in the group, with the access management being done in a side channel.

For the film industry this meant that each disc was effectively a message, and came with its content key encrypted under the keys for each player (or set of players) that needed to play the disc. In their case the amount of information leaked by one key getting out is pretty major, so it didn't work all that well. However, if our messages are smaller, and we're more concerned with preventing access to future data once we have revoked access rather than preventing access to past data, it's a good fit. 

As a pure example, let's look at how you might implement such a thing in the PHP ZeroMQ binding. The code in all its noddy glory is on GitHub.

First, let's take a look at the client, the consumer of the data.

$code = openssl_random_pseudo_bytes(8);
$decode_key = null;
$myName = uniqid();

We're going to start by setting up some variables. We generate a random code for ourselves to use as our consumer key, and a random string for the name using uniqid. Next we actually need to do some work. For this example we're just going to grab a bunch of data, and then exit.

// Insecure key exchange! Fnord!
$ctl->sendMulti(array("ADD", $myName, $code));
for($i = 0; $i < 10000000; $i++) {
    $data = $sub->recvMulti();
    if($data[0] == '' && $decode_key != null) {
        echo $data[1], " ", plaintext($decode_key, $data[2]), "\n";
    } else if($data[0] == "vital.config") {
        $keys = json_decode($data[2], true);
        $decode_key = plaintext($code, $keys[$myName]);
        echo "Code update: ", $data[1], " ", bin2hex($decode_key), "\n";
    }
}
$ctl->sendMulti(array("RM", $myName));

The first thing we do is register our key with the server. In this case we're just passing the key straight across the wire, which is not great if there are middlemen snooping on us. We could use a Diffie-Hellman key exchange, or perhaps have some preshared keys or asymmetric crypto to secure this, especially since you're often implementing this because data is going across some untrusted network (such as cloud hosting). In practice, I've found that pre-arranging the key list is generally fine (perhaps pushing it out through a config management tool), as consumers don't get modified very often, but for the example it's easier to just fire it across another socket.
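As a rough sketch of the public-key option mentioned above, the consumer could wrap its key under an RSA public key belonging to the producer before sending the ADD message. This is an assumption, not what the example code does: the helper names, the 2048-bit key size, the OAEP padding and the 16-byte consumer key are all illustrative choices.

```php
<?php
// Hypothetical helpers, not part of the original example.

// Consumer side: encrypt the consumer key under the producer's public key.
function wrap_consumer_key($code, $producerPublicPem) {
    if (!openssl_public_encrypt($code, $wrapped, $producerPublicPem, OPENSSL_PKCS1_OAEP_PADDING)) {
        throw new RuntimeException('Could not wrap consumer key');
    }
    return $wrapped;
}

// Producer side: recover the consumer key with the private key.
function unwrap_consumer_key($wrapped, $producerPrivateKey) {
    if (!openssl_private_decrypt($wrapped, $code, $producerPrivateKey, OPENSSL_PKCS1_OAEP_PADDING)) {
        throw new RuntimeException('Could not unwrap consumer key');
    }
    return $code;
}

// Producer generates a key pair once and hands out the public half.
$producerKey = openssl_pkey_new(array(
    'private_key_bits' => 2048,
    'private_key_type' => OPENSSL_KEYTYPE_RSA,
));
if ($producerKey === false) {
    throw new RuntimeException('RSA key generation failed');
}
$details = openssl_pkey_get_details($producerKey);
$producerPublicPem = $details['key'];

// Consumer: $wrapped would take the place of $code in the ADD message.
$code = openssl_random_pseudo_bytes(16);
$wrapped = wrap_consumer_key($code, $producerPublicPem);

// Producer: recover the consumer key on receipt.
$received = unwrap_consumer_key($wrapped, $producerKey);
```

This only hides the key from eavesdroppers. It does nothing to prove who the consumer is, so the enrolment itself would still need authenticating some other way.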

Once we've enrolled with our ADD command, we then listen on our SUB socket for messages. If the message is vital.config we need to extract our data key and decrypt it. The data in this case is sent via JSON (which isn't really the best choice here, but was me being a bit lazy!) so we JSON decode it to get a hashmap of client identities to encrypted data keys. We look up our entry, and then decrypt the data key using the secret key we shared with the producer.

In the other case, we receive a data message. In that case we use the data key we received (as long as it has been set by then), decrypt the message, and print it out. The decrypt function is straightforward and looks like this: 

function plaintext($code, $data) {
    $data = base64_decode($data);
    $iv = substr($data, 0, IV_SIZE);
    $data = substr($data, IV_SIZE);
    return openssl_decrypt($data, CRYPTO_METHOD, $code, false, $iv);
}

Most of this is actually boilerplate because of the use of JSON in one case. We base64_decode our value, extract the initialisation vector (kind of like a salt) and the data, then use the openssl_decrypt function to decode the data. Simples.
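To see why the initialisation vector behaves a bit like a salt, here is a small sketch that encrypts the same message twice under the same key. The CRYPTO_METHOD and IV_SIZE values are assumptions, since the constants are not shown here, as is the 16-byte key; the encryption helper mirrors the one the producer uses.

```php
<?php
// Assumed values: the real constants are not shown in the article.
define('CRYPTO_METHOD', 'aes-128-cbc');
define('IV_SIZE', openssl_cipher_iv_length(CRYPTO_METHOD));

// Producer-side helper: a fresh random IV is prepended to every message.
function secret($code, $data) {
    $iv = openssl_random_pseudo_bytes(IV_SIZE);
    return base64_encode($iv . openssl_encrypt($data, CRYPTO_METHOD, $code, 0, $iv));
}

// Consumer-side helper: split off the IV, then decrypt the remainder.
function plaintext($code, $data) {
    $data = base64_decode($data);
    $iv = substr($data, 0, IV_SIZE);
    $data = substr($data, IV_SIZE);
    return openssl_decrypt($data, CRYPTO_METHOD, $code, 0, $iv);
}

$key = openssl_random_pseudo_bytes(16); // hypothetical 16-byte key for AES-128
$message = 'My important data is this: 42';

// Same key and same message, but each call picks its own IV.
$first = secret($key, $message);
$second = secret($key, $message);

// Compare the wire values, then check both still decrypt correctly.
var_dump($first === $second);
var_dump(plaintext($key, $first) === $message);
var_dump(plaintext($key, $second) === $message);
```

Because the IV travels in the clear at the front of each message, the consumer needs nothing beyond the key to decrypt, yet someone watching the wire can't tell when the same value is sent twice.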

On the producer side, it's not much harder. Here's our main loop:
while(true) {
    $poll->poll($read, $write, 0);
    if(count($read)) {
        // We have new control messages!
        $msg = $ctl->recvMulti();
        if($msg[1] == "ADD") {
            $client_codes[$msg[2]] = $msg[3];
        } else if($msg[1] == 'RM') {
            unset($client_codes[$msg[2]]);
        }
        $code = openssl_random_pseudo_bytes(8);
        $data = get_codes($client_codes, $code);
        $pub->sendMulti(array("vital.config", $code_sequence++, $data));
        echo "Code update: ", $code_sequence, " ", bin2hex($code), "\n";
    } else {
        $data = secret($code, vital_data());
        $pub->sendMulti(array("", $sequence++, $data));
    }

    // Slow things down to give readable output
    usleep(10000);
}

In this case we are polling in case we receive control messages. If we get any in, we either add a client code to our list if it's an ADD or remove it if it's an RM. Once we have updated our client list we then need to regenerate our secret data code (the $code variable) and encrypt that under each one of the consumer keys for the consumers we want to allow to see it.
Note: this doesn't take into account someone naughty adding their own code - in any realistic situation you'd want to verify the person could actually create the code they said - public key encryption could help here, or just skipping this step and having it triggered via a backchannel. 
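The get_codes function called in the main loop isn't shown here, so the following is a guess at its shape rather than the real thing: it encrypts the data key under each consumer key and JSON-encodes the resulting map. The constant values, the 16-byte keys and the client names are assumptions for illustration.

```php
<?php
// Assumed values: the real constants are not shown in the article.
define('CRYPTO_METHOD', 'aes-128-cbc');
define('IV_SIZE', openssl_cipher_iv_length(CRYPTO_METHOD));

function secret($code, $data) {
    $iv = openssl_random_pseudo_bytes(IV_SIZE);
    return base64_encode($iv . openssl_encrypt($data, CRYPTO_METHOD, $code, 0, $iv));
}

function plaintext($code, $data) {
    $data = base64_decode($data);
    $iv = substr($data, 0, IV_SIZE);
    return openssl_decrypt(substr($data, IV_SIZE), CRYPTO_METHOD, $code, 0, $iv);
}

// Hypothetical implementation: one encrypted copy of the data key per consumer.
function get_codes(array $client_codes, $code) {
    $bundle = array();
    foreach ($client_codes as $name => $client_code) {
        $bundle[$name] = secret($client_code, $code);
    }
    return json_encode($bundle);
}

// Two enrolled consumers with made-up names.
$client_codes = array(
    'alice' => openssl_random_pseudo_bytes(16),
    'bob'   => openssl_random_pseudo_bytes(16),
);
$code = openssl_random_pseudo_bytes(16);
$keys = json_decode(get_codes($client_codes, $code), true);
$aliceKey = plaintext($client_codes['alice'], $keys['alice']);

// Revoking bob: drop his entry, generate a fresh data key, publish a new bundle.
unset($client_codes['bob']);
$newCode = openssl_random_pseudo_bytes(16);
$newKeys = json_decode(get_codes($client_codes, $newCode), true);
$aliceNewKey = plaintext($client_codes['alice'], $newKeys['alice']);
```

After the revocation the new bundle simply has no entry for bob, so he still receives the encrypted traffic but has nothing to unwrap the new data key with.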
If we don't have any control messages, we just grab the next bit of data to send (calling the vital_data function in this case) and encrypt it under our current data key. The encryption function is straightforward:

function secret($code, $data) {
    $iv = openssl_random_pseudo_bytes(IV_SIZE);
    return base64_encode($iv . openssl_encrypt($data, CRYPTO_METHOD, $code, false, $iv));
}

We just generate an IV, encrypt the data, concat the two and base64 encode the result (again because of the JSON encoding - there's no real need to do this if just sending across ZeroMQ). The output shows the process in action:
$ php producer.php
Code update: 0 
Code update: 1 86e9c30313c07ded
Code update: 2 a0330e14c78fa18b
Code update: 3 ad0ed74fff073532
The producer just echoes whenever it generates a new key (which is printed here). 
$ php client.php
Code update: 0 77e95b9e4e6603444
My important data is this: 8541531365
My important data is this: 19263273576
My important data is this: 18437864387
My important data is this: 1643275369
The client gets the data key, and can review it. A bad client that isn't in the list can just see the data as it is on the wire:
$ php badclient.php
Code update: 0
4 eD7nCjH+uhm8727kucE722s1UHdjUjJqYVhIVkFpT...
5 u8xwhZpuYejplenQ3hFjP1paYnEvN25penJyTFQ2a...
A note on performance: There is a great deal that could be done to make this more efficient, particularly in the encoding! Still, on a two-year-old MacBook Air I could comfortably push 20k+ messages per second with this setup, so even with a naive implementation this is still not too much overhead for many uses.
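One place the encoding cost shows up: as far as I can tell, openssl_encrypt already returns base64 when its fourth argument is false (or 0), so the secret function ends up base64-encoding the ciphertext twice. For data messages that never go through JSON, a raw binary frame is one possible saving. The sketch below compares the two; the _raw helper names, the constant values and the 16-byte key are assumptions.

```php
<?php
// Assumed values: the real constants are not shown in the article.
define('CRYPTO_METHOD', 'aes-128-cbc');
define('IV_SIZE', openssl_cipher_iv_length(CRYPTO_METHOD));

// Article-style framing: base64(IV . base64(ciphertext)).
function secret($code, $data) {
    $iv = openssl_random_pseudo_bytes(IV_SIZE);
    return base64_encode($iv . openssl_encrypt($data, CRYPTO_METHOD, $code, 0, $iv));
}

// Hypothetical raw framing: IV . ciphertext, with no text encoding at all.
function secret_raw($code, $data) {
    $iv = openssl_random_pseudo_bytes(IV_SIZE);
    return $iv . openssl_encrypt($data, CRYPTO_METHOD, $code, OPENSSL_RAW_DATA, $iv);
}

// Matching decrypt for the raw framing.
function plaintext_raw($code, $frame) {
    $iv = substr($frame, 0, IV_SIZE);
    return openssl_decrypt(substr($frame, IV_SIZE), CRYPTO_METHOD, $code, OPENSSL_RAW_DATA, $iv);
}

$key = openssl_random_pseudo_bytes(16); // hypothetical 16-byte key for AES-128
$message = 'My important data is this: 8541531365';

$encodedFrame = secret($key, $message);
$rawFrame = secret_raw($key, $message);

// Print the size of each frame, then confirm the raw one round-trips.
echo "base64 frame: ", strlen($encodedFrame), " bytes\n";
echo "raw frame:    ", strlen($rawFrame), " bytes\n";
var_dump(plaintext_raw($key, $rawFrame) === $message);
```

ZeroMQ frames are binary safe, so the raw form can go straight into sendMulti. The key bundle would still need a text-safe encoding for as long as it travels as JSON.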