The GDPR (General Data Protection Regulation), feared for its huge penalties for non-compliance, sets rules to ensure that personal data within the EU is protected. While we usually deal with the legal aspects, today we will go technical. We will talk about ENCRYPTION and share our insights on how it can best be used to meet the GDPR’s goal: protecting your clients’ personal data.
There are two types of encryption: one that will prevent your sister from reading your diary and one that will prevent your government. (Bruce Schneier)
In this article, we will stay away from the mathematical side of encryption as much as we can. Still, if math has never been your strength, don’t blame us if not everything makes sense. Cryptography is hard to comprehend, no matter how hard we try to make it look simple.
Basically, cryptography is the technique of scrambling plaintext (i.e. a message, a file, etc.) with a secret string (i.e. a password, a keyfile, etc.) so that it becomes practically impossible for anyone to restore the plaintext without the secret string.
So far so good. To perform this, there are two basic techniques – substitution and transposition.
Substitution is when certain parts of the plaintext (bits, bytes) are replaced by different chunks of data derived from the password.
Transposition is when the chunks of data are moved into a different position, much like when you shuffle a deck of cards. This is basically how encryption works, in its oversimplified form. Unfortunately (or not?), things are much more complicated.
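To make the two building blocks concrete, here is a minimal toy sketch in Python. The function names, the XOR-based substitution and the password-seeded shuffle are our own assumptions chosen for illustration. This is not a real cypher and must never be used to protect data.

```python
import random

def substitute(data: bytes, password: bytes) -> bytes:
    """Substitution: replace every byte with one derived from the password.

    Here each byte is XOR-ed with a byte of the (repeated) password.
    Applying the function twice restores the original data.
    """
    return bytes(b ^ password[i % len(password)] for i, b in enumerate(data))

def _shuffled_positions(length: int, password: bytes) -> list:
    # The password seeds the shuffle, so the same password gives the same order.
    order = list(range(length))
    random.Random(password).shuffle(order)
    return order

def transpose(data: bytes, password: bytes) -> bytes:
    """Transposition: move the bytes to new positions, like shuffling cards."""
    order = _shuffled_positions(len(data), password)
    return bytes(data[i] for i in order)

def untranspose(data: bytes, password: bytes) -> bytes:
    """Put every byte back where it came from."""
    order = _shuffled_positions(len(data), password)
    out = bytearray(len(data))
    for new_pos, old_pos in enumerate(order):
        out[old_pos] = data[new_pos]
    return bytes(out)

message = b"Hello Bob"
password = b"secret"

scrambled = transpose(substitute(message, password), password)
restored = substitute(untranspose(scrambled, password), password)
print(scrambled)
print(restored)
```

Real cyphers combine many rounds of both operations, which is what makes them so much harder to undo than this toy.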
The hardest thing about encryption is that there is no mathematical way to prove a practical encryption algorithm to be “unbreakable”. The only encryption that provably can never be broken is the so-called “one-time pad”. The problem with it, however, is that the key must be truly random, at least as long as the text that needs to be encrypted, and never used again. Therefore, this technique is unsuitable for most practical encryption operations.
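A one-time pad is simple enough to sketch in a few lines of Python. The function names below are ours, and XOR is assumed as the way of combining text and pad, which is the usual choice for binary data. Note how the pad is exactly as long as the message, which is the practical problem described above.

```python
import secrets

def otp_encrypt(plaintext: bytes):
    """Encrypt with a fresh random pad of the same length as the plaintext."""
    pad = secrets.token_bytes(len(plaintext))  # must never be reused
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, pad))
    return pad, ciphertext

def otp_decrypt(ciphertext: bytes, pad: bytes) -> bytes:
    """Decrypt by XOR-ing the cyphertext with the same pad."""
    if len(pad) != len(ciphertext):
        raise ValueError("the pad must be exactly as long as the message")
    return bytes(c ^ k for c, k in zip(ciphertext, pad))

pad, ciphertext = otp_encrypt(b"Hello Bob")
print(len(pad), "bytes of pad for a 9-byte message")
print(otp_decrypt(ciphertext, pad))
```

Both parties still have to exchange the pad securely in advance, and a new pad is needed for every message.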
Did you know that there are 52! (fifty-two factorial, that is, 1 x 2 x 3 x 4 … x 51 x 52) ways to arrange a deck of 52 playing cards? It is hard to comprehend the magnitude of this number. To give you an idea of how big it is, let’s assume you shuffle a deck of cards really well. Then you ask everyone on the planet (seven billion people) to do nothing else but shuffle a deck of cards, 24 hours a day, seven days a week. So imagine 7 billion people (babies and the elderly too) shuffling their own decks and checking their cards’ arrangement every 3 seconds (really fast shufflers). How long do you think it will take until someone (any one of the 7 billion shufflers) gets the same card arrangement as your deck? Well, it will take about 548068704955859144458446380031010593167601010008176 years. In other words, if you shuffle your deck really well, you can be pretty sure that no other deck of cards has ever been arranged in the same way, and none ever will be. Or at least, it will take 39147764639704224604174741430786470940542 times the age of our Universe until one of the shufflers gets your combination (and that is if the whole world does nothing else but shuffle cards). Frightening, huh?
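If you want to check the order of magnitude yourself, the arithmetic fits in a few lines of Python. The sketch below rests on our own assumptions: a 365-day year, a match expected on average after half of all arrangements have been tried, and a Universe that is 14 billion years old.

```python
import math

arrangements = math.factorial(52)          # ways to order 52 cards

people = 7_000_000_000                     # everyone on the planet shuffles
seconds_per_year = 365 * 24 * 3600         # assumption: 365-day year
shuffles_per_year = people * (seconds_per_year // 3)  # one shuffle every 3 s

# Assumption: on average a match turns up after half of the arrangements.
expected_years = (arrangements // 2) // shuffles_per_year

age_of_universe_years = 14_000_000_000     # assumption: 14 billion years
universe_lifetimes = expected_years // age_of_universe_years

print(f"52! has {len(str(arrangements))} digits")
print(f"expected wait: about {expected_years:.3e} years")
print(f"that is about {universe_lifetimes:.3e} times the age of the Universe")
```

Under these assumptions the wait comes out at roughly 5.5 x 10^50 years, in line with the figures quoted above.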
Or why using asymmetric encryption is not such a bad idea to protect your clients’ personal information
Symmetric encryption is exactly what the word stands for. You take plaintext “Hello Bob”, encrypt it with your secret key (password) and you get cyphertext “H%Rj9*KL1”.
Plaintext        Symmetric cypher     Cyphertext
-----------      ----------------     ----------
"Hello Bob"  ->  [secret key]     ->  "H%Rj9*KL1"
To decrypt the message, the recipient needs to reverse the process by applying the same password (that the message has been encrypted with) to the cyphertext in order to restore the plaintext:
Cyphertext       Symmetric cypher     Plaintext
-----------      ----------------     ----------
"H%Rj9*KL1"  ->  [secret key]     ->  "Hello Bob"
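The round trip shown in the two diagrams can be sketched in Python using only the standard library. The cypher below is a toy of our own making: it stretches the secret key into a keystream with SHA-256 and XORs it with the data. It has no nonce and no integrity protection, so treat it as an illustration only; real systems should rely on a vetted cypher such as AES from a maintained library.

```python
import hashlib

def keystream(secret_key: bytes, length: int) -> bytes:
    """Stretch the secret key into `length` pseudo-random bytes (toy construction)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hashlib.sha256(secret_key + counter.to_bytes(8, "big")).digest()
        out += block
        counter += 1
    return bytes(out[:length])

def symmetric_cypher(data: bytes, secret_key: bytes) -> bytes:
    """Encrypt or decrypt: the same key and the same operation work both ways."""
    stream = keystream(secret_key, len(data))
    return bytes(d ^ k for d, k in zip(data, stream))

secret_key = b"correct horse battery staple"

cyphertext = symmetric_cypher(b"Hello Bob", secret_key)   # sender
plaintext = symmetric_cypher(cyphertext, secret_key)      # recipient, same key

print(cyphertext.hex())
print(plaintext)
```

The important point is in the last two calls: sender and recipient must both hold the very same `secret_key`.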
The problem with symmetric encryption is that it requires the parties to use the same password in order to exchange encrypted data. This is trivial if you are corresponding with a friend or business partner with whom you have direct contact, but it is not feasible otherwise.
Using symmetric encryption is pretty much like locking your door with a key. In order to grant access to another resident, you have to physically give them a key (yours or a copy of it). No wonder we often leave the key under the mat or in some other accessible place, which in itself poses a serious security concern. In the digital world, this is absolutely unacceptable.
Imagine trying to establish a secure email communication channel with a business prospect on the other side of the world. How will you agree on the “password” that will be used to encrypt and decrypt your messages? You can send the password by DHL, for instance, but this is insecure, costly and inconvenient. Or you can communicate the password over the phone, but again, anyone who may be tapping the line will be able to read all your encrypted messages later.
Or you can send the password to your counterpart through WhatsApp or a similar program that uses “end-to-end encryption”. This is of course possible and it is indeed secure (as long as the app is configured correctly), because these applications use so-called “asymmetric encryption”. Such encryption is used by your browser too, every time you connect to a secure website (HTTPS). Ever wondered how your online conversation on Facebook can be secure when you have not agreed on any shared password with any of your contacts?
Without going too much into the details (yet), asymmetric encryption is done by encrypting a message with one key that is known to everybody (aka the “public key”). The decryption is done with another key, known only to the recipient (aka the “private key”).
But how can the public key be publicly known, while its corresponding private key is kept secret? Imagine having two prime numbers, say 5 and 13. These will be your private key(s). Then multiply the two numbers, 5 x 13 = 65. Your public key in this case is 65, which you will share with everybody. In this way, nobody can find out what the factors of 65 are, namely 5 and 13. Right? Not really, because 65 is a tiny number and anybody can factor it by guessing within seconds. But this process of guessing is in fact what we call “brute force”, and for really large numbers there is no known algorithm that finds the factors substantially faster on today’s computers. Now imagine picking huge (random) prime numbers to be your private key(s) and then multiplying them to get your public key. For an adversary, finding your private keys by factoring the public key would take billions of years of computer work, even when using a billion computers simultaneously.
Although the prime factoring example above is overly simplified, this is more or less how the RSA cryptosystem works. Without public-key cryptosystems such as RSA, we would probably never have been able to communicate securely over the internet.
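For the curious, here is a toy sketch of textbook RSA in Python built on the primes 5 and 13 from the example. The exponents `e` and `d` are a detail the simplified explanation skips: they are what actually performs the encryption and decryption, and the value `e = 5` is our own choice. With numbers this small the scheme offers no security at all, as the `factor` function demonstrates.

```python
from math import gcd, isqrt

# Private part: the two primes (kept secret).
p, q = 5, 13

# Public part: their product, which everybody may know.
n = p * q                      # 65

# Exponents: e is public, d can only be computed by someone who knows p and q.
phi = (p - 1) * (q - 1)        # 48
e = 5                          # assumption: any e with gcd(e, phi) == 1 works
assert gcd(e, phi) == 1
d = pow(e, -1, phi)            # modular inverse (Python 3.8+), here 29

def encrypt(message: int) -> int:
    """Anyone can do this with the public values (n, e)."""
    return pow(message, e, n)

def decrypt(cyphertext: int) -> int:
    """Only the holder of d (derived from p and q) can do this."""
    return pow(cyphertext, d, n)

def factor(modulus: int):
    """Brute force: try every candidate divisor until one fits."""
    for candidate in range(2, isqrt(modulus) + 1):
        if modulus % candidate == 0:
            return candidate, modulus // candidate
    raise ValueError("no factors found, the number is prime")

secret_number = 42             # must be smaller than n
locked = encrypt(secret_number)
print(locked, decrypt(locked))
print(factor(n))               # instant for 65, hopeless for a 2048-bit n
```

The loop in `factor` is the “guessing” described above: it finishes immediately for 65, but the number of candidates grows out of reach once the primes have hundreds of digits.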
The beauty of asymmetric cyphers is that they do not require the parties to securely exchange private keys in advance. Instead, each party creates its own keypair (private + public) and exchanges the public part of the key openly. Encryption is done with the public key, while decryption is done with the secret (private) key.
The encryption process works like this:
Plaintext        Public cypher                 Cyphertext
-----------      ------------------------      ----------
"Hello Bob"  ->  [recipient's public key]  ->  "H%Rj9*KL1"
There is one caveat, however. Public cyphers are extremely slow, so slow that encrypting whole messages with them becomes impractical. Therefore, “real world” public cryptography works a little differently. To encrypt a message, the system creates a random session key, which is used to encrypt the plaintext with a symmetric (and fast) cypher. Then the session key (which is short, e.g. 256 bits) is encrypted with the public key of the recipient. Finally, the symmetrically encrypted plaintext is sent together with the encrypted session key. Sounds complicated? Not really. Here is how the encryption works:
Phase 1:
Plaintext        Symmetric cypher              Cyphertext
-----------      ------------------------      ----------
"Hello Bob"  ->  [randomly generated key]  ->  "H%Rj9*KL1"

Phase 2:
Randomly generated key        Public cypher                 Encrypted key
----------------------        ------------------------      ----------------------
"g7I#A&80KJH100LqO-5L"    ->  [recipient's public key]  ->  "Zk4!pW@7xE2#vN9$cB6&"
Upon receipt, the other party reverses the process by first decrypting the randomly generated session key with its private key (pure asymmetric cryptography here). It then uses the decrypted session key to decrypt the cyphertext and recover the plaintext. Simple.
In this way, the main drawback of the asymmetric cyphers, namely their (horrible) speed is eliminated, as the only thing that needs to be encrypted with the public key of the recipient is the random (session) key.
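The two phases and their reversal can be sketched end to end in Python. Everything here is a toy under our own assumptions: the recipient’s keypair is textbook RSA built from two publicly known Mersenne primes (so it is trivially breakable), the symmetric cypher is a SHA-256 keystream XOR, and all function names are hypothetical. Real applications should use a maintained cryptographic library with proper padding and authenticated encryption.

```python
import hashlib
import secrets
from math import gcd

# --- Recipient's toy keypair (INSECURE: both primes are publicly known) ---
P = 2**127 - 1
Q = 2**521 - 1
N = P * Q                       # public modulus
E = 65537                       # public exponent
PHI = (P - 1) * (Q - 1)
assert gcd(E, PHI) == 1
D = pow(E, -1, PHI)             # private exponent (Python 3.8+)

PUBLIC_KEY = (N, E)
PRIVATE_KEY = (N, D)

def symmetric_cypher(data: bytes, key: bytes) -> bytes:
    """Toy fast cypher: XOR with a SHA-256 based keystream (same call decrypts)."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ k for d, k in zip(data, stream))

def hybrid_encrypt(plaintext: bytes, public_key):
    n, e = public_key
    session_key = secrets.token_bytes(32)                  # random 256-bit key
    cyphertext = symmetric_cypher(plaintext, session_key)  # phase 1: fast
    encrypted_key = pow(int.from_bytes(session_key, "big"), e, n)  # phase 2: slow, but tiny input
    return encrypted_key, cyphertext

def hybrid_decrypt(encrypted_key: int, cyphertext: bytes, private_key) -> bytes:
    n, d = private_key
    session_key = pow(encrypted_key, d, n).to_bytes(32, "big")  # recover session key
    return symmetric_cypher(cyphertext, session_key)            # recover plaintext

encrypted_key, cyphertext = hybrid_encrypt(b"Hello Bob", PUBLIC_KEY)
print(hybrid_decrypt(encrypted_key, cyphertext, PRIVATE_KEY))
```

However long the message is, the slow public-key operation only ever touches the 32-byte session key, which is exactly why the speed problem disappears.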
Key size is very important. In general, for a given algorithm, the larger the key, the stronger the encryption.
Work in progress, to be continued…