Thursday 2 February 2017

Understanding Base64 Encoding #3

Tier 3

In Tier 2 I found myself with some binary data (a 10 x 10 pixel image) and a text-based medium (pen and paper) with which to communicate that image to another computer. The image is about a thousand bytes in size and I didn’t fancy having to write down eight thousand ones and zeros in order to communicate it. I’d decided I needed to encode the raw data to save myself some pain.

My initial instinct is to encode the raw data by using a single character to represent each possible byte value. This would reduce the number of characters I have to write out from 8000 to 1000: instead of having to write the byte value ‘00000000’, I could instead write ‘A’. As long as the recipient of my encoded image knows the encoding, e.g. ‘A’ = ‘00000000’, they can decode the image. I start to write out my encoding key:

00 | 00000000 = A
01 | 00000001 = B
02 | 00000010 = C
24 | 00011000 = Y
25 | 00011001 = Z
26 | 00011010 = a
27 | 00011011 = b
50 | 00110010 = y
51 | 00110011 = z
52 | 00110100 = 0
53 | 00110101 = 1
61 | 00111101 = 9

However, as you might be able to see, by the time I’ve covered byte values 0 to 61 I’ve run out of standard alpha-numeric characters (A-Z, a-z and 0-9). I’m going to have to start using some less recognised characters – and/or possibly even fabricating new ones – in order to get all the way to 256 (the distinct values which can be represented by an 8-bit byte: 0 to 255). This gives me pause for thought. It feels like there’s potential for confusion if I start using arcane or made-up characters.
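A couple of lines of Python (just my choice of language for illustration here) confirm the shortfall:

```python
import string

# The 62 characters I'm confident any decoder can easily recognise.
alphanumerics = string.ascii_uppercase + string.ascii_lowercase + string.digits

print(len(alphanumerics))  # 62 -- well short of the 256 needed for one character per byte value
print(alphanumerics[0], alphanumerics[25], alphanumerics[26], alphanumerics[61])  # A Z a 9
```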

I stop and have a think. I’ve got 62 alphanumeric characters I’m confident any decoder can easily recognise. I suspect I could also be fairly confident using a handful of other characters, e.g. ‘=’, ‘!’, ‘+’, ‘:’, ‘&’, ‘/’, ‘\’, ‘%’, etc. But that doesn’t bring me anywhere near the 256 characters I’d need for this encoding method.

While I’m ruminating on the problem a thought appears: 62, the number of easily recognised characters I have, is close to the binarily-significant number 64 – the distinct values which can be represented by 6 bits: 0 to 63, or 000000 to 111111. Perhaps I can use this? If I picked a couple of my additional characters, say ‘+’ and ‘/’, that would bring me up to an encoding set of 64 easily recognisable characters. I bank the thought.
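Sketching that banked thought in Python: a 64-character set built from the alphanumerics plus those two extras, where each character’s position in the string doubles as the 6-bit value it stands for.

```python
import string

# A 64-character set: A-Z (values 0-25), a-z (26-51), 0-9 (52-61), then '+' and '/'.
alphabet = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

print(len(alphabet))  # 64 -- exactly the number of distinct values 6 bits can hold
print(alphabet[0], alphabet[26], alphabet[52], alphabet[62], alphabet[63])  # A a 0 + /
```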

Then comes the flash of inspiration! Ultimately, I’m just trying to communicate a series of ones and zeros from A to B. When thinking about those ones and zeros I’ve always naturally separated them into 8-bit bytes, but for the purpose of transmission there’s no inherent reason to do so; as long as the correct sequence of ones and zeros reaches the other end the interpretation of that data as 8-bit bytes is the receiving computer’s decision.

I start to jot down my thinking. Imagine the first three 8-bit bytes of my 10 x 10 image are as follows:

00000010 – 00011011 – 00110100

For transmission, I could split those 24 bits any way I like. Into 2-bit chunks, for example:

00 – 00 – 00 – 10 – 00 – 01 – 10 – 11 – 00 – 11 – 01 – 00

Or – going back to my previous thinking! – as 6-bit chunks:

000000 – 100001 – 101100 – 110100
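Both splits can be produced mechanically. A small Python sketch (`chunk_bits` is just a name I’ve made up for illustration):

```python
def chunk_bits(bits: str, size: int) -> list[str]:
    """Split a string of ones and zeros into fixed-size chunks."""
    return [bits[i:i + size] for i in range(0, len(bits), size)]

# The three example bytes, joined into one 24-bit stream.
bits = "00000010" + "00011011" + "00110100"

print(chunk_bits(bits, 2))  # twelve 2-bit chunks
print(chunk_bits(bits, 6))  # ['000000', '100001', '101100', '110100']
```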

And with 6-bit chunks, I can use my recognisable character encoding key!

A – h – s – 0

I could send you "Ahs0" and, as long as you knew the encoding key, you could reverse the encoding and retrieve the original bits.
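The whole round trip can be sketched in a few lines of Python. This is a bare-bones illustration of the idea only – it assumes the input length is a multiple of 3 bytes and ignores padding, which is part of the gaps still to be filled in:

```python
import string

# 64 recognisable characters; each character's index is the 6-bit value it represents.
alphabet = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode(data: bytes) -> str:
    """Join the bytes into one bit stream, then map each 6-bit chunk to a character."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(alphabet[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6))

def decode(text: str) -> bytes:
    """Map each character back to its 6 bits, then reinterpret the stream as 8-bit bytes."""
    bits = "".join(f"{alphabet.index(ch):06b}" for ch in text)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

data = bytes([0b00000010, 0b00011011, 0b00110100])
encoded = encode(data)
print(encoded)                    # Ahs0
print(decode(encoded) == data)    # True -- the original three bytes come back
```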

And this is the bare bones of base64 encoding. I’ll fill in the gaps – and attempt to extricate this explanation from its tortured analogy by applying it to the real world – in Tier 4.

Next >> Understanding Base64 Encoding #4
