Tier 3
In Tier 2 I found myself with some binary data (a 10 x 10 pixel image) and a text-based medium (pen and paper) with which to communicate that image to another computer. The image is about a thousand bytes in size, and I didn’t fancy writing down eight thousand ones and zeros in order to communicate it. I’d decided I needed to encode the raw data to save myself some pain.
My initial instinct is to use a single character to represent each possible byte value. That would reduce the number of characters I have to write out from 8,000 to 1,000: instead of writing the byte value ‘00000000’, I could write ‘A’. As long as the recipient of my encoded image knows the encoding key, e.g. ‘A’ = ‘00000000’, they can decode the image. I start to write out my encoding key:
00 | 00000000 = A
01 | 00000001 = B
02 | 00000010 = C
…
24 | 00011000 = Y
25 | 00011001 = Z
26 | 00011010 = a
27 | 00011011 = b
…
50 | 00110010 = y
51 | 00110011 = z
52 | 00110100 = 0
53 | 00110101 = 1
…
61 | 00111101 = 9
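If it helps to see that key as code, here is a minimal sketch (Python is my choice here purely for illustration) that builds the same mapping and shows how far the familiar alphanumeric characters get me:

```python
import string

# The 62 characters I'm confident any human decoder will recognise:
# A-Z, then a-z, then 0-9 (in that order, matching the key above).
alphabet = string.ascii_uppercase + string.ascii_lowercase + string.digits

# Naive scheme: one character per byte value.
key = {byte_value: char for byte_value, char in enumerate(alphabet)}

print(key[0])    # 'A'  (00000000)
print(key[27])   # 'b'  (00011011)
print(key[61])   # '9'  (00111101)
print(len(key))  # 62 -- but an 8-bit byte has 256 possible values
```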
However, as you might be able to see, by the time I’ve covered byte values 0 to 61 I’ve run out of standard alphanumeric characters (A-Z, a-z and 0-9). I’m going to have to start using less recognised characters – possibly even fabricating new ones – in order to get all the way to 256, the number of distinct values an 8-bit byte can represent (0 to 255). This gives me pause for thought. It feels like there’s real potential for confusion if I start using arcane or made-up characters.
I stop and have a think. I’ve got 62 alphanumeric characters I’m confident any decoder can easily recognise. I could probably also rely on a handful of other characters, e.g. ‘=’, ‘!’, ‘+’, ‘:’, ‘&’, ‘/’, ‘\’, ‘%’, and so on. But that doesn’t bring me anywhere near the 256 characters I’d need for this encoding method.
While I’m ruminating on the problem, a thought appears: 62, the number of easily recognised characters I have, is close to the binarily significant number 64 – the number of distinct values that can be represented by 6 bits (0 to 63, or 000000 to 111111). Perhaps I can use this? If I picked a couple of my additional characters, say ‘+’ and ‘/’, that would bring me up to an encoding set of 64 easily recognisable characters. I bank the thought.
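As another quick sketch, adding ‘+’ and ‘/’ to the 62 alphanumerics gives exactly the number of characters needed to cover every possible 6-bit value:

```python
import string

# 62 easily recognised characters plus two extras from my 'probably fine'
# list -- '+' and '/' -- gives a 64-character set.
alphabet = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

print(len(alphabet))  # 64
print(2 ** 6)         # 64 -- one character for every 6-bit value, 000000 to 111111
```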
Then comes the flash of inspiration! Ultimately, I’m just trying to communicate a series of ones and zeros from A to B. When thinking about those ones and zeros I’ve always naturally separated them into 8-bit bytes, but for the purposes of transmission there’s no inherent reason to do so; as long as the correct sequence of ones and zeros reaches the other end, interpreting that data as 8-bit bytes is the receiving computer’s decision.
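Before I work through the arithmetic by hand below, here is the same idea as a rough Python sketch: flatten the bytes into one long bit string, then slice it into chunks of whatever width suits the transmission. The helper name and example byte are mine, not anything standard:

```python
def rechunk(data: bytes, width: int) -> list[str]:
    """Flatten `data` into one bit string and slice it into `width`-bit chunks."""
    bit_stream = "".join(f"{byte:08b}" for byte in data)
    return [bit_stream[i:i + width] for i in range(0, len(bit_stream), width)]

# The byte boundaries only matter to the computers at either end;
# in transit the data is just a sequence of ones and zeros.
print(rechunk(b"\x02", 2))  # ['00', '00', '00', '10']
print(rechunk(b"\x02", 6))  # ['000000', '10'] -- the short final chunk is a gap to fill later
```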
I start to jot down my thinking. Imagine the first three 8-bit bytes of my 10 x 10 image are as follows:
00000010 – 00011011 – 00110100
For transmission, I could split those 24 bits any way I like. Into 2-bit chunks, for example:
00 – 00 – 00 – 10 – 00 – 01 – 10 – 11 – 00 – 11 – 01 – 00
Or – going back to my previous thinking! – as 6-bit chunks:
000000 – 100001 – 101100 – 110100
And with 6-bit chunks, I can use my recognisable character encoding key!
A – h – s – 0
I could send you “Ahs0” and, as long as you knew the encoding key, you could reverse the encoding and retrieve the original bits.
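To sanity-check the worked example, here is a small sketch that regroups those three bytes into 6-bit values, maps them through the 64-character key, and compares the result with Python’s built-in base64 module:

```python
import base64

# The first three bytes of the image from the example above.
data = bytes([0b00000010, 0b00011011, 0b00110100])

# The 64-character key: A-Z, a-z, 0-9, then '+' and '/'.
alphabet = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")

# Flatten to a 24-bit string and slice it into 6-bit chunks.
bit_stream = "".join(f"{byte:08b}" for byte in data)
chunks = [bit_stream[i:i + 6] for i in range(0, len(bit_stream), 6)]

encoded = "".join(alphabet[int(chunk, 2)] for chunk in chunks)
print(encoded)                          # Ahs0
print(base64.b64encode(data).decode())  # Ahs0 -- matches the standard encoding
```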
And that is the bare bones of base64 encoding. I’ll fill in the gaps, extricate the tortured analogy from this explanation, and apply it to the real world in Tier 4.
Next >> Understanding Base64 Encoding #4