Tier 5
This tier is aimed at
filling in a few gaps, showing the wider applicability of base64 encoding, and
pointing to further reading.
Padding: The Trailing Equals Character
When I first looked at the characters used in base64 encoding I noticed there
was a cheeky 65th character (‘=’) sometimes appearing once or twice at the end
of encoded data. It’s actually a special character used when the source binary
data doesn’t divide neatly into three-byte blocks. Here’s a quick example to
illustrate.
Imagine I want to
base64 encode the following four 8-bit bytes:
01000001 01100100 01100001 01101101
I take the first three
octets:
01000001 01100100 01100001
Represent them as four
sextets:
010000 010110 010001 100001
And encode using my
encoding key, producing: QWRh
But now I have a
lonely, final octet left to encode: 01101101
In base64 encoding it’s simply padded out with trailing zeros until we have
another three octets:
01101101 00000000 00000000
And converted to sextets as normal:
011011 010000 000000 000000
The second sextet mixes two bits of real data with four padded zeros, so it is
encoded as normal. Any sextet which contains nothing but padded zeros gets
represented as ‘=’.
So the rest of the encoded
data becomes: bQ==.
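The worked example above can be sketched in Python. This is a minimal illustration rather than a production encoder: the names ALPHABET, encode_block and encode are my own, and the result is compared against Python's standard base64 module as a sanity check. The four example bytes happen to spell "Adam" in ASCII.

```python
import base64

# The standard base64 "encoding key": index 0 is 'A', index 63 is '/'.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_block(block: bytes) -> str:
    """Encode one block of 1 to 3 bytes into 4 base64 characters."""
    if not 1 <= len(block) <= 3:
        raise ValueError("block must be 1 to 3 bytes long")
    # Pad the block with zero bytes until there are three octets (24 bits).
    padded = block + b"\x00" * (3 - len(block))
    bits = int.from_bytes(padded, "big")
    # Split the 24 bits into four sextets, most significant first.
    sextets = [(bits >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    # n source bytes put real data in the first n + 1 sextets;
    # the remaining sextets are pure padding and become '='.
    data_sextets = len(block) + 1
    chars = [ALPHABET[s] for s in sextets[:data_sextets]]
    return "".join(chars) + "=" * (4 - data_sextets)


def encode(data: bytes) -> str:
    """Encode arbitrary bytes three octets at a time."""
    return "".join(encode_block(data[i:i + 3]) for i in range(0, len(data), 3))


print(encode(b"Adam"))  # QWRhbQ==
print(encode(b"Adam") == base64.b64encode(b"Adam").decode("ascii"))  # True
```

Note that the padding decision is made from the block length, not by checking whether a sextet is zero: a sextet of real data that happens to be 000000 is still encoded as ‘A’.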
The ‘=’ character is a bit of a courtesy and not every implementation of base64
encoding uses it. It is possible to recreate the original binary data without
using ‘=’ for padding, because the number of encoded characters already tells
the decoder how many bytes are in the final block; it’s just more explicit to
include it.
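As a small sketch of why the padding is recoverable: padded base64 text is always a multiple of four characters long, so the number of missing ‘=’ characters can be worked out from the length alone. The helper name decode_unpadded is hypothetical; it simply restores the padding before handing the text to Python's standard base64.b64decode.

```python
import base64


def decode_unpadded(text: str) -> bytes:
    """Decode base64 text whose trailing '=' characters were left off."""
    # Padded base64 is always a multiple of four characters long, so the
    # count of missing '=' characters follows from the length alone.
    missing = -len(text) % 4
    return base64.b64decode(text + "=" * missing)


print(decode_unpadded("QWRhbQ"))    # b'Adam'
print(decode_unpadded("QWRhbQ=="))  # b'Adam'
```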
Other Uses:
Base64 encoding is typically used in scenarios where
representing binary data as a limited set of ASCII characters is desirable.
This could be when using an 8-bit (or greater) character encoding isn’t viable,
or when you wish to embed binary data in an explicitly text-based medium, or when
sending non-alphanumeric characters could be an issue.
Attachments to emails are base64 encoded, as are the
username and password sent for basic HTTP authentication. The specifics of why
base64 encoding is used in these scenarios are beyond this series, but reading
about https://en.wikipedia.org/wiki/8-bit_clean and https://en.wikipedia.org/wiki/Email_attachment gives you a good idea of why this is the case. The
quote below, taken from the Email Attachment Wikipedia page, gives a good sense
of the history:
“Originally Internet SMTP email
was 7-bit ASCII text only, and attaching files was done by manually encoding
8-bit files using uuencode, BinHex or xxencode and pasting the resulting text
into the body of the message.”
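To make the basic HTTP authentication case mentioned above a little more concrete, here is a hedged sketch of how the header value is put together: the username and password are joined with a colon, base64 encoded, and prefixed with "Basic". The function name basic_auth_header is my own, and the credentials are the well-known sample pair used in the HTTP authentication RFCs.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Authorization header for basic auth."""
    # Credentials are joined as "username:password" before encoding.
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Basic {token}"


print(basic_auth_header("Aladdin", "open sesame"))
# Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

The encoding only makes the credentials safe to carry in a text header; anyone who sees the header can decode it straight back.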
Further Resources:
Once you’ve grasped the basics of base64 encoding the Wikipedia article actually
becomes useful. To my mind it’s missing a Tier 1 style explanation but it’s
otherwise quite passable.
There’s
an Oracle blog post which is also good – again, if you’ve got some base knowledge
to work from.
And
when you want to go full nerd there’s the IETF spec!