Tuesday, 7 February 2017

Understanding Base64 Encoding #4

Tier 4

For this tier I’m going to start to push the strained and sanitised analogy into the background and, hopefully, bring the hard edges of base64 encoding into focus.

First, a quick recap on what we’ve established:
  • Base64 encoding is a method by which we can represent arbitrary binary data (an image, in our example) as a string of ASCII characters.
  • The 64 characters used when base64 encoding are a subset of the full ASCII character set. In our case: A-Z, a-z, 0-9, +, and /.
  • 64 characters can be neatly represented by a block of 6 bits, since 6 bits give exactly 2^6 = 64 distinct values.
  • When base64 encoding, the binary source data is broken into blocks of 3 octets (24 bits), and each block is then read as 4 sextets (also 24 bits); 24 being the least common multiple of 8 and 6.
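The last point in the recap can be sketched in a few lines of Python. This is only an illustration: the function name `split_into_sextets` and the sample bytes `b"Man"` are my own choices, not anything from the earlier tiers.

```python
def split_into_sextets(block: bytes) -> list:
    """Read one block of 3 octets (24 bits) as 4 sextets (6 bits each)."""
    if len(block) != 3:
        raise ValueError("expected exactly 3 octets")
    # Join the three octets into a single 24-bit number.
    combined = int.from_bytes(block, "big")
    # Peel off four 6-bit values, most significant first.
    return [(combined >> shift) & 0b111111 for shift in (18, 12, 6, 0)]


sextets = split_into_sextets(b"Man")
print(sextets)                               # [19, 22, 5, 46]
print([f"{value:06b}" for value in sextets])
```

Each of the four numbers falls in the range 0 to 63, so each one can be looked up in a 64-character key.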
And our encoding key looked like this:
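As a minimal sketch, assuming the key follows the standard ordering of the alphabet listed in the recap (A-Z, then a-z, then 0-9, then + and /), it can be rebuilt like this. The names `ALPHABET` and `ENCODING_KEY` are illustrative.

```python
import string

# Assumed ordering: A-Z, a-z, 0-9, '+', '/' (the standard base64 alphabet).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# Map each 6-bit value (0..63) to its character.
ENCODING_KEY = dict(enumerate(ALPHABET))

# Print a few rows of the key: decimal value, 6-bit pattern, character.
for value in (0, 25, 26, 51, 52, 61, 62, 63):
    print(f"{value:2d}  {value:06b}  {ENCODING_KEY[value]}")
```

With this key, the sextet values 19, 22, 5 and 46 map to the characters T, W, F and u.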


So far we’ve been using a contrived example – a world with no digital communication – in an attempt to remove the contextual complexity of base64 encoding, concentrating on the essence of the subject instead. But this only takes us so far. Let’s take a real-world example of where base64 encoding could be used: embedding images in XML.

Occasionally, it may be useful to create an XML document which contains images – not references to images stored elsewhere, but the actual images themselves. I’ve seen this kind of thing done when archiving orders in an e-commerce context: a business wishes to archive orders made over five years ago, but it also wants some reasonable level of access to that data should a pressing need to retrieve it arise.

One approach could be to create an XML document for each order, one which contains a complete record of the transaction: top-level order details, item details, invoice address, delivery address, etc. All this is relatively straightforward. But the company may also decide, for completeness’ sake, that it wishes to store a copy of the primary product images alongside the order. This causes a problem for a developer who doesn’t know about something like base64 encoding. For one who does, it’s fairly trivial. It could look something like this:
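Here is a rough sketch of building such a document in Python with the standard `base64` and `xml.etree.ElementTree` modules. The element and attribute names (`order`, `item`, `image`, `sku`, `encoding`), the order values, and the stand-in image bytes are all invented for illustration; a real archive would use whatever schema the business has settled on.

```python
import base64
import xml.etree.ElementTree as ET


def build_order_xml(order_id: str, sku: str, image_bytes: bytes) -> str:
    """Return an order document with the product image embedded as text."""
    order = ET.Element("order", id=order_id)
    item = ET.SubElement(order, "item", sku=sku)
    image = ET.SubElement(item, "image", encoding="base64")
    # b64encode returns bytes; decode to str so it can be used as XML text.
    image.text = base64.b64encode(image_bytes).decode("ascii")
    return ET.tostring(order, encoding="unicode")


# Stand-in for real image data: just the 8-byte PNG file signature.
fake_image = b"\x89PNG\r\n\x1a\n"
order_xml = build_order_xml("1001", "ABC-123", fake_image)
print(order_xml)
```

In practice the bytes would come from reading the product image file in binary mode, and the encoded text would be far longer than this toy value.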


Here you have co-opted a medium which is designed to carry text into carrying binary data as well, although the medium doesn't necessarily even know it! Those characters between the image nodes are just text characters as far as the XML is concerned. But if the reader knows they're base64 encoded binary data, then the images can be retrieved.
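Retrieval is the same trick in reverse. The sketch below assumes a hypothetical archived document whose `image` nodes hold base64 text; the XML layout and the name `extract_images` are made up for the example.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical archived order; the image text is base64 encoded binary data.
ARCHIVED_ORDER = """<order id="1001">
  <item sku="ABC-123">
    <image encoding="base64">iVBORw0KGgo=</image>
  </item>
</order>"""


def extract_images(xml_text: str) -> list:
    """Decode the text of every image node back into raw bytes."""
    root = ET.fromstring(xml_text)
    return [base64.b64decode(node.text.strip()) for node in root.iter("image")]


images = extract_images(ARCHIVED_ORDER)
print(images[0])  # the original bytes, ready to be written to a file
```

To the XML parser the node content is only a short string; it becomes an image again only because the reader chooses to decode it.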

Tier 5 will look to fill in a few of the gaps we've glossed over, briefly give a couple of other examples, and point at some further reading.

Next Understanding Base64 Encoding #5
