Thursday, 26 January 2017

Understanding Base64 Encoding #2

Tier 2

If Tier 1 is about establishing the scantest familiarity with a subject – hoping to avoid looking glassy-eyed whenever it’s mentioned – then Tier 2 is about beginning to understand the topic; perhaps a cursory interest has been kindled and/or you’d like to be able to do a bit more than just identify the subject by sight.

To that end, one of the first questions I like answered when getting to grips with a new topic is “why does this thing exist?”. I’m going to begin to attempt to answer this question for base64 encoding by giving a disingenuous, rather long-winded, somewhat tortured analogy. I promise I’ll make amends in later tiers.

Imagine a strange parallel universe in which inter-computer communication has never happened. The parallel universe’s computers work in the same manner as ours, just no one ever bothered to invent the technologies which allow computers to communicate: no Internet, Bluetooth, portable digital devices – no floppy discs, CDs, DVDs, USB drives, etc. Essentially, each computer is a lonely digital island.

In this reality, if I create a super-cool bitmap image in the alternative universe’s version of MS Paint, you’d physically have to come over to my house and look at it on my screen; I have no digital means by which to transmit the data to you. To add to my misery, you live on the other side of the country and, despite my enthusiasm and entreaties for you to come and visit, you’re not going to decamp for the sake of one bitmap image.

So, scratching my head, I begin to think about the problem and in a fit of pique I come up with my first – and worst – solution to this problem: I’m going to write the binary code out on pieces of paper and send the code in the post to you. Every single one and zero. And then when you receive the paper full of bits you can key them all in at your end and recreate the image. Perfect!

However, I soon find that even the small 10 x 10 pixel image from Tier 1 is ~1000 bytes. Given there are 8 bits in a byte, that’s ~8000 ones and zeros I’ll have to transcribe! I’m not so keen on this, and I imagine you’re even less keen on having to key 8000 binary digits in at your end. We need a shortcut.

I’m convinced the part about mailing you the code still has merit but I’m also certain that raw ones and zeros aren’t the answer. What I need is some sort of shorthand way of representing the same raw binary data; I need to encode it.

This is the essence of the problem base64 encoding looks to solve: how can a text-based medium, in our case pieces of paper, be re-purposed to effectively transmit binary data?
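To put rough numbers on the size of that shortcut, here's a small Python sketch. It assumes the image is exactly 1000 bytes (I only know it's ~1000) and uses zeroed placeholder bytes rather than the real image; the variable names are mine and purely illustrative.

```python
import base64

# Assumption: 1000 placeholder bytes stand in for the ~1000-byte image.
payload = bytes(1000)

# Writing out raw binary means 8 ones and zeros per byte.
bits_to_transcribe = len(payload) * 8

# Base64 represents every 3 bytes as 4 text characters (padding the last group).
base64_chars = len(base64.b64encode(payload))

print(bits_to_transcribe)  # 8000
print(base64_chars)        # 1336
```

So, under those assumptions, I'd be posting you roughly a sixth as many characters as I would with raw ones and zeros.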

Tier 3 will, hopefully, begin to straighten this all out…

Next >> Understanding Base64 Encoding #3

Tuesday, 24 January 2017

Understanding Base64 Encoding #1

Disclaimer: I’m writing this blog post in an attempt to present a tiered approach to learning a new subject. It's also to solidify my understanding of the topic of base64 encoding as well as to act as an aide-memoire. I’m not presenting this information as infallible fact.
                                                     
Preamble: Personally, learning a new programming concept (or any complex topic for that matter) requires me to take a very particular approach if I want to gain and maintain a comprehensive understanding of it, and I don’t see many resources which represent and facilitate my learning process.

Learning for me involves moving from the general to the specific, and I need my sources of information to assume as little as possible while establishing context and purpose quickly. Producing this type of learning resource usually manifests in tiered levels of explanation. To my mind, Tier 1 is where the biggest shortage of good resources on a topic generally is. It should be what the opening paragraph of the Wikipedia topic strives to attain: a succinct and clear overview of the topic that someone immersed in the relevant field can read and feel more illuminated right away. Further tiers of explanation should elaborate on what previous tiers have established.

Let me try presenting the first couple of tiers for base64 encoding in the style I'm talking about.

What I assume: you have a programming background and that you’re looking to better understand base64 encoding.

Tier 1:

Okay, Tier 1 explanations might be relevant if you’ve just heard someone say “base64 encode” in a meeting and you’re thinking “I should probably have some idea what on Earth they’re talking about”; you’re googling about for five minutes to see if you can shed some light on the topic.

Wikipedia’s Base64 opening salvo is: “Base64 is a [...] binary-to-text encoding scheme that represent[s] binary data in an ASCII string format”.

This isn’t particularly illuminating on its own but there are a couple of clues in there: it’s something to do with binary data being represented as ASCII characters.

Warning: rather unhelpfully, it is possible to immediately jump down the rabbit hole with base64 encoding and you may be thinking, as I was, “hang on a minute, everything eventually boils down to binary data - including ASCII characters - so that seems like a bit of a nonsense”. Or perhaps you have come across an example whereby someone is showing you how they converted a sentence (one string of characters) into base64 encoded text (another string of characters) and are thinking “what could possibly be the value in that!?”. If you’ve done either (or both) of these things, please, for the moment, put those thoughts on ice – don’t worry, I’m with you comrade, I feel your pain.

A concrete example might help. Imagine I have a 10 x 10 pixel jpeg image (some binary data) and I want to represent it (for some ungodly reason) as ASCII characters. Up steps base64 encoding. In fact, here is a base64 encoded 10 x 10 jpeg:

/9j/4AAQSkZJRgABAQEAYABgAAD/4QBmRXhpZgAATU0AKgAAAAgABAEaAAUAAAAB
AAAAPgEbAAUAAAABAAAARgEoAAMAAAABAAIAAAExAAIAAAAQAAAATgAAAAAAAABg
AAAAAQAAAGAAAAABcGFpbnQubmV0IDQuMC45AP/bAEMAAQEBAQEBAQEBAQEBAQEB
AQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEB
Af/bAEMBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEB
AQEBAQEBAQEBAQEBAQEBAQEBAQEBAf/AABEIAAoACgMBIgACEQEDEQH/xAAfAAAB
BQEBAQEBAQAAAAAAAAAAAQIDBAUGBwgJCgv/xAC1EAACAQMDAgQDBQUEBAAAAX0B
AgMABBEFEiExQQYTUWEHInEUMoGRoQgjQrHBFVLR8CQzYnKCCQoWFxgZGiUmJygp
KjQ1Njc4OTpDREVGR0hJSlNUVVZXWFlaY2RlZmdoaWpzdHV2d3h5eoOEhYaHiImK
kpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4eLj
5OXm5+jp6vHy8/T19vf4+fr/xAAfAQADAQEBAQEBAQEBAAAAAAAAAQIDBAUGBwgJ
Cgv/xAC1EQACAQIEBAMEBwUEBAABAncAAQIDEQQFITEGEkFRB2FxEyIygQgUQpGh
scEJIzNS8BVictEKFiQ04SXxFxgZGiYnKCkqNTY3ODk6Q0RFRkdISUpTVFVWV1hZ
WmNkZWZnaGlqc3R1dnd4eXqCg4SFhoeIiYqSk5SVlpeYmZqio6Slpqeoqaqys7S1
tre4ubrCw8TFxsfIycrS09TV1tfY2dri4+Tl5ufo6ery8/T19vf4+fr/2gAMAwEA
AhEDEQA/AP5/fg7rH7M+nWv7EPjn4nfsT/ACH49fsz/D/wDZbvPgR+xr4R+Hf7Qn
x1tf+C7/APwvX9o7x3D4p8R+Ivi14A+I3jn4ZeAviB8FbK8Xw/P4K8QeGfil/bn7
R3hz4i/sufEfwBqngX4ZaX+xF8FPxB+LFn/Z3xT+Jen/APCN/D/wd9g+IHjKz/4R
H4T+Nf8AhZXws8K/ZfEepQf8I58NPiL/AMJ/8V/+E++H+h7P7M8G+Nf+FpfEr/hK
vDlrpuu/8J/4x+3/APCRaj6B4A/ax/an+FHws8a/Av4W/tLftAfDX4JfEr/hI/8A
hYvwd8AfGT4i+DvhZ4+/4THw5Z+D/F3/AAmvw+8O+I9O8JeKv+Eq8Jadp/hbxH/b
ukX/APbnhyws9E1P7VplrBap8/0Af//Z


Sceptical? If you copy that text and save it into a new text file (called, say, “encodedJpg.txt”) and then navigate to the folder the file is saved in from a Windows command prompt, you can run the following command: certutil -decode encodedJpg.txt 10x10.jpg. You should then see the jpg recreated.


You can turn the jpg back into text like the above by running the alternative certutil -encode "input" "output" command (note that certutil wraps its output in BEGIN CERTIFICATE and END CERTIFICATE lines).
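If you're not on Windows, something similar can be sketched in Python with the standard library's base64 module. This is only a rough equivalent of the certutil commands, not a replacement for them: the function names decode_file and encode_file are my own, and the snippet only checks the first line of the encoded text above rather than the whole image.

```python
import base64

# First line of the encoded jpg above, copied verbatim.
FIRST_LINE = "/9j/4AAQSkZJRgABAQEAYABgAAD/4QBmRXhpZgAATU0AKgAAAAgABAEaAAUAAAAB"

# 64 base64 characters decode to 48 bytes of binary data.
header = base64.b64decode(FIRST_LINE)
print(header[:3].hex())  # ffd8ff (a jpeg file starts with the bytes FF D8)
print(header[6:10])      # b'JFIF'


def decode_file(text_path, binary_path):
    """Hypothetical helper: base64 text file in, binary file out."""
    with open(text_path, "r", encoding="ascii") as source:
        # By default b64decode discards characters outside the base64
        # alphabet, so the line breaks in the text file are harmless.
        data = base64.b64decode(source.read())
    with open(binary_path, "wb") as target:
        target.write(data)


def encode_file(binary_path, text_path):
    """Hypothetical helper: binary file in, base64 text file out."""
    with open(binary_path, "rb") as source:
        # encodebytes wraps at 76 characters per line, so the layout will
        # differ from the 64-character lines shown above.
        encoded = base64.encodebytes(source.read())
    with open(text_path, "wb") as target:
        target.write(encoded)
```

Calling decode_file("encodedJpg.txt", "10x10.jpg") should, I'd expect, do the same job as the certutil -decode command.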

And that’s Tier 1. For the moment we’re not going to worry about the mechanics of the operation; it’s enough to know that base64 encoding changes binary data into text that looks like the above. Why you'd want to do such a thing and how it's achieved are Tier 2 explanations. N.B. the binary data doesn’t have to be a jpeg image, it could be anything: an executable, a zip file, a Word document, etc.


Monday, 28 November 2016

Visualising Sorting Algorithms

Stumbled across a few really good videos for visualising sorting algorithms. I've seen a few which show the sorting happening but not the logic behind it. I think these convey both aspects really well.





Friday, 17 June 2016

Stack and Heap Refresh


Brilliant refresher on how the stack and heap are used. Also gives an insight into when and why variables are and are not thread-safe.

Friday, 26 February 2016

Commenting Code

My rules-of-thumb for code commenting:

Don't. If you find yourself writing a comment ask yourself "why am I writing this comment?". Most of the time I've found myself writing a comment, it's because the code isn't self-commenting: the method/class wasn't small enough; naming - at class, function and/or variable level - was poor and obscured intention; I'd written code which could have been written in a more expressive fashion; or I was adding a comment to something which should have been extracted to its own method.

Why not What. As far as possible it should be patently obvious what your code is doing, even at a glance. You may not immediately know how it goes about it, but you often won’t need to: a properly named function in a codebase you trust will tell you what it’s doing; a properly named variable will tell you what it is and what it’s being used for. If the domain logic is a bit peculiar, it might be worth documenting why the thing is being done. But be careful and always take a minute to reflect on whether the domain logic truly is peculiar or whether you are just doing something a bit odd.
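Here's a small Python sketch of the difference. The cooling-off rule, the 14-day figure and all of the names are made up for illustration; the point is only the contrast between a comment that restates the code and one that records a reason.

```python
from datetime import date


def check(signup: date, today: date) -> bool:
    # "What" comment: check if more than 14 days have passed since signup.
    # It restates the code, and the vague function name is why it felt needed.
    return (today - signup).days > 14


# Hypothetical domain rule, named so the code says what it is doing.
COOLING_OFF_DAYS = 14


def has_passed_cooling_off(signup: date, today: date) -> bool:
    # "Why" comment: customers can cancel for free during the cooling-off
    # period, so we only treat them as committed once it has ended.
    return (today - signup).days > COOLING_OFF_DAYS
```

Both functions behave identically; only the second tells a reader what it is for and why the number matters.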

Monday, 1 February 2016

Is TDD Dead?

A brilliant and informative discussion for anyone who's interested in TDD and its utility: https://www.youtube.com/watch?v=z9quxZsLcfo

Wednesday, 2 December 2015

oAuth2: A Conversation

I sometimes try to view protocols as conversations between actors in order to aid my comprehension - the anthropomorphising of computer interactions, if you will. I imagine oAuth2 to go something like this (in the context of a web server)...

The actors:
  • You
  • A Desired Service (ADS) - a service you'd like to use
  • oAuth2 Implementer (oAI) - a service you've trusted with your details

You to ADS: Hello, I'd like to use your service.

ADS: Okay, in order to use the service I provide I need you to create an account. I can make this easier for you if you already have an account with (Google | FB | Twitter | etc.) - someone who already knows the information I need to know.

You: I have a Google account, we can use that.

ADS: Cool, in that case I'm going to send you over to Google to login and they'll send you back to me when you're done.

ADS to oAI: Hey, Google, it's ADS, I'm sending you someone and I want to know their email address, name and phone number. Send them back to this address when you're done.

 ~ you arrive at the oAI (Google)~

oAI: Okay, so who are you?

You: I'm me, I'll login to prove it.

oAI: Hello You. The service that sent you here wants to know your email address, name and phone number, is that cool?

You: Yes, that's fine.

oAI: Alrighty. When ADS registered with me it specified a predefined list of URLs I can send people back to once they've logged in successfully and agreed to the things it wants access to. https://ads.com/oauth2-return-page, which arrived alongside you, is one of them. I'll send you back there with this authorisation code, which ADS can exchange for an access token in order to ask me about your email address, name and phone number.

~ you arrive back at ADS ~ 

ADS: Nice to see you again. I can see you've logged in with Google successfully. I'll just use that authorisation code to request an access token which I'll use to request your details, then I'll create you an account.

ADS to oAI: Hey, I've got this authorisation code, can I get the associated access token?

oAI to ADS: Sure, here you go.

ADS to oAI: Hey, I've got this access token. Can you tell me the email address, name and phone number associated with it?

oAI to ADS: Yup, here you go.

And that, crudely, is how I understand oAuth2 works when web servers are talking to each other.
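For anyone who prefers to see the conversation as data, here's a rough Python sketch of the messages ADS builds along the way. It makes no network calls. The oAI endpoint, client id and scope names are hypothetical placeholders, and the state value is something I've added that doesn't appear in the conversation above (it's a commonly recommended way for ADS to check the person coming back is the one it sent). The parameter names follow the OAuth 2.0 authorisation code grant as I understand it.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical values, for illustration only.
AUTHORIZE_URL = "https://oai.example/authorize"
CLIENT_ID = "ads-client-id"
REDIRECT_URI = "https://ads.com/oauth2-return-page"


def build_authorization_url(state: str) -> str:
    """ADS sends you to the oAI, saying what it wants and where to return you."""
    params = {
        "response_type": "code",          # ask for an authorisation code
        "client_id": CLIENT_ID,           # "Hey, Google, it's ADS"
        "redirect_uri": REDIRECT_URI,     # "send them back to this address"
        "scope": "email profile phone",   # hypothetical scope names
        "state": state,
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}"


def extract_authorization_code(return_url: str, expected_state: str) -> str:
    """You arrive back at ADS carrying the authorisation code."""
    query = parse_qs(urlparse(return_url).query)
    if query.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch: this isn't the visitor we sent")
    return query["code"][0]


def build_token_request(code: str, client_secret: str) -> dict:
    """Form fields ADS would post to the oAI to swap the code for a token."""
    return {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": REDIRECT_URI,
        "client_id": CLIENT_ID,
        "client_secret": client_secret,
    }
```

The final step, asking for your email address, name and phone number, would be a request to the oAI carrying the access token; I've left it out because the shape of that call depends on the particular oAI.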