Monday 17 September 2018

Understanding Streams (in .NET) #2

In Part 1 we looked at streams from a conceptual point of view. We learnt that streams are an abstraction over moving data from point A to point B. A very simple example of reading from a stream can be used to demonstrate this:
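As a minimal sketch (the class name and the bytes inside GetStream() are stand-ins I've assumed purely for illustration), the read loop might look like this:

```csharp
using System;
using System.IO;

public static class ReadLoopExample
{
    // Stand-in data source (an assumption for this sketch): the reading code
    // in Main neither knows nor cares where the bytes come from.
    public static Stream GetStream()
    {
        return new MemoryStream(new byte[] { 72, 101, 108, 108, 111 }); // "Hello" in ASCII
    }

    public static void Main()
    {
        using (Stream stream = GetStream())
        {
            int b;
            // ReadByte returns the next byte as an int (0-255), or -1 at the end of the stream.
            while ((b = stream.ReadByte()) != -1)
            {
                Console.WriteLine(b);
            }
        }
    }
}
```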

The GetStream() method returns a stream object from which we can read bytes until we are told there are no more bytes to read, which is indicated by -1 being returned. The stream abstraction is already working for us here as we've no idea - and potentially don't care - about where the data is coming from: we can simply keep asking for data until we're told there's no more data to be had.

To peek behind the curtain a little, here is the GetStream() method:
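A sketch of what such a method could look like follows. The class name and the string literal are my assumptions; Encoding.ASCII.GetBytes and MemoryStream are standard .NET types.

```csharp
using System;
using System.IO;
using System.Text;

public static class InMemorySource
{
    public static Stream GetStream()
    {
        // Each character becomes one byte: its ASCII code.
        byte[] bytes = Encoding.ASCII.GetBytes("Hello, streams!");

        // MemoryStream derives from Stream, so it can be returned as one.
        return new MemoryStream(bytes);
    }

    public static void Main()
    {
        using (Stream stream = GetStream())
        {
            int b;
            while ((b = stream.ReadByte()) != -1)
            {
                Console.WriteLine(b);
            }
        }
    }
}
```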

All I'm doing here is converting a string into a byte array where each byte is the ASCII representation of a character in the string. I then create a new MemoryStream object, passing the byte array into the constructor. Through the power of inheritance and the Liskov substitution principle we can treat the MemoryStream as its parent type, Stream.

On its own this doesn't seem terribly useful. But the data in the Stream doesn't have to be from an in-memory source. I could change the GetStream() method to the following and still read from it in the same way, even though the data now exists in a file:
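One way this might look, as a sketch only: the file name and location are hypothetical, and Main writes the file first just so the example can run on its own.

```csharp
using System;
using System.IO;

public static class FileSource
{
    // Hypothetical file location, chosen only for this sketch.
    public static readonly string FilePath =
        Path.Combine(Path.GetTempPath(), "streams-demo.txt");

    public static Stream GetStream()
    {
        // File.OpenRead returns a FileStream, which is also a Stream.
        return File.OpenRead(FilePath);
    }

    public static void Main()
    {
        // Create the file first so the example is self-contained.
        File.WriteAllText(FilePath, "Hello");

        using (Stream stream = GetStream())
        {
            int b;
            // Exactly the same read loop as for the in-memory case.
            while ((b = stream.ReadByte()) != -1)
            {
                Console.WriteLine(b);
            }
        }
    }
}
```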

I could even be reading from a stream whose bytes come over the internet:
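Here is a rough sketch using HttpClient. That choice is mine and other .NET APIs would do the same job; the URL is just a placeholder, and I block on the task only to keep the example short.

```csharp
using System;
using System.IO;
using System.Net.Http;

public static class NetworkSource
{
    private static readonly HttpClient Client = new HttpClient();

    public static Stream GetStream()
    {
        // Placeholder URL (an assumption for this sketch).
        // Blocking on the task keeps the example short; real code would normally await it.
        return Client.GetStreamAsync("https://example.com/").GetAwaiter().GetResult();
    }

    public static void Main()
    {
        using (Stream stream = GetStream())
        {
            int b;
            // Still the same read loop: the consumer can't tell the bytes came over HTTP.
            while ((b = stream.ReadByte()) != -1)
            {
                Console.WriteLine(b);
            }
        }
    }
}
```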

These examples are a little contrived and not awfully useful, as all I do with the byte I've read is write it out to the console and move on to the next one. That being said, bytes are the essential nature of all data, so we have a solid starting point to do more interesting things...

Friday 14 September 2018

Understanding Streams (in .NET) #1

I'm going to attempt to explain streams - with C# as the example language - using the same tiered approach I used to explain base64 encoding previously. Caveat emptor: this series is, in part, about getting the topic straight in my head, so please don't take anything here as gospel.

Tier 1.

Current Understanding: 

You may have heard someone talk about "streaming data" or "writing to a stream" - perhaps you've even used the term(s) yourself - but you're only, at best, dimly aware of what it means.

N.B. If you have a greater understanding than the above it may make sense for you to skip over this tier.


Moving data about is useful and we do it a lot! You requested the movement of data by asking your browser to display this website: a Blogger server somewhere has this webpage (or knows how to assemble it) and you asked for a copy of that data. Ultimately, that involved the transmission of binary digits (bits) but that's rarely the level at which anyone wishes to work. To avoid doing so we invent higher-level abstractions to help us reason about and perform such tasks. Streams are one such abstraction.

Terminology & Pre(r)amble:

I completely agree that one of the two hard things in computer science is "naming things" (see "Two Hard Things").

However... with that said, as an analogy I'm not sure a stream is the best one for thinking about this topic. It's certainly not how I think about it. The term "stream" seems to have been chosen to convey a flow of data - "river", "brook", or "creek" could equally have been used. To that extent it has utility; however, I'm not sure its explanatory power holds out as one explores the subject further.

I've occasionally thought a more instructive way of thinking about reading from a stream would be drinking from an unseen cup via a straw. Here, you are sucking up liquid and don't know how much is left until at some point you go to suck up a mouthful and there's no liquid left. This is how reading from a stream works: you don't know how much data there is to be read until you go to get the next chunk of data and there is none. Strictly speaking, this isn't always the case - we'll cover that later.
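For those who'd like to see the straw analogy in C#, here is a small sketch. The names CountSips, cup and sipSize are mine, invented for the analogy; Stream.Read is the real .NET method, and it returns 0 once there is nothing left.

```csharp
using System;
using System.IO;
using System.Text;

public static class StrawExample
{
    // Reads the stream in chunks of up to sipSize bytes and returns how many reads produced data.
    public static int CountSips(Stream cup, int sipSize)
    {
        byte[] buffer = new byte[sipSize];
        int sips = 0;

        // Read returns the number of bytes actually read; 0 means the cup is empty.
        while (cup.Read(buffer, 0, buffer.Length) > 0)
        {
            sips++;
        }

        return sips;
    }

    public static void Main()
    {
        // Ten bytes in the "cup", sipped up to four at a time.
        using (Stream cup = new MemoryStream(Encoding.ASCII.GetBytes("0123456789")))
        {
            Console.WriteLine(CountSips(cup, 4));
        }
    }
}
```

The loop never asks how much is in the cup up front; it only finds out the cup is empty when a sip comes back with nothing.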

The term "stream" is both a noun and a verb in computing: you can have "a stream" of data; I can "stream data to you"; you might be "streaming data from me".

Why should I care?

Streaming, at its most fundamental, is about moving data from one place to another; streaming is taking data which exists at A and moving it to B. It's a concept common to all programming languages, and in computing more widely, so understanding it has broad utility.

So, if you want to move data about as part of your application it may well help to know about streams!