After writing and uploading the previous post, a programmer-styled rendition of “The Three Bears” (https://www.aniamosity.net/if-authors-wrote-stories-the-way-programmers-write-code/), I spent a good deal of time reflecting on what I had done. The process I went through turned out to have some interesting aspects that might cast some light on what we do as programmers when writing and refactoring code.
So I wanted to dive into that a bit…
The first question might be, “What was the process you used to arrive at that?” And there were different aspects to that.
High-level Structure
The initial step was to look at the overall story and see what the meaningful chunks were. They ended up falling roughly along paragraph boundaries, but not exactly. In fact, the initial paragraph made more sense to split into two, semantically, since its two halves are actually about different aspects of the story. (That could be considered a “bug” in the original story’s use of paragraphs.)
I actually think the high-level steps in Story give a pretty good overall sense of the progress of the story – that is, if you know what they mean. So that can be useful in computer code as well: by pushing lower-level details down into functions, you can allow someone to get a good sense of what a function is doing at a high level.
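As a rough illustration of that idea, here is a minimal Python sketch of a top-level story function that reads like an outline. Only girl_chairs and bears_food echo chunk names from the original (Girl_Chairs and Bears_Food). The other step names, the STEPS list and all of the text are assumptions made up for this example, not the code from the previous post.

```python
from typing import Callable, List

# Hypothetical step functions. girl_chairs and bears_food echo the post's
# Girl_Chairs and Bears_Food; the other names and all the text are made up.
def girl_arrives() -> List[str]:
    return ["Goldilocks walks into the bears' empty house."]

def girl_food() -> List[str]:
    return ["She tastes each bowl of porridge in turn."]

def girl_chairs() -> List[str]:
    return ["She tries each of the three chairs in turn."]

def bears_food() -> List[str]:
    return ["The bears find that someone has been eating their porridge."]

# The outline of the story, one entry per high-level step.
STEPS: List[Callable[[], List[str]]] = [girl_arrives, girl_food, girl_chairs, bears_food]

def story() -> List[str]:
    """Top level: reads as an outline, with the details pushed into the steps."""
    lines: List[str] = []
    for step in STEPS:
        lines.extend(step())
    return lines

if __name__ == "__main__":
    print("\n".join(story()))
```

Someone skimming STEPS gets the arc of the story without reading a single sentence of it, provided the step names mean something to them.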
This raises an interesting point, which I hadn’t thought of before:
It’s easier to understand the lower level details when you know where they fit into the higher level structure.
Someone who has read The Three Bears, for example, can know exactly where this fits in:
define Sitting_In_My_Chair:
    Someone's been sitting in my chair
They can see that small piece and understand its role in the overall story.
There is the counterpart to that as well:
It’s easier to understand the higher level structure when you know what the lower-level details do and where they fit.
Something like the overall structure of Story is only as clear as the step names can offer – and you can only put so much information into a name. That is one of the problems I have with the idea of “self-documenting code” as a sort of excuse for having virtually no comments in code: identifiers are of necessity limited. They can only contain and convey so much information.
However, once you know what they mean, then they can be good shorthand for things. Once I know what happens in Girl_Chairs, for example, I can just look at it as “the part where she interacts with the chairs”, and if I later want to find where the bears discover she has eaten the porridge, I can quickly jump to Bears_Food – once I know that that’s where it is. On the other side, if I know the overall arc of the Three Bears story, I could probably jump right there even if I had never seen this particular “code” before. I can map what’s in my head onto the story’s structure.
You can move your viewpoint up and down within the code. I think it could be argued that decomposition works best when the level of detail gets finer as you descend into sub-pieces and coarser as you climb back up. Beware of decomposition where the result is actually at the same level. That can point to an arbitrary creation of concepts rather than a refinement of concepts.
Extracting Common Constructs
Moving on, another aspect of the “codifying” was to extract some constants from the code. Now, that wasn’t necessary, but it can have advantages later. It might be a bit silly to generalize “porridge” as {Food} or to allow the name of the girl to not be “Goldilocks” – “Ms. Locks”, perhaps? On the other hand, I have seen the bears named “Papa Bear” and “Mama Bear” instead of “Daddy Bear” and “Mummy Bear”. With the constants defined outside of the main body, changing one definition automatically changes every reference to it, if so desired.
The structure of the story is the same, but minor details can be easily changed, on a whim.
This is the first part of what is typically referred to in software development circles as “DRY” or “Don’t Repeat Yourself”. Consolidating repeated values like names into overarching constants or variables (or doing so with bits of code into common functions) offers at least two advantages:
1) You can easily change the value of all instances of one of them at once by changing the higher-level definition.
2) By making them all refer to the same thing (for example, Girl for “Goldilocks”), you are saying, “These things are all the same.” That might seem obvious in this case, but there will be cases where that isn’t true. Having that additional clue when looking at the code makes it easier to work with, because you know what is meant to be the same and what isn’t.
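Both advantages can be sketched in a few lines of Python. The {Girl} and {Food} placeholders follow the notation mentioned above. The NAMES dictionary, the {Daddy} key, the render helper and the sentence itself are assumptions for illustration only, not the original “code”.

```python
# Hypothetical constants: one definition per name, referenced everywhere.
NAMES = {"Girl": "Goldilocks", "Food": "porridge", "Daddy": "Daddy Bear"}

# Made-up story text. Both {Girl} placeholders refer to the same definition,
# which also tells the reader they are meant to be the same person.
TEMPLATE = "{Girl} tasted {Daddy}'s {Food}. It was too hot, so {Girl} moved on."

def render(names: dict) -> str:
    """Fill every placeholder in the template from the given definitions."""
    return TEMPLATE.format(**names)

if __name__ == "__main__":
    print(render(NAMES))
    # Changing one definition changes every reference to it.
    print(render({**NAMES, "Daddy": "Papa Bear"}))
```

Renaming the bear touches one entry, and every sentence that mentions him follows along.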
When refactoring, we need to differentiate between things that happen to be the same and things that actually are the same, especially when we consider coding them as the same thing.
Consider, for example, my injection of Bear_Scene to replace the three repetitive bear sections. On the surface, it seems reasonable: if you look at those sections, they are basically the same as each other, structurally, with just some minor differences in wording. However, I made a mechanical decision, which is that I would make them all be expressions of the same pattern simply because it worked to do so. I really don’t know if the author deliberately intended that they would be the same or should be the same or if it just worked out that way. In other words, I don’t know if the pattern I ascribed to them is a deliberate pattern or just something accidental.
That might seem like a very nuanced (and maybe pointless) point, as the code works, but when you’re working with software, the distinction in semantics can become important if things need to change later. By forcing the text to fit the pattern (and I sneakily did that by changing Daddy_Bear’s dialogue tag from “growled” to “said” in the chair section to make it fit – does that violate the requirements?), it then becomes much more difficult later to change things if, for example, we need to add an additional line into one case but not the others.
The pattern works while it’s a pattern. But if things need to change in one case, then the question becomes, “Do I need to remove this case from the pattern, or do I need to extend the pattern to cover this varying case?” And you can typically do it either way, though if you do the latter too much, it can lead to horribly complicated code with lots of exceptions and variability, trying to account for variations in a pattern that might not actually be a pattern anymore.
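To make those two choices concrete, here is a small Python sketch. It assumes a heavily simplified, hypothetical version of Bear_Scene that produces a single line of dialogue. The extra parameter, the “cried” dialogue tag and Baby Bear’s second line are invented for the example and are not taken from the original.

```python
from typing import List, Optional

def bear_scene(bear: str, verb: str, item: str, extra: Optional[str] = None) -> List[str]:
    """Option 1: extend the pattern. `extra` exists only for the odd case out."""
    lines = [f'"Someone\'s been sitting in my {item}," {verb} {bear}.']
    if extra is not None:
        # Every special case added here makes the "pattern" harder to read.
        lines.append(extra)
    return lines

def baby_bear_chair_scene() -> List[str]:
    """Option 2: pull the varying case out of the pattern and write it plainly."""
    return [
        '"Someone\'s been sitting in my chair," cried Baby Bear.',
        '"And they\'ve broken it!"',
    ]

if __name__ == "__main__":
    print(bear_scene("Daddy Bear", "growled", "chair"))
    print(bear_scene("Baby Bear", "cried", "chair", extra='"And they\'ve broken it!"'))
    print(baby_bear_chair_scene())
```

Both options produce the same text for Baby Bear. The difference is in what the code claims: the first says the scenes are still one pattern with a knob, while the second says this scene has stopped being an instance of the pattern.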
This is where it really helps to understand what the code actually means. But we can’t always have that insight, especially when it’s code written by others.
Objectifying the Bears
After some initial breaking down of what varied in the various scenes, I discovered I had a number of constants like “Daddy_Chair_State”, “Mummy_Chair_State”, “Daddy_Chair_Size”, etc. where all three bears had the same set, and I had unique calling cases for each bear. At that point, I saw I could invert things a bit by dividing and consolidating the constants into structures, one for each bear. Then the other chunks could look at which bear was in play and use its values. I could just pass the bear around instead of the values within, and the underlying chunk could pick out the part it needed.
So “Daddy_Chair_State”, for example, became “bear.chair_state”, where “bear” could be one of the Daddy_Bear, Mummy_Bear or Baby_Bear “things”.
This isn’t really “object oriented”, in that there is neither encapsulation nor even any inheritance. It’s really more “structured data”. In fact, I made a point of using “thing” instead of “object” (which had connotations) or “struct” or “structure” (which sounded techie and even language specific).
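A minimal Python sketch of that inversion, using a plain data record per bear. The field name chair_state comes from the description above and chair_size from “Daddy_Chair_Size”. The field values, the Bear class name and the girl_tries_chair function are assumptions made up for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Bear:
    """Plain structured data: no behaviour, no inheritance."""
    name: str
    chair_size: str
    chair_state: str

# One "thing" per bear replaces Daddy_Chair_State, Mummy_Chair_State, and so on.
# The values are illustrative guesses, not the original constants.
DADDY_BEAR = Bear("Daddy Bear", "big", "too hard")
MUMMY_BEAR = Bear("Mummy Bear", "medium-sized", "too soft")
BABY_BEAR = Bear("Baby Bear", "little", "just right")

def girl_tries_chair(bear: Bear) -> str:
    """Takes the whole bear and picks out only the parts it needs."""
    return f"She sat in the {bear.chair_size} chair, but it was {bear.chair_state}."

if __name__ == "__main__":
    for bear in (DADDY_BEAR, MUMMY_BEAR, BABY_BEAR):
        print(girl_tries_chair(bear))
```

The caller no longer needs a separate calling case per bear. It passes the bear, and the chunk decides which values matter.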
There is possibly more that could be done along those lines. But then, there’s a limit to the gains you make, and doing too much can lead to code being harder to understand, even if it “works”.
This leads us to some of the difficulties I noticed during this exercise.
The Difficulties with Compression
As I mentioned before, it’s easier to understand the lower-level pieces when you know where they fit into the higher-level structure. That is one reason why the person writing the code is in a better place to understand the decomposed, semantically compressed code, as they (at least when they wrote it) have the full picture in their mind of what it all means and where it all fits together.
Someone coming onto the code for the first time won’t have that advantage. And that is something I think we need to be aware of as programmers: that someone else won’t have the same mindset that we do, even if “someone else” is us 5 years down the road. (Though, in all fairness, I tend to find it easier to get back into a mindset I once had, even if I’m not in it at first when encountering old code.) It might make sense for us as the all-knowing programmer to keep breaking the code down into smaller and smaller pieces, as we know how they all fit and – more importantly – what they all mean. But someone else won’t, at least not at first. And that makes the code harder to understand, if the pieces become so small that they have little semantic information on their own, or if the divisions are along syntactic lines rather than semantic lines, where it becomes hard to work out what something actually means.
Take, for example, the Said_Food_Is chunk. That is exactly one line, and it’s used in exactly one place. That came into being because I originally replaced a few separate lines with that (doing a sort of textual replacement), and then later when I compressed the resulting structures using those lines into one thing, it became a single instance again.
The question is, “Is this chunk useful or does it make the code harder to understand?” Initially, it had a use, as it replaced several common sections. But I would postulate that, now that it’s back to a single use, not only does it not serve a purpose, but it makes the code harder to understand, as the name for it doesn’t add any useful information and it’s just another level of indirection. It becomes just another concept to have to deal with when understanding what is happening. The decomposition has gone too far.
What’s interesting to consider is how “helpful” decompositions differ from “harmful” decompositions. If you look back at the original Story breakdown, it felt “helpful” because it allowed us to operate at a higher level and gain an understanding of the code at that level, without having to plow through all the low-level details. It actually added information, by providing a structure that we might not have noticed otherwise. However, the Said_Food_Is chunk doesn’t have that benefit – it doesn’t take us up or down levels. It’s just a replacement with no value. It is introducing an extra step to go through, but it doesn’t offer any additional insight, whether it be structural or “these things are all the same”, which is what you get when replacing things used in multiple places. It’s barely a separate thought, and yet it’s trying to be one.
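The difference can be seen in a small Python sketch. What Said_Food_Is actually contained is not shown here, so the body below is a guess based on its name, and the surrounding functions and text are hypothetical.

```python
FOOD = "porridge"

def said_food_is(verdict: str) -> str:
    """One line, one caller: a name that adds a hop but no new information."""
    return f'"This {FOOD} is {verdict}," she said.'

def girl_tastes_with_helper(verdict: str) -> str:
    # The reader has to jump to said_food_is to learn what is actually written.
    return f"She took a spoonful. {said_food_is(verdict)}"

def girl_tastes_inlined(verdict: str) -> str:
    # Same output, one less concept to hold in mind.
    return f'She took a spoonful. "This {FOOD} is {verdict}," she said.'

if __name__ == "__main__":
    print(girl_tastes_with_helper("too hot"))
    print(girl_tastes_inlined("too hot"))
```

With several callers, the helper would be saying “these lines are all the same”. With one caller, it says nothing the inlined line does not already say.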
The Difficulties with Abstraction
I wanted to look at one more chunk, which is the Bear_Scene one. This is really a template to be filled in. And it works for what it needs to do. However, if you were to hand that to someone outside of the context of this code, it would be hard to get a good sense of it. I mean, you could see what it does, but you may not know exactly what it means. And this is something I have noticed often in code, which isn’t a 100% generality, but it happens often enough to make it worth watching out for:
While it can feel good to find patterns and generalize the code through common abstractions for those patterns, abstractions tend to be harder to initially grasp than concrete code.
Again, I wouldn’t say it’s generally true. Things like templated or generic containers, for example, have good semantics that make immediate sense. However, other abstractions – especially if they don’t have a unifying concept behind them – can be harder to grasp until they can be placed into context so their usage can be seen. We can extract the pattern out, but not all patterns have good semantics outside of the code that uses them, which would allow them to stand on their own in our minds.