Black box thinking: The importance of abstraction in technology
10 minute read
It's no secret that technology is complicated. In fact, few things are more complicated. Despite this, every day people all over the world are able to create bigger and better tech.
So, how do they do it? How can people operate within such an endlessly complex arena? Do tech experts know everything about all tech? No. They use abstraction to navigate and manage the complexity, mainly by organising it into black boxes.
What is a 'black box'?
A 'black box' is a system, or a component of a system, which performs some function while its internal workings remain unknown and invisible to the user. The user interacts with such a system based only on the relationship between its inputs and outputs.
Let's take a familiar example. Imagine a cab driver in their car: the driver may not understand how the car works underneath the hood, nor do they need to. All they need to know is how to provide input to the car with the pedals and steering wheel to make it go where they want. In this way, the car can be thought of as a black box - the complexity of the engine, gears and steering mechanism is abstracted away from the user.
Now let's take the example a step further. Imagine there is a passenger in the cab - the passenger does not drive the vehicle, or even decide the route taken. All they know is that if they provide the input of telling the driver the destination (plus some money of course), the result will be that they end up where they need to be.
In our example, adding the passenger has introduced a second layer of abstraction. The complexities of route planning and operating the vehicle are abstracted away from the passenger by the driver. The complexities of how the car turns the wheels and steers are in turn abstracted away from the driver by the car.
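To make the layering concrete, here is a minimal sketch of the cab example in Python. The class and method names (Car, Driver, drive_to, take_fare) are invented for illustration; the point is only that each layer exposes inputs and outputs while hiding what happens inside.

```python
class Car:
    """Innermost black box: hides the engine, gears and steering."""

    def __init__(self):
        self.position = "depot"

    def drive_to(self, place):
        # Engine and gearbox details would live here, out of the driver's sight.
        self.position = place
        return self.position


class Driver:
    """Middle layer: hides route planning and operation of the car."""

    def __init__(self, car):
        self._car = car

    def take_fare(self, destination, payment):
        if payment <= 0:
            raise ValueError("the driver expects to be paid")
        # Route planning is the driver's concern, not the passenger's.
        return self._car.drive_to(destination)


# The passenger's whole view of the system: destination and money in, arrival out.
driver = Driver(Car())
arrived_at = driver.take_fare("airport", payment=20)
print(arrived_at)  # airport
```

Notice that the passenger's code never touches the Car at all. Swapping in a different car would not change a single line of it.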
What does this have to do with tech?
These concepts of black boxes and abstraction are fundamental to almost every facet of information technology. Let's use memory management in a computer as an example. You've probably heard of Random Access Memory, or RAM for short - it's a hardware component which provides the short term memory needed for programs to run. It comes in modules of varying sizes, often in multiples of 4 Gigabytes. RAM stores 1s and 0s using transistors and capacitors, and many modules (such as error-correcting, or ECC, memory) also include redundancy and mechanisms for detecting and correcting corrupted data in memory. When you plug a RAM module into a computer, the computer's operating system (OS) can use it without knowing any of this. As far as the OS is concerned, there is a continuous and reliable set of memory addresses, from 0 up to the maximum (depending on the size of the memory).
The underlying mechanisms of error correction and redundancy in the RAM module are abstracted away from the operating system.
As memory addresses are allocated and deallocated, the OS uses clever algorithms to decide which physical addresses to allocate to different processes in order to use the physical address space most efficiently. This results in the memory allocated to a process being split into pieces all over the physical memory space, which would not be easy for an application to keep track of! To simplify matters for the processes it manages, the OS abstracts away the complexities of physical memory address management by assigning a continuous block of memory to each process, for example 1GB in size - this is known as the process's virtual memory. Under the hood, the operating system keeps a mapping from addresses in the virtual memory to the actual physical memory addresses they correspond to, but the process is able to treat its assigned memory as a continuous block of addresses. Nice!
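The mapping described above can be sketched in a few lines of Python. This is a simplified illustration, not how any particular OS implements it: the 4096 byte page size is a common choice but is assumed here, and the page_table contents and the translate function are made up for the example.

```python
PAGE_SIZE = 4096  # bytes per page; a common size, assumed here

# Hypothetical page table for one process: virtual page number -> physical frame number.
# The frames are deliberately scattered to mimic fragmented physical memory.
page_table = {0: 7, 1: 2, 2: 9}


def translate(virtual_address):
    """Map a virtual address to the physical address it corresponds to."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    if page not in page_table:
        raise MemoryError(f"virtual address {virtual_address} is not mapped")
    return page_table[page] * PAGE_SIZE + offset


# Neighbouring virtual addresses can sit far apart in physical memory.
print(translate(4095))  # 32767 (last byte of frame 7)
print(translate(4096))  # 8192  (first byte of frame 2)
```

From the process's point of view, addresses 4095 and 4096 are next door to each other. Only the table knows they live in completely different places.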
How does abstraction relate to writing code?
You may have seen programming languages like Python or Go being described as 'high level' and languages like C being described as 'low level'. Does this mean that high level languages are in some way 'better' than low level languages? Not at all. In this context, 'high' and 'low' refer to the degree of abstraction which the language provides to its user (the developer). In general, lower level programming languages allow the developer's code to control underlying resources such as memory more directly. This has the advantage of allowing the developer to write code which makes more efficient use of those resources (provided they write good code). On the flip side, it means not only that the developer can manage resources effectively, but that they must. Badly written low level code can lead to problems which are very hard to diagnose, and in the past such problems have resulted in catastrophic failures in safety-critical systems such as aircraft control systems and medical treatment devices.
In many programming languages, such as Go (high level) and C (low level), a build tool is a piece of software which takes the source code written by the programmer and converts it into executable code which can be run by the computer. We can think of the build tool as a black box: it takes some input, i.e. text which must obey the rules of the language and which describes the behaviour required by the programmer, and outputs 1s and 0s which the computer hardware can run. The build tool abstracts away the details of the hardware and operating system on which the program will run, so the developer can be concerned only with writing sensible logic. As you may have already guessed, with high level programming languages there is a higher degree of abstraction away from the underlying resources than in lower level languages.
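As a rough illustration of the 'text in, runnable code out' idea, Python's built-in compile function can stand in for a build tool. One caveat: it produces bytecode for the Python interpreter rather than the native machine code a C or Go toolchain would emit, so treat this as an analogy rather than an equivalent.

```python
source = "total = sum(range(5))"

# Input: text that obeys the language's rules. Output: a code object ready to run.
code = compile(source, "<example>", "exec")

namespace = {}
exec(code, namespace)
print(namespace["total"])  # 10

# Text that breaks the rules is rejected before anything runs.
try:
    compile("total = = 5", "<example>", "exec")
except SyntaxError:
    rejected = True
else:
    rejected = False
print(rejected)  # True
```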
Let's link this back to our discussion on memory management - remember how we said an operating system assigns a continuous block of virtual memory to a process? In the C language, there is a low degree of abstraction of the memory management by the C compiler - the programmer's code directly interacts with this virtual memory space, and is responsible for manually allocating and deallocating memory in that space for any data whose size or lifetime is not fixed in advance, using functions such as malloc and free respectively. Managing memory explicitly like this can lead to more cumbersome code to achieve a given task. For use cases where the programmer needs to implement more abstract and complex logic, and where such fine-tuned control over memory management is not as important, a low level language like C may not be the best choice.
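To give a feel for the bookkeeping involved, here is a toy model of manual allocation written in Python. It is not how a real malloc works: the Arena class and its first-fit strategy are assumptions made for the sketch, and it does not merge neighbouring free blocks. It only shows the kind of obligations the programmer takes on.

```python
class Arena:
    """Toy stand-in for a process's memory space; not a real malloc."""

    def __init__(self, size):
        self.free_blocks = [(0, size)]  # (start, length) pairs, sorted by start
        self.allocated = {}             # start -> length

    def malloc(self, length):
        # First fit: take the first free block that is large enough.
        for i, (start, free_len) in enumerate(self.free_blocks):
            if free_len >= length:
                remainder = free_len - length
                if remainder:
                    self.free_blocks[i] = (start + length, remainder)
                else:
                    del self.free_blocks[i]
                self.allocated[start] = length
                return start
        raise MemoryError("out of memory")

    def free(self, start):
        # Freeing an unknown or already freed address raises KeyError here;
        # in C the same mistake is undefined behaviour.
        length = self.allocated.pop(start)
        self.free_blocks.append((start, length))
        self.free_blocks.sort()


arena = Arena(16)
a = arena.malloc(8)
b = arena.malloc(8)
arena.free(a)        # forget this call and those 8 bytes are never reused
c = arena.malloc(4)  # reuses part of the space that a occupied
print(a, b, c)       # 0 8 0
```

Every malloc needs a matching free, exactly once, at the right time. Keeping that promise across a large program is where much of the difficulty of low level code comes from.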
So, what about high level languages? Taking the Go language as an example, the build process still takes the code written by the programmer and produces machine code which the computer can run. The difference here is that the memory management is abstracted away from the programmer - they can define and use variables in their code without a second thought as to how the underlying data in memory is being managed. The build process bundles a runtime into the resulting executable, which provides clever memory management mechanisms such as 'garbage collection' to do the heavy lifting for the programmer. The upside to this is much more readable code which can be written more rapidly, at the expense of the fine-grained control over memory management which may be desirable for some use cases.
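Here is a small sketch of automatic memory management in action, written in Python rather than Go. The mechanisms differ (CPython relies mainly on reference counting with a backup cycle collector, while Go uses a tracing garbage collector), but the experience for the programmer is the same: nothing is freed by hand. The Buffer class and build_report function are made up for the example.

```python
import gc
import weakref


class Buffer:
    """Stand-in for some data held in memory."""


def build_report():
    scratch = Buffer()              # allocated without any explicit call
    watcher = weakref.ref(scratch)  # observes the object without keeping it alive
    return watcher                  # scratch goes out of scope here


watcher = build_report()
gc.collect()                        # not normally needed; forces a collection for the demo
print(watcher() is None)            # True: the runtime reclaimed the Buffer
```

There is no call to free anywhere. The runtime noticed that nothing could reach the Buffer any more and reclaimed it on our behalf.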
What are some real world examples?
Phew. I realise the last couple of paragraphs may be quite dense, so let's frame the concepts with a couple of real life examples of when using languages with high or low degrees of abstraction would be appropriate.
Let's say you are creating a web server to distribute content, which needs to fetch and process data from different sources and handle multiple users at once. What do you think is more important: having fine grained control over how the variables' memory is managed, or being able to quickly write readable code which correctly performs the required tasks? If you were to opt for the second choice, you'd be with the majority.
Web servers usually run on relatively powerful hardware, so we are probably not concerned with squeezing out every drop of performance with finely tuned memory usage. What is far more important is being able to express more complex functionality in a readable way that can quickly be built upon and improved. For these reasons, using a high level language such as Go or Python would be the sensible choice. Whilst it would absolutely be possible to achieve the same functionality with a low level language like C, the slow development speed and the resulting clunky, hard to maintain code would probably equal some very unhappy developers.
Now consider a completely different scenario. Imagine we need to use a tiny computer (a microcontroller) embedded in a thermostat to control the temperature in a house. The microcontroller has very limited capability - only 4 Kilobytes of RAM - and needs to continually collect input from the temperature dial and thermometer, then output adjustments to a heater until the measured temperature matches the desired temperature. In this case, the need to implement relatively simple logic whilst making best use of the precious little memory available means that using a language like C is the better choice - the memory management included when using high level languages would simply not be efficient or predictable enough for this use case.
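The control logic itself really is simple, as this sketch shows. It is written in Python purely to keep the post's examples in one language; on the real device it would be written in something like C, for the reasons given above. The function names, the command strings and the 0.5 degree tolerance are all assumptions made for the illustration.

```python
def heater_command(measured, desired, tolerance=0.5):
    """Decide the heater adjustment for one reading.

    The 0.5 degree tolerance is an assumed value that stops the heater
    from switching rapidly on and off around the target.
    """
    if measured < desired - tolerance:
        return "heat_on"
    if measured > desired + tolerance:
        return "heat_off"
    return "hold"


def control_step(read_dial, read_thermometer, set_heater):
    # One pass of the loop: collect both inputs, then output an adjustment.
    command = heater_command(read_thermometer(), read_dial())
    set_heater(command)
    return command


# Simulated hardware, since there is no real sensor here.
readings = iter([17.0, 19.8, 21.0])
log = []
for _ in range(3):
    control_step(lambda: 20.0, lambda: next(readings), log.append)
print(log)  # ['heat_on', 'hold', 'heat_off']
```

With logic this small, the hard part is not expressing it but fitting it, predictably, into a few Kilobytes of memory.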
I hope this post has given you some insight into how crucial abstraction is to navigating the complex space of technology. The examples discussed here, however, have hardly scratched the surface - such patterns show up in all technology, and conceptualising problems at the appropriate level of abstraction is a core part of the problem solving thought process used by technologists everywhere.
How does HighLowFlow teach abstraction?
At HighLowFlow, we believe that abstraction is one of the most important skills for success in tech, and as such, the learning experience has been designed with abstraction front and center. Immerse yourself in our interactive simulations and you'll see all different kinds of tech in action at multiple levels of abstraction. Learn how web servers work as part of a larger architecture and then uncover the inner workings - seeing how the internal processes work to perform a wider function. Watch how data moves between different networks on the internet towards its final destination, and then dive deep into how messages are passed between routers in an ISP network. To read about the philosophy and teaching strategy of the wider platform, click here
Written by Michael Cleary