Understanding Other People’s Code

If your thoughts automatically went to code reviews when reading the title, you are not alone. However, this post is not about code reviews, quite the opposite. I’m going to ask you to approach other people’s code without judgment, with the purpose of understanding, not evaluating. It might surprise you how difficult this is. But let’s jump right in!

So… You Got Someone Else’s Code?

Someone else’s piece of code. Even worse, thousands of lines, maybe hundreds of files of other people’s code.

When we read our own code, we have a mental model of how things are connected and how they work. When faced with “someone else’s code”, we don’t have that. We are faced with pages and pages of code, often written in a style unlike our own.

This can lead us to think that the style of the code is the problem: that if only the style were “correct” (read: like my own), it would be “easier” to understand.

I want you to put that aside. The fundamental problem with reading someone else’s code is the lack of a mental model. That is neither your fault nor theirs.

You might then blame the lack of documentation (it’s funny how most programmers find documenting their own code a chore, yet can be furious at the lack of documentation in other people’s code).

Learning a code base or a module is, however, the perfect time to write documentation. Throughout the process I will describe, I encourage you to take notes and draw diagrams. If you later polish these up a bit, your learning process might actually yield a useful artifact: some documentation.

At many points in this process, you will probably find yourself not very confident in your understanding or even still thoroughly lost. That’s fine. Leave that part of the code, approach the code base from a different angle, and when you come back to that bit later, it will probably make more sense.

Before You Start

So before you even begin looking at a code base, I recommend getting as much tooling on your side as possible. That means downloading the code, getting it into a “smart” IDE that can make sense of it, and trying to build and run it (preferably in a debugger). If you can’t do all of this, do as much as you can. Some code is, unfortunately, very hard to run outside of its environment.

If you intend to make local changes to the code (which is pretty likely if you’re running it) and it doesn’t come with source control, I recommend putting it into a local git repo (git init; git add -A; git commit -m "Baseline"). That makes git tools available when you forget what you’ve changed, and makes it easy to revert local changes.

10 Techniques To Understand Other People’s Code

How do you tackle this avalanche of files and lines? Code is not like a book; it’s not linear. You can’t just start at the beginning and read through to the end. It’s more like many balls of yarn tangled together on the floor. You need to find an interesting end and pull on it.

The first task is always to find the code that drives execution in the part of the application that you’re interested in. Using the image of yarn on the floor, you need to find the right end to pull on.

#1 Grepping

One common way to do this is to search for a string you can see from the outside: text in the GUI, a command-line option, an error message, anything the application shows to the outside world.

I call this “grepping”, but most often you won’t be using grep itself. Rather, you’ll use the search function in your IDE, possibly its “usages” context menu item, or Ctrl-click names to jump to their definitions.
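If you have no IDE at hand, even a tiny home-made search gets you started. The sketch below is a rough C++17 stand-in for something like grep -rn "text" dir, using std::filesystem to walk a directory tree and report every file and line containing a literal string. grepTree and Hit are hypothetical names chosen for illustration; it does no filtering of binary or ignored files.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// One match: which file, which 1-based line, and the line's text.
struct Hit {
    fs::path file;
    std::size_t line;
    std::string text;
};

// Recursively scan every regular file under `root` for a literal string.
std::vector<Hit> grepTree(const fs::path& root, const std::string& needle) {
    assert(!needle.empty() && "an empty needle would match every line");
    std::vector<Hit> hits;
    for (const auto& entry : fs::recursive_directory_iterator(root)) {
        if (!entry.is_regular_file()) continue;
        std::ifstream in(entry.path());
        std::string line;
        std::size_t lineNo = 0;
        while (std::getline(in, line)) {
            ++lineNo;
            if (line.find(needle) != std::string::npos) {
                hits.push_back({entry.path(), lineNo, line});
            }
        }
    }
    return hits;
}
```

Usage: grepTree("src", "Save as...") lists every place the visible text appears, which is usually the end of the yarn to pull on.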

#2 Where is this button?

Let’s start with a button or another GUI element, preferably one with a string you can see. Grep for the string. If this is a localized code base, you will often find it in a localization mapping file, where it maps to some constant value, so then you grep for the constant. Hopefully, you will find the button definition this way. GUIs are generally organized in tree-like structures where each widget has a parent (unless it’s the root), siblings and child widgets. So here we’d like to traverse the widget tree up to the top to see how the GUI around our button is organized.
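The localization detour is a two-hop search: visible text to key, then key to code. As a minimal sketch, assuming the table has already been loaded into a std::map (real projects keep it in .po, .ts, .resx or .properties files, among others), the first hop is a reverse lookup. LocaleTable and keysForText are hypothetical names; the second hop is simply grepping for the returned key.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical localization table: constant key -> user-visible string.
using LocaleTable = std::map<std::string, std::string>;

// Hop 1: find every key that produces the text you saw on screen.
// Several keys can map to the same text, so return all of them.
std::vector<std::string> keysForText(const LocaleTable& table,
                                     const std::string& visible) {
    std::vector<std::string> keys;
    for (const auto& [key, text] : table) {
        if (text == visible) keys.push_back(key);
    }
    return keys;
}
```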

If you haven’t pulled out your debugger yet, now is a good time. The button probably has an onClick handler of some sort; put a breakpoint there and click the button. The debugger will then show you two things: the stack trace all the way back to main, and a runtime view of the button widget. The stack trace should reveal how events are dispatched, so study the functions in the stack, and copy it out and print it if you can. It will be useful as you learn more.
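In a debugger, the stack is one command away (bt in gdb). If you can’t attach a debugger, a rough substitute on glibc-based Linux and BSD-derived systems is to print the stack from inside the handler with backtrace() from <execinfo.h>. This is a platform-specific sketch, not standard C++: onSaveClicked is a hypothetical stand-in for the handler you found, and you may need to link with -rdynamic for function names to appear.

```cpp
#include <cassert>
#include <execinfo.h>  // glibc/BSD extension, not standard C++
#include <unistd.h>    // STDERR_FILENO

// Print the current call stack to stderr and return how many frames were captured.
int dumpStack() {
    void* frames[64];
    int count = backtrace(frames, 64);
    backtrace_symbols_fd(frames, count, STDERR_FILENO);
    return count;
}

// Hypothetical click handler from the code base you are exploring.
void onSaveClicked() {
    dumpStack();  // temporary: shows how the click event reached this handler
}
```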

For now, however, use your debugger to traverse the widget hierarchy. As you are going up through “parent” relationships, write down any widget name you find until you reach a widget that does not have a parent. This is your root, often your window, dialog or page. The names you’ve written down will often make some sense when looking at the GUI.
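The walk you do by hand in the debugger is just following parent pointers until there are none. A minimal sketch, assuming a hypothetical Widget type with a name and a parent pointer (real toolkits expose similar accessors, for example QObject::parent() and objectName() in Qt):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical minimal widget: just a name and a link to its parent.
struct Widget {
    std::string name;
    Widget* parent = nullptr;  // nullptr means this widget is the root
};

// Follow "parent" links from `w` to the root, collecting names on the way.
// The last entry is the root: often the window, dialog or page.
std::vector<std::string> pathToRoot(const Widget* w) {
    std::vector<std::string> names;
    for (; w != nullptr; w = w->parent) {
        names.push_back(w->name);
    }
    return names;
}
```

Reading the resulting list from the end gives the GUI’s organization from the window down to your button.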

#3 Following input events

In more integrated, possibly cross-platform or embedded, applications, the code might need to integrate a (possibly custom) GUI framework with the platform. In such applications, following input events can give away a lot of the underlying platform integration architecture.

Simple examples are keyboard and mouse events, but focus events will be much more revealing of the low-level GUI and interaction design. This means following events from the moment they are emitted by the OS, seeing how they propagate through the application, and seeing how unhandled events are processed. Focus events, for example, will reveal the mechanism for tracking which widget currently has input focus.
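A common pattern you may uncover is a focus tracker plus “bubbling”: a key event goes to the focused widget first, then climbs the parent chain until something handles it. The sketch below assumes that design; your code base may route events quite differently. KeyEvent, Widget, FocusManager and dispatchKey are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <string>

struct KeyEvent {
    char key;
};

struct Widget {
    std::string name;
    Widget* parent = nullptr;
    // Returns true if the widget handled (consumed) the event.
    std::function<bool(const KeyEvent&)> onKey;
};

// Remembers which widget currently has input focus.
struct FocusManager {
    Widget* focused = nullptr;
    void setFocus(Widget* w) { focused = w; }
};

// Deliver the event to the focused widget, then bubble it up through the
// parents until one handles it. Returns the handler's name, or "" if unhandled.
std::string dispatchKey(const FocusManager& focus, const KeyEvent& ev) {
    for (Widget* w = focus.focused; w != nullptr; w = w->parent) {
        if (w->onKey && w->onKey(ev)) {
            return w->name;
        }
    }
    return "";  // unhandled: a real app might beep, log, or hand it back to the OS
}
```

When following real events, the questions are the same as in this toy: where is the focused widget stored, and what happens when nobody handles the event?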

In this same category, but even more advanced, is rendering and graphics. How is painting to the screen handled? Although it is an advanced topic, investigating and documenting this architecture can form the basis of important features such as smooth animation, real-time rendering and low-latency touch input.

#4 What do the tests do?

Integration tests or system tests can be extremely useful in understanding how the application is supposed to work, even how to run it properly. Looking at tests (and code examples for libraries) is a way to get a feel for the boundaries, main access points and the use cases of the code base. These types of tests have been called “runnable documentation” for good reason.

As you grow more confident in your mental model, writing tests is a good way to confirm your assumptions. This will often deepen your understanding and guide you toward the places most worth digging into. Don’t feel like you have to keep those tests. It’s fine to write code just for discovery, and then throw it away.
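A discovery test can be as small as a handful of asserts, each one recording a single assumption. In the sketch below, formatCaseTitle is a hypothetical stand-in for a function from the code base you are learning; in practice you would call the real function instead of defining it. A failing assert means your mental model, not necessarily the code, needs fixing.

```cpp
#include <cassert>
#include <string>

// Hypothetical function under study. You suspect it builds display titles
// like "#42: Broken printer".
std::string formatCaseTitle(int id, const std::string& title) {
    std::string t = title.empty() ? "(untitled)" : title;
    return "#" + std::to_string(id) + ": " + t;
}

// Throw-away discovery test: one assert per assumption.
void checkMyAssumptions() {
    assert(formatCaseTitle(42, "Broken printer") == "#42: Broken printer");
    // Edge case worth probing: what does it do with an empty title?
    assert(formatCaseTitle(7, "") == "#7: (untitled)");
}
```

Edge cases are where the interesting discoveries tend to be, so probe them first.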

#5 Refactoring

Another way to approach a code base is to code your way to understanding by refactoring it. I really recommend that you treat the refactoring itself as throw-away. It’s hard not to become attached to one’s refactoring, but I implore you to try.

There are many ways of doing this, ranging all the way from high-level architectural changes to style-guided refactorings. Both of these extremes are, however, a bit dangerous, since they tend to make one arrogant and can blind one to the underlying reasons why things are the way they are. I would not recommend sharing your refactorings with the project’s maintainers; that might start you off on the wrong foot.
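A small-scale version of this is “extract and name”: pull a dense expression apart into functions whose names record what you believe each piece means, then check that the behavior did not change. Everything below is hypothetical (Order, the 10% member rule, all function names); the point is the shape of the exercise, done in a local branch you intend to discard.

```cpp
#include <cassert>
#include <vector>

struct Order {
    long long priceCents;
    int quantity;
    bool member;
};

// Original (hypothetical) code: a dense loop you are trying to understand.
long long totalOriginal(const std::vector<Order>& orders) {
    long long t = 0;
    for (const auto& o : orders) t += o.priceCents * o.quantity * (o.member ? 90 : 100) / 100;
    return t;
}

// Throw-away refactoring: each name is your hypothesis about the code.
int percentPayable(const Order& o) { return o.member ? 90 : 100; }  // members pay 90%?
long long lineTotalCents(const Order& o) {
    return o.priceCents * o.quantity * percentPayable(o) / 100;
}

long long totalRefactored(const std::vector<Order>& orders) {
    long long t = 0;
    for (const auto& o : orders) t += lineTotalCents(o);
    return t;
}
```

Comparing totalOriginal and totalRefactored on a few inputs confirms the refactoring kept the behavior, so your names at least describe the same computation.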

#6 Reading “main”

For a high-level overview of the execution of the application, a good place to start is “main”. “main” is in quotes because it might not actually be called main in your case. It is the function that drives execution of your module/program.

If it is the actual “main”, it will often contain your main loop and event handling. Many frameworks hide this from you, but you will see traces of the main loop in event handlers and the like.
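Stripped down, a hand-written main loop usually looks something like the sketch below: take the next event, dispatch it, repeat until asked to quit. Frameworks such as Qt hide this inside a call like app.exec(), but a loop of this shape runs underneath. Event, runMainLoop and the use of a std::queue in place of the OS event source are all assumptions for illustration.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Hypothetical event type; real frameworks have far richer ones.
struct Event {
    enum class Type { Click, KeyPress, Quit } type;
    std::string target;
};

// A queue stands in for the OS event source; a real loop would block waiting
// for the next event instead of stopping when the queue is empty.
std::vector<std::string> runMainLoop(std::queue<Event> events) {
    std::vector<std::string> dispatched;  // stands in for calling handlers
    while (!events.empty()) {
        Event ev = events.front();
        events.pop();
        if (ev.type == Event::Type::Quit) break;  // leave the loop, app shuts down
        dispatched.push_back(ev.target);
    }
    return dispatched;
}
```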

Assuming you have a “main-like” function, read it very carefully from top to bottom. Write down what seem to be the important objects in use and the important function calls. Have a look at the classes of these objects and try to write down, in a sentence or two, what their responsibilities are. Are many objects of this class allocated, or are there only one or a few? Often you will see objects created here that are meant to last throughout the lifetime of the program. They are likely important. Try to see how they relate to each other, particularly “has-a” relationships. Try to draw it out.

Now you should have an idea of some of the big players, but most likely you are seeing some function calls that seem like they hide the bulk of the logic, so the next step is to apply the same procedure on them. I wouldn’t recurse like this for long, because it can get confusing. Always try to go back to your notes and try to draw it out.

When looking at these “functionally important” calls you might see some objects being passed around. These often contain key information or represent central concepts in the application and are worth a second look. I would recommend that you include them in your notes and drawings.

#7 The graphical layout

In a GUI application, you might want to start with the main layout. In most GUI applications you will have a part of the code that decides how to lay out the widgets for your main window.

This will often be connected to an explicit or implicit state machine. This state machine will often reflect a user experience situation, like “inputting new case” or “searching for open cases”, in which the main window will have a very different layout and look.

Finding the code that does this layout and the state machine that decides which layout to use will often yield some of the most central pieces of a GUI application.
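When the state machine is explicit, it often boils down to a state enum, a transition function and a layout switch keyed on the state. The sketch below uses the two situations mentioned above as states; UiState, UiAction, nextState, layoutFor and the Cancel transition are hypothetical, and in many code bases this logic is implicit and spread across event handlers instead.

```cpp
#include <cassert>
#include <string>

// Hypothetical UI states, named after the user-experience situations above.
enum class UiState { InputtingNewCase, SearchingOpenCases };

// Hypothetical user actions that move between states.
enum class UiAction { NewCase, Search, Cancel };

// Explicit transition function: current state + action -> next state.
UiState nextState(UiState s, UiAction a) {
    switch (a) {
        case UiAction::NewCase: return UiState::InputtingNewCase;
        case UiAction::Search:  return UiState::SearchingOpenCases;
        case UiAction::Cancel:  return UiState::SearchingOpenCases;  // assumed "home" state
    }
    return s;
}

// Layout code keys off the state; here it only names the layout it would build.
std::string layoutFor(UiState s) {
    switch (s) {
        case UiState::InputtingNewCase:   return "form: case details + save/cancel";
        case UiState::SearchingOpenCases: return "list: search box + results table";
    }
    return "unknown";
}
```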

#8 Runtime Investigation

If you are lucky then not only do you have the source locally, but you’re also able to run it. How to do that with an uncooperative codebase could be the topic of a whole other blog post, but I will assume for this section that you can.

Having the source, and being able to run it, opens up another level of tools: especially logging and the debugger, but possibly also the test runners. These can be used for passive analysis (reading, setting breakpoints, navigating), but I would recommend getting your hands dirty and making changes: add logging, add tests, add assertions and maybe be ambitious and do some refactoring. For many programmers, learning is best done by doing.
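One cheap piece of throw-away logging is a scope tracer: an object that logs when a function is entered and left, indented by nesting depth, so running the program prints a rough call tree. A minimal sketch, assuming single-threaded code; TraceScope, traceLog, openWindow and loadCase are hypothetical names, and in real use you would probably pass std::cerr instead of a string stream.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Logs "-> name" on construction and "<- name" on destruction (RAII),
// indented two spaces per nesting level. Not thread-safe.
class TraceScope {
public:
    TraceScope(std::ostream& out, std::string name)
        : out_(out), name_(std::move(name)) {
        out_ << std::string(depth() * 2, ' ') << "-> " << name_ << '\n';
        ++depth();
    }
    ~TraceScope() {
        assert(depth() > 0 && "unbalanced trace scopes");
        --depth();
        out_ << std::string(depth() * 2, ' ') << "<- " << name_ << '\n';
    }
    TraceScope(const TraceScope&) = delete;
    TraceScope& operator=(const TraceScope&) = delete;

private:
    static int& depth() { static int d = 0; return d; }  // current nesting level
    std::ostream& out_;
    std::string name_;
};

std::ostringstream traceLog;

// Hypothetical functions from the code base, each with one added line.
void loadCase()   { TraceScope t(traceLog, __func__); }
void openWindow() { TraceScope t(traceLog, __func__); loadCase(); }
```

Calling openWindow() leaves "-> openWindow", "  -> loadCase", "  <- loadCase" and "<- openWindow" in traceLog, one per line.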

To go even further, trying to add a feature can be a great way to learn how this machinery works.

#9 Reading a class

Assuming the techniques above have narrowed down the focus to just a few classes, the next step is reading a class.

Before reading the implementation of a class, however, I recommend you study its interface. Start by looking at the classes it inherits from, or the interfaces it implements. This will often show you how the surrounding code views this class. You can grep for includes/imports, or use your IDE, to find uses of your class. It’s very easy to get lost, so take notes and draw it out.

When you have gotten a feel for how the surrounding code views this class, start looking at the public functions. The public functions will most likely be the command interface for your class. The private functions are usually utilities for these. Use your “main” strategy from before on the public functions and try to understand the flow.
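To make the reading order concrete, here is a small hypothetical class laid out the way many classes are: a base interface that callers depend on, a public “command interface”, and private helpers that serve it. Reading it as suggested means starting with Storage, then CaseStore::save, and only then isValid and append. All names are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// 1. The base class: how the surrounding code sees CaseStore.
class Storage {
public:
    virtual ~Storage() = default;
    virtual bool save(const std::string& record) = 0;
};

class CaseStore : public Storage {
public:
    // 2. Public functions: the commands callers can give this class.
    bool save(const std::string& record) override {
        if (!isValid(record)) return false;
        append(record);
        return true;
    }
    std::size_t size() const { return records_.size(); }

private:
    // 3. Private utilities: read these last, in the context of their callers.
    static bool isValid(const std::string& r) { return !r.empty(); }
    void append(const std::string& r) { records_.push_back(r); }

    std::vector<std::string> records_;
};
```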

#10 Retelling or Rubber Ducking

Using your notes and drawings you can now try to explain what you have learned to another person or write it down for a (possibly fictional) blog post. This will often reveal missing pieces and misunderstandings. This process is often called Rubber Ducking, but in my experience, a real person or a blog post is more motivating to explain to than a rubber duck.

This can also be the starting point of some documentation, and you’d be surprised by how grateful a project can be for some documentation!

Different is good

Now you might think that learning how to read other people’s code is not such a big deal. I sure did when I started out. And truthfully, most junior programmers, when they start their first job, have only really read their own code.

Faced with thousands, maybe millions, of lines of other people’s code, programmers often label it “legacy” or “spaghetti code” and dream of “greenfield projects”.

Reading other people’s code is an opportunity to get to know someone else through their code. We all express ourselves through our programs. For fun, I’d recommend you pick someone you admire and get to know them through their work. A big part of this process is accepting people (and their code) as they are. Different is good. See this as an opportunity to learn techniques, get inspired and try to copy what you feel fits and would improve your own style and expression.

Finally, this is not an evaluation. What you think is easier to understand can be harder for others. It is a gift to get to know someone through their code. And remember: code is just a snapshot in time. Its authors probably had plans they never got around to, or the requirements changed after the code was written. Be compassionate and be kind.

Code is never finished.

* * *

This blog post was written by Vivaldi developer Patricia Aas and was first published as a guest post on 6 June 2018 on Jonathan Boccara’s blog, Fluent C++.