On Comments in Source Code

From the perspective of a Software Engineer

BowTiedCrocodile

May 21, 2023

This is the full article from BowTiedRaptor’s collab Substack post. Data Science & Machine Learning 101 is posting parts of this article on his Substack in collaboration with BowTiedCelt from Software Architecture.

The Purpose of Comments

If you want to find your shades of gray in the world of programming, look no further than discussions over comments in source code. Opinions on what makes a good comment range from “no comment is a good comment” to “all our documentation is comments in source code.” There is significant nuance here. I am coming at this from the perspective of a Software Engineer who has maintained legacy code in 20+ year old apps, but who has also worked in modern, cloud first software with the latest features available for conveying the meaning of code to outsiders. The goal of this writing is to understand what comments are for, and to encourage everyone to write better, higher quality comments in source code. This will result in code that is more readable to fellow engineers, explain the context of your work, allow for high quality documentation, and most importantly make sure comments don’t backfire when maintaining software in the future.

Before we dive into that, we need to fundamentally understand what comments are for. Comments are a mechanism for the software developer to write their thoughts directly into the source code at a specific location, generally to add more context to the code. They are marked with a special syntax such as “//”, “#”, “/*”, “;” and many more, depending on the language. Comments exist in almost every language and come in a few forms, shown in C#’s format below. All of them can be scrutinized together, regardless of comment type or language.

// single line comment

/* 
* A Multi 
* line comment
*/

/// <summary>
/// Language specific documentation comment (Javadoc, C# XML docs, etc..)
/// </summary>

For the purposes of this article, documentation written via comments falls under the same “readability” umbrella, so it’s all fair game when discussing good and bad comments.

Comments in source code go back a long way, to the days of assembly language, when the developer was writing CPU instructions directly into memory with no tools to assist. This often required some form of notation, since operating with bits and bytes directly is not a productive way to work. In fact, punch cards can even be seen with writing on them to mark the various subroutines (early functions) needed to execute a program.

A program on a punch card with sections labeled, an early form of comments: Source - Arnold Reinhold

As you can see, one of the first uses of comments was to demarcate boundaries between different areas of concern in the code. Often these would be subroutines, special locations in memory where specific functionality lived, such as sorting code, a mathematical algorithm, or many other functionalities. The value add was twofold: the programmer could easily find specific code in a sea of unreadable code, and the reader was reminded of the context of that code. This is analogous to the modern standard that public APIs should be documented, for which comments often suffice, as seen in tools like Javadoc, the standard for Java documentation. See the example below for the proper style of API documentation.

/**
 * Provides the classes necessary to create an  
 * applet and the classes an applet uses 
 * to communicate with its applet context.
 * <p>
 * The applet framework involves two entities:
 * the applet and the applet context.
 * An applet is an embeddable window (see the
 * {@link java.awt.Panel} class) with a few extra
 * methods that the applet context can use to 
 * initialize, start, and stop the applet.
 *
 * @since 1.0
 * @see java.awt
 */
package java.applet;

If you have not noticed, I am making a clear statement that the readability of code is important. As software devs, we spend significant amounts of time reading code on a daily basis. Often we are tasked with building features in existing code bases, which means understanding existing functionality and not breaking it. When I was working on older, more legacy products (20+ years of history) in a massive monorepo code base, it was common to spend up to a whole day of work just reading and understanding the purpose of the existing code and functionality before I ever wrote a line of code. These deep and complex apps demand your understanding, or else bugs easily show up, as unit tests are often a luxury that legacy apps don’t have. The point is, comments affect the readability of the code, for better or for worse. Our goal as software developers is to maximize the readability of the code, which in turn means we need to write high quality comments that move our understanding forward, not backwards.

Think about how many times you’ve seen a 50+ line function and, instead of trying to fundamentally understand the code, immediately focused on the closest comment to understand what is going on. It’s a natural reaction, of course. You’re hoping, praying, that the author took the time to convey the entire meaning of the function in a neat summary so you don’t have to do the thinking. In essence, you want to be lazy, the natural state of a software dev, the people who get computers to do the work for them. This of course is the exact pitfall we run into with comments. Comments are a crutch to lean on, and when that crutch turns out to be rusted on the inside and we fall on our face, we lose time, effort, and our sanity trying to pick ourselves up. It is a rite of passage to be tricked by a comment in the field, resulting in lost productivity, or much worse, bugs in production. After all, software is inherently a collaborative effort, and almost certainly the code you are working with will not have been written by you.

Even if the code was written by you, it always takes time to refresh yourself on the context you were working in when you wrote that specific code. Memories are faulty, and often that code you wrote 6 months ago is muddled by scope creep, requirement changes, current and future enhancements, and your own personal understanding of the way things should be at the time. Again, ensuring you write readable code makes this process much easier so that you can execute faster. But even so, often the author of the code cannot recall the exact reason why they implemented their code that way. If you can’t recall, what hope do others have?

Do not worry, however, as we have reached the crux of the battle over readability. The fundamental issue of whether comments are good or bad is actually an argument over the best way to maximize the readability of the code. Think about it this way: if you wrote the code in a way that was perfectly legible to the reader, a comment would be completely unnecessary. That is because a comment is inherently a failure to express yourself correctly in the code. We use comments to clean up any misunderstanding, but as we will see, this ends up hurting us in the long run.

We will further explore this in the next section.

Section 2: Good Comments vs Bad Comments

Now that we’ve established what we are arguing over, it’s time to share what makes a good comment versus a bad comment. The ideal state for a comment is to enhance our knowledge of the code via readability, where the opposite is true for bad comments.

Let me start with bad comments, since it’s easier to comprehend what goes wrong, and how wrong it can go. Let’s imagine a fairly common and simplified scenario of two devs working on a team. One dev writes some complex logic to implement a feature, writes some comments to help readers out, tests and ships it, and management is happy. Down the road, months or even years later, the code has evolved significantly and a bug has been introduced in the feature. The initial dev is long gone to a new company, and the second dev is left to fix the problem. The dev, with no other help, looks at the code, sees the complexity, and uses the comments from the previous dev to understand the code. They’re happy the comments exist and go on to fix the bug, but the bug fix doesn’t work right. Management is mad since the fix is taking a while and production is having issues. The dev is under pressure and frustrated that they can’t trace the issue. They finally backtrack to areas of code they thought were OK to reevaluate their assumptions, and of course the comment they read was stale. It was out of date, never updated through the maintenance and enhancements the system had over time, and gave the dev a false sense of security. Once the code is fixed, the comment is adjusted, and life goes on.

As you can see, the comment that was trusted turned out to be incorrect, which caused pain for the maintainer. We’ve all been there as devs: if things can’t be expressed clearly in code, then we use a comment. This is especially true for novice or solo devs who have not worked in a professional team before. The issue, as we’ve seen, is that comments can lie. Code does not lie; it does exactly what you tell it to do. We may not understand all the code, but everything we write ultimately gets translated into explicit CPU instructions. Comments, on the other hand, are up to interpretation. Communication breakdowns are all too common between people, and this extends to comments. In this instance, the comment was out of date, and since the original dev had left the company there was no way to gain additional information besides the comment that appeared trustworthy.

When reading others’ comments and trying to fix bugs

Comments lie either by the author miswriting their intentions or, more often, by becoming stale as the system evolves. Any system worth working on (meaning it employs developers) will be churning ahead, adding new code and new features on top of the old. Maintenance costs follow as code gets older, libraries get deprecated, functionality needs to be restored, and so on, and they are always something that a software developer needs to consider when working on projects. Comments carry a hidden maintenance cost, since they double the effort needed to maintain software. You need to maintain the code and the comment over time, and you always need to make sure they are in sync with each other. You have married the code to natural language and now have to think in two domains about how to express something correctly.

This becomes far worse when you maintain others’ comments, for which you are not the original author. There are times when I want to refactor some code, but a comment prevents me from doing so because it is not aligned with the code. Now I need to decide if the comment or the code is wrong, or both. This results in less willingness to make changes, since it’s easier to not fix what’s not broken. The example below is beyond silly, but it gets the point across.

// This multiplies a number by two
public double MultiplyByTwo(int input)
{
     return Math.Pow(input, 2);
}

// See the problem here? Either the code or the comment is wrong. What is the right
// choice? What was the original intention? Multiply by two or square the number?
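One way out of this dilemma, sketched below, is to settle the intent once and then let the names carry it, so there is no comment left to drift. This is only an illustration: it assumes the original intent was doubling, and the `Arithmetic` class and `Square` method are hypothetical names that do not appear in the example above.

```csharp
using System;

public static class Arithmetic
{
    // Assumption for this sketch: the original intent was "multiply by two".
    // The method name now states the intent and the body matches it.
    public static int MultiplyByTwo(int input)
    {
        return input * 2;
    }

    // If squaring had been the real intent, the honest fix is a different
    // name rather than a comment that contradicts the code.
    public static int Square(int input)
    {
        return input * input;
    }
}
```

Note that an input of 2 cannot tell the two readings apart, since doubling and squaring both give 4. Whoever verifies the intent needs to pick a value such as 3, where the results differ.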

In another point against comments, modern IDE tools allow code to be seamlessly mass updated with the help of modern compilers and language servers. You can rename a method used by 20+ code paths without a care, since the IDE handles mass changes like this. However, this technology, as great as it is, is still limited when it comes to also updating the comments that come along for the ride. We have the tools to sculpt code as we please, but inline and multi line comments are still a step behind in terms of easy maintenance. More modern tooling like Javadoc and C#’s XML documentation system has integrated with comments so they can be changed along with the code when refactoring, but I would say this isn’t the norm for most languages. Editing comments is still a manual burden for most people. Code has logic and order behind it: a massive syntax tree of every path your code can take, which can be traversed and edited comfortably within software. Comments don’t enjoy this structure, as they are mostly arbitrary strings rather than defined pointers to methods, variables, and classes. After all, comments are for humans, not the compiler (this doesn’t include comment based compiler directives like in TypeScript). Given how powerful our tooling is now, there is no excuse not to write code well in the first place, with proper style, formatting, syntax, and naming conventions to fully express our actions in code.
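To make the difference between structured and arbitrary comments concrete, here is a minimal sketch of C# XML documentation that points at code through `cref` references instead of free text. The `Temperature` class and its methods are hypothetical names invented for this illustration. The sketch assumes XML documentation output is enabled for the project, in which case the compiler resolves each `cref` and warns when the target no longer exists, and rename refactorings in an IDE can typically update the reference along with the method.

```csharp
using System;

public static class Temperature
{
    /// <summary>Converts a temperature from Celsius to Fahrenheit.</summary>
    /// <param name="celsius">The temperature in degrees Celsius.</param>
    /// <returns>The temperature in degrees Fahrenheit.</returns>
    /// <seealso cref="ToCelsius(double)"/>
    public static double ToFahrenheit(double celsius)
    {
        return celsius * 9.0 / 5.0 + 32.0;
    }

    /// <summary>
    /// Converts a temperature from Fahrenheit to Celsius.
    /// This is the inverse of <see cref="ToFahrenheit(double)"/>.
    /// </summary>
    /// <param name="fahrenheit">The temperature in degrees Fahrenheit.</param>
    /// <returns>The temperature in degrees Celsius.</returns>
    public static double ToCelsius(double fahrenheit)
    {
        return (fahrenheit - 32.0) * 5.0 / 9.0;
    }
}
```

A plain `// the inverse of ToFahrenheit` comment would say the same thing to a human, but nothing would flag it if `ToFahrenheit` were later renamed or removed.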

Now that I’ve explained the philosophy around bad comments, let’s do the same for good comments.

The best comment you can make is the one that you delete right after refactoring for clearer code. You add some new functions with clearer names, you encapsulate busy and verbose logic in a class, you write some stand alone static methods to do computation, and you end up with code that looks more like a poem than a long differential math equation done in crayon. As you can see, the best comment is really the one you don’t write, since you took the time to clarify in code rather than in comments.

Let’s look at a very simple example:

int age = 25;
string occupation = "Student";
bool isEligible = false;

// Check if the student is eligible based upon their age and occupation
if ((age >= 18 && age <= 30) && (occupation == "Student" || occupation == "Unemployed"))
{
    isEligible = true;
    Console.WriteLine("You are eligible for the program.");
}
else
{
    isEligible = false;
    Console.WriteLine("You are not eligible for the program.");
}

This code is fine right? And truthfully it is, but it can be better. We can refactor this to make the boolean logic a function, and then move the comment’s intentions into the code itself.

int age = 25;
string occupation = "Student";
bool isEligible = CheckEligibilityOnAgeAndOccupation(age, occupation);

if (isEligible)
{
    Console.WriteLine("You are eligible for the program.");
}
else
{
    Console.WriteLine("You are not eligible for the program.");
}

public static bool CheckEligibilityOnAgeAndOccupation(int age, string occupation)
{
    return (age >= 18 && age <= 30) && (occupation == "Student" || occupation == "Unemployed");
}

As you can see, the function name itself now documents and explains the context, rather than a comment. We’ve eliminated a comment that could go stale as eligibility expands to add even more criteria in the future.
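The same refactor can be pushed one step further when the criteria are expected to grow. The sketch below is one possible shape, not the only one: it assumes the age bounds and occupations are business rules worth naming, and `ProgramEligibility`, `MinimumAge`, `MaximumAge`, `IsWithinAgeRange`, and `HasQualifyingOccupation` are hypothetical names introduced for illustration.

```csharp
using System;

public static class ProgramEligibility
{
    // Hypothetical constant names; the values mirror the example above.
    private const int MinimumAge = 18;
    private const int MaximumAge = 30;

    // Reads as a sentence, so a new criterion becomes one more named check.
    public static bool IsEligible(int age, string occupation)
    {
        return IsWithinAgeRange(age) && HasQualifyingOccupation(occupation);
    }

    private static bool IsWithinAgeRange(int age)
    {
        return age >= MinimumAge && age <= MaximumAge;
    }

    private static bool HasQualifyingOccupation(string occupation)
    {
        return occupation == "Student" || occupation == "Unemployed";
    }
}
```

Calling `ProgramEligibility.IsEligible(25, "Student")` follows the same logic as the original `if` statement, and each rule can change without any comment needing to change with it.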

The second best comment is one written where modifying the code would make it less readable rather than more, or where the coding language itself is not expressive enough to convey the meaning. The reality is comments will be needed from time to time, and that is OK. I am not against comments, but they need to be thoughtfully written, in moments where the needed context cannot be provided well in the code, or where the cost of refactoring is too high.

I have a plethora of comments I’ve read, especially in legacy apps, that explain very obscure or highly technical issues with a useful summary in order to understand the decisions made. This is where I find significant value in comments. These are comments that fully explain the design and choices made by the following code. Often an implementation of a highly technical specification such as an RFC will add some details of the requirements. Our resident reverse engineer BowTiedCrawfish has a great example in the Win32 API, something that is decades old with significant amounts of undocumented functionality. They use comments as their documentation system to help further their knowledge of the codebase.

Additionally, comments are great when the syntax of the language or design of an API just inherently fails you. Languages with a long history often have leftover syntaxes, APIs, designs, and conventions that are now seen as crude or outdated, but they live on to maintain compatibility. A good example of this is formatting date times to strings in C#. There are many one character format specifiers that each result in a completely different string representation of a datetime, so unless you memorize the table, you’ll have to refer to the documentation page, or ideally just an inline comment for easy remembering as shown below.

// Display using pt-BR culture's short date format
DateTime thisDate = new DateTime(2008, 3, 15);
CultureInfo culture = new CultureInfo("pt-BR");
Console.WriteLine(thisDate.ToString("d", culture));  // Displays 15/3/2008

// Taken from https://learn.microsoft.com/en-us/dotnet/standard/base-types/standard-date-and-time-format-strings#how-standard-format-strings-work

Another extension of this is writing a math equation directly into a comment so that the code can mirror it. Ironically, software bends math to its conventions, which results in poor readability. Putting your math equation into a comment can make code significantly more clear, as long as you maintain both well.

double area = Math.PI * Math.Pow(radius, 2); // Calculate the area of a circle using the formula A = πr^2
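The benefit is easier to see with a formula that has more moving parts. The sketch below uses the standard compound interest formula as an illustration. The `Finance` class and `FutureValue` method are hypothetical names, and the short local variables are a deliberate choice to mirror the symbols in the equation.

```csharp
using System;

public static class Finance
{
    // Compound interest: A = P * (1 + r/n)^(n*t)
    //   P = principal, r = annual rate (0.05 means 5%),
    //   n = compounding periods per year, t = time in years
    public static double FutureValue(double principal, double annualRate, int periodsPerYear, double years)
    {
        // Local names mirror the symbols in the equation above.
        double p = principal;
        double r = annualRate;
        double n = periodsPerYear;
        double t = years;

        return p * Math.Pow(1 + r / n, n * t);
    }
}
```

With the equation sitting directly above the expression, a reviewer can check the code against the math term by term instead of reverse engineering the formula from the `Math.Pow` call.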

Regex patterns also fall into this use case as you can see below:

string pattern = @"^(\d{3})-(\d{3})-(\d{4})$"; // Matches a phone number in the format XXX-XXX-XXXX
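For anything beyond a one-liner, the comment can also walk through the pattern piece by piece. This is a sketch built around the same pattern as above. The `PhoneNumbers` class and `TryGetParts` method are hypothetical names, while `Regex`, `Match`, and `Groups` come from the standard `System.Text.RegularExpressions` namespace.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneNumbers
{
    // Matches a phone number in the format XXX-XXX-XXXX.
    //   ^(\d{3})    group 1: three digits at the start of the input
    //   -(\d{3})    group 2: a dash followed by three digits
    //   -(\d{4})$   group 3: a dash followed by four digits at the end
    private static readonly Regex Pattern = new Regex(@"^(\d{3})-(\d{3})-(\d{4})$");

    // Returns true and fills parts with the three digit groups on a match.
    // On failure, parts is an empty array.
    public static bool TryGetParts(string input, out string[] parts)
    {
        Match match = Pattern.Match(input);
        parts = match.Success
            ? new[] { match.Groups[1].Value, match.Groups[2].Value, match.Groups[3].Value }
            : Array.Empty<string>();
        return match.Success;
    }
}
```

The comment earns its place here because the regex syntax cannot be made more expressive by renaming anything. It does need to be updated whenever the pattern changes.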

Additionally, other situations when comments are useful are as follows:

Legal, Copyright, or Software License information, or metadata
When the author makes a technical implementation decision based upon their best understanding that impacts the code
In testing classes, explaining test cases and their intent
When there are unintuitive or unexpected consequences of invoking code
A crude form of linking your Issue tracking system to the code in legacy code

Your Guide to Good Comments

Now that the philosophy is out of the way, we can talk about more concrete details. Writing comments depends on a lot of things. When writing a comment, keep the following items in mind to help shape your comment into the best it can be:

Personal preferences of comments
1. From this article.
2. Elsewhere from your favorite devs (Martin Fowler, etc..).
3. From your personal experiences.
4. Clean Code by Bob Martin
Language preferences of governing body for comments
1. Microsoft C# conventions.
2. Python PEP8 conventions.
3. Solidity.
Team / Org preferences
1. Specific to where and who you work with.
2. Start a discussion or read current code base to understand.
3. Ask for feedback on Pull Requests for your Repo.
Domain specific needs
1. Legacy code
  1. Often has poor documentation, so the comments become the de facto docs.
2. Technical requirements
  1. RFCs, ISOs, and other standards and specs to follow.
3. Hacking and Reverse Engineering

For Single Line / Inline comments:

Limit your single line comments to your overall line width limits. No one likes lines scrolling horizontally to read comments
Don’t combine complex code with a comment following it on the same line
- Apply the comment on the line before to make reading easier
Write comments immediately before or on the same line as the code to co-locate logically. Avoid trailing comments
Avoid comments that explain the obvious or are redundant
- // Check if the person is eligible
Avoid comments that are trivial
- // ex: this adds two numbers together
Don’t try to do art or fancy titles via comments
- ////////////// By BowTiedCrocodile //////////////

Multi Line Comments

These are not an excuse to write a story. Be concise and say no more and no less than what is needed.
Don’t add empty ones to pre-place them in code for future use. This adds noise to the document.
Be consistent with formatting the start of every line horizontally
Do use your language’s multi line documentation syntax of choice (Javadoc, etc..)
Do use the multi line convention when you have multiple lines, not single lines.

Other Best Practices

Do write Public API documentation comments such as Javadoc
- See later example below.
Don’t write them for non public APIs, methods, and variables.
I prefer not to write all caps, swearing, and other forms of comments that distract the reader
Avoid code attributions in comments, Git handles that for you now
Avoid low effort or drive by comments.
- Think before you write any comment as it is a “tell” to write better code
Don’t write comments that explain the code incorrectly
- Break the code into smaller chunks, and explain the smaller chunks correctly instead
Don’t comment out unused code, delete it!
- Version control can retrieve it
Don’t have your comments require maintenance from external programs
- Let Intellisense and Javadoc do the work for you
Avoid todo comments unless they are only transient within your unit of work. Use your favorite issue tracking system instead to maintain future work.

Additionally, it’s important to document public APIs. Here is Microsoft’s guide to doing so:

For the sake of consistency, all publicly visible types and their public members should be documented.
Private members can also be documented using XML comments. However, it exposes the inner (potentially confidential) workings of your library.
At a bare minimum, types and their members should have a <summary> tag because its content is needed for IntelliSense.
Documentation text should be written using complete sentences ending with full stops.
Partial classes are fully supported, and documentation information will be concatenated into a single entry for each type.

The only advice I disagree with here is documenting every public member, as many are very simple and obvious. Instead, write a more verbose name that describes the member better as a first choice.
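Put together, a documented public member following those guidelines might look like the sketch below. The `Percentages` class and its `Of` method are hypothetical names made up for this illustration. The tags used (`<summary>`, `<param>`, `<returns>`, `<exception>`, `<paramref>`) are standard C# XML documentation tags.

```csharp
using System;

/// <summary>
/// Provides helpers for working with percentages.
/// </summary>
public static class Percentages
{
    /// <summary>
    /// Calculates what percentage <paramref name="part"/> is of <paramref name="whole"/>.
    /// </summary>
    /// <param name="part">The portion being measured.</param>
    /// <param name="whole">The total amount. Must not be zero.</param>
    /// <returns>The percentage, where 50 means fifty percent.</returns>
    /// <exception cref="ArgumentException">
    /// Thrown when <paramref name="whole"/> is zero.
    /// </exception>
    public static double Of(double part, double whole)
    {
        if (whole == 0)
        {
            throw new ArgumentException("The whole must not be zero.", nameof(whole));
        }

        return part / whole * 100.0;
    }
}
```

The documentation here covers what the signature cannot say on its own: the scale of the return value and the failure case. A member whose name already says everything would not need this much.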

To wrap it all up, remember the following. When writing a comment, brevity is your friend. Be succinct and say no more and no less than is needed to add your context. Always ensure that a comment actually improves the readability of the code. Trust but verify comments as needed, and always place a premium on the readability of your code.

Happy Commenting!

Please subscribe to my Substack The Bit Shift if you appreciated this post!