# Analysis of findings

## Objective measurement is possible

Although the individual findings for code constructs are interesting, the main success of this experiment has been to show that it is possible to objectively measure code readability without the need for subjective opinion. Specifically, we have managed to detect statistically significant differences in the average time taken and accuracy between code snippets where the only difference was the use of a specific construct.

Various individual factors could affect prediction accuracy, such as a developer's experience or their level of concentration. However, these factors would be expected to affect both snippets in a pair equally when averaging across all the data points. A similar case can be made for the time taken to read the code: individual characteristics, such as a developer's proficiency in reading code, would likewise be expected to affect both snippets equally.

Another factor that could affect reading time is the quantity of code contained in a snippet. This was not always identical between the snippets in a pair, as some of the constructs are more verbose than others. However, the differences in time taken were too large to be attributable to this, and the faster snippet was often the one containing more code.

The results for operator precedence, for example, showed a significant improvement in both speed and accuracy from the addition of just two parenthesis characters. This strongly suggests that the differences we are measuring are attributable to the underlying readability of the code constructs themselves.
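To make the comparison concrete, here is a hypothetical pair of snippets in the spirit of those tested. These are illustrative reconstructions, not the exact code shown to participants, and the function names are invented for this sketch.

```typescript
// Hypothetical pair illustrating the operator precedence comparison.
// Both functions are equivalent; the only difference is the parentheses.

// Without parentheses: the reader must recall that % binds more
// tightly than +.
function checksumImplicit(a: number, b: number): number {
  return a + b % 10;
}

// With parentheses: the grouping is stated explicitly, at the cost
// of two extra characters.
function checksumExplicit(a: number, b: number): number {
  return a + (b % 10);
}

console.log(checksumImplicit(3, 17)); // 10
console.log(checksumExplicit(3, 17)); // 10
```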

However, it is unlikely that these differences can safely be interpreted quantitatively. For example, our results for the order of if statements were:

| Metric         | Positive first | Negative first | P value |
| -------------- | -------------- | -------------- | ------- |
| Time taken (s) | 17.3           | 19.3           | 0.042   |

While we can be confident that this indicates an increase in reading speed when the positive case is presented first, we cannot justify claiming a precise 11% speed-up: the point estimate is too noisy, and the contrived nature of the snippets means the effect size may not transfer to real code.

## Specific recommendations

Some of the findings suggest possible guidelines for readability. For example, our most clear-cut result showed gains in both speed and accuracy when parentheses were added to expressions that depend on operator precedence. The readability gained by the addition of a pair of characters should self-evidently outweigh any aesthetic considerations or desire for terseness. It is frustrating, therefore, that opinionated code formatters, such as Prettier, insist on removing "unnecessary" brackets.

Equally, the results for the order of if statements and boolean algebra point to recommendations that promise readability benefits with little downside. If statements can be written with the positive case first, and boolean expressions can be presented in expanded form, with little effort and few negative consequences.
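As a sketch of what following both recommendations might look like (the function names, and the exact sense of "expanded form", are our own illustrative assumptions):

```typescript
// Order of if statements: handle the positive case first.
function greeting(loggedIn: boolean): string {
  if (loggedIn) {
    return "Welcome back";
  } else {
    return "Please sign in";
  }
}

// Boolean algebra: present the expression in expanded form, assuming
// "expanded" means distributing terms rather than relying on a
// factored equivalent such as isAuthor && (isDraft || isAdmin).
function canEdit(isAuthor: boolean, isDraft: boolean, isAdmin: boolean): boolean {
  return (isAuthor && isDraft) || (isAuthor && isAdmin);
}
```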

The results for pure functions and function extraction are harder to form into recommendations. It is widely claimed that these constructs improve readability, but in both cases we instead found a speed overhead. The constructs may well have benefits that offset this overhead, but any such benefits cannot be measured within the limitations of our experiment. The contradictory result is interesting, and we feel that further investigation is warranted.
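For illustration, this is the kind of pairing we mean; a hypothetical sketch, not the experiment's actual snippets:

```typescript
// Inline version: the whole calculation is visible in one place.
function totalInline(prices: number[]): number {
  let total = 0;
  for (const price of prices) {
    total += price * 1.2; // apply a 20% tax
  }
  return total;
}

// Extracted version: the tax step is pulled into a named helper.
// This is widely assumed to be more readable, yet our results found
// a speed overhead for constructs of this shape.
function withTax(price: number): number {
  return price * 1.2;
}

function totalExtracted(prices: number[]): number {
  let total = 0;
  for (const price of prices) {
    total += withTax(price);
  }
  return total;
}
```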

Lastly, the result for chaining methods is fascinating, as it suggests a trade-off between speed and accuracy, with chained methods improving speed and intermediate variables improving accuracy. This suggests that it might be sensible to combine both approaches: chaining methods for ease of reading, but adding occasional intermediate variables for clarity.
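A minimal sketch of that combined style, with invented data, might look like this:

```typescript
const orders = [
  { item: "pen", quantity: 3, price: 2 },
  { item: "pad", quantity: 0, price: 5 },
  { item: "ink", quantity: 2, price: 4 },
];

// Chain the filtering and mapping steps for speed of reading...
const lineTotals = orders
  .filter((order) => order.quantity > 0)
  .map((order) => order.quantity * order.price);

// ...but name the intermediate result before the final step, giving
// the reader a checkpoint that may aid accuracy.
const grandTotal = lineTotals.reduce((sum, line) => sum + line, 0);

console.log(grandTotal); // 14
```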

## Limitations of the methodology

The metrics we have used, accuracy and time taken, are somewhat simplistic and unlikely to tell the full story when it comes to readability. For example, being able to correctly predict the output of a code snippet does not necessarily imply a full understanding of how it operates. Equally, the time taken to read code may not be a direct reflection of the ease with which it was read.

Another limitation of this method is the necessity for small, self-contained code snippets with a well-defined answer. These restrictions inevitably lead to code that feels contrived and unrealistic. Real-life code snippets would rarely involve calculations that could be done mentally and would be likely to have external dependencies. Our results might have been different had we been able to present realistic code.
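For example, a snippet of the kind the methodology demands (this one is hypothetical) has to look something like this:

```typescript
// Self-contained, dependency-free, and mentally computable: exactly
// the properties the experiment requires, and exactly why such
// snippets feel contrived next to production code.
const values = [2, 4, 6];
let sum = 0;
for (const value of values) {
  sum += value;
}
console.log(sum); // a participant predicts: 12
```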

There were also limitations on the types of construct we were able to test. Large-scale factors such as code organisation, consistency, and familiarity are thought to affect code readability. However, these factors are impossible to encapsulate within self-contained code snippets and are outside the scope of this methodology.

## Future research

There are various improvements and alternative approaches that could form part of a future iteration of howreadable. These include:

  • Drastically increasing participation. There are estimated to be over 20 million developers in the world, and in that context, the figure of 545 participants could be greatly improved.
  • Allowing participating developers to interact with and alter code rather than just read it. This could give a better indication of the maintainability of the code, which is ultimately the goal of code readability.
  • Directly measuring the readability of real-world source code. If sufficiently sophisticated searches could be developed, it might be possible to discover rules for readability directly by analysing the corpus of publicly available code on services such as GitHub. This would more closely mirror the techniques used in linguistics research.

## Conclusion

Because of the limitations of the methodology and the lack of quantitative metrics, the results we have presented should be read as indicative of the potential of this process rather than as definitive readability rules.

We have been able to propose some reasonable recommendations based on these findings, and we have demonstrated that it is possible to draw conclusions concerning the readability of code snippets from direct observation of developer behaviour.

These results only scratch the surface of the potential descriptive rules that could exist for code readability. Importantly, however, they represent findings based on empirical data and are therefore more meaningful than subjective opinions based on preference or anecdotal evidence.