by Patrick DeVivo
Kubernetes is a big project, not only in its importance but also in the size of its source code. At the time of writing, it has 86k+ commits, 2k+ contributors, 2k+ open issues, 1k+ open PRs, and 61k+ stars, all visible on the project’s GitHub page.
scc counts 4.3M+ lines of Go source code (5.2M+ lines in total), with 3M+ lines of “actual” code vs. 700k+ lines of comments, across 16k+ files in total. This includes the
We’ve been working on a project that surfaces TODO comments in a codebase, to help developers handle basic project management workflows from within that codebase.
We decided to point our little TODO finder at the Kubernetes source code to see what would turn up. Here are some of the results.
We ran tickgit against the source code at commit 9bf52c2 and imported its CSV output into SQLite to run queries. Note that the tool only finds TODOs in the tree of the checked-out commit; it does not account for TODOs that were added and later removed. The numbers therefore reflect only the TODOs still “live” in the code at that commit.
Totals (for 9bf52c2)
- 2,380 TODOs across 1,230 files from 363 distinct authors
- 460 TODOs have an assignee, e.g.:
    // TODO (patrickdevivo) Fix the ...
- 489 TODOs were added so far in 2019
- 860 days (or 2.3 years) is the average age of a TODO
- The oldest TODO dates from Jun 6, 2014 (from the commit titled “First commit”)
- The most recent TODO is from Dec 9, 2019
- cluster/gce/util.sh has the most TODOs, at 33
- deads2k has added the most current TODOs (by git blame), at 147
- Commit 6a4d5cd added the most TODOs that are still in the source, at 64
Files with the most TODOs (top 10):
count,file_path
33,cluster/gce/util.sh
25,pkg/apis/core/types.go
23,staging/src/k8s.io/api/core/v1/types.go
21,staging/src/k8s.io/legacy-cloud-providers/aws/aws.go
20,staging/src/k8s.io/code-generator/cmd/conversion-gen/generators/conversion.go
20,pkg/apis/core/validation/validation.go
16,test/e2e/network/service.go
16,pkg/kubelet/kubelet.go
14,test/e2e/framework/util.go
14,pkg/kubelet/kubelet_pods.go
Authors with the most current TODOs, by git blame (top 10):
author,count
deads2k,147
Clayton Coleman,105
Chao Xu,99
Dr. Stefan Schimanski,93
Jordan Liggitt,81
David Eads,60
Random-Liu,54
Wojciech Tyczynski,50
Yu-Ju Hong,43
Prashanth Balasubramanian,38
Commits that added the most TODOs still in the source (top 10):
count,sha
64,6a4d5cd7cc58e28c20ca133dab7b0e9e56192fe3
19,e01ff1641c7321ac81fe5775f6ccb21aa6775c04
19,4fb28dafad121e163fa86dc90067ce3d14415811
18,adb75e1fd17b11e6a0256a4984ef9b18957d94ce
14,963c85e1c807efcdbb82dd44439dc3c55f6a0bfd
14,8b17db7e0c4431cd5fd9a5d9a3ab11b04e2f0a7e
13,f0f78299348afcf770d4e8d89dcea82f80811b28
11,d0b94538b9744d0c06df6ddec2604be168568f9d
10,f1248b9c829e225138ab6d6234221c63092f7592
10,cd663d7ad00937cffa8a09e4761acb95d34c89a3
TODOs still in the source, by year added:
count,year
34,2014
249,2015
523,2016
650,2017
435,2018
489,2019
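Each of the tables above is a “count, sort, keep the top 10” query. A minimal Go sketch of the ranking step, assuming the counts are already in a map; TopN and Count are hypothetical names, and the alphabetical tie-break is our choice rather than anything the original SQLite queries are known to specify.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical helper names; the tie-break rule is an assumption.

// Count pairs a grouping key (a file path, author, commit SHA, or year)
// with the number of TODOs that share it.
type Count struct {
	Key   string
	Count int
}

// TopN returns the n entries with the highest counts, sorted by count
// descending and then by key, much like ORDER BY count DESC, key LIMIT n.
func TopN(counts map[string]int, n int) []Count {
	out := make([]Count, 0, len(counts))
	for k, c := range counts {
		out = append(out, Count{Key: k, Count: c})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Key < out[j].Key // deterministic tie-break
	})
	if n >= 0 && n < len(out) {
		out = out[:n]
	}
	return out
}

func main() {
	// TODOs per year, taken from the count,year table above.
	perYear := map[string]int{
		"2014": 34, "2015": 249, "2016": 523,
		"2017": 650, "2018": 435, "2019": 489,
	}
	for _, c := range TopN(perYear, 3) {
		fmt.Printf("%d,%s\n", c.Count, c.Key)
	}
}
```

Run on the per-year counts, this prints 650,2017, then 523,2016, then 489,2019. Sorting a slice built from the map is needed because Go map iteration order is unspecified.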
To produce similar results, try running tickgit todos --csv-output to get the raw TODO data. We used SQLite to query that data for the summaries above.
Conclusions and Questions
These results come from a fairly off-the-cuff look at TODO comments in the Kubernetes source code. We get a sense of the top TODO creators, which roughly tracks the top contributors to the project.
We also see that in a “large” codebase, developer behavior around TODO comments doesn’t seem out of the norm; there’s just more of it.
An important observation is that there are more TODO comments (2,380) than open GitHub issues (2k+). This suggests a significant amount of latent “work,” or to-do items, that isn’t easily visible unless you spend time in the source code itself.
Core contributors likely know their areas of the codebase well and have strong intuitions about their own TODOs and “latent work.” To outside observers, though, this is fairly opaque. GitHub issues (or other public ticket trackers) are more accessible to those not “in the weeds” of the project.
As most developers understand, software projects “live and breathe”: there’s frequent change, continuous improvement, constant imperfection, and lots of discussion. Workflow and process matter because good code requires continual reflection, and the TODO comments in the Kubernetes source show part of this in action. Without a benchmark, though, an average TODO age of 2.3 years does seem quite high. Those closer to the code are much better placed to judge; it would be interesting to see how this codebase compares with other large open source projects.
A more in-depth analysis would look at every TODO in a codebase’s history, not just those currently in the source, and might ask:
- What’s the rate at which TODOs are closed over time?
- What’s the average lifetime of a TODO comment?
- How do popular codebases compare to one another?
Does it Matter?
TODO comments typically cover work that might be too small for a ticket but important enough to note and describe in a code comment (though plenty of TODOs do reference issues or tickets). Since they are part of the code, they are often “closer” to the work that needs to get done. They are easy to add but, it seems, just as easy to lose: 1.8k+ TODOs added before 2019 are still in the Kubernetes source.
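The 1.8k+ figure follows from the count,year table: the TODOs dated before 2019 sum to

$$34 + 249 + 523 + 650 + 435 = 1{,}891 = 2{,}380 - 489,$$

which is the 2,380 total minus the 489 added in 2019.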
We hope that by creating a tool that surfaces metadata about code, we can make it easier for software developers to get work done, in projects of any size. Surfacing TODOs is just one piece of that.