The Bystander Effect in Open Source

John Ohno on 2020-01-10

Big projects with thriving communities are how non-developers (and surprisingly many developers) picture open source, but it’s hard to square these kinds of projects with ESR’s idea that source visibility & the ability to submit patches result in a substantially greater number of fixed bugs. Certainly, these kinds of projects benefit core maintainers socially, and certainly, source visibility helps with organized audits & makes it easier for individuals to fix bugs, on top of making drive-by fixes possible, but this conventionalized structure does not optimize for drive-by fixes.

When a project is large — especially when it is non-modular or when modules have complex interrelations — it takes a lot of effort just to understand things like project structure, dependency graphs, and control flow. These things need to be understood to a certain extent in order to produce a bug fix. If a project is visibly maintained — when it has a named maintainer & a high commit rate, and especially if it has a community of active developers around it — there is the sense that fixing a bug is somebody else’s job. After all, the drive-by developer doesn’t already have deep knowledge of the codebase, while all these active contributors do.

On top of that, the drive-by developer isn’t necessarily aware of or comfortable with the social structure built around especially large projects: complex norms around code style and best practices (rarely shared between unrelated projects), formal and informal social hierarchies among developers, tacit knowledge that maintainer X is a jerk or caustic or that maintainer Y is generally OK but has a hangup about licenses or maintainer Z will reject any patch that has an enum in it due to early life trauma. Actually fixing the bug means learning all these things or dealing with a waste of time & effort on both sides as the results of not learning them get negotiated. So the drive-by developer submits a bug report.

But wait! It gets worse! Because some bugs are really features. One must have an even deeper understanding of the codebase and the community around it in order to determine whether some apparent bug is actually an intentional behavior, or an obscure joke kept for lore reasons, or bug-for-bug compatibility with something else, or the best of several bad possibilities necessitated by structural or dependency decisions made early in the project’s development. Or, it may be a perverse manifestation of some author or maintainer’s personal philosophy. So maybe even reporting the bug is a waste of everybody’s time. Checking would involve trying to find similar reported bugs & reading the discussion around them, and the drive-by developer has better things to do than to read thirty pages of semi-related bug reports on the off chance that it’s been filed before and marked not-a-bug.

Reporting the bug might be a waste of time for other reasons as well. If I installed the application from my distribution’s package manager rather than building from source, it might have been fixed upstream years ago, or it might have actually been introduced by a distro maintainer. Checking would require building from source, which would require getting the right versions of dev packages for all dependencies & maybe building other things from source too, and the drive-by developer would do that as part of constructing a bug fix but is not willing to do that for a bug report.

The worst-case scenario here looks like an Apache project, the bane of all drive-by bug-fixes. Apache projects are typically former internal corporate projects that were at some point dumped on the Apache Software Foundation. Apache projects are big and tangled and have complex interdependencies. They often have decades of mailing list & bug tracker discussion, involving the same approximately one hundred core developers whose tangled interpersonal history is preserved in bureaucratic amber. Apache projects are highly actively maintained, but not by anyone you know, and you have a hard time imagining the mind of a being who can fully understand the class hierarchy of some of these projects. The tangled weeds of Apache projects are full of strange counterproductive decisions (slow code, counterintuitive behavior) protected by a dense web of complicated dependencies and tight coupling between seemingly unrelated classes. No mortal drive-by developer can contribute to an Apache project and live; or at least, that’s the impression any drive-by developer gets after spending about ten minutes investigating whether or not to fix (or even report) a bug.

It’s possible to have a full-featured and complex project that is inviting to drive-by developers, though.

A project that looks unmaintained will encourage forks (which can be selectively merged: you don’t need to wait for a pull request, and can simply grab good-looking diffs). Seemingly-unmaintained projects also are poison to corporate users (who are looking for somebody to do free labor & who generally won’t contribute back changes made in-house), so while your code may get fewer users in total, those users it does get are more likely to be meaningful contributors.

A project that’s highly modular, and where the modules are simple and very loosely coupled, will be inviting to drive-by developers. Not only is identifying bugs easier, but the fact that identifying bugs will be easier is obvious before one even starts. There are fewer opportunities for complications like bugs-that-are-really-features, and those cases can be more easily explained.

If maintainers must be visible, a casual structure for talking to all the maintainers helps. Formalized bug tracking mechanisms reduce the labor maintainers spend dealing with large numbers of bugs by putting the onus on those filing the bugs to obey complex and often opaque rules. Such systems are absolutely necessary when you have hundreds or thousands of bug reports a day, but they make it impossible to say “hey, I noticed this weird behavior. Do you guys already know about it?” in a way that doesn’t look like an atomized task on somebody’s plate.

A project that handles this quite well is my favorite Linux distro, Lunar. Despite being a full-featured Linux distribution, Lunar is really just a handful of fairly-readable short shell scripts; the maintainers & developers number fewer than ten and hang out on IRC. I do not use Lunar as my primary distribution (only because it lacks multilib support, which I very occasionally need for running binaries), but the environment and the community always feel like home, in part because I know that with only a little bit of effort I could not only have a complete understanding of the entire distro’s code & its policies, but get to know everybody involved on a first-name basis too. This kind of situation is only really possible because Lunar is fringe and its user base small, but I’ve seen plenty of projects with substantially smaller user bases that have substantially larger barriers to entry for casual contributors.