Measuring programmer productivity is futile.

Published

January 6, 2015

(I’ve typically posted long-form entries but so infrequently … )

The arguments and debates about 10x productivity in “programmers” rage on (this time in the service of defending or rejecting H-1B visas). The debate seems doomed never to be resolved. I think the reason why is nicely captured in Andrew Gelman’s post on p-values: they work best when noise is low and signal is high, which is never the case when we talk about productivity. As he says,

If we can’t trust p-values, does experimental science involving human variation just have to start over?
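
To make the low-signal, high-noise problem concrete, here is a small simulation sketch of my own (not from Gelman’s post): two groups of developers separated by a modest true “productivity” gap that is buried in large individual variation, run through the same two-sample t-test twenty times. The gap, noise level, and group size are invented for illustration.

```python
# Sketch: how unstable p-values are when the signal is small relative to the noise.
# All numbers here are hypothetical, chosen only to illustrate the point.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

true_gap = 0.2      # hypothetical productivity gap, in standard-deviation units
noise_sd = 1.0      # individual variation dwarfs the gap
n_per_group = 30    # a plausible sample of developers per group

p_values = []
for _ in range(20):  # twenty replications of the "same" study
    group_a = rng.normal(0.0, noise_sd, n_per_group)
    group_b = rng.normal(true_gap, noise_sd, n_per_group)
    _, p = stats.ttest_ind(group_a, group_b)
    p_values.append(p)

print([round(p, 3) for p in p_values])
# The identical underlying effect yields p-values scattered from well below
# 0.05 to well above it, so any single study's verdict is largely luck.
```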

Given a random sample of (let’s say) Microsoft software developers, can you devise a test that would show a statistically significant difference in productivity? Are you convinced you would have high power? A big effect size? One person online (via Hacker News) says it comes down to tool competence. But the recent LaTeX/Word study leaves me doubting even that conclusion (although I have trouble with that study too, which just reinforces my overall point).
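
For a sense of scale on the power question, here is a back-of-the-envelope calculation, assuming (my assumption, not part of any real study) that such a test would be a simple two-group comparison. The effect sizes are the standard Cohen’s d benchmarks, not estimates of any actual productivity gap.

```python
# Sketch: sample size per group needed for 80% power in a two-sample t-test,
# at conventional small/medium/large effect sizes. Purely illustrative.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

for effect_size in (0.2, 0.5, 0.8):  # Cohen's d: small, medium, large
    n = analysis.solve_power(effect_size=effect_size, alpha=0.05, power=0.8)
    print(f"d = {effect_size}: ~{n:.0f} developers per group")
# Roughly: d = 0.2 needs ~394 per group, d = 0.5 needs ~64, d = 0.8 needs ~26.
# A small effect in a noisy population demands far more developers than most
# controlled software-engineering experiments ever recruit.
```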

More importantly, I think this calls into question almost any controlled experiment in software engineering. Short of replicated results, I’m skeptical the information content is very high. Instead, I would like to see more qualitative research. Why do people say there is this difference? What traits are important? Can they be taught? How do we share productivity improvements? These questions seem much more important than trying to attach a p-value to whether one group is better than another.