Should we care about evidence-based software engineering?

Tags: test, evidence, software, theory

Published April 22, 2010

Time for some contrariness. The current rage in the academic software research community is evidence-based practice. It’s in popular magazines, desirable in academic publications, and the subject of a new book.

Does it matter? On the face of it, one would say of course. Why would you make decisions ignorant of the facts? (Set aside for now the reality that almost NO decisions in the world are made based on the facts!)

It would be nice if software researchers were in a position to present facts to people. In climate science, for example, the facts are pretty clear, and certainly much clearer than the corresponding literature in software. That’s why Al Gore, among others, probably sees debates on climate change as pointless. But the utility of model-driven development, among many others, is very much worth debating. I think there are five reasons why we shouldn’t be too concerned about evidence in software development:

  1. The field with a long history of evidence-based practice, and the most to gain from it, medicine, often doesn’t adopt the recommended practices, or the evidence chosen is irrelevant. Despite hand-washing and checklists being shown (proven?) to be very cost-effective practices to adopt, doctors still leave washrooms without cleaning their hands, and instruments still get left in patients. And in most software projects, there isn’t anything like that sort of liability.
  2. People don’t understand statistical generalization very well. Is that new pill reducing my risk of heart disease 20% more than the other pill, or 20% more than a regimen of Big Macs? Was this experiment done with non-English speakers? There’s a lot more to it than running a few t-tests and calling it a day (see the sketch after this list). See, e.g., “Why most published research findings are false” or a series of critiques of fMRI studies.
  3. Small results don’t say much. A lot of research is evaluated on small numbers of undergrads or focused on one particular organization (pdf). That evidence is useless to most developers. There is a paucity of in-depth, detailed case studies that generalize to meaningful theories. Personally, I am in favour of a moratorium on experimentation in software research until more of these case studies are done. Unfortunately, the lure of the easy number is a siren call to reviewers and funding agencies.
  4. SEMAT to the contrary, there is no good body of software theory that would provide explanatory power to go along with results. Without a theory, facts are descriptive; with a theory, they can be predictive.
  5. It simply isn’t that important. Individuals and organizations do many things which research suggests are downright insane – like embarking on projects without clear requirements, or maintaining 30-year-old mainframes – and get by. In fact, anecdotal evidence suggests that many excellent companies started with poor practices, then refactored as needed. Probably this is because evidence-based software development is a case of premature optimization. For example, despite reams of studies suggesting model-driven development is the way of the future, industrial adoption is underwhelming. Is it because practitioners haven’t read the studies? Or because they evaluated the technology and concluded it wasn’t necessary? As academics, we tend to undervalue the benefit of anecdote and gut feelings. Most of the time this is probably correct, but only if we have evidence to support generalization to common scenarios. Most developers were so burned by the CASE tools of the 1980s that they have no interest in repeating the experience with UML.
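To make points 2 and 3 concrete, here is a minimal sketch (assuming Python with NumPy and SciPy; the effect size and group size are illustrative choices of mine, not drawn from any of the studies above). It simulates thousands of two-group “experiments” with ten subjects per group and a genuinely modest effect, then counts how often a t-test reaches p < .05 and how badly the “significant” runs overstate the effect.

```python
# Illustrative simulation: why small-sample studies mislead.
# Assumes NumPy and SciPy; the effect size and group size are made-up
# values chosen to mimic a typical "20 undergrads" study design.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
true_effect = 0.3        # modest true difference, in standard-deviation units
n_per_group = 10         # e.g. two groups of ten undergrads
n_experiments = 10_000

significant_effects = []
for _ in range(n_experiments):
    control = rng.normal(0.0, 1.0, n_per_group)
    treatment = rng.normal(true_effect, 1.0, n_per_group)
    result = ttest_ind(treatment, control)
    if result.pvalue < 0.05:
        significant_effects.append(treatment.mean() - control.mean())

power = len(significant_effects) / n_experiments
print(f"Share of experiments reaching p < .05: {power:.0%}")        # roughly 10%
print(f"Mean effect among 'significant' runs: {np.mean(significant_effects):.2f} "
      f"(true effect is {true_effect})")                            # inflated ~3x
```

At this sample size a real effect of 0.3 standard deviations is detected only about one time in ten, and the runs that do clear the significance bar report an effect roughly three times larger than the truth. That is the literature small experiments produce, and it is why the in-depth case studies of point 3 matter more than another round of t-tests.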

I think my final point is that rationality is the exception, rather than the rule, in human behaviour. There’s no reason to lose much sleep over the fact that industry isn’t following evidence-based software practices.

p.s. I’m a complete hypocrite with respect to experimentation.