Semantic Werks

Thoughts on people, machines and systems.

Data and science in enterprise computing


Tim Bray had a great post recently on the ‘software crisis’. That wasn’t his phrase but it is his subject.

I challenged his conclusions on the grounds that they rested on poor information. I said:

@lemire too bad @timbray’s post is completely lacking in empirical data (consultants don’t count).

To which Tim replied:

@alex77 Follow some of the links from my piece. Tons of data. Also, get any COO drunk & you’ll hear.

And I finished with:

@timbray anecdote is not the same as peer-reviewed data, sadly. Software field is light-years from even being able to have a “ClimateGate”

So what did I mean? And what did he mean?

Hacker science

There are a lot of Tims in the industry (and this is a good thing!). For example, his Wide Finder project is a classic example of ‘tinkering’ leading to innovation. I think the best way to characterize his work is engineering science. He starts with an itch, and tries to figure out the best way to scratch it. Along the way, we get new understanding of the problems (e.g., how can non-experts use Clojure effectively) and some data on what works and what doesn’t (in the ‘real world’). This is very similar to the development of the airfoil, and this stream of science is well described by Vincenti in “What Engineers Know and How They Know It”.

But this isn’t enough. Vincenti makes clear that further airfoil improvements came from a good understanding of the physics underlying lift, and from computer modeling, not from iterative tinkering (although there are clear parallels). Similarly, while the Wide Finder project is a great way to start understanding the theories and constructs involved in concurrent programming on large-scale problems, I’m not convinced it is sufficient. What we need are studies of many programmers working on many different problems. For example, how would the C solution work with non-expert C programmers? Do Erlang’s concurrency concepts, such as lightweight processes and message passing, make intuitive sense to most people?

Standard of evidence

These questions cannot be answered by individual tinkering. They require larger-scale studies and more thorough investigation. Unfortunately, there is a trend in the software blogosphere to use one or two data points as solid evidence that “C sucks” or “Perl is unreadable”.

But why would we accept this standard of evidence for programming, but not for, say, climate change? In the same way, can we really base conclusions about the state of IT on the musings of some influential bloggers, industry whitepapers, and self-serving consultant reports? Let’s imagine the same situation prevailed in climate science. Then we would have a few people commenting on the basis of anecdotal encounters (“boy, it sure is cold in Toronto this winter!”), Shell Oil writing reports on the oilsands (“only 100 hectares are polluted!”) and a climate mitigation consultant encouraging mitigation strategies. In none of these cases do you have enough information for rational action (not that rational action is always the goal!).

Maybe you would say these are apples and oranges, that climate science is a much bigger issue, etc. etc. I’m not sure. In both cases we are talking about billions of dollars of costs. The only real difference is who acts: on climate change, policy is set through collective action; in IT, through corporate action.

Improve the Data

So am I saying that Tim Bray is wrong, that corporate IT is fine, that the lessons of Twitter don’t apply? Not really. What I am saying is that you should base these sorts of conclusions on empirical data that is collected in the best traditions of open science. It would be data that is from a survey whose questions we can see. It would be data from peer-reviewed research journals.

Let’s stop drawing conclusions using data from Gartner, Standish, et al. These firms are in the business of selling advice, and they are not interested in objective truth. A lot of the reports are based on closed surveys, talks with “Thought Leaders”, random observations, etc. If you are Standish, are you really interested in a report that says “Most IT projects successful”?

What is needed is a non-partisan study, much like the IPCC, that will examine the relevant scientific research on the issue. Before we can draw conclusions, especially black and white conclusions, we need to know what we don’t know (“unknown unknowns”!). This means raw data, and this means more openness on how well these IT projects are really doing. It would mean allowing researchers access to IT development teams, to perform proper case studies, to see, for example, why the FBI system failed. Too often software researchers are in the position of begging companies to release data. And even in industries where publicly accessible studies are mandated, we find many games being played to prevent negative outcomes from being published.

So will there be improvements in empirical data on software? I have my doubts. But it’s necessary if we really want to know what software can do, and what software cannot do.

Written by Neil

2010 January 6 at 11:21

Posted in Uncategorized


3 Responses


  1. You say, “there is a trend in the software blogosphere to use one or two data points as solid evidence that “C sucks” or “Perl is unreadable””, but don’t provide a citation.

    Greg Wilson

    2010 January 6 at 14:10

  2. Here’s how you solve a problem on Wikipedia or the Internet at large. Watch and learn:

    http://www.google.ca/search?hl=en&source=hp&q=perl+sucks&btnG=Google+Search&meta=&aq=&oq=perl+sucks

    552,000 for perl sucks

    http://www.google.ca/search?hl=en&safe=off&q=C+sucks&btnG=Search&meta=&aq=f&oq=

    28,100,000 for C sucks

    http://www.google.ca/search?hl=en&safe=off&q=Python+sucks&btnG=Search&meta=&aq=&oq=Python+suck

    1,100,000 for Python sucks

    Obviously Python sucks 100% more than Perl sucks. It must be due to Python’s lack of lexical scope (note how a conclusion is made with a complete lack of evidence). Meanwhile, C sucks roughly 26X more than Python.

    Anonymous

    2010 January 6 at 19:44

