IT failure statistics
There’s an excellent IT project dashboard from the US government reporting on success/failure rates, project size, and amount of spending (which is frankly jaw-dropping). It is a very useful site, because most of the information we have seems to come from self-interested consultants. It’s certainly in their interests to emphasize how projects are always in a perpetual state of failure. Which doesn’t mean they aren’t, of course. If we look at the very coarse-grained US data, it’s clear that an uncomfortably large number of projects are in trouble. E.g., the figure below shows 7% of projects are in serious trouble, and 34% need attention. 
There’s a reporting bias in the press, too, of course: Man Bites Dog syndrome. If the project works and saves money, it won’t make headlines. For example, the Ontario Telemedicine Network connects physicians, nurses and patients online every single day, with high uptime rates. Every patient that doesn’t have to travel to Toronto from Sudbury saves the government — and the patient — mucho dinero.
There’s another comparison I rarely see, as well: how many projects of any kind are successful? If we look at the recent decision by the Canadian government to dole out ‘stimulus’ money for infrastructure projects (which invariably mean building new roads, for some reason), we see failure rates which are comparable to those in IT. For the two-year, $4 billion fund, only 25% of projects were finished before the March 2011 deadline, with the most likely scenario being that some 900 projects will not meet the deadline. Manitoba is apparently hoping for favorable weather to meet the target.
Perhaps the difference is that something physical, like a road, is much harder to leave half-finished than a piece of software.
Should we care about evidence-based software engineering?
- The field with a long history of evidence-based practice, and the most to gain from it, medicine, often doesn’t adopt the recommended practices, or the evidence chosen is irrelevant. Despite hand-washing or checklists being shown (proven?) to be very cost-effective practices to adopt, doctors still leave washrooms without cleaning their hands, and instruments still get left in patients. And in most software projects, there isn’t anything like that sort of liability.
- People don’t understand statistical generalization very well. Is that new pill reducing my risk of heart disease 20% more than the other pill, or 20% more than a regimen of Big Macs? Was this experiment done with non-English speakers? There’s a lot more to it than running a few t-tests and calling it a day. See e.g. “Why most published research findings are false” or a series of critiques on fMRI studies.
- Small results don’t say much. A lot of research is evaluated on small numbers of undergrads or focused on one particular organization (pdf). That evidence is useless to most developers. There is a paucity of in-depth, detailed case studies that generalize to meaningful theories. Personally I am in favour of a moratorium on experimentation in software research until more of these case studies are done. Unfortunately, the lure of the easy number is a Siren-call to reviewers and funding agencies.
- SEMAT to the contrary, there is no good body of software theory that would provide explanatory power to go along with results. Without a theory facts are descriptive; with a theory they can be predictive.
- It simply isn’t that important. Individuals and organizations do many things which research suggests are downright insane (like embarking on projects without clear requirements, or maintaining 30-year-old mainframes) and get by. In fact, anecdotal evidence suggests that many excellent companies started with poor practices, then refactored as needed. Probably, this is because evidence-based software development is a case of premature optimization. For example, despite reams of studies suggesting model-driven development is the way of the future, industrial adoption is underwhelming. Is it because they haven’t read the studies? Or that they evaluated the technology and concluded it wasn’t necessary? As academics, we tend to undervalue the benefit of anecdote and gut feelings. Most of the time this is probably correct, but only if we have evidence to support generalization to common scenarios. Most developers were so burned by the CASE tools of the 1980s that they have no interest in repeating the experience with UML.
I think my final point is that rationality is the exception, rather than the rule, in human behaviour. There’s no reason to lose much sleep over the fact that industry isn’t following evidence-based software practices.
p.s. I’m a complete hypocrite with respect to experimentation.
De-referencing climate claims
Our group of software scientists, informaticians, and modeling experts at the University has been congregating weekly to discuss software science’s role in {ameliorating|understanding|preventing|debunking} climate change. Steve Easterbrook’s been the inspiration.
We have undertaken the challenge of modeling some of the claims and information used in climate change. We started out with wind renewables, specifically chaps 10/14/18/B in the MacKay book.
I found this figure interesting:
It’s from MacKay, page 107, and shows his estimates for renewable energy outputs, in kWh/day, in the first column, compared with four other sources. What I wanted to do was suss out the reasons for the discrepancies in the numbers. I’ve picked the offshore wind row, which he has as generating up to 48 kWh/d (32 deep, 16 shallow) vs 6.4, 4.6, and 21 kWh/d for other sources. That’s a difference of between 2x and 10x, depending on source and whether you include deep offshore (‘deep’ is water deeper than 25 metres, not including floating generators). MacKay’s numbers are derived from a series of Fermi calculations in his book.
CAT report
The biggest number is 21 kWh/d, which was included in the Centre for Alternative Technology’s (CAT) “Island Britain” plan (Helweg-Larsen and Bull, 2007). I found a copy of this online, and tried to figure out the genesis of this number. The first sentence on page 85 gives 3,212 TWh/year as the potential amount of (offshore) wind energy, and estimates harnessing 14% of this by 2027. 14% of 3,212 is 450 TWh/year, which divided by 365 gives a daily figure of 1.23 TWh/day. MacKay gives his figures per person, so I divided by the 2007 population of 61 million people, then multiplied by 10⁹ (to change from tera to kilo). The result is 20.16 kWh/day per person, which is close to the 21 kWh/day MacKay lists (the gap is probably due to rounding). Obviously the 2027 population of the UK will be much higher than 61 million, and I believe the figures used in the CAT report are only for Great Britain, which excludes a lot of offshore area around Ireland (and, I suppose, the Isle of Man, etc).
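The conversion from a national TWh/year figure to kWh/day per person is easy to get wrong by a few powers of ten, so here is a minimal Python sketch of the arithmetic as I read it. The inputs are the figures quoted above; the function and constant names are made up for illustration and do not come from either report.

```python
# Figures quoted in the text; all names here are illustrative only.
POTENTIAL_TWH_PER_YEAR = 3212   # offshore wind potential cited by CAT
HARNESSED_FRACTION = 0.14       # share CAT assumes is harnessed by 2027
UK_POPULATION = 61_000_000      # 2007 population used in the text
KWH_PER_TWH = 1e9               # tera (1e12) down to kilo (1e3)
DAYS_PER_YEAR = 365


def kwh_per_person_per_day(potential_twh_per_year, fraction, population):
    """Convert a national TWh/year potential into kWh/day per person."""
    harnessed_twh_per_day = potential_twh_per_year * fraction / DAYS_PER_YEAR
    return harnessed_twh_per_day * KWH_PER_TWH / population


result = kwh_per_person_per_day(
    POTENTIAL_TWH_PER_YEAR, HARNESSED_FRACTION, UK_POPULATION
)
print(f"{result:.1f} kWh/day per person")  # prints "20.2 kWh/day per person"
```

Carrying full precision gives roughly 20.2 rather than 20.16, because the hand calculation rounds the intermediate values (450 and 1.23). Either way it lands a little under the 21 kWh/day in MacKay’s table.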
The key datum here is the 3,212 TWh/year estimate of potential wind energy offshore. That number comes from Department of Trade and Industry (Nov 2002) “Future Offshore: A strategic framework for the offshore wind industry”. In that report Table 2.2, page 26, summarizes the various wind energy potentials. This information is derived from a GIS that models water depth, wind speeds, etc. The report uses four strategic regions only, and omits water less than 5m deep, since there are likely to be more conflicts closer to shore. However, the table calculates values for depths between 5 and 30 metres, and a second value for water between 30-50 m. The total energy production is then listed as 3,213 TWh/year for all depths between 5-50m, within the four strategic regions. These regions are selected for location near the grid and also expanse of shallow water. They look comparatively small in size, so I suspect the totals are greatly underestimating the total potential (notwithstanding grid improvements to transfer the energy).
What is the cause of the difference between MacKay’s 48 kWh/day and CAT’s 21 kWh/day? One cause is this difference in the size of the usable area. MacKay calculates the water depths, then arbitrarily takes 1/3 as usable (due to shipping etc). Another is the estimate of potential wind energy. He uses an average power per unit area of 3 W/m², while the CAT/DTI report presents no such estimate, but its calculations suggest they are using a factor of 12 W/m². I cannot find their source for such a huge increase in the average power per unit area. MacKay, on page 60, lists the actual power for a running farm at 2.6 W/m².
I think what the DTI/CAT report did was to place a ‘3MW’ turbine every 500 metres, then multiply that density by the total area, to get total power capacity. I.e., for a one km² unit, there would be nine turbines possible (using their 500m separation), for a power potential of 27 W/m². The DTI paper, on page 27, then assumes a 40% load factor (MacKay states industrial averages are nearer 30%). If we assume 10 turbines per km² (not sure why 10 and not 9), we end up with the 12 W/m² figure they used. This is an unsafe assumption, however, as the real world data from MacKay seems to suggest. Certainly the factor of 4 improvement is highly suspect.
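My reconstruction of the power-density figure can be sketched in Python as follows. Treat it as a guess at the method, not the DTI’s actual calculation: the 3×3 grid (turbines every 500 m, counting both edges of the square) is my assumption about where nine turbines comes from, and all the names are hypothetical.

```python
# Figures from the text; names and the grid layout are my assumptions.
TURBINE_RATING_MW = 3.0      # the '3MW' turbine
SPACING_M = 500              # separation between turbines
DTI_LOAD_FACTOR = 0.40       # load factor assumed in the DTI paper
MACKAY_W_PER_M2 = 3.0        # MacKay's average offshore power density


def turbines_per_km2(spacing_m):
    """Turbines on a square grid over 1 km x 1 km, counting both edges."""
    per_side = 1000 // spacing_m + 1
    return per_side ** 2


def power_density_w_per_m2(n_turbines, rating_mw, load_factor=1.0):
    """Average power per unit area for turbines packed into one km2.

    MW per km2 is numerically equal to W per m2 (1e6 W over 1e6 m2).
    """
    return n_turbines * rating_mw * load_factor


n = turbines_per_km2(SPACING_M)                                    # 9
capacity = power_density_w_per_m2(n, TURBINE_RATING_MW)            # 27 W/m2
with_nine = power_density_w_per_m2(n, TURBINE_RATING_MW, DTI_LOAD_FACTOR)
with_ten = power_density_w_per_m2(10, TURBINE_RATING_MW, DTI_LOAD_FACTOR)

print(f"{n} turbines/km2, capacity {capacity:.0f} W/m2")
print(f"9 turbines at 40% load:  {with_nine:.1f} W/m2")
print(f"10 turbines at 40% load: {with_ten:.1f} W/m2")
print(f"ratio to MacKay's figure: {with_ten / MACKAY_W_PER_M2:.1f}x")
```

With nine turbines the 40% load factor gives about 10.8 W/m²; only with ten does it reach the 12 W/m² implied by the report, which is four times MacKay’s 3 W/m².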
Conclusions
It seems clear that this little exercise has pointed out some very interesting inconsistencies. Obviously, calculations for potential energy sources in twenty years will always be subject to some uncertainty — who can say for sure what the possible results will be, or what ‘unknown unknowns’ will exist? Nonetheless, even for these simple analyses assumptions varied greatly — available area, feasible sea depth, load factors, unit power.
To me, it seems clear that more detailed planning than these rough estimates should be performed. The DTI report, for example, seems to be a major policy statement of the UK government! And it isn’t like the information is hard to gather, relative to other sciences. We could surely stick some anemometers out there to improve our knowledge of peak loads, evaluate current projects, perform some economic analysis like risk assessments, etc., in order to derive a more realistic, finer-grained model.
I would like to acknowledge Jon Udell, whose efforts in ‘citizen e-democracy’ have been at the forefront for a long period of time. See, for example, his work tracking down the sources for crime statistics in his hometown. Too often the tidy figure quoted in a published study is the product of sloppiness and lazy research. And all too rarely is this figure fact-checked. But that is a (relatively) painless thing to do in the age of the Internet, and I think it behooves us all as citizens to do so.
I study software, not software engineering
Software research has always been a strange creature. Born in the 60s as computer pioneers realized the immense challenge software posed (where previously most efforts had centered on hardware engineering and design), software covers such a broad scope of domains and possibilities that categorizing it is like categorizing transportation conveyances. Are you building an ocean tanker? A cheap, self-propelled city vehicle? Something to destroy buildings?
That’s why most arguments about software tend to feature wagon-circling by members of particular domains. Are we engineers or craftspeople? The guy who builds websites for local veterinarians is appalled by the demands RUP places on his development process. He finds Java and Tomcat ridiculously over-engineered. At the same time, the multinational IT support person can’t fathom how anyone would develop without a fully elicited requirements model or class diagram. She looks at Ruby and sees a newfangled bastardization of functional programming and Perl.
These aren’t the only camps, either. It often seems to break down over project size, potential harm, and importance. NASA has very few defects in its code, but spends billions of dollars to get to that point. Not something 37 Signals is interested in doing.
This is why I find the term ‘software engineering’ misleading. Sure, back in the 60s, 70s, 80s, venues like ICSE were largely about giant defence projects, things like Star Wars (see here for a great rant about how futile that effort was, though). Software could be engineered, right? Throw peons at the project, a bunch of money, and with a healthy management team the project would succeed.
Since the dawn of the open-source movement, I would say, and the rise of personal software in the late 80s, research on large-scale software has become increasingly less useful to a large number of practitioners. This led to a number of non-academics proposing truly innovative and successful methodologies (Scrum, FDD, XP, Lean/Kanban) because, in my view, the researchers just weren’t able to see the change coming. Partly this is because as a professor, you need secure funding, and small firms cannot provide this. Inevitably you end up working on Oracle instead of MySQL, for example.
The exciting thing is this diversity: it is precisely why doing research in software fascinates me. But that research must be targeted appropriately. Don’t insist on the primacy of UML for every possible project; it just doesn’t make sense. Don’t see statements like “people over process” and assume that process is irrelevant.
I think in order to more fully embrace this awesome scope, to make research (more) relevant, we ought to stop pretending we all do engineering the way they had hoped — although not all of them — at the NATO conference in 1968. We need to more carefully identify the relevance of humans in the software loop. We need to accept the inevitability of change and inconsistency in software and organizations. We need to figure out what’s useful for the ten-person company, as well as the thousand person company. We ought to just call ourselves software scientists, and go from there.
Currently, it seems to me that practitioners of all stripes get their information thusly: first, via colleagues or anecdote; secondly, from websites like Slashdot or Digg; thirdly, from vendors and PR agencies; fourthly, from industrial research firms like Gartner or Forrester; and lastly, from academic research papers. I would really like to move academic research up this list.
There is no such thing as a non-functional requirement
Non-functional requirements, or NFRs, have long been a subject of confusion for me. Typically, NFRs have been used to refer to properties of software that are not directly related to specific task-fulfilling criteria. For example, ‘calculates GST’ is a functional property of a system. ‘Do it quickly’ has historically been considered non-functional, in the sense (and here’s my problem) that it is not directly affecting the ability of the system to do the task. Of course this is hogwash. If I deliver a system to a client that calculates GST owing in seconds, the client will reject it. I think similar arguments can be made for usability, security, reliability, etc.
My advisor, John Mylopoulos, wrote what is considered one of the seminal works on non-functional requirements. In it, he and his co-authors describe NFRs as ‘global requirements on [the system's] development or operational cost, performance, reliability, maintainability, portability, robustness, and the like’. This is essentially an extensional definition (listing the members of the set, rather than the attributes any member must have). Indeed, in the paper they note there is no formal definition of non-functional (citing instead work on software quality).
Martin Glinz, an RE luminary, wrote a tortured summary of the research thinking on NFRs, proposing (of course!) a new framework for NFRs. I think the issue is simpler than that. Dispense entirely with the notion of ‘functional’ vs. ‘non-functional’ — anything the client wants the software to do is functional. Rather, we focus on whether we can quantify a requirement or not — e.g., does the system calculate GST — and divide our requirements into goals and softgoals, where softgoals must be broken into quality spaces (response time, for example) to understand them. I think talking about software quality is a much more fruitful pursuit.
