The industrial fallacy in software research
The industrial fallacy: the perception, common in research and industry, that industry always adopts the most efficient (and therefore correct) solution for a given need. E.g., use Excel for requirements management, because anything more elaborate falls foul of Agile’s YAGNI/DRY principles.
My problem is that this assumes the need emerges context-free. In reality, the reasons for using Excel have more to do with short-term constraints than with a sustained analysis of business needs on a cost/value basis. Industry is rife with examples of companies that do something because that’s how it has always been done, rather than because of a principled decision. The issue is that thought-leaders from the industrial side, such as Martin Fowler or 37 Signals or Joel Spolsky, are all located at the far right end of the industrial maturity bell curve. Applying their ideas to companies at the other end, or even the middle, may not scale at all.
Should we abandon research on alternatives to SQL because it is so pervasive and useful? Or should we continue to challenge orthodoxy with new ideas and new tools, even when the immediate benefits are not obvious?
Small examples
A lot of popular commentators, like Spolsky or 37 Signals, work on systems that are simple relative to the majority of business IT. There’s a deceptive focus in the blogosphere on small-scale computing solutions, in particular early-stage startups. Even Facebook, one of the largest websites in the world, has the benefit of a clarity of mission in which the software is ineluctably intertwined with the business mission. Other companies, such as large banks, face the challenge that software is not the core competency, and consequently must jockey for position with other business units, like accounting, marketing, human resources, etc. And yet IT is no longer just another business function, but central to the ability to deliver value to customers.
Look at this article from The Economist, which makes the point that IT systems at banks have to manage thousands of counter-party obligations and deliver answers in seconds as to whether the bank owes $3 billion or $30 billion. There is just no way that lessons from 37 Signals have anything but the very broadest applicability in this case. Scott Ambler and David Anderson are two people who work on so-called ‘agility-at-scale’, which deals with the unsexy but extraordinarily important issues of software development in complex, multi-stakeholder environments. The industrial fallacy, in this case, would have us believe that agile has won, that there is no need for requirements up-front, no need for software architecture and design (or rather, that Agile = Scrum or XP or ..). And yet I remain unaware of (even anecdotal) case studies of successful agile development for projects in the range of millions of dollars, which are common in industry. Note that I mean agile as in “follows an agile methodology”, not agile as in “ad-hoc”. I’m sure there is plenty of the latter happening!
It doesn’t matter if you finish on-time and on-budget if you don’t deliver value. And conversely, if you go over-budget but deliver tremendous value, isn’t that acceptable? We need to get away from the view that software == file with code in it. Software is the system that enables the requirements to be satisfied. In the same way that ultimately the home owner must oversee that his/her requirements have been met by the various subsystems (plumbing, electrical, framing, etc.), so must the business customer take responsibility for the software requirements.
Some notes on integrating Mendeley, Scrivener, MultiMarkdown and (Xe)Latex
I’m using Scrivener for my thesis. It has good outlining options, full-screen mode, rich text, and this neat feature that lets you mark things as comments (annotations). I can use MultiMarkdown to mark up the text (commonly * for italics and ** for bold), and lists are much simpler than the \begin{itemize} ... \end{itemize} format of Latex.
To manage my references I have been using Mendeley. It’s a little alpha still. (I originally wrote that there’s no way to search for specific fields. Actually, it is possible: either use search fields or click the arrow dropdown menu. Yay!) And it doesn’t integrate at all with Scrivener, although it will with other editors. There is, however, a web API, and it uses a SQLite backend, so hacking is possible.
For example, if there’s one thing I hate it is trying to remember the citation key for a reference – it completely kills my flow. So I’ve hacked up an Apple Automator script that will take the highlighted text, search Mendeley’s SQLite db for that text, and return a list of possible matches (yes, this is how I procrastinate about actually writing). Then I can paste the right key in. Here’s the gist (code).
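To give a flavour of the lookup step, here is a minimal Python sketch of the idea, not the Automator script itself. The table name `Documents` and the columns `citationKey` and `title` are assumptions about the database layout, so check them against your own copy of the Mendeley db before relying on this. The function name `find_citation_keys` and the sample rows are made up for illustration.

```python
import sqlite3


def find_citation_keys(conn, text, limit=10):
    """Return (citation_key, title) pairs whose title contains `text`."""
    # Escape LIKE wildcards so the highlighted text is matched literally.
    escaped = (
        text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    rows = conn.execute(
        # Table and column names are assumptions, not a documented schema.
        "SELECT citationKey, title FROM Documents "
        "WHERE title LIKE ? ESCAPE '\\' "
        "ORDER BY title LIMIT ?",
        (f"%{escaped}%", limit),
    )
    return rows.fetchall()


if __name__ == "__main__":
    # Stand-in database with invented rows; point sqlite3.connect() at a
    # copy of the real db file instead once the schema has been checked.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Documents (citationKey TEXT, title TEXT)")
    conn.executemany(
        "INSERT INTO Documents VALUES (?, ?)",
        [
            ("Example2009", "Requirements in practice"),
            ("Sample2010", "Graph databases"),
        ],
    )
    for key, title in find_citation_keys(conn, "requirements"):
        print(f"{key}\t{title}")
```

Working on a copy of the database is the safer choice, since the reference manager may hold a lock on the live file.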
Generally the Latex export works quite well. You must set an XSLT transform in Scrivener which takes the MultiMarkdown export and converts it to Latex. The code I have online shows how to export this to the University of Toronto thesis class.
A few tips:
– generally Latex code, e.g. math mode, is passed through with no problems. However, I’ve found that it is a good idea to surround complex Latex with HTML comments.
– special characters like % and & get escaped, which is not always what you want, particularly if you have cut and pasted text.
– you can’t use double-dash inside HTML comments, so switch to using single dash or Mac’s en-dash (Opt-dash). If you use XeTex, you can specify a font like Times New Roman which will display this properly.
– You can’t directly show characters like φ too easily in Latex, yet. You are better off with $\phi$ until math fonts improve.
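The first and third tips can be combined into a small helper. This is only a sketch: it assumes that the MultiMarkdown-to-Latex transform passes the contents of HTML comments through unchanged, as described above, and the function name `wrap_raw_latex` is hypothetical.

```python
def wrap_raw_latex(latex):
    """Wrap a Latex snippet in an HTML comment for MultiMarkdown export."""
    # A double-dash is not allowed inside an HTML comment, so refuse it
    # rather than silently rewriting the author's Latex.
    if "--" in latex:
        raise ValueError(
            "double-dash found; use a single dash or an en-dash instead"
        )
    return f"<!-- {latex.strip()} -->"


if __name__ == "__main__":
    print(wrap_raw_latex(r"\begin{equation} E = mc^2 \end{equation}"))
```

Raising an error instead of substituting a character keeps the choice between a single dash and an en-dash with the writer.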
IT failure statistics
There’s an excellent IT project dashboard from the US government reporting on success/failure rates, project size, and amount of spending (which is frankly jaw-dropping). It is a very useful site, because most of the information we have seems to come from self-interested consultants. It’s certainly in their interests to emphasize how projects are always in a perpetual state of failure. Which doesn’t mean they aren’t, of course. If we look at the very coarse-grained US data, it’s clear that an uncomfortably large number of projects are in trouble. E.g., the figure below shows 7% of projects are in serious trouble, and 34% need attention. 
There’s a reporting bias in the press, too, of course: Man Bites Dog syndrome. If the project works and saves money, it won’t make headlines. For example, the Ontario Telemedicine Network connects physicians, nurses and patients online every single day, with high uptime rates. Every patient that doesn’t have to travel to Toronto from Sudbury saves the government — and the patient — mucho dinero.
There’s another comparison I rarely see, as well: how many projects of any kind are successful? If we look at the recent decision by the Canadian government to dole out ‘stimulus’ money for infrastructure projects (which invariably means building new roads, for some reason), we see failure rates which are comparable to those in IT. For the two-year, $4 billion fund, only 25% of projects were finished before the March 2011 deadline, with the most likely scenario being that some 900 projects will not meet the deadline. Manitoba is apparently hoping for favorable weather to meet the target.
Perhaps the difference is that when it is something physical, like a road, it is much harder to leave it half-finished than a piece of software.
The relevance of CS research
I came across a post by Seb Paquet on Quora.com about the relevance of CS research(ers). Seb’s position seems to be that academics are doomed to failure when it comes to innovation. I think he comes up with some good reasons why academia will struggle to innovate, but he is comparing oranges to pomegranates. I should preface by noting I’m not enamoured with the current situation, but he is unfair – I don’t see CS research as being in any danger of being pushed into a corner of irrelevance.
The first problem is who we are comparing. Most people will point to the successful companies in industry as an example of innovation, such as Apple with the iPad. Then, they will dredge up some aging prof who publishes in the same obscure journals each year as an example of academia and irrelevance. But we should really look at the most successful academics, like Duncan Watts or Jim Hendler. Then, we have a survivorship bias in industry: companies which don’t innovate get pushed aside (although this is in support of Seb’s position). And finally, I’m not convinced industry is all that innovative anyway. I mean, how many Y Combinator companies are doing some form of social media site? Often, what we mean by innovation is a combination of marketing and excellent product engineering (definitely realms in which innovation is useful, but not what we usually mean).
The second issue is what we are innovating. Do we want research labs at university to compete with Google or Apple on product development? Of course not. In fact, if a lab did come up with a killer product, it would probably move straightaway into industry – witness Bumptop, OpenText, or Google itself.
What we want from research labs are the long-term innovations, things like ARPANET, hypermedia, REST (a Ph.D. thesis that is just now being fully understood). These enable whole new areas for research and industrial innovation. And let’s not discount the merits of heading down the wrong path. A lot of academic research consists of proving that something is in fact the case (beyond anecdote) or establishing what doesn’t work.
Finally, regarding the original question (about learning from industry), I think in fact academia is highly responsive to industrial innovations (perhaps too much so!). At our department, we have embraced Python, Subversion, Scrum, Wikis, etc. all within a few years of their development. Keep in mind that teaching needs to focus beyond what is immediately useful, unlike a career college. Professors here have worked with IBM to understand DB2 provisioning, used blogging to understand IR, and leveraged Hollywood to create new animation algorithms. There’s a new academic conference on “Xtremely Large Databases” to keep up with Google-scale problems. I think the key is that the tools academia produces are not necessarily the most usable, and it is this last mile that industry is great at.
If there is one thing that concerns me as a researcher it would be access to data. In the earlier days of computing, a researcher could claim to be working on similar problems to industry, because universities sank millions of dollars into computing infrastructure to maintain this parity. Today, though, it seems as though universities are falling behind when it comes to ‘real-world’ data to work with. Unless you are privileged to work with Google, you will have a very hard time duplicating that scale of problem. I’m not sure what the last multi-million dollar investment by a university in cutting-edge computing infrastructure was, perhaps next generation cloud systems like SHARCNET.
REFSQ summary
The Working Conference on Requirements Engineering (REFSQ) just concluded. It is a great conference with plenty of discussion and provocative ideas.
I tweeted periodically about the conference, and here are some final thoughts:
Statements from the concluding plenary I disagreed with:
- social scientists never do anything with their theories; social science theories are not generalizable; RE should avoid social science techniques.
- You cannot gather data without a theory.
- We shouldn’t wait for data to start creating requirements engineering (RE) theories.
- Replication refers to repeating an experiment, not re-doing a case study.
- Studies shouldn’t just collect data; they should also propose theories.
It was refreshing to be involved in general discussions about the role of theory and empiricism in requirements, as it is something the field has long ignored. Jorge would be happy: there seems to be acknowledgement that we ought to be working towards better theory building in RE. There was also some muted acknowledgement that whatever we did in the past did not work, and that those ‘theories’ — better to call them ‘conjectures’ or just ‘wild guesses’ — need revisiting.
However. Some people don’t seem to understand that there are many ways of doing science in RE. Nearly everyone agrees new techniques are NOT needed; what is necessary is better ways of understanding how existing tools work or don’t work. And social sciences have a lot to teach us here, as a cursory examination of the literature would reveal. This is not physics! And we can’t use “just wing it” as our epistemic theory. Some feel we should jump to wild conjectures about what ought to work, and seek to test that. In fact, what often works better is to adopt grounded theory approaches.
Case in point. Someone mentioned that often in interviews you go to a person and ask about X, and they respond by cursorily mentioning X and then talking about Z five times. Z is the thing you should be interested in! And indeed a grounded theory approach will allow this to appear.
But these are quibbles. I think in general, there is broad acceptance of the need for rigorous empirical techniques, and also acceptance that we need to aim as a community for comprehensive, well-verified explanatory (and perhaps predictive) theories.
I’ll end with a few provocative statements of my own (a theme of the working conference):
I think in requirements it is easy to mistake the trees for the forest. We seem to focus so much on “making RE better” that we lose sight of the ultimate goal, which is to make better (software) products. Every RE theory should tie in to this goal, in my opinion.
And perhaps more controversially, although everyone at the conference probably fears it, it might be the case that all our tools and techniques are irrelevant in the face of the human aspect of the problem. That is, I wager it is easier to remedy poor tools when you have a mature and intelligent organization; in fact, it may not matter what tools you choose. You could do waterfall and be successful (as NASA seems to). Here’s a great quote from Watts Humphrey to conclude:
[During my time managing complex projects at IBM] I found that the problems were never technical; they were always management problems.