How Writing Code is Like Making Steel

I saw an interesting keynote from Mark Harman recently, on search-based software improvement. Mark’s lab at UCL also pioneered this idea of automatic code transplants using optimization techniques.

I think if you are an engineer who does fairly standard software development you should be concerned. The ultimate vision is to be able to take some specification with thorough tests, written in a language at a high level of abstraction (e.g., here is my corporate color palette, here are my security requirements) and automatically generate the application.

There are several forces at play here. One is the increasing componentization of large and complex pieces of software. We’ve always had software reuse, but it tended to be at a much smaller level – the ODBC API, or the OAuth framework. Now our frameworks reach much larger areas of concern, particularly when we look at container technology running on commodity hardware. In those cases someone else is maintaining huge chunks of your software base: the OS, the backend, the messaging system, etc. If you then take your Rails app and add it to that stack, how much, as a percentage, have you created? A decreasing amount, in any case.

The other force is the improvements in genetic and other optimization algorithms, combined with the inevitable scaling of computing power. That means that even though you may be really good at crafting code, and the machine generates garbage, it can improve that garbage very very quickly.
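As a toy illustration of "generate garbage, then improve it quickly", here is a minimal sketch of a mutate-and-select loop guided by test cases. It is not Mark Harman's method or any real tool: the "program" is just a pair of integer coefficients, and the test suite, function names, and starting point are all hypothetical choices made for the example.

```python
import random

# Hypothetical "specification": input/output test cases for f(x) = 3x + 7.
TESTS = [(x, 3 * x + 7) for x in range(-5, 6)]


def run(candidate, x):
    """Interpret a candidate 'program': a pair of integer coefficients (a, b)."""
    a, b = candidate
    return a * x + b


def error(candidate):
    """Total absolute error over the test suite; 0 means every test passes."""
    return sum(abs(run(candidate, x) - expected) for x, expected in TESTS)


def mutate(candidate, rng):
    """Return a copy with one coefficient nudged up or down by one."""
    coeffs = list(candidate)
    i = rng.randrange(len(coeffs))
    coeffs[i] += rng.choice((-1, 1))
    return tuple(coeffs)


def improve(start, steps=5000, seed=0):
    """Keep a mutation only if it does not make the test error worse."""
    rng = random.Random(seed)
    best, best_err = start, error(start)
    history = [best_err]
    for _ in range(steps):
        if best_err == 0:
            break
        child = mutate(best, rng)
        child_err = error(child)
        if child_err <= best_err:
            best, best_err = child, child_err
        history.append(best_err)
    return best, best_err, history


if __name__ == "__main__":
    start = (-40, 25)  # a deliberately "garbage" starting point
    best, best_err, history = improve(start)
    print("start", start, "error", error(start))
    print("best", best, "error", best_err, "after", len(history) - 1, "steps")
```

The loop knows nothing about the problem beyond the test results, which is the point: the quality of the outcome rests on the tests, and the speed rests on how cheaply candidates can be evaluated.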

How different is it for me to copy and paste the sample code on the Ruby on Rails site to create a new application, than for a computer algorithm to follow those same steps? To be clear, there remain a lot of complex decisions to make, and I’m not suggesting algorithms can do so: things like distributed systems engineering, cache design, and really just the act of taking a user requirement and turning it into a test.

So how is this like the steel industry? I think it reflects commodification and then automation. Steel was largely hand-made for years, but the pressure of capitalism generated rapid improvements in reducing costs – largely labor costs. Process and parts became standardized, so it was possible to set up mills at much lower cost. The difference in quality between US and (say) Indian steel became small enough to not matter. But even in India, the pressures continue downward, so India’s dramatically lower labor costs still cannot compete with automation.

Some of these pressures don’t exist in software, of course: there is still a large knowledge component to it, and there are no health and safety costs in software labor (the hazards of RSI and sitting notwithstanding). So I don’t see any big changes immediately, but the software industry is probably where the steel industry was in the 1920s. In 50 years I cannot see software being written by hand at the level it is now, with the exception (like in steel) of low-quantity, high-tolerance products like embedded development. The rest will be generated automatically by algorithms based on well specified requirements and test cases. Silicon Valley will become the rust belt of technology. You realize that Pittsburgh, birthplace of the steel industry, was once the most expensive city in the US, right?

If you doubt this, I think we are really arguing over when, and not what. My simplest example is coding interviews. Why test people on knowledge of algorithms that are well understood, to the point where they are in textbooks and well-used library code? The computer can write the FizzBuzz program faster and more efficiently than a human can. Over the next few decades, I believe Mark Harman’s optimization approach will encompass more and more of what we now do by hand.

Garbage In, Garbage Out

My dad had this great cup from one of his visits to COMDEX (ostensibly to keep up with the latest in the tech world, which at the time COMDEX represented). It said “Garbage in, garbage out” (GIGO), and then had the name of some failed software company.

GIGO mug (Cafepress)


I read a great blog post about intermediate targets and over-optimizing what you measure (Goodhart’s law) and the unintended side effects. Then I watched a presentation on the future of data visualization.

The commonality to me is this undesirable focus on the simple over the complex. So a dashboard can in a glance tell you how fast your car is going, which is useful because it maps to two concerns you have as a driver: obeying the speed limit laws, and maximizing your time in the car. I should say, “maps directly”, because as an indicator for these two concerns, speed is pretty much a 1-1 mapping. But consider a car indicator with a much poorer mapping to your concern: the “distance remaining” gauge new cars have. This tells you that based on some model of past driving behavior, you can expect to travel X more miles before the fuel runs out. The problem is this indicator is no longer a simple mapping. You have a (possibly non-linear) model of past behaviour (and no idea how far back the model goes); possibly inaccurate sensors (e.g., depending on temperature, the amount of fuel actually remaining might change); and finally, it is predicting future behavior (you will continue to drive to work tomorrow, and not go on a long distance highway drive).
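To make the "distance remaining" point concrete, here is a minimal sketch of how such a gauge might be computed. This is an assumption for illustration, not how any real car does it: the function name, the trip data, and the idea of averaging over a fixed window of recent trips are all hypothetical.

```python
def estimated_range_miles(fuel_gallons, recent_trips, window=5):
    """Estimate miles remaining from the last `window` trips.

    recent_trips is a list of (miles_driven, gallons_used), oldest first.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    sample = recent_trips[-window:]
    miles = sum(m for m, _ in sample)
    gallons = sum(g for _, g in sample)
    if miles <= 0 or gallons <= 0:
        raise ValueError("need positive miles and gallons in the window")
    # The prediction assumes future driving looks like the sampled past.
    return fuel_gallons * (miles / gallons)


if __name__ == "__main__":
    commutes = [(10.0, 0.5)] * 5   # hypothetical stop-and-go trips
    highway = [(120.0, 3.0)]       # hypothetical long highway drive
    print(estimated_range_miles(3.0, commutes))
    print(estimated_range_miles(3.0, commutes + highway, window=1))
    print(estimated_range_miles(3.0, commutes + highway, window=5))
```

The same three gallons yield very different answers depending on which trips fall inside the window, and the driver sees none of that on the dashboard.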

In much the same way I think this fascination with metrics and dashboards confuses construct for concern. If I’m the government CIO, my concern is the value for taxpayer money each project is generating. But the dashboards are probably showing me constructs like estimated time to completion or lines of source code. Furthermore, and this is the data/info vis piece, those constructs are being mapped into visual variables using some arbitrary function. For instance, the decision to turn something from green to red might be based on a simple threshold chosen by an intern.

In broad strokes, constructs like source lines of code can, I think, be useful: logarithmically, perhaps, in the sense that a system with 100 thousand lines is more complex than one with only 10 thousand.
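A short worked comparison shows what reading a size metric logarithmically means. The base-10 logarithm is my choice for illustration; any base gives the same picture.

```latex
% Arithmetic reading: the two systems differ by 90,000 lines.
100\,000 - 10\,000 = 90\,000

% Logarithmic reading: they differ by one order of magnitude,
% the same step as going from 1,000 lines to 10,000.
\log_{10}(100\,000) - \log_{10}(10\,000) = 5 - 4 = 1

% The two readings also disagree about what "halfway" means.
% Between 1 and 9:
\text{arithmetic midpoint: } \frac{1 + 9}{2} = 5
\qquad
\text{logarithmic midpoint: } \sqrt{1 \cdot 9} = 3
```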

This typically isn’t how dashboards work, though. Thinking about numbers seems so innately arithmetic (5 is halfway between 1 and 9, not 3) that we cannot comprehend how little the dashboard is telling us. The Japanese lean movement has a nice phrase that captures what I think needs to happen: genchi genbutsu, roughly “go and see for yourself”, a close cousin of “management by walking around”. In a factory, just looking at metrics for production speed and inventory is not the whole picture, and so long ago the creators of the Toyota Production System learned that you had to actually walk the shop floor to see with your own eyes.

This is perhaps harder in the non-physical world of software, but I think most of us have an innate sense of project performance: are meetings productive? When was the last time you saw a working piece of code? Do you get quick answers to emails? While it is possible to metricize these things, it probably won’t help much more than buttonholing someone in the hallway.

Running a “Critical Research Review” at #RE15

Today we conducted our first attempt at “Critical Research Reviews” (CRR) at our workshop on empirical requirements engineering (EmpiRE) at the 2015 Requirements Engineering Conference.

CRR was introduced to me by Mark Guzdial’s post on the same exercise at ICER last year, which was run by Colleen Lewis. The idea (as I understand it) is to have researchers present work in progress, ideally at the research design stage. The purpose of the workshop is to “leverage smart people for an hour” in improving and stress-testing your research idea and methodology.

The cool part about doing this at EmpiRE is that our proposers got to leverage some of the leading empirical researchers in the RE community. These are the people likely reviewing your full paper, so it makes sense to get their critique up front.

We had three accepted “research proposal” papers as a special category for the workshop call. In the afternoon (2pm-5.30pm), we had the three presenters each do a 15-minute plenary presentation to make everyone in the workshop (25 or so) aware of the work. I restricted questions, so this was almost entirely over in 45 minutes. After a coffee break, I introduced the CRR concept and some ground rules, as well as a list of potential questions to consider. Then, for the next 45 minutes or so, the participants were invited to join the presenter that interested them and have a (polite) discussion about the proposed research.

Finally, I had asked each group to bring some wide-ranging thoughts back for the entire group for the last 30 minutes. My intent here was not to go into specifics on the proposals; rather, to get some other lessons that might be useful for the people who were not part of that particular group. This worked pretty well; it did tend to go into more detail than perhaps warranted, but it did stimulate some interesting discussion.

From what I heard, people quite enjoyed this approach to research evaluation. It’s much more fun trying to poke holes in research approaches when the author on the other end can rebut your arguments! Look for another edition next year.

You can find my slides introducing the idea here, and our proceedings, with the presenters’ research proposals, will be posted whenever IEEE gets around to it.

Lessons learned

  • The room was terrible: large central conference table. One group retreated to the coffee room which had large circular tables.
  • No one used the flip charts: I think the presenters were writing their own notes down on their laptops anyway.
  • We mostly had established researchers presenting. In the future we are considering restricting this to early-career researchers or PhD students, who likely need the assistance more. But I think the more senior researchers still benefited. The primary difference, I think, will be that the senior researchers will have considered more of the potential threats.
  • I was the main facilitator: having one group a 2-minute walk away made this harder. No group really needed help, but I can certainly see situations where that would be an issue. For instance, if you get too many people going to one presenter, or one person dominating the discussion, or too much negativity (the usual group dynamics, in other words).

Thoughts from a CodeFest

This past weekend was the Steel City Codefest. The idea is that community non-profits present some problem for which an “app” would help them, and coders spend 24 hours coming up with some solution. It was a lot of fun. You can see our team’s solution at Our challenge was to create an easier way for people to find the city of Pittsburgh’s GrubUp food program, which offers free lunch and breakfast at 80+ sites around the city in the summer (sadly, a lot of Pittsburgh youth are food insecure).

Me and my teammate Monica starting off Saturday morning.

We didn’t win the challenge, but I learned a lot on the way.

We created tons of technical debt: code clones, code comments, no testing, no design. It was code as fast as possible, get it working, fix the obvious user-facing bugs. We shipped. But even during that 24h span the lack of design hit us, as it became harder to change things since logic and UI were wrapped together. Even something as trivial as renaming a media folder became a massive headache. We had no tests, so any change had to be “tested” by running the app and running through a few scenarios. Error handling was likewise left for later work, so if faulty input was entered the whole thing crashed.
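For contrast, here is a minimal sketch of the separation we skipped: the lookup logic lives in a plain function that can be unit tested, and a thin request handler validates input before calling it. This is not our codefest code. The site data, function names, and response shape are hypothetical, and the distance calculation is a rough approximation that only makes sense within one city.

```python
import math

# Hypothetical sample data: (site name, latitude, longitude).
SITES = [
    ("Site A", 40.44, -79.99),
    ("Site B", 40.46, -79.92),
    ("Site C", 40.40, -80.02),
]


def nearest_sites(sites, lat, lon, limit=2):
    """Pure logic: no request objects and no HTML, so it is easy to test."""
    def distance(site):
        _, s_lat, s_lon = site
        return math.hypot(s_lat - lat, s_lon - lon)  # rough, city-scale only
    return [name for name, _, _ in sorted(sites, key=distance)[:limit]]


def parse_coordinate(text, low, high):
    """Validate user input instead of letting a bad value crash the app."""
    try:
        value = float(text)
    except (TypeError, ValueError):
        raise ValueError(f"not a number: {text!r}") from None
    if not low <= value <= high:
        raise ValueError(f"out of range: {value}")
    return value


def handle_request(params):
    """Thin 'UI' layer: parse input, call the logic, shape the response."""
    try:
        lat = parse_coordinate(params.get("lat"), -90, 90)
        lon = parse_coordinate(params.get("lon"), -180, 180)
    except ValueError as exc:
        return {"status": 400, "error": str(exc)}
    return {"status": 200, "sites": nearest_sites(SITES, lat, lon)}
```

With this split, a change to the page layout cannot break the lookup, and faulty input produces an error response rather than a crash.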

It took a long time simply to do infrastructure setup: what GitHub repository, what web host, what database, how do we communicate together. Part of it was that this was only my second time building a Node application, so I was unfamiliar with its internal expectations and capabilities. Things like “don’t send headers twice” caused problems for me that a more experienced developer would not have had. In a 24h period this stuff needs to be like riding a bike, so deciding on a framework that I had little experience in was costly. It’s like going to a marathon without having trained at all.

We were three people: two coders and a designer/QA person. Three was the minimum, and it really wasn’t enough. There were tasks like entering data into the database (the Citiparks staff provided Excel spreadsheets) that took me a few hours but had zero payoff. In a codefest, data quality is not a factor in the judging (the judges don’t come from the clients). In an enterprise situation, the data is probably as important as anything else, but here it was wasted effort, and sample data would have worked fine.

We had somewhat of an idea how things would work, but wireframing it beforehand, and being much clearer about what steps were necessary, would have been better (you could not write code before, but this sort of sketching was allowed). A simple design plan and backlog would have been easier to work off, and help to resist the temptation to simply start hacking away. A number of times I would push back from the table, and say to myself “do I even need to do this?”

Writing code this way is a great way to learn these lessons. I have a number of academic publications about finding requirements, for example, but it is only when you do it yourself that you realize how much is lost between the quick IM conversations you have with teammates and the actual issue tracker. I do wonder, however, if these 24h codefests promote ‘code first’ over the value of design. For example, my sense is that a lot of what we did simply wouldn’t work in an enterprise environment: there are design guidelines, authentication, security, data integration, lifecycle maintenance concerns, none of which you have the luxury to spend much time with. The cool thing about the Steel City event, however, is that the organizers do make a series of $10k grants available, in order to take the app to a more integrated and polished version.

It was a great event – very well organized, with great food and volunteers. And the Citiparks staff were amazing, sending their director and deputy director to do user testing at 7pm Saturday, and bringing amazing treats for us twice during the event. It also focused on an underserved area, in my view: social justice and not-for-profits. Many have quite simple needs, that in many cases amount to adding data to a Google Map, but even that is beyond their budgets.

Frameworks, libraries, and dependencies

I’ve been doing a little thinking about frameworks lately. They fascinate me as 1) a realization of the vision of ‘pluggable software’ and reusable components desired since probably 1968; 2) what you are getting into when you rely on one. This is prompted by this great post on libraries vs frameworks.

Now, we’ve used libraries for ages, viz. glibc etc. And the notion of ‘code that someone else wrote and maintains that I need’ was likely established in the design of Unix and pipe and filter architectures. But it really seems like the past 10 years have seen this wonderful explosion of creativity in writing ‘little libraries’ for various different systems.

I’ll take a common example. I’ve previously used Node.js for a small visualization I did for my brother’s work on genetics (in progress!). Although an academic, I like to try to stay on top of things, so I tried out Node, the server-side JavaScript runtime. Now JS itself has 60+ frameworks and libraries, and that list doesn’t even include Node or some of the ones I’ll describe below. This is amazing considering that although JS has been around a long time, this explosion has happened only recently (would we say jQuery is the prototypical case?).

The trouble is that like the Cambrian explosion, some of these libraries and frameworks are doomed to extinction. If you are BigCo, that makes choosing one very tricky, in addition to the licensing and security questions you will need to ask.

Consider. I wrote the application for the Node server, using Express as a web framework (that means it automates some of the routing and layout of files and directories for you). To get to the database I used the node-postgres library. To do UI I relied on jQuery UI and Stylus for CSS, with Jade for templating. Then I used Morgan for logging, Gulp to automate the style generation from the Stylus files, and was toying with D3 to do the display. Not to mention I need a Platform as a Service from Heroku, so I have their command line tools installed as well.

So that gives about 10 different libraries to run this app. On the plus side, they automate a ton of code I no longer have to worry about, letting me focus on the key value-add of the app (realized in the SQL code I write and custom request handling code).

But I just upgraded to Express 4, and they’ve broken backward compatibility, so I must now understand what the changes mean and how to retrofit them. Who maintains these libraries? Will they keep updating them? These are by no means new questions, but I think what has changed is that now it is very hard to avoid using them. And once you commit to one, re-architecting for the problems you will inevitably face with leaky abstractions seems challenging, because everything is deeply connected. You cannot just drop in a new back end server with the same libraries.
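One small defence is to flag upgrades that are likely to break things before applying them. The sketch below assumes the libraries follow semantic versioning, where a change in the first number signals an incompatible release; not every project does. The package names and version numbers are made up for the example.

```python
def parse_version(text):
    """Split a 'major.minor.patch' string into a tuple of integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def is_breaking_upgrade(installed, candidate):
    """Under semantic versioning, a higher major number may break callers."""
    return parse_version(candidate)[0] > parse_version(installed)[0]


def risky_upgrades(installed, available):
    """Names of packages whose newest available release bumps the major version."""
    return sorted(
        name
        for name, version in installed.items()
        if name in available and is_breaking_upgrade(version, available[name])
    )


if __name__ == "__main__":
    # Hypothetical dependency lists, not real package versions.
    installed = {"web-framework": "3.4.8", "logger": "1.0.1", "css-tool": "0.9.2"}
    available = {"web-framework": "4.0.0", "logger": "1.1.0", "css-tool": "0.9.3"}
    print(risky_upgrades(installed, available))
```

A check like this says only that an upgrade deserves a closer look. It says nothing about whether the maintainer will still be around next year.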

Now imagine that multiplied times 10 years and instead of my simple app, a mission critical information system, and you start to get a sense of the problem that legacy applications can pose. Fortunately, I work at a place with lots of experience solving those problems, so give us a call if you need help!

The Gap Between User Requirements and Software Capabilities as Technical Debt

One of my favorite graphics is from Al Davis, in 1988. Aside: it is depressing how often we re-invent the wheel in this business.

Al Davis requirements growth

The nice thing is how one can map various software development concepts to parts of the diagram. I actually think there is another thing you can grab there. Well, two things. One, the environment is not captured in this picture, but only user needs and the specification. In most cases (maybe this is what wasn’t clear in 1988) the user requirements are constrained by the environment, which is itself changing. This is part of our re-definition of the requirements problem of Zave and Jackson.

Two, I think you can use this to show how the rate of growth in the gap between needs and system (what Davis calls “inappropriateness”, the shaded area) is also an issue. I think this captures the technical debt problem more succinctly. You will see the gap grow if, for example, you chose a technology solution that constrains your choice of web browser (e.g., ActiveX controls mandating IE8). That forces your red line (development/specification/software) to grow more slowly. Now the question becomes, at what point do you refactor/reengineer so that the rate of adaptability (the slope) increases again?
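The argument about slopes can be written down compactly. The symbols below are my own shorthand, not Davis's notation: N(t) stands for user needs at time t and S(t) for what the system delivers.

```latex
% Gap between needs and system at time t
G(t) = N(t) - S(t)

% Rate at which the gap grows: the difference between the two slopes
\frac{dG}{dt} = \frac{dN}{dt} - \frac{dS}{dt}

% The shaded area (accumulated "inappropriateness") over a period [t_0, t_1]
A = \int_{t_0}^{t_1} G(t)\, dt

% A constraining technology choice lowers dS/dt, so the gap widens whenever
\frac{dS}{dt} < \frac{dN}{dt}
```

Read this way, refactoring is an investment that raises dS/dt. It pays off when the area saved afterwards exceeds the area lost while the system stands still during the rework.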

(I don’t actually know where I got this — maybe Steve Easterbrook, he likes Comic Sans a LOT — or the original source for this but maybe here.)