Running a “Critical Research Review” at #RE15

Today we conducted our first attempt at “Critical Research Reviews” (CRR) at our workshop on empirical requirements engineering (EmpiRE) at the 2015 Requirements Engineering Conference.

I first learned of CRR from Mark Guzdial’s post on the same exercise at ICER last year, where it was run by Colleen Lewis. The idea (as I understand it) is to have researchers present work in progress, ideally at the research design stage. The purpose of the workshop is to “leverage smart people for an hour” to improve and stress-test your research idea and methodology.

The cool part about doing this at EmpiRE is that our proposers got to leverage some of the leading empirical researchers in the RE community. These are the people likely to be reviewing your full paper later, so it makes sense to get their critique up front.

We had three accepted “research proposal” papers as a special category in the workshop call. In the afternoon (2pm-5.30pm), each of the three presenters gave a 15-minute plenary presentation so that everyone in the workshop (25 or so people) was aware of the work. I held questions until later, so this part was over in about 45 minutes. After a coffee break, I introduced the CRR concept, some ground rules, and a list of potential questions to consider. Then, for the next 45 minutes or so, the participants were invited to join the presenter whose work interested them and have a (polite) discussion about the proposed research.

Finally, I had asked each group to bring some wide-ranging thoughts back to the whole workshop for the last 30 minutes. My intent here was not to go into specifics on the proposals, but rather to draw out lessons that might be useful for people who were not part of that particular group. This worked pretty well; it did tend to go into more detail than perhaps warranted, but it stimulated some interesting discussion.

From what I heard, people quite enjoyed this approach to research evaluation. It’s much more fun trying to poke holes in research approaches when the author on the other end can rebut your arguments!  Look for another edition next year.

You can find my slides introducing the idea here, and our proceedings, with the presenters’ research proposals, will be posted whenever IEEE gets around to it.

Lessons learned

  • The room was terrible: one large central conference table. One group retreated to the coffee room, which had large circular tables.
  • No one used the flip charts: I think the presenters were writing their own notes down on their laptops anyway.
  • We mostly had established researchers presenting. In the future we may restrict this to early-career researchers or PhD students, who likely need the assistance more. But I think the more senior researchers still benefited; the primary difference is that they will already have considered more of the potential threats.
  • I was the main facilitator: having one group a two-minute walk away made this harder. No group really needed help, but I can certainly see situations where that would be an issue: too many people going to one presenter, one person dominating the discussion, or too much negativity (the usual group dynamics, in other words).


A Field Study of Technical Debt

Over on my employer’s blog, I’ve written up our survey results on technical debt.


Thoughts from a CodeFest

This past weekend was the Steel City Codefest. The idea is that community non-profits present a problem that an “app” could help with, and coders spend 24 hours coming up with a solution. It was a lot of fun. You can see our team’s solution at http://citipark.herokuapp.com. Our challenge was to create an easier way for people to find the City of Pittsburgh’s GrubUp food program, which offers free lunch and breakfast at 80+ sites around the city in the summer (sadly, a lot of Pittsburgh youth are food-insecure).

Me and my teammate Monica starting off Saturday morning.

We didn’t win the challenge, but I learned a lot on the way.

We created tons of technical debt: code clones, code comments, no testing, no design. It was code as fast as possible, get it working, fix the obvious user-facing bugs. We shipped. But even during that 24-hour span the lack of design caught up with us: it became harder to change things because logic and UI were tangled together. Even something as trivial as renaming a media folder became a massive headache. We had no tests, so any change had to be “tested” by running the app and walking through a few scenarios. Error handling was likewise left for later, so faulty input crashed the whole thing.
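
To make the error-handling point concrete: in Express (the framework we used), a couple of lines of input validation plus an error-handling middleware would have turned bad input into an error response instead of a crash. Here is a minimal sketch of that idea; the /sites/:id route is hypothetical, not our actual codefest code.

    // Minimal sketch of the error handling we skipped (hypothetical route,
    // not our codefest code).
    var express = require('express');
    var app = express();

    app.get('/sites/:id', function (req, res) {
      var id = parseInt(req.params.id, 10);
      if (isNaN(id)) {
        // Faulty input becomes a 400 response instead of an uncaught exception.
        return res.status(400).send('Site id must be a number');
      }
      // ... look up the site and render it ...
      res.send('site ' + id);
    });

    // Four arguments mark this as Express's error handler, so thrown or
    // forwarded errors end up here rather than crashing the process.
    app.use(function (err, req, res, next) {
      console.error(err.stack);
      res.status(500).send('Something went wrong');
    });

    app.listen(process.env.PORT || 3000);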

It took a long time simply to do infrastructure setup: which GitHub repository, which web host, which database, how we would communicate with each other. Part of the problem was that this was only my second time building a Node application, so I was unfamiliar with its internal expectations and capabilities. Things like “don’t send headers twice” caused problems for me that a more experienced developer would not have had. In a 24-hour period this stuff needs to be like riding a bike, so choosing a framework I had little experience with was costly. It’s like running a marathon without having trained at all.
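
The “don’t send headers twice” problem, for instance, usually comes from forgetting to return after sending a response, so the handler falls through and responds a second time. A minimal sketch of the mistake (hypothetical route, not our actual code):

    var express = require('express');
    var app = express();

    app.get('/lookup', function (req, res) {
      if (!req.query.zip) {
        res.send('Please supply a zip code');
        // Missing "return" here: execution continues and we respond a second
        // time, which throws "Error: Can't set headers after they are sent."
      }
      res.send('Results for ' + req.query.zip);
    });

    app.listen(process.env.PORT || 3000);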

We were three people: two coders and a designer/QA person. Three was the minimum, and it really wasn’t enough. There were tasks, like entering data into the database (the Citiparks staff provided Excel spreadsheets), that took me a few hours but had zero payoff. In a codefest, data quality is not a factor in the judging (the judges don’t come from the client organizations). In an enterprise situation the data is probably as important as anything else, but here it was wasted effort; sample data would have worked fine.

We had a rough idea of how things would work, but wireframing it beforehand, and being much clearer about what steps were necessary, would have been better (writing code beforehand was not allowed, but this sort of sketching was). A simple design plan and backlog would have been easier to work from, and would have helped resist the temptation to simply start hacking away. A number of times I would push back from the table and ask myself, “do I even need to do this?”

Writing code this way is a great way to learn these lessons. I have a number of academic publications about finding requirements, for example, but it is only when you do it yourself that you realize how much is lost between the quick IM conversations you have with teammates and the actual issue tracker. I do wonder, however, whether these 24-hour codefests promote ‘code first’ over the value of design. For example, my sense is that a lot of what we did simply wouldn’t work in an enterprise environment: there are design guidelines, authentication, security, data integration, and lifecycle maintenance concerns, none of which you have the luxury of spending much time on. The cool thing about the Steel City event, however, is that the organizers make a series of $10k grants available to take an app to a more integrated and polished state.

It was a great event: very well organized, with great food and volunteers. And the Citiparks staff were amazing, sending their director and deputy director to do user testing at 7pm on Saturday, and bringing us amazing treats twice during the event. It also focused on what I see as an underserved area: social justice and not-for-profits. Many have quite simple needs, which often amount to adding data to a Google Map, but even that is beyond their budgets.


Frameworks, libraries, and dependencies

I’ve been doing a little thinking about frameworks lately. They fascinate me for two reasons: 1) they are a realization of the vision of ‘pluggable software’ and reusable components that we have pursued since probably 1968; and 2) it is worth understanding what you are getting into when you rely on one. This was prompted by this great post on libraries vs frameworks.

Now, we have used libraries for ages (glibc, for example), and the notion of ‘code that someone else wrote and maintains that I need’ was likely established in the design of Unix and pipe-and-filter architectures. But it really seems like the past 10 years have seen a wonderful explosion of creativity in writing ‘little libraries’ for all kinds of systems.

I’ll take a common example. I’ve previously used Node.js for a small visualization I did for my brother’s work on genetics (in progress!). Although an academic, I like to try to stay on top of things, so I tried out Node, the server-side JavaScript runtime. Now JS itself has at least 60+ frameworks and libraries, and that list doesn’t even include Node or some of the ones I’ll describe below. This is amazing considering that, although JS has been around a long time, this explosion has only happened recently (would we say jQuery is the prototypical case?).

The trouble is that, as in the Cambrian explosion, some of these libraries and frameworks are doomed to extinction. If you are BigCo, that makes choosing one very tricky, in addition to the licensing and security questions you will need to ask.

Consider my app. I wrote the application for the Node server using Express as a web framework (meaning it automates some of the routing and the layout of files and directories for you). To get to the database I used the node-postgres library. For the UI I relied on jQuery UI, with Stylus for CSS and Jade for templating. Then I used Morgan for logging, Gulp to automate the style generation from the Stylus files, and was toying with D3 for the display. Not to mention that I needed a Platform as a Service from Heroku, so I have their command-line tools installed as well.
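
To give a sense of how those pieces fit together, here is a rough sketch of the wiring; it is not the actual app code, and names like the sites table and the views are illustrative:

    var express = require('express');
    var morgan  = require('morgan');   // request logging
    var stylus  = require('stylus');   // compiles .styl files to CSS
    var pg      = require('pg');       // node-postgres

    var app = express();
    app.set('view engine', 'jade');                      // Jade templates in ./views
    app.use(morgan('dev'));
    app.use(stylus.middleware(__dirname + '/public'));   // serve compiled CSS
    app.use(express.static(__dirname + '/public'));      // jQuery UI, D3, etc.

    var client = new pg.Client(process.env.DATABASE_URL); // Heroku-style config
    client.connect();

    app.get('/sites', function (req, res) {
      // The value-add: our own SQL and request-handling code.
      client.query('SELECT name, address FROM sites', function (err, result) {
        if (err) return res.status(500).send(err.message);
        res.render('sites', { sites: result.rows });
      });
    });

    app.listen(process.env.PORT || 3000);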

So that gives roughly ten different libraries and tools just to run this app. On the plus side, they automate a ton of code I no longer have to worry about, letting me focus on the key value-add of the app (realized in the SQL and custom request-handling code I write).

But I just upgraded to Express 4, and it broke backward compatibility, so I must now understand what the changes mean and how to retrofit them. Who maintains these libraries? Will they keep updating them? These are by no means new questions, but I think what has changed is that it is now very hard to avoid using such libraries. And once you commit to one, re-architecting around the problems you will inevitably face with leaky abstractions is challenging, because everything is deeply connected. You cannot just drop in a new back-end server with the same libraries.
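
For example, the Express 3 to 4 change pulled most of the bundled middleware out into separate packages and removed app.router; here is a sketch of the kind of retrofit that implies, based on the published migration notes rather than my exact diff:

    // Express 3: most middleware shipped with the framework.
    //   app.use(express.logger('dev'));
    //   app.use(express.bodyParser());
    //   app.use(app.router);   // explicit router mounting

    // Express 4: bundled middleware (except express.static) is gone, so each
    // piece becomes its own dependency, and app.router no longer exists.
    var express    = require('express');
    var morgan     = require('morgan');
    var bodyParser = require('body-parser');

    var app = express();
    app.use(morgan('dev'));
    app.use(bodyParser.urlencoded({ extended: false }));
    app.get('/', function (req, res) { res.send('ok'); });  // routes used directly
    app.listen(process.env.PORT || 3000);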

Now imagine that multiplied by ten years, and instead of my simple app, a mission-critical information system, and you start to get a sense of the problem that legacy applications can pose. Fortunately, I work at a place with lots of experience solving those problems, so give us a call if you need help!


The Gap Between User Requirements and Software Capabilities as Technical Debt

One of my favorite graphics is from Al Davis, in 1988. (Aside: it is depressing how often we re-invent the wheel in this business.)

Al Davis requirements growth

The nice thing is how one can map various software development concepts onto parts of the diagram. I think there are two more things you can take from it. One: the picture captures only user needs and the specification, not the environment. In most cases (maybe this is what wasn’t clear in 1988) the user requirements are constrained by the environment, which is itself changing. This is part of our re-definition of Zave and Jackson’s requirements problem.

Two: I think you can use this to show that the rate of growth of the gap between needs and system (what Davis calls “inappropriateness”, the shaded area) is also an issue. I think this captures the technical debt problem more succinctly. You will see the gap grow if, for example, you chose a technology that constrains your choice of web browser (e.g., ActiveX controls mandating IE8). That forces your red line (development/specification/software) to grow more slowly. Now the question becomes: at what point do you refactor or re-engineer so that the rate of adaptability (the slope) increases again?
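
One way to state this more precisely (my own formalization, not Davis’s notation): let N(t) be user needs over time and S(t) be what the implemented system provides. Then:

    % My formalization of the diagram, not Davis's notation.
    % N(t): user needs over time; S(t): what the implemented system provides.
    \[
      G(t) = N(t) - S(t) \quad \text{(the shaded ``inappropriateness'')}, \qquad
      \frac{dG}{dt} = \frac{dN}{dt} - \frac{dS}{dt}.
    \]
    % A constraining technology choice lowers dS/dt, so the gap G grows even
    % while the system is still being extended; refactoring pays off when the
    % later increase in dS/dt outweighs the pause in which dS/dt is near zero.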

(I don’t actually remember where I got this version of the figure; maybe from Steve Easterbrook, who likes Comic Sans a LOT. Nor do I know the original source, but it may be here.)


Measuring programmer productivity is futile.

(I’ve typically posted long-form entries but so infrequently … )

The arguments and debates about 10x productivity among “programmers” rage on (this time to defend or reject H-1B visas). This debate is doomed never to be concluded. I think the reason why is nicely captured in Andrew Gelman’s post on p-values: they work best when noise is low and signal is high, which can never be the case when we talk about productivity. As he says,

If we can’t trust p-values, does experimental science involving human variation just have to start over?

Given a random sample of (let’s say) Microsoft software developers, can you devise a test that would show statistically meaningful differences? Are you convinced you would have high power? A big effect size? One person online (via Hacker News) says it is about tool competence. But the recent LaTeX/Word study leaves me doubting even that conclusion (although I have trouble with that study too, which just reinforces my overall point).
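
To put rough numbers on that (my back-of-envelope arithmetic, not from Gelman’s post): the usual rule of thumb for a two-sided, two-sample comparison at 80% power and alpha = 0.05 is

    % Lehr's rule of thumb for sample size per group (80% power, alpha = 0.05):
    \[
      n \approx \frac{16}{d^2}, \qquad d = \frac{\mu_1 - \mu_2}{\sigma}.
    \]
    % With a noisy productivity measure, a plausible effect of d = 0.3 needs
    % roughly 16 / 0.09, i.e. about 178 developers per group; only an
    % implausibly clean effect of d = 1 gets you down to about 16 per group.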

More importantly, I think this calls into question almost any controlled experiment in software engineering. Short of replicated results, I’m skeptical the information content is very high. Instead, I would like more qualitative research. Why do people say there is this difference? What traits are important? Can they be taught? How do we share productivity improvements? These questions seem much more important than trying to attach a p-value to whether one group is better than another.


Cults of Personality and Software Process Improvement

I’m a fan of the Cynefin framework. I find it a great tool for understanding what type of problem you are trying to solve. The notion of complex/complicated/simple domains is quite helpful. You could do worse than to read Dave Snowden’s blog, where he explores each of the domains, most often in the context of software projects.

Recently Mr Snowden has been critiquing the Scaled Agile Framework (SAFe) put together by Dean Leffingwell. This attack on SAFe is not unprecedented. It’s hard to take attacks like this too seriously when their proponents put forth no data, merely theory.

One of the most difficult parts of doing research in software engineering is its inherently uncontrollable, one-off nature. Sure, in some cases (websites for restaurants, for example) we see repeatability. But for the most interesting projects, the ones SAFe is applied to, the complex or perhaps complicated ones, there is no repeatability (by definition). This makes it impossible to say with any degree of accuracy what factors contribute to the success or failure of a project. [1]

In particular, when you have strong, intelligent, experienced consultants like Mr Snowden, Mr Leffingwell, or various other graybeards, I don’t think you can control for the ‘personality’ factor. That is, what portion of the success of the initiative (say, applying SenseMaker or SAFe or Scrumban or what have you) is due to the tool, process improvement, or methodology, and what portion is due to the smart-person effect of the consultant? This is made more difficult when that consultant has a very strong economic incentive to point to the methodology as the deciding factor, since their business is inextricably tied to that methodology. Furthermore, the mere fact that a company has reached out for help indicates some level of self-awareness.

My feeling is that, given a successful team led by an enlightened manager, it wouldn’t matter what methodology they used (which I mentioned previously in the context of tools). And there is some evidence to support this: Capers Jones suggests RUP and TSP deliver higher quality than Scrum or other approaches. Now, that is just one dataset, but it is exactly one more than Mr Snowden has produced, as far as I can tell (the plural of anecdote is not data).

Does all this mean it doesn’t matter whether we choose RUP, SAFe, Scrum, Kanban, Six Sigma, or SenseMaker? To some extent, I think that is true. I would guess that your measurable outcomes after implementing TSP would be similar to your outcomes after implementing SAFe. But the point is, one cannot measure these things in isolation! You will never know (Heisenberg-like) whether something else would have been better. The local project context is so important that the principles matter more than the specific practices (e.g., the Agile Manifesto, the Cynefin domains, good organizational practices).


  1. With one exception I am aware of: this paper from Simula in Norway, where they paid four different companies to develop systems from the same set of requirements in order to understand the maintainability characteristics of different approaches. But even there, the results are difficult to generalize. Anda, B.C.D., Sjøberg, D.I.K., and Mockus, A., “Variability and Reproducibility in Software Engineering: A Study of Four Companies that Developed the Same System,” IEEE Transactions on Software Engineering, vol. 35, no. 3, pp. 407-429, May-June 2009. doi: 10.1109/TSE.2008.89
