Impact factors in SE
I confess to not understanding the impact factor in software research. The impact factor for a given year (as used by ISI) measures the mean number of times articles in the preceding two years were cited that year. So an IF of 5 for 2010 means on average, each article in 2008 and 2009 was cited 5 times in 2010.
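As a minimal sketch of that definition: the function name and the article and citation counts below are made up for illustration, and are not ISI data.

```python
def impact_factor(citations_this_year, articles_published):
    """Mean citations received this year per article from the two preceding years.

    citations_this_year: citations received in the target year (e.g. 2010) by
        articles published in the two preceding years (2008 and 2009).
    articles_published: number of articles published in those two years.
    """
    if articles_published <= 0:
        raise ValueError("need at least one published article")
    return citations_this_year / articles_published


# Hypothetical journal: 40 articles in 2008 and 60 in 2009,
# which together were cited 500 times during 2010.
if_2010 = impact_factor(citations_this_year=500, articles_published=40 + 60)
print(if_2010)  # 5.0
```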
In SE, the impact factors of many of the journals I consider the most prestigious have declined over the past three years [1][2]. Consider:
| Journal | 2010 | 2009 | 2008 |
| ToSE | 2.22 | 3.75 | 3.57 |
| CACM | 2.35 | 2.35 | 2.65 |
| REJ | 0.86 | 0.93 | 1.63 |
| ToSEM | 1.69 | 2.03 | 3.96 |
| IEEE SW | 1.51 | 2.04 | 2.10 |
What exactly do I conclude from this decline? Let’s keep in mind that IF is taking the mean of a non-normal distribution, and that the first destination for new research results is almost certainly not one of these journals. Rather, most of the journals tend to publish longer versions of conference papers, invited issues on special topics, and so on. Here are some stats from my last publication:
Number of citations overall: 31
Number of conference papers cited: 13
Number of journal papers cited: 15
Number of workshop papers cited: 0 (other: 1 thesis, 1 book, 1 ArXiv paper)
Ratio of venues for recent work [3]: 4/5 = 0.8
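To make the earlier point about taking the mean of a non-normal distribution concrete, here is a small sketch. The citation counts are invented (ten articles, one of them heavily cited) and only illustrate how a single paper can dominate the mean.

```python
from statistics import mean, median

# Hypothetical citation counts for ten articles in one journal:
# most are rarely cited, one is cited heavily.
citations = [0, 0, 1, 1, 1, 2, 2, 3, 4, 36]

mean_all = mean(citations)      # the IF-style figure: 50 citations / 10 articles
median_all = median(citations)  # what a "typical" article in this list sees

# Drop the single most-cited article and recompute the mean.
mean_without_top = mean(sorted(citations)[:-1])

print(mean_all, median_all, round(mean_without_top, 2))
```

With these made-up numbers the mean is several times the median, so a year-to-year change in the mean may say more about one or two papers than about the journal as a whole.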
A decline in IF means that the published papers are seeing fewer people refer to them. Then the question becomes, is that because fewer people are doing SE research, or that fewer people are citing journal papers? I think there are a few observations:
- It is difficult to consider impact in a three-year span. That assumes that your journal paper influenced people in the next two years. I’m not sure how many of these journals are read by people immediately. For my work, journal papers tend to be those which have been shown to hold up over time, i.e. > 5 years. If I want to keep track of current research I cite papers from conferences. This is particularly true if I look at papers only in this area (since other fields tend to favour journals). My 0.8 figure means that of the citations for that paper, I cited five conference papers from 2008-2011 vs. four journal papers (all of which were in areas other than RE). Consider this paper by my colleague Sotirios. It was originally published in September 2010 as a conference paper. It has just now come out (September 2011) in a special issue of REJ. But I doubt I will update my citation strategy to cite his journal paper rather than the conference paper.
- Survey journals (e.g. ACM Computing Surveys) and musings on the future of the field should be excluded. By definition these are broad overview papers, which means they tend to pop up in the introductory sections of plenty of papers. But they do not represent ‘research’ as such.
- The other factor is diffusion. It may be that the frustratingly long review times and perceived low impact of these journals have contributed to a vicious cycle. It is natural in an emerging science like software research for the community to seek out other venues and create new journals and conferences. Consider requirements engineering. This is a very diverse field which touches on economics, artificial intelligence, human-computer interaction, sociology and many others. It seems silly to think that a single venue could appeal to all researchers. Many of the papers in CACM, for example, are of very little interest to me – probably less interesting than the papers published in Nature. They are simply too technical and too different from my chosen community. I do not know the numbers, but I suspect that smaller venues like MSR and CHASE, which I perceive as having garnered a lot of momentum, have seen increases in impact.
- And the elephant in the room? The fact that none of these venues makes it easy for people to a) find the papers and b) actually read them. For example, for the longest time just getting a list of recent papers was impossible for any IEEE journal. I subscribed to an RSS feed that always had an error when you tried to link through to the paper. It just isn’t acceptable that publicly-funded research is this difficult to access. Impact factor in the large should be about more than just influence on other researchers. We should consider blog posts, software tools, and especially data repositories.
[1] http://www.cse.chalmers.se/~feldt/advice/isi_listed_se_journals.html
[2] Diomidis Spinellis’s blog post on IF
[3] Number of citations in the past 3 years from journals over the number from conferences.
Writing Complex Latex Documents with Scrivener 2.1 and MultiMarkDown 3
I have another post that discusses my approach to writing my thesis using Scrivener. It’s out of date now because I transitioned to MultiMarkdown 3 (MMD3).
The Latex support in MMD3 is much simpler than the previous version. Instead of complicated XSLT transforms from the HTML formatted Markdown output, the new approach is to transition from the Markdown directly to Latex. It makes customizing the output much simpler – no more editing XSLT files. In the following, I assume you have a Scrivener document that contains the body of your work (e.g., Introduction, Related Work, Observations, Conclusions). Here’s how to get started:
1. Scrivener still ships with MMD2, so you will need to install MMD3 on your Mac. Fortunately this is straightforward. Go to the download page and download MultiMarkdown-Mac-3.0.1.pkg.zip and MultiMarkdown-Support-Mac-3.0.1.pkg.zip (as of July 2011). The support files will seamlessly integrate MMD3 with Scrivener. I’m not sure how easy it is to revert, however, so be careful (update: pretty easy – see this note). I do think it is ultimately easier to work with MMD3.
2. Now we need to add custom metadata to Scrivener to add the ancillary files for MMD3. I’ve found the easiest approach to be adding a Meta-Data text document as the first document in your Scrivener project (right-click the top folder in the Binder, Add-> New Text). Now we will tell MMD3 where to find the extra files for our Latex output. Here’s what mine looks like:
Base Header Level: 2
Bibtex: IEEEabrv,../../bibtex/thesis-new
Latex footer: ut-thesis-end
Bibliostyle: plainnat-nourl
Title: My Big Thesis
Author: Neil Alexander Ernst
Latex input: ut-thesis-begin
Order matters here. See Fletcher’s guide on metadata in MMD3.
Base header level is telling MMD3 that we want to create Chapters for each first-level Scrivener folder (I think Base level 1 is “Part”). Bibtex is the location of the bibtex files, relative to where we will run the “latex” or “pdflatex” commands. Latex footer will be inserted at the end of the last piece of your Scrivener file using the Latex command \input{}. There is also Latex header, but as we will see that doesn’t work well. The next command, Bibliostyle, will define the bibliography style for use with Bibtex. Title and Author are obvious, and I finish with Latex input. This is the beginning Latex of my thesis document, including packages, newly defined commands, etc. Now, because MMD3 will turn the metadata entries into variables in Latex, it is important that the input come after the definition of the title and author (otherwise there is an error). This is also why I avoid the use of the Latex header metadata.
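Why the order matters can be sketched with a toy model. This is not MMD3's code: the function name is hypothetical, and the macro names \mytitle and \myauthor are my assumption about how the metadata ends up as Latex variables. The only point is that lines are emitted in the order the metadata is listed, so the \input has to come last if the input file uses the title and author.

```python
def preamble_lines(metadata):
    # Toy model, NOT MMD3's implementation: emit one Latex line per
    # metadata entry, in the order the entries are listed.
    # ASSUMPTION: Title/Author become \def\mytitle / \def\myauthor and
    # "Latex input" becomes \input{...}; real MMD3 output may differ.
    lines = []
    for key, value in metadata:
        k = key.lower()
        if k == "latex input":
            lines.append(r"\input{%s}" % value)
        elif k in ("title", "author"):
            lines.append(r"\def\my%s{%s}" % (k, value))
    return lines


# Same relative order as the metadata block above: input comes last.
thesis_metadata = [
    ("Title", "My Big Thesis"),
    ("Author", "Neil Alexander Ernst"),
    ("Latex input", "ut-thesis-begin"),
]
for line in preamble_lines(thesis_metadata):
    print(line)
```

If the "Latex input" entry were listed first, the \input line would precede the two definitions, and an input file that refers to the title or author would hit undefined macros.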
Now it is up to you to define what document class etc. to use for your document: MMD3, nicely, will just stick whatever is in input/footer/header into the appropriate place in the Latex file. There are some nice pre-defined input sections Fletcher Penney created, that you can download as well on the Github site.
Now in Scrivener, compile your document using the File->Compile.. and setting “Compile For…” at the bottom to “Multimarkdown->Latex”. Note that it is important to disable conversion of two hyphens to an en-dash, otherwise HTML comments don’t work, and you cannot escape the Latex properly.
Note: to escape Latex, surround the Latex (e.g., tables, math) with HTML comments (<!-- -->). The most useful MMD features, for me, are lists, which are just numbers or bullets (see the Markdown syntax guide). Much simpler than the cumbersome \begin{itemize} syntax.
To use citations with MMD3, you can use the [#citename;] or [#citename] syntax for \citet{} or \citep{} Natbib commands, respectively. You can also do MMD footnotes with [^foot1] and [^foot1]:Footnote text. I haven’t used any other advanced features of Markdown. The number one wish I have is for easy \ref \label syntax, but I don’t know how to do it (edit: see the helpful post here). MMD3 automatically creates a \label{} after each section heading, based on the section name with no spaces.
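The citation mapping described above can be illustrated with a few lines of Python. This is only a sketch of the mapping, not how MMD3 itself parses citations, and the citation key smith2010 is made up.

```python
import re

# [#key;] -> \citet{key} and [#key] -> \citep{key}, as described above.
CITE = re.compile(r"\[#([^\];]+)(;?)\]")


def mmd_cite_to_natbib(text):
    """Rewrite MMD-style citation markers as Natbib commands (illustration only)."""
    def repl(match):
        key, semicolon = match.group(1), match.group(2)
        command = "citet" if semicolon else "citep"
        return "\\%s{%s}" % (command, key)
    return CITE.sub(repl, text)


# 'smith2010' is a hypothetical Bibtex key.
print(mmd_cite_to_natbib("As [#smith2010;] argue, this holds [#smith2010]."))
```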
I don’t mind writing raw Latex: emacs+Auctex+refTex makes it pretty painless. But I found, for my 60,000+ word document, that it was much easier to do revision and editing in Scrivener: moving sections around, for example. It also uses Mac native spell-check which is pretty nice (Emacs’s spelling I find clunky and slow).
My files for reference:
Forcing AucTex to properly show error messages
From the very-detailed-information dept:
I use Auctex with Emacs.app on my Mac to write papers. It’s the best editor I’ve come across for Latex support.
I was having trouble with one feature, however. If your source file contains an error (e.g., an unescaped underscore), the engine via Auctex will complain when you try to compile. In my case, I always use pdflatex to generate PDF output.
At that point you will get "Latex errors in `Document.tex output'. Use C-c ` to display". If you type C-c ` (TeX-next-error), the idea is that Emacs splits into two windows, with one window showing the error message, and the other showing the location of the problem in the source file. For some reason this wasn’t working on my machine, and I kept getting this strange blank window.
After some digging, I’ve determined the problem to be that pdflatex does not report errors using file-name, only line number. This means Auctex cannot determine the proper source file to open.
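To see why the file name matters, here is a rough sketch of the two message styles. It is not Auctex's actual parser, and the sample lines are illustrative rather than copied from a real log: the classic style carries only a line number, while the file:line:error style also names the source file.

```python
import re

# file:line:error style, e.g. "./Document.tex:12: Missing $ inserted."
FILE_LINE_ERROR = re.compile(r"^(?P<file>[^:\n]+):(?P<line>\d+): (?P<msg>.+)$")
# Classic TeX context line, e.g. "l.12 some_text"
CLASSIC_LINE = re.compile(r"^l\.(?P<line>\d+) ")


def locate_error(log_line):
    """Return (file, line) for a log line; file is None when the log does not name it."""
    m = FILE_LINE_ERROR.match(log_line)
    if m:
        return m.group("file"), int(m.group("line"))
    m = CLASSIC_LINE.match(log_line)
    if m:
        return None, int(m.group("line"))
    return None


print(locate_error("./Document.tex:12: Missing $ inserted."))
print(locate_error("l.12 some_text"))
```

In the second case an editor knows the line but has to guess which file it belongs to, which is the situation described above.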
Following information on this mailing list post, the following steps should fix it (on a Mac):
- Edit the file /usr/local/texlive/2010/texmf.cnf. This is where user-specific customizations to the general config file for TexLive live.
- Add the line file-line-error = t at the end. This forces pdflatex to output the file name for each error.
- Update TexLive using sudo tlmgr update --all which, in addition to updating all the packages in your installation, will recognize your new config option. Possibly there is a shorter way to do this, but who doesn’t want to update their packages?
- Done!
Big Requirements Up Front?
Big Requirements Up Front are certainly not in favour with the industry thought leaders in software development. But I think the idea of specifying your requirements ahead of time, as Parnas says in “A Rational Design Process: How and Why to Fake It”, is worthwhile.
Consider the case of climate models. Or smaller open-source projects. Here the developers ‘scratch their own itch’ and the requirements are almost always implicit. What to develop next is driven by an internal process, where the Customer (in the XP sense) is always present by definition, since the developer and the Customer are always the same.
And yet these projects often run into trouble because the requirements are not explicitly modeled, where by ‘modeled’ I mean somehow written down and discussed, prioritized, etc. (ed.: Justify this claim!). Backwards compatibility with previous instances is not maintained, reliability is awful, effort is duplicated, and so on. I think only in projects that are being done the same way as last time (cf. Aranda et al., 2007) is there any hope of ignoring explicit requirements entirely.
Why did we start with the BRUF paradigm in the first place? I think it was because, like everything in software development (and computing itself, for that matter), things started with the US Defence Department. And their needs are so different from those of a small software development company like 37signals that it’s a totally different animal. You have multiple competing vendors, huge safety considerations, multi-year and multi-billion dollar developments. It would be foolish not to do some up-front design. That’s not to say that iterative, frequent release models can’t work in the Army – I think they could. But a little up-front planning could save a lot of time designing things you won’t need.
Because the army gives so many dollars in grants, the prevailing academic attitudes are concerned with defence sized issues. The problem is that these concerns are not shared by other sectors. And so small “a” agile started as a reaction to this top-down design.
I’m not defending the IEEE requirements template, or the amount of documentation for CMMI certification. But I very much believe that some form of explicit requirements modeling is important for projects. Look at Scrum — most people are doing user stories with story points. These are requirements models! Simple, but still explicit, prioritized, costed requirements. And people like Scott Ambler argue that to scale Agile, you need a little more – you need ways to organize the stories, to assign responsibility, to manage the stories.
The reality is that when systems get more complex, for most of us it is impossible to keep track of the scope and scale of the system (unless you are Linus Torvalds, perhaps). In that case, to communicate with the rest of the team you will need something to share — and requirements neatly capture what needs doing. I think the word ‘requirements’ itself has come to mean BRUF requirements, when it properly means the set of new properties of the environment your new machine will bring about (Jacksonian requirements (pdf)). Under this definition requirements becomes a much broader term.
One of the big problems is that we understand things in this industry using anecdote. And so many horrible anecdotes have come from “big requirements up front” projects. But in those cases, any methodology would likely fail if the organization lacked maturity. And conversely, any mature organization could use any methodology it chose and succeed. Those organizations know how to motivate people to get excellent work and know not to change scope midway, not to alter budgets, and so on. I think these factors are ultimately much more important than the particular development methodology one chooses.
Workshops I won’t get to … but would like to
Every major conference in CS typically has four or more associated workshops. A workshop is a half-day or full-day affair which (sadly) takes on the feel of a mini-conference, with CFPs, program committees, and rejections. Moshe Vardi makes some good points in his essay about how workshops are failing us as a community.
However, they are a great place to chat with people in your area who are interested in the same things, and the coffee breaks, at least, are worth a dozen emails.
Here are a few that I wish I could (have gone / go) to.
- Living with Inconsistency
- CHASE 2.0 – Cooperative Aspects of Software Engineering
- The Future Of Requirements Engineering For Self-Adaptive Systems
- WSRCC – Software Research and Climate Change
- Web2SE – Web 2.0 and Software Engineering
- SEAMS – Software Engineering for Adaptive and Self-Managing Systems
We should re-invent the workshop model so we can *all* attend the workshops we find interesting, no? Stay tuned for more thoughts on the matter.