iPhone? Am iMissing something?

2010 February 1

It’s not that it isn’t cool, or handy. I get that. I’m into technology, I like new shiny things. But I can’t see the value of the iPhone for someone who isn’t traveling a lot. First off, it’s a phone, and phone capabilities are easy to find elsewhere: other mobile phones or even a *gasp* landline. And I gather it isn’t even that good a phone.

But secondly, it has these ‘apps’, whose value I confess I don’t understand. OK, I can check movie times, keep a list of to-do items, and access my contacts … but I get all that on the web or my laptop anyway. Why would I pay extra (what is it, something like $40 a month for data?) just to have access to the web for the 20 minutes a day I don’t have a computer open? I accept, of course, that not everyone has a laptop in front of them as much as I do.

Finally, at this point in my life I am trying to NOT be online as much. I get the feeling my main use for an iPhone or Blackberry would just be to kill time or surf the web. Not things I need to be doing more of (this piece bears witness to that!).

BTW my current phone is a Motorola V66 (c. 2001). But it’s a little hard to type SMS messages on a phone that clearly never anticipated that being the major use for cell phones.


Open science and workflows

2010 February 1

I was talking to Jon Pipitone about scientific computing. For a long time the field was mired in the relatively obscure (yet vitally important) discipline of numerical analysis. Now, however, with the attention generated by ‘ClimateGate’ and by open-source software, interest in scientific computing has grown, particularly with respect to the repeatability of computational experiments. (By scientific computing I mean computing for scientific disciplines such as biology, chemistry, and physics, and in particular the software supporting that computing.) An excellent introduction is the Microsoft Research report on “4th Paradigm” science.

Spurred on by a post from programmers who had converted relatively opaque C/Fortran code to Python, I wondered what other such projects might be around. The goal would be to make the procedures followed more open and understandable to laypeople (as far as that is possible: just because we know what rain is doesn’t mean we are all climatologists).

I asked him what might be worth trying to convert:

A … particularly nasty, but possible idea would be to convert a single Fortran module from an existing climate model over into Python, and then use some fancy Python-Fortran bridge to make the two talk to each other. That way you could slowly convert a model over to Python. You’d be forced, at least partially, to keep the original model architecture. That wouldn’t be ideal, but at least you’d know you were being true to the model (because you could compare output).

Sounds nasty to me.  If you were considering rewriting a chunk of a model, I’d suggest starting with NASA’s ModelE (or a newer version). It’s the simplest and littlest, big GCM I’ve seen.
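Jon’s point that you would know you were being true to the model because you could compare output can be made concrete as a regression check: run the original routine and the ported one on the same inputs and require agreement within a tolerance. This is a hypothetical sketch, not code from ModelE or any real model: `reference_step` stands in for the original Fortran routine (in practice it would be called through a bridge such as NumPy’s f2py), `ported_step` is the Python rewrite, and both implement a made-up one-dimensional diffusion step.

```python
import math
import random

def reference_step(u, alpha):
    """Stand-in for the original (e.g. Fortran) routine: one explicit
    diffusion step with fixed end points, written loop-style."""
    n = len(u)
    out = [0.0] * n
    out[0] = u[0]
    out[n - 1] = u[n - 1]
    for i in range(1, n - 1):
        out[i] = u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return out

def ported_step(u, alpha):
    """Hypothetical 'readable' Python port of the same step."""
    interior = [
        centre + alpha * (left - 2.0 * centre + right)
        for left, centre, right in zip(u, u[1:], u[2:])
    ]
    return [u[0]] + interior + [u[-1]]

def outputs_agree(a, b, rel_tol=1e-12, abs_tol=1e-12):
    """True if two output vectors match element-wise within tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol) for x, y in zip(a, b)
    )

def check_port(trials=100, size=50, alpha=0.25, seed=0):
    """Compare both routines on random inputs; return the number of mismatches."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(trials):
        u = [rng.uniform(-1.0, 1.0) for _ in range(size)]
        if not outputs_agree(reference_step(u, alpha), ported_step(u, alpha)):
            mismatches += 1
    return mismatches

if __name__ == "__main__":
    print("mismatches:", check_port())
```

With a real compiled reference, bit-for-bit equality is rarely a realistic target (compiler optimisations can reorder floating-point operations), which is why the comparison uses a tolerance rather than `==`.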

But then I realized that moving code from C/Fortran to Python gains you a little bit of readability, a lot of maintainability, sacrifices speed, and leaves you, ultimately, back at the same point you started: computer code (procedural at that).

There’s a parallel to ‘literate programming’. What we would really like to do is write these tools in a notation that is independent of both platform and programming language.

Here’s how I see the transition, the science workflow:

  1. Cognitive understanding
  2. Language of science (mathematics, with bio/phys/chem extensions)
  3. Language of platform (R, Mathematica, custom code)
  4. Bytecode
  5. Computer processing
  6. Output representation

We would like to get rid of having to do the second translation (from mathematics into the language of the platform), right? Then you could just write in the language of mathematics and have the output (a prediction, in the form of a graph, chart, or numbers) be correct. So I guess there should be two sides to this workflow: one from natural language down to bytecode, and the other from bytecode back out to a natural representation.
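A toy illustration of both sides, assuming Python plays the role of the platform language: a formula written as close to its mathematical form as Python allows is compiled to a code object (Python’s bytecode), evaluated, and the numbers are rendered back into a small human-readable table. The formula and variable names are made up for illustration; `compile`, `eval`, and `dis` are standard Python facilities.

```python
import dis
import math

# The "language of science", written as close to the maths as Python allows.
FORMULA = "T0 * exp(-k * t)"  # hypothetical: exponential relaxation towards zero

def forward(formula):
    """Notation -> bytecode: compile the formula to a code object."""
    return compile(formula, "<formula>", "eval")

def evaluate(code, **variables):
    """Bytecode -> computer processing: run it with only `exp` and the variables visible."""
    namespace = {"__builtins__": {}, "exp": math.exp}
    namespace.update(variables)
    return eval(code, namespace)

def backward(formula, rows):
    """Output -> natural representation: a plain-text table of (t, value) rows."""
    lines = [f"t      {formula}"]
    lines += [f"{t:<6} {value:.3f}" for t, value in rows]
    return "\n".join(lines)

if __name__ == "__main__":
    code = forward(FORMULA)
    dis.dis(code)  # print the bytecode the formula became
    rows = [(t, evaluate(code, T0=20.0, k=0.1, t=t)) for t in range(0, 30, 10)]
    print(backward(FORMULA, rows))
```

Note that the translation into the platform language is still done by hand here: the formula string is already Python syntax, which is exactly the step the post would like to eliminate. Emptying `__builtins__` limits what the formula can call, but it is not a security sandbox.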

The assumption I’m making is that the further away from bytecode you get the more people have a chance of understanding your work.

Some of this discussion is (uncomfortably) similar to model-driven approaches, of course. The challenge there, for me, has always been that you cannot represent *all* of the problem in the model, so you end up with a bunch of custom code anyway. Jon again:

Yup.  And the climate scientists will tell you that all the time.  There are all sorts of optimisations and workarounds that have to be specified in the code.  Not to mention the fact that the way you decide to discretize the mathematics in the papers and which algorithms you choose as implementations are also dependent on the rest of the model/compiler/platform, etc.  So it’s not that we’re trying to replace the second step, but just make it clear what’s happening along the way.
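Jon’s point about discretization shows up even in the smallest example. The sketch below is an illustration chosen here, not taken from any climate model: it integrates the same equation, dy/dt = -k·y, with two standard schemes, explicit (forward) Euler and implicit (backward) Euler. Both are faithful to the mathematics, yet they give different numbers, and with a large enough time step only the implicit scheme stays stable.

```python
import math

def forward_euler(y0, k, dt, steps):
    """Explicit scheme: y_{n+1} = y_n - k * dt * y_n."""
    y = y0
    for _ in range(steps):
        y = y - k * dt * y
    return y

def backward_euler(y0, k, dt, steps):
    """Implicit scheme: y_{n+1} = y_n - k * dt * y_{n+1}, solved for y_{n+1}."""
    y = y0
    for _ in range(steps):
        y = y / (1.0 + k * dt)
    return y

def compare(y0=1.0, k=1.0, t_end=10.0, dt=0.5):
    """Exact solution versus both schemes at time t_end."""
    steps = round(t_end / dt)
    return {
        "exact": y0 * math.exp(-k * t_end),
        "forward": forward_euler(y0, k, dt, steps),
        "backward": backward_euler(y0, k, dt, steps),
    }

if __name__ == "__main__":
    for dt in (0.01, 0.5, 2.5):
        print(dt, compare(dt=dt))
```

For this equation the explicit scheme multiplies y by (1 - k·dt) each step, so it blows up once dt exceeds 2/k; the implicit scheme divides by (1 + k·dt) and decays for any positive step. Which one a model uses depends on cost and stability trade-offs that never appear in the paper’s equations.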

Thoughts on open notebooks for software scientists

2010 January 28

Open notebook science

People are increasingly interested in open notebooks, lab notes from scientists which everyone can view. There seems to be a split in features (wanted or offered, I’m not sure which).

On one hand, people seem to get by with what are essentially text repositories like simple wikis (TiddlyWiki, MediaWiki), note-taking apps (from Stickies to Evernote), or blogs (WordPress). These are great: after all, most of our lab notes are just text. I rarely work with hardware or wetware, so my own notes don’t need more than that, but I’m curious how the translation is made from the physical to the digital. Do people take photos? Sketches? How do these translate into the digital domain? Is a simple wiki or blog enough?

The other side of things is more of a ‘life-stream’, as demonstrated by Cameron Neylon. Here, every research activity is streamed onto the Web: papers read, code committed, presentations given, etc. The primary technologies here are RSS/Atom feeds, FriendFeed, Twitter, and the like. For example, my readings are available via Mendeley and my source code at GitHub, all synchronized via FriendFeed. Perhaps more importantly, these activities all have a (more or less) permanent resource locator, enabling people to re-assemble my work or link to pieces of it.
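The re-assembly a life-stream enables can be sketched in a few lines: parse each feed, keep each item’s title, permanent link, and date, and merge everything into one time-ordered stream. This is a minimal sketch assuming plain RSS 2.0 feeds; the sample feeds and example.org URLs are made up, and a real aggregator (FriendFeed, for instance) would also handle Atom, fetching, and malformed dates.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Made-up sample feeds; in practice these would be fetched from each service.
READING_FEED = """<rss version="2.0"><channel><title>Readings</title>
<item><title>Read: a paper on repository mining</title>
<link>https://example.org/papers/1</link>
<pubDate>Tue, 26 Jan 2010 09:00:00 +0000</pubDate></item>
</channel></rss>"""

CODE_FEED = """<rss version="2.0"><channel><title>Commits</title>
<item><title>Committed: analysis script</title>
<link>https://example.org/commits/abc123</link>
<pubDate>Wed, 27 Jan 2010 14:30:00 +0000</pubDate></item>
<item><title>Committed: data loader</title>
<link>https://example.org/commits/def456</link>
<pubDate>Mon, 25 Jan 2010 08:15:00 +0000</pubDate></item>
</channel></rss>"""

def parse_rss(xml_text):
    """Return (datetime, source, title, link) tuples for each <item>."""
    channel = ET.fromstring(xml_text).find("channel")
    source = channel.findtext("title", default="")
    entries = []
    for item in channel.iter("item"):
        when = parsedate_to_datetime(item.findtext("pubDate"))
        entries.append((when, source, item.findtext("title"), item.findtext("link")))
    return entries

def life_stream(*feeds):
    """Merge several feeds into one list, newest first."""
    merged = [entry for feed in feeds for entry in parse_rss(feed)]
    return sorted(merged, key=lambda entry: entry[0], reverse=True)

if __name__ == "__main__":
    for when, source, title, link in life_stream(READING_FEED, CODE_FEED):
        print(f"{when:%Y-%m-%d %H:%M}  [{source}] {title}  <{link}>")
```

Because every entry keeps its link, the merged stream is only an index: the permanent URLs are what let someone else follow a piece of work back to its source.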

The difference is between tracking specific experimental activities (move sample A to Erlenmeyer flask B) and research activities in the large. For subscribers with non-specific interests (what does Cameron do for his day job, anyway?) I prefer the latter approach: I can see the large-scale activities without being bothered with the minutiae. But if I’m a close colleague (or competitor), then the small-scale notes are more useful; now I want information that will help me to replicate the study. The audience for the small-scale notes is probably fewer than 10 people, and at least 1 (yourself, I’m assuming).

Neil’s Open Notebook

What would an open notebook look like for my work? I’ll look at my objectives in using one, my ‘sciency’ activities, what I do now, and possible obstacles.

Objective
To place my “research activities” online, to track my methodologies, to enable others to see/copy what I do, and to keep a record of what I have done.

What I do (and how to open it up):

  • Brainstorm
    • Twitter questions; whiteboard meetings with digital photos; posted email logs
  • Logics and proofs
    • Publish figures; publish theorems and proofs; write papers. Add inline references to previous work or definitions (e.g., similar proofs of non-monotonicity).
  • Reading journals and conference papers
    • Publish commentaries on blogs; RSS feed of papers added to my library
  • Data collection and manipulation
    • Blog post on procedures; source code posted; archive data online (e.g. PromiseData); save workflow in repeatable form
  • Paper writing
    • Write paper on GitHub with LaTeX source available; write introductory blog post; post completed drafts; post figures
  • Figures and presentations
    • Post on SlideShare; post on Flickr or Picasa
  • Coding
    • GitHub projects; tarballs on my personal web page; Python virtualenv to host ‘contextualized’ lab environments
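For the “save workflow in repeatable form” and virtualenv items, one low-effort habit is to write a small manifest of the environment next to every result. A minimal sketch using only the standard library (`sys`, `platform`, and `importlib.metadata`, which needs Python 3.8 or later); the file name `environment.json` and the manifest fields are illustrative choices, not a standard format.

```python
import json
import platform
import sys
from datetime import datetime, timezone
from importlib import metadata

def environment_manifest():
    """Describe the interpreter, OS and installed packages as a dict."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that report no name
            packages[name] = dist.version
    return {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "python": sys.version,
        "executable": sys.executable,
        # True inside a venv or a modern virtualenv (prefix differs from base).
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items(), key=lambda kv: kv[0].lower())),
    }

def save_manifest(path="environment.json"):
    """Write the manifest as JSON so it can be committed alongside results."""
    manifest = environment_manifest()
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(manifest, handle, indent=2)
    return manifest

if __name__ == "__main__":
    save_manifest()
```

A manifest doesn’t make a run reproducible by itself, but committing it with the code and data (e.g. to GitHub) records what a reader would need to rebuild the same virtualenv.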

Current procedures

  • TiddlyWiki notes for early-stage ideas or shots in the dark (my safe place).
  • LaTeX documents for creating publications. Includes plenty of formulas.
  • GitHub for some projects. Post source code and (depending on size) data.
  • PDFs of past projects on my website. Defeat the paywalls, increase citability.
  • Blog posts (e.g. msr tag). Currently this is more retrospective, sharing in the hopes it saves someone else time.
  • Posts on Daytum – track personal activities and productivity, e.g. amount of time spent on projects.
  • RSS feed of Mendeley activity – lists papers I’ve read.

Possible obstacles

  • collaborators won’t want everything open and accessible
  • IP theft
  • forget to update feeds
  • diversion from ‘publishable’ activities (an open notebook won’t get you tenure, and may get you fired).

Other software scientists using open notebooks (if you have more, please let me know)


Publishing in Computer Science

2010 January 27

Clearly I’m inexperienced, lightly published, and have never served on a program committee. However, I find the issues raised by Moshe Vardi in the SIGMOD journal very cogent. He questions the model of publishing currently in use throughout computer science research. Lance Fortnow goes on to raise the same issues in Communications of the ACM, and there is some good discussion on his blog (emphasis on some: plenty of sidetracking going on there).

The typical argument is that since this is a fast-changing field (a conjecture hitherto unproven by evidence — is it faster-changing than genetics? Than theoretical physics? Than sociology?), we need to publish at annual conferences, because journal turnaround times are much too slow to keep up. But if it takes 2 years to publish in a journal, I’d argue that’s a separate problem.

Vardi’s central point is that conferences aren’t doing a good job of evaluating important research. At best, he says, the reviews correctly identify less than ¼ of the interesting work. The rest is of dubious quality (relative to what was rejected). Furthermore, most reviews are hardly of peer-review quality, and there is little opportunity to turn things around if a paper has flaws (I expect the model used is: so many submissions, so if it has a single flaw, reject it). One person I read claimed that each member of his conference committee had reviewed 22 papers. If this was a typical conference, with a review period of about 2 months, that works out to roughly 2 to 3 papers a week. Either these papers are not complex, or the reviewers are not spending a reasonable amount of effort on each one. That hurts the scientific quality of the field, lets cheaters and repeat papers through, and wastes everyone’s time.

I think the other argument against conferences, though, is more straightforward: too much travel. In my research area, there are 3 or 4 top conferences, each of which has fewer than 300 attendees, many of whom go to the same 3 or 4 conferences. Why not gather the 500 or so unique attendees in one large venue, like the AGU meetings? Travel is expensive, requires an inordinate organizational effort, and is bad for the environment and for work/life balance.


Understanding climate change with anecdote

2010 January 19

Most skeptics’ views of climate change – anthropogenic global warming – seem to come down to the following template:

“The local temperature around <my city> is much cooler than <my faulty memory>  remembers. Therefore global warming is a crock.”

Hence I have created a simple Python script which can auto-generate this recollection. It takes the monthly average temperature for my city and subtracts it from the current local temperature: if the difference is positive, the Earth is warming. If the difference is negative, of course, hundreds of well-educated climate scientists are morons, incompetent, members of the Priory of Sion, or out to get you. Helpfully, you can run it at particular times of the day to get the effect you wish.
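A hypothetical sketch of such a script, with made-up temperatures passed in by hand rather than fetched from any weather service (the function name and messages are illustrative). Its whole point is the flaw it parodies: one local reading compared with one monthly average carries no information about a global, multi-decade trend.

```python
def anecdotal_climate_verdict(current_temp_c, monthly_average_c, city="my city"):
    """Reproduce the skeptic template from one reading and one average.

    Deliberately bad science: a single local temperature compared with a
    monthly average says nothing about global, long-term trends.
    """
    difference = current_temp_c - monthly_average_c
    if difference > 0:
        return f"It's {difference:.1f}°C warmer than usual in {city}: the Earth is warming!"
    if difference < 0:
        return (f"It's {-difference:.1f}°C cooler than usual in {city}: "
                "global warming is a crock, and the climate scientists are out to get you.")
    return f"It's exactly average in {city}: no opinion today."

if __name__ == "__main__":
    # Made-up readings: the same day "proves" opposite things depending on the hour.
    print(anecdotal_climate_verdict(-12.0, -6.5))  # early morning
    print(anecdotal_climate_verdict(-2.0, -6.5))   # mid-afternoon
```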

This has the distinct advantage of not relying on spotty memory or aching joints for historical weather information. It is thus probably more sciency than most of the writing by Lawrence Solomon.