Semantic Werks

Thoughts on people, machines and systems.

Posts Tagged ‘open notebook’

More on open notebooks


I recently posted about what an open notebook in software science might look like. I think I confused a life stream (where life == work :) with a notebook. From what I’ve seen of projects like OpenWetWare, they are more like Trac or GitHub than a FriendFeed account. You get a wiki to write on, image handling, etc., but it isn’t automated: you have to enter all the data yourself.

This is incredibly useful, but am I right in thinking it is similar to tools software engineers have known for decades? It seems like the innovations are in collaborative editing, version control, and digital data.

What I was imagining was more automatic: whenever your microarray machine ran an experiment, it would auto-enter the results on your open notebook. Similarly for code you might run for statistical analysis (like the R workspace question I raised earlier).
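Here is a minimal sketch of that kind of automatic entry, assuming the instrument (or analysis script) can call a hook when a run finishes. The function names, the JSON-lines file format and the instrument labels are all my own inventions for illustration, not the interface of any real microarray machine or notebook service.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def record_run(notebook_path, instrument, results, when=None):
    """Append one timestamped entry describing a finished run to the notebook file."""
    when = when or datetime.now(timezone.utc)
    entry = {
        "timestamp": when.isoformat(),
        "instrument": instrument,  # hypothetical label, e.g. "microarray-1"
        "results": results,        # any JSON-serialisable summary of the run
    }
    # One JSON object per line keeps the notebook append-only and easy to diff.
    with Path(notebook_path).open("a", encoding="utf-8") as notebook:
        notebook.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry


def read_notebook(notebook_path):
    """Return every recorded entry, oldest first."""
    with Path(notebook_path).open(encoding="utf-8") as notebook:
        return [json.loads(line) for line in notebook if line.strip()]
```

The point of the design is that nobody types anything: the machine or the script calls `record_run` itself, and the notebook is just whatever has accumulated in the file.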

I like the idea of ‘recording’ what you HAVE done (not what you will do, which is more brainstorming, mind-mapping, whiteboarding, etc.). It is a very important part of selfish science, which is to say, self-replication (presumably the sine qua non of scientific reproducibility). Here are a few features I think are useful for personal lab notes:

  • A wiki with dates.
  • Separate entries.
  • Graphviz-Dot conversion.
  • Semantic markup.
  • Inline photos.
  • Inline LaTeX.

I’m not saying these notebooks have no value: clearly they do. But I think there is a lot more that could be done with the concept, particularly using linked data (oh noes! the semantic web!) to import other researchers’ results.

What we really want is a list of steps: some small ‘unit of science’ that can be repeated. We should show this using process models, so we can model loops and branches, and possibly execute and recompose them. Google Wave is touted as the best thing for this, and I think that’s true. SAP has a version of its business process editor in Wave, and Google itself sees a need for it. Wave’s collaboration feature is useful, but I don’t think it is the real advantage, at least not yet. Right now, Wave’s support for version control (well, history) and its ability to incorporate agents/bots and arbitrary JavaScript extensions are more useful. For example, someone has written ‘Watexy’, a Wave bot which can interpret LaTeX equations.
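To make the ‘list of steps’ idea concrete, here is a rough sketch of an executable process model with steps, branches and loops. It has nothing to do with Wave or the SAP editor; the class names (`Step`, `Branch`, `Loop`) and the toy protocol are hypothetical, and a real process language would need much more (parallelism, error handling, provenance).

```python
from dataclasses import dataclass
from typing import Callable, List, Union

State = dict  # shared, mutable record of the experiment so far


@dataclass
class Step:
    name: str
    action: Callable[[State], None]


@dataclass
class Branch:
    condition: Callable[[State], bool]
    if_true: "Process"
    if_false: "Process"


@dataclass
class Loop:
    condition: Callable[[State], bool]
    body: "Process"


Process = List[Union[Step, Branch, Loop]]


def run(process, state, log=None):
    """Execute a process against `state`; return the names of the steps performed, in order."""
    log = [] if log is None else log
    for node in process:
        if isinstance(node, Step):
            node.action(state)
            log.append(node.name)
        elif isinstance(node, Branch):
            chosen = node.if_true if node.condition(state) else node.if_false
            run(chosen, state, log)
        elif isinstance(node, Loop):
            while node.condition(state):
                run(node.body, state, log)
    return log


# A toy protocol: collect three samples, then analyse them if enough were collected.
def collect(state):
    state["samples"] += 1


def analyse(state):
    state["analysed"] = True


def give_up(state):
    state["analysed"] = False


protocol = [
    Loop(lambda s: s["samples"] < 3, [Step("collect sample", collect)]),
    Branch(lambda s: s["samples"] >= 3,
           [Step("analyse", analyse)],
           [Step("give up", give_up)]),
]
```

Because a process is just a list, recomposing is list manipulation: `run(protocol + protocol, state)` repeats the whole thing, and the returned log is the record of what was actually done.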

It’s truly an exciting time to be working in science.

Written by Neil

2010 February 13 at 19:31

Posted in Uncategorized


Thoughts on open notebooks for software scientists


Open notebook science

People are increasingly interested in open notebooks: lab notes from scientists that everyone can view. There seems to be a split in the features (wanted or offered, I’m not sure which).

On the one hand, people seem to get by with what are essentially text repositories like simple wikis (TiddlyWiki, MediaWiki), note-taking apps (from Stickies to Evernote), or blogs (WordPress). These are great: after all, most of our lab notes are just text. I’m not someone who works much with hardware or wetware, so my own work doesn’t need more than text, but I’m curious how the translation is made from the physical to the digital. Do people take photos? Sketches? How do these translate into the digital domain? Is a simple wiki or blog enough?

The other side of things is more of a ‘life-stream’, as demonstrated by Cameron Neylon. Here, every research activity is streamed onto the Web: papers read, code committed, presentations given, etc. The primary technology here is RSS/Atom feeds, FriendFeed, Twitter, etc. For example, my readings are available via Mendeley, my source code at GitHub, all synchronized via FriendFeed. Maybe more important, these activities all have a (more or less) permanent resource locator, enabling people to re-assemble my work or link to pieces of it.
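As a sketch of what the synchronizing step amounts to (this is not how FriendFeed works internally, which I don’t know), the idea is to take a list of activities per service, each with a timestamp and a permanent link, and merge them into one stream. The `Activity` fields and the `merge_streams` name are made up for illustration; parsing the actual RSS/Atom feeds is left out.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Activity:
    when: datetime
    source: str     # which service the item came from, e.g. "github"
    title: str
    permalink: str  # the (more or less) permanent resource locator


def merge_streams(*feeds):
    """Combine per-service activity lists into one stream, newest first."""
    combined = [activity for feed in feeds for activity in feed]
    return sorted(combined, key=lambda activity: activity.when, reverse=True)
```

The permalink is the part that matters for reuse: anyone reading the merged stream can link to one item, or pull out just the items from one source.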

The difference is between tracking specific experimental activities (move sample A to Erlenmeyer flask B) and research activities in the large. For subscribers with non-specific interests (what does Cameron do for his day job, anyway?) I prefer the latter approach: I can see the large-scale activities without being bothered with the minutiae. But if I’m a close colleague (or competitor), then the small-scale notes are more useful; now I want information that will help me to replicate the study. The audience for the small-scale notes is probably fewer than 10 people, and at least 1 (yourself, I’m assuming).

Neil’s Open Notebook

What would an open notebook look like for my work? I’ll look at my objectives in using one, my ‘sciency’ activities, what I do now, and possible obstacles.

Objective
To place my “research activities” online, to track my methodologies, to enable others to see/copy what I do, and to give me a record of what I have done.

What I do (and how to open them):

  • Brainstorm
    • Twitter questions; whiteboard meetings with digital photos; email logs posted;
  • Logics and proofs
    • Publish figures; publish theorems and proofs; write papers. Add inline references to previous work or definitions (e.g., similar proofs of non-monotonicity).
  • Reading journals and conference papers
    • Publish commentaries on blogs; RSS feeds of papers added to library;
  • Data collection and manipulation
    • Blog post on procedures; source code posted; archive data online (e.g. PromiseData); save workflow in repeatable form
  • Paper writing
    • Write paper on GitHub with LaTeX source available; write introductory blog post; post completed drafts; post figures
  • Figures and presentations
    • Post on SlideShare; post on Flickr or Picasa
  • Coding
    • GitHub projects; tar’red files on personal web page; Python virtualenv to host ‘contextualized’ lab environments

Current procedures

  • TiddlyWiki notes for early-stage ideas or shots in the dark (my safe place).
  • LaTeX documents for creating publications. Includes plenty of formulas.
  • GitHub for some projects. Post source code and (depending on size) data.
  • PDFs on my website of past projects. Defeat the paywalls, increase citability.
  • Blog posts (e.g. msr tag). Currently this is more retrospective, sharing in the hopes it saves someone else time.
  • Posts on Daytum – track personal activities and productivity, e.g. amount of time spent on projects.
  • RSS feed of Mendeley activity – lists papers I’ve read.

Possible obstacles

  • collaborators won’t want everything open and accessible
  • IP theft
  • forget to update feeds
  • diversion from ‘publishable’ activities (an open notebook won’t get you tenure, and may get you fired).

Other software scientists using open notebooks (if you have more, please let me know)

Written by Neil

2010 January 28 at 12:25

Posted in Uncategorized

