The Scientific Method 2021 edition


June 3, 2021

The typical science workflow is something like:

A - Related work: build a Prior about a real world problem, like pulsar formation

B - Fieldwork: Collect data about that problem in the ‘field’

C - Model: Build a model of the problem

D - Posterior: Generate a posterior/data output

E - Analyze: draw inferences about A, updating the prior if necessary.
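
One way to make steps A to E concrete is a conjugate Beta-Binomial update. This is only a minimal sketch under stated assumptions: the `Beta` class, the `update` helper, the prior parameters and the observation counts are all invented for illustration and do not come from any real pulsar study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beta:
    """Beta(alpha, beta) belief about an unknown proportion."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def update(prior: Beta, successes: int, failures: int) -> Beta:
    """Conjugate update: Beta prior plus binomial data gives a Beta posterior."""
    return Beta(prior.alpha + successes, prior.beta + failures)


# A - prior built from related work (hypothetical numbers)
prior = Beta(alpha=2.0, beta=8.0)
# B - fieldwork: 7 detections in 20 observations (hypothetical numbers)
detections, misses = 7, 13
# C/D - the model turns prior and data into a posterior
posterior = update(prior, detections, misses)
# E - compare the two beliefs and decide whether the prior needs revising
print(f"prior mean {prior.mean:.3f}, posterior mean {posterior.mean:.3f}")
```

In this toy case the "model" in C is a single line of arithmetic. The rest of this post is about what happens when it is not.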

We can have several types of problems - “debt” - in this process.

  1. In A, we could have a misinformed prior. We might not have seen the latest result, or we might have a bad mental model of the real world.
  2. In B, we could take shortcuts in data collection, e.g. a “small telescope”

Note that in A/B, we are in traditional science. This is the stuff you would learn in Astro 200 - how planets form, how to take readings, etc.

  1. In C, we traditionally did nothing very special for our ‘model’ to create a posterior. OLS regression is the most likely approach, and we could use tools like Excel or some deeply trivial subset of SPSS. The model might even be implicit in the machine itself, such as a spectrograph. Now, of course, our problems are more noisy, the datasets from B larger, and the ‘equipment’ - the math - more sophisticated. So our models are expressed as Fortran code, as Jupyter code running MCMC, or as deep neural models. Here we enter the world of software and data engineering. We might need code review, as opposed to just peer review.
  2. In D, we have the problems - and depreciation - of moving data around at volume. This includes network infrastructure, GPU provisioning, HPC problems. There is obviously a large set of questions here that I won’t tackle, but that you all probably know a lot about!
  3. In E, our challenges are again scientific: draw supported inferences, do classification and prediction, create better explanations and build useful theories. Debt in this context is again about sloppy science: P-Hacking, HARKing, and so on. Personally, I think open science policies will go a long way to removing those problems.
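
For a sense of scale, the traditional "nothing very special" model in C can be written in a dozen lines. The sketch below is a closed-form ordinary least squares fit for a single predictor. The function name `ols_fit` and the readings are made up for illustration, and real analyses would normally use an established statistics package instead.

```python
def ols_fit(xs, ys):
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations in x, and of cross deviations in x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented readings, roughly y = 1 + 2x with some noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
intercept, slope = ols_fit(xs, ys)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

Something this small can be checked by eye. An MCMC sampler or a deep neural model cannot, which is where code review starts to matter.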

I want to focus on C, as an issue of scientific technical debt. I think a lot of challenges have to do with understanding the tradeoff between the scientific challenges in A/B/E, and the engineering-focused challenges in C and D. Ultimately, to do good science - to design vaccines, to predict climate adaptation approaches, to better understand pulsars - we need to have all of these phases working efficiently. Since I know about C, that’s where I’ll focus.

Making Models Accessible

There are two dimensions (or more) to this question. The first is about using models and building them; the second is about querying them.

Model Usability

We now see tons of tools that help with modelling. Here are some illustrative examples:

  • Fortran, in Global Climate models
  • Python, in Ralph Evins’s building sims
  • Astropy
  • Commercial: Tableau, RStudio and Metabase are moving from building dashboards and visualizations into more complex modelling support. But like dashboards, the challenge is less about using bars vs lines, and more about the A/B and E parts: what problems do I need to learn about?

One challenge in the move to more complex models is of course the inflection points: the places where the models fail to capture the real world. It is trivial to build a predator-prey model; to build one that accurately captures wolf/elk dynamics in the Mackenzie Valley is entirely different, and the validity of such a model is probably understandable by maybe 100 people in the world. Increasingly the challenge in peer review is not about (or not just about) the problem’s relevance or significance, but whether the data and model support the claim. Technical debt in science is a massive concern if we are going to rely on large, difficult-to-verify datasets for our claims.
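
To show how little code the "trivial" version takes, here is a minimal sketch of the textbook Lotka-Volterra equations, integrated with a fixed-step classical Runge-Kutta (RK4) scheme. The function names, parameter values and starting populations are arbitrary placeholders. They are not calibrated to wolves, elk, or any real system, which is exactly the gap described above.

```python
def lotka_volterra(prey, pred, a, b, c, d):
    """Textbook predator-prey rates: returns (dprey/dt, dpred/dt)."""
    dprey = a * prey - b * prey * pred
    dpred = d * prey * pred - c * pred
    return dprey, dpred


def rk4_step(prey, pred, dt, params):
    """Advance both populations by one classical Runge-Kutta step."""
    k1 = lotka_volterra(prey, pred, *params)
    k2 = lotka_volterra(prey + 0.5 * dt * k1[0], pred + 0.5 * dt * k1[1], *params)
    k3 = lotka_volterra(prey + 0.5 * dt * k2[0], pred + 0.5 * dt * k2[1], *params)
    k4 = lotka_volterra(prey + dt * k3[0], pred + dt * k3[1], *params)
    prey += dt / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
    pred += dt / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return prey, pred


def simulate(prey0, pred0, params, dt=0.01, steps=5000):
    """Return the list of (prey, predator) states, including the start."""
    states = [(prey0, pred0)]
    for _ in range(steps):
        states.append(rk4_step(*states[-1], dt, params))
    return states


# Arbitrary illustrative parameters (a, b, c, d); not fitted to any data.
params = (1.0, 0.1, 1.5, 0.075)
trajectory = simulate(prey0=10.0, pred0=5.0, params=params)
print("final (prey, predator):", trajectory[-1])
```

Nothing in this code says whether the four parameters mean anything for a particular valley. That judgment sits with the small group of people who know the field data, and it is the part a reviewer cannot easily check from the code alone.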

How easy it is to build the model:

  • Show examples of RStan, Fortran, Hash as dealing with the problem of “building the model”

Accessing Model Outputs: Digital Twins

Bryan Lawrence blog post

The idea that in SKA you might not get raw data

Finding Bugs in Notebooks

Consider our study of how people find notebook problems. Given a notebook on a topic the subject understood well, we asked them to find any problems (0-4) in the notebook. This is essentially peer review of the notebook, except that the authors can’t reply or help, as they could in code review. “With many eyeballs all bugs are shallow” is a flawed truism, but it seems to hold for science: if we can get lots of people looking, we can find some of the flaws.

Hands up if you’ve had to rewrite analysis or model code because of a flawed assumption, misunderstood stats package, or unreliable initialization?