Technical Debt in Research Software

Neil Ernst

2024-10-11

There Isn’t Any

thank you and goodbye

About Me

  • Associate professor in Computer Science, University of Victoria (about 3 hrs by passenger ferry from Seattle)
  • Four years consulting/research at Software Engineering Institute, CMU/Pittsburgh (an FFRDC)
  • Research: software design in the LLM era, software research for/on research software, analytical sovereignty
  • Experiences: working with the FDA on modernization, with the Square Kilometre Array on long-term design, and with the Canadian Space Agency on remote sensing tools.

Outline

  1. What is TD?
  2. Knowledge Domains
  3. Self-Admitted TD in Research Code
  4. Perceptions of TD
  5. Takeaways

Definitions / Poll

Cunningham

Ward Cunningham’s definition: it is good, even essential, to go into a little bit of debt if it supports learning and the debt is repaid

TD as a motivator/support for an agile and iterative approach to software development (1992!)

McConnell (and mine)

Technical debt occurs when a design or construction approach is taken that’s expedient in the short term, but that creates a technical context that increases complexity and cost in the long term.

Steve McConnell (Code Complete)

External/Internal

The source of the TD matters.

  • Internal TD: we did something (to learn more / to get the paper published / because the funder said so) and now we have to fix it
  • External TD: someone else did something and now my project has to be updated in non-interesting ways (i.e., excluding “new scientific findings”).
  • Now your fluid dynamics experts have suddenly also become security programmers because an authentication package is critical!

But …

  • If we make a responsible decision in 1995 (e.g., for gridding by lat/lon) is it really TD if that becomes unsuitable in 2024?
  • In some ways it doesn’t matter! Treat it as a process check: are we tracking these impactful choices and their consequences? (A minimal sketch follows this list.)
  • Worst thing to do is make design choices and not know why they were made
  • Debt is leverage: every code base should have some (~10-20%)
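
To make “tracking these impactful choices” concrete, here is a minimal sketch of a decision-record entry. The fields and the gridding example details are illustrative assumptions, not from any particular project:

    # A minimal, hypothetical decision record: enough to answer
    # "why was this choice made, and when should we revisit it?"
    decision_record = {
        "date": "1995-06-01",
        "decision": "grid observations by lat/lon",
        "rationale": "matches instrument output; simple to implement",
        "consequences": "resolution tied to grid scale; distortion near poles",
        "revisit_when": "resolution requirements or community grid formats change",
    }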

Knowledge Domains

Research software development requires cross-domain knowledge.

Domain (real-world) and Science (theory) knowledge

  • What you get from undergrad / postgrad work.
  • What papers get published on (mostly).

Software Knowledge

  • How to bring the theory and domain understanding into Fortran/Julia/Numpy
  • Bridge the finite, quantized, bit-oriented software world and the continuous, stochastic natural world (one small illustration below)
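
One tiny, standard illustration of that bridge (mine, not from the talk): binary floating point cannot represent most decimal fractions exactly, so comparisons against the continuous world need tolerances:

    import math

    # The finite, bit-oriented world: 0.1 and 0.2 have no exact binary
    # representation, so their sum is not exactly 0.3.
    print(0.1 + 0.2 == 0.3)              # False
    print(0.1 + 0.2)                     # 0.30000000000000004

    # Compare with a tolerance instead of exact equality.
    print(math.isclose(0.1 + 0.2, 0.3))  # True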

Execution Knowledge

  • Make the software work in a hardware and team environment
  • Deployment: cluster and hardware specifics. Makefiles, config files, XML.
  • Development: Git, GitHub, pull requests, code review.
  • Often not well taught in school, not well supported by tools
  • My feeling: the most under-appreciated domain

Operational Knowledge

  • Take the outcomes of the first four, and do something
    • data analysis, policy recommendations, new science
  • Crossing knowledge domains is a good way to inadvertently incur technical debt!


Self-Admitted Technical Debt in Research Software

Definition

The intentional acknowledgment of technical debt by developers through comments, commit messages, or documentation within the code.

# TODO: fix this poor approximation

Method

  • Choose 9 representative projects as samples
  • Identify 28,680 SATD comments using keyword searches and manual classification (a minimal keyword-scan sketch follows this list)
  • Manually categorize comments using card sorting
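
A minimal sketch of the keyword-scan step, assuming a typical SATD keyword list (the actual study also covered Fortran/C++ sources and followed the scan with manual classification):

    import re
    from pathlib import Path

    # Common SATD markers; this particular list is an assumption, not the study's.
    SATD_PATTERN = re.compile(r"\b(TODO|FIXME|HACK|XXX|WORKAROUND)\b", re.IGNORECASE)

    def find_satd(root, glob="*.py"):
        """Yield (file, line number, text) for lines matching SATD keywords."""
        for path in Path(root).rglob(glob):
            for lineno, line in enumerate(
                path.read_text(errors="ignore").splitlines(), start=1
            ):
                if SATD_PATTERN.search(line):
                    yield path, lineno, line.strip()

    for hit in find_satd("src"):
        print(*hit)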

Projects

Project     Scientific Domain     Contribs.   Project Type     Code Size (KLOC)   Age (Yrs)
Astropy     Astronomy             453         Python Library   1,309              13
Athena      High Energy Physics   100+        Software         5,208              19
Biopython   Molecular Biology     331         Python Library   620                25
CESM        Climate Modeling      134         Software         2,800              41
GROMACS     Molecular Dynamics    85          Software         2,102              27
Moose       Physics               221         Framework        848                16
Elmer       Mathematics           45          Software         954                10
Firedrake   Mathematics           96          Software         63                 11
Root        High Energy Physics   387         Framework        5,080              24

Scientific SATD

Accumulation of suboptimal scientific practices, assumptions, and inaccuracies within scientific software that potentially compromise the validity, accuracy, and reliability of scientific results.

Indicators

  • Translation Challenges
  • Assumptions
  • New Scientific Findings
  • Missing Edge Cases
  • Computational Accuracy

Translation

We are going to share n_eff between the neutrinos equally. In detail, this is not correct, but it is a standard assumption because properly calculating it is (a) complicated (b) depends on the details of the massive neutrinos (e.g., their weak interactions, which could be unusual if one is considering sterile neutrinos).

Astropy

Assumptions

We assume here that new ice arrives at the surface with the same temperature as the surface. TODO: Make sure this assumption is consistent with energy conservation..

CESM

New Findings

Update the the instability calculation function and modify the neutral drag cofficient. We should follow more elegant approach like Louis et al (1982) or Bretherton and Park (2009) but for now we use very crude approach : just 1 for ri < 0, 0 for ri > 1, and linear ramping.

CESM

Missing Edge Cases

TODO(wjs, 2015-10-18) With the ‘qflx_snow_grnd_col(c) > 0.0_r8’ check in the following conditional, Leo van Kampenhout has found that, under some rare conditions, the snow pack does not get initialized when it should. However, if this check is removed, then under some different rare conditions, snow depth can grow infinitely. We clearly need a more robust check here, and/or some fixed logic elsewhere. But for now we’re going with the lesser of the two bugs - i.e., allowing the snow pack to not be initialized when it should under some conditions.

CESM

Accuracy

Results: f’’(0) never really seems to converge to more than about 4 digits? Not sure how much this really affects the solution we get for f itself

MOOSE
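
A comment like the MOOSE one can be turned into an explicit check. Here is a sketch of one way to estimate how many digits successive approximations agree to; the function and the example values are illustrative, not MOOSE’s actual test:

    import math

    def agreeing_digits(prev, curr):
        """Rough count of decimal digits two successive approximations share."""
        if prev == curr:
            return float("inf")
        rel = abs(curr - prev) / max(abs(curr), 1e-300)
        return -math.log10(rel)

    # Made-up successive refinements of a quantity like f''(0).
    approximations = [0.46960, 0.46963, 0.469625, 0.4696244]
    for a, b in zip(approximations, approximations[1:]):
        print(f"{b:.7f} agrees with previous to ~{agreeing_digits(a, b):.1f} digits")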

Patterns

Takeaways

Indicator                 Addressed %
New Scientific Findings   54.92
Computational Accuracy    53.57
Missing Edge Cases        50.41
Translation Challenges    44.74
Assumptions               41.38

Takeaways

  • The rate at which Scientific Debt is addressed is significantly lower than the rate at which it is introduced.
  • The removal of SD often parallels its introduction, indicating that solutions are often implemented soon after SD is identified.

Interviews

Interview Themes/Findings

  • Method: recruit 12-15 scientific software developers from large RSE projects for one-hour interviews
  • Cover astronomy, math, high energy physics, and climate science
  • Focus on perceptions of technical debt and project problems
  • Briefly cover training and process
  • After transcription and correction, code the data using thematic analysis to derive themes

People and Process I

  • Scientists generally feel comfortable with their basic code skills
    • reading Fortran, using GitHub, writing tests
  • GitHub etc. have been massive productivity boosts
  • Gaps exist in making code better, especially in HPC/parallelization and in managing projects
  • LLMs are starting to help pluck some of this low-hanging fruit
    • But do not seem to help with domain complexity

People and Process II

  • It is pretty easy to train domain experts to engineer software (beyond coding) ONCE the effort has been made
    • These tend to be self-motivated, intelligent people!
  • The challenge for many groups is jumping the chasm between “my project” and “community project.”
  • Ongoing challenges in recognition: paper citations vs. code, career changes, long-term budgets.
  • This area seems to be improving though (thanks to people like Reed!)

Managing TD

  • Have an active TD plan: how much is allowed, how is it tracked, what are the main sources, what is the interest
    • E.g., how slowly are features released? Do we have existential dread about modifying a given file?
    • Do you know about all the TD in your projects? Where know == explicitly tracked in a tool (a tracking sketch follows this list).
  • A lot of TD comes from paper/proposal-deadline branches later merged into the main code base, or from influential PIs insisting on it.
  • So many interviews talk about documentation!
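
What “explicitly tracked in a tool” could look like, as a minimal sketch; the DebtItem fields and the example entry are assumptions for illustration, not a prescription:

    from dataclasses import dataclass, field
    from datetime import date

    @dataclass
    class DebtItem:
        summary: str
        source: str    # e.g., "paper-deadline branch", "PI request", "dependency"
        interest: str  # how it hurts: slow releases, dread of touching a file
        recorded: date = field(default_factory=date.today)
        repaid: bool = False

    register = [
        DebtItem(
            summary="Hard-coded cluster paths in run scripts",
            source="paper-deadline branch",
            interest="every new machine needs hand edits",
        ),
    ]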

Artifacts and Obstacles I

  • Learning how to test, in software, the complex, non-linear, stochastic processes of science (see the sketch after this list)
  • Who is affected? Is it just our team, and acceptable? Or maybe a user we don’t know, who can’t get it to work?
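
One common pattern for this, sketched with assumed names and a stand-in model, not any project’s actual suite: fix the seed so the stochastic process is reproducible, then assert a statistical tolerance rather than exact values:

    import numpy as np

    def simulate_mean(seed, n=100_000):
        """Stand-in for a stochastic model; returns the mean of n draws."""
        rng = np.random.default_rng(seed)
        return rng.normal(loc=1.0, scale=0.5, size=n).mean()

    def test_simulation_mean():
        # Seeded, so the test is deterministic; the tolerance is ~3 standard
        # errors, checking the science without demanding bit-identical output.
        result = simulate_mean(seed=42)
        assert abs(result - 1.0) < 3 * 0.5 / np.sqrt(100_000)

    test_simulation_mean()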

Artifacts and Obstacles II

  • At least three codebase types:
    • the code of the Model (CESM), often in Fortran. (Theory/Domain/SW)
    • The code of the deployment of that model to a given cluster and architecture (Makefiles, scheduler scripts). (Exec)
    • The code of the analysis suite (post-processing, data cleaning, plotting) often in Python or R. (Operation)
    • These imply totally different TD management approaches!

Takeaways

  • TD is scariest when its interest is subtle, e.g., when defects do not blow up the model in tests but instead cause silent errors.
  • This stuff matters because we make decisions based on this science, and the science is increasingly computational!
  • But operating without debt would also be bad; we wouldn’t be doing good engineering and trying new things!
  • Integrate diverse knowledge domains to develop more effective strategies for research software development.


Thanks

Thanks to my students Ahmed Awon, Vivienne Zeng, Swapnil Hingmire, and collaborators Shurui Zhou (Toronto), Rohith Pudari, and the SE4RSE slack channel

Neil Ernst, nernst@uvic.ca, @neilernst@mastodon.acm.org

Come find me at US-RSE next week at the Convention Center, or when I visit Sandia Friday Oct 18!

Always happy to chat and listen to interesting insights or things we may have missed!