Technical Debt in Research Software

Neil Ernst

2024-10-11

There Isn’t Any

thank you and goodbye

About Me

  • Associate professor in Computer Science, University of Victoria (about 3 hrs by passenger ferry from Seattle)
  • Four years consulting/research at Software Engineering Institute, CMU/Pittsburgh (an FFRDC)
  • Research: software design in the LLM era, software research for/on research software, analytical sovereignty
  • Experiences: working with the FDA on modernization, with the Square Kilometre Array on long-term design, and with the Canadian Space Agency on remote sensing tools.

Outline

  1. What is TD?
  2. Knowledge Domains
  3. Self-Admitted TD in Research Code
  4. Perceptions of TD
  5. Takeaways

Definitions / Poll

Cunningham

Ward Cunningham’s definition: it is good, even essential, to go into a little bit of debt if it supports learning and the debt is repaid

TD as a motivator/support for an agile and iterative approach to software development (1992!)

McConnell (and mine)

Technical debt occurs when a design or construction approach is taken that’s expedient in the short term, but that creates a technical context that increases complexity and cost in the long term.

Steve McConnell (Code Complete)

External/Internal

The source of the TD matters.

  • Internal TD: we did something (to learn more / to get the paper published / because the funder said so) and now we have to fix it
  • External TD: someone else did something and now my project has to be updated in non-interesting ways (i.e., excluding “new scientific findings”).
  • Now your fluid dynamics experts have suddenly also become security programmers because an authentication package is critical!

But …

  • If we make a responsible decision in 1995 (e.g., for gridding by lat/lon) is it really TD if that becomes unsuitable in 2024?
  • In some ways it doesn’t matter! Treat it as a process check: are we tracking these impactful choices and their consequences? (A minimal sketch follows this list.)
  • Worst thing to do is make design choices and not know why they were made
  • Debt is leverage: every code base should have some (~10-20%)
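
To make “tracking these impactful choices” concrete, here is a minimal sketch of a decision-record entry. The fields and the gridding example details are illustrative assumptions, not from any particular project:

    # A minimal, hypothetical decision record: enough to answer
    # "why was this choice made, and when should we revisit it?"
    decision_record = {
        "date": "1995-06-01",
        "decision": "grid observations by lat/lon",
        "rationale": "matches instrument output; simple to implement",
        "consequences": "resolution tied to grid scale; distortion near poles",
        "revisit_when": "resolution requirements or community grid formats change",
    }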

Knowledge Domains

Research software development requires cross-domain knowledge.

Domain (real-world) and Science (theory) knowledge

  • What you get from undergrad / postgrad work.
  • What papers get published on (mostly).

Software Knowledge

  • How to bring the theory and domain understanding into Fortran/Julia/Numpy
  • Bridge the finite, quantized, bit-oriented software world and the continuous, stochastic natural world (one small illustration below)
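
One tiny, standard illustration of that bridge (mine, not from the talk): binary floating point cannot represent most decimal fractions exactly, so comparisons against the continuous world need tolerances:

    import math

    # The finite, bit-oriented world: 0.1 and 0.2 have no exact binary
    # representation, so their sum is not exactly 0.3.
    print(0.1 + 0.2 == 0.3)              # False
    print(0.1 + 0.2)                     # 0.30000000000000004

    # Compare with a tolerance instead of exact equality.
    print(math.isclose(0.1 + 0.2, 0.3))  # True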

Execution Knowledge

  • Make the software work in a hardware and team environment
  • Deployment: cluster and hardware specifics. Makefiles, config files, XML.
  • Development: Git, GitHub, pull requests, code review.
  • Often not well taught in school, not well supported by tools
  • My feeling: the most under-appreciated domain

Operational Knowledge

  • Take the outcomes of the first four, and do something
    • data analysis, policy recommendations, new science
  • Crossing knowledge domains is a good way to inadvertently incur technical debt!


Self-Admitted Technical Debt in Research Software

Definition

The intentional acknowledgment of technical debt by developers through comments, commit messages, or documentation within the code.

# TODO: fix this poor approximation

Method

  • Choose 9 representative projects as samples
  • Identify 28,680 SATD comments using keyword searches and manual classification (a minimal keyword-scan sketch follows this list)
  • Manually categorize comments using card sorting
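
A minimal sketch of the keyword-scan step, assuming a typical SATD keyword list (the actual study also covered Fortran/C++ sources and followed the scan with manual classification):

    import re
    from pathlib import Path

    # Common SATD markers; this particular list is an assumption, not the study's.
    SATD_PATTERN = re.compile(r"\b(TODO|FIXME|HACK|XXX|WORKAROUND)\b", re.IGNORECASE)

    def find_satd(root, glob="*.py"):
        """Yield (file, line number, text) for lines matching SATD keywords."""
        for path in Path(root).rglob(glob):
            for lineno, line in enumerate(
                path.read_text(errors="ignore").splitlines(), start=1
            ):
                if SATD_PATTERN.search(line):
                    yield path, lineno, line.strip()

    for hit in find_satd("src"):
        print(*hit)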

Projects

Project     Scientific Domain     Contribs.   Project Type     Code Size (KLOC)   Age (Yrs)
Astropy     Astronomy             453         Python Library   1,309              13
Athena      High Energy Physics   100+        Software         5,208              19
Biopython   Molecular Biology     331         Python Library   620                25
CESM        Climate Modeling      134         Software         2,800              41
GROMACS     Molecular Dynamics    85          Software         2,102              27
Moose       Physics               221         Framework        848                16
Elmer       Mathematics           45          Software         954                10
Firedrake   Mathematics           96          Software         63                 11
Root        High Energy Physics   387         Framework        5,080              24

Scientific SATD

Accumulation of suboptimal scientific practices, assumptions, and inaccuracies within scientific software that potentially compromise the validity, accuracy, and reliability of scientific results.

Indicators

  • Translation Challenges
  • Assumptions
  • New Scientific Findings
  • Missing Edge Cases
  • Computational Accuracy

Translation

We are going to share n_eff between the neutrinos equally. In detail, this is not correct, but it is a standard assumption because properly calculating it is (a) complicated (b) depends on the details of the massive neutrinos (e.g., their weak interactions, which could be unusual if one is considering sterile neutrinos).

Astropy

Assumptions

We assume here that new ice arrives at the surface with the same temperature as the surface. TODO: Make sure this assumption is consistent with energy conservation..

CESM

New Findings

Update the the instability calculation function and modify the neutral drag cofficient. We should follow more elegant approach like Louis et al (1982) or Bretherton and Park (2009) but for now we use very crude approach : just 1 for ri < 0, 0 for ri > 1, and linear ramping.

CESM

Missing Edge Cases

TODO(wjs, 2015-10-18) With the ‘qflx_snow_grnd_col(c) > 0.0_r8’ check in the following conditional, Leo van Kampenhout has found that, under some rare conditions, the snow pack does not get initialized when it should. However, if this check is removed, then under some different rare conditions, snow depth can grow infinitely. We clearly need a more robust check here, and/or some fixed logic elsewhere. But for now we’re going with the lesser of the two bugs - i.e., allowing the snow pack to not be initialized when it should under some conditions.

CESM

Accuracy

Results: f’’(0) never really seems to converge to more than about 4 digits? Not sure how much this really affects the solution we get for f itself

MOOSE
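
A comment like the MOOSE one can be turned into an explicit check. Here is a sketch of one way to estimate how many digits successive approximations agree to; the function and the example values are illustrative, not MOOSE’s actual test:

    import math

    def agreeing_digits(prev, curr):
        """Rough count of decimal digits two successive approximations share."""
        if prev == curr:
            return float("inf")
        rel = abs(curr - prev) / max(abs(curr), 1e-300)
        return -math.log10(rel)

    # Made-up successive refinements of a quantity like f''(0).
    approximations = [0.46960, 0.46963, 0.469625, 0.4696244]
    for a, b in zip(approximations, approximations[1:]):
        print(f"{b:.7f} agrees with previous to ~{agreeing_digits(a, b):.1f} digits")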

Patterns

Takeaways

Indicator                 Addressed %
New Scientific Findings   54.92
Computational Accuracy    53.57
Missing Edge Cases        50.41
Translation Challenges    44.74
Assumptions               41.38

Takeaways

  • The rate at which Scientific Debt is addressed is significantly lower than the rate at which it is introduced.
  • The removal of SD often parallels its introduction, indicating that solutions are often implemented soon after SD is identified.

Interviews

Interview Themes/Findings

  • Method: recruit 12-15 scientific software developers from large RSE projects for one-hour interviews
  • Cover astronomy, math, high energy physics, and climate science
  • Focus on perceptions of technical debt and project problems
  • Briefly cover training and process
  • After transcription and correction, code the data using thematic analysis to derive themes

People and Process I

  • Scientists generally feel comfortable with their basic code skills
    • reading Fortran, using GitHub, writing tests
  • GitHub etc. have been massive productivity boosts
  • Gaps exist in making code better, especially in HPC/parallelization and in managing projects
  • LLMs are starting to help pluck some of this low-hanging fruit
    • But do not seem to help with domain complexity

People and Process II

  • It is pretty easy to train domain experts to engineer software (beyond coding) ONCE the effort has been made
    • These tend to be self-motivated, intelligent people!
  • The challenge for many groups is jumping the chasm between “my project” and “community project.”
  • Ongoing challenges in recognition: paper citations vs. code, career changes, long-term budgets.
  • This area seems to be improving though (thanks to people like Reed!)

Managing TD

  • Have an active TD plan: how much is allowed, how is it tracked, what are the main sources, what is the interest
    • E.g., how slowly are features released? Do we have existential dread about modifying a given file?
    • Do you know about all the TD in your projects? Where know == explicitly tracked in a tool (a tracking sketch follows this list).
  • A lot of TD comes from paper/proposal-deadline branches later merged into the main code base, or from influential PIs insisting on it.
  • So many interviews talk about documentation!
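
What “explicitly tracked in a tool” could look like, as a minimal sketch; the DebtItem fields and the example entry are assumptions for illustration, not a prescription:

    from dataclasses import dataclass, field
    from datetime import date

    @dataclass
    class DebtItem:
        summary: str
        source: str    # e.g., "paper-deadline branch", "PI request", "dependency"
        interest: str  # how it hurts: slow releases, dread of touching a file
        recorded: date = field(default_factory=date.today)
        repaid: bool = False

    register = [
        DebtItem(
            summary="Hard-coded cluster paths in run scripts",
            source="paper-deadline branch",
            interest="every new machine needs hand edits",
        ),
    ]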

Artifacts and Obstacles I

  • Learning how to test, in software, the complex, non-linear, stochastic processes of science (see the sketch after this list)
  • Who is affected? Is it just our team, and acceptable? Or maybe a user we don’t know, who can’t get it to work?
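
One common pattern for this, sketched with assumed names and a stand-in model, not any project’s actual suite: fix the seed so the stochastic process is reproducible, then assert a statistical tolerance rather than exact values:

    import numpy as np

    def simulate_mean(seed, n=100_000):
        """Stand-in for a stochastic model; returns the mean of n draws."""
        rng = np.random.default_rng(seed)
        return rng.normal(loc=1.0, scale=0.5, size=n).mean()

    def test_simulation_mean():
        # Seeded, so the test is deterministic; the tolerance is ~3 standard
        # errors, checking the science without demanding bit-identical output.
        result = simulate_mean(seed=42)
        assert abs(result - 1.0) < 3 * 0.5 / np.sqrt(100_000)

    test_simulation_mean()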

Artifacts and Obstacles II

  • At least three codebase types:
    • the code of the Model (CESM), often in Fortran. (Theory/Domain/SW)
    • The code of the deployment of that model to a given cluster and architecture (Makefiles, scheduler scripts). (Exec)
    • The code of the analysis suite (post-processing, data cleaning, plotting) often in Python or R. (Operation)
    • These imply totally different TD management approaches!

Takeaways

  • TD is scariest when its interest is subtle, e.g., when defects do not blow up the model in tests but instead cause silent errors.
  • This stuff matters because we make decisions based on this science, and the science is increasingly computational!
  • But operating without debt would also be bad; we wouldn’t be doing good engineering and trying new things!
  • Integrate diverse knowledge domains to develop more effective strategies for research software development.


Thanks

Thanks to my students Ahmed Awon, Vivienne Zeng, Swapnil Hingmire, and collaborators Shurui Zhou (Toronto), Rohith Pudari, and the SE4RSE slack channel

Neil Ernst, nernst@uvic.ca, @neilernst@mastodon.acm.org

Come find me at US-RSE next week at the Convention Center, or when I visit Sandia Friday Oct 18!

Always happy to chat and listen to interesting insights or things we may have missed!