What is unit testing?

Last updated on 2026-06-29

Overview

Questions

How do I know my code is correct?
What is a unit test and how does it relate to how I already think about correctness?

Objectives

Explain what a unit test is in terms of inputs, expected outputs, and assertions
Connect the practice of unit testing to the scientific method and scientific integrity
Distinguish between verification and validation in the context of scientific software
Identify functions in their own code that would be candidates for unit testing
Distinguish between unit testing and other forms of verification such as running on known data

Why test?

Computers only do exactly what we tell them to, and we are fallible, so we should assume that mistakes, i.e. bugs, will happen. Scientists often fall into the trap of thinking "I am smart, so my code must be correct", but software is no different from any other piece of experimental apparatus. We wouldn't plug a new voltage sensor into a dark matter experiment and take data without testing it first (e.g. calibrating it), and we should follow the same procedure to verify and validate the behaviour of software used to produce scientific outputs:

Verification: Confirm that the code implements a specific numerical technique or theoretical model correctly.
Validation: Confirm that the code is “calibrated” against known existing data or independent calculations.

This is no more or less than applying the scientific method to software development. Aside from increasing confidence in our published results, it also frees up time for research that would otherwise be spent tracking down bugs.

Caution

Still don't think it's important? There have been, and continue to be, retractions of papers due to undetected software bugs, several of which are listed at danielskatz/errors-due-to-research-software.

That said, testing is only a mitigation strategy, as we'll see.

Types of testing

Just as we build up an experiment from components, we build up software from units: functions, classes, and data structures. For example, we might want a function to calculate the invariant mass of a particle. Within the ccptepp-test/ directory, open a new file in the test/ subdirectory called test_invariant_mass.cpp. Let's sketch out the function as follows:

CPP

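//! \file test_invariant_mass.cpp
#include <cmath>
#include <stdexcept>

// Invariant mass m = sqrt(E^2 - p^2), in natural units (c = 1).
// Throws std::domain_error for unphysical inputs.
double invariant_mass(double energy, double momentum)
{
   if (energy < 0)
   {
      throw std::domain_error("unphysical negative energy");
   }
   double mass_squared = energy*energy - momentum*momentum;
   if (mass_squared < 0)
   {
      // |p| > E: no real mass exists for this four-momentum.
      throw std::domain_error("unphysical mass^2");
   }
   return std::sqrt(mass_squared);
}

int main()
{
   // A photon has E = |p|, so its invariant mass should be zero.
   double photon_mass = invariant_mass(100, 100);
}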

The most basic safeguard in code is defensive programming, which adds checks or exceptions to the function itself to report or handle errors: in this case, unphysical inputs such as energy < 0 or a negative mass squared. We can try compiling this and running it from the ccptepp-test/ directory as follows:

Linux

BASH

g++ -std=c++17 test/test_invariant_mass.cpp -o test_invariant_mass
./test_invariant_mass

macOS

BASH

clang++ -std=c++17 test/test_invariant_mass.cpp -o test_invariant_mass
./test_invariant_mass

As written, test_invariant_mass is an example of a smoke test, i.e. "run it and see if it breaks". As with defensive programming, though, this only tells us when something fails outright, not whether the function gives the right answer in normal operation. This is where unit testing comes in: we verify that a specific unit, here the invariant_mass function, returns the value(s) we expect for given input(s) across its domain of applicability.

Effectively, unit testing is nothing more than the application of the scientific method to our code:

Step	Scientific Method	Software Unit Testing
Hypothesis	I predict this physical system behaves like X.	I expect this function to return `Y` when given `X`.
Experiment	Run a controlled trial isolating variables.	Run an isolated test passing known values of `X` into the function.
Analysis	Check if the data matches the prediction.	Verify that the returned `Y` matches the expected result.
Iteration	Update the theory based on anomalies.	Fix any code bugs exposed by failing verification.
Reproducibility	Independent experiments should reach the same conclusion.	Re-running the tests should always give the same result.

Of course, software consists of many units, so unit testing is only part of the story:

Scientific Stage	Experimental Stage	Software Stage	Testing Type
Isolating Variables	Calibrating a single voltage sensor.	Testing a single C++ function with hardcoded inputs.	Unit Testing
Assembling Apparatus	Hooking voltage and light sensors up to the electronics crates and ensuring data flows across the system.	Verifying that the I/O, kinematics, and histogramming code work together.	Integration Testing
Reproducibility	Modifying a detector component and verifying it still reproduces last week’s baseline calibration.	Modifying code for speed or features, then ensuring it still yields identical results in a known test.	Regression Testing

Challenge

Layering Tests for Defense

Imagine you are working in a major physics collaboration. Below are three critical software failures that could happen during a data-taking run:

A: A minus sign is changed to a plus sign in a coordinate transformation function. Now, whenever a particle travels in the negative z-direction, its position is completely wrong.
B: A multi-threaded track fitting loop is optimized to run 4x faster. The code runs smoothly, but it subtly truncates the decimal precision of track parameters, shifting your final calculated Higgs mass peak by 2 GeV compared to last year’s published configuration.
C: Your kinematic calculator works perfectly in isolation, as does your file reader. However, when the two are combined, the file reader outputs data as single-precision floats while the calculator expects double-precision arrays, causing a segmentation fault.

Question: Match each failure to the specific testing layer (Unit, Integration, or Regression) that acts as the primary shield to catch it before you submit a paper.

Solution

Layers provide defense in depth:

A is caught by a Unit Test: You can catch this immediately by passing a single, known four-vector with a negative z-value into that specific function and checking the output against a hand-calculation.
B is caught by a Regression Test: This requires running a benchmark “control dataset” through the new optimized code and asserting that the output physics distributions perfectly overlap with your historical baseline data.
C is caught by an Integration Test: Neither individual unit test will catch this because both pieces work fine alone. You only see the type mismatch when running the components together.

Integration tests are often treated as the "gold standard", especially for scientific codes, because they validate against actual experimental data, so much of the focus goes on them. However, because they operate at a higher level, when one fails, triaging which part of the code caused the problem, and more importantly why, can take a lot of work if there are no unit or regression tests to provide a guide.

Callout

Unit and integration tests become regression tests as soon as they are written: re-running them later checks that behaviour has not changed. As bugs are, inevitably, identified, new unit or integration tests are written to:

Reproduce the bug and cause a failure.
Give developers a concrete test case to help locate and fix the bug.
Prevent the same regression in the future, since these tests are retained in the codebase.

This course will focus on unit testing as a foundation for testing scientific software, but don’t forget about the other types!

Key Points

Testing is formalising what you already do informally when you verify your code
Verification confirms that code implements a model correctly; validation confirms the model describes nature correctly. Unit testing addresses verification.
Unit tests check that a single function produces the expected output for a given input
Integration tests make sure that code units work together properly.
Regression tests ensure that everything works the same today as it did yesterday.