Organizing code to enable unit testing
Last updated on 2026-06-30
Estimated time: 12 minutes
Overview
Questions
- How should we structure C++ code to assist unit testing?
- What makes a function easy or hard to test?
Objectives
- Split a single-file C++ program into a header, an implementation file, and a separate test file
- Explain why separating test code from production code matters
- Identify properties of a function that make it easy to test: clear inputs, clear outputs, no hidden dependencies
- Identify at least three structural problems in a given function that make it difficult to test
- Propose a refactoring of a function with testability problems into smaller, testable units
- Explain why the question “how would I ensure this refactoring does not change behaviour?” motivates writing tests before refactoring
C++ Package Organization
At present, we have both the unit of code we want to test and the
test code in a single file. Practically, the invariant_mass
function is more likely to be part of a larger C++
project/package that compiles a large set of functions and
classes into an end-user program or a library of reusable,
pre-compiled code.
In terms of testing, this means that we want to separate the program/library interface and implementation code from that which tests it. Unlike some languages, the ISO C++ Standard does not enforce or require a specific directory layout of package implementation and testing code, leaving this up to the package maintainers. For this lesson, we will organise our code into the following directories:
+- ccptepp-test/
   +- src/
   |    ... headers declaring interfaces and implementation files defining them ...
   +- test/
        ... unit tests for the interfaces declared in src/ ...
Splitting test_invariant_mass into a header, implementation, and test program
Let’s start by splitting the invariant_mass function out
from the test program. Open a new header file
invariant_mass.hpp in src/ and move the
function from test/test_invariant_mass.cpp into it:
CPP
//! \file invariant_mass.hpp
#pragma once // header guard
#include <cmath>
#include <stdexcept>
// 1. Returns invariant mass m = sqrt(E^2 - p^2) in natural units
// 2. throws std::domain_error if E < 0
// 3. throws std::domain_error if E^2 - p^2 < 0
double invariant_mass(double energy, double momentum)
{
if (energy < 0)
{
throw std::domain_error("unphysical negative energy");
}
double mass_squared = energy*energy - momentum*momentum;
if (mass_squared < 0)
{
throw std::domain_error("unphysical mass^2");
}
return std::sqrt(mass_squared);
}
We can now modify test_invariant_mass.cpp to simply
include this header to provide the function interface.
CPP
//! \file test_invariant_mass.cpp
#include "invariant_mass.hpp" // Include the interface for what we're testing
// Run the tests
int main()
{
double photon_mass = invariant_mass(100,100);
}
We now need to tell the compiler where to find the new header using
-I to specify where it should look, but otherwise
everything is as before:
Since invariant_mass is so simple, we could
leave the implementation inline in the header, but most code separates
the interface from the implementation:
- Users of the code are only interested in the interface, not the details of the implementation.
- The implementation is compiled once, rather than in every file that includes the header, which can reduce build times.
Start by providing a declaration for
invariant_mass in invariant_mass.hpp:
CPP
//! \file invariant_mass.hpp
#pragma once // header guard
#include <cmath>
#include <stdexcept>
// declaration
double invariant_mass(double energy, double momentum);
// implementation (or "definition")
double invariant_mass(double energy, double momentum)
{
if (energy < 0)
{
throw std::domain_error("unphysical negative energy");
}
double mass_squared = energy*energy - momentum*momentum;
if (mass_squared < 0)
{
throw std::domain_error("unphysical mass^2");
}
return std::sqrt(mass_squared);
}
Now create a file src/invariant_mass.cpp and move the
definition of invariant_mass into it:
CPP
//! \file invariant_mass.cpp
// Our declaration
#include "invariant_mass.hpp"
#include <cmath>     // std::sqrt
#include <stdexcept> // std::domain_error
// implementation (or "definition")
double invariant_mass(double energy, double momentum)
{
if (energy < 0)
{
throw std::domain_error("unphysical negative energy");
}
double mass_squared = energy*energy - momentum*momentum;
if (mass_squared < 0)
{
throw std::domain_error("unphysical mass^2");
}
return std::sqrt(mass_squared);
}
We then clean up the header to:
CPP
//! \file invariant_mass.hpp
#pragma once // header guard
#include <stdexcept>
double invariant_mass(double energy, double momentum);
We now need to tell the compiler to also compile
invariant_mass.cpp when it builds
test_invariant_mass:
Overall, this isn’t much different from what we already have, but we have decoupled what we test from how we test it. The price of this has been a more complex compilation command, which we will address in a later episode.
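With the interface split out, the test file only needs the declaration of invariant_mass. The following is a minimal sketch of how test/test_invariant_mass.cpp might grow, assuming plain assert rather than a test framework. The helpers approx_equal, throws_domain_error, and test_invariant_mass are hypothetical names introduced for illustration, and the definition of invariant_mass is repeated only so that the block compiles on its own; in the real layout it would come from src/.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>

// Repeated from src/invariant_mass.cpp so that this sketch is self-contained.
double invariant_mass(double energy, double momentum)
{
  if (energy < 0)
  {
    throw std::domain_error("unphysical negative energy");
  }
  double mass_squared = energy*energy - momentum*momentum;
  if (mass_squared < 0)
  {
    throw std::domain_error("unphysical mass^2");
  }
  return std::sqrt(mass_squared);
}

// Hypothetical helper: compare two doubles with an absolute tolerance.
bool approx_equal(double a, double b, double tolerance = 1e-9)
{
  return std::fabs(a - b) <= tolerance;
}

// Hypothetical helper: true if the call throws std::domain_error.
bool throws_domain_error(double energy, double momentum)
{
  try
  {
    invariant_mass(energy, momentum);
  }
  catch (const std::domain_error&)
  {
    return true;
  }
  return false;
}

// One test function per unit; a test program's main() would call it.
void test_invariant_mass()
{
  assert(approx_equal(invariant_mass(100.0, 100.0), 0.0)); // massless particle
  assert(approx_equal(invariant_mass(5.0, 3.0), 4.0));     // 3-4-5 triple
  assert(throws_domain_error(-1.0, 0.0));                  // negative energy
  assert(throws_domain_error(1.0, 2.0));                   // E^2 - p^2 < 0
}
```

Each assertion maps onto one line of the contract documented in the header comment, which is one way to check that the tests cover the interface rather than the implementation.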
C++ Design to Assist Unit Testing
We often write code iteratively based on developing or urgent
research needs. This is not bad practice per se, but without
care it can lead to code that becomes very difficult to test. Let’s say
we’ve been working on an analysis to identify Z boson candidates. We’ve
written invariant_mass to help us, and we’ve now got to the
point that our code looks like this:
CPP
#include <iostream>
#include <fstream>
#include <cmath>
#include <stdexcept>
#include <string>
#include "invariant_mass.hpp"
double g_energy_scale = 1.0;
void process_candidates(const std::string& filename) {
std::ifstream file(filename);
if (!file.is_open()) {
std::cerr << "Could not open file: " << filename << std::endl;
return;
}
int n_candidates = 0;
int n_physical = 0;
double sum_mass = 0.0;
double energy, px, py, pz;
while (file >> energy >> px >> py >> pz) {
++n_candidates;
energy *= g_energy_scale;
double momentum = std::sqrt(px*px + py*py + pz*pz);
try {
double mass = invariant_mass(energy, momentum);
++n_physical;
sum_mass += mass;
if (mass > 70.0 && mass < 110.0) {
std::cout << "Z candidate found with mass "
<< mass << " GeV" << std::endl;
}
} catch (const std::domain_error&) { // invariant_mass throws std::domain_error
std::cout << "Unphysical candidate, skipping." << std::endl;
}
}
if (n_physical > 0) {
std::cout << "Mean mass: " << sum_mass / n_physical
<< " GeV" << std::endl;
}
std::cout << "Processed " << n_candidates << " candidates, "
<< n_physical << " physical." << std::endl;
}
Part 1 — Identify the problems
For each of the following properties, decide whether
process_candidates() has it and explain in one sentence why
it matters for testing:
- Does the function depend only on its explicit parameters?
- Does it separate mathematical computation from file I/O and output?
- Does it do one thing, or several?
- Does it depend on any state defined outside the function?
- Are all the values that control its behaviour visible in its signature?
- No. The result depends on g_energy_scale, which is not a parameter. A test cannot control or predict the output without also setting the global, and any other code that modifies the global between tests will silently change the result.
- No. File reading, arithmetic, and printing are all interleaved in the same loop. To test the mass calculation you must provide a real or carefully constructed file, and to check the result you must capture stdout; neither is straightforward.
- No. It reads a file, applies an energy correction, computes momenta, calls invariant_mass(), applies a mass window cut, accumulates statistics, and prints a summary. Each of these is a candidate for an independent unit.
- Yes. It needs the global g_energy_scale. See above.
- No. The mass window cuts 70.0 and 110.0 are hardcoded in the body. A test cannot vary them without editing the source, and a reader of the function signature has no indication they exist.
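The hidden-dependency problem can be shown in a few lines. This is a minimal sketch, not part of the analysis code: scaled_energy_hidden, scaled_energy_explicit, and demonstrate_hidden_state are hypothetical names, and only g_energy_scale comes from the listing above.

```cpp
#include <cassert>

double g_energy_scale = 1.0; // global, as in the original listing

// Hidden dependency: the result depends on state that the signature does not show.
double scaled_energy_hidden(double energy)
{
  return energy * g_energy_scale;
}

// Explicit dependency: everything that affects the result is a parameter.
double scaled_energy_explicit(double energy, double scale)
{
  return energy * scale;
}

void demonstrate_hidden_state()
{
  double before = scaled_energy_hidden(100.0);
  g_energy_scale = 2.0; // some other code changes the global
  double after = scaled_energy_hidden(100.0);
  assert(before != after); // same call, different answer

  // The explicit version gives the same answer for the same arguments.
  assert(scaled_energy_explicit(100.0, 2.0) == scaled_energy_explicit(100.0, 2.0));

  g_energy_scale = 1.0; // a test must remember to restore the global
}
```

The final line of demonstrate_hidden_state is the kind of clean-up that is easy to forget, and forgetting it changes the outcome of whichever test runs next.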
Part 2 — Consequences for testing
For each problem you identified, describe a concrete testing difficulty it causes. Try to be specific: what test would you want to write, and why can you not write it cleanly against the current code?
Global state: We want to test the effect of applying a scale factor of 1.1 to the energy. We cannot do this without setting g_energy_scale = 1.1 before the call and resetting it afterwards, and if two tests run concurrently, or another function modifies it, the test result is unreliable.
File I/O entangled with computation: We want to test that a particle with energy \(100\) GeV and momentum \(50\) GeV produces a mass of approximately \(86.6\) GeV. To do this we must write those values to a temporary file, pass the filename to the function, and parse stdout to check the result. This is fragile, slow, and tests far more than the mass calculation.
Mega-function: We want to test the Z candidate selection independently: specifically, that a mass of \(69.9\) GeV is not selected and \(70.1\) GeV is. There is no way to call just that logic; we must run the entire pipeline to exercise it.
Magic numbers: We want to test the mass window boundary conditions. The values 70.0 and 110.0 are buried in the source, so we cannot pass different values in a test without editing the code, which means we would be testing a different program than the one in production.
Part 3 — Propose a restructuring
Sketch a set of smaller functions that together reproduce the
behaviour of process_candidates(), but where each part can
be tested independently. Function signatures and a one-sentence
description of what you would test for each are sufficient — you do not
need to write the implementations.
CPP
#include <string>
#include <vector>
// Pure mathematical unit — we already have this!
double invariant_mass(double energy, double momentum);
// Pure mathematical unit: magnitude of 3-momentum
// Test: momentum_magnitude(3.0, 4.0, 0.0) == 5.0 (Pythagorean triple)
// Test: momentum_magnitude(0.0, 0.0, 0.0) == 0.0
double momentum_magnitude(double px, double py, double pz);
// Pure function: apply a multiplicative scale to an energy value
// Test: apply_energy_scale(100.0, 1.1) is approximately 110.0 (compare with a tolerance: 1.1 is not exactly representable)
// Test: apply_energy_scale(100.0, 1.0) == 100.0 (identity)
double apply_energy_scale(double energy, double scale);
// Pure function: test whether a mass falls within a window
// Test: is_z_candidate(91.2, 70.0, 110.0) == true
// Test: is_z_candidate(69.9, 70.0, 110.0) == false (boundary)
// Test: is_z_candidate(110.0, 70.0, 110.0) == false (upper boundary exclusive?)
bool is_z_candidate(double mass, double mass_min, double mass_max);
// Operates on data already in memory; returns results as values not printout.
// energy_scale passed explicitly — no global state.
// Test: empty vectors return n_candidates == 0, n_physical == 0
// Test: one physical candidate returns correct mean mass
// Test: one unphysical candidate (E^2 < p^2) is counted but excluded from mean
struct CandidateSummary {
int n_candidates;
int n_physical;
double mean_mass;
std::vector<double> z_candidate_masses;
};
CandidateSummary analyse_candidates(const std::vector<double>& energies,
const std::vector<double>& px,
const std::vector<double>& py,
const std::vector<double>& pz,
double energy_scale,
double mass_min,
double mass_max);
// I/O boundary: reads file, calls analyse_candidates, prints summary.
// Not directly unit tested — but now thin enough that there is little
// logic here to get wrong.
void process_candidates(const std::string& filename,
double energy_scale,
double mass_min,
double mass_max);
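The three pure functions are small enough to sketch in full. The implementations below are one possible reading of the signatures above, not the lesson's reference solution. Two assumptions are made: both window bounds are exclusive, matching the `mass > 70.0 && mass < 110.0` test in the original listing, and floating-point results that are not exactly representable are compared with a hypothetical approx_equal helper.

```cpp
#include <cassert>
#include <cmath>

// Magnitude of a 3-momentum vector.
double momentum_magnitude(double px, double py, double pz)
{
  return std::sqrt(px*px + py*py + pz*pz);
}

// Multiplicative energy correction; the scale is an explicit parameter.
double apply_energy_scale(double energy, double scale)
{
  return energy * scale;
}

// Assumption: both bounds are exclusive, as in the original listing.
bool is_z_candidate(double mass, double mass_min, double mass_max)
{
  return mass > mass_min && mass < mass_max;
}

// Hypothetical helper: compare two doubles with an absolute tolerance.
bool approx_equal(double a, double b, double tolerance = 1e-9)
{
  return std::fabs(a - b) <= tolerance;
}

void test_pure_units()
{
  // Exact comparisons are safe here: the inputs and results are exactly representable.
  assert(momentum_magnitude(3.0, 4.0, 0.0) == 5.0);
  assert(momentum_magnitude(0.0, 0.0, 0.0) == 0.0);
  assert(apply_energy_scale(100.0, 1.0) == 100.0);

  // 1.1 has no exact binary representation, so compare with a tolerance.
  assert(approx_equal(apply_energy_scale(100.0, 1.1), 110.0));

  assert(is_z_candidate(91.2, 70.0, 110.0));
  assert(!is_z_candidate(69.9, 70.0, 110.0));
  assert(!is_z_candidate(110.0, 70.0, 110.0)); // upper bound excluded by assumption
}
```

If the specification is changed to make the upper bound inclusive, only is_z_candidate and its last assertion need to change, which is the benefit of giving the cut its own function.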
Points worth drawing out in discussion:
- momentum_magnitude() is worth extracting even though it is a single line: it has a name, a clear contract, and can be tested with exact Pythagorean triples.
- is_z_candidate() makes the boundary conditions explicit and testable. Students should notice the question mark in the comment about whether the upper bound is inclusive or exclusive, and recognise this as a specification decision that needs to be made and documented.
- analyse_candidates() now takes all its inputs as parameters and returns all its outputs as values, so it can be tested without any files or output capture.
- process_candidates() still exists but is now just a thin I/O wrapper; the principle is not to eliminate I/O but to push it to the boundary.
  - We don’t have a unit test for it, and that’s intentional: it connects the units together, so exercising it is effectively an integration test.
  - This is exactly what we described in the first episode: unit tests tell you which component is broken; integration tests tell you that the components work together. If your integration test fails but all your unit tests pass, the bug is almost certainly in the way the units are connected, which is a much smaller place to look.
  - We haven’t completely eliminated the global energy scale: the caller might still need to read it from a global variable, but the functions we test no longer depend on it.
- This is mostly good software design practice, but thinking about it in terms of testing can help when making design decisions.
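To make the point about analyse_candidates() concrete, here is a minimal sketch of one way it could be implemented and tested entirely in memory. It is an illustration under stated assumptions, not the lesson's reference solution: the length check, the default member values in CandidateSummary, and the test function name are additions, and invariant_mass is repeated so that the block compiles on its own.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Repeated from src/invariant_mass.cpp so that this sketch is self-contained.
double invariant_mass(double energy, double momentum)
{
  if (energy < 0) throw std::domain_error("unphysical negative energy");
  double mass_squared = energy*energy - momentum*momentum;
  if (mass_squared < 0) throw std::domain_error("unphysical mass^2");
  return std::sqrt(mass_squared);
}

struct CandidateSummary {
  int n_candidates = 0;
  int n_physical = 0;
  double mean_mass = 0.0; // assumption: 0.0 when there are no physical candidates
  std::vector<double> z_candidate_masses;
};

CandidateSummary analyse_candidates(const std::vector<double>& energies,
                                    const std::vector<double>& px,
                                    const std::vector<double>& py,
                                    const std::vector<double>& pz,
                                    double energy_scale,
                                    double mass_min,
                                    double mass_max)
{
  // Assumption: mismatched input lengths are a caller error.
  if (px.size() != energies.size() || py.size() != energies.size() ||
      pz.size() != energies.size()) {
    throw std::invalid_argument("input vectors must have equal length");
  }
  CandidateSummary summary;
  double sum_mass = 0.0;
  for (std::size_t i = 0; i < energies.size(); ++i) {
    ++summary.n_candidates;
    double energy = energies[i] * energy_scale;
    double momentum = std::sqrt(px[i]*px[i] + py[i]*py[i] + pz[i]*pz[i]);
    try {
      double mass = invariant_mass(energy, momentum);
      ++summary.n_physical;
      sum_mass += mass;
      if (mass > mass_min && mass < mass_max) {
        summary.z_candidate_masses.push_back(mass);
      }
    } catch (const std::domain_error&) {
      // Unphysical candidate: counted, but excluded from the mean.
    }
  }
  if (summary.n_physical > 0) {
    summary.mean_mass = sum_mass / summary.n_physical;
  }
  return summary;
}

void test_analyse_candidates()
{
  CandidateSummary empty = analyse_candidates({}, {}, {}, {}, 1.0, 70.0, 110.0);
  assert(empty.n_candidates == 0 && empty.n_physical == 0);

  // Candidates: mass 4 (outside window), unphysical, mass 100 (inside window).
  CandidateSummary s = analyse_candidates({5.0, 1.0, 100.0}, {3.0, 2.0, 0.0},
                                          {0.0, 0.0, 0.0}, {0.0, 0.0, 0.0},
                                          1.0, 70.0, 110.0);
  assert(s.n_candidates == 3);
  assert(s.n_physical == 2);
  assert(s.mean_mass == 52.0); // (4 + 100) / 2
  assert(s.z_candidate_masses.size() == 1);
}
```

No file is opened and nothing is printed, so the test needs neither a temporary file nor output capture.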
Part 4 — Preserving behaviour
If you refactored process_candidates() into the
functions as above, how would you verify that the refactoring did not
change the behaviour of the program? What would you want to have in
place before you started, and what would you check at each step?
Before starting: characterise the existing behaviour with at least one end-to-end check. Run process_candidates() on a known input file and record the output. This becomes the reference to check against after each refactoring step. We are using this as an integration test and as a regression test.
During refactoring: extract one function at a time and keep the overall program runnable after each extraction. Check after each step that the end-to-end output is unchanged, i.e. we check that the new units integrate and do not introduce a regression.
After refactoring: the new unit tests for the extracted functions verify correctness at the unit level; the end-to-end check verifies that composition of the units produces the same overall behaviour as the original.
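Because the legacy function prints its results, recording a reference means capturing what it writes to std::cout. The sketch below shows one way to do that by temporarily swapping the stream buffer of std::cout. The names capture_stdout, legacy_report, and test_output_is_unchanged are hypothetical, and legacy_report is only a stand-in for process_candidates() so that the example needs no input file.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper: run `action` and return everything it wrote to std::cout.
std::string capture_stdout(const std::function<void()>& action)
{
  std::ostringstream captured;
  std::streambuf* original = std::cout.rdbuf(captured.rdbuf()); // redirect
  try {
    action();
  } catch (...) {
    std::cout.rdbuf(original); // restore even if the action throws
    throw;
  }
  std::cout.rdbuf(original); // restore
  return captured.str();
}

// Stand-in for the legacy code: prints its result rather than returning it.
void legacy_report(int n_candidates, int n_physical)
{
  std::cout << "Processed " << n_candidates << " candidates, "
            << n_physical << " physical." << std::endl;
}

// Characterisation test: the recorded reference must match the current output.
void test_output_is_unchanged()
{
  const std::string reference = "Processed 3 candidates, 2 physical.\n";
  std::string actual = capture_stdout([] { legacy_report(3, 2); });
  assert(actual == reference);
}
```

In practice the reference string would be recorded once from the unmodified program and stored alongside the tests. Comparing text exactly is brittle, since a change in number formatting fails the test, which is one more reason to move towards functions that return values.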
Unfortunately, if process_candidates() had no tests before the
refactoring, you are in a difficult position: you need tests to refactor
safely, but the code is hard to test until it has been refactored.
The end-to-end check helps, but it only covers the
cases you thought to include in your reference file. This is why it is
easier to write testable code from the start than to recover testability
from legacy code.
It is important to highlight that, to keep the exercise feasible while still demonstrating the concepts, our examples are somewhat contrived. Nevertheless, we want to highlight the practices and habits to adopt now rather than after writing thousands of lines of code. Equally, be honest and note that learners may have to work with legacy code in research, but that these techniques can help to mitigate the problems.
Dealing with randomness
Let’s say we add a function to our analysis to model the effect of detector resolution on our calculated mass:
CPP
#include <cmath>
#include <random>
#include "invariant_mass.hpp"
/* Estimate the invariant mass resolution by smearing true quantities
with Gaussian detector resolution */
double estimate_mass_resolution(double true_energy,
double true_momentum,
double resolution = 0.05,
int n_trials = 10000) {
std::random_device rd;
std::mt19937 get_random(rd());
std::normal_distribution<double> smear(0.0, resolution);
double sum_sq = 0.0;
for (int i = 0; i < n_trials; ++i) {
double smeared_energy = true_energy * (1.0 + smear(get_random));
double smeared_momentum = true_momentum * (1.0 + smear(get_random));
double mass = invariant_mass(smeared_energy, smeared_momentum);
sum_sq += mass * mass;
}
return std::sqrt(sum_sq / n_trials);
}
Challenge
This function does not share the structural problems of
process_candidates() — it takes all inputs as parameters,
there’s no I/O, and it returns a value. But it still has testability
problems.
- What would happen if you tested estimate_mass_resolution(91.2, 0.0) == X for some value X you computed by hand?
- How would you restructure the function so that a test could produce a reproducible result? What is the minimal change needed?
- Even with that fix, what would your test actually be checking? Is that sufficient?
- std::random_device seeds the Mersenne Twister random number generator from a hardware entropy source, so the sequence of random numbers is different on every execution. In addition, sequential calls to estimate_mass_resolution() with identical arguments will return different values.
  No fixed expected value exists to test against. The test would pass or fail unpredictably depending on the random seed. Worse, it might pass nine times out of ten and fail occasionally, which is the hardest kind of bug to diagnose because the failure is not reproducible.
- The minimal fix is to accept the random number generator as a parameter:
CPP
double estimate_mass_resolution(double true_energy,
                                double true_momentum,
                                std::mt19937& gen,
                                double resolution = 0.05,
                                int n_trials = 10000);
  A test can now pass a generator seeded with a fixed value and get a deterministic result:
CPP
std::mt19937 gen(42); // fixed seed
double result = estimate_mass_resolution(91.2, 0.0, gen);
// result is now the same on every run
  The caller constructs its generator however it likes (from std::random_device, from a run number, from a command-line argument) and passes it in. The function no longer makes that decision for its caller.
- With a fixed seed, the test checks that the function produces a specific numerical result for that seed. It does not check that the result is statistically correct. For that you would need to verify that the distribution of outputs over many seeds has the right mean and width, which is a different and harder kind of test. The honest answer is that testing stochastic functions thoroughly is genuinely difficult, and fixing the seed is a pragmatic first step that at least guarantees reproducibility.
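The restructured function and a reproducibility test can be sketched as follows. This is a minimal illustration assuming the signature given in the solution; the test function name and the loose bounds in the sanity check are hypothetical choices, and invariant_mass is repeated so that the block compiles on its own. The test deliberately avoids hard-coding an expected number, because the values drawn by std::normal_distribution for a given seed can differ between standard library implementations.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <stdexcept>

// Repeated from src/invariant_mass.cpp so that this sketch is self-contained.
double invariant_mass(double energy, double momentum)
{
  if (energy < 0) throw std::domain_error("unphysical negative energy");
  double mass_squared = energy*energy - momentum*momentum;
  if (mass_squared < 0) throw std::domain_error("unphysical mass^2");
  return std::sqrt(mass_squared);
}

// The generator is now an explicit input rather than a hidden one.
double estimate_mass_resolution(double true_energy,
                                double true_momentum,
                                std::mt19937& gen,
                                double resolution = 0.05,
                                int n_trials = 10000)
{
  std::normal_distribution<double> smear(0.0, resolution);
  double sum_sq = 0.0;
  for (int i = 0; i < n_trials; ++i) {
    double smeared_energy = true_energy * (1.0 + smear(gen));
    double smeared_momentum = true_momentum * (1.0 + smear(gen));
    double mass = invariant_mass(smeared_energy, smeared_momentum);
    sum_sq += mass * mass;
  }
  return std::sqrt(sum_sq / n_trials);
}

void test_fixed_seed_is_reproducible()
{
  std::mt19937 first(42);
  std::mt19937 second(42);
  double a = estimate_mass_resolution(91.2, 0.0, first);
  double b = estimate_mass_resolution(91.2, 0.0, second);
  assert(a == b); // same seed, same sequence, same result

  // Loose sanity check only: with zero momentum and 5% smearing the result
  // should stay close to the true energy. This is not a statistical test.
  assert(a > 90.0 && a < 93.0);
}
```

Comparing two runs with the same seed checks reproducibility without tying the test to one compiler or standard library.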
The tension in part 3 of the challenge is deliberate - this is a hard problem, and we acknowledge it.
The two snippets together make a useful point to draw out explicitly at the end of the exercise: process_candidates() and estimate_mass_resolution() have different kinds of testability problem, and the fix in each case is different. But both fixes follow the same underlying principle — make all dependencies explicit in the function signature. Global state, file handles, and random number generators are all the same kind of problem: hidden inputs that the function’s caller cannot see or control. A function whose entire input is visible in its signature is a function you can reason about, test, and trust.
Key Points
- Tests live in their own file and are compiled separately from the code under test
- A function is easy to test if it takes all its inputs as parameters and returns its output as a value
- Global state, side effects, hidden dependencies, and mixed concerns make functions harder to test and harder to reason about
- Writing testable code and writing maintainable code are largely the same discipline
- Refactoring untested code safely requires characterising its existing behaviour first — which requires tests you do not yet have