Key Points

What is unit testing?


  • Testing is formalising what you already do informally when you verify your code
  • Verification confirms that code implements a model correctly; validation confirms the model describes nature correctly — unit testing addresses verification
  • Unit tests check that a single function produces the expected output for a given input
  • Integration tests make sure that code units work together properly.
  • Regression tests ensure that everything works the same today as it did yesterday.

Organizing code to enable unit testing


  • Tests live in their own file and are compiled separately from the code under test
  • A function is easy to test if it takes all its inputs as parameters and returns its output as a value
  • Global state, side effects, hidden dependencies, and mixed concerns make functions harder to test and harder to reason about
  • Writing testable code and writing maintainable code are largely the same discipline
  • Refactoring untested code safely requires characterising its existing behaviour first — which requires tests you do not yet have
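
The contrast between a hard-to-test and an easy-to-test function can be sketched as follows. All names here (`g_mass`, `g_velocity`, `kinetic_energy_from_globals`, `kinetic_energy`) are hypothetical and chosen for illustration only; they are not taken from the lesson code.

```cpp
#include <cassert>

// Hard to test: the inputs arrive through global state, so a test must set
// the globals first and can be broken by any other code that changes them.
double g_mass = 0.0;      // hypothetical global
double g_velocity = 0.0;  // hypothetical global

double kinetic_energy_from_globals() {
    return 0.5 * g_mass * g_velocity * g_velocity;
}

// Easy to test: every input is a parameter and the result is the return value.
double kinetic_energy(double mass, double velocity) {
    return 0.5 * mass * velocity * velocity;
}

// The test needs no setup and leaves nothing behind.
void test_kinetic_energy_known_value() {
    assert(kinetic_energy(2.0, 3.0) == 9.0);  // 0.5 * 2 * 3 * 3, exact in binary
}
```

Both functions compute the same quantity; only the second can be called from a test without first arranging hidden state.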

Unit testing with assert()


  • Documentation and testing are symbiotic:
    • Documentation records our expectations of the code’s behaviour.
    • Tests encode the verification of this behaviour in test cases.
  • assert(expression) aborts the program if expression is false — silence means the test passed
  • Failure of an assertion results in an error message and program termination, providing a clear test failure condition.
  • A failing assert() tells you that something went wrong, and where in the code, but not directly how.
  • Manual compilation of multiple test files does not scale.
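
As a minimal sketch of what an assert()-based test can look like, assuming a hypothetical function `sum_of_squares` as the code under test:

```cpp
#include <cassert>

// Hypothetical function under test.
int sum_of_squares(int a, int b) {
    return a * a + b * b;
}

// Each test is a plain function; it returns normally if every assert holds.
void test_sum_of_squares_known_values() {
    assert(sum_of_squares(3, 4) == 25);
    assert(sum_of_squares(0, 0) == 0);
}

void test_sum_of_squares_ignores_sign() {
    assert(sum_of_squares(-3, 4) == sum_of_squares(3, -4));
}
```

A test executable would call these functions from `main()` and return 0. If an assertion fails, the program aborts with a message naming the file, the line, and the failed expression, but not the values that were compared.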

Integrating tests into a build system


  • A build system like CMake ensures tests are always compiled against the current code before they are run.
  • CTest is a test runner — it does not care how tests are written, only whether the executable exits cleanly
  • Tests you have to run manually are tests you will forget to run — automation removes that risk
  • Keeping the barrier to running tests low is as important as writing the tests themselves
  • assert() is disabled when NDEBUG is defined, which CMake’s default Release flags do; in a release build every assertion compiles to nothing, so an assert()-based test suite passes without checking anything
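
One possible workaround, offered here as an assumption rather than something the lesson prescribes, is to undefine NDEBUG at the top of each test file. The helper `assertions_enabled` is hypothetical and exists only to show whether assert() is active.

```cpp
// Force assertions on in this test file, whatever flags the build passes.
// <cassert> redefines assert() according to NDEBUG each time it is included,
// so the #undef must come before the include.
#undef NDEBUG
#include <cassert>

// Returns true only if assert() actually evaluates its argument.
bool assertions_enabled() {
    bool evaluated = false;
    assert((evaluated = true));  // compiled out entirely when NDEBUG is defined
    return evaluated;
}
```

This only protects the test file itself; assertions inside the code under test still follow the build configuration.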

Introducing GoogleTest


  • assert() gives you an abort; GoogleTest tells you which test failed, what the actual value was, and what the expected value was
  • GoogleTest integrates with CMake/CTest so your existing build workflow does not change
  • A failing TEST does not prevent further TESTs from running.
  • EXPECT_* continues after a failure; ASSERT_* stops the current TEST — use ASSERT_* when continuing would be meaningless.

Floating point comparisons: Testing with floating point numbers


  • Floating point arithmetic is not exact — two calculations that are mathematically equal may not be numerically equal.
  • EXPECT_EQ is appropriate for floating point only when the value is exactly representable.
  • EXPECT_DOUBLE_EQ and EXPECT_FLOAT_EQ check that two floating point numbers are within 4 ULPs of each other.
  • EXPECT_NEAR(a, b, tol) checks that |a - b| <= tol; the tolerance should reflect the numerical error expected from the specific calculation, not be chosen arbitrarily.
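
The points above can be illustrated without GoogleTest. This sketch assumes IEEE 754 double arithmetic without extended intermediate precision; `within_tolerance` is a hypothetical helper written in the spirit of EXPECT_NEAR and is not part of GoogleTest.

```cpp
#include <cassert>
#include <cmath>

// Absolute-tolerance comparison: true when a and b differ by at most tol.
bool within_tolerance(double a, double b, double tol) {
    return std::fabs(a - b) <= tol;
}

void test_floating_point_comparisons() {
    // Mathematically equal, numerically different.
    assert(0.1 + 0.2 != 0.3);

    // The two results are adjacent doubles, that is, one ULP apart.
    assert(std::nextafter(0.3, 1.0) == 0.1 + 0.2);

    // Exactly representable values can safely be compared with ==.
    assert(0.5 + 0.25 == 0.75);

    // A tolerance absorbs the rounding error.
    assert(within_tolerance(0.1 + 0.2, 0.3, 1e-12));
}
```

The tolerance of 1e-12 is itself an arbitrary choice made for this illustration; in a real test it should follow from the error expected of the calculation.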

Testing exceptional behaviour


  • Testing what your code refuses to do is as important as testing what it does
  • A function’s error handling is part of its specification and should be documented and tested like any other behaviour.
  • Error conditions often mark the boundaries of valid input, where bugs most commonly live, making them vital to test.
  • EXPECT_THROW checks both that an exception was raised and that it was the right type — the type is part of the function’s specification.
  • With invariant_mass() now fully tested, we have seen a near complete range of GoogleTest assertion types — the remaining episodes apply these tools to more complex code.
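
What a check on exception type involves can be sketched in plain C++. Both `checked_sqrt` and the helper `throws` are hypothetical names introduced for this illustration; `throws` only mimics the idea behind EXPECT_THROW and is not GoogleTest's implementation.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>

// Hypothetical function whose specification says: negative input is an error.
double checked_sqrt(double x) {
    if (x < 0.0) {
        throw std::domain_error("checked_sqrt: negative input");
    }
    return std::sqrt(x);
}

// Returns true only if calling `call` throws an exception of type Exception.
template <typename Exception, typename Callable>
bool throws(Callable call) {
    try {
        call();
    } catch (const Exception&) {
        return true;   // expected type (or a type derived from it)
    } catch (...) {
        return false;  // something was thrown, but of the wrong type
    }
    return false;      // nothing was thrown
}

void test_checked_sqrt_error_handling() {
    assert(throws<std::domain_error>([] { checked_sqrt(-1.0); }));
    assert(!throws<std::out_of_range>([] { checked_sqrt(-1.0); }));
    assert(!throws<std::domain_error>([] { checked_sqrt(0.0); }));  // boundary is valid
}
```

The last assertion tests the boundary itself: zero is valid input, so the refusal must start strictly below it.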

Testing stateful classes


  1. A half-open interval [x_min, x_max) has been chosen for the bins — this is a decision with testable consequences.
  2. n_entries() differs from the number of in-range fills: overflow and underflow are counted in n_entries() but excluded from bin_counts() and mean()
  3. Note the author has defined an unweighted mean!
  • The name HistogramConstruction groups all construction-related tests in a clear suite.
  • We see a use case for EXPECT_NO_THROW: valid inputs should not throw!
  • GoogleTest’s matchers, provided by its gMock component, make tests easier to write and more expressive when assertions become more complex.
  • A stateful class is testable if its state is explicit and controlled through a well-defined interface — the difficulty arises from global state, not from state itself.
  • Reading the specification before writing tests is not optional — it determines what the tests should assert and makes any ambiguities obvious.
  • Each test case should verify one behaviour — if a test needs “and” in its name it is probably two tests
  • GoogleTest provides helpers in GMock for more complex checks.
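
The testable consequences of the half-open interval can be sketched with a deliberately simplified class. This `Histogram` is a hypothetical stand-in written for illustration, assuming equal-width bins; it borrows the names n_entries() and bin_counts() but is not the lesson's implementation and omits mean().

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

class Histogram {
public:
    Histogram(std::size_t n_bins, double x_min, double x_max)
        : x_min_(x_min), x_max_(x_max), counts_(n_bins, 0) {
        if (n_bins == 0 || !(x_min < x_max)) {
            throw std::invalid_argument("Histogram: need n_bins > 0 and x_min < x_max");
        }
    }

    void fill(double x) {
        ++n_entries_;  // every fill is counted, in range or not
        if (!(x >= x_min_ && x < x_max_)) {
            return;    // underflow, overflow, or NaN: no bin is incremented
        }
        const double width = (x_max_ - x_min_) / static_cast<double>(counts_.size());
        auto bin = static_cast<std::size_t>((x - x_min_) / width);
        if (bin >= counts_.size()) {
            bin = counts_.size() - 1;  // guard against rounding at the upper edge
        }
        ++counts_[bin];
    }

    std::size_t n_entries() const { return n_entries_; }
    const std::vector<std::size_t>& bin_counts() const { return counts_; }

private:
    double x_min_;
    double x_max_;
    std::vector<std::size_t> counts_;
    std::size_t n_entries_ = 0;
};

// One behaviour per test: x_min belongs to the first bin.
void test_lower_edge_is_in_first_bin() {
    Histogram h(4, 0.0, 4.0);
    h.fill(0.0);
    assert(h.bin_counts().front() == 1);
}

// One behaviour per test: x_max is counted as an entry but lands in no bin.
void test_upper_edge_is_overflow() {
    Histogram h(4, 0.0, 4.0);
    h.fill(4.0);
    assert(h.n_entries() == 1);
    for (std::size_t count : h.bin_counts()) {
        assert(count == 0);
    }
}
```

All state is reached through the constructor and fill(), so each test can build exactly the starting state it needs.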

Test fixtures


  • A fixture eliminates repeated setup code and makes the intended starting state of each test using that fixture explicit.
  • SetUp() runs before every individual test — each test starts from a clean, identical state regardless of what other tests do
  • Fixtures do not change what is being tested, only how the starting state is prepared
  • Construction tests belong outside the fixture — the fixture assumes construction succeeds and tests behaviour from that point

Code coverage


  • Coverage measures which lines and branches were executed during testing — not whether they were tested correctly
  • A line shown as covered means it ran; it does not mean the result was checked or that the test would catch a bug there
  • Branch coverage is more informative than line coverage — a line can execute without all its branches being taken
  • Coverage bounds what could have been tested, not what was: uncovered code is certainly untested, while covered code may still be unchecked, so high coverage is necessary for thoroughness but not sufficient
  • The coverage report is most useful as a guide to where tests are missing, not as a measure of test quality
  • Beware of diminishing returns
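
The gap between "executed" and "tested" can be seen in a small sketch. The function `clamp_non_negative` and the test names are hypothetical, and the exact figures a coverage tool reports will depend on the tool and compiler.

```cpp
#include <cassert>

// Hypothetical function with a single branch.
int clamp_non_negative(int x) {
    if (x < 0) x = 0;
    return x;
}

// Executes every line of clamp_non_negative, so its lines show as covered,
// yet nothing is checked: a bug in the function would go unnoticed.
void test_runs_but_checks_nothing() {
    clamp_non_negative(-5);
}

// Checks the result, but only for the case where the condition is true;
// the x >= 0 path is still never taken.
void test_negative_is_clamped() {
    assert(clamp_non_negative(-5) == 0);
}

// Adds the missing path, including the boundary value.
void test_non_negative_is_unchanged() {
    assert(clamp_non_negative(0) == 0);
    assert(clamp_non_negative(7) == 7);
}
```

Line coverage would look complete after the first test alone; only a branch report would show that the x >= 0 path had not yet been exercised.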

Sanitizers as another line of defence: Code Sanitizers


  • Unit tests check that your code does what you intended; sanitizers check for errors your intentions did not anticipate
  • A test suite that is green and fully covered can still contain memory errors and undefined behaviour
  • AddressSanitizer detects out-of-bounds memory access and use-after-free at runtime — errors that produce no compiler warning and may crash only rarely in production
  • Sanitizers diagnose bugs that already exist; a well-chosen test prevents their reintroduction
  • No single tool is sufficient — unit tests, coverage measurement, and sanitizers answer different questions and catch different bugs; together they give you the best practical assurance that your code is correct
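
As an illustration of the kind of error meant here, consider a hypothetical function `sum_first_n`. The code shown is the corrected version; the comment describes the off-by-one bug that a sanitizer build (for example with the GCC and Clang flag `-fsanitize=address`) would be expected to report.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical function: sum of the first n elements of values.
double sum_first_n(const std::vector<double>& values, std::size_t n) {
    double total = 0.0;
    // A typical off-by-one bug would loop while `i <= n`. With n == values.size()
    // that reads one element past the end: undefined behaviour that may return a
    // plausible number in a normal build, but that AddressSanitizer can report.
    for (std::size_t i = 0; i < n && i < values.size(); ++i) {
        total += values[i];
    }
    return total;
}

// Regression test at the boundary where the bug would appear.
void test_sum_of_all_elements() {
    const std::vector<double> values{1.0, 2.0, 3.0};
    assert(sum_first_n(values, values.size()) == 6.0);
}
```

Once the sanitizer has pointed at the faulty read, a boundary test like this one, run under the sanitizer, guards against the bug coming back.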