Testing stateful classes

Last updated on 2026-06-30

Overview

Questions

How do I test code that has to be constructed and populated before I can interrogate it?
How do I verify results that are collections rather than single values?

Objectives

Explain why a stateful class requires a different testing approach to a pure function
Read a class header and its Doxygen comments to identify what should be tested before writing any test
Write a suite of TEST() cases covering construction and filling of Histogram
Use GoogleTest Matchers to simplify comparing collections of values

Testing a Histogram class

So far we have been testing invariant_mass, which is a pure function: give it the same inputs and it always returns the same outputs. Most of the interesting code we will write is not like this. Instead, it is object-oriented and built from classes. Imagine our analysis needs histogramming. A histogram has to be constructed, filled, and then interrogated. The result of calling, say, bin_counts() depends on everything that has happened to the object since it was created. How do we test something like that?

Callout

This might seem to contradict our earlier design exercise, where we warned against functions relying on external state. A C++ class does have state, but it is private and the class itself is responsible for keeping it consistent. A condition on the internal state that always holds between calls to the public interface is called a class invariant. This should not be read as “the state is constant”: the state changes, but the relationships it must satisfy do not, much as the invariant mass stays the same whichever frame the momenta are measured in.
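To make the idea of a class invariant concrete, here is a minimal sketch using a hypothetical Tally class (an illustration only, not part of ccptepp-test). Its private counters can only change through add(), so the relationship total == below + inside + above holds after every call, whatever sequence of calls is made.

```cpp
#include <initializer_list>

// Hypothetical stand-in class, for illustration only.
class Tally {
public:
    Tally(float lo, float hi) : lo_(lo), hi_(hi) {}

    // The only way to change the state: all counters are updated together.
    void add(float x) {
        ++total_;
        if (x < lo_) {
            ++below_;
        } else if (x >= hi_) {
            ++above_;
        } else {
            ++inside_;
        }
    }

    int total() const { return total_; }

    // The class invariant: the three categories account for every call.
    bool invariant_holds() const { return total_ == below_ + inside_ + above_; }

private:
    float lo_, hi_;
    int total_ = 0, below_ = 0, inside_ = 0, above_ = 0;
};

// Exercise the class and check the invariant after each state change.
inline bool invariant_survives_fills() {
    Tally t(0.0f, 1.0f);
    bool ok = t.invariant_holds();
    for (float x : {-0.5f, 0.25f, 0.75f, 1.0f, 2.0f}) {
        t.add(x);
        ok = ok && t.invariant_holds();
    }
    return ok && t.total() == 5;
}
```

The state changes on every call to add(), but the invariant never does. That is what makes a class with state testable in a way that code relying on global state is not.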

Let’s start by looking at a pre-existing implementation that we’ve taken over, provided in your ccptepp-test project. Open up src/histogram.hpp, and we see:

CPP

#pragma once
#include <stdexcept>
#include <vector>

/**
 * @brief A one-dimensional histogram with fixed-width bins.
 *
 * Bins are defined over the half-open interval [@p x_min, @p x_max).
 * Values outside this range are counted separately as underflow or overflow
 * and do not contribute to bin counts or the mean.
 *
 * All bin widths are equal: (@p x_max - @p x_min) / @p n_bins.
 */
class Histogram
{

public:
    /**
     * @brief Construct a histogram with uniform binning.
     *
     * @param n_bins  Number of bins. Must be greater than zero.
     * @param x_min   Lower edge of the first bin (inclusive).
     * @param x_max   Upper edge of the last bin (exclusive).
     *
     * @throws std::invalid_argument if @p n_bins <= 0.
     * @throws std::invalid_argument if @p x_min >= @p x_max.
     */
    Histogram(int n_bins, float x_min, float x_max);

    /**
     * @brief Fill the histogram with a value.
     *
     * If @p x is in [@p x_min, @p x_max), the corresponding bin count is
     * incremented by @p weight. If @p x is outside this range, the underflow
     * or overflow counter is incremented instead; @p weight is ignored for
     * out-of-range values. The total entry count is always incremented.
     *
     * @param x       The value to fill.
     * @param weight  The weight to add to the bin count. Defaults to 1.0.
     */
    void fill(float x, float weight = 1.0f);

    /**
     * @brief Return the bin counts as a vector of length n_bins.
     *
     * Element @c i contains the sum of weights of all in-range values that
     * fell into bin @c i. Underflow and overflow are not included.
     */
    std::vector<float> bin_counts() const;

    /**
     * @brief Return the bin edges as a vector of length n_bins + 1.
     *
     * Element @c i is the lower edge of bin @c i; element @c n_bins is the
     * upper edge of the last bin, equal to @p x_max.
     */
    std::vector<float> bin_edges() const;

    /**
     * @brief Return the total number of fill() calls, including out-of-range values.
     */
    int n_entries() const;

    /**
     * @brief Return the number of fill() calls where x >= x_max.
     */
    int n_overflow() const;

    /**
     * @brief Return the number of fill() calls where x < x_min.
     */
    int n_underflow() const;

    /**
     * @brief Return the unweighted mean of all in-range filled values.
     *
     * Computed as the arithmetic mean of the @p x values passed to fill(),
     * excluding out-of-range values. The @p weight parameter of fill() does
     * not affect this calculation.
     *
     * @throws std::runtime_error if no in-range values have been filled.
     */
    float mean() const;

private:
    int n_bins_;
    float x_min_, x_max_, bin_width_;
    std::vector<float> counts_;
    int n_entries_ = 0;
    int n_overflow_ = 0;
    int n_underflow_ = 0;
    float value_sum_ = 0.0f;
    int in_range_ = 0;
};

The good news is that the author has provided documentation for the class and each of its member functions, so the first thing we do is read through this before writing a single test. We also won’t worry about src/histogram.cpp yet: hopefully the specification will tell us everything we are allowed to assume about the intended behaviour of this class, and thus what we need to test.

Key Points

A half-open interval [x_min, x_max) has been chosen for the bins. This is a decision with testable consequences.
There’s a distinction between n_entries() and in-range fills: overflow and underflow are counted in n_entries() but excluded from bin_counts() and mean().
Note that the author has defined an unweighted mean!

We note these as design decisions - we are going to test as given, and will focus on that rather than on whether these decisions are good or not!
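To see what the half-open interval implies in practice, here is a minimal sketch of how a value might be mapped to a bin. We have not looked at src/histogram.cpp, so bin_index and its return convention (-1 for underflow, n_bins for overflow) are hypothetical and only illustrate the documented behaviour.

```cpp
#include <cmath>

// Hypothetical helper: returns -1 for underflow, n_bins for overflow,
// otherwise the index of the bin containing x.
inline int bin_index(float x, int n_bins, float x_min, float x_max) {
    if (x < x_min) return -1;       // below the inclusive lower edge
    if (x >= x_max) return n_bins;  // at or above the exclusive upper edge
    const float bin_width = (x_max - x_min) / static_cast<float>(n_bins);
    int i = static_cast<int>(std::floor((x - x_min) / bin_width));
    // Guard against rounding pushing an in-range value past the last bin.
    if (i >= n_bins) i = n_bins - 1;
    return i;
}
```

With this sketch, bin_index(0.0f, 10, 0.0f, 1.0f) is 0, while bin_index(1.0f, 10, 0.0f, 1.0f) is 10, i.e. overflow. The two ends of the range behave differently, which is exactly the kind of boundary a test should pin down.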

Let’s just do some build and test housekeeping to make sure we can compile Histogram and get the skeleton of the test program in place. Create a file test/test_histogram.cpp as follows:

CPP

//! \file test_histogram.cpp
#include "histogram.hpp"

#include <gtest/gtest.h>
#include <gmock/gmock.h>

Save this as is, then open up CMakeLists.txt to add the Histogram code to the library and set up the build and test:

CMAKE

...

# - Build test_invariant_mass
add_executable(test_invariant_mass test/test_invariant_mass.cpp)
target_link_libraries(test_invariant_mass ccptepp GTest::gtest_main)

# - Build test_histogram
add_executable(test_histogram test/test_histogram.cpp)
target_link_libraries(test_histogram ccptepp GTest::gtest_main GTest::gmock)

# - Setup CTest
enable_testing()

# - Declare tests
add_test(NAME TestInvariantMass COMMAND test_invariant_mass)
add_test(NAME TestHistogram COMMAND test_histogram)

We’ll explain the extra gmock.h header and GTest::gmock library in the next section. We should now be able to compile and run and see the new test in the output:

BASH

(ccptepp-test) [macbook]$ cmake --build build                                                  
[0/1] Re-running CMake...
-- Configuring done (0.1s)
-- Generating done (0.0s)
-- Build files have been written to: /tmp/ccptepp-test/build
[2/3] Linking CXX executable test_histogram

and then

BASH

(ccptepp-test) $ ctest --test-dir build          
Test project /tmp/ccptepp-test/build
    Start 1: TestInvariantMass
1/2 Test #1: TestInvariantMass ................   Passed    0.45 sec
    Start 2: TestHistogram
2/2 Test #2: TestHistogram ....................   Passed    0.01 sec

100% tests passed, 0 tests failed out of 2

Total Test time (real) =   0.46 sec

One feature of CTest you might want to be aware of here is filtering. We only have two tests running, but as the suite grows, we may only be interested in the results of the one we are working on. Every test in CTest has a number, the Test #N in the output, and the name we gave it in add_test. If we just wanted to run TestHistogram alone, then we could use CTest’s -I argument to select it by number:

BASH

(ccptepp-test) [macbook]$ ctest --test-dir build -I 2,2
Test project /tmp/ccptepp-test/build
    Start 2: TestHistogram
1/1 Test #2: TestHistogram ....................   Passed    0.01 sec

100% tests passed, 0 tests failed out of 1

Total Test time (real) =   0.01 sec

Note that -I actually takes start,end,stride as arguments, so 2,2 is needed to select only test 2. It is usually more useful to select tests by a regex on the test name with the -R argument, e.g.

BASH

(ccptepp-test) $ ctest --test-dir build -R '.*Hist'
Test project /tmp/ccptepp-test/build
    Start 2: TestHistogram
1/1 Test #2: TestHistogram ....................   Passed    0.01 sec

100% tests passed, 0 tests failed out of 1

Total Test time (real) =   0.01 sec

Any regex supported by CMake can be used here, and CTest has several other arguments to include/exclude specific tests if you need this.
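The add_test calls above register each test executable as a single CTest test, so -R can only select whole executables. As a possible alternative, not used in this lesson, CMake’s GoogleTest module provides gtest_discover_tests, which registers every TEST() case individually under a Suite.Name style name. A sketch, assuming the test_histogram target defined above:

```cmake
# Alternative to add_test(NAME TestHistogram ...):
# register each TEST() case in the executable as its own CTest test
include(GoogleTest)
gtest_discover_tests(test_histogram)
```

With this in place, the regex given to -R would be matched against individual test case names rather than the name of the executable.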

Step 1: Can we construct Histogram as specified?

There’s not much point testing what Histogram can do until we can construct it. The specification for construction is pretty clear, so let’s open up test/test_histogram.cpp and write it up as tests:

CPP

//! \file test_histogram.cpp
#include "histogram.hpp"

#include <gtest/gtest.h>
#include <gmock/gmock.h>

TEST(HistogramConstruction, ValidParametersDoNotThrow) {
    EXPECT_NO_THROW(Histogram(10, 0.0f, 1.0f));
}

TEST(HistogramConstruction, NegativeBinsThrows) {
    EXPECT_THROW(Histogram(-10, 0.0f, 1.0f), std::invalid_argument);
}

TEST(HistogramConstruction, ZeroBinsThrows) {
    EXPECT_THROW(Histogram(0, 0.0f, 1.0f), std::invalid_argument);
}

TEST(HistogramConstruction, IncorrectRangeThrows) {
    EXPECT_THROW(Histogram(10, 1.0f, 0.99f), std::invalid_argument);
}

TEST(HistogramConstruction, BinCountsHasCorrectSize) {
    Histogram h(10, 0.0f, 1.0f);
    EXPECT_EQ(h.bin_counts().size(), 10u);  // unsigned literal avoids a sign-compare warning
}

TEST(HistogramConstruction, AllBinsInitiallyZero) {
    Histogram h(10, 0.0f, 1.0f);
    std::vector<float> expected(10, 0.0f);
    EXPECT_THAT(h.bin_counts(), ::testing::ContainerEq(expected));
}

We’ve introduced the new EXPECT_THAT(actual_value, matcher) macro here to help with something that starts to appear when testing classes: comparing collections of values for equality. We infer from the specification that a freshly constructed histogram is empty, so we want to assert that each of the N bin counts is zero. We could use std::vector::operator==, or even a loop over the vector returned by bin_counts() combined with EXPECT_EQ, but that would add boilerplate and we might not get an informative error message (which elements weren’t equal, and by how much).
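To see what a matcher saves us, here is roughly the helper we would otherwise write by hand to locate a disagreement between two vectors. This is an illustrative sketch using only the standard library. The name first_mismatch is hypothetical and is not part of GoogleTest or ccptepp-test.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: index of the first position where the two vectors
// disagree (including one being shorter), or -1 if they are identical.
inline std::ptrdiff_t first_mismatch(const std::vector<float>& actual,
                                     const std::vector<float>& expected) {
    auto its = std::mismatch(actual.begin(), actual.end(),
                             expected.begin(), expected.end());
    if (its.first == actual.end() && its.second == expected.end()) {
        return -1;  // same length and every element equal
    }
    return its.first - actual.begin();
}
```

Every test would then need its own code to turn that index into a readable failure message, which is the boilerplate that EXPECT_THAT with a matcher takes care of.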

EXPECT_THAT is sort of a generalized EXPECT_EQ where the second argument is a Matcher object that performs a specific type of comparison against the expected value. We’ve used the one designed to check for the equality of two containers, which might not seem like much, but we get a lot of information on failure, e.g. with

CPP

TEST(HistogramConstruction, AllBinsInitiallyZero) {
    Histogram h(10, 0.0f, 1.0f);
    std::vector<float> expected(10, 0.0f);
    expected[3] = 1.0f; // deliberately wrong value
    EXPECT_THAT(h.bin_counts(), ::testing::ContainerEq(expected));
}

we’ll get failure output:

BASH

[ RUN      ] HistogramConstruction.AllBinsInitiallyZero
/Users/benmorgan/tmp/pix/ccptepp-test/test/test_histogram.cpp:26: Failure
Value of: h.bin_counts()
Expected: equals { 0, 0, 0, 1, 0, 0, 0, 0, 0, 0 }
  Actual: { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, which doesn't have these expected elements: 1

[  FAILED  ] HistogramConstruction.AllBinsInitiallyZero (0 ms)

Key Points

The name HistogramConstruction groups all construction-related tests in a clear suite.
We see a use case for EXPECT_NO_THROW: valid inputs should not throw!
GoogleTest’s Matchers from its GMock Component help to write tests more easily and expressively when dealing with more complex assertions.

Challenge

The documentation says bin_edges() returns a vector of length n_bins + 1.

Write a test that verifies this for a histogram with 10 bins.
Write a test that checks the first and last edges are equal to x_min and x_max respectively.

Show me the solution

CPP

TEST(HistogramConstruction, BinEdgesHasCorrectSize) {
    Histogram h(10, 0.0f, 1.0f);
    EXPECT_EQ(h.bin_edges().size(), 11u);  // unsigned literal avoids a sign-compare warning
}

TEST(HistogramConstruction, BinEdgesHaveCorrectExtremes) {
    Histogram h(10, 0.0f, 1.0f);
    auto edges = h.bin_edges();
    EXPECT_EQ(edges.front(), 0.0f);
    EXPECT_EQ(edges.back(), 1.0f);
}

We’ve chosen to be a bit strict here and use EXPECT_EQ rather than EXPECT_FLOAT_EQ. The upper and lower bounds are nominally “constants” after construction, so we’d expect to get them back exactly as we input them. This is subtle, and EXPECT_FLOAT_EQ would also have been valid here. It’s never bad to start with strict bounds though: false positives (spurious failures) are better than false negatives (spurious passes).

Step 2: Does Histogram filling behave as specified?

With construction cases handled, let’s move on to testing fill operations, starting with single bins:

CPP

TEST(HistogramFill, SingleFillIncreasesCorrectBin) {
    Histogram h(10, 0.0f, 1.0f);  // bins: [0,0.1), [0.1,0.2), ...
    h.fill(0.35f);                  // should land in bin 3
    EXPECT_EQ(h.bin_counts()[3], 1.0f);
    EXPECT_EQ(h.n_entries(), 1);
}

TEST(HistogramFill, SingleFillLeavesOtherBinsZero) {
    Histogram h(10, 0.0f, 1.0f);
    h.fill(0.45f);
    std::vector<float> expected(10, 0.0f);
    expected[4] = 1.0f;
    EXPECT_THAT(h.bin_counts(), ::testing::ContainerEq(expected));
}

These are deliberately separate: the first checks that the right bin was incremented, the second checks that no other bin was affected. A combined test that checked both in a single TEST() would be harder to diagnose on failure. Again, we are being strict with our floating point comparisons, as we know the calculations only involve 0.0f and 1.0f.

Challenge

The documentation says that a value passed to fill that is less than x_min is treated as underflow. Write a test that verifies this. Think carefully about what you need to check — there may be more than one assertion worth making.

Show me the solution

There are actually three assertions we can make here:

CPP

TEST(HistogramFill, ValueBelowXMinIsUnderflow) {
    Histogram h(10, 0.0f, 1.0f);
    h.fill(-0.1f);  // below x_min — should be underflow
    EXPECT_EQ(h.n_underflow(), 1);
    EXPECT_EQ(h.n_entries(), 1);
    EXPECT_THAT(h.bin_counts(), ::testing::ContainerEq(std::vector<float>(10, 0.0f)));
}

This is where we need to read the specification carefully to understand all of the postconditions. We can argue this Histogram is designed somewhat oddly, but it is what we were given.

The fill operation can also take a weight, so let’s implement a corresponding test case for this:

CPP

TEST(HistogramFill, WeightedFillProducesCorrectCounts) {
    Histogram h(10, 0.0f, 1.0f);
    h.fill(0.05f, 0.5f);  // mid-bin value in bin 0: [0.0, 0.1)
    h.fill(0.55f, 1.5f);  // mid-bin value in bin 5: [0.5, 0.6)
    std::vector<float> expected(10, 0.0f);
    expected[0] = 0.5f;
    expected[5] = 1.5f;
    EXPECT_THAT(h.bin_counts(), ::testing::Pointwise(::testing::FloatEq(), expected));
}

This is much the same as the unweighted case, but we have swapped over to a different Matcher. As weighted counts involve sums of arbitrary floating point weights, we may start to run into floating point precision issues. ContainerEq is basically doing an EXPECT_EQ on corresponding pairs of elements in the actual and expected collections. Pointwise does the same pairing but lets us specify an extra Matcher to perform each comparison. The equivalent of EXPECT_FLOAT_EQ here is FloatEq, and we could also get EXPECT_NEAR behaviour with FloatNear, which takes the absolute tolerance as its argument:

CPP

    // If we used `FloatNear` instead.
    EXPECT_THAT(h.bin_counts(), ::testing::Pointwise(::testing::FloatNear(0.01), expected));
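To see why exact equality becomes fragile once weights accumulate, the following standard-library sketch adds the weight 0.1f ten times, as ten weighted fills into a single bin would. The helpers accumulate_weight and within_tolerance are hypothetical names, and the behaviour described assumes ordinary IEEE 754 single-precision arithmetic.

```cpp
#include <cmath>

// Hypothetical helper: add the same weight n times, as n fills of one bin would.
inline float accumulate_weight(float weight, int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) {
        sum += weight;  // each addition rounds to the nearest float
    }
    return sum;
}

// True if a and b differ by no more than tol (the idea behind EXPECT_NEAR).
inline bool within_tolerance(float a, float b, float tol) {
    return std::fabs(a - b) <= tol;
}
```

Under that assumption, accumulate_weight(0.1f, 10) == 1.0f is false because the rounding errors of the individual additions build up, while within_tolerance(accumulate_weight(0.1f, 10), 1.0f, 1e-6f) is true. Matchers such as FloatEq and FloatNear let us express the second kind of comparison for every element of a collection.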

Challenge

Write a test case that fills the same bin twice with different weights and checks the total count in that bin.
Write a test case that verifies n_entries() counts all fills including those with weights other than 1.0.

Show me the solution

Depending on the floating point values you used:

CPP

TEST(HistogramFill, MultipleWeightedFillsAccumulate) {
    Histogram h(10, 0.0f, 1.0f);
    h.fill(0.25f, 0.1f);
    h.fill(0.25f, 0.2f);  // same bin, different weight
    std::vector<float> expected(10, 0.0f);
    expected[2] = 0.3f;
    EXPECT_THAT(h.bin_counts(), ::testing::Pointwise(::testing::FloatEq(), expected));
}

This is also a postconditions test:

CPP

TEST(HistogramFill, NEntriesCountsAllFillsRegardlessOfWeight) {
    Histogram h(10, 0.0f, 1.0f);
    h.fill(0.1f, 2.0f);
    h.fill(0.6f, 0.5f);
    EXPECT_EQ(h.n_entries(), 2);
}

Step 3: Is the Histogram mean calculated correctly after filling?

We’ve tested construction and filling of Histogram, so we should now check that the mean value is calculated correctly from the filled data. Let’s start with a simple unweighted symmetric case:

CPP

TEST(HistogramMean, MeanOfSymmetricFillsIsNearCentre) {
    Histogram h(10, 0.0f, 1.0f);
    h.fill(0.2f);
    h.fill(0.8f);
    EXPECT_NEAR(h.mean(), 0.5f, 1e-5f);
}

Now the weighted fill case:

CPP

TEST(HistogramMean, MeanIsUnweighted) {
    Histogram h(10, 0.0f, 1.0f);
    h.fill(0.2f, 10.0f);  // large weight — should not affect mean
    h.fill(0.8f,  1.0f);
    // unweighted mean of {0.2, 0.8} = 0.5, regardless of weights
    EXPECT_NEAR(h.mean(), 0.5f, 1e-5f);
}

We’re following what the specification tells us here: an unweighted mean is calculated! The point is not to worry (yet!) about whether this is good design, but to test to the specification first before thinking about refactoring.
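To make the difference between the two definitions concrete, here is a small standard-library sketch of both. The function names are hypothetical, and only the unweighted version corresponds to what the documentation of mean() describes.

```cpp
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Arithmetic mean of the values, ignoring any weights.
inline float unweighted_mean(const std::vector<float>& values) {
    if (values.empty()) throw std::runtime_error("no values");
    return std::accumulate(values.begin(), values.end(), 0.0f) /
           static_cast<float>(values.size());
}

// Weighted mean: sum(w_i * x_i) / sum(w_i).
// NOT what Histogram::mean() is documented to return.
inline float weighted_mean(const std::vector<float>& values,
                           const std::vector<float>& weights) {
    if (values.size() != weights.size()) throw std::invalid_argument("size mismatch");
    float sum_wx = 0.0f;
    float sum_w = 0.0f;
    for (std::size_t i = 0; i < values.size(); ++i) {
        sum_wx += weights[i] * values[i];
        sum_w += weights[i];
    }
    if (sum_w == 0.0f) throw std::runtime_error("zero total weight");
    return sum_wx / sum_w;
}
```

For a fill of 0.2 with weight 10 and a fill of 0.8 with weight 1, the unweighted mean is 0.5 while the weighted mean is about 0.2545, so a test expecting 0.5 would fail against an implementation that weighted the mean.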

Challenge

The documentation says that mean() excludes out-of-range values.

Write a test that fills one in-range value and one underflow value and verifies that mean() reflects only the in-range fill.
What does this tell you about the relationship between mean() and n_entries()?

Show me the solution

Again, depending on your choice of filling:

CPP

TEST(HistogramMean, MeanExcludesUnderflowValues) {
    Histogram h(10, 0.0f, 1.0f);
    h.fill(0.6f);    // in range
    h.fill(-0.01f);   // underflow — should not affect mean
    EXPECT_NEAR(h.mean(), 0.6f, 1e-5f);
    EXPECT_EQ(h.n_entries(), 2);
}

Per the specification, n_entries() returns the total number of fills, not how many are in range. The mean must therefore be normalised by the number of in-range fills rather than by n_entries(). This is a slightly subtle detail of the specification.

We now have a substantial test suite for Histogram, and we’ve been able to write it entirely from the header file and the documentation of its interface. Unless we encountered problems, we probably haven’t had to read its actual implementation. However, writing the tests required some interpretation: is the mean weighted or unweighted? What does n_entries() actually count? These are not testing decisions. They are specification decisions made by the author of Histogram. The tests force us to read and understand the contract carefully, which is useful regardless of whether the tests ever catch a bug. It also illustrates that writing down these specifications and contracts for our own code is valuable in helping us decide what to test, once again reinforcing the symbiotic nature of documentation and testing in software development.

Key Points

A stateful class is testable if its state is explicit and controlled through a well-defined interface — the difficulty arises from global state, not from state itself.
Reading the specification before writing tests is not optional — it determines what the tests should assert and makes any ambiguities obvious.
Each test case should verify one behaviour: if a test needs “and” in its name, it is probably two tests.
GoogleTest provides helpers in GMock for more complex checks.