Testing stateful classes
Last updated on 2026-06-30
Estimated time: 12 minutes
Overview
Questions
- How do I test code that has to be constructed and populated before I can interrogate it?
- How do I verify results that are collections rather than single values?
Objectives
- Explain why a stateful class requires a different testing approach to a pure function
- Read a class header and its Doxygen comments to identify what should be tested before writing any test
- Write a suite of TEST() cases covering construction and filling of Histogram
- Use GoogleTest Matchers to simplify comparing collections of values
Testing a Histogram class
So far we have been testing invariant_mass, which is a
pure function: give it the same inputs and it always returns
the same outputs. Most of the interesting code we will write is not like
this. Instead it is object-oriented and, in particular, built from
classes. Imagine our analysis needs histogramming. A histogram
has to be constructed, filled, and then interrogated. The result of
calling, say, bin_counts() depends on everything that has
happened to the object since it was created. How do we test something
like that?
This might seem to contradict our earlier design exercise, where we warned against functions relying on external state. A C++ class does have state, but it is private and kept consistent by the class itself. A condition on that internal state which the class guarantees will always hold is called an invariant, though this should not be read as “the state is constant”. It’s more like our invariant mass example: a property that is preserved while other things change.
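To make the contrast concrete, here is a minimal sketch that sets a pure function beside a small stateful class. Both `square` and `RunningMean` are hypothetical names invented for illustration and are not part of the ccptepp-test project.

```cpp
#include <stdexcept>

// Pure function: the result depends only on the argument.
double square(double x) { return x * x; }

// Stateful class (hypothetical): the result of mean() depends on every
// add() call made since construction. The state is private, so only the
// class itself can change it, and it keeps sum_ and count_ consistent.
class RunningMean {
public:
  void add(double x) {
    sum_ += x;
    ++count_;
  }

  int count() const { return count_; }

  double mean() const {
    if (count_ == 0) {
      throw std::runtime_error("no values added");
    }
    return sum_ / count_;
  }

private:
  double sum_ = 0.0;
  int count_ = 0;
};
```

Calling `square(3.0)` gives the same answer however often we call it, whereas `mean()` has to be tested after a known sequence of `add()` calls. That is the pattern we will follow for Histogram: construct, fill, then interrogate.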
Let’s start by looking at a pre-existing implementation that we’ve taken
over, provided in your ccptepp-test project. Open up
src/histogram.hpp, and we see:
CPP
#pragma once
#include <stdexcept>
#include <vector>

/**
 * @brief A one-dimensional histogram with fixed-width bins.
 *
 * Bins are defined over the half-open interval [@p x_min, @p x_max).
 * Values outside this range are counted separately as underflow or overflow
 * and do not contribute to bin counts or the mean.
 *
 * All bin widths are equal: (@p x_max - @p x_min) / @p n_bins.
 */
class Histogram
{
public:
  /**
   * @brief Construct a histogram with uniform binning.
   *
   * @param n_bins Number of bins. Must be greater than zero.
   * @param x_min Lower edge of the first bin (inclusive).
   * @param x_max Upper edge of the last bin (exclusive).
   *
   * @throws std::invalid_argument if @p n_bins <= 0.
   * @throws std::invalid_argument if @p x_min >= @p x_max.
   */
  Histogram(int n_bins, float x_min, float x_max);

  /**
   * @brief Fill the histogram with a value.
   *
   * If @p x is in [@p x_min, @p x_max), the corresponding bin count is
   * incremented by @p weight. If @p x is outside this range, the underflow
   * or overflow counter is incremented instead; @p weight is ignored for
   * out-of-range values. The total entry count is always incremented.
   *
   * @param x The value to fill.
   * @param weight The weight to add to the bin count. Defaults to 1.0.
   */
  void fill(float x, float weight = 1.0f);

  /**
   * @brief Return the bin counts as a vector of length n_bins.
   *
   * Element @c i contains the sum of weights of all in-range values that
   * fell into bin @c i. Underflow and overflow are not included.
   */
  std::vector<float> bin_counts() const;

  /**
   * @brief Return the bin edges as a vector of length n_bins + 1.
   *
   * Element @c i is the lower edge of bin @c i; element @c n_bins is the
   * upper edge of the last bin, equal to @p x_max.
   */
  std::vector<float> bin_edges() const;

  /**
   * @brief Return the total number of fill() calls, including out-of-range values.
   */
  int n_entries() const;

  /**
   * @brief Return the number of fill() calls where x >= x_max.
   */
  int n_overflow() const;

  /**
   * @brief Return the number of fill() calls where x < x_min.
   */
  int n_underflow() const;

  /**
   * @brief Return the unweighted mean of all in-range filled values.
   *
   * Computed as the arithmetic mean of the @p x values passed to fill(),
   * excluding out-of-range values. The @p weight parameter of fill() does
   * not affect this calculation.
   *
   * @throws std::runtime_error if no in-range values have been filled.
   */
  float mean() const;

private:
  int n_bins_;
  float x_min_, x_max_, bin_width_;
  std::vector<float> counts_;
  int n_entries_ = 0;
  int n_overflow_ = 0;
  int n_underflow_ = 0;
  float value_sum_ = 0.0f;
  int in_range_ = 0;
};
The good news is that the author has provided documentation for the
class and each of its member functions, so the first thing we do is to
check through this before writing a single test. We also won’t worry
about src/histogram.cpp yet - hopefully the specification
will tell us everything we are allowed to assume about the intended
behaviour of this class and thus what we should need to test for.
- A half-open interval [x_min, x_max) has been chosen for the bins; this is a decision with testable consequences.
- There’s a distinction between n_entries() and in-range fills: overflow and underflow are counted but excluded from bin_counts() and mean().
- Note the author has defined an unweighted mean!
We note these as design decisions: we are going to test the class as given, and will focus on that rather than on whether these decisions are good or not!
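To see why the half-open interval has testable consequences, here is a sketch of how a value might be mapped to a bin under that convention. The function `bin_index` is a hypothetical helper written only from the documentation above. We have not looked at src/histogram.cpp, so the real implementation may well differ.

```cpp
// Hypothetical helper, NOT taken from src/histogram.cpp.
// Returns -1 for underflow, n_bins for overflow, otherwise the bin index.
// Assumes n_bins > 0 and x_min < x_max, as the constructor requires.
int bin_index(float x, int n_bins, float x_min, float x_max) {
  if (x < x_min) {
    return -1;  // below the inclusive lower edge
  }
  if (x >= x_max) {
    return n_bins;  // the upper edge itself is excluded from the range
  }
  const float bin_width = (x_max - x_min) / static_cast<float>(n_bins);
  int i = static_cast<int>((x - x_min) / bin_width);
  // Guard against rounding nudging a value just below x_max past the last bin.
  if (i >= n_bins) {
    i = n_bins - 1;
  }
  return i;
}
```

Under this reading, a value equal to x_min lands in the first bin while a value equal to x_max does not land in any bin. Values in the middle of a bin, such as 0.35 for ten bins on [0, 1), are the safest choice for tests, because they are not sensitive to how bin edges are rounded in floating point.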
Let’s just do some build and test housekeeping to make sure we can
compile Histogram and get the skeleton of the test program
in place. Create a file test/test_histogram.cpp as
follows:
CPP
//! \file test_histogram.cpp
#include "histogram.hpp"
#include <gtest/gtest.h>
#include <gmock/gmock.h>
Save this as is and open up CMakeLists.txt to add the
Histogram code to the library, then build and set up the
test:
CMAKE
...
# - Build test_invariant_mass
add_executable(test_invariant_mass test/test_invariant_mass.cpp)
target_link_libraries(test_invariant_mass ccptepp GTest::gtest_main)
# - Build test_histogram
add_executable(test_histogram test/test_histogram.cpp)
target_link_libraries(test_histogram ccptepp GTest::gtest_main GTest::gmock)
# - Setup CTest
enable_testing()
# - Declare tests
add_test(NAME TestInvariantMass COMMAND test_invariant_mass)
add_test(NAME TestHistogram COMMAND test_histogram)
We’ll explain the extra gmock.h header and
GTest::gmock library in the next section. We should now be
able to compile and run and see the new test in the output:
BASH
(ccptepp-test) [macbook]$ cmake --build build
[0/1] Re-running CMake...
-- Configuring done (0.1s)
-- Generating done (0.0s)
-- Build files have been written to: /tmp/ccptepp-test/build
[2/3] Linking CXX executable test_histogram
and then
BASH
(ccptepp-test) $ ctest --test-dir build
Test project /tmp/ccptepp-test/build
Start 1: TestInvariantMass
1/2 Test #1: TestInvariantMass ................ Passed 0.45 sec
Start 2: TestHistogram
2/2 Test #2: TestHistogram .................... Passed 0.01 sec
100% tests passed, 0 tests failed out of 2
Total Test time (real) = 0.46 sec
One feature of CTest you might want to be aware of here is
filtering. We only have two tests running, but as the suite
grows, we may only be interested in the results of the one we are
working on. Every test in CTest has a number, the Test #N
in the output, and the name we gave it in add_test. If we
just wanted to run TestHistogram alone, then we could use
CTest’s -I argument to select it by number:
BASH
(ccptepp-test) [macbook]$ ctest --test-dir build -I 2,2
Test project /tmp/ccptepp-test/build
Start 2: TestHistogram
1/1 Test #2: TestHistogram .................... Passed 0.01 sec
100% tests passed, 0 tests failed out of 1
Total Test time (real) = 0.01 sec
Note that -I actually takes
start,end,stride as arguments, so 2,2 is
needed to select only test 2. Usually more useful is to use the
-R argument to select by a regex on the test name, e.g.
BASH
(ccptepp-test) $ ctest --test-dir build -R '.*Hist'
Test project /tmp/ccptepp-test/build
Start 2: TestHistogram
1/1 Test #2: TestHistogram .................... Passed 0.01 sec
100% tests passed, 0 tests failed out of 1
Total Test time (real) = 0.01 sec
Any regex supported by CMake can be used here, and CTest has several other arguments to include/exclude specific tests if you need this.
Step 1: Can we construct Histogram as specified?
There’s not much point testing what Histogram can do
until we know we can construct it. The specification for construction is pretty
clear, so let’s open up test/test_histogram.cpp and write
its requirements up as tests:
CPP
//! \file test_histogram.cpp
#include "histogram.hpp"
#include <gtest/gtest.h>
#include <gmock/gmock.h>
TEST(HistogramConstruction, ValidParametersDoNotThrow) {
  EXPECT_NO_THROW(Histogram(10, 0.0f, 1.0f));
}

TEST(HistogramConstruction, NegativeBinsThrows) {
  EXPECT_THROW(Histogram(-10, 0.0f, 1.0f), std::invalid_argument);
}

TEST(HistogramConstruction, ZeroBinsThrows) {
  EXPECT_THROW(Histogram(0, 0.0f, 1.0f), std::invalid_argument);
}

TEST(HistogramConstruction, IncorrectRangeThrows) {
  EXPECT_THROW(Histogram(10, 1.0f, 0.99f), std::invalid_argument);
}

TEST(HistogramConstruction, BinCountsHasCorrectSize) {
  Histogram h(10, 0.0f, 1.0f);
  EXPECT_EQ(h.bin_counts().size(), 10);
}

TEST(HistogramConstruction, AllBinsInitiallyZero) {
  Histogram h(10, 0.0f, 1.0f);
  std::vector<float> expected(10, 0.0f);
  EXPECT_THAT(h.bin_counts(), ::testing::ContainerEq(expected));
}
We’ve introduced the new
EXPECT_THAT(actual_value, matcher) macro here to help with
something that comes up often when testing classes: comparing
collections of values for equality. We infer from the specification
that a freshly constructed histogram is empty, so we want to assert that
each of the N bin counts is zero. We could use std::vector::operator==,
or even a loop over the vector returned by bin_counts(),
combined with EXPECT_EQ, but that would add boilerplate and
we might not get an informative error message (which elements
weren’t equal, and by how much).
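To get a feel for the boilerplate involved, here is a rough sketch of what a hand-rolled comparison with a useful message might look like. The helper `describe_mismatch` is a hypothetical name of our own and is not part of GoogleTest or the project.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns an empty string if the two vectors are equal,
// otherwise a short description of the first difference found.
std::string describe_mismatch(const std::vector<float>& actual,
                              const std::vector<float>& expected) {
  std::ostringstream msg;
  if (actual.size() != expected.size()) {
    msg << "size " << actual.size() << " != expected size " << expected.size();
    return msg.str();
  }
  for (std::size_t i = 0; i < actual.size(); ++i) {
    if (actual[i] != expected[i]) {
      msg << "element " << i << ": " << actual[i] << " != " << expected[i];
      return msg.str();
    }
  }
  return "";
}
```

Even this small helper only reports the first difference, and we would have to write and maintain something similar for every kind of collection check. A matcher gives us this kind of reporting without the extra code.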
EXPECT_THAT is sort of a generalized
EXPECT_EQ where the second argument is a
Matcher object that performs a specific type of comparison
against the expected value. We’ve used the one designed to check for the
equality of two containers, which might not seem like much, but we get a
lot of information on failure, e.g. with
CPP
TEST(HistogramConstruction, AllBinsInitiallyZero) {
  Histogram h(10, 0.0f, 1.0f);
  std::vector<float> expected(10, 0.0f);
  expected[3] = 1.0f; // deliberate wrong value;
  EXPECT_THAT(h.bin_counts(), ::testing::ContainerEq(expected));
}
we’ll get failure output:
BASH
[ RUN ] HistogramConstruction.AllBinsInitiallyZero
/Users/benmorgan/tmp/pix/ccptepp-test/test/test_histogram.cpp:26: Failure
Value of: h.bin_counts()
Expected: equals { 0, 0, 0, 1, 0, 0, 0, 0, 0, 0 }
Actual: { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, which doesn't have these expected elements: 1
[ FAILED ] HistogramConstruction.AllBinsInitiallyZero (0 ms)
- The name HistogramConstruction groups all construction-related tests in a clear suite.
- We see a use case for EXPECT_NO_THROW: valid inputs should not throw!
- GoogleTest’s Matchers from its GMock component help to write tests more easily and expressively when dealing with more complex assertions.
Challenge
The documentation says bin_edges() returns a vector of
length n_bins + 1.
- Write a test that verifies this for a histogram with 10 bins.
- Write a test that checks the first and last edges are equal to x_min and x_max respectively.
CPP
TEST(HistogramConstruction, BinEdgesHasCorrectSize) {
  Histogram h(10, 0.0f, 1.0f);
  EXPECT_EQ(h.bin_edges().size(), 11);
}

TEST(HistogramConstruction, BinEdgesHaveCorrectExtremes) {
  Histogram h(10, 0.0f, 1.0f);
  auto edges = h.bin_edges();
  EXPECT_EQ(edges.front(), 0.0f);
  EXPECT_EQ(edges.back(), 1.0f);
}
We’ve chosen to be a bit strict here and use EXPECT_EQ
rather than EXPECT_FLOAT_EQ. The upper and lower bounds are
nominally “constants” after construction so we’d expect to get them back
exactly as we input them. This is subtle, and
EXPECT_FLOAT_EQ would also have been valid here. It’s never
bad to start with strict bounds though: false positives (spurious failures) are
better than false negatives (spurious passes).
Step 2: Does Histogram filling behave as specified?
With construction cases handled, let’s move on to testing fill operations, starting with single bins:
CPP
TEST(HistogramFill, SingleFillIncreasesCorrectBin) {
  Histogram h(10, 0.0f, 1.0f); // bins: [0,0.1), [0.1,0.2), ...
  h.fill(0.35f);               // should land in bin 3
  EXPECT_EQ(h.bin_counts()[3], 1.0f);
  EXPECT_EQ(h.n_entries(), 1);
}

TEST(HistogramFill, SingleFillLeavesOtherBinsZero) {
  Histogram h(10, 0.0f, 1.0f);
  h.fill(0.45f);
  std::vector<float> expected(10, 0.0f);
  expected[4] = 1.0f;
  EXPECT_THAT(h.bin_counts(), ::testing::ContainerEq(expected));
}
These are deliberately separate — the first checks that the right bin
was incremented, the second checks that no other bin was affected. A
combined test that checked both in a single TEST() would be
harder to diagnose on failure. Again, we are being strict with our
floating point comparisons, as we know the calculations here only involve
0.0f and 1.0f.
Challenge
The documentation says that a value passed to fill that
is less than x_min is treated as underflow. Write a test
that verifies this. Think carefully about what you need to check — there
may be more than one assertion worth making.
There are actually three assertions we can make here:
CPP
TEST(HistogramFill, ValueBelowXMinIsUnderflow) {
  Histogram h(10, 0.0f, 1.0f);
  h.fill(-0.1f); // below x_min, so should count as underflow
  EXPECT_EQ(h.n_underflow(), 1);
  EXPECT_EQ(h.n_entries(), 1);
  EXPECT_THAT(h.bin_counts(), ::testing::ContainerEq(std::vector<float>(10, 0.0f)));
}
This is where we need to read the specification carefully to
understand all of the postconditions. We can argue this
Histogram is designed somewhat oddly, but it is what we
were given.
Point out that this challenge requires careful reading of the Doxygen
— specifically that underflow increments n_underflow() and
n_entries() but does not affect bin_counts().
Students who miss the n_entries() assertion are leaving
part of the contract untested. We are deliberately testing only
underflow here, not overflow. The overflow branch will be left uncovered
and discovered in episode 10. Do not draw attention to this omission —
let the coverage report make the discovery.
The fill operation can also take a weight, so let’s implement a corresponding test case for this:
CPP
TEST(HistogramFill, WeightedFillProducesCorrectCounts) {
  Histogram h(10, 0.0f, 1.0f);
  h.fill(0.05f, 0.5f); // middle of [0,0.1): bin 0
  h.fill(0.55f, 1.5f); // middle of [0.5,0.6): bin 5
  std::vector<float> expected(10, 0.0f);
  expected[0] = 0.5f;
  expected[5] = 1.5f;
  EXPECT_THAT(h.bin_counts(), ::testing::Pointwise(::testing::FloatEq(), expected));
}
This is much the same as the unweighted case, but we have swapped
over to using a different Matcher. As weighted counts are going to
involve sums and multiplications, we may start to run into floating
point precision issues. ContainerEq is basically doing an
EXPECT_EQ on corresponding pairs of elements in the actual
and expected collections. Pointwise also compares corresponding pairs,
but lets us specify the Matcher used for each comparison. The equivalent of
EXPECT_FLOAT_EQ here is FloatEq, and we could
also get EXPECT_NEAR behaviour with
FloatNear, which takes the absolute tolerance as its
argument:
CPP
// If we used `FloatNear` instead.
EXPECT_THAT(h.bin_counts(), ::testing::Pointwise(::testing::FloatNear(0.01), expected));
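As a rough mental model of what Pointwise combined with FloatNear checks, consider the following sketch. The function `all_near` is a hypothetical helper of our own, not GoogleTest code, and it leaves out the detailed failure message that the real matcher produces.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: true if both vectors have the same length and every
// pair of corresponding elements differs by at most `tolerance`.
bool all_near(const std::vector<float>& actual,
              const std::vector<float>& expected,
              float tolerance) {
  if (actual.size() != expected.size()) {
    return false;
  }
  for (std::size_t i = 0; i < actual.size(); ++i) {
    // Written as a negated <= so that a NaN difference counts as a mismatch.
    if (!(std::fabs(actual[i] - expected[i]) <= tolerance)) {
      return false;
    }
  }
  return true;
}
```

The choice of tolerance matters: too loose and a genuinely wrong count could pass, too tight and harmless rounding could fail the test.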
Challenge
- Write a test case that fills the same bin twice with different weights and checks the total count in that bin.
- Write a test case that verifies n_entries() counts all fills including those with weights other than 1.0.
- Depending on the floating point values you used:
CPP
TEST(HistogramFill, MultipleWeightedFillsAccumulate) {
  Histogram h(10, 0.0f, 1.0f);
  h.fill(0.25f, 0.1f);
  h.fill(0.25f, 0.2f); // same bin, different weight
  std::vector<float> expected(10, 0.0f);
  expected[2] = 0.3f;
  EXPECT_THAT(h.bin_counts(), ::testing::Pointwise(::testing::FloatEq(), expected));
}
- This is also a postconditions test:
Step 3: Is the Histogram mean calculated correctly after filling?
We’ve tested construction and filling of Histogram, so
we should now check that the mean value is calculated correctly from the
filled data. Let’s start with a simple unweighted symmetric case:
CPP
TEST(HistogramMean, MeanOfSymmetricFillsIsNearCentre) {
  Histogram h(10, 0.0f, 1.0f);
  h.fill(0.2f);
  h.fill(0.8f);
  EXPECT_NEAR(h.mean(), 0.5f, 1e-5f);
}
Now the weighted fill case:
CPP
TEST(HistogramMean, MeanIsUnweighted) {
  Histogram h(10, 0.0f, 1.0f);
  h.fill(0.2f, 10.0f); // large weight, should not affect mean
  h.fill(0.8f, 1.0f);
  // unweighted mean of {0.2, 0.8} = 0.5, regardless of weights
  EXPECT_NEAR(h.mean(), 0.5f, 1e-5f);
}
We’re following what the specification tells us here: an unweighted mean is calculated! The point is not to worry (yet!) about whether this is good design, but to test to the specification first, before thinking about refactoring.
Challenge
The documentation says that mean() excludes out-of-range
values.
- Write a test that fills one in-range value and one underflow value and verifies that mean() reflects only the in-range fill.
- What does this tell you about the relationship between mean() and n_entries()?
- Again, depending on your choice of filling:
- Per the specification, n_entries() actually returns the total number of fills, not how many are in the range. This is a slightly subtle detail of the specification.
The challenge asks specifically about underflow rather than overflow.
This is deliberate — using overflow here would cover the overflow branch
of fill() and remove the gap we need for episode 10.
The follow-up question about mean() and
n_entries() has no single correct answer but should prompt
students to articulate that n_entries() counts all fills
while mean() uses only in-range values — a subtle but
important aspect of the contract.
We now have a substantial test suite for Histogram, and
we’ve been able to do that entirely from the header file and the
documentation of its interface. Unless we encountered problems, we
probably haven’t had to read its actual implementation. However, writing
the tests required some interpretation: is the mean weighted or
unweighted? What does n_entries() actually count? These are
not testing decisions — they are specification decisions made by the
author of Histogram. The tests are forcing us to read and
understand the contract carefully, which is useful regardless of whether
the tests ever catch a bug. It also illustrates that writing down these
specifications and contracts for our own code is valuable in helping us
decide what to test, once again reinforcing the symbiotic nature of
documentation and testing in software development.
- A stateful class is testable if its state is explicit and controlled through a well-defined interface — the difficulty arises from global state, not from state itself.
- Reading the specification before writing tests is not optional — it determines what the tests should assert and makes any ambiguities obvious.
- Each test case should verify one behaviour: if a test needs “and” in its name, it is probably two tests.
- GoogleTest provides helpers in GMock for more complex checks.