A challenging fact of working on the RJMetrics batch computation framework and data warehouse is that little information can be assumed at compile time. RJMetrics consumes data from a variety of disparate database platforms, applies flexible computations to ingested data, and produces new data to give our clients the insights they need to make decisions. Generic data processing frameworks like ours have a hard time negotiating contracts ahead of time about schemas, load size, computations, and the like. All of these aspects need to flex drastically and vary widely among system tenants. The inherent properties of such systems make it an anti-pattern to anticipate structure before run-time.

Consequently, testing components like our computation and storage layers becomes a critical exercise in creativity. The number of dimensions in the input space explodes as more diverse clients use our product. These forces shape our testing effort to aggressively cover the input space.

One of the approaches we take in validating our backend is generative testing. Popular in the functional programming space (e.g., Haskell’s QuickCheck), generative testing offers a philosophy that sets it apart from other techniques.

Traditional, example-based validation permeates the minds of developers from the moment the topic of testing is introduced. Example-based testing is about picking an input value, running it through a function, and verifying that the output is a specific value. This technique works well enough for some problems. It offers concrete expectations about the outputs of a function under test. When given input “i”, function “f” will output “o”.

(is (= (+ 40 2) 42))

There are some drawbacks to this approach that make one “reach” for more. I’ll focus on one drawback of example-based testing in this post.

In example-based testing, the inputs and outputs need to be crafted by a human. And this is a problem: we want to cover as much of the input space as possible, but it’s unreasonable for a human to test more than a very small slice of all possible inputs. It’s impractical in terms of time and cost, and it offers progressively less value to the maintenance programmer as the number of examples grows, because a pile of examples is harder to understand.

Two opposing forces seem to be at work here. I’ve stated that it’s good to cover as much of the input space as possible, but bad to create many examples.

Enter generative testing. Generative testing differs from example-based testing in at least two respects:

1. The inputs to the function under test are not hand-crafted by a human.
2. The output of the function under test is verified against invariants, not values.

We’ll look at each of these in turn. I’ll introduce an example to ease understanding.

The canonical example of generative testing is verifying the + (addition) function. + takes two arguments, a and b, both of which are integers. + returns a third integer, c.

(+ 2 3) ;; 5
(+ 5 -7) ;; -2

Instead of hand-crafting values to be supplied to a function, generative testing uses a formal specification of the range of possible input values. The inputs to the + function can be formally specified by “a: any integer, b: any integer”, (where ‘any integer’ is platform specific).
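To make the idea of a formal input specification concrete, here is a hand-rolled sketch (the names gen-int and gen-add-inputs are illustrative inventions, not part of any library, and the range is bounded for simplicity rather than truly “any integer”):

```clojure
;; A "generator" is just a zero-argument function returning a random value.
;; rand-int only covers non-negative ints, so we shift to include negatives.
(defn gen-int []
  (- (rand-int 2000001) 1000000)) ;; an integer in [-1000000, 1000000]

;; Draw one sample from the input spec "a: any integer, b: any integer".
(defn gen-add-inputs []
  [(gen-int) (gen-int)])
```

Each call to gen-add-inputs produces a fresh point in the input space, so drawing more samples covers more of it.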

Likewise, instead of verifying that the output of a function is a specific value, we create invariants, or axioms. Invariants are assertions that hold true for all inputs in the formal input space. An invariant for the + function is that for any integers a and b, c is also an integer.
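The closure invariant can be expressed as a plain predicate and checked against many generated inputs. This is a minimal hand-rolled loop to illustrate the philosophy, not test.generative itself; closed-under-addition? and check-closure are invented names:

```clojure
;; Invariant: for any integers a and b, (+ a b) is also an integer.
(defn closed-under-addition? [a b]
  (integer? (+ a b)))

;; Check the invariant against n randomly generated input pairs.
(defn check-closure [n]
  (every? (fn [_]
            (let [a (- (rand-int 2000001) 1000000)
                  b (- (rand-int 2000001) 1000000)]
              (closed-under-addition? a b)))
          (range n)))
```

Note that no specific output value is ever mentioned – the check only asserts a property that must hold everywhere.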

Stuart Halloway of the Clojure community developed a library called test.generative that ships the machinery to carry out this style of testing. Here’s a test.generative example asserting that the integers are closed under addition:

(defspec integers-closed-over-addition
  +                       ;; input fn
  [^int a ^int b]         ;; input spec
  (assert (integer? %)))  ;; 0 or more validator forms

Being able to formally specify the inputs to the function opens a powerful door. Our testing approach now has the opportunity to programmatically generate as much of the input space as we’d like to verify against our chosen invariants. In other words, a tremendous number of assertions can be verified without a human needing to intervene.

We gain leverage through this approach by no longer thinking about how many tests are running, but rather about the “intensity” and “duration” of the test suite. Referentially transparent functions can be tested in parallel, expanding over all the cores on the testing box. Greater intensity yields higher core usage, and therefore greater consumption of the input space.
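Because each trial is independent and the function under test is pure, trials can be fanned out across cores. A minimal sketch using Clojure’s built-in pmap (test.generative manages its own threads; trial and run-suite are illustrative names, and “intensity” here is simply the trial count):

```clojure
;; Run one trial: generate inputs, apply the function, check the invariant.
(defn trial [_]
  (let [a (- (rand-int 2000001) 1000000)
        b (- (rand-int 2000001) 1000000)]
    (integer? (+ a b))))

;; Spread trials across available cores with pmap; raising the intensity
;; consumes more of the input space without any new human-written tests.
(defn run-suite [intensity]
  (every? true? (pmap trial (range intensity))))
```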

Generative testing should frequently be supplemented with some degree of example-based testing. An implementation of the + function that always returns the integer 0 is incorrect, yet it passes the integer-only type axiom.
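To see why, consider a deliberately wrong addition (broken-plus is a hypothetical stand-in) that still satisfies the integer-only axiom; only a hand-crafted example exposes it:

```clojure
;; A wrong implementation: ignores its arguments entirely.
(defn broken-plus [a b] 0)

;; The generative axiom passes -- 0 is an integer for every input pair...
(integer? (broken-plus 40 2)) ;; true

;; ...but a single example-based assertion catches the bug.
(= 42 (broken-plus 40 2)) ;; false
```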

Using this approach, we are able to generate and uniquely compose the different aspects of the input space for our problem:

  • Schemas – schemas of all sizes and data-type compositions are generated to form a specification of the shape of client data.
  • Data – leveraging a generated schema, programs generate large amounts of input data, deliberately including the extremes of data-type ranges and edge cases.
  • Computational composition – RJMetrics lets you transform your data to answer questions. Arbitrary compositions of transformations can be applied to the schema and data to produce new data. Invariants are applied to the result of this transformation.
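A toy version of that pipeline might look like the following. The schema shape, column types, and the example transformation are illustrative inventions for this post, not RJMetrics internals:

```clojure
;; Generate a random schema: a vector of [column-name type] pairs.
(defn gen-schema []
  (vec (for [i (range (inc (rand-int 5)))]
         [(keyword (str "col" i)) (rand-nth [:int :string])])))

;; Generate one row of data conforming to a schema.
(defn gen-row [schema]
  (into {}
        (for [[col type] schema]
          [col (case type
                 :int    (rand-int 1000)
                 :string (str "s" (rand-int 1000)))])))

;; An example transformation: keep only the integer-typed columns.
(defn int-columns-only [schema row]
  (select-keys row (map first (filter #(= :int (second %)) schema))))

;; Invariant applied to the transformation's result: every surviving
;; value must be an integer, for any generated schema and row.
(let [schema (gen-schema)
      row    (gen-row schema)
      result (int-columns-only schema row)]
  (assert (every? integer? (vals result))))
```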

The result of this setup is a powerful technique to answer questions about the implementation of our data warehouse through axioms.

Combining example-based and generative testing yields a tremendous leap in what we can verify in our software. Just as we design with different styles at different layers, we must also apply different testing techniques to different components. The deeper our toolbox of testing styles is, the better equipped we are to solve hard problems – and be confident that we’re coming up with correct solutions.