Benchmarking & Consistency Reporting
Written by Gwenn Berry

Not all verification methods require truth-results assessment. Benchmarking and consistency assessment are crucial strategies for:

  • non-truthy pipelines

  • truthy pipelines with uncharacterized or partially characterized datasets

  • regression testing of stable characteristics

  • any pipeline undergoing computational/non-analytical performance optimization

Benchmarking

The concept of "benchmarking" or "baselining" is common in software and informatics testing. Essentially, a (potentially complex) output from a stable version of the software is set as the benchmark, and results from execution of each new version are compared to ensure that they match.
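As a minimal sketch of the idea (not the platform's own mechanism), a benchmark can be as simple as a checksum captured from a stable version's output and compared against each new version's output; the checksum value shown here is a placeholder:

    import hashlib
    from pathlib import Path

    def file_checksum(path: Path) -> str:
        """Return the SHA-256 checksum of a file's contents."""
        return hashlib.sha256(path.read_bytes()).hexdigest()

    # Benchmark captured once from a stable version's output (placeholder value).
    BENCHMARK_CHECKSUM = "3b4f..."

    def matches_benchmark(output_file: Path) -> bool:
        """Compare a new version's output against the stored benchmark."""
        return file_checksum(output_file) == BENCHMARK_CHECKSUM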

Benchmark Data values can be assigned to a datasource from any metadata generated by post-processing the output files and output folders from that datasource's execution.

If a benchmark value is expected to differ across Workflow Variants, you can create per-variant Benchmark Data. Otherwise, the same benchmark will be evaluated for all variants.
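One way to picture per-variant benchmarks, as a rough illustration (the variant names and metric are hypothetical, not drawn from the product):

    # Hypothetical per-variant benchmark table: a variant with its own entry
    # uses that value; any other variant falls back to the shared default.
    BENCHMARKS = {
        "default": {"record_count": 1523},
        "high-sensitivity": {"record_count": 1601},  # per-variant override
    }

    def expected_benchmark(workflow_variant: str) -> dict:
        """Return the benchmark values evaluated for a given workflow variant."""
        return BENCHMARKS.get(workflow_variant, BENCHMARKS["default"])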

Consistency Assessment

Similar to benchmarking, consistency assessment evaluates whether specific execution outputs match each other under a given set of conditions. However, this technique assesses whether results match within a version, rather than matching an established value. For example, the checksum of an output file may be expected to change from version to version as a workflow's underlying software and algorithms change. In a deterministic workflow, however, we would expect two executions of the same version, run with the same parameters on the same inputs, to produce matching results. This is consistency.
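A bare-bones sketch of the replicate check, assuming the outputs to compare are plain files on disk:

    import hashlib
    from pathlib import Path

    def file_checksum(path: Path) -> str:
        """Return the SHA-256 checksum of a file's contents."""
        return hashlib.sha256(path.read_bytes()).hexdigest()

    def outputs_are_consistent(replicate_outputs: list[Path]) -> bool:
        """True if every replicate execution (same version, parameters, and
        inputs) produced a byte-identical output file."""
        return len({file_checksum(p) for p in replicate_outputs}) == 1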

There are two main types of consistency assessment:

  • Consistency within replicate executions of a single workflow variant

  • Consistency across multiple workflow variants (for example, across multiple thread settings)

You may configure as many Consistency Monitors as you like for your workflows to capture multiple conditions. Consistency Monitors define the workflow variants and datasources to be evaluated with each version, and can be configured with a minimum number of trials per condition to ensure robustness.
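To make the moving parts concrete, here is a hypothetical monitor definition; the field names are illustrative only and do not reflect the product's actual configuration schema:

    # Hypothetical Consistency Monitor definition (illustrative field names):
    # which workflow variants and datasources to compare for each new version,
    # and how many trials to run per condition for robustness.
    consistency_monitor = {
        "name": "thread-count-consistency",
        "workflow_variants": ["threads-4", "threads-8", "threads-16"],
        "datasources": ["sample-A", "sample-B"],
        "min_trials_per_condition": 3,
        "comparison": "output_checksum",
    }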
