Datasets

What are datasets, how do I use them, and how do I add them?

Written by Gwenn Berry
Updated over 4 months ago

What's a Dataset?

Datasets are key to your verification journey: they are the inputs that your pipelines execute on, and the core unit of information that verification is performed against.

You're probably familiar with executing your pipeline on various input files. Maybe you call these files samples, inputs, subjects, or something else. Datasets are the representation of these input files in Miqa, with the added benefit of structured and freeform metadata, tagging, and associated expectations and truths.

In a truth-based or expectation-based pipeline, the outputs of each execution are compared against the truths and meta-expectations configured for the dataset the execution was performed on. Per-dataset and overall accuracy and benchmarking statistics are then generated automatically.
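To make the idea of per-dataset and overall accuracy concrete, here is a minimal sketch in plain Python. It is not Miqa's implementation or API: the record layout, the dataset names, and the functions `accuracy_by_dataset` and `overall_accuracy` are all hypothetical, and accuracy is assumed to mean the fraction of observed values that match the configured truth.

```python
from collections import defaultdict

# Hypothetical records: (dataset name, item, expected truth, observed output).
results = [
    ("sample_A", "variant_1", "present", "present"),
    ("sample_A", "variant_2", "absent", "present"),
    ("sample_B", "variant_1", "present", "present"),
    ("sample_B", "variant_2", "present", "present"),
]


def accuracy_by_dataset(records):
    """Return the fraction of matching items for each dataset."""
    counts = defaultdict(lambda: [0, 0])  # dataset -> [matches, total]
    for dataset, _item, expected, observed in records:
        counts[dataset][0] += int(expected == observed)
        counts[dataset][1] += 1
    return {dataset: m / t for dataset, (m, t) in counts.items()}


def overall_accuracy(records):
    """Return the fraction of matching items across all datasets."""
    if not records:
        return None  # nothing to score
    matches = sum(exp == obs for _d, _i, exp, obs in records)
    return matches / len(records)


per_dataset = accuracy_by_dataset(results)
overall = overall_accuracy(results)
print(per_dataset)  # {'sample_A': 0.5, 'sample_B': 1.0}
print(overall)      # 0.75
```

Keeping the dataset name on every record is what allows the same results to be scored both per dataset and overall.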

But even if you don't have truths or expectations, you can still derive meaningful information by examining and reporting on your results at a dataset level.

For example:

  • Reporting average summary statistics per tag or segmented on a dataset property

  • Exploring outlier summary statistics by comparing results across datasets

  • Discovering systematic errors and artifacts by comparing overlapping results across unrelated datasets

  • Enabling apples-to-apples comparison between workflow versions by ensuring only outputs from the same datasets are included

  • Monitoring reproducibility and measuring consistency between executions within a workflow version, between workflow versions, between orthogonal workflows, or between technical or biological replicates
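Two of the ideas above, averaging a summary statistic per tag and restricting a version comparison to shared datasets, can be sketched in a few lines of Python. This is an illustrative sketch only, not Miqa functionality: the tags, dataset names, metric values, and the helpers `mean_per_tag` and `shared_dataset_means` are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tags per dataset, and one summary statistic per workflow version.
dataset_tags = {
    "sample_A": ["tumor"],
    "sample_B": ["tumor", "low_coverage"],
    "sample_C": ["normal"],
}
metric_v1 = {"sample_A": 0.75, "sample_B": 0.25, "sample_C": 1.0}
metric_v2 = {"sample_A": 0.75, "sample_B": 0.5}  # sample_C not run on v2


def mean_per_tag(metric, tags):
    """Average a per-dataset statistic over every tag the dataset carries."""
    grouped = defaultdict(list)
    for dataset, value in metric.items():
        for tag in tags.get(dataset, []):
            grouped[tag].append(value)
    return {tag: mean(values) for tag, values in grouped.items()}


def shared_dataset_means(metric_a, metric_b):
    """Compare two versions using only datasets that both were run on."""
    shared = sorted(set(metric_a) & set(metric_b))
    if not shared:
        return None  # no common datasets, so no fair comparison
    return (
        mean(metric_a[d] for d in shared),
        mean(metric_b[d] for d in shared),
    )


tag_means = mean_per_tag(metric_v1, dataset_tags)
v1_mean, v2_mean = shared_dataset_means(metric_v1, metric_v2)
print(tag_means)         # {'tumor': 0.5, 'low_coverage': 0.25, 'normal': 1.0}
print(v1_mean, v2_mean)  # 0.5 0.625
```

Without the restriction to shared datasets, the v1 average would include sample_C and the two versions would not be compared on equal footing.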

How do I add Datasets?

To add a dataset to your current Pipeline, open the Dataset page from the left panel and click the "+ Add New" button at the top left of the page. Follow the tutorials below to try it yourself.

From Cloud Storage

If your organization is configured to use its own cloud storage, you will need to place any necessary dataset files inside the storage bucket that is configured for your Pipeline or Organization. If you're not sure where that is, reach out to us!

After you place your files in the storage bucket, they are automatically available in the file browser on the Dataset Creation page. You can search by file name, or browse by sub-directory if you prefer a structured file organization.

By File Upload

This option is only recommended for small files and is only available when using Miqa storage. Instead of pointing to the file location in your designated cloud storage, you will upload the file directly to Miqa using the file browser.

Datasets Tutorial

Adding and Linking a File to Create a Dataset
