CLIMB-COVID: continuous integration supporting decentralised sequencing for SARS-CoV-2 genomic surveillance

Table 2 Three tiers of metadata within Majora

Tier	Implementation	Properties	Examples
Primary	Database model	● Fast queries via object-relational mapping ● Takes up space in the database even if unused ● Significant work to add to the database model, API and user templates	● Biosample identifier ● Patient sex and age ● Digital resource file path, size and hash
Secondary	Database model	● Fast queries via object-relational mapping ● Additional lookups necessary to link back to the primary database model ● Cannot assume that a primary model will have a secondary model attached	● Cycle threshold (Ct) metrics for biosamples ● BAM coverage metrics ● Patient healthcare worker or care home status
Tertiary	Key-value row in a generic model	● Highly flexible ● No work required to add new tags at any time ● More difficult to manage artifacts based on tagged properties alone	● Locally relevant tags not implemented in a model ● Additional anonymised patient information ● Additional sequencing run information

Majora stores submitted metadata about artifacts and processes in an SQL database, and how each piece of metadata is stored depends on its priority (Table 2). Fields that are a core part of a model (for example, a sample identifier or the name of a file) are considered primary metadata and are stored as fields of a dedicated database model. Secondary metadata, such as the results of a PCR cycle threshold (Ct) test or the coverage levels of a BAM file, are also stored in dedicated database models, but these are attached to primary models through a database foreign key. Finally, arbitrary tertiary metadata can be stored as key-value pairs that are not backed by any particular database model, and tagged to primary and secondary models as appropriate.
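To make the three tiers concrete, the following is a minimal, hypothetical sketch using Python's built-in sqlite3 module. It is not Majora's actual schema: Majora accesses its models through an object-relational mapper, and the table names (`biosample`, `biosample_ct_metric`, `metadata_tag`), column names and helper functions below are assumptions chosen for illustration. The sketch shows primary fields living directly on a model, secondary metrics in a separate table linked by a foreign key (and possibly absent), and tertiary tags stored as generic key-value rows.

```python
import sqlite3

# Hypothetical, simplified schema illustrating the three metadata tiers.
SCHEMA = """
CREATE TABLE biosample (              -- primary: core fields stored on the model itself
    id INTEGER PRIMARY KEY,
    sample_identifier TEXT UNIQUE NOT NULL,
    patient_sex TEXT,                 -- nullable columns still occupy the schema if unused
    patient_age INTEGER
);
CREATE TABLE biosample_ct_metric (    -- secondary: optional metrics linked by foreign key
    id INTEGER PRIMARY KEY,
    biosample_id INTEGER NOT NULL REFERENCES biosample(id),
    ct_value REAL NOT NULL
);
CREATE TABLE metadata_tag (           -- tertiary: generic key-value rows, no schema per tag
    id INTEGER PRIMARY KEY,
    biosample_id INTEGER NOT NULL REFERENCES biosample(id),
    tag_key TEXT NOT NULL,
    tag_value TEXT
);
"""


def connect():
    """Open an in-memory database with foreign-key enforcement switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_biosample(conn, identifier, sex=None, age=None):
    """Insert a primary record and return its row id."""
    cur = conn.execute(
        "INSERT INTO biosample (sample_identifier, patient_sex, patient_age) VALUES (?, ?, ?)",
        (identifier, sex, age),
    )
    return cur.lastrowid


def add_ct_metric(conn, biosample_id, ct_value):
    """Attach a secondary metric; fails if the primary record does not exist."""
    conn.execute(
        "INSERT INTO biosample_ct_metric (biosample_id, ct_value) VALUES (?, ?)",
        (biosample_id, ct_value),
    )


def tag(conn, biosample_id, key, value):
    """Attach an arbitrary tertiary key-value tag; no schema change is needed."""
    conn.execute(
        "INSERT INTO metadata_tag (biosample_id, tag_key, tag_value) VALUES (?, ?, ?)",
        (biosample_id, key, value),
    )


def describe(conn, identifier):
    """Gather all three tiers for one biosample."""
    row = conn.execute(
        "SELECT id, sample_identifier, patient_sex, patient_age FROM biosample "
        "WHERE sample_identifier = ?",
        (identifier,),
    ).fetchone()
    if row is None:
        raise KeyError(identifier)
    biosample_id = row[0]
    # Secondary metadata needs an extra lookup, and may be missing entirely.
    cts = [r[0] for r in conn.execute(
        "SELECT ct_value FROM biosample_ct_metric WHERE biosample_id = ? ORDER BY id",
        (biosample_id,),
    )]
    # Tertiary metadata comes back as untyped key-value pairs.
    tags = dict(conn.execute(
        "SELECT tag_key, tag_value FROM metadata_tag WHERE biosample_id = ? ORDER BY id",
        (biosample_id,),
    ).fetchall())
    return {"identifier": row[1], "sex": row[2], "age": row[3],
            "ct_values": cts, "tags": tags}


if __name__ == "__main__":
    conn = connect()
    sid = add_biosample(conn, "SAMPLE-0001", sex="F", age=42)
    add_ct_metric(conn, sid, 23.5)
    tag(conn, sid, "collection_site_type", "drive-through")  # hypothetical tag
    add_biosample(conn, "SAMPLE-0002")  # no secondary or tertiary metadata
    print(describe(conn, "SAMPLE-0001"))
    print(describe(conn, "SAMPLE-0002"))
```

In this sketch, adding a new tertiary tag such as the hypothetical `collection_site_type` only inserts a row, whereas adding a new primary field would mean altering the `biosample` table (and, in Majora's case, the API and user templates), which mirrors the trade-offs listed in Table 2.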

ISSN: 1474-760X