Biobanking sales are projected to grow at a CAGR of 7.25%, taking the value of the industry close to $88 billion by 2032. The industry has been on an upward trajectory, with demand fueled by several factors, including the shift toward personalized medicine. However, the value and potential of a biobank are only as good as the quality of its biospecimens and the associated metadata it holds.
Metadata refers to the information linked to the biospecimens stored in a biobank. This is the information that accompanies the specimens provided to researchers. High-quality biobanking metadata gives researchers all the sample information they need to perform the intended research effectively. In this blog, we review seven quality checks for biobanking metadata and how they tie in with the four key FAIR data principles.
1. Accuracy of Metadata
Metadata held in a biobank must be accurate. This means that it must describe the underlying data correctly and be kept up to date at all times.
2. Completeness of Metadata
Metadata must be able to supplement the data items held in the biobank. Completeness is assessed by the percentage of metadata values that are “non-empty.” Complete metadata describes the data more fully, which makes it easier to infer meaning from data values.
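The "percentage of non-empty values" measure can be sketched in a few lines of Python. This is only an illustration: the field names in REQUIRED_FIELDS and the sample records are hypothetical, and each biobank would decide for itself which fields count and what "empty" means.

```python
from typing import Any, Mapping, Sequence

# Hypothetical list of required fields; a real biobank would define its own.
REQUIRED_FIELDS = ("sample_id", "specimen_type", "collection_date", "storage_temp_c")


def is_empty(value: Any) -> bool:
    """Treat None, blank strings, and empty containers as empty."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    if isinstance(value, (list, tuple, dict, set)):
        return len(value) == 0
    return False  # numbers such as 0 are real values, not gaps


def completeness(records: Sequence[Mapping[str, Any]],
                 fields: Sequence[str] = REQUIRED_FIELDS) -> float:
    """Return the percentage of required metadata values that are non-empty."""
    total = len(records) * len(fields)
    if total == 0:
        return 0.0
    filled = sum(
        1
        for record in records
        for name in fields
        if not is_empty(record.get(name))
    )
    return 100.0 * filled / total


# Illustrative records only.
records = [
    {"sample_id": "S-001", "specimen_type": "plasma",
     "collection_date": "2023-04-01", "storage_temp_c": -80},
    {"sample_id": "S-002", "specimen_type": "",
     "collection_date": None, "storage_temp_c": -80},
]
print(f"{completeness(records):.1f}% complete")  # 6 of 8 values filled: 75.0%
```

A score like this is most useful when tracked per field as well as overall, since a single field that is frequently blank can hide behind a high average.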
3. Consistency of Metadata
All the metadata held in a biobank must “correspond to the same standard.” There must be a high degree of uniformity and coherence in all metadata.
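As a rough sketch of what a consistency check might look like, the Python below tests two fields against an agreed standard: a controlled vocabulary for specimen type and the ISO 8601 date format. The vocabulary and field names are assumptions made for illustration, not a published standard.

```python
from datetime import date
from typing import Any, List, Mapping

# Hypothetical controlled vocabulary; a real biobank would adopt a published one.
ALLOWED_SPECIMEN_TYPES = {"plasma", "serum", "whole blood", "tissue"}


def consistency_issues(record: Mapping[str, Any]) -> List[str]:
    """Return a list of ways the record departs from the agreed standard."""
    issues = []

    specimen_type = record.get("specimen_type")
    if specimen_type not in ALLOWED_SPECIMEN_TYPES:
        issues.append(f"specimen_type {specimen_type!r} is not in the vocabulary")

    collection_date = record.get("collection_date")
    try:
        # Accepts ISO 8601 calendar dates such as 2023-04-01.
        date.fromisoformat(collection_date)
    except (TypeError, ValueError):
        issues.append(f"collection_date {collection_date!r} is not an ISO 8601 date")

    return issues


good = {"specimen_type": "plasma", "collection_date": "2023-04-01"}
bad = {"specimen_type": "Plasma (EDTA)", "collection_date": "04/01/2023"}
print(consistency_issues(good))  # no issues reported
print(consistency_issues(bad))   # two issues reported
```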
4. Timeliness of Metadata
Metadata must be representative of the “current state of affairs” of the specimens and data held in the biobank. There should be no lag when a data item changes; the metadata should always reflect the change.
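One simple way to detect such a lag is to compare the time a data item last changed with the time its metadata was last updated. The sketch below assumes both timestamps are recorded, which may not be true of every system; the function name and the optional tolerance are illustrative.

```python
from datetime import datetime, timedelta


def is_stale(data_modified: datetime,
             metadata_modified: datetime,
             tolerance: timedelta = timedelta(0)) -> bool:
    """Return True if the metadata lags the data item by more than the tolerance."""
    return data_modified - metadata_modified > tolerance


# The sample was re-aliquoted on 10 January, but its metadata dates from 5 January.
data_changed = datetime(2024, 1, 10, 9, 0)
metadata_changed = datetime(2024, 1, 5, 14, 30)
print(is_stale(data_changed, metadata_changed))  # True
```

With the default tolerance of zero, any change to the data that is not yet reflected in the metadata is flagged; a biobank that updates metadata in batches might allow a short tolerance instead.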
5. Provenance of Metadata
All metadata held in a biobank should be linked to its original source. It should also have information related to collection methods, standards, and any changes made in the process.
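A provenance record of this kind could be modeled as a small data structure that holds the source, the collection method, the standard followed, and a log of changes. The class and field names below are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class Change:
    """One entry in the change log; frozen so past entries cannot be edited."""
    timestamp: datetime
    author: str
    description: str


@dataclass
class Provenance:
    """Hypothetical provenance record attached to a piece of metadata."""
    source: str             # where the metadata originally came from
    collection_method: str  # how the specimen or data was collected
    standard: str           # the procedure or standard that was followed
    changes: List[Change] = field(default_factory=list)

    def record_change(self, author: str, description: str) -> None:
        """Append a timestamped entry instead of overwriting history."""
        self.changes.append(
            Change(datetime.now(timezone.utc), author, description)
        )


# Illustrative values only.
prov = Provenance(
    source="Clinic A intake form",
    collection_method="venipuncture",
    standard="internal SOP (hypothetical)",
)
prov.record_change("tech_a", "Corrected collection date")
print(len(prov.changes), prov.changes[0].description)
```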
6. Reliability of Metadata
The reliability of metadata is dependent upon the collection method and the standards that are set in producing the metadata. Biobanks must use standard collection, storage, and processing methods to generate reliable metadata.
7. Conformance to Expectations
Metadata held in a biobank should meet the expectations of those who need to use it, such as researchers and stakeholders. Metadata should be produced with a clear end use in mind.
Metadata that scores highly on the quality parameters mentioned above can be trusted and relied upon by researchers. Conversely, metadata that does not meet the quality parameters is considered to be of poor quality.
What are the FAIR Data Principles in Biobanking?
The FAIR data principles were first drafted in 2014 and formally published in 2016 as guiding principles for data stewardship, and they underpin high-quality scientific research. These principles highlight crucial preconditions for handling and sharing clinically relevant data and work in tandem with the metadata quality parameters mentioned above. They set measures to ensure that the metadata held in biobanks is of high quality.
1. The Principle of Findability
Data in a biobank should be easily identifiable. This means that it needs to be accurately indexed and described in an unequivocal manner. Data sets should be organized in a standard and systematic format that makes the data easy to locate.
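To make the idea of indexing concrete, here is a minimal sketch of a lookup index keyed on field and value, so that samples can be located by their descriptive attributes. It assumes every record carries a unique "sample_id"; the field names and records are illustrative, and a production system would typically rely on a database or search engine instead.

```python
from collections import defaultdict
from typing import Any, Dict, List, Mapping, Sequence, Set, Tuple

Index = Dict[Tuple[str, str], Set[str]]


def build_index(records: Sequence[Mapping[str, Any]],
                fields: Sequence[str]) -> Index:
    """Map each (field, normalized value) pair to the matching sample IDs."""
    index: Index = defaultdict(set)
    for record in records:
        for name in fields:
            value = record.get(name)
            if value is not None:
                index[(name, str(value).lower())].add(record["sample_id"])
    return index


def find(index: Index, name: str, value: Any) -> List[str]:
    """Return the sorted sample IDs whose field matches the value."""
    return sorted(index.get((name, str(value).lower()), set()))


# Illustrative records only.
samples = [
    {"sample_id": "S-001", "specimen_type": "plasma", "site": "Clinic A"},
    {"sample_id": "S-002", "specimen_type": "serum", "site": "Clinic A"},
    {"sample_id": "S-003", "specimen_type": "Plasma", "site": "Clinic B"},
]
index = build_index(samples, ["specimen_type", "site"])
print(find(index, "specimen_type", "plasma"))  # ['S-001', 'S-003']
```

Lowercasing values before indexing is a design choice that lets "Plasma" and "plasma" resolve to the same entry, although consistent metadata would avoid the mismatch in the first place.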
2. The Principle of Accessibility
Biobanks must have a pre-specified format or procedure for accessing data sets. They need to establish procedures for the authentication and authorization required for data access, as well as protocols for data retrieval where appropriate. Ideally, this should be an automated process supported by cloud-hosted biospecimen management software that enables secure, 24/7 data accessibility.
3. The Principle of Interoperability
Data and metadata held in a biobank need to be structured and expressed using published standards. Biobanks need to establish semantic and technical data formats, ontologies, and variables to facilitate interoperability. The published data need to be FAIR, traceable, and accessible.
4. The Principle of Reusability
This principle relates to all the other FAIR data principles and specifies that data needs to be clear, findable, and accessible, and that it must maintain provenance. The data should be published with relevant data descriptions, have appropriate access and usage licenses, and adhere to community standards.
How Does Biospecimen Management Software Support the FAIR Data Principles?
Biospecimen management software enables biobanks to securely manage datasets and metadata. Because it is interoperable with other systems, it reduces manual data entry and the human error that comes with it, which helps keep data accurate. It uses forms to ensure that data is complete and presented in a standard format to whoever is accessing it. Its database is searchable, enabling researchers to pull out the data they seek for their research. As a result, a biospecimen management software system can support easy accessibility, findability, and reusability of metadata.
Conclusion
The FAIR data principles provide a powerful framework for improving the discoverability and usability of research data. By implementing these principles in biobanking, biobankers can ensure that biological samples are used to their full potential, leading to new insights and breakthroughs in biomedical research. The metadata must be accurate, complete, consistent, up to date, and reliable; it must conform to user expectations and show provenance. Poor-quality metadata undermines the quality of the research that depends on it.
Biospecimen management software can play a critical role in supporting the implementation of these principles, providing biobanks with the tools they need to manage and share their samples and sample data in a way that is both efficient and FAIR.