Social science and humanities data added to the COVID-19 Data Portal
The COVID-19 Data Portal now has a Social Sciences and Humanities data section containing more than 600 COVID-19-related studies from the CESSDA Data Catalogue. CESSDA is the first new data source added to the portal through the BY-COVID project.
The onboarding resulted from a collaboration between three CESSDA partners (FSD, DANS and EKKE) and EMBL-EBI (EMBL's European Bioinformatics Institute), which maintains the portal. Onboarding was iterative and ran in parallel with the development of the portal.
CESSDA metadata was mapped to the OmicsDI format required by the COVID-19 Data Portal, and additional metadata fields were added to cover the social sciences.
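As a rough illustration of such a crosswalk, the sketch below uses Python's standard `xml.etree.ElementTree` to pull a few fields out of a DDI-Codebook 2.5 `<codeBook>` record. It then splits them into core fields and extra social-science fields. The field names (`id`, `name`, `description`, `keywords`, `analysis_unit`, `data_collection_mode`), the chosen DDI paths and the `map_ddi_record` helper are all hypothetical. They do not reproduce the actual CESSDA-to-OmicsDI mapping.

```python
import xml.etree.ElementTree as ET

# Namespace of DDI-Codebook 2.5.
NS = {"ddi": "ddi:codebook:2_5"}

# Hypothetical core fields: OmicsDI-style name -> path inside <codeBook>.
CORE_FIELDS = {
    "id": "ddi:stdyDscr/ddi:citation/ddi:titlStmt/ddi:IDNo",
    "name": "ddi:stdyDscr/ddi:citation/ddi:titlStmt/ddi:titl",
    "description": "ddi:stdyDscr/ddi:stdyInfo/ddi:abstract",
}

# Hypothetical extra social-science fields; these may repeat, so all values are kept.
SOCIAL_SCIENCE_FIELDS = {
    "keywords": "ddi:stdyDscr/ddi:stdyInfo/ddi:subject/ddi:keyword",
    "analysis_unit": "ddi:stdyDscr/ddi:stdyInfo/ddi:sumDscr/ddi:anlyUnit",
    "data_collection_mode": "ddi:stdyDscr/ddi:method/ddi:dataColl/ddi:collMode",
}


def map_ddi_record(codebook):
    """Map one DDI-Codebook 2.5 <codeBook> element to a flat, OmicsDI-style dict."""
    entry = {}
    for field, path in CORE_FIELDS.items():
        # findtext returns None when the element is missing.
        value = codebook.findtext(path, namespaces=NS)
        entry[field] = value.strip() if value and value.strip() else None
    extra = {}
    for field, path in SOCIAL_SCIENCE_FIELDS.items():
        values = [el.text.strip() for el in codebook.findall(path, NS) if el.text and el.text.strip()]
        if values:
            extra[field] = values
    entry["additional_fields"] = extra
    return entry


# Made-up sample record for illustration.
SAMPLE = """<codeBook xmlns="ddi:codebook:2_5">
  <stdyDscr>
    <citation>
      <titlStmt><titl>COVID-19 attitudes survey</titl><IDNo>ABC123</IDNo></titlStmt>
    </citation>
    <stdyInfo>
      <subject><keyword>COVID-19</keyword><keyword>health</keyword></subject>
      <abstract>Panel survey on attitudes during the pandemic.</abstract>
    </stdyInfo>
  </stdyDscr>
</codeBook>"""

record = map_ddi_record(ET.fromstring(SAMPLE))
```

The sketch keeps the extra fields in a separate `additional_fields` map. This is one way to extend a shared core with domain-specific metadata, so entries from different disciplines can still share the same core fields.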
Creation of the OmicsDI XML file from CESSDA was automated using CESSDA’s OAI-PMH endpoint in two steps. First, part of the metadata of all studies was harvested into DSpace¹ and queried to find the relevant studies. Then, the full metadata of those specific studies was harvested.
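The two-step harvest can be sketched against the OAI-PMH 2.0 protocol. This minimal sketch rests on several assumptions:

- The endpoint URL is a placeholder.
- The lightweight first pass uses the `oai_dc` (Dublin Core) format.
- Relevant studies are picked by a simple in-memory keyword match rather than by querying DSpace.
- The helper names are hypothetical.

The `fetch` argument is any function that takes a URL and returns the response body, so the logic can be exercised without network access.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and the oai_dc (Dublin Core) format.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Placeholder, not the real CESSDA endpoint.
BASE_URL = "https://oai.example.org/oai"


def list_records_url(base_url, metadata_prefix, resumption_token=None):
    """Build a ListRecords request; a resumptionToken must be the only argument besides verb."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def get_record_url(base_url, identifier, metadata_prefix):
    """Build a GetRecord request for the full metadata of one study (step 2)."""
    params = {"verb": "GetRecord", "identifier": identifier, "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text, terms):
    """Return (identifiers whose Dublin Core text mentions a term, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    matches = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        text = " ".join(el.text or "" for el in record.iterfind(".//oai_dc:dc/*", NS)).lower()
        if any(term.lower() in text for term in terms):
            matches.append(identifier)
    # An empty or absent resumptionToken marks the last page.
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS).strip()
    return matches, token or None


def harvest_matching_ids(fetch, base_url, terms, metadata_prefix="oai_dc"):
    """Step 1: page through lightweight records and keep identifiers of relevant studies."""
    ids, token = [], None
    while True:
        page_ids, token = parse_page(fetch(list_records_url(base_url, metadata_prefix, token)), terms)
        ids.extend(page_ids)
        if token is None:
            return ids
```

In a live run, `fetch` could wrap `urllib.request.urlopen`. Each matching identifier would then be requested with `get_record_url`, passing the metadata prefix that the endpoint advertises for DDI-Codebook 2.5. That prefix appears in the endpoint's `ListMetadataFormats` response.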
The metadata was transformed for the COVID-19 Data Portal using the common XSLT approach. This was applied to records harvested from CESSDA’s OAI-PMH endpoint in DDI-Codebook 2.5 format. The process was fairly straightforward because the metadata available from CESSDA contains high-quality entries for all of the portal's mandatory fields, as well as for many of its recommended fields.
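The portal work used XSLT for this transformation. To keep the example in one language, the sketch below performs a comparable serialization step with Python's `xml.etree.ElementTree` instead. It takes already-mapped study records, skips any that lack a mandatory field, and writes a simplified OmicsDI-style XML document. The element layout (`database`, `entries`, `entry`, `additional_fields`, `field`) is a simplified reading of the OmicsDI format and should be checked against its specification. The `MANDATORY` tuple is a hypothetical stand-in for the portal's actual list of mandatory fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the portal's mandatory fields; the real list may differ.
MANDATORY = ("id", "name", "description")


def missing_mandatory(entry):
    """List the mandatory fields that are absent or empty in a mapped record."""
    return [field for field in MANDATORY if not entry.get(field)]


def build_omicsdi_xml(db_name, entries):
    """Serialize mapped records into a simplified OmicsDI-style XML string.

    Records missing a mandatory field are skipped rather than written out incomplete.
    """
    valid = [e for e in entries if not missing_mandatory(e)]
    database = ET.Element("database")
    ET.SubElement(database, "name").text = db_name
    ET.SubElement(database, "entry_count").text = str(len(valid))
    entries_el = ET.SubElement(database, "entries")
    for e in valid:
        entry_el = ET.SubElement(entries_el, "entry", id=e["id"])
        ET.SubElement(entry_el, "name").text = e["name"]
        ET.SubElement(entry_el, "description").text = e["description"]
        extra = e.get("additional_fields") or {}
        if extra:
            fields_el = ET.SubElement(entry_el, "additional_fields")
            for name, values in extra.items():
                # Repeatable fields become one <field> element per value.
                for value in values:
                    ET.SubElement(fields_el, "field", name=name).text = value
    return ET.tostring(database, encoding="unicode")
```

Skipping incomplete records is only one possible policy; a pipeline could equally log them for curation. With source metadata whose mandatory fields are reliably filled, as described for CESSDA, such a check would seldom exclude anything.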
As a next step, the query used to select studies from the harvested CESSDA metadata will be improved so that more studies are included. There are also plans to reuse the same OmicsDI XML creation process for other metadata sources that wish to be added to the portal.
More information:
- The COVID-19 Data Portal
- CESSDA metadata in the portal
- BY-COVID project
- BY-COVID D3.2: Implementation of cloud-based, high performance, scalable indexing system (Zenodo).
1 DSpace is a web application that allows researchers and scholars to publish documents and data. https://dspace.lyrasis.org/