Fig 1.

Excerpt of the survey that was set up for the classification task.

The annotators were told to assign exactly one category to each artifact. If an artifact was a compound noun, its nested entities, such as adjectives or second nouns that further describe the term, were provided for tagging as well. In this question, ‘CO2 fixation’ is an example of a two-term artifact and ‘groundwater’ an example of a one-term artifact.

Fig 2.

The frequency of the categories and how often they were assigned to given phrases and terms, with and without QUALITY correction.

Fig 3.

Fleiss’ Kappa values for the individual information categories (with QUALITY correction): a) for all artifacts; b) for artifacts with one and two terms.

Table 1.

Annotators’ agreement with QUALITY correction, overall and for artifacts with one term, two terms, and three or more terms.

Fig 4.

Frequency of category mentions and inter-rater agreement with QUALITY correction.

Table 2.

Metadata standards in the (life) sciences obtained from re3data [57] and the RDA metadata standards catalog [58].

The number in brackets denotes the number of repositories supporting the standard (provided in re3data).

Table 3.

Comparison of metadata standards and information categories.

The categories are sorted by the frequency of their occurrence determined in the previous question analysis. The asterisk denotes the categories with an agreement less than 0.4.

Table 4.

Metadata schemes and formats offered by selected data repositories in their OAI-PMH interfaces.

Table 5.

The date stamps used for each metadata standard and their descriptions obtained from the standard’s website.

Table 6.

Total number of datasets parsed per data repository and metadata standards and schemata.

The numbers in brackets denote the number of datasets used for the analysis. All datasets were harvested and parsed in May 2019.

Fig 5.

Timelines for all repositories presenting the number of datasets per metadata standard and schema offered.

For several repositories, the timelines for the different metadata standards and schemata are almost identical and overlap. Apparently, when a new metadata standard or schema was introduced, publication dates were adopted from existing metadata structures. Figshare’s timeline for RDF was computed separately because the data are too large to process together with the other metadata files.

Fig 6.

Metadata field usage in all data repositories evaluated.

The graphics display the percentage of metadata fields used per data repository and its best matching standard with respect to the information categories.

Table 7.

Comparison of data repositories and their best matching standard with the information categories.

The categories are sorted by the frequency of their occurrence determined in the question analysis. The asterisk denotes the categories with an agreement less than 0.4.

Table 8.

Five most common keywords and their frequencies in the metadata field dc:subject.

The last row denotes the number of files with an empty dc:subject field.

Table 9.

Filter strategies used per data repository to select 10,000 datasets.

The number in brackets denotes the total number of available datasets (OAI-DC standard) at the time of download (October/November 2019).

Table 10.

NLP analysis: Number of datasets with named entities (out of 10,000 processed files in a reduced OAI-DC structure) per repository.

Each file contains a subset of the original metadata, namely, dc:title, dc:description, dc:subject and dc:date.
