elsarticle-template.tex: 76 additions & 36 deletions
@@ -388,11 +388,14 @@ \subsection{Analysis of the use of classes and properties}
388
388
\caption{Use of mandatory properties of the class Dataset in the EDP}
389
389
\end{figure}
390
390
391
-
Use of dct:identifier and adms:identifier
392
-
391
+
\textbf{Use of dct:identifier and adms:identifier}
392
+
\\
393
+
\\
393
394
DCAT-AP foresees two optional properties for identifying a dataset, \textit{dct:identifier} and \textit{adms:identifier}. The first is described as the main unique identifier, e.g. a URI, while the second is provided as a secondary identifier. In practice, the second is used by the EDP itself in 100\% of the cases, while the first is in some cases provided by the harvested catalogue. In most of the cases assessed, the \textit{dct:identifier} follows the same format as the \textit{adms:identifier}. This suggests that the EDP provides a \textit{dct:identifier} and a dataset URI for all datasets that do not already have one.
394
-
395
-
Use of dct:spatial
395
+
\\
396
+
\textbf{Use of dct:spatial}
397
+
\\
398
+
\\
396
399
DCAT-AP mandates the use of controlled vocabularies for named places in the property \textit{dct:spatial}, regardless of whether it is used on the class Dataset or Catalogue. The list of controlled vocabularies is:
397
400
398
401
\begin{itemize}
@@ -448,7 +451,9 @@ \subsection{Analysis of the use of classes and properties}
448
451
\caption{Use of dct:spatial for the class dcat:Catalog}
449
452
\end{longtable}
450
453
451
-
Use of dct:accrualPeriodicity
454
+
\textbf{Use of dct:accrualPeriodicity}
455
+
\\
456
+
\\
452
457
We analysed the use of the property \textit{dct:accrualPeriodicity}, which indicates the frequency at which a dataset is updated by its owner. The most used frequencies are visualised below:
453
458
454
459
\begin{figure}[!h]
@@ -457,22 +462,28 @@ \subsection{Analysis of the use of classes and properties}
457
462
\end{figure}
458
463
459
464
Daily, continuously, monthly and annually are the most used frequencies for updating datasets. One important point concerns the number of duplicated frequencies, e.g. IRREG and IRREGULAR. For example, some datasets are described using the authority code (i.e. IRREG) provided by the frequency authority list from the Publications Office\footnote{\href{ http://publications.europa.eu/mdr/resource/authority/frequency/html/frequencies-eng.html }{ http://publications.europa.eu/mdr/resource/authority/frequency/html/frequencies-eng.html }}, while other datasets use the label from the same authority list (i.e. IRREGULAR).
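The duplication between authority codes and labels can be illustrated with a minimal Python sketch that folds both forms into a single authority URI. The function name and the label-to-code table are hypothetical; only the IRREG/IRREGULAR pair is taken from the observation above, and the base URI follows the pattern of the frequency URI quoted in this paper.

```python
# Base of the MDR frequency authority URIs, as quoted in this paper.
FREQUENCY_BASE = "http://publications.europa.eu/resource/authority/frequency/"

# Hypothetical label-to-code table; only IRREGULAR -> IRREG comes from the text.
LABEL_TO_CODE = {"IRREGULAR": "IRREG"}


def normalise_frequency(value: str) -> str:
    """Map a code, a label or a full URI to one authority URI."""
    # Keep only the last path segment so that full URIs are handled too.
    token = value.strip().rstrip("/").rsplit("/", 1)[-1].upper()
    code = LABEL_TO_CODE.get(token, token)
    return FREQUENCY_BASE + code
```

With such a mapping, datasets described with IRREG and with IRREGULAR would be counted under the same frequency instead of appearing as two separate values.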
460
-
461
-
52\% of the datasets provide a provenance
465
+
\\
466
+
\\
467
+
\textbf{52\% of the datasets provide a provenance
468
+
}\\
469
+
\\
462
470
The provenance is meant to provide a statement about the lineage of a dataset. In a guideline on how to model and express provenance\footnote{\href{ https://joinup.ec.europa.eu/release/dcat-ap-how-model-and-express-provenance}{ https://joinup.ec.europa.eu/release/dcat-ap-how-model-and-express-provenance}}, the following recommendation was made: \textit{‘As the provision of provenance information is not wide-spread and information in free text does not allow further processing, the usefulness of such information in (international) harvesting is questionable and the information may be ignored. Local implementations are of course free to provide provenance information satisfying local requirements.’} However, the guideline recognises the potential of such information: \textit{‘It could support credibility of a dataset to know which organisation created the metadata for it in the first place and how the description was modified along a chain of exchanges.’}
463
471
464
472
In practice, the different uses of provenance described above were observed, which shows that the use of this property is not standardised. For example, a dataset\footnote{\href{ https://www.europeandataportal.eu/data/dataset/00dfaddf-f2f0-487a-b28e-aad53a318521}{ https://www.europeandataportal.eu/data/dataset/00dfaddf-f2f0-487a-b28e-aad53a318521}} describes the provenance by mentioning the context and the organisation responsible for the creation of the dataset (translation from Dutch): \textit{‘Label: This map was established on the basis of an inventory during the establishment of the provincial policy for recreational and professional use of waterways.’}
465
473
Moreover, the provenance is also used to keep track of the modifications applied to a dataset. One example is \url{https://www.europeandataportal.eu/data/dataset/de-pangaea-dataset864016} with: \textit{‘Label: The data set was checked for completeness, correctness, and consistency of metainformation. Validity of used methods was checked and - if applicable - precision and range of data.’}
466
474
Consequently, we would recommend keeping the existing guideline, as it is still relevant.
467
-
468
-
dcat:Distribution
475
+
\\
476
+
\\
477
+
\textbf{\textit{dcat:Distribution}}
469
478
470
479
\begin{figure}[!h]
471
480
\includegraphics{replace23.png}
472
481
\caption{Use of mandatory properties of the class Distribution in the EDP}
473
482
\end{figure}
474
483
475
-
Relationship between accessURL, downloadURL and Distributions
484
+
\textbf{Relationship between accessURL, downloadURL and Distributions}
485
+
\\
486
+
\\
476
487
As the figure above shows, almost all distributions respect the mandatory property \textit{dcat:accessURL}. There was no case for which a distribution had no \textit{accessURL} and no \textit{downloadURL}.
477
488
478
489
We also looked at the combined use of the two properties, to check whether they are used differently. In practice, 16,866 of the distributions queried have different URLs for access and download, while 253,006 distributions\footnote{The number of distributions using downloadURL differs slightly from the percentage in the visual below (optional property dcat:downloadURL) due to the time gap between queries.} have the same URL. This shows that the large majority of the distributions using the two properties do not provide different information for each property but simply repeat the downloadURL, as explained in the guideline ‘How to use accessURL and downloadURL?’\footnote{\href{ https://joinup.ec.europa.eu/release/how-use-accessurl-and-downloadurl }{ https://joinup.ec.europa.eu/release/how-use-accessurl-and-downloadurl }}.
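The comparison of the two properties can be sketched in a few lines of Python. This is an illustration only, not the query used for the analysis: it assumes the distributions have already been exported as dictionaries with the hypothetical keys accessURL and downloadURL.

```python
from collections import Counter


def compare_urls(distributions):
    """Count distributions by how accessURL and downloadURL relate."""
    counts = Counter()
    for dist in distributions:
        access = dist.get("accessURL")
        download = dist.get("downloadURL")
        if access and download:
            # Both properties present: are they used differently?
            counts["same" if access == download else "different"] += 1
        elif access or download:
            counts["only_one"] += 1
        else:
            counts["neither"] += 1
    return counts


# Made-up sample records, for illustration only.
sample = [
    {"accessURL": "http://example.org/a", "downloadURL": "http://example.org/a"},
    {"accessURL": "http://example.org/b", "downloadURL": "http://example.org/b.csv"},
    {"accessURL": "http://example.org/c"},
]
print(compare_urls(sample))
```

The "same" and "different" buckets correspond to the two figures reported above, while "neither" corresponds to the case that was never observed.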
@@ -482,7 +493,9 @@ \subsection{Analysis of the use of classes and properties}
482
493
\caption{Use of recommended properties of the class Distribution in the EDP}
483
494
\end{figure}
484
495
485
-
Use of licences
496
+
\textbf{Use of licences}
497
+
\\
498
+
\\
486
499
Only 27.86\% of the distributions have a licence property defined. This can be a barrier for the reuse of the open data, as many potential users might not take the risk of using distributions without knowing under which conditions they can do it. The EDP also adds that 90\% of the licences of the datasets on the portal are unknown\footnote{\href{ https://www.europeandataportal.eu/mqa-service/en}{ https://www.europeandataportal.eu/mqa-service/en }}. The portal considers a licence as unknown if it is not part of the list of licences provided by CKAN\footnote{\href{ https://www.europeandataportal.eu/en/licence-assistant}{ https://www.europeandataportal.eu/en/licence-assistant }}. Using known licences for the datasets would greatly simplify the work required for potential users before deciding if they can use specific datasets or not.
487
500
In general, we also found that the distributions providing a known licence are compliant with the guideline on \textit{‘How to refer to licence documents and licence URIs?’}\footnote{\href{https://joinup.ec.europa.eu/release/dcat-ap-how-refer-licence-documents-and-licence-uris}{ https://joinup.ec.europa.eu/release/dcat-ap-how-refer-licence-documents-and-licence-uris}}. The guideline specifies that licences should always be identified with URIs which should resolve to the description of the licence.
488
501
@@ -491,9 +504,10 @@ \subsection{Analysis of the use of classes and properties}
491
504
\caption{Use of optional properties of the class Distribution in the EDP}
492
505
\end{figure}
493
506
494
-
Relationship between dct:title, dct:format and dcat:mediaType
507
+
\textbf{Relationship between dct:title, dct:format and dcat:mediaType}
508
+
\\
509
+
\\
495
510
DCAT-AP recommends using:
496
-
497
511
\begin{itemize}
498
512
\item\textit{dct:format} to give information about the file format of the distribution;
499
513
\item\textit{dcat:mediaType}, as a subproperty of dct:format, to follow the official register of media types managed by IANA\footnote{\href{ https://www.iana.org/assignments/media-types/media-types.xhtml}{ https://www.iana.org/assignments/media-types/media-types.xhtml}}; and
@@ -505,8 +519,11 @@ \subsection{Analysis of the use of classes and properties}
505
519
Despite this misuse, when looking at the combined use of \textit{dct:title} and \textit{dcat:mediaType}, most of the distributions observed use the two properties appropriately: \textit{dct:title} as a name for the distribution and \textit{dcat:mediaType} with a value from IANA.
506
520
507
521
Some members of the DCAT-AP community have expressed a preference for using only \textit{dct:format} with IANA media types. One reason for using IANA media types is that one can express the \textit{innerMimeType} for ZIP files. On the other hand, \textit{dct:format} is more flexible. Even though IANA does not include all geospatial values, they can be added to the OP list (\textit{dct:format}). Chapter 4.2 analyses the controlled vocabularies for \textit{dct:format} and \textit{dcat:mediaType} in more depth.
508
-
509
-
dcat:CatalogRecord
522
+
\\
523
+
\\
524
+
\textbf{\textit{dcat:CatalogRecord}}
525
+
\\
526
+
\\
510
527
DCAT-AP defines the class Catalog Record as \textit{‘a description of a dataset’s entry in the catalogue’}. The analysis shows a 100\% use of mandatory properties and recommended properties\textit{ dct:issued }and \textit{adms:status}. Other properties are not used at Catalogue Record level.
511
528
512
529
\subsection{Analysis of controlled vocabulary use on the European Data Portal}
@@ -574,20 +591,28 @@ \subsection{Analysis of controlled vocabulary use on the European Data Portal}
574
591
\caption{Number of errors found per property using controlled vocabularies }
575
592
\end{longtable}
576
593
577
-
dct:language on dcat:Dataset
578
-
594
+
\textbf{\textit{dct:language on dcat:Dataset}}
595
+
\\
596
+
\\
579
597
The property \textit{dct:language} for the class \textit{dcat:Dataset} is used to indicate a language for a dataset. DCAT-AP requests implementers to provide a value from the MDR Languages Named Authority List\footnote{\href{ http://publications.europa.eu/mdr/authority/language/}{ http://publications.europa.eu/mdr/authority/language/}} such as \\
580
598
\textit{$<$http://publications.europa.eu/resource/ authority/language/ENG$>$} for English. In our sample, only one catalogue uses \textit{dct:language} for the class Dataset pointing correctly to the MDR Languages Named Authority List. The other catalogues are not specifying the language for the class \textit{dcat:Dataset}.
581
-
582
-
dct:accrualPeriodicity on dcat:Dataset
583
-
599
+
\\
600
+
\\
601
+
\textbf{\textit{dct:accrualPeriodicity on dcat:Dataset}}
602
+
\\
603
+
\\
584
604
Similarly, the property \textit{dct:accrualPeriodicity} is used to refer to the frequency at which the Dataset is updated. DCAT-AP requests the use of the MDR Frequency Named Authority List\footnote{\href{ http://publications.europa.eu/mdr/authority/frequency}{ http://publications.europa.eu/mdr/authority/frequency}}, for example with a URI value as follows: \textit{$<$http://publications.europa.eu/resource/authority/frequency/ANNUAL$>$}. Among the 6 catalogues analysed, a single one provided information about the frequency of update, providing the correct value from the MDR Frequency Named Authority List.
585
-
586
-
dcat:theme on dcat:Dataset
605
+
\\
606
+
\\
607
+
\textbf{\textit{dcat:theme on dcat:Dataset}}
608
+
\\
609
+
\\
587
610
The property \textit{dcat:theme }refers to a category of the Dataset. A Dataset may be associated with multiple themes. As for the previous properties, a Controlled Vocabulary must be followed, the MDR data theme Named Authority List\footnote{\href{ http://publications.europa.eu/mdr/resource/authority/data-theme/html/data-theme-eng.html}{ http://publications.europa.eu/mdr/resource/authority/data-theme/html/data-theme-eng.html}} with a URI pointing to one authority code from the list. From our sample analysis, the Controlled Vocabulary is perfectly used by all 6 catalogues.
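A check of this kind can be sketched in Python as follows. The sketch only tests the shape of the URI: the base URI is the data-theme URI quoted elsewhere in this section, while the function name and the assumption that authority codes consist of uppercase letters and underscores (as in AGRI) are ours. A complete check would compare the code against the authority list itself.

```python
import re

# Base of the MDR data-theme authority URIs, as quoted in this section.
THEME_BASE = "http://publications.europa.eu/resource/authority/data-theme/"

# Assumption: authority codes are non-empty runs of uppercase letters or "_".
_CODE_PATTERN = re.compile(r"[A-Z_]+")


def points_to_theme_code(uri: str) -> bool:
    """Return True if the URI is the base followed by a plausible code."""
    uri = uri.strip().strip("<>")
    if not uri.startswith(THEME_BASE):
        return False
    return _CODE_PATTERN.fullmatch(uri[len(THEME_BASE):]) is not None
```

A URI that stops at the base, without an authority code, is rejected by this check.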
588
-
589
-
dct:publisher on dcat:Dataset and dcat:Catalog
590
-
611
+
\\
612
+
\\
613
+
\textbf{\textit{dct:publisher on dcat:Dataset and dcat:Catalog
614
+
}}\\
615
+
\\
591
616
The property \textit{dct:publisher} must follow the MDR Corporate bodies Named Authority List\footnote{\href{ http://publications.europa.eu/mdr/authority/corporate-body/}{ http://publications.europa.eu/mdr/authority/corporate-body/}} for which DCAT-AP specifies that it \textit{“must be used for European institutions and a small set of international organisations. In case of other types of organisations, national, regional or local vocabularies should be used”}\footnote{\href{ https://joinup.ec.europa.eu/release/dcat-ap-v11}{ https://joinup.ec.europa.eu/release/dcat-ap-v11 }}. As data portals can indicate national or subnational publishers, this property could not be verified automatically with the DCAT-AP validator. However, from manual processing, the following conclusions were drawn:
592
617
593
618
\begin{itemize}
@@ -596,7 +621,9 @@ \subsection{Analysis of controlled vocabulary use on the European Data Portal}
596
621
\item For the class \textit{dcat:Catalog}, all catalogues analysed also provide information about \textit{dct:publisher} using a URI pointing to \textit{foaf:name} and \textit{rdf:type}, therefore not using a Controlled Vocabulary as specified by DCAT-AP.
597
622
\end{itemize}
598
623
599
-
Wrongly used properties
624
+
\textbf{\textit{Wrongly used properties}}
625
+
\\
626
+
\\
600
627
The following properties were wrongly used by all catalogues in our sample, as explained in the following sections:
601
628
\begin{itemize}
602
629
\item\textit{dct:format} for \textit{dcat:Distribution};
@@ -606,15 +633,22 @@ \subsection{Analysis of controlled vocabulary use on the European Data Portal}
606
633
\item dcat:themeTaxonomy for dcat:Catalog.
607
634
\end{itemize}
608
635
609
-
dct:format on dcat:Distribution
610
-
636
+
\textbf{\textit{dct:format on dcat:Distribution}}
637
+
\\
638
+
\\
611
639
The \textit{dct:format} property must refer to the MDR File Type Named Authority List\footnote{\href{ http://publications.europa.eu/mdr/authority/file-type/}{ http://publications.europa.eu/mdr/authority/file-type/}} for describing the file format of a distribution. All 6 catalogues have datasets with the property \textit{dct:format}. For all instances identified using \textit{dct:format}, the MDR controlled vocabulary is not followed. Instead, the
612
640
value gives a URI pointing to an instance of \textit{dct:MediaTypeOrExtent} with the format described as a literal in \textit{rdf:label}, e.g. \textit{“WMS”}.
613
-
614
-
dcat:mediaType on dcat:Distribution
641
+
\\
642
+
\\
643
+
\textbf{\textit{dcat:mediaType on dcat:Distribution}}
644
+
\\
645
+
\\
615
646
\textit{dcat:mediaType }is a subproperty of \textit{dct:format} used to express the media type of the Distribution as defined in the official register of media types managed by IANA\footnote{\href{ http://www.iana.org/assignments/media-types/media-types.xhtml}{ http://www.iana.org/assignments/media-types/media-types.xhtml}}. Only one catalogue out of 6 uses \textit{dcat:mediaType}. The values provided for the subproperty are almost always correct, with a percentage error of 4\% (55/1365). For the errors identified, a media type not referenced in IANA was used, such as ‘xml/soap’.
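The reported error percentage and the kind of error found can be reproduced with a short Python sketch. It is a partial check under stated assumptions: it only verifies that the top-level type is one of those used in the IANA register (the set below is our snapshot and may be incomplete), which is necessary but not sufficient for a media type to be registered. The function names are hypothetical.

```python
# Snapshot of IANA top-level media types (assumption: may be incomplete).
IANA_TOP_LEVEL_TYPES = {
    "application", "audio", "example", "font", "image",
    "message", "model", "multipart", "text", "video",
}


def has_registered_top_level_type(media_type: str) -> bool:
    """Check the 'type/subtype' shape and the top-level type only."""
    parts = media_type.strip().lower().split("/")
    return len(parts) == 2 and parts[0] in IANA_TOP_LEVEL_TYPES and parts[1] != ""


def error_percentage(errors: int, total: int) -> float:
    """Share of erroneous values, in percent."""
    return 100.0 * errors / total


print(has_registered_top_level_type("xml/soap"))
print(round(error_percentage(55, 1365), 1))
```

The value ‘xml/soap’ already fails this structural test because ‘xml’ is not a top-level type, and 55 errors out of 1365 values round to the 4\% reported above.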
616
-
617
-
dct:spatial on dcat:Dataset and dcat:Catalog
647
+
\\
648
+
\\
649
+
\textbf{\textit{dct:spatial on dcat:Dataset and dcat:Catalog}}
650
+
\\
651
+
\\
618
652
For \textit{dct:spatial} under the class \textit{dcat:Dataset}, DCAT-AP requests the use of multiple Controlled Vocabularies:
619
653
\begin{itemize}
620
654
\item MDR Continents Named Authority List\footnote{\href{http://publications.europa.eu/mdr/authority/continent/
@@ -631,11 +665,17 @@ \subsection{Analysis of controlled vocabulary use on the European Data Portal}
631
665
\end{itemize}
632
666
633
667
The same use is expected from the catalogues for the class \textit{dcat:Catalog}. In the sample, the 6 instances analysed use an acronym (e.g. “XY”) of the country of the catalogue instead of pointing to the MDR Countries Named Authority List or another Controlled Vocabulary listed by DCAT-AP.
634
-
635
-
adms:status on dcat:CatalogRecord
668
+
\\
669
+
\\
670
+
\textbf{\textit{adms:status on dcat:CatalogRecord}}
671
+
\\
672
+
\\
636
673
The property \textit{adms:status} for the class \textit{dcat:CatalogRecord} refers to the type of the latest revision of a Dataset's entry in the Catalogue. All the catalogues analysed use it and all 5,536 instances of this property have the value \textit{“:modified”} in place of one from the list provided by DCAT-AP: \textit{:created, :updated, :deleted}. The ratio and the identical value across all the catalogues selected suggest that this property is set by the EDP itself and not by the catalogue owners.
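The comparison against the allowed values can be sketched in Python. The allowed set is taken from the list quoted above; the function name and the assumption that values arrive as prefixed names such as ":modified" are ours.

```python
from collections import Counter

# Allowed values as listed by DCAT-AP in the paragraph above.
ALLOWED_STATUS = {"created", "updated", "deleted"}


def invalid_statuses(values):
    """Count status values that are not in the allowed list."""
    counts = Counter()
    for value in values:
        # Drop the leading ":" of a prefixed name before comparing.
        local = value.strip().lstrip(":").lower()
        if local not in ALLOWED_STATUS:
            counts[local] += 1
    return counts


print(invalid_statuses([":modified", ":created", ":modified"]))
```

Applied to the sample described above, every instance would be reported under "modified".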
637
-
638
-
dcat:themeTaxonomy on dcat:Catalog
674
+
\\
675
+
\\
676
+
\textbf{\textit{dcat:themeTaxonomy on dcat:Catalog}}
677
+
\\
678
+
\\
639
679
The property \textit{dcat:themeTaxonomy} for the class \textit{dcat:Catalog} follows the same Controlled Vocabulary as \textit{dcat:theme} for the class Dataset. None of the catalogues in the sample uses the property appropriately. All of them indicate the correct first part of the URI, \\
640
680
\textit{$<$http://publications.europa.eu/resource/authority/data-theme$>$}, but not the full URI expected with the authoritative code, such as \\\textit{$<$http://publications.europa.eu/resource/ authority/data-theme/\textbf{AGRI}$>$}. As described in section 3.1, \textit{dcat:themeTaxonomy} is populated by the EDP for all catalogues harvested.