Configuration

API configuration

The overall API service is configured using an appbase style configuration file.

The location of the configuration file is defined in the web.xml for the application. In the current code this is set up to load from /opt/dsapi/app.conf and to fall back to an app.conf baked into the war.

A sample app.conf is shown below.

# app configuration

# Velocity library plugin for json handling
json                 = com.epimorphics.data_api.endpoints.LibJson

# Velocity engine
velocity             = com.epimorphics.appbase.templates.VelocityRender
velocity.templates   = {webapp}/WEB-INF/templates
velocity.root        = /
velocity.production  = false
velocity.plugin      = $json

# The source being queried
ssource              = com.epimorphics.appbase.data.impl.FileSparqlSource
ssource.files        = /opt/dsapi/data

# The dataset configurations
monitor              = com.epimorphics.data_api.config.DatasetMonitor
monitor.directory    = /opt/dsapi/conf
monitor.fileSampleLength = 500

# The API service
dsapi                = com.epimorphics.data_api.config.DSAPIManager
dsapi.source         = $ssource
dsapi.apiBase        = /dsapi
dsapi.monitoredDatasets = $monitor

A lot of this is basic wiring and shouldn't be changed.

Data source

This is the SPARQL data source which will be queried for all configured datasets.

The configuration:

ssource              = com.epimorphics.appbase.data.impl.FileSparqlSource
ssource.files        = /opt/dsapi/data

sets up an in-memory data source loading all the RDF files in the /opt/dsapi/data directory.

In the future we'll also want optional text indexing. For the in-memory source this can be specified by setting an ssource.textIndex value. The value should be either default (to index rdfs:label only) or a comma-separated list of predicates to index. The list may use CURIEs with the prefixes defined in the application's prefix service. The index will always include rdfs:label, whether or not it is listed.
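As a rough illustration of the rule above, the following Python sketch expands a textIndex value into the list of predicates that would be indexed. It is not the appbase implementation; `parse_text_index` is a hypothetical helper, and the predicates are left as unexpanded CURIEs.

```python
from typing import List


def parse_text_index(value: str) -> List[str]:
    """Hypothetical helper: expand an ssource.textIndex value into predicates."""
    predicates = ["rdfs:label"]  # rdfs:label is always part of the index
    if value.strip() == "default":
        return predicates
    # Otherwise the value is a comma-separated list of (possibly prefixed) predicates
    for curie in value.split(","):
        curie = curie.strip()
        if curie and curie not in predicates:
            predicates.append(curie)
    return predicates
```

For example, `parse_text_index("skos:prefLabel, skos:altLabel")` yields rdfs:label followed by the two listed predicates.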

To configure a local TDB instance use:

ssource              = com.epimorphics.appbase.data.impl.TDBSparqlSource
ssource.location     = /opt/dsapi/tdbstore
ssource.unionDefault = true
ssource.index        = /opt/dsapi/lucene-index

The index setting is optional and assumes a default index over just rdfs:label.

To configure a remote SPARQL endpoint use:

ssource              = com.epimorphics.appbase.data.impl.RemoteSparqlSource
ssource.endpoint     = http://localhost:3030/mydata

Multiple data sources

You can support multiple data sources by configuring multiple SparqlSource instances (each needs a different name in the configuration file) and then listing them in the API configuration:

dsapi.sources         = $ssource1, $ssource2, ...

(note that dsapi.sources is plural in this case).

When there are multiple sources available the first source in that list is treated as a default. All configured datasets will use the default source unless they include an explicit dsapi:source declaration.
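The selection rule can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not the DSAPIManager code; `resolve_source` is a hypothetical name, and raising an error for an unknown source name is an assumption.

```python
from typing import List, Optional


def resolve_source(source_names: List[str], declared: Optional[str] = None) -> str:
    """Pick the source for a dataset.

    source_names: names listed in dsapi.sources, in configuration order.
    declared: value of the dataset's dsapi:source, or None if absent.
    """
    if not source_names:
        raise ValueError("no data sources configured")
    if declared is None:
        return source_names[0]  # the first listed source is the default
    if declared not in source_names:
        # Assumption: an unknown name is treated as a configuration error
        raise KeyError("unknown data source: " + declared)
    return declared
```

With `dsapi.sources = $ssource1, $ssource2`, a dataset without a dsapi:source declaration resolves to ssource1, while one declaring "ssource2" resolves to ssource2.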

Dataset configuration monitoring

The block:

# The dataset configurations
monitor              = com.epimorphics.data_api.config.DatasetMonitor
monitor.directory    = /opt/dsapi/conf
monitor.fileSampleLength = 500

sets up a monitor on the directory /opt/dsapi/conf from which data set configurations (see below) will be loaded.

This will scan for changes every 2 seconds. To change the scan interval use:

monitor.scanInterval = xxxx     # interval in ms

The mysterious monitor.fileSampleLength setting tells the monitor to also checksum the first 500 bytes of each configuration file when checking for changes. Without it, only the date stamp and file length are used, which is fine for manual development purposes.
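To make the difference concrete, here is a minimal Python sketch of this style of change detection. It is not the DatasetMonitor implementation: `file_signature` is a hypothetical helper, and the use of CRC32 as the checksum is an assumption.

```python
import os
import zlib


def file_signature(path: str, sample_length: int = 0) -> tuple:
    """Return a tuple that changes when the file is considered modified.

    With sample_length == 0 only the date stamp and length are compared.
    With sample_length > 0 a checksum of the leading bytes is included too.
    """
    stat = os.stat(path)
    signature = (stat.st_mtime_ns, stat.st_size)
    if sample_length > 0:
        with open(path, "rb") as f:
            sample = f.read(sample_length)
        signature += (zlib.crc32(sample),)  # assumption: CRC32 stands in for the real checksum
    return signature
```

A monitor would store the signature for each file on one scan and compare it on the next. An edit that keeps both the length and the date stamp unchanged is only noticed when the sample checksum is included.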

To skip scanning altogether and just load the configurations once at application startup, set:

monitor.productionMode = true

Dataset configuration

A single data services API can provide multiple "data sets", each of which is a different projection from the underlying SPARQL source. These data sets may be really distinct (e.g. different Data Cubes) or may be different views onto the same data (e.g. different aspect definitions).

Better name than "data set"?

Each "data set" is configured using a separate RDF file placed in the configuration directory as defined above.

The RDF file defines the data set metadata and its aspects, either in line or by referencing qb:DataSet and/or qb:DataStructureDefinition instances in the data source. In the latter case the relevant data set, DSD and component definitions will be fetched from the data source and need not be replicated in the configuration file.

An example configuration file is:

@prefix classification: <http://environment.data.gov.uk/def/classification/> .
@prefix owl:   <http://www.w3.org/2002/07/owl#> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos:  <http://www.w3.org/2004/02/skos/core#> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix dct:   <http://purl.org/dc/terms/> .
@prefix qb:    <http://purl.org/linked-data/cube#> .
@prefix wb-classification: <http://environment.data.gov.uk/def/waterbody-classification/> .
@prefix wfd: <http://location.data.gov.uk/def/am/wfd/> .
@prefix dsapi: <http://www.epimorphics.com/public/vocabulary/dsapi#> .
@prefix :  <http://www.epimorphics.com/test/dsapi/sprint2#> .

:wbclass a dsapi:Dataset;
    rdfs:label "Waterbody classifications";
    dct:description "A data cube of waterbody classifications from EA catchment planning pilot";
    dsapi:qb_dataset <http://environment.data.gov.uk/data/waterbody/classification/dataset>;
    dsapi:qb_dsd wb-classification:classificationDSD;
    .

This creates a data set which will be accessed as wbclass and provides a label and description.

The base level query (which defines the set of resources belonging to the data set) is implicit: it will match any resource whose qb:dataSet property has the given dsapi:qb_dataset value.

The structure of the data set will be derived by looking up the DSD given by the dsapi:qb_dsd value, which will be retrieved from the data source. Each cube component (dimension, measure, attribute) will be turned into an Aspect definition.

For a well-formed Data Cube dataset you would normally only need the dsapi:qb_dataset value; the API machinery would then be able to fetch the corresponding DSD and process it. In the Sprint2 data the actual dataset declaration was missing, hence the need to specify the DSD explicitly.
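The implicit base query for a Data Cube data set can be pictured with the following Python sketch. The exact query text generated by the API is not documented here, so the shape below is an assumption; `implicit_base_query` is a hypothetical helper, and only the `?item` variable and the qb:dataSet match come from the description above.

```python
QB = "http://purl.org/linked-data/cube#"


def implicit_base_query(qb_dataset_uri: str) -> str:
    """Build a SPARQL query selecting every member of a Data Cube data set."""
    return (
        "PREFIX qb: <" + QB + ">\n"
        "SELECT ?item WHERE {\n"
        # Each observation is bound to ?item via its qb:dataSet link
        "  ?item qb:dataSet <" + qb_dataset_uri + "> .\n"
        "}"
    )


if __name__ == "__main__":
    print(implicit_base_query(
        "http://environment.data.gov.uk/data/waterbody/classification/dataset"))
```

For the wbclass example this selects every resource whose qb:dataSet is the waterbody classification dataset URI.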

Note: all the prefixes defined in the configuration file will be used to generate prefix contractions for the values returned by, or queried by, the API.

To illustrate defining Aspects of a data set in-line here's another example (prefixes omitted):

:games a dsapi:Dataset;
    rdfs:label "Games";
    dct:description "Kers games database";
    dsapi:baseQuery "" ;
    
    dsapi:aspect 
        [ dsapi:property rdfs:label; rdfs:label "label"; dsapi:multivalued true ],
        [ dsapi:property rdf:type;   rdfs:label "type"],
        [ dsapi:property egc:players; rdfs:label "players"; dsapi:multivalued true ],
        [ dsapi:property egc:pubYear; rdfs:label "publication year"],
        [ dsapi:property egc:playTimeMinutes; rdfs:label "playing time"; dsapi:optional true ];
    .

In this case the base query is given explicitly but it's empty. An empty query is sufficient here because the only resources which have values matching the required aspects are legal resources in the games data set.

Configuration vocabulary

The vocabulary namespace is http://www.epimorphics.com/public/vocabulary/dsapi#

Data sets

ID	Description
`:Dataset`	Specification for a data set to be accessed through the data services API. Needs either a `:qb_dataset` or a `:baseQuery` to define the contents of the dataset. The structure is taken from the `:aspect` definitions given here, from a directly referenced DSD, or implicitly from the DSD associated with the QB dataset. Should have an `rdfs:label` and optionally a `dct:description` and a `skos:notation` (to give the short name).
`:qb_dataset`	Indicates a Data Cube DataSet whose observations are the contents of this dsapi data set.
`:qb_dsd`	Indicates a Data Cube DataStructureDefinition defining the aspect structure of the data set.
`:baseQuery`	Gives the textual source of a SPARQL BGP which will bind any member of the dataset to the ?item variable. Will be injected into the query in a `{ ... }` block. Can be a complete SELECT query, in which case it will act as a sub-query.
`:literalType`	Names the types whose values are literals rather than resources.
`:aspect`	Indicates a locally-configured 'aspect' of the data set.
`:codeList`	Indicates this dataset is a projection of a code list. The value should be the URI of a `skos:ConceptScheme`, `skos:Collection` or `qb:HierarchicalCodeList` (note that the latter can be used to map any hierarchy by providing a set of roots and a parent-child or child-parent relationship).
`:source`	Gives the name of the data source to be queried by this data set. This name normally corresponds to the name of the configuration variable in the `app.conf` file.

In-line aspect definition

ID	Description
`:Aspect`	Specification of a single aspect of the data set. Should include a `rdfs:label` and optionally a `dct:description`, `rdfs:range`, `skos:notation`.
`:optional`	Set to true if the aspect is optional, default is false.
`:multivalued`	Set to true if the aspect can have multiple values, default is false.
`:propertyPath`	Source text of a SPARQL property path expression that links an element of the data set to the aspect value. May use prefixes defined for this dataset. See note below on naming.
`:property`	Property which links a data set element to this aspect. Default label, description and range information may be found on this resource.
`:rangeConstraint`	Indicates limits to the range of values which will be present for this aspect.
`:rangeDataset`	Gives the short name of a dataset. All the legal values for this aspect are drawn from that dataset. Primarily used for datasets which are code lists but can link to any form of dataset.
`:codeList`	URI of a `skos:ConceptScheme`, `skos:Collection` or `qb:HierarchicalCodeList` from which all the legal values for this aspect are drawn. This will cause a hierarchical dataset to be declared to match the given code list and set the `rangeDataset` to point to that dataset.

Note on aspect naming: If the aspect definition is a URI resource then (the prefixed form of) that URI will be used as the aspect name. If the aspect is specified via a blank node then the URI of its :property will be used. This means that an aspect defined as a blank node with a :propertyPath still requires a :property URI. In that case the :property is used only for naming and is not expected to be present in the data (or its vocabulary).
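The naming rule can be summarised by the Python sketch below. It is an illustration only, not the API's own code: `contract` and `aspect_name` are hypothetical helpers, and choosing the longest matching namespace when contracting a URI is an assumption.

```python
from typing import Dict, Optional


def contract(uri: str, prefixes: Dict[str, str]) -> str:
    """Return the prefixed form of uri, or the full URI if no prefix matches."""
    best = None
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace) and (best is None or len(namespace) > len(best[1])):
            best = (prefix, namespace)  # assumption: the longest namespace wins
    if best is None:
        return uri
    return best[0] + ":" + uri[len(best[1]):]


def aspect_name(aspect_uri: Optional[str], property_uri: Optional[str],
                prefixes: Dict[str, str]) -> str:
    """Name an aspect from its own URI, else from the URI of its :property."""
    if aspect_uri is not None:
        return contract(aspect_uri, prefixes)
    if property_uri is None:
        raise ValueError("a blank-node aspect needs a :property to supply its name")
    return contract(property_uri, prefixes)
```

A blank-node aspect (passed as `None`) with `:property rdfs:label` is therefore named rdfs:label, whether or not it also carries a :propertyPath.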

Range constraints

ID	Description
`:RangeConstraint`	Constraint on the range of values which will be present for the corresponding aspect.
`:lowerBound`	Lowest value expected for a measure or other cube component.
`:upperBound`	Highest value expected for a measure or other cube component.
