RDF – Project Haystack

RDF

OverviewGeneral Mapping RulesMarker TagsDatatypesValue TagsExporting InstancesPending Work

Overview

Ontological relationships are conveyed in the Haystack information model using defs. Haystack provides methods to query information about the relationships between defs, and to make inferences about these relationships. These defs are stored in a format, viz., Trio, that cannot be consumed by traditional semantic tools.

In order to make the Haystack ontology more accessible to users more familiar with standard tools for the semantic web, we define a mapping from Haystack defs to RDF. RDF is considered the "lingua franca" of the semantic web, so it is highly desirable to have a well-defined set of rules for exporting Haystack defs to RDF. We adopt the following high-level goals in defining this mapping:

  • Export defs as RDF statements in Turtle format.
  • Generate RDFS statements to add semantic meaning that is equivalent to the def.
  • Use OWL statements where they serve to increase the expressivity and usability of the model.

The rest of this document outlines a strategy for mapping Haystack defs to RDF.

General Mapping Rules

These are the basic rules for mapping defs to RDF. They are further refined in the sections that follow.

The symbol for a def becomes the subject of an RDF statement.

Each tag/value pair on a def becomes the predicate and object of an RDF statement respectively.

  • doc tags are mapped to rdfs:comment predicates.
  • Values of the is list become distinct predicate/object pairs

RDF requires that every resource be uniquely identified by an IRI. Every Def is a logical resource, so we need to map def symbols to IRIs. To construct an IRI from a symbol:

  1. Obtain the baseUri from the Lib the def belongs to.
  2. Append the version of the lib with an empty fragment
  3. Append the symbol name

For example, site is defined in the phIoT library. That library has the following def:

def: ^lib:phIoT
doc: "Project Haystack definitions for Internet of Things"
version: "4.0"
baseUri: `https://project-haystack.org/def/phIoT/`
depends: [^lib:ph, ^lib:phScience]

Therefore, the full IRI for the site def based on the above lib def is

https://project-haystack.org/def/phIoT/4.0#site

Consider the definition of a site:

def: ^site
is: [^entity, ^geoPlace]
mandatory
doc: "Site is a geographic location of the built environment"

A minimal RDF mapping for this def would be:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix ph: <https://project-haystack.org/def/ph/4.0#> .
@prefix phScience: <https://project-haystack.org/def/phScience/4.0#> .
@prefix phIoT: <https://project-haystack.org/def/phIoT/4.0#> .

phIoT:site is ph:entity, ph:geoPlace ;
    rdfs:comment "Site is a geographic location of the built environment" ;
    ph:mandatory ph:marker .

This is a valid RDF export, but it lacks the necessary statements to convey semantic and ontological meaning. The remaining sections refine the export rules for different def types to address this issue.

NOTE: For the remaining examples, we omit the RDF @prefix statements for brevity.

Marker Tags

Defs for marker tags are subtypes of marker via the is tag. See the section on Subtyping for details.

Marker tag defs become instances of an owl:Class.

The supertype tree defined by is maps to a set of rdfs:subClassOf statements. This establishes a subtyping relationship in the ontology.

Using these rules, the site def now becomes

phIoT:site a owl:Class ;
    rdfs:subClassOf ph:entity,
        ph:geoPlace ;
    rdfs:comment "Site is a geographic location of the built environment" ;
    ph:is ph:entity,
        ph:geoPlace ;
    ph:lib phIoT:lib:phIoT ;
    ph:mandatory ph:marker .

Datatypes

Haystack defines a number of data types (e.g. Str, Bool, DateTime). All data types in Haystack are subtypes of the val def. Scalar data types (scalar) are further refined as subtypes of val. This section describes the rules for defining scalar data types in the RDF export.

Only direct subtypes of scalar are defined as data types. They are declared as instances of the owl:DatatypeProperty class.

Exception: marker tags are not defined as data types since they convey subtyping information (see Marker Tags section).

We also declare scalar data types to be a subclass of the best XSD datatype using rdfs:subClassOf. The following rules map a Haystack scalar type to XSD data type:

  • bool: xsd:boolean
  • curVal: rdfs:Literal
  • date: xsd:date
  • dateTime: xsd:dateTime
  • number: xsd:double
  • ref: xsd:anyURI
  • symbol: xsd:anyURI
  • time: xsd:time
  • uri: xsd:anyURI
  • writeVal: rdfs:Literal
  • All other scalar types are xsd:string

Example:

ph:dateTime a owl:DatatypeProperty ;
    rdfs:subClassOf xsd:dateTime ;
    rdfs:comment "ISO 8601 timestamp followed by timezone identifier" ;
    ph:is ph:scalar ;
    ph:lib ph:lib:ph .

ph:number a owl:DatatypeProperty ;
    rdfs:subClassOf xsd:double ;
    rdfs:comment "Integer or floating point numbers annotated with an optional unit" ;
    ph:is ph:scalar ;
    ph:lib ph:lib:ph .

Value Tags

A value tag def is any tag def that is not a subtype of marker.

If the tag def is a subtype of ref or is a choice, then declare the def to be an instance of owl:ObjectProperty. All other value tags are declared to be an instance of owl:DatatypeProperty.

If the tag def (or one of its defx extensions) declares a tagOn association, then specify the rdfs:domain to be all referent entities.

If the tag def is a ref or choice, then specify the rdfs:range to the value of the of tag on the def.

If the def is a basic Tag Def (e.g. tz), then specify the rdfs:range to be the appropriate Haystack datatype (see Datatypes section).

// A Ref tag
phIoT:siteRef a owl:ObjectProperty ;
    rdfs:range phIoT:site ;
    rdfs:comment "Site which contains the entity" ;
    ph:is ph:ref ;
    ph:lib phIoT:lib:phIoT ;
    ph:of phIoT:site .

// A basic Tag Def with multiple domains
ph:tz a owl:DatatypeProperty ;
    rdfs:domain phIoT:site, phIoT:point ;
    rdfs:range ph:str ;
    rdfs:comment "Timezone identifier from standard timezone database" ;
    ph:is ph:str ;
    ph:lib ph:lib:ph .

// A choice
phIoT:conveys a owl:ObjectProperty ;
  rdfs:range phScience:phenomenon ;
  rdfs:comment "Equipment conveys a substance or phenomenon." ;
  ph:is phIoT:equipFunction ;
  ph:lib phIoT:lib:phIoT ;
  ph:of phScience:phenomenon .

Exporting Instances

To this point we have been primarily interested in exporting Defs to RDF. However, we can also export instance data to RDF. In Haystack, an instance is a Dict with an id tag. The following outlines the high-level rules for exporting instances (i.e. Dicts) to RDF.

  • Instances should be blank nodes. If possible, the label for the blank node should be the id of the instance.
  • Use rdf:type ("a") to indicate the entity is an instance of a particular owl:Class.
  • All marker tags should be specified using hasTag
  • Encode tag values as defined in the data types section.
    • Exception: If the tag is a Ref (e.g. siteRef) the value should be a blank node with a label that references the corresponding instance.
  • If we don't have a def for a tag, do not export it to RDF.

For example, here is the Trio for a site Dict:

id:@24192ca1-0c85f75d "Headquarters"
site
area:140797ft²
tz:New_York
dis:Headquarters
geoAddr:"600 W Main St, Richmond, VA"
geoCoord:C(37.545826,-77.449188)
hq
metro:Richmond
primaryFunction:Office
yearBuilt:1999

And here is the RDF export of that site instance:

_:24192ca1-0c85f75d
    a phIoT:site ;
    ph:hasTag site,
    phIoT:area 140797 ;
    ph:tz "New_York" ;
    ph:dis "Headquarters" ;
    ph:geoAddr "600 W Main St, Richmond, VA" ;
    ph:geoCoord "C(37.545826,-77.449188)" ;
    phIoT:primaryFunction "Office" ;
    phIoT:yearBuilt 1999 .

Pending Work

  • How to handle the unit of numeric tags in instance exports?
  • How to indicate inverse relationships?
  • How to indicate transitive containment?
  • Can we leverage other OWL statements to improve the ontology?