Working Group

#551 Haystack Type System WG

Brian Frank Tue 17 Oct 2017

Overview

Haystack is designed around the concept of tagging entities with name/value pairs to describe facts about those entities. The formal definitions of these tags and their value types are captured in a machine readable format (Trio files) which is used to generate the tags section of this website. But how tags are combined lacks a formal machine readable definition. For example, the description and constraints of how to model site/equip/point entities are largely described by documentation without a corresponding formal schema and machine readable format. Historically this has been by design, since formalization of "compound types" introduces significant complexity. But with broader adoption of Haystack, there seems to be pent-up demand to formalize types/schema. We believe it's time to tackle this problem, and would like to kick-start a new working group.

I have spent several weeks designing various prototypes with help from Matthew Giannini. By way of this post, I will describe a fairly complete prototype which serves as a starting point for a proposal on how types might work in Haystack. The prototype defines most of the Haystack model using a type system I will discuss here. I have made the source code and the documentation it generates available for download (discussed below).

Requirements

Leverage Markers: we wish to leverage Haystack's existing and extensive use of markers as the basis for a more advanced data type system. We do not wish to introduce a new concept such as a "type" tag.

No Indirection: all data semantics should be captured in the entity's tags. You should not be required to have previous knowledge (such as a data dictionary) or make an additional network request to infer semantics. For example if a point currently uses discharge air temp sensor, then that will not be coalesced into some abstract "tag set" name that requires another request to know that all those tags were applied. Or put another way: entities will always continue to expand their full set of tags inline.

Tooling: a common use case for a more advanced type system is to allow tool manufacturers to develop UIs that "guide" users to properly tag their data. Capturing tag relationships and rules in a machine readable format is a key requirement for tooling.

Validation: a machine readable schema allows validation of data models. But we acknowledge that type systems require a trade-off; more complex type systems are required to more fully validate data. And no declarative type system can perform 100% validation. We wish to strike a compromise with a practical type system that performs basic validation, but will not provide perfect validation.

RDF: it is desired that enhancements to Haystack allow our taxonomy to be expressed in alternate formats such as RDFS, RDFa, micro-data, JSON-LD, etc. These technologies are based on the concept of subject-predicate-object triples that map well to Haystack's entity name/value tags. And ideally we want to map Haystack types to the RDF Schema class model.
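As a rough illustration of the triple mapping, and not a proposed RDF encoding, the sketch below flattens one entity's tags into (subject, predicate, object) tuples. The entity id, the dict representation, the sample values, and the use of True for marker tags are all assumptions made for the example.

```python
def to_triples(entity):
    """Flatten an entity's tags into (subject, predicate, object) tuples."""
    subject = entity["id"]  # assumed: every entity dict carries an "id"
    return [
        (subject, tag, value)
        for tag, value in sorted(entity.items())
        if tag != "id"
    ]


# Hypothetical site entity; markers are represented as True.
site = {"id": "site-1", "site": True, "area": 35000, "tz": "New_York"}

for triple in to_triples(site):
    print(triple)
```

Each tag name plays the role of the predicate, which is why the entity name/value model lines up with triples without any restructuring.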

Source Definitions: the goal of this effort is to rewrite the project-haystack.org specification source material using the new definitions and formats as the authoritative source. The machine readable formats will be directly accessible over HTTP and also used to auto-generate the HTML presented on the site.

Observations

Let's begin with a couple of observations about how the existing model works. There are essentially only four "root" entity types: sites, equips, points, and weather stations. All other Haystack tags are used to annotate these four core entity types with additional information.

There are three distinct ways we use tags to annotate the core entity types:

  • Has Tags: an entity may have specific value based tags. For example a site entity may apply the area tag to define the building's square footage. This sort of tag usage includes all the tags which are neither markers nor refs.
  • Subtyping Tags: we use marker tags to create subtypes to further refine the semantics of a given entity. For example adding sensor to a point entity narrows the type of point represented.
  • Relationship Tags: we use ref tags to establish relationships between entities. For example adding equipRef on a point defines an equip/point containment relationship.

In all three cases, what we really desire is to document the behavior of a specific combination of tags. This has been a pain point in maintaining the documentation. For example let's take the water tag. It has a generic definition which means "associated with liquid water". But it also has more specific definitions when paired with point, meter, or tank. In our final solution, we want tags to be defined generically, with more specific documentation as we combine tags.
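One way to picture "generic definition plus more specific documentation" is a lookup that returns the most specific definition whose tags are all present on the entity. This is only a sketch: the DOCS table and the doc_for helper are hypothetical, and the table holds just the two water definitions quoted in this post.

```python
# Hypothetical doc table keyed by tag combination.
DOCS = {
    ("water",): "Associated with liquid water",
    ("point", "water"): "Point related to water",
}


def doc_for(tag, markers):
    """Return the most specific doc for `tag` given the entity's markers."""
    markers = set(markers)
    candidates = [
        combo for combo in DOCS
        if tag in combo and set(combo) <= markers
    ]
    if not candidates:
        return None
    # The longest matching combination is treated as the most specific.
    return DOCS[max(candidates, key=len)]


print(doc_for("water", {"point", "water", "sensor"}))
print(doc_for("water", {"equip", "tank", "water"}))
```

The second call falls back to the generic definition because no "tank water" entry exists in this toy table.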

Tag Based Subtyping

We add marker tags to an entity to indicate a more specific type of the entity. For example we add ahu to equip to mark the equipment as an air handler unit. We can further mark the AHU with steamHeat to indicate it's an AHU using steam from a central plant for heating. Each time we apply a marker tag we further restrict what the entity type represents. From a type system perspective, this is a form of subtyping.

There are two key observations to be made about how marker tags are used for subtyping:

  • Subtypes are often defined as an exclusive choice: for example an AHU can have hotWaterHeat, steamHeat, elecHeat, or gasHeat
  • Subtyping is multi-dimensional: for example I can subtype an AHU by its heating method, cooling method, and ductwork configuration (all simultaneously)

This pattern plays out in the documentation quite often in a non-formal way:

  • Point qualifier: sensor, cmd, sp
  • Point subject: air, water, steam, elec, etc
  • Point quantity: temp, flow, pressure, power, energy, etc
  • Power Qualifier: active, reactive, apparent
  • AHU heating: steamHeat, hotWaterHeat, gasHeat, elecHeat
  • AHU cooling: dxCool, chilledWaterCool
  • AHU ductwork: singleDuct, dualDuct, tripleDuct
  • VAV airflow: series, parallel
  • Chiller type: absorption, reciprocal, screw, centrifugal

Another important consideration is that these exclusive choices are often open ended. This is opposed to an enum in a programming language which is closed (once defined you may not add new choices to the enumeration). But in a data model type system, these enumerated choices may be expanded after the fact. An example might be a project which requires a subtype choice not covered by the standard Haystack tag library.
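To make the two properties concrete (exclusive within a dimension, but open ended), here is a minimal sketch. The dimension names "heating", "cooling", and "ductwork", the helper functions, and the project tag "geoHeat" are hypothetical; the standard choices are the AHU tags listed above.

```python
# Dimensions for AHUs; each dimension is a set of mutually exclusive markers.
AHU_DIMENSIONS = {
    "heating": {"steamHeat", "hotWaterHeat", "gasHeat", "elecHeat"},
    "cooling": {"dxCool", "chilledWaterCool"},
    "ductwork": {"singleDuct", "dualDuct", "tripleDuct"},
}


def add_choice(dimensions, dim, tag):
    """Open ended: a project may add a choice after the fact."""
    dimensions.setdefault(dim, set()).add(tag)


def conflicts(dimensions, markers):
    """Return {dim: tags} for each dimension with more than one choice applied."""
    found = {}
    for dim, choices in dimensions.items():
        applied = sorted(choices & set(markers))
        if len(applied) > 1:
            found[dim] = applied
    return found


# One choice per dimension across three dimensions is fine.
print(conflicts(AHU_DIMENSIONS, {"equip", "ahu", "steamHeat", "dxCool", "singleDuct"}))
# Two heating choices on the same AHU violate exclusivity.
print(conflicts(AHU_DIMENSIONS, {"equip", "ahu", "steamHeat", "gasHeat"}))

# "geoHeat" is a made-up project-specific tag, not a Haystack tag.
add_choice(AHU_DIMENSIONS, "heating", "geoHeat")
```

Unlike a programming language enum, nothing here is sealed: the set of choices is just data that a project library can extend.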

Type Names

One of the common questions I've heard over the years is this: why not just define a shorthand name for a combination of tags such as "discharge air temp point"? But what would this name be? Creating a shortcut such as "DAT" would go against the principle of avoiding indirection to understand an entity's tags. And providing the same information without indirection would lead to a name such as "DischargeAirTempPoint", which sort of defeats the purpose of creating new names. I would propose that any new synthetic name generated for Haystack's type system is strictly just a combination of existing tag names. For example the type that represents an AHU with steam heating:

equip ahu steamHeat    // tags separated by space
equip-ahu-steamHeat    // tags separated with dash
equip+ahu+steamHeat    // tags separated with plus
steamHeat ahu equip    // most specific to least specific

For this proposal I will use the first option: a type name is a list of tags separated by space and ordered from least to most specific. For the prototype documentation HTML pages I used dash instead of space as a more URL friendly file name.
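Because a type name is nothing more than its tags in order, converting between the two forms is trivial. The helpers below are a hypothetical sketch; the ".html" extension for the documentation file name is an assumption, since the post only says that dash is used for the file name.

```python
def type_name(tags, sep=" "):
    """Join tags, ordered least to most specific, into a type name."""
    return sep.join(tags)


def type_tags(name):
    """Recover the tag list from a space separated type name."""
    return name.split()


def doc_file_name(tags):
    """URL friendly variant using dash; the extension is assumed."""
    return type_name(tags, sep="-") + ".html"


tags = ["equip", "ahu", "steamHeat"]
print(type_name(tags))      # equip ahu steamHeat
print(doc_file_name(tags))  # equip-ahu-steamHeat.html
```

No lookup table is involved, which is the point: the name carries exactly the tags on the entity and nothing more.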

Side note: I also investigated using camel case to join tag names together (which would work if all tags were lowercase). But we have many tags such as hotWaterHeat where this would cause a problem. These compound tag names are a potential problem which could possibly be solved more elegantly through the type system. But I'll leave that as a discussion for the working group.
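The camel case problem is easy to demonstrate. In this hypothetical sketch, camel_join and camel_split are illustrative helpers: joining works, but splitting the result back on capital letters cannot tell a compound tag from several simple ones.

```python
import re


def camel_join(tags):
    """Join tags by capitalizing the first letter of every tag after the first."""
    first, *rest = tags
    return first + "".join(t[0].upper() + t[1:] for t in rest)


def camel_split(name):
    """Split a camel case name at each capital letter."""
    parts = re.findall(r"[a-z0-9]+|[A-Z][a-z0-9]*", name)
    return [p[0].lower() + p[1:] for p in parts]


simple = ["point", "air", "temp"]
compound = ["equip", "ahu", "hotWaterHeat"]

print(camel_split(camel_join(simple)))    # round trips
print(camel_split(camel_join(compound)))  # hotWaterHeat is broken apart
```

With only lowercase tags the round trip is lossless; with hotWaterHeat the split yields hot, water, and heat as if they were separate tags.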

Notation

In order to discuss how we might apply a type system to Haystack tags using the concepts above, we need some notation. I'm going to introduce a notation/syntax which I have found concise and readable for developing the prototype. My proposal is based on the abstract concepts, not the specific syntax I am using here, but at some point we will need to formalize one or more machine readable formats which capture the type system abstractions.

Here is the quick summary of notation:

type > tag       // type has tag
type dim>        // type has subtype dimension
type dim> tag    // subtype choice within given dimension
type <ref> type  // relationship definition

Let's look at each of these notations in more detail...

Notation: Has

Let's start off with an entity which might have data tags:

site > area             // Square footage of the site
site > tz               // Timezone of the site
site > primaryFunction  // Primary function of the site
site > yearBuilt        // Original construction year of the site

Here we are using the syntax "type > tag" to define that the LHS (left hand side) type may optionally use the RHS (right hand side) tag according to the definition given in the slash-slash comment. This definition is context specific to when the tag is applied to the LHS type.

We can use Python style indentation to omit the base type. The following has exactly the same semantics as the definitions above:

site 
  > area             // Square footage of the site
  > tz               // Timezone of the site
  > primaryFunction  // Primary function of the site
  > yearBuilt        // Original construction year
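To show that the indented form is only shorthand, here is a minimal sketch of a reader that expands both forms into the same flat (type, tag, doc) tuples. It is not the prototype's Loader; parse_has is a hypothetical helper that handles only the "type > tag" notation.

```python
def parse_has(text):
    """Return (type, tag, doc) tuples from 'type > tag' definitions."""
    defs = []
    base = None  # type named by the most recent unindented line without '>'
    for raw in text.splitlines():
        code, _, doc = raw.partition("//")
        if not code.strip():
            continue  # blank or comment-only line
        lhs, sep, rhs = code.partition(">")
        if not sep:
            base = " ".join(code.split())  # a bare type line sets the base
            continue
        # An indented line starting with '>' inherits the base type.
        inherits = code[0].isspace() and not lhs.strip()
        type_name = base if inherits else " ".join(lhs.split())
        defs.append((type_name, rhs.strip(), doc.strip()))
    return defs


FLAT = """\
site > area  // Square footage of the site
site > tz    // Timezone of the site
"""

INDENTED = """\
site
  > area  // Square footage of the site
  > tz    // Timezone of the site
"""

print(parse_has(FLAT) == parse_has(INDENTED))
```

A real format would also need error handling (for example an indented line with no preceding base type), which this sketch leaves out.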

Notation: Subtype

Here is how to define a subtype dimension:

point subject>        // Subject or substance of the point's measurement or control
point subject> air    // Point related to air
point subject> water  // Point related to water
point subject> steam  // Point related to steam
point subject> elec   // Point related to electricity

Here we define a named dimension of subtyping on points. In this case the dimension name is subject as defined with the syntax "type dim>". Then we can define exclusive subtype choices for that dimension with the syntax "type dim> tag". Each choice defines a new type. In our example above, we have now defined the new types "point air", "point water", etc.

We can use indentation to collapse the definition above. And let's flesh out more point subtypes to see how it works in practice:

point

qualifier>            // Classifies the point as a sensor, command, or setpoint
  sensor              // Point is a sensor, input, AI/BI
  cmd                 // Point is a command, actuator, AO/BO
  sp                  // Point is a setpoint, soft point, internal control variable, schedule

subject>              // Subject or substance of the point's measurement or control
  air                 // Point related to air
  water               // Point related to water
  steam               // Point related to steam
  elec                // Point related to electricity
  refrig              // Point related to refrigerant substance

air

  quantity>           // Quantity of air measured or controlled
    temp              // Point related to dry bulb air temperature
    humidity          // Point related to percent relative humidity of air
    flow              // Point related to volumetric air flow
    pressure          // Point related to static air pressure

water

  quantity>           // Quantity of water measured or controlled
    temp              // Point related to water temperature
    flow              // Point related to volumetric water flow
    pressure          // Point related to water pressure

  waterType>          // Type of the water and its usage
    domestic          // Tap water for drinking, washing, cooking, and flushing of toilets
    hot               // Hot water used for heating or supply to hot taps
    chilled           // Water used for cooling
    condenser         // Water used to remove heat through condensation
    makeup            // Water used to make up water loss through leaks, evaporation, or blowdown
    blowdown          // Water expelled from a system to remove mineral build up

What is created as you define these dimensions and their choices is a "type tree" or "decision tree". Each time you add a marker tag it potentially opens up new choices to narrow the type along multiple branches (dimensions).
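The "decision tree" behavior is what a tagging tool would walk. As a sketch, the table below transcribes the point definitions above into a dict keyed by type name, and the hypothetical open_dimensions helper returns the dimensions still undecided for a given set of markers. The dict layout is an assumption, not the prototype's Model.

```python
TYPE_TREE = {
    "point": {
        "qualifier": ["sensor", "cmd", "sp"],
        "subject": ["air", "water", "steam", "elec", "refrig"],
    },
    "point air": {
        "quantity": ["temp", "humidity", "flow", "pressure"],
    },
    "point water": {
        "quantity": ["temp", "flow", "pressure"],
        "waterType": ["domestic", "hot", "chilled", "condenser", "makeup", "blowdown"],
    },
}


def open_dimensions(tree, markers):
    """Return {'type dim>': choices} for dimensions with no choice applied yet."""
    markers = set(markers)
    result = {}
    for type_name, dims in tree.items():
        if not set(type_name.split()) <= markers:
            continue  # the entity is not of this type
        for dim, choices in dims.items():
            if not markers & set(choices):
                result[f"{type_name} {dim}>"] = choices
    return result


print(sorted(open_dimensions(TYPE_TREE, {"point"})))
print(sorted(open_dimensions(TYPE_TREE, {"point", "sensor", "water"})))
```

Adding water closes the subject dimension and opens two new ones, which is the branching described above.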

Notation: Relationship

Lastly we need a notation to define relationships. Here are some examples:

equip <equipRef> equip      // Equipment contains sub-equipment
equip <equipRef> point      // Equipment contains point

A relationship has an LHS type and an RHS type and one or more relationship tags grouped between the "<>". The first relationship tag must be a ref tag which is applied to the entity on the RHS to reference the LHS. Or put another way, the RHS is the "from entity" and the LHS is the "to entity" in terms of the ref tag. Let's deconstruct this example:

LHS    Tags        RHS      Doc definition of relationship
-----  ---------   -----    -------------------------------
equip  <equipRef>  point    // Equipment contains point

The LHS type is any entity tagged with the equip marker tag. The RHS is a point entity. In order to apply the relationship, the equipRef tag must be applied to the RHS (the point) and reference the LHS (the equip). When all of those conditions hold true, the relationship applies.

We can define additional tags to apply to the RHS entity for more complex relationships:

equip ahu <equipRef discharge> point air  // AHU point associated with discharge air duct
equip ahu <equipRef return> point air     // AHU point associated with return air duct

In the example above, the LHS (to) is AHU equipment and the RHS (from) is points associated with the measurement/control of air. The relationship tags include both a ref tag as well as a "section tag" to apply to the point to create the specified relationship. This model allows us to reuse the subtype definition of "point air" without duplicating massive point tag combinations under each equipment (like we do today).
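The conditions for a relationship can be written as a simple predicate. This is a hedged sketch rather than the prototype's logic: entities are assumed to be dicts with an "id", markers are stored as True, and is_type and relationship_applies are hypothetical helpers.

```python
def is_type(entity, type_name):
    """True if the entity carries every tag in the space separated type name."""
    return all(tag in entity for tag in type_name.split())


def relationship_applies(lhs, rhs, lhs_type, rel_tags, rhs_type):
    """Check 'lhs_type <rel_tags> rhs_type' for a pair of entities.

    The first relationship tag is the ref tag on the RHS pointing at the LHS;
    any remaining tags (such as a section tag) must also be on the RHS.
    """
    ref_tag, *extra = rel_tags
    return (
        is_type(lhs, lhs_type)
        and is_type(rhs, rhs_type)
        and rhs.get(ref_tag) == lhs["id"]
        and all(tag in rhs for tag in extra)
    )


# Hypothetical entities for illustration.
ahu = {"id": "ahu-1", "equip": True, "ahu": True}
dat = {"id": "pt-1", "point": True, "air": True, "temp": True,
       "sensor": True, "discharge": True, "equipRef": "ahu-1"}

print(relationship_applies(ahu, dat, "equip", ["equipRef"], "point"))
print(relationship_applies(ahu, dat, "equip ahu", ["equipRef", "discharge"], "point air"))
print(relationship_applies(ahu, dat, "equip ahu", ["equipRef", "return"], "point air"))
```

The same point satisfies both the generic containment relationship and the more specific discharge relationship, but not the return one.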

Here are some more relationship examples for a steam plant:

equip plant steam

  <steamPlantRef> equip ahu steamHeat   // Plant supplies steam to AHU for heating 
  <equipRef> equip boiler               // Plant contains boiler
  <equipRef leaving> point steam        // Point associated with steam leaving plant as heating supply
  <equipRef delta> point steam          // Point associated with steam differential between leaving and entering
  <equipRef entering> point steam       // Point associated with steam returning to plant to be heated back up

Prototype

I have developed a complete prototype for the type system discussed above. This is actually my third prototype (the first two being dead ends). The prototype is developed in Fantom and has the following key features:

  • TagDef: models a single tag definition
  • TypeDef: models a type, including its has tags, dimensions, and relationships
  • Model: immutable data structure for all the TagDef and TypeDef
  • Loader: loads one or more haydef text files to build an in-memory model
  • DocGen: generates simple HTML documentation for a model
  • lib/*.haydef: definitions for about 70% of the Haystack model using notation discussed

You can download the prototype, including source code, definitions, and example documentation, from:

https://project-haystack.org/download/build/haystack-model-prototype-2017-10-17.zip

To generate the documentation, run this command, which writes HTML files to "./doc/":

bin/fan haystackModel::DocGen

The prototype has quite a bit of the model fleshed out, including:

  • air, water, steam points
  • electrical meters and power/energy/volt/current points
  • central plants (using simple, not existing compound tags)
  • chillers
  • boilers
  • VAVs

None of it is complete, but it's far enough along to test out the concepts. If you are interested in this topic, then I would encourage you to download it and at least look through the haydef text files.

Next Steps

There seems to be lots of momentum with various organizations, vendors, and community members around this core problem. I believe now is a great time to tackle the problem head on. So I'd like to create a new working group (WG) for those interested. I'm thinking of a WG process with weekly webcast calls. Also feel free to post ideas/comments to the forum. If you are interested please use the "Join Group" command to join the WG.

Stephen Frank Wed 18 Oct 2017

Count me in on the WG please.

Some things I would also like this WG to address are:

  • General typing for location information (currently missing from Haystack)
  • Revisit best practice for handling one-to-many and many-to-many relationships

Jason Briggs Wed 18 Oct 2017

I already met with Brian offline, and love the direction of this. Been needing this for a long time.

Greg Ingram Sun 19 Nov 2017

What's the status on this topic? Next steps? Any updates/notes from WG meetings?

Doug Migliori Tue 9 Jan

I would suggest that all contributors to this WG read parts 3 and 4 of the multi-part article series on Cross-Industry Semantic Interop to broaden perspectives.

http://www.embedded-computing.com/semantic-interop/cross-industry-semantic-interoperability-part-three-the-role-of-a-top-level-ontology
