
Annotation syntax

The extractor looks for owl:Class definitions that carry a dc:description annotation containing the phrase “The properties that can be used”. Everything after that line is parsed as a list of property declarations.

Each property is a bullet point (* or -) followed by this pattern:

prefix:propertyName -[cardinality]-> prefix:TargetType

A concrete example in a Turtle ontology:

foaf:Agent a owl:Class ;
dc:description """The properties that can be used with this class are:
* datacite:hasIdentifier -[0..N]-> datacite:Identifier
* foaf:name -[0..1]-> rdfs:Literal
* pro:holdsRoleInTime -[0..N]-> pro:RoleInTime""" .

The extractor turns each line into an sh:property block on the corresponding NodeShape.
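As a rough illustration of the pattern above, a single annotation line could be split into its three parts with a regular expression. This is a minimal sketch, not the extractor's actual code: the names `PROPERTY_LINE` and `parse_property_line` are hypothetical, and the real parser may be stricter or more lenient about whitespace.

```python
import re

# Hypothetical pattern for one annotation line; the extractor's own regex may differ.
PROPERTY_LINE = re.compile(
    r"^\s*[*-]\s+"              # bullet marker: * or -
    r"(?P<path>\S+)\s+"         # prefix:propertyName
    r"-\[(?P<card>[^\]]+)\]->"  # -[cardinality]->
    r"\s*(?P<target>.+?)\s*$"   # prefix:TargetType
)


def parse_property_line(line):
    """Return (path, cardinality, target), or None if the line is not a declaration."""
    match = PROPERTY_LINE.match(line)
    if match is None:
        return None
    return match.group("path"), match.group("card"), match.group("target")


print(parse_property_line("* foaf:name -[0..1]-> rdfs:Literal"))
# ('foaf:name', '0..1', 'rdfs:Literal')
```

Lines that do not match, such as the introductory sentence of the description, yield `None` and can be skipped.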

The cardinality inside the square brackets takes one of three forms:

| Notation | Meaning | SHACL output |
| --- | --- | --- |
| [2] | Exactly 2 | sh:minCount 2 ; sh:maxCount 2 |
| [0..1] | Zero or one | sh:maxCount 1 |
| [1..N] | At least one, no upper bound | sh:minCount 1 |

N and * both mean unbounded. When the minimum is 0 or unbounded, no sh:minCount is emitted. When the maximum is N or *, no sh:maxCount is emitted.
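The mapping from notation to counts can be sketched as a small function. This is an illustrative reading of the rules above, not the extractor's implementation; `cardinality_to_counts` is a hypothetical name, and `None` stands for "emit nothing".

```python
UNBOUNDED = {"N", "*"}


def cardinality_to_counts(notation):
    """Map '[2]', '[0..1]' or '[1..N]' to (min_count, max_count).

    None means the corresponding sh:minCount / sh:maxCount is omitted.
    """
    text = notation.strip().strip("[]")
    if ".." in text:
        low, high = text.split("..", 1)
    else:
        low = high = text  # a single value means "exactly"
    # No sh:minCount when the minimum is 0 or unbounded.
    min_count = None if low in UNBOUNDED or int(low) == 0 else int(low)
    # No sh:maxCount when the maximum is N or *.
    max_count = None if high in UNBOUNDED else int(high)
    return min_count, max_count


print(cardinality_to_counts("[2]"))     # (2, 2)
print(cardinality_to_counts("[0..1]"))  # (None, 1)
print(cardinality_to_counts("[1..N]"))  # (1, None)
```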

What the extractor generates depends on the target type:

Literals. rdfs:Literal and rdfs:langString produce sh:nodeKind sh:Literal. XSD types like xsd:string or xsd:dateTime produce sh:datatype.

Classes with a shape. When the target class also has its own dc:description block, the extractor generates sh:node TargetShape. This links the property constraint to the shape of the target class, so values are validated against its properties too.

Classes without a shape. If the target class has no property annotations (and therefore no generated shape), the extractor falls back to sh:nodeKind sh:BlankNodeOrIRI.
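The three cases can be summarized as a decision function. The sketch below works on prefixed names as plain strings for brevity, whereas the extractor itself operates on an rdflib graph. The function name, the `classes_with_shape` argument, and the `srv_sh` shape prefix (borrowed from the example further down this page) are assumptions made for illustration.

```python
LITERAL_TARGETS = {"rdfs:Literal", "rdfs:langString"}


def target_constraint(target, classes_with_shape, shape_prefix="srv_sh"):
    """Pick the SHACL constraint for a target type given as a prefixed name."""
    if target in LITERAL_TARGETS:
        return "sh:nodeKind sh:Literal"
    if target.startswith("xsd:"):
        return f"sh:datatype {target}"
    if target in classes_with_shape:
        # The target class has its own annotations, so link to its shape.
        local_name = target.split(":", 1)[1]
        return f"sh:node {shape_prefix}:{local_name}Shape"
    # No generated shape for the target: only require a node, not a literal.
    return "sh:nodeKind sh:BlankNodeOrIRI"


described = {"schema:SoftwareSourceCode"}
print(target_constraint("rdfs:Literal", described))
print(target_constraint("xsd:dateTime", described))
print(target_constraint("schema:SoftwareSourceCode", described))
print(target_constraint("fabio:Software", described))
```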

When the same property path appears multiple times with different target classes, the extractor treats it as a union range and generates a single sh:property with sh:or.

Consider this annotation, where srv:isDeploymentOf can point to two different classes:

* srv:isDeploymentOf -[0..N]-> fabio:Software
* srv:isDeploymentOf -[0..N]-> schema:SoftwareSourceCode

Suppose schema:SoftwareSourceCode has its own dc:description block (so the extractor generates a SoftwareSourceCodeShape for it), while fabio:Software does not. The result is:

sh:property [
    sh:path srv:isDeploymentOf ;
    sh:or (
        [ sh:nodeKind sh:BlankNodeOrIRI ]
        [ sh:node srv_sh:SoftwareSourceCodeShape ]
    ) ;
] ;
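One way to picture the union handling is to group constraints by property path and wrap them in sh:or only when a path has more than one. The sketch below is a simplified illustration that assumes the per-target constraints have already been computed and leaves cardinality out. `group_by_path` and `render_property` are hypothetical names, not the extractor's API.

```python
def group_by_path(declarations):
    """Group (path, constraint) pairs by path, keeping annotation order."""
    grouped = {}
    for path, constraint in declarations:
        grouped.setdefault(path, []).append(constraint)
    return grouped


def render_property(path, constraints):
    """Render one sh:property block; use sh:or only for union ranges."""
    if len(constraints) == 1:
        body = f"    {constraints[0]} ;"
    else:
        alternatives = "\n".join(f"        [ {c} ]" for c in constraints)
        body = f"    sh:or (\n{alternatives}\n    ) ;"
    return f"sh:property [\n    sh:path {path} ;\n{body}\n] ;"


declarations = [
    ("srv:isDeploymentOf", "sh:nodeKind sh:BlankNodeOrIRI"),
    ("srv:isDeploymentOf", "sh:node srv_sh:SoftwareSourceCodeShape"),
]
for path, constraints in group_by_path(declarations).items():
    print(render_property(path, constraints))
```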

When a property range is a fixed set of named individuals rather than a class, list the allowed values inside curly braces:

* datacite:usesIdentifierScheme -[1]-> {datacite:doi datacite:isbn datacite:orcid}

The extractor generates sh:in with the list of URIs:

sh:property [
    sh:path datacite:usesIdentifierScheme ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
    sh:in ( datacite:doi datacite:isbn datacite:orcid ) ;
] ;

Values inside the braces can be prefixed names (datacite:doi) or absolute IRIs (http://purl.org/spar/datacite/doi). Only IRIs are supported; string literals are not.
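A minimal sketch of how a braced value list might be turned into an sh:in constraint follows. The function names are hypothetical and the whitespace splitting is an assumption about the syntax. The one Turtle-specific detail is that absolute IRIs must be wrapped in angle brackets, while prefixed names are written as-is.

```python
def parse_value_list(target):
    """Return the values inside '{...}', or None if target is not a value list."""
    text = target.strip()
    if not (text.startswith("{") and text.endswith("}")):
        return None
    return text[1:-1].split()


def as_turtle_term(value):
    """Wrap absolute IRIs in angle brackets; leave prefixed names untouched."""
    return f"<{value}>" if "://" in value else value


def render_sh_in(values):
    return "sh:in ( " + " ".join(as_turtle_term(v) for v in values) + " ) ;"


values = parse_value_list("{datacite:doi datacite:isbn datacite:orcid}")
print(render_sh_in(values))
# sh:in ( datacite:doi datacite:isbn datacite:orcid ) ;
```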

Prefixes in the annotations need to be resolved to full URIs. The extractor tries three strategies in order:

  1. Registered prefixes — standard @prefix declarations parsed by rdflib.
  2. URI namespace map — when a target name appears without a prefix (e.g., just E55_Type), the extractor looks it up against all URIs already present in the graph. If any URI ends with E55_Type after a # or /, the namespace part of that URI is used.
  3. Literal prefix map — parses @prefix declarations found inside string literals (i.e., inside dc:description text itself). Some ontologies declare prefixes only in annotation strings, not at the top level. CHAD-AP is one such case: crm:, lrmoo:, and crmdig: appear only inside a dc:description literal, never as top-level @prefix declarations.
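The three strategies above can be sketched with plain dictionaries and strings. The extractor itself relies on rdflib for the registered prefixes, so the code below only illustrates the lookup order. All function names are hypothetical, and the regular expression is a simplified assumption about how @prefix lines inside a literal might be recognized.

```python
import re

# Simplified matcher for '@prefix foo: <http://...>' inside annotation text.
LITERAL_PREFIX = re.compile(r"@prefix\s+([\w-]*):\s*<([^>]+)>")


def literal_prefix_map(text):
    """Collect prefix declarations that appear inside a string literal."""
    return dict(LITERAL_PREFIX.findall(text))


def namespace_for_local_name(local_name, known_uris):
    """Find a known URI ending in '#name' or '/name' and return its namespace."""
    for uri in known_uris:
        for separator in ("#", "/"):
            if uri.endswith(separator + local_name):
                return uri[: -len(local_name)]
    return None


def resolve(name, registered, known_uris, literal_prefixes):
    prefix, colon, local = name.partition(":")
    # 1. Registered prefixes
    if colon and prefix in registered:
        return registered[prefix] + local
    # 2. URI namespace map, for names written without a prefix
    if not colon:
        namespace = namespace_for_local_name(name, known_uris)
        if namespace is not None:
            return namespace + name
    # 3. Literal prefix map
    if colon and prefix in literal_prefixes:
        return literal_prefixes[prefix] + local
    return None


registered = {"foaf": "http://xmlns.com/foaf/0.1/"}
known_uris = ["http://www.cidoc-crm.org/cidoc-crm/E55_Type"]
in_literal = literal_prefix_map("@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> .")

print(resolve("foaf:Agent", registered, known_uris, in_literal))
print(resolve("E55_Type", registered, known_uris, in_literal))
print(resolve("crm:E55_Type", registered, known_uris, in_literal))
```

Returning `None` for an unresolvable name is a choice made for this sketch; how the extractor reports such cases is not covered here.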

Not all data consistently declares rdf:type on every resource. A SHACL shape with sh:targetClass forces validation on all instances of that class, which means the data must include rdf:type for the validator to find them. Root classes are the classes where this requirement is enforced.

A root class gets sh:targetClass in its shape, so validators check all its instances against the shape. Non-root classes only get sh:NodeShape without a target — they are validated indirectly when referenced via sh:node from another shape. This way, a reference to a non-root class does not fail just because the data omits rdf:type on that resource.

By default, the extractor detects root classes automatically: a class is root if no other described class points to it as a target. This works well for ontologies with a clear hierarchy, where top-level classes are never referenced as the range of another class’s property.

It breaks down when classes reference each other across modules. In SKG-IF core, for instance, Agent is referenced by Grant (via pro:isHeldBy), Work is referenced by DataService, and so on — almost every top-level entity appears as someone else’s target. The automatic algorithm would miss them all, since it only picks classes with zero incoming edges.
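The automatic rule and its failure mode can both be seen in a few lines. This is a sketch of the "zero incoming edges" idea as described above, not the extractor's code. `detect_root_classes` and the `ex:` class names are hypothetical.

```python
def detect_root_classes(targets_by_class):
    """Return described classes that no other described class points to.

    targets_by_class maps each described class to the set of classes
    its properties use as targets.
    """
    referenced = set()
    for source, targets in targets_by_class.items():
        for target in targets:
            # Only references from *other* described classes count.
            if target != source and target in targets_by_class:
                referenced.add(target)
    return set(targets_by_class) - referenced


# A clear hierarchy: only the top-level class has no incoming edge.
hierarchy = {
    "ex:Catalog": {"ex:Record"},
    "ex:Record": {"ex:Field"},
    "ex:Field": set(),
}
print(detect_root_classes(hierarchy))  # {'ex:Catalog'}

# Mutual references: every class is someone's target, so no roots are found.
cross_referenced = {
    "ex:A": {"ex:B"},
    "ex:B": {"ex:A"},
}
print(detect_root_classes(cross_referenced))  # set()
```

The second case mirrors the cross-module situation: once every top-level class is referenced by another, the rule returns an empty set.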

For these cases, use --root-classes to specify the root classes explicitly via a JSON file that maps module names to class URIs. The repository ships a preset for SKG-IF core:

uv run extractor data-model/ontology/current/ shapes.ttl --root-classes presets/skg-if-core.json