Annotation syntax

The extractor looks for owl:Class definitions that carry a dc:description annotation containing the phrase “The properties that can be used”. Everything after that line is parsed as a list of property declarations.

Property format

Each property is a bullet point (* or -) followed by this pattern:

prefix:propertyName -[cardinality]-> prefix:TargetType

A concrete example in a Turtle ontology:

foaf:Agent a owl:Class ;
    dc:description """The properties that can be used with this class are:

* datacite:hasIdentifier -[0..N]-> datacite:Identifier
* foaf:name -[0..1]-> rdfs:Literal
* pro:holdsRoleInTime -[0..N]-> pro:RoleInTime""" .

The extractor turns each line into an sh:property block on the corresponding NodeShape.
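As a rough illustration of the pattern above, here is a minimal Python sketch that splits one bullet line into its three parts. The regular expression and the name parse_property_line are assumptions made for this example; the extractor's own parsing code may differ.

```python
import re

# Hypothetical pattern for one bullet line; not taken from the extractor source.
PROPERTY_LINE = re.compile(
    r"^\s*[*-]\s+"                  # bullet marker: * or -
    r"(?P<path>\S+)\s+"             # property, e.g. foaf:name
    r"-\[(?P<card>[^\]]+)\]->\s+"   # cardinality between -[ and ]->
    r"(?P<target>.+?)\s*$"          # target type (or a {...} value set)
)


def parse_property_line(line):
    """Return (path, cardinality, target), or None if the line is not a declaration."""
    match = PROPERTY_LINE.match(line)
    if match is None:
        return None
    return match.group("path"), match.group("card"), match.group("target")


print(parse_property_line("* foaf:name -[0..1]-> rdfs:Literal"))
# ('foaf:name', '0..1', 'rdfs:Literal')
```

Lines that do not start with a bullet, such as the introductory sentence of the annotation, return None in this sketch.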

Cardinality

Three forms are recognized:

Notation	Meaning	SHACL output
`[2]`	Exactly 2	`sh:minCount 2 ; sh:maxCount 2`
`[0..1]`	Zero or one	`sh:maxCount 1`
`[1..N]`	At least one, no upper bound	`sh:minCount 1`

N and * both mean unbounded. A minimum of 0 (or an unbounded marker in the minimum position) emits no sh:minCount. A maximum of N or * emits no sh:maxCount.
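The mapping from notation to SHACL counts can be sketched in a few lines of Python. The function name cardinality_to_counts and the convention of returning None for "not emitted" are assumptions for this example, not the extractor's actual API.

```python
UNBOUNDED = {"N", "*"}


def cardinality_to_counts(notation):
    """Map '[2]', '[0..1]' or '[1..N]' to (min_count, max_count).

    None in either position means the corresponding sh:minCount or
    sh:maxCount is not emitted.
    """
    text = notation.strip().strip("[]")
    if ".." in text:
        low, high = text.split("..", 1)
    else:
        low = high = text  # a single number means an exact count
    # A minimum of 0 (or an unbounded marker) produces no sh:minCount.
    min_count = int(low) if low not in UNBOUNDED and int(low) > 0 else None
    # A maximum of N or * produces no sh:maxCount.
    max_count = None if high in UNBOUNDED else int(high)
    return min_count, max_count


for notation in ("[2]", "[0..1]", "[1..N]"):
    print(notation, cardinality_to_counts(notation))
```

The three calls correspond to the three rows of the table: (2, 2), (None, 1) and (1, None).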

Target types

What the extractor generates depends on the target type:

Literals. rdfs:Literal and rdfs:langString produce sh:nodeKind sh:Literal. XSD types like xsd:string or xsd:dateTime produce sh:datatype.

Classes with a shape. When the target class also has its own dc:description block, the extractor generates sh:node TargetShape. This links the property constraint to the shape of the target class, so values are validated against its properties too.

Classes without a shape. If the target class has no property annotations (and therefore no generated shape), the extractor falls back to sh:nodeKind sh:BlankNodeOrIRI.
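The three cases above amount to a small decision function. The sketch below is only an illustration: target_constraint, default_shape_name and the ex_sh: prefix are hypothetical, and the real extractor may name shapes differently.

```python
LITERAL_KINDS = {"rdfs:Literal", "rdfs:langString"}


def default_shape_name(target):
    # Hypothetical naming rule: local name plus "Shape" under an example prefix.
    return "ex_sh:" + target.split(":", 1)[1] + "Shape"


def target_constraint(target, classes_with_shape, shape_name=default_shape_name):
    """Pick the SHACL constraint for one target type."""
    if target in LITERAL_KINDS:
        return "sh:nodeKind sh:Literal"
    if target.startswith("xsd:"):
        return f"sh:datatype {target}"
    if target in classes_with_shape:
        # The target class has its own generated shape, so link to it.
        return f"sh:node {shape_name(target)}"
    # No generated shape for the target: fall back to a node-kind check.
    return "sh:nodeKind sh:BlankNodeOrIRI"


described = {"pro:RoleInTime"}
print(target_constraint("xsd:dateTime", described))
print(target_constraint("pro:RoleInTime", described))
print(target_constraint("fabio:Software", described))
```

Here classes_with_shape stands for the set of classes that carry a dc:description property list.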

Union ranges

When the same property path appears multiple times with different target classes, the extractor treats it as a union range and generates a single sh:property with sh:or.

Consider this annotation, where srv:isDeploymentOf can point to two different classes:

* srv:isDeploymentOf -[0..N]-> fabio:Software
* srv:isDeploymentOf -[0..N]-> schema:SoftwareSourceCode

Suppose schema:SoftwareSourceCode has its own dc:description block (so the extractor generates a SoftwareSourceCodeShape for it), while fabio:Software does not. The result is:

sh:property [
    sh:path srv:isDeploymentOf ;
    sh:or (
        [ sh:nodeKind sh:BlankNodeOrIRI ]
        [ sh:node srv_sh:SoftwareSourceCodeShape ]
    ) ;
] ;
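One way to picture the union handling is to group declarations by property path, then wrap the per-target constraints in sh:or when a path has more than one. This is a sketch under that assumption; group_by_path and render_range are hypothetical helper names, and the output is a simplified one-line rendering.

```python
from collections import defaultdict


def group_by_path(declarations):
    """Collect the distinct targets declared for each property path, in order."""
    grouped = defaultdict(list)
    for path, target in declarations:
        if target not in grouped[path]:
            grouped[path].append(target)
    return dict(grouped)


def render_range(constraints):
    """Emit a single constraint directly; wrap several in sh:or."""
    if len(constraints) == 1:
        return constraints[0] + " ;"
    members = " ".join(f"[ {constraint} ]" for constraint in constraints)
    return f"sh:or ( {members} ) ;"


declarations = [
    ("srv:isDeploymentOf", "fabio:Software"),
    ("srv:isDeploymentOf", "schema:SoftwareSourceCode"),
]
print(group_by_path(declarations))
print(render_range([
    "sh:nodeKind sh:BlankNodeOrIRI",
    "sh:node srv_sh:SoftwareSourceCodeShape",
]))
```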

Controlled vocabularies

When a property range is a fixed set of named individuals rather than a class, list the allowed values inside curly braces:

* datacite:usesIdentifierScheme -[1]-> {datacite:doi datacite:isbn datacite:orcid}

The extractor generates sh:in with the list of URIs:

sh:property [
    sh:path datacite:usesIdentifierScheme ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
    sh:in ( datacite:doi datacite:isbn datacite:orcid ) ;
] ;

Values inside the braces can be prefixed names (datacite:doi) or absolute IRIs (http://purl.org/spar/datacite/doi). Only IRIs are supported; string literals are not.
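A minimal sketch of how a brace-delimited value set could be read and rendered follows. Both function names are hypothetical, and wrapping absolute IRIs in angle brackets is an assumption based on Turtle syntax rather than on the extractor's code.

```python
def parse_value_set(target):
    """Return the values inside {...}, or None if the target is not a value set."""
    text = target.strip()
    if not (text.startswith("{") and text.endswith("}")):
        return None
    return text[1:-1].split()


def render_sh_in(values):
    """Render an sh:in list; absolute IRIs need angle brackets in Turtle."""
    terms = [
        f"<{value}>" if value.startswith(("http://", "https://")) else value
        for value in values
    ]
    return "sh:in ( " + " ".join(terms) + " ) ;"


values = parse_value_set("{datacite:doi datacite:isbn datacite:orcid}")
print(render_sh_in(values))
# sh:in ( datacite:doi datacite:isbn datacite:orcid ) ;
```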

Namespace resolution

Prefixes in the annotations need to be resolved to full URIs. The extractor tries three strategies in order:

1. Registered prefixes: standard @prefix declarations parsed by rdflib.
2. URI namespace map: when a target name appears without a prefix (e.g., just E55_Type), the extractor looks it up against all URIs already present in the graph. If any URI ends with E55_Type after a # or /, the namespace part of that URI is used.
3. Literal prefix map: parses @prefix declarations found inside string literals (i.e., inside the dc:description text itself). Some ontologies declare prefixes only in annotation strings, not at the top level. CHAD-AP is one such case: crm:, lrmoo:, and crmdig: appear only inside a dc:description literal, never as top-level @prefix declarations.
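The fallback order can be sketched with plain dictionaries standing in for the rdflib graph. All function names here are hypothetical, and the namespace URIs in the usage lines are illustrative values, not taken from any particular ontology file.

```python
import re

# Matches "@prefix crm: <http://...>" inside an annotation string.
PREFIX_DECL = re.compile(r"@prefix\s+([\w-]*):\s*<([^>]+)>")


def literal_prefix_map(texts):
    """Collect prefix declarations that appear inside string literals."""
    prefixes = {}
    for text in texts:
        for prefix, namespace in PREFIX_DECL.findall(text):
            prefixes.setdefault(prefix, namespace)
    return prefixes


def namespace_from_uris(local_name, known_uris):
    """Find a known URI ending in #local_name or /local_name; return its namespace."""
    for uri in known_uris:
        for separator in ("#", "/"):
            if uri.endswith(separator + local_name):
                return uri[: -len(local_name)]
    return None


def resolve(name, registered, known_uris, literal_prefixes):
    """Try registered prefixes, then the URI namespace map, then literal prefixes."""
    prefix, colon, local = name.partition(":")
    if colon and prefix in registered:
        return registered[prefix] + local
    if not colon:
        namespace = namespace_from_uris(name, known_uris)
        if namespace is not None:
            return namespace + name
    if colon and prefix in literal_prefixes:
        return literal_prefixes[prefix] + local
    return None


registered = {"foaf": "http://xmlns.com/foaf/0.1/"}
known_uris = ["http://www.cidoc-crm.org/cidoc-crm/E55_Type"]
from_literals = literal_prefix_map(["@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> ."])
print(resolve("foaf:name", registered, known_uris, from_literals))
print(resolve("E55_Type", registered, known_uris, from_literals))
print(resolve("crm:E55_Type", registered, known_uris, from_literals))
```

A name that none of the three strategies can resolve returns None in this sketch; how the extractor reports that case is not covered here.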

Root classes

Not all data consistently declares rdf:type on every resource. A SHACL shape with sh:targetClass forces validation on all instances of that class, which means the data must include rdf:type for the validator to find them. Root classes are the classes where this requirement is enforced.

A root class gets sh:targetClass in its shape, so validators check all its instances against the shape. Non-root classes only get sh:NodeShape without a target — they are validated indirectly when referenced via sh:node from another shape. This way, a reference to a non-root class does not fail just because the data omits rdf:type on that resource.

By default, the extractor detects root classes automatically: a class is root if no other described class points to it as a target. This works well for ontologies with a clear hierarchy, where top-level classes are never referenced as the range of another class’s property.

It breaks down when classes reference each other across modules. In SKG-IF core, for instance, Agent is referenced by Grant (via pro:isHeldBy), Work is referenced by DataService, and so on — almost every top-level entity appears as someone else’s target. The automatic algorithm would miss them all, since it only picks classes with zero incoming edges.
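The default detection rule, and the way it fails on mutually referencing classes, can be reproduced with a short sketch. The function detect_root_classes and the ex: class names are hypothetical, and ignoring self-references is an assumption drawn from the phrase "no other described class".

```python
def detect_root_classes(targets_by_class):
    """Return the described classes that no other described class targets.

    targets_by_class maps each described class to the classes its
    properties point to.
    """
    referenced = set()
    for source, targets in targets_by_class.items():
        # A class pointing to itself does not count as an incoming edge.
        referenced.update(target for target in targets if target != source)
    return {cls for cls in targets_by_class if cls not in referenced}


# Clear hierarchy: only the top-level class has no incoming edges.
hierarchy = {"ex:Catalog": ["ex:Record"], "ex:Record": ["ex:Note"]}
print(detect_root_classes(hierarchy))  # {'ex:Catalog'}

# Mutual references: every class is someone else's target, so no roots are found.
mutual = {"ex:Agent": ["ex:Grant"], "ex:Grant": ["ex:Agent"]}
print(detect_root_classes(mutual))  # set()
```

The second call returns an empty set, which is the situation described above for ontologies whose top-level entities reference each other.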

For these cases, use --root-classes to specify the root classes explicitly via a JSON file that maps module names to class URIs. The repository ships a preset for SKG-IF core:

uv run extractor data-model/ontology/current/ shapes.ttl --root-classes presets/skg-if-core.json