Annotation syntax
The extractor looks for owl:Class definitions that carry a dc:description annotation containing the phrase “The properties that can be used”. Everything after that line is parsed as a list of property declarations.
Property format
Each property is a bullet point (* or -) followed by this pattern:

    prefix:propertyName -[cardinality]-> prefix:TargetType

A concrete example in a Turtle ontology:

    foaf:Agent a owl:Class ;
        dc:description """The properties that can be used with this class are:
    * datacite:hasIdentifier -[0..N]-> datacite:Identifier
    * foaf:name -[0..1]-> rdfs:Literal
    * pro:holdsRoleInTime -[0..N]-> pro:RoleInTime""" .

The extractor turns each line into an sh:property block on the corresponding NodeShape.
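To make the line format concrete, here is a minimal Python sketch of how one annotation bullet could be split into its three parts. The regular expression and the name `parse_property_line` are illustrative assumptions, not the extractor's actual implementation.

```python
import re

# Hypothetical pattern for one annotation bullet; the extractor's real
# regular expression may differ.
PROPERTY_LINE = re.compile(
    r"^\s*[*-]\s+"                 # bullet marker: * or -
    r"(?P<path>\S+)\s+"            # property, e.g. foaf:name
    r"-\[(?P<card>[^\]]+)\]->\s+"  # cardinality between -[ and ]->
    r"(?P<target>.+?)\s*$"         # target type (or a {...} value list)
)


def parse_property_line(line):
    """Return (path, cardinality, target), or None if the line is not a property bullet."""
    match = PROPERTY_LINE.match(line)
    if match is None:
        return None
    return match.group("path"), match.group("card"), match.group("target")


print(parse_property_line("* foaf:name -[0..1]-> rdfs:Literal"))
```

Lines that do not match, such as the introductory sentence of the description, return None and can simply be skipped.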
Cardinality
Three forms are recognized:
| Notation | Meaning | SHACL output |
|---|---|---|
| [2] | Exactly 2 | sh:minCount 2 ; sh:maxCount 2 |
| [0..1] | Zero or one | sh:maxCount 1 |
| [1..N] | At least one, no upper bound | sh:minCount 1 |
N and * both mean unbounded. When the minimum is 0 or unbounded, no sh:minCount is emitted. When the maximum is N or *, no sh:maxCount is emitted.
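The mapping rules above can be sketched in a few lines of Python. The function name `cardinality_to_shacl` and the dictionary return format are assumptions made for illustration; only the mapping itself comes from the rules described here.

```python
UNBOUNDED = {"N", "*"}  # both spellings mean "no upper bound"


def cardinality_to_shacl(notation):
    """Map a cardinality such as '[2]', '[0..1]' or '[1..N]' to SHACL count constraints."""
    text = notation.strip().strip("[]")
    low, separator, high = text.partition("..")
    if not separator:
        # Single value such as [2]: the same number is both minimum and maximum.
        high = low
    constraints = {}
    if low not in UNBOUNDED and int(low) > 0:
        constraints["sh:minCount"] = int(low)
    if high not in UNBOUNDED:
        constraints["sh:maxCount"] = int(high)
    return constraints


for example in ("[2]", "[0..1]", "[1..N]", "[0..*]"):
    print(example, cardinality_to_shacl(example))
```

An empty dictionary, as for [0..*], means the property is unconstrained in count.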
Target types
What the extractor generates depends on the target type:
Literals. rdfs:Literal and rdfs:langString produce sh:nodeKind sh:Literal. XSD types like xsd:string or xsd:dateTime produce sh:datatype.
Classes with a shape. When the target class also has its own dc:description block, the extractor generates sh:node TargetShape. This links the property constraint to the shape of the target class, so values are validated against its properties too.
Classes without a shape. If the target class has no property annotations (and therefore no generated shape), the extractor falls back to sh:nodeKind sh:BlankNodeOrIRI.
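The three cases can be summarized as a small decision function. This is a sketch under stated assumptions: the name `target_constraint`, the `classes_with_shape` parameter, and the "local name plus Shape" naming rule are hypothetical stand-ins for whatever the extractor does internally.

```python
LITERAL_TARGETS = {"rdfs:Literal", "rdfs:langString"}


def target_constraint(target, classes_with_shape):
    """Pick the SHACL constraint for a target type, following the three cases above."""
    if target in LITERAL_TARGETS:
        return ("sh:nodeKind", "sh:Literal")
    if target.startswith("xsd:"):
        return ("sh:datatype", target)
    if target in classes_with_shape:
        # Assumed naming rule: local name + "Shape" (namespace omitted here).
        local_name = target.split(":", 1)[1]
        return ("sh:node", local_name + "Shape")
    # Class with no generated shape: accept any IRI or blank node.
    return ("sh:nodeKind", "sh:BlankNodeOrIRI")


described = {"datacite:Identifier"}
print(target_constraint("rdfs:Literal", described))
print(target_constraint("xsd:dateTime", described))
print(target_constraint("datacite:Identifier", described))
print(target_constraint("pro:RoleInTime", described))
```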
Union ranges
When the same property path appears multiple times with different target classes, the extractor treats it as a union range and generates a single sh:property with sh:or.
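One way to picture this step is as grouping declarations by path and wrapping multiple alternatives in sh:or. The sketch below assumes that reading; `group_by_path`, `render_range`, and the ex: names are hypothetical and not taken from the extractor's code.

```python
from collections import defaultdict


def group_by_path(declarations):
    """Group (path, target) pairs so that a repeated path collects all its targets."""
    targets_by_path = defaultdict(list)
    for path, target in declarations:
        if target not in targets_by_path[path]:
            targets_by_path[path].append(target)
    return dict(targets_by_path)


def render_range(constraints):
    """Render a single constraint directly, several constraints as an sh:or list."""
    if len(constraints) == 1:
        return constraints[0] + " ;"
    alternatives = " ".join("[ " + constraint + " ]" for constraint in constraints)
    return "sh:or ( " + alternatives + " ) ;"


grouped = group_by_path([("ex:p", "ex:A"), ("ex:q", "ex:C"), ("ex:p", "ex:B")])
print(grouped)
print(render_range(["sh:nodeKind sh:BlankNodeOrIRI", "sh:node ex:BShape"]))
```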
Consider this annotation, where srv:isDeploymentOf can point to two different classes:

    * srv:isDeploymentOf -[0..N]-> fabio:Software
    * srv:isDeploymentOf -[0..N]-> schema:SoftwareSourceCode

Suppose schema:SoftwareSourceCode has its own dc:description block (so the extractor generates a SoftwareSourceCodeShape for it), while fabio:Software does not. The result is:

    sh:property [
        sh:path srv:isDeploymentOf ;
        sh:or (
            [ sh:nodeKind sh:BlankNodeOrIRI ]
            [ sh:node srv_sh:SoftwareSourceCodeShape ]
        ) ;
    ] ;

Controlled vocabularies
When a property range is a fixed set of named individuals rather than a class, list the allowed values inside curly braces:

    * datacite:usesIdentifierScheme -[1]-> {datacite:doi datacite:isbn datacite:orcid}

The extractor generates sh:in with the list of URIs:

    sh:property [
        sh:path datacite:usesIdentifierScheme ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
        sh:in ( datacite:doi datacite:isbn datacite:orcid ) ;
    ] ;

Values inside the braces can be prefixed names (datacite:doi) or absolute IRIs (http://purl.org/spar/datacite/doi). Only IRIs are supported; string literals are not.
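As a rough illustration, a brace-delimited value set could be turned into an sh:in list as follows. The helper names `parse_value_set` and `to_sh_in` are hypothetical. Wrapping absolute IRIs in angle brackets follows Turtle syntax, but whether the extractor serializes them this way is an assumption.

```python
def parse_value_set(target):
    """Return the values listed inside {...}, or None if the target is not a value set."""
    text = target.strip()
    if not (text.startswith("{") and text.endswith("}")):
        return None
    return text[1:-1].split()


def to_sh_in(values):
    """Render values as an sh:in list; absolute IRIs get Turtle angle brackets."""
    rendered = []
    for value in values:
        if "://" in value:
            rendered.append("<" + value + ">")
        else:
            rendered.append(value)  # prefixed name, kept as written
    return "sh:in ( " + " ".join(rendered) + " )"


values = parse_value_set("{datacite:doi datacite:isbn datacite:orcid}")
print(to_sh_in(values))
```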
Namespace resolution
Prefixes in the annotations need to be resolved to full URIs. The extractor tries three strategies in order:
- Registered prefixes: standard @prefix declarations parsed by rdflib.
- URI namespace map: when a target name appears without a prefix (e.g., just E55_Type), the extractor looks it up against all URIs already present in the graph. If any URI ends with E55_Type after a # or /, the namespace part of that URI is used.
- Literal prefix map: parses @prefix declarations found inside string literals (i.e., inside dc:description text itself). Some ontologies declare prefixes only in annotation strings, not at the top level. CHAD-AP is one such case: crm:, lrmoo:, and crmdig: appear only inside a dc:description literal, never as top-level @prefix declarations.
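The second and third strategies can be sketched without rdflib, using plain strings. Both function names and the regular expression are illustrative assumptions, and the CIDOC CRM namespace is used only as sample input.

```python
import re


def namespace_for_local_name(local_name, known_uris):
    """URI namespace map: find a known URI ending in '#name' or '/name' and return its namespace."""
    for uri in known_uris:
        for separator in ("#", "/"):
            if uri.endswith(separator + local_name):
                return uri[: -len(local_name)]
    return None


# Assumed pattern for "@prefix name: <namespace>" inside annotation text.
PREFIX_DECLARATION = re.compile(r"@prefix\s+([\w-]*):\s*<([^>]+)>")


def prefixes_in_literal(text):
    """Literal prefix map: collect @prefix declarations written inside a string literal."""
    return dict(PREFIX_DECLARATION.findall(text))


uris = ["http://www.cidoc-crm.org/cidoc-crm/E55_Type"]
print(namespace_for_local_name("E55_Type", uris))
print(prefixes_in_literal("@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> ."))
```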
Root classes
Not all data consistently declares rdf:type on every resource. A SHACL shape with sh:targetClass forces validation on all instances of that class, which means the data must include rdf:type for the validator to find them. Root classes are the classes where this requirement is enforced.
A root class gets sh:targetClass in its shape, so validators check all its instances against the shape. Non-root classes only get sh:NodeShape without a target — they are validated indirectly when referenced via sh:node from another shape. This way, a reference to a non-root class does not fail just because the data omits rdf:type on that resource.
By default, the extractor detects root classes automatically: a class is root if no other described class points to it as a target. This works well for ontologies with a clear hierarchy, where top-level classes are never referenced as the range of another class’s property.
It breaks down when classes reference each other across modules. In SKG-IF core, for instance, Agent is referenced by Grant (via pro:isHeldBy), Work is referenced by DataService, and so on — almost every top-level entity appears as someone else’s target. The automatic algorithm would miss them all, since it only picks classes with zero incoming edges.
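The default rule, and the way it fails under mutual references, can be reproduced with a small sketch. The function `detect_root_classes` and the ex: class names are hypothetical; the input is assumed to be a mapping from each described class to the targets of its properties.

```python
def detect_root_classes(targets_by_class):
    """Return the described classes that no other described class uses as a target."""
    referenced = set()
    for source, targets in targets_by_class.items():
        for target in targets:
            if target != source:  # a self-reference is not "another class"
                referenced.add(target)
    return {cls for cls in targets_by_class if cls not in referenced}


# Clear hierarchy: only the top-level class has no incoming reference.
hierarchy = {"ex:Catalog": ["ex:Record"], "ex:Record": ["rdfs:Literal"]}
print(detect_root_classes(hierarchy))

# Cross-referencing classes: every class is someone else's target.
cross_referenced = {"ex:Agent": ["ex:Work"], "ex:Work": ["ex:Agent"]}
print(detect_root_classes(cross_referenced))
```

In the second input every class has an incoming edge, so the result is empty and no shape would receive sh:targetClass.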
For these cases, use --root-classes to specify the root classes explicitly via a JSON file that maps module names to class URIs. The repository ships a preset for SKG-IF core:

    uv run extractor data-model/ontology/current/ shapes.ttl --root-classes presets/skg-if-core.json