Anyone involved in anything having to do with the Semantic Web or Linked Data knows how much time and energy is wasted on endless discussions of the blank node issue. It is a controversial topic because, on the one hand, blank nodes cause huge problems in practice, while on the other, they enable great flexibility of expression.
Hidden in this flexibility is a more profound reason, which can perhaps explain how blank nodes have survived as a part of the RDF model all these years despite all the headaches they have caused. The thing is, blank nodes reflect a human way of referencing things. Let me show an example:
If I want to talk about my left arm, it’s quite unnatural to invent a new identifier for it. I’ll just say “my left arm”, describing it relative to myself, and a listener will understand. This is possible thanks to the human ability to understand context. The listener knows that the pronoun “my” refers to me as something unique, that the arm is a part of me, and that “left” finally specifies the exact arm. So my left arm, unique in the universe, is referenced quite simply and elegantly.
In RDF, it can be expressed with two statements (triples): “I have an arm. It (has a property that) is left”. Let’s assume that we know implicitly that the blank node is of type ex:Arm, through the range of the property:
ex:hasArm rdfs:range ex:Arm .
Given that my URI is http://milicicvuk.com/data_/vuk, and assuming the relevant properties are defined in the ex ontology, we can express it with the following triples:
<http://milicicvuk.com/data_/vuk> ex:hasArm [ ex:hasProperty ex:Left ] .
The left arm is represented by the blank node, which is the object of the first triple and the subject of the second, thus chaining them and forming rather readable code.
Now, let’s take a slightly more complicated example. I can say something like “the 5 cm scar on my left arm” (or “my left arm’s 5 cm scar”). Again, the scar is relative to the left arm, and the arm is relative to me. Translated to RDF, it becomes: “I have an arm which has a property that is left and has a scar that has a length which has a value of 5 and is in units of cm”. This rather cumbersome sentence is much clearer when written in Turtle notation using nested blank nodes:
<http://milicicvuk.com/data_/vuk> ex:hasArm [ ex:hasProperty ex:Left ; ex:hasScar [ ex:hasLength [ rdf:value "5" ; ex:inUnit ex:cm ] ] ] .
Here we have three blank nodes that connect the various statements, which results in pretty elegant and readable code. This level of elegance and readability can never be achieved by using URI references.
That’s what makes blank nodes cool: they allow referencing relative to another thing. Instead of minting identifiers for every possible resource, you can just say “something” or “someone” that is related to something else that has an identifier. The trouble is that this coolness is greatly diminished by the downside of not having global identifiers.
The question is: is it possible to keep the flexibility of blank nodes while having URIs at the same time? The answer is: yes, there is an elegant solution that allows just that.
To understand it, let’s try to look at the problem from the perspective of a namespace. The idea of a namespace is related to that of a context. A namespace is defined as a container that provides context for identifiers. A namespace has a unique name in the global space, allowing otherwise ambiguous identifiers to become globally unique as well.
Now, let’s for a moment look at the part enclosed between [ and ] in the first RDF example. The subject and the predicate of the first triple (<http://milicicvuk.com/data_/vuk> ex:hasArm) act as the namespace of the part between the square brackets. Together they uniquely define the “container” that provides context for local identifiers.
<http://milicicvuk.com/data_/vuk> ex:hasArm [ ex:hasProperty ex:Left ] .
However, in order for this namespace to be usable, we must convert it to a URI. The URI http://milicicvuk.com/data_/vuk alone can be seen as a kind of namespace for the predicate ex:hasArm. Of course, ex:hasArm is also a unique C(URI)E, but in this context it acts as a local identifier.
Put in this perspective, it is not hard to figure out what the full name of that identifier is. It can be formed as with any other namespaced name, by concatenating the namespace with the local name.
As a delimiter, we are going to use the slash character “/”, the standard delimiter of URI segments. The result is:
http://milicicvuk.com/data_/vuk/ex:hasArm
Another thing we have to do is replace the URL-unfriendly character “:” with something else. Let’s use the “_” character[*]. Finally, we get:
http://milicicvuk.com/data_/vuk/ex_hasArm
We got the URI of the namespace defined by the subject and the predicate of the triple. Another way of looking at this URI is as the full name of the “local identifier” ex:hasArm, defined in the http://milicicvuk.com/data_/vuk context. In any case, in the context of this new namespace we are going to define new local identifiers, using the following template:
Namespaces are cool because they allow us not to worry about the global scope. The uniqueness of a namespace guarantees that all new identifiers defined in its context will also be unique. This way, we have reduced the problem of creating a whole new URI to the problem of inventing a name that only has to be locally unique.
In this particular case, given that I have just two arms, the local identifiers “left” and “right” will do the job nicely[**]. The full URIs (with the namespace) will thus look like this:
http://milicicvuk.com/data_/vuk/ex_hasArm/left
http://milicicvuk.com/data_/vuk/ex_hasArm/right
Therefore, the resource (the arm) that was previously represented by a blank node got a URI (http://milicicvuk.com/data_/vuk/ex_hasArm/left). The blank node just evolved into a URI reference while keeping its flexibility of expression!
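The construction so far can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names namespace_uri and mint_uri are made up for this sketch, and the only rules applied are the two described above (join with “/”, replace “:” with “_”).

```python
def namespace_uri(subject: str, predicate_curie: str) -> str:
    """Join a subject URI and a predicate CURIE into a namespace URI."""
    # "/" separates URI segments; the ":" of the CURIE is written as "_".
    return subject.rstrip("/") + "/" + predicate_curie.replace(":", "_")


def mint_uri(subject: str, predicate_curie: str, local_id: str) -> str:
    """Append a locally unique identifier to the namespace URI."""
    return namespace_uri(subject, predicate_curie) + "/" + local_id


me = "http://milicicvuk.com/data_/vuk"
print(namespace_uri(me, "ex:hasArm"))     # the namespace for my arms
print(mint_uri(me, "ex:hasArm", "left"))  # the URI of my left arm
```

The publisher only has to supply the last argument; everything before it follows mechanically from the subject and the property.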
Additionally, we have a clear pattern for other URIs, too. What about identifiers for legs? No problem:
http://milicicvuk.com/data_/vuk/ex_hasLeg/left
http://milicicvuk.com/data_/vuk/ex_hasLeg/right
As I mentioned earlier, every new URI is at the same time the namespace for new identifiers. New namespaces can be built on the basis of the previous ones, forming a chain of nested namespaces. Namely, ex:hasScar is a local identifier in the context defined by the http://milicicvuk.com/data_/vuk/ex_hasLeg/left namespace. Suppose it’s a scar from a surgery, suggesting the local identifier “surgery”.
Again, the new URI is the namespace of the subject http://milicicvuk.com/data_/vuk/ex_hasLeg/left and the predicate ex_hasScar, forming the container for the local identifier “surgery”. The full URI of the scar is therefore the URI of the object of that triple, previously a blank node:
<http://milicicvuk.com/data_/vuk/ex_hasLeg/left> ex:hasScar <http://milicicvuk.com/data_/vuk/ex_hasLeg/left/ex_hasScar/surgery> .
What about literals? The exact same method can be applied to constructing the URIs of literals as well. The literal “5” in the second RDF example will get the URI:
Literals’ URIs by convention always end with the rdf_value segment because literal nodes are always values of the rdf:value property. Also, literal nodes are special in that they are terminal nodes, meaning they cannot branch further (and thus cannot serve as namespaces for new identifiers).
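Nested namespaces and the literal convention can be sketched the same way. In the sketch below, node_uri is a hypothetical helper (not part of any library) that walks a list of (property, key) steps. A key of None is my own way of writing the single-item case, where a property segment is followed directly by the next property.

```python
from typing import List, Optional, Tuple

# One step of the path: (property CURIE, local key or None for a single item).
Step = Tuple[str, Optional[str]]


def node_uri(root: str, path: List[Step]) -> str:
    """Extend `root` with one property segment (and optional key) per step."""
    uri = root.rstrip("/")
    for curie, key in path:
        uri += "/" + curie.replace(":", "_")
        if key is not None:
            uri += "/" + key
    return uri


me = "http://milicicvuk.com/data_/vuk"
scar = node_uri(me, [("ex:hasLeg", "left"), ("ex:hasScar", "surgery")])
# The literal is a terminal node: its URI ends with the rdf_value segment.
length_value = node_uri(scar, [("ex:hasLength", None), ("rdf:value", None)])

print(scar)
print(length_value)
```

Because every result is itself a valid root, the same function serves for any depth of nesting.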
You may have recognized the pattern used in these URIs. It is a variation of a well-known URI pattern used on the Web that consists of two parts: one representing the collection, and the other an individual (instance) of the collection.
This pattern is also used in Linked Data. In the book Linked Data Patterns, URIs of this kind are called patterned URIs and are recommended as a way of creating more hackable and human-readable URIs. The authors suggest using pluralized class names as the first part of the URI pattern, and an identifier as the second.
For example, if an application will be publishing data about book resources, which are modelled with the rdf:type ex:Book, one might construct URIs of the form:
/books/12345
where /books is the base part of the URI indicating “the collection of books”, and 12345 is an identifier for an individual book.
In another pattern, hierarchical URIs, the authors state:
Where a natural hierarchy exists between a set of resources use Patterned URIs that conform to the following pattern:
E.g. in a system which is publishing data about individual books and their chapters, we might use the following identifier for chapter 1 of a specific book:
/books/12345/chapters/1
The /chapters URI will naturally reflect to the collection of all chapters within a specific book. The /books URI maps to the collection of all books within a system, etc.
A pattern for naming nodes of an RDF graph can be considered as a kind of “hierarchical URIs” pattern where a property name is used instead of a pluralized class. Its form can be written as follows:
“Hierarchical” is perhaps not the best name for the relations between nodes in a graph, but bear in mind that the part of a graph described this way has the form of a tree with the described resource as its root. Anyway, to differentiate it from the other URI patterns, let’s call it the “RDF graph” URI pattern.
The “RDF graph” URI pattern
Using properties instead of class names explicitly states the relations between the nodes. Also, information about the item’s class can be preserved if it is contained in the property name, as is the case with the ex:Arm class in ex:hasArm.
The “RDF graph” pattern can be applied to the entire URI of a node, from the domain name to the last segment. The default namespace website.com/data_ is a container for the root-level nodes, which then branch to the lowest-level nodes using the same pattern. For instance, in the URI http://milicicvuk.com/data_/vuk/ex_hasLeg/left/ex_hasScar/surgery/ex_hasLength/rdf_value, there are five “property” parts denoting properties (data:, ex:hasLeg, ex:hasScar, ex:hasLength and rdf:value) and three item (or key) parts (vuk, left, surgery)[***].
The diagram showing a part of RDF graph describing all the nodes contained in the URI looks like this:
The triples in the Turtle syntax look like this:
<http://milicicvuk.com> data: <http://milicicvuk.com/data_/vuk> .
<http://milicicvuk.com/data_/vuk> ex:hasLeg <http://milicicvuk.com/data_/vuk/ex_hasLeg/left> .
<http://milicicvuk.com/data_/vuk/ex_hasLeg/left> ex:hasScar <http://milicicvuk.com/data_/vuk/ex_hasLeg/left/ex_hasScar/surgery> .
<http://milicicvuk.com/data_/vuk/ex_hasLeg/left/ex_hasScar/surgery> ex:hasLength <http://milicicvuk.com/data_/vuk/ex_hasLeg/left/ex_hasScar/surgery/ex_hasLength> .
<http://milicicvuk.com/data_/vuk/ex_hasLeg/left/ex_hasScar/surgery/ex_hasLength> rdf:value "5" .
Using the more concise syntax based on extended CURIEs, it will look as follows:
<http://milicicvuk.com> data::vuk [ ex:hasLeg:left [ ex:hasScar:surgery [ ex:hasLength [ rdf:value "5" . ] ] ] ]
Note that the literal is represented by its value (5). Its URI, if needed, can be easily inferred from its parent URI.
The different syntactic representations of the first “property” part and the second “item” part of the URI allow a URI to be readable not just by people, but by machines as well. In a form such as /books/12345/chapters/1 we intuitively know which part is which, but there are no syntactic constraints that explicitly make those parts distinct. In the “RDF graph” pattern, the property segment is always in the form of a CURIE, which enables a parser to automatically identify and distinguish between the segments.
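To make the distinction concrete, here is a rough Python sketch of how a parser might tell the two kinds of segments apart. The regular expression is my own assumption about what a property segment looks like (a prefix, then “_”, then an optional local name), and it only works if keys themselves never contain “_”. The name classify_segments is hypothetical.

```python
import re
from urllib.parse import urlsplit

# Assumed shape of a property segment: a CURIE with ":" written as "_",
# e.g. "ex_hasLeg" or "data_" (empty local part). Keys must not contain "_".
PROPERTY_SEGMENT = re.compile(r"[A-Za-z][A-Za-z0-9]*_[A-Za-z0-9]*")


def classify_segments(uri: str) -> list:
    """Label every path segment of `uri` as "property" or "item"."""
    segments = [s for s in urlsplit(uri).path.split("/") if s]
    return [
        ("property" if PROPERTY_SEGMENT.fullmatch(s) else "item", s)
        for s in segments
    ]


uri = ("http://milicicvuk.com/data_/vuk/ex_hasLeg/left"
       "/ex_hasScar/surgery/ex_hasLength/rdf_value")
labels = classify_segments(uri)
print([s for kind, s in labels if kind == "property"])
print([s for kind, s in labels if kind == "item"])

# A plain patterned URI carries no such marker: every segment is an "item".
print(classify_segments("/books/12345/chapters/1"))
```

For the example URI this separates five property segments from three item segments, whereas nothing in /books/12345/chapters/1 tells the parser which segment is a collection and which is an instance.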
Furthermore, the prefixes of the CURIE properties are defined on the default namespace website.com/prefix_, so the full properties’ URIs can be obtained automatically as well. For instance, the full URI of the ex prefix could be retrieved from the
This approach allows a generic algorithm for identifying URIs implemented using the “RDF graph” pattern and distinguishing them from ordinary, opaque URIs. The parser can then sort out the two types of segments and decompose the URIs into triples, thanks to the explicitly defined meanings of the properties. This means that the parser is able not just to “read” the URI, but also to “understand” it, by recursively parsing all the relevant URIs and getting the triples it needs to learn. Its “knowledge” can also be used to guess new URIs by recombining the segments, much as humans do with readable/hackable URIs. Finally, because triples “live” in URIs and are inseparable from them, the source of the triples is always known.
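A minimal sketch of such a decomposition, under my own assumptions, could look like the following. The function name uri_to_triples is hypothetical, prefixes are left as CURIEs rather than expanded, keys are assumed not to contain “_”, and a trailing rdf_value segment is skipped because the literal’s value is not part of the URI.

```python
import re
from typing import List, Tuple
from urllib.parse import urlsplit

# Assumed shape of a property segment: prefix, "_", optional local name.
PROPERTY_SEGMENT = re.compile(r"[A-Za-z][A-Za-z0-9]*_[A-Za-z0-9]*")


def uri_to_triples(uri: str) -> List[Tuple[str, str, str]]:
    """Decompose an "RDF graph" pattern URI into (subject, CURIE, object) triples."""
    parts = urlsplit(uri)
    subject = f"{parts.scheme}://{parts.netloc}"  # the site is the root subject
    segments = [s for s in parts.path.split("/") if s]
    triples = []
    i = 0
    while i < len(segments):
        segment = segments[i]
        if not PROPERTY_SEGMENT.fullmatch(segment):
            raise ValueError(f"expected a property segment, got {segment!r}")
        curie = segment.replace("_", ":", 1)  # "ex_hasLeg" -> "ex:hasLeg"
        if curie == "rdf:value":
            break  # literal node: its value cannot be read from the URI
        obj = f"{subject}/{segment}"
        i += 1
        # An item segment, if present, picks one member of the namespace.
        if i < len(segments) and not PROPERTY_SEGMENT.fullmatch(segments[i]):
            obj += "/" + segments[i]
            i += 1
        triples.append((subject, curie, obj))
        subject = obj  # the object becomes the subject of the next triple
    return triples


for s, p, o in uri_to_triples(
    "http://milicicvuk.com/data_/vuk/ex_hasLeg/left"
    "/ex_hasScar/surgery/ex_hasLength/rdf_value"
):
    print(f"<{s}> {p} <{o}> .")
```

Run on the example URI, this yields the four URI-to-URI triples of the graph; the fifth triple, with the literal "5" as its object, still has to come from the data itself.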
There is another important repercussion of using the “RDF graph” pattern. Because properties (in the form of CURIEs) become part of the URI, they limit the publisher’s choices when it comes to generating URIs. In other words, the ontology directs the creation of URIs by providing the names of the properties. The burden and responsibility of minting URIs is thus transferred from the publisher to the ontology creator. The only things the publisher has to worry about are the local identifiers (“left”, “right” and “surgery” in the above examples). These kinds of “keys” can be recommended by the ontology maker, or can (perhaps more probably) arise as conventions from the community’s best practices.
[**] In other cases, if there are many identifiers, or descriptive names aren’t important, simple indexes can be automatically generated or existing IDs can be used.
[***] Note that the :property/:sub-property pattern is also possible if there is a single item, as in ex_hasLength/rdf_value. All the combinations will be discussed in more detail in future posts.