Before we start, let’s remind ourselves of the example RDF graph we used in the previous post:
The challenge is to figure out URIs for nodes having question marks, namely blank nodes and literals.
How can we provide a URI for each node of an RDF graph? The solution can be found in the very nature of the Web: a unique (HTTP) URI for every node can be obtained in much the same way ordinary web pages get their URLs. The domain of each website is unique, while web pages, which naturally have ambiguous names, get unique URLs in the context of their website.
For instance, imagine that the website http://chucknorris.com has a contact page. The term “contact” is ambiguous and exists on a number of websites, but the URL http://chucknorris.com/contact is globally unique. In the context of triples of an RDF graph, http://chucknorris.com would become the subject, “contact” the predicate, and http://chucknorris.com/contact the object of the RDF triple:
<http://chucknorris.com> “contact” <http://chucknorris.com/contact> .
However, there are two significant differences between web pages and nodes of an RDF graph. First, the properties that make up predicates in an RDF graph are URIs themselves, and not mere words (like “contact” in the example). Second, a resource can be linked by the same property to several different values, i.e. there may be several RDF triples with the same subject and predicate, but different objects. In this case, simple concatenation of the subject and the predicate is not enough to create a unique URI.
The idea for solving the first problem can be found in the CURIE syntax. CURIE defines an abbreviated syntax for expressing URIs in the “prefix:localName” form, which is already widely used in RDF notations. It consists of a prefix and a local name separated by the colon (:) delimiter. The prefix is a reference to a URI namespace, i.e. the part of a URI common to all resources of a domain. For example, resources defined by the FOAF ontology share the namespace http://xmlns.com/foaf/0.1/, which is usually mapped to the prefix “foaf”. The CURIE for the property http://xmlns.com/foaf/0.1/based_near therefore becomes “foaf:based_near”.
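The compaction described above can be sketched in a few lines of Python. This is only an illustration: the `PREFIXES` table and the `to_curie` helper are hypothetical names, and only the foaf mapping is taken from the text.

```python
# Hypothetical prefix table; only the foaf mapping comes from the text.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def to_curie(uri, prefixes=PREFIXES):
    """Return the CURIE for `uri`, or `uri` unchanged if no namespace matches."""
    best = None
    for prefix, namespace in prefixes.items():
        # Prefer the longest namespace that the URI starts with.
        if uri.startswith(namespace) and (best is None or len(namespace) > len(best[1])):
            best = (prefix, namespace)
    if best is None:
        return uri
    prefix, namespace = best
    return f"{prefix}:{uri[len(namespace):]}"


print(to_curie("http://xmlns.com/foaf/0.1/based_near"))  # foaf:based_near
```

A URI that matches no known namespace is returned as it is, so the helper never guesses a prefix.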
By extending the URI of the subject (http://chucknorris.com/data_/chuck) with the predicate in CURIE form (foaf:based_near), the blank node from the above example obtains the URI http://chucknorris.com/data_/chuck/foaf:based_near. However, the character “:” is reserved in the URI syntax and is forbidden in file and folder names on some systems, as well as in other contexts, so an alternative delimiter is needed. Instead of “:” we can use the underscore (_), which turns the previous example into http://chucknorris.com/data_/chuck/foaf_based_near.
The triple in question will look like this:
<http://chucknorris.com/data_/chuck> foaf:based_near <http://chucknorris.com/data_/chuck/foaf_based_near> .
The same method can be applied to other blank nodes, for instance:
<http://chucknorris.com/data_/chuck/foaf_based_near> geo:lat <http://chucknorris.com/data_/chuck/foaf_based_near/geo_lat> .
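As a rough sketch, the construction of these URIs could be expressed in Python as follows. The `child_uri` function is a hypothetical helper introduced for illustration; it simply appends the predicate to the subject URI with the colon replaced by an underscore.

```python
def child_uri(subject_uri, predicate_curie):
    """Build the URI of a node reached from `subject_uri` via `predicate_curie`.

    The CURIE "prefix:local" is appended as the path segment "prefix_local".
    """
    prefix, sep, local = predicate_curie.partition(":")
    if not sep or not prefix or not local:
        raise ValueError(f"not a CURIE: {predicate_curie!r}")
    return f"{subject_uri.rstrip('/')}/{prefix}_{local}"


chuck = "http://chucknorris.com/data_/chuck"
place = child_uri(chuck, "foaf:based_near")
latitude = child_uri(place, "geo:lat")

print(place)     # http://chucknorris.com/data_/chuck/foaf_based_near
print(latitude)  # http://chucknorris.com/data_/chuck/foaf_based_near/geo_lat
```

Because each new URI is derived from its parent, the same function can be applied repeatedly to walk down the graph.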
When using the CURIE syntax, one needs to define the prefixes and map them to the appropriate namespaces. This definition is usually located at the beginning of a document. For example, in the Turtle notation the keyword “@prefix” is used at the beginning of a file, while in notations based on XML, it is usually defined on the root tag using the “prefix” or “xmlns” attributes. Since a website has a tree structure, the logical place for the definition of a prefix is the root of the tree. Prefixes are therefore defined at the website level and placed on the “website.com/prefix_” path. For example, the URL http://chucknorris.com/prefix_/foaf can return a reference to the http://xmlns.com/foaf/0.1/ namespace. The full URI of a property can therefore be recovered from its CURIE form with a simple lookup.
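The lookup can be illustrated with a small Python sketch. Everything here is an assumption made for the example: the `SITE_PREFIXES` dictionary stands in for what the site would return under its “prefix_” path (no HTTP request is made), and the helper names are hypothetical. The sketch also assumes that prefixes themselves contain no underscore, so the first underscore in a segment separates the prefix from the local name.

```python
# Stand-in for the prefix definitions published under
# http://chucknorris.com/prefix_/ (assumed content, no request is made).
SITE_PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def prefix_url(site, prefix):
    """URL at which the site would publish the definition of `prefix`."""
    return f"{site.rstrip('/')}/prefix_/{prefix}"


def expand_segment(segment, prefixes=SITE_PREFIXES):
    """Turn a path segment such as 'foaf_based_near' into the full property URI."""
    # Assumption: prefixes contain no underscore, so split at the first one.
    prefix, sep, local = segment.partition("_")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in segment: {segment!r}")
    return prefixes[prefix] + local


print(prefix_url("http://chucknorris.com", "foaf"))
print(expand_segment("foaf_based_near"))
```

Splitting at the first underscore matters because local names such as “based_near” may contain underscores of their own.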
The second problem is related to the assignment of URIs in the situation where there are multiple RDF triples with the same subjects and predicates, but different objects. For example, what will happen if the node http://chucknorris.com/data_/chuck from the example graph is connected using the same property “foaf:based_near” to multiple (geo:Point) nodes? In that case, the http://chucknorris.com/data_/chuck/foaf_based_near URI is not suitable because it is unclear to which node it refers. It is therefore necessary to provide a mechanism that allows a distinct URI for each node.
Here an analogy with arrays in programming languages can help. If based_near is the name of an array, its members will be named based_near[0], based_near[1] and so on. One can also use an associative array (hash), where instead of numbers, (descriptive) keys are used as indexes, for example based_near["belgrade"].
In the HTTP context, the names of array members will become the URIs http://chucknorris.com/data_/chuck/foaf_based_near/1 and http://chucknorris.com/data_/chuck/foaf_based_near/2 (for simplicity and compatibility with other standards the indices start from 1 instead of 0). The associative array equivalents would be http://chucknorris.com/data_/chuck/foaf_based_near/belgrade and http://chucknorris.com/data_/chuck/foaf_based_near/pancevo.
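A minimal Python sketch of both variants follows. The function names `indexed_uris` and `keyed_uris` are hypothetical and exist only for this illustration; the URIs they produce are the ones given in the text.

```python
def indexed_uris(property_uri, count):
    """URIs for `count` values of one property, numbered from 1."""
    return [f"{property_uri}/{i}" for i in range(1, count + 1)]


def keyed_uris(property_uri, keys):
    """URIs for values of one property, addressed by descriptive keys."""
    return {key: f"{property_uri}/{key}" for key in keys}


based_near = "http://chucknorris.com/data_/chuck/foaf_based_near"

for uri in indexed_uris(based_near, 2):
    print(uri)

for key, uri in keyed_uris(based_near, ["belgrade", "pancevo"]).items():
    print(key, uri)
```

Numeric indices are trivial to generate, while descriptive keys keep their meaning if values are later added or removed.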
These segments should be carefully chosen to ensure the stability of the URIs. Changing them later affects the URIs of all child nodes, since those contain the URI of the parent node. Such “key” segments can also be used when a property has only one value, if more are expected in the future. This ensures that adding a new object for the same property later will not change the existing URI. If the property is unique, the key can be omitted.
Appending the predicate URI in shortened (CURIE) form to the subject's URI, and appending an arbitrary key to the resulting URI where needed, gives a simple mechanism for assigning URIs to all nodes of an RDF graph. “Blank” nodes are now identified by URIs just like URI references. Using the same method, literals can get a URI as well, which will be discussed in more detail in the following post. With URIs assigned to blank nodes, our example graph looks like this:
URIs tailored this way are always defined in the context of the “parent” URI, which makes them dependent on it. The nodes they identify represent some kind of property of the node in whose context they have been defined, meaning that deleting the parent will cause the deletion of its children. However, the “initial” nodes (for example http://chucknorris.com/data_/chuck) are in a similar way dependent on the website, so viewed that way there are no fundamental differences between the “initial” and the “blank” nodes.