In the previous posts I’ve covered two important problems of the RDF model, related to blank nodes and literals. Here, I’m going to focus on what I think is the key problem of RDF – the problem of the node of an RDF graph.
The set of nodes of an RDF graph is the set of subjects and objects of triples in the graph.
The RDF specification then states that a node can be a URI reference, a literal or a blank node. However, the concept of a node itself is not clearly defined. What is common to all the different types of nodes? What requirements does something have to meet in order to be a node?
A node may represent a resource or a value. It may or may not have a name. If it has a name, that name is defined in different ways. A node can hold one or two values. If it holds two values, the second value can be a string or a URI.
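The variety described above can be made concrete with a minimal sketch of the three node kinds as plain data types. The class and field names here (`URIRef`, `BlankNode`, `Literal`, `describe`) are illustrative assumptions, not taken from any RDF library or from the specification.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class URIRef:
    """A node that represents a resource and is named by a URI."""
    uri: str


@dataclass(frozen=True)
class BlankNode:
    """A node that represents a resource but has no name in the abstract model."""
    local_label: Optional[str] = None  # hypothetical: only meaningful inside one document


@dataclass(frozen=True)
class Literal:
    """A node that represents a value; it may carry a second value."""
    lexical_form: str
    language: Optional[str] = None  # second value as a string (plain literal)
    datatype: Optional[str] = None  # second value as a URI (typed literal)


Node = Union[URIRef, BlankNode, Literal]


def describe(node: Node) -> str:
    """Spell out what each kind of node actually holds."""
    if isinstance(node, URIRef):
        return f"resource named by URI {node.uri}"
    if isinstance(node, BlankNode):
        return "resource with no name"
    if node.datatype is not None:
        return f"value {node.lexical_form!r} plus datatype URI {node.datatype}"
    if node.language is not None:
        return f"value {node.lexical_form!r} plus language string {node.language!r}"
    return f"value {node.lexical_form!r} alone"
```

Note that the three classes share no field at all; the only thing tying them together is the `Node` union, which mirrors the point that nothing but the label “node” is common to them.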
So, what’s common to all nodes? It seems that the only requirement is the existence of a “thing” a node represents. However, such a requirement is too broad to make sense. The result is that there are conceptually completely different types of nodes. For instance, a blank node and a literal virtually don’t have anything in common, except that both are called “nodes”.
URIs make it easy to create globally unique names, which is one of the key aspects of RDF, making it potentially a very powerful data model. However, using a URI as an identifier is limited to only one type of node – a URI reference.
A literal is identified by the lexical representation of its value. In the same way that a resource is distinguished from the URI reference that represents it, a value should be distinguished from the literal node that represents it. Like a blank node, a literal node doesn’t have a URI as its name (its name is its value). This means it can’t be referred to in the Web context, and no HTTP request (lookup) can be made for it, even though such lookups are the essence of the Web and Linked Data.
Furthermore, even within literals, there is no single method of identification: it depends on whether the literal is plain or typed. Add to that the special cases of the identification of blank nodes in practice, and you end up with a number of different ways to identify nodes in an RDF graph. For an “extremely simple” model, as the RDF model is sometimes referred to, made with the ambition to extend the Web, this represents a significant problem.
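As a rough illustration of how literal identity depends on more than the value, the sketch below models a literal as a tuple of lexical form, optional language tag and optional datatype URI, and treats two literals as the same node only if all three parts match. This follows my reading of the 2004 RDF abstract syntax; the `Literal` tuple itself is a hypothetical helper, not a library type.

```python
from typing import NamedTuple, Optional

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


class Literal(NamedTuple):
    lexical_form: str
    language: Optional[str] = None  # only used by plain literals
    datatype: Optional[str] = None  # only used by typed literals


# Three literals that share the lexical form "42" ...
plain = Literal("42")
tagged = Literal("42", language="en")
typed = Literal("42", datatype=XSD_INTEGER)

# ... are still three different nodes, because tuple equality
# compares the language tag and the datatype URI as well.
distinct_nodes = {plain, tagged, typed}
```

Under these assumptions, `len(distinct_nodes)` is 3, while `Literal("42") == plain` holds: the “name” of a literal is a composite key whose shape changes with the kind of literal.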
The Web graph, on the other hand, is an example an RDF graph should follow. Every node is a web resource identified with (the unique mechanism of) an HTTP URI. There are no “blank resources”, or resources identified by their value. There are no special cases – every node is a web resource with a URI. As simple as that.
When it comes to the simplicity of the RDF model, it is often said that RDF is just triples. However, the problem is not at the level of a triple, but at the level of a node. From the point of view of an RDF graph, a node, not a triple, is its basic element. A node is not clearly defined, which has dramatic consequences for the whole model. This fact, more or less directly, causes all the other previously discussed problems of the RDF model.
An RDF graph doesn’t reflect the simplicity of the mathematical concept of a graph. It is not defined as a graph, but as a set of triples. A graph data model is what makes RDF potentially so flexible and powerful. However, trivial graph operations are hard in RDF. Data manipulation and integration are a pain. RDF syntaxes are much more complex than they need to be because they have to implement many special cases of the RDF model. Linked Data is complex partly because of RDF and is running away from it. Web developers are scared of it, let alone webmasters or ordinary people.
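To show what “a set of triples” means in practice, here is a minimal sketch in which a graph is literally a Python set of (subject, predicate, object) tuples. Terms are simplified to strings, and the `example.org` URIs, the `_:b1` label and the helper names `nodes` and `neighbors` are assumptions made for illustration. The sketch does not prove that graph operations are hard; it only shows that nodes and adjacency are not stored anywhere and have to be derived by scanning the triples.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), all as plain strings


def nodes(graph: Set[Triple]) -> Set[str]:
    """The node set is derived: every subject and every object of a triple."""
    return {s for s, _, _ in graph} | {o for _, _, o in graph}


def neighbors(graph: Set[Triple], node: str) -> Set[str]:
    """Adjacent nodes, found by scanning every triple in the graph."""
    outgoing = {o for s, _, o in graph if s == node}
    incoming = {s for s, _, o in graph if o == node}
    return outgoing | incoming


graph: Set[Triple] = {
    ("http://example.org/alice", "http://example.org/knows", "http://example.org/bob"),
    ("http://example.org/alice", "http://example.org/name", '"Alice"'),
    ("http://example.org/bob", "http://example.org/knows", "_:b1"),
}
```

Here `nodes(graph)` contains four entries (two URI references, one literal and one blank node), and the predicates are not among them unless they also appear as a subject or object somewhere.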
RDF has been seriously threatened by the recent launch of Schema.org – a joint project of the Google, Bing and Yahoo search engines. This applies not only to the ignored RDFa syntax, but also to other aspects of RDF. They have used a model that is significantly simpler than the RDF model, and have used neither the RDF standards (RDF Schema, OWL) nor the proposed way of implementing non-information resources.
This should be a clear signal that something is seriously wrong. However, it seems that there is still no awareness of the crisis RDF (and the Semantic Web, in a wider sense) is going through. Too many things are taken for granted, and it seems that nobody is trying to solve the fundamental problems. The new RDF working group is doing just maintenance work. The standards are set and are gaining maturity, but they are based on a flawed concept.
Throughout its history, various things have been blamed for the slow adoption of RDF – RDF syntaxes, complex implementations, impractical solutions – but the core RDF model has been mostly left alone, perhaps because of the “RDF is just triples” illusion. The Semantic Web community is trying to solve the problems by fixing consequences instead of causes. Standardizing Skolemization is one example. The result of such an approach is even more complexity added to the stack. Exactly the opposite is needed. The challenge is to make a truly simple RDF model.
Don’t get me wrong. I don’t blame anyone. The community has put a lot of effort into developing the standards and overall has done a great job. However, RDF is in crisis; that’s just a fact. It demands a radical change, especially in the way of thinking. A crisis is not necessarily a bad thing – it can lead to big positive changes, even a revolution.