The Web is just a bunch of trees plus shortcuts

“Graph thinking” is one of the biggest conceptual hurdles in learning and understanding Linked Data and the RDF model, according to Rob Styles. Here, the term refers to the ability to think about data as a graph, a web, a network. Although people understand the concept of a graph, they are used to thinking about data from one point of view or another, and have difficulty when they need to “put themselves above the data”, i.e. imagine the graph as a whole.

Interestingly, this can be even harder for developers than for non-programmers:

Having worked with tables in the RDBMS for so long, many developers have adopted tables as their way of thinking about the problem. Even for those fluent in object-oriented design (a graph model) the practical implications of working with a graph of objects leads us to develop, predominantly, trees.

Similarly, it seems that most people understand that the Web is a huge graph consisting of web pages and the hyperlinks between them. However, the Web is “experienced” from the perspective of particular websites or pages (which are organized predominantly hierarchically), rather than from that of the Web graph as a whole.

For example, the typical navigation menu on a website contains a list of hyperlinks to internal web pages (the top-level menu). These represent the hierarchically organized “child” nodes of a tree that has the website as its root. External hyperlinks to other websites and pages, as well as internal (relative) hyperlinks that “skip” this hierarchy, break the tree structure and create a graph*.
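
The distinction can be made concrete with a small sketch. The page names, the `parent` mapping and the `classify_link` helper are all made up for illustration; the only point is that a link between a page and its direct parent or child follows the tree, while any other link is a shortcut that turns the tree into a graph.

```python
# Hypothetical site: each page maps to its parent in the menu hierarchy.
parent = {
    "/": None,
    "/products": "/",
    "/products/widgets": "/products",
    "/about": "/",
    "/about/team": "/about",
}

def classify_link(source, target):
    """Label a hyperlink as a tree link, a shortcut, or an external link."""
    if target not in parent:
        return "external"      # leaves the site altogether
    if parent[target] == source or parent[source] == target:
        return "tree"          # one level down or one level up
    return "shortcut"          # internal link that skips the hierarchy

links = [
    ("/", "/products"),
    ("/products/widgets", "/products"),
    ("/about/team", "/products/widgets"),
    ("/about", "http://example.org/"),
]
labels = [classify_link(source, target) for source, target in links]
print(labels)
```

Only the third and fourth links leave the hierarchy; without them the site would be a plain tree.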

[Figure: graph, tree, links]

Another “graph” that people seem to understand intuitively is a file system. File systems typically have directories (folders) and allow hierarchies in which directories may contain subdirectories. These trees are relatively easy to understand, but are somewhat limited when it comes to navigation. In a tree, you can go one level up, or one level down.

Fortunately, you’re not limited to this kind of “tree link”, but can “jump” to any part of the file system. You can do that thanks to shortcuts, and these are possible because every folder or file has a unique address: a path that can be easily manipulated. So when starting a program, you don’t have to go to the exact location of the executable file on the disk every time, but can instead click on the shortcut on the Desktop. Just as hyperlinks break the hierarchy of websites, shortcuts break the hierarchical structure of folders in a file system.
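
As a rough sketch of why paths make shortcuts possible, here is a toy in-memory file system. The folder names, the `shortcuts` table and the `resolve` function are invented for this example and do not model any real operating system; a shortcut is simply a stored path that points somewhere else in the tree.

```python
# Hypothetical file system: nested dicts are folders, strings are files.
fs = {
    "Program Files": {
        "Editor": {"editor.exe": "<binary>"},
    },
    "Users": {
        "alice": {"Desktop": {}},
    },
}

# A shortcut is just a stored path pointing elsewhere in the tree.
shortcuts = {
    "/Users/alice/Desktop/Editor": "/Program Files/Editor/editor.exe",
}

def resolve(path):
    """Follow a shortcut if there is one, then walk the tree by segments."""
    path = shortcuts.get(path, path)   # jump if the path names a shortcut
    node = fs
    for segment in path.strip("/").split("/"):
        node = node[segment]           # one level down per segment
    return node

target = resolve("/Users/alice/Desktop/Editor")
direct = resolve("/Program Files/Editor/editor.exe")
```

Both calls end at the same file: the shortcut works only because the executable has an address that can be written down and stored.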

It seems that a predominantly hierarchical view of a graph (plus “shortcut” links) is intuitively understood, and that this fact should be used to make the RDF model easier to understand.

Linked Data is a step in this direction. In the Linked Data context, resources are identified by HTTP URIs, and their descriptions (obtained by dereferencing the URIs) contain all the RDF triples in which a particular resource appears as the subject or the object. In short, the description contains the part of a graph in which one node becomes the “root” relative to the other nodes, which can be thought of as its child nodes. Again, RDF links break the tree structures, connecting these subgraphs (RDF molecules or data objects) into a single giant global graph.
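
A minimal sketch of such a description, assuming the graph is held as a plain list of (subject, predicate, object) tuples. The `ex:` identifiers and the `describe` and `neighbours` helpers are hypothetical, and no real Linked Data server or RDF library is involved.

```python
# Toy triples (subject, predicate, object); all names are made up.
triples = [
    ("ex:George", "rel:mother", "ex:Mary"),
    ("ex:Mary", "off:assistant", "ex:Tom"),
    ("ex:Tom", "con:home", "ex:TomsHome"),
    ("ex:Ann", "rel:knows", "ex:Mary"),
]

def describe(resource):
    """Return every triple that has the resource as subject or object."""
    return [t for t in triples if t[0] == resource or t[2] == resource]

def neighbours(resource):
    """Nodes at the other end of each triple in the description."""
    return sorted({s if o == resource else o for s, p, o in describe(resource)})

description = describe("ex:Mary")
children = neighbours("ex:Mary")
```

Seen from `ex:Mary`, the three neighbouring nodes look like children of a root, even though each of them is the root of its own description.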

However, the problem is that you can’t browse this Linked Data graph the way you browse the Web or your file system. You are not allowed to traverse the nodes “hidden” in the documents containing the descriptions; you must download and parse them. These bits of data don’t have addresses or paths that you can refer to or use for shortcuts.

When it comes to the Semantic Web and RDF, it seems that the idea of paths is primarily applied in the context of query languages. But what about paths as a part of the RDF model itself?

Tim Berners-Lee has written about them in the document “Shorthand: Paths and lists, and @keywords”:

Often it turns out that you need to refer to something indirectly through a string of properties, such as “George’s mother’s assistant’s home’s address’ zipcode”. This is traversal of the graph.

Such an indirect referencing can be expressed through a series of RDF triples chained with a number of blank nodes:

[is con:zipcode of [
    is con:address of [
        is con:home of [
            is off:assistant of [
                is rel:mother of :George]]]]]

The author then presents a more elegant notation: a shortcut inspired by the way methods and attributes are cascaded in an object-oriented language (dot notation), where “.” (dot) is used as a delimiter:

:George.rel:mother.off:assistant.con:home.con:address.con:zipcode

This is forward traversal of the graph, where with each “.” you move from something to its property. So ?x.con:mailbox is x’s mailbox, and in fact in english you can read the “.” as “ ’s”.

Let me repeat what I think is one of the most powerful and yet one of the most neglected ideas of the Semantic Web:

You move from something to its property.
?x.con:mailbox is x’s mailbox.
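
To illustrate forward traversal, here is a small sketch that resolves a dotted path against a list of triples. It is not an N3 parser: the node names, the zipcode value and the `follow` function are invented for this example, and each property is assumed to have exactly one value.

```python
# Toy triples (subject, property, value); names and values are invented.
triples = [
    ("George", "rel:mother", "Mary"),
    ("Mary", "off:assistant", "Tom"),
    ("Tom", "con:home", "TomsHome"),
    ("TomsHome", "con:address", "TomsAddress"),
    ("TomsAddress", "con:zipcode", "02139"),
]

def follow(start, path):
    """Resolve a dotted path such as 'rel:mother.off:assistant'."""
    node = start
    for prop in path.split("."):
        # Move from the current node to the value of its property.
        matches = [o for s, p, o in triples if s == node and p == prop]
        if len(matches) != 1:
            raise KeyError(f"{node} has {len(matches)} values for {prop}")
        node = matches[0]
    return node

zipcode = follow(
    "George",
    "rel:mother.off:assistant.con:home.con:address.con:zipcode",
)
```

Every prefix of the path is itself a valid reference: `follow("George", "rel:mother")` stops at George’s mother, from where the journey can continue one property at a time.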

In Linked Data, you don’t move from something to its property. You can only move from something to “something else”. Now, if you can move to the property, it means you can stop, rest a bit, and look around you. If you look behind, you’ll see a single node, the parent. And if you look ahead, you’ll see the child nodes, through which you can continue the journey, one node at a time. You are placed on a part of the global graph that has the form of a tree.

The statement “?x.con:mailbox is x’s mailbox” suggests that the “mailbox” relation is “instantiated”, materialized in the form of a distinct node that is dependent on its parent. That node has a dual nature, encoding both the relation and the node involved in the relation.

This approach is the one that fully respects the nature of a directed labeled graph. It’s elegant and provides flexibility in expression. It facilitates implementation of n-ary relations and encourages modular design. It allows deep, nested structures instead of flat ones. It uses indirect referencing, which is how people think and refer to things.

Finally, it indirectly acknowledges the hierarchical aspect of the RDF graph. It is quite similar to the structure of websites. This is the only approach that enables the realization of the Web of data, i.e. a proper projection of the RDF graph onto the Web graph.

So, how come such a powerful idea has never come to life? First, Tim presented this idea primarily as a syntax convention (sugar), failing to realize the full potential of his own words. Second, it relies on hated, URI-less, evil blank nodes. The only way to fix it is to somehow add URIs to these nodes. But isn’t the very absence of URI references what makes this approach possible?

It sounds almost like a paradox. On one hand you have paths without URIs, and on the other there are opaque URIs… containing no paths. There are two clear requirements – one from each side of the equation. Paths and URIs are both needed. Therefore, we have no other choice than to connect them.

And don’t forget: ?x.con:mailbox is x’s mailbox.

* Of course, a tree is already a (kind of) graph. Here, the term “graph” should be understood in the wider sense.
