What the Semantic Web can learn from JavaScript

The RDF 1.1 Primer starts with the following example:

<Bob> <is a> <person>.
<Bob> <is a friend of> <Alice>.
<Bob> <is born on> <the 4th of July 1990>.
<Bob> <is interested in> <the Mona Lisa>.
<the Mona Lisa> <was created by> <Leonardo da Vinci>.
<the video 'La Joconde à Washington'> <is about> <the Mona Lisa>.

It then shows a visualisation of the triples as a connected graph:

[Figure: Informal graph of the sample triples]

The problem with these two representations is that neither is easy to read: the triple-based notation hides the structure of the data, while the diagram shows the structure but has no order and is hard to follow. Bear in mind that in practice graphs are usually much bigger, with URIs used as identifiers.

There is another way to represent RDF. Bob, the Mona Lisa and La Joconde à Washington can be thought of as the roots of tree-like structures from which the arcs are directed:

<Bob>
  <is a> <person>;
  <is a friend of> <Alice>;
  <is born on> <the 4th of July 1990>;
  <is interested in> <the Mona Lisa>.

<the Mona Lisa>
  <was created by> <Leonardo da Vinci>.

<the video 'La Joconde à Washington'>
  <is about> <the Mona Lisa>.

The graph is now broken into three logical parts organized around their common subjects. This way, the representation of even big graphs is much more readable.
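The grouping step can be sketched in a few lines of JavaScript. This is only an illustration: the [subject, predicate, object] array layout and the groupBySubject helper are assumptions made for this sketch, not part of any RDF library.

```javascript
// The sample triples from the primer, as [subject, predicate, object] arrays.
const triples = [
  ["Bob", "is a", "person"],
  ["Bob", "is a friend of", "Alice"],
  ["Bob", "is born on", "the 4th of July 1990"],
  ["Bob", "is interested in", "the Mona Lisa"],
  ["the Mona Lisa", "was created by", "Leonardo da Vinci"],
  ["the video 'La Joconde à Washington'", "is about", "the Mona Lisa"]
];

// Hypothetical helper: collect the predicate/object pairs of each subject.
function groupBySubject(triples) {
  const groups = new Map();
  for (const [subject, predicate, object] of triples) {
    if (!groups.has(subject)) {
      groups.set(subject, []);
    }
    groups.get(subject).push([predicate, object]);
  }
  return groups;
}

const grouped = groupBySubject(triples);
console.log(grouped.size);              // 3
console.log(grouped.get("Bob").length); // 4
```

A Map keeps the subjects in the order they first appear, so the output follows the order of the original triples.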

However, if we want to represent this in one of the popular (tree-based) data formats, we run into problems. Take a look at this JSON:

{
  "Bob": {
    "is a": "person",
    "is a friend of": "Alice",
    "is born on": "the 4th of July 1990",
    "is interested in": "the Mona Lisa"
  },
  "the Mona Lisa": {
    "was created by": "Leonardo da Vinci"
  },
  "the video 'La Joconde à Washington'": {
    "is about": "the Mona Lisa"
  }
}

We can use only nested objects/arrays and primitive types (e.g. strings). In the tree data model, only parent-child connections are allowed. For instance, the value of the property "is interested in" is just the string "the Mona Lisa". The connection to the actual node (resource) is lost.
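A small sketch of what this means in practice, using a trimmed copy of the JSON above (the variable names are chosen for this illustration only):

```javascript
// A trimmed copy of the JSON document above.
const json = JSON.stringify({
  "Bob": { "is interested in": "the Mona Lisa" },
  "the Mona Lisa": { "was created by": "Leonardo da Vinci" }
});

const tree = JSON.parse(json);
const interest = tree["Bob"]["is interested in"];

console.log(typeof interest);                    // "string"
console.log(interest === tree["the Mona Lisa"]); // false

// To reach the node we have to know, by convention only, that the string
// happens to be a key of the top-level object, and look it up ourselves.
const node = tree[interest];
console.log(node["was created by"]);             // "Leonardo da Vinci"
```

Nothing in the format itself says that the string is a link; that knowledge lives in the code that reads it.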

However, in JavaScript, which inspired JSON in the first place, we are dealing with an object model that, like RDF, forms a connected graph. In JavaScript we can model the above example like this:

var data = {};

data["Alice"] = {};
data["person"] = {};
data["Leonardo da Vinci"] = {};

data["the Mona Lisa"] = {
  "was created by": data["Leonardo da Vinci"]
};

data["Bob"] = {
  "is a": data["person"],
  "is a friend of": data["Alice"],
  "is born on": "the 4th of July 1990",
  "is interested in": data["the Mona Lisa"]
};

data["the video 'La Joconde à Washington'"] = {
  "is about": data["the Mona Lisa"]
};

The representation is still tree-based, but we are now dealing with a real graph, in which values can be references that point to any other object (node). For instance, data["Bob"]["is interested in"] will return the Mona Lisa object:
>> Object { "was created by": {} }
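That these values really are references, and not copies, can be checked with strict equality. The sketch below rebuilds a trimmed version of the example under the name graph (an assumption of this sketch, to keep it self-contained) and also shows that a round trip through JSON turns the shared node into separate copies.

```javascript
// A trimmed version of the object graph above.
const graph = {};
graph["Leonardo da Vinci"] = {};
graph["the Mona Lisa"] = {
  "was created by": graph["Leonardo da Vinci"]
};
graph["Bob"] = {
  "is interested in": graph["the Mona Lisa"]
};
graph["the video 'La Joconde à Washington'"] = {
  "is about": graph["the Mona Lisa"]
};

// Both properties point to the very same object.
const viaBob = graph["Bob"]["is interested in"];
const viaVideo = graph["the video 'La Joconde à Washington'"]["is about"];
console.log(viaBob === viaVideo);                                      // true

// Links can be followed across several nodes.
console.log(viaBob["was created by"] === graph["Leonardo da Vinci"]);  // true

// Serialising to JSON and back copies the node wherever it is used.
const copy = JSON.parse(JSON.stringify(graph));
console.log(copy["Bob"]["is interested in"] === copy["the Mona Lisa"]); // false
```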

The problem is that the RDF model uses resources identified by URIs, not JavaScript objects. That is, the Mona Lisa and the video 'La Joconde à Washington' are identified by http://www.wikidata.org/entity/Q12418 and http://data.europeana.eu/item/04802/243FA8618938F4117025F17A8B813C5F9AA4D619 respectively.

How can we incorporate these into our simple example? Interestingly, in JavaScript we never bothered to make identifiers for nodes: we got IDs (paths) for free from the strings and the object structure. These IDs are dot/bracket notation paths, like data.Bob or data["the Mona Lisa"]["was created by"].

On the other hand, URIs also resemble paths, only using slashes (/) instead of dots and brackets. However, URIs are opaque, and although certain URI patterns are used as best practices, one shouldn’t generally infer the meaning of a resource from the characters in a URI.

How to bridge this gap? Let’s make the first step:

var wikidata = {
  "href": "http://www.wikidata.org/",
  "entity": {
    "Q12418": {
      "was created by": {
        "href": "http://dbpedia.org/resource/Leonardo_da_Vinci"
      }
    }
  }
};

var dbpedia = {
  "href": "http://dbpedia.org/",
  "resource": {
    "Leonardo_da_Vinci": {}
  }
};

var example = {
  "href": "http://example.org/",
  "data": {
    "Bob": {
      "is interested in": {
        "href": "http://www.wikidata.org/entity/Q12418"
      }
    }
  }
};

Now if you have wikidata["entity"]["Q12418"], you can get its URI easily: wikidata.href + ["entity", "Q12418"].join("/"), which will return:
>> http://www.wikidata.org/entity/Q12418
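The same construction can be wrapped in a small helper. This is a minimal sketch: the name toUri is hypothetical, and both the check that the path exists and the percent-encoding of segments are additions of this sketch rather than part of the one-liner above.

```javascript
// A trimmed copy of the wikidata tree from the example above.
const wikidataRoot = {
  href: "http://www.wikidata.org/",
  entity: {
    Q12418: {}
  }
};

// Hypothetical helper: build the URI of the node found at the given path.
function toUri(root, segments) {
  // Walk the path first, so that we only build URIs for nodes that exist.
  let node = root;
  for (const segment of segments) {
    if (node === null || typeof node !== "object" || !(segment in node)) {
      throw new Error("No such path: " + segments.join("/"));
    }
    node = node[segment];
  }
  // Assumption of this sketch: segments are percent-encoded before joining.
  return root.href + segments.map((s) => encodeURIComponent(s)).join("/");
}

console.log(toUri(wikidataRoot, ["entity", "Q12418"]));
// http://www.wikidata.org/entity/Q12418
```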

On the other hand, by dereferencing http://www.wikidata.org/entity/Q12418, you could get something like:

{
  "was created by": {
    "href": "http://dbpedia.org/resource/Leonardo_da_Vinci"
  }
}
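The reverse direction, from a URI back to a node, could be sketched as a lookup in trees that are already in memory. This only stands in for real dereferencing over HTTP; the resolve helper and the idea of keeping a list of known roots are assumptions of this sketch.

```javascript
// Two of the trees from the example above, trimmed to the parts used here.
const wikidataTree = {
  href: "http://www.wikidata.org/",
  entity: {
    Q12418: {
      "was created by": { href: "http://dbpedia.org/resource/Leonardo_da_Vinci" }
    }
  }
};
const dbpediaTree = {
  href: "http://dbpedia.org/",
  resource: { Leonardo_da_Vinci: {} }
};

// Hypothetical helper: map a URI back to a node in one of the known trees.
function resolve(uri, roots) {
  for (const root of roots) {
    if (!uri.startsWith(root.href)) continue;
    // Everything after the root's href is the slash-separated path.
    const segments = uri
      .slice(root.href.length)
      .split("/")
      .filter(Boolean)
      .map((s) => decodeURIComponent(s));
    let node = root;
    for (const segment of segments) {
      if (node === null || typeof node !== "object" ||
          !Object.prototype.hasOwnProperty.call(node, segment)) {
        return undefined; // the path does not exist in this tree
      }
      node = node[segment];
    }
    return node;
  }
  return undefined; // no known tree covers this URI
}

const roots = [wikidataTree, dbpediaTree];
const monaLisa = resolve("http://www.wikidata.org/entity/Q12418", roots);
const creator = resolve(monaLisa["was created by"].href, roots);
console.log(creator === dbpediaTree.resource.Leonardo_da_Vinci); // true
```

Following an href this way plays the role that a plain object reference played in the earlier JavaScript example, except that the link can now cross from one tree to another.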

Our new “Slash notation” URIs are not a sequence of random characters but a path, just like in dot notation. The path can be broken into segments that build up a hierarchy, or constructed back from the tree structure. This approach brings two important benefits: a more concise and idiomatic syntax, and unfriendly URIs hidden from developers’ eyes inside the structure itself.

In this post I deliberately simplified things and abstracted away many details. If you like the idea of “Slash notation” check out The “RDF graph” URI pattern.