Solving Linked Data problems with Hypernotation (DBpedia example)

In this post I will present a real use case showing what Hypernotation looks like in practice. I have chosen DBpedia, one of the most popular datasets in Linked Data, and published it using Hypernotation principles. It is published on hypernotation.org; the homepage contains some friendly examples, so I invite you to visit it as well.

I’m going to explain the basics of Hypernotation using some real-world examples. I’ll go through the four problems of Linked Data I discussed some time ago on this blog, and show how I attempted to solve them with Hypernotation.

1. Identity

The simple question of “What exactly is Linked Data?” is not easy to answer. The main concern here is whether or not RDF is required.

In Hypernotation, the data model is a directed, labeled graph whose nodes and links are identified by HTTP URIs. It can be regarded as a simplified and more consistent version of the RDF model.

For instance, let’s take a look at the hyperNode (a node in the Hypernotation graph) representing the area of Iceland (in km²):

http://hypernotation.org/data__/dbpedia/Iceland/dbpedia2__areaKm
>> 103001

The URI of this hyperNode is the path (or pathname) between the two nodes (http://hypernotation.org and http://hypernotation.org/data__/dbpedia/Iceland/dbpedia2__areaKm). Below the URI is the response returned by looking it up, containing its value.

In the RDF model this is expressed using a single triple:

<http://hypernotation.org/data__/dbpedia/Iceland>
  dbpedia2:areaKm
    "103001"^^<http://www.w3.org/2001/XMLSchema#int> .

In Hypernotation this is expressed with three triples:

<http://hypernotation.org/data__/dbpedia/Iceland>
  dbpedia2:areaKm
    <http://hypernotation.org/data__/dbpedia/Iceland/dbpedia2__areaKm> .

    <http://hypernotation.org/data__/dbpedia/Iceland/dbpedia2__areaKm>
      rdf:value "103001" ;
      datatype: <http://www.w3.org/2001/XMLSchema#int> .

The concept of Iceland’s area is now a distinct resource identified by a URI. Compared to classical RDF, in Hypernotation every node has a global address and becomes a first-class citizen of the Web. That is, each piece of data is easily linkable, shareable and bookmarkable.
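The expansion from one RDF triple to three Hypernotation triples can be sketched in a few lines of Python. This is only an illustration: the function name `to_hypernotation` and the tuple representation of triples are my own assumptions, not part of any Hypernotation tooling.

```python
XSD_INT = "http://www.w3.org/2001/XMLSchema#int"


def to_hypernotation(subject, curie, value, datatype):
    """Expand one literal-valued triple into the three triples shown above.

    The new node's URI is the subject URI plus the CURIE written as a
    path segment, with ':' replaced by '__'.
    """
    node = f"{subject}/{curie.replace(':', '__', 1)}"
    return [
        (subject, curie, node),         # the property now points at a node with its own URI
        (node, "rdf:value", value),     # the literal value hangs off that node
        (node, "datatype:", datatype),  # and so does its datatype
    ]


triples = to_hypernotation(
    "http://hypernotation.org/data__/dbpedia/Iceland",
    "dbpedia2:areaKm",
    "103001",
    XSD_INT,
)
for triple in triples:
    print(triple)
```

The point of the sketch is that the object of the first triple is itself an addressable URI, which is what makes the value linkable.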

2. Concept

Linked Data is largely determined by its level of data granularity. The trouble is that blank nodes cause this level to be too high. The basic building units are not triples, as one might logically assume, but ‘RDF molecules’, a concept that is not easy to understand or deal with in practice. By setting the model at such a high level of granularity, a good deal of flexibility is lost.

In Hypernotation, the basic unit is a triple, and there are no blank nodes. However, the ‘spirit’ of blank nodes is preserved thanks to identifying nodes with paths encoded in URIs. Blank nodes enable indirect referencing, one that is generally more natural and closer to how people express themselves.

Using paths ensures that each node has a unique global address, so a sort of compromise is made between humans and ‘machines’. Paths are what enables the chaining of triples, connecting these basic units into more complex, meaningful ‘sentences’.

For instance, check out the resource representing Iceland’s calling code:

<http://hypernotation.org/data__/dbpedia/Iceland/dbpedia2__callingCode>
>> 354

Again, a resource in Hypernotation (called a hyperNode) is determined by two dimensions: its path and the response it returns. Its path is a URL that is optimized to be triple-friendly.

Here, several triples are chained in the path, which can be understood as a question, or a query:

<http://hypernotation.org>
  data:
    <http://hypernotation.org/data__/dbpedia/Iceland> .

    <http://hypernotation.org/data__/dbpedia/Iceland>
      dbpedia2:callingCode
        <http://hypernotation.org/data__/dbpedia/Iceland/dbpedia2__callingCode> .

        <http://hypernotation.org/data__/dbpedia/Iceland/dbpedia2__callingCode>
          rdf:value
            ?value .

The data returned when the resource is looked up (in this case the literal “354”) is the answer to the question.
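To make the chaining concrete, here is a rough Python sketch that reads a hyperNode URL back into the triples encoded in its path. It assumes that any segment containing `__` is a CURIE (prefix and local name joined by a double underscore) and that every other segment is a key extending the current node. The function name `path_to_triples` is hypothetical, and inverse segments of the `is___…___of` form are not handled.

```python
from urllib.parse import urlsplit


def path_to_triples(url):
    """Decode the (subject, predicate, object) chain encoded in a hyperNode URL."""
    parts = urlsplit(url)
    subject = f"{parts.scheme}://{parts.netloc}"  # the chain starts at the website itself
    current = subject
    predicate = None
    triples = []
    for segment in filter(None, parts.path.split("/")):
        if "__" in segment:
            # A new predicate closes the triple that was being built.
            if predicate is not None:
                triples.append((subject, predicate, current))
                subject = current
            prefix, local = segment.split("__", 1)
            predicate = f"{prefix}:{local}"
        current = f"{current}/{segment}"
    if predicate is not None:
        triples.append((subject, predicate, current))
    return triples


chain = path_to_triples(
    "http://hypernotation.org/data__/dbpedia/Iceland/dbpedia2__callingCode"
)
for triple in chain:
    print(triple)
```

For the calling code URL this yields the first two triples of the chain; the third one (rdf:value) is answered by looking the URL up.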

How will a ‘machine’ know what dbpedia2:callingCode means? To figure that out, it needs the namespace URI mapped to dbpedia2. The mapping is described at the path website.com/prefix__, where prefix: (a CURIE) is the predicate of the triple:

<http://hypernotation.org>
  prefix:
    <http://hypernotation.org/prefix__/dbpedia2> .

    <http://hypernotation.org/prefix__/dbpedia2>
      owl:sameAs
        ?value .

http://hypernotation.org/prefix__/dbpedia2 returns http://dbpedia.org/property/. Therefore, the full URI of the property is http://dbpedia.org/property/callingCode; by looking it up, one can find out more about its meaning.
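The prefix resolution can be sketched as follows. To keep the example runnable offline, the prefix map is hard-coded with the one value quoted above instead of being fetched from the prefix__ location; the function names are illustrative assumptions.

```python
# Assumption: in practice this mapping would come from looking up
# <site>/prefix__/<prefix>; here it holds only the value quoted in the post.
PREFIXES = {"dbpedia2": "http://dbpedia.org/property/"}


def prefix_lookup_url(site, prefix):
    """Where the namespace URI for a prefix is published on a given website."""
    return f"{site.rstrip('/')}/prefix__/{prefix}"


def expand_curie(curie, prefixes):
    """Expand a CURIE such as 'dbpedia2:callingCode' into a full property URI."""
    prefix, local = curie.split(":", 1)
    return prefixes[prefix] + local


print(prefix_lookup_url("http://hypernotation.org", "dbpedia2"))
print(expand_curie("dbpedia2:callingCode", PREFIXES))
```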

3. Publishing

A lot has been said about the difficulties of publishing Linked Data. It seems that the main problem is that many of its requirements just don’t seem to be worth the effort. They are not well justified, at least from the point of view of an average developer.

Hypernotation, on the other hand, is similar to REST and follows the hypermedia ideas. It is based on the HTML(5) format, namely the (semantic) <a> and <li> HTML elements. The information about data structure is encoded in URLs, while data is formatted using HTML. A URL pattern and a few HTML elements are all that is needed to put an RDF graph on the Web.

The publisher is guided and has less freedom than in Linked Data. He knows where to put the data and what identifiers to use. Only a few naming conventions are used to ensure interoperability. Take the following hyperNode:

http://hypernotation.org/data__/dbpedia/Iceland/rdfs__comment/en

Here, there are two conventions: data__, as the default starting point for data, and en, as a segment that indicates the language in which the text is written. The rdfs__comment CURIE is also determined by a common prefix and a defined local name. Therefore, in this case a publisher needs to create only two segments: dbpedia – which is the same for all the resources belonging to that group, and Iceland, a unique key identifying the resource.

In practice, Iceland is the only segment a publisher has to create, meaning that the focus shifts from thinking about URIs to names, or IDs, which are friendlier to people.

Regarding the “sensitive bits” of Linked Data (information vs. non-information resources, HTTP-range, content negotiation, dealing with different syntaxes etc.), Hypernotation offers solutions for them, but in a way that doesn’t slap a prospective publisher in the face. These subjects will be discussed in detail in future posts.

4. Consuming

The consuming aspect of Linked Data is also problematic. When it comes to getting RDF data, there are two extremes: a primitive one and a highly sophisticated one. The former is the idea of resource lookups and graph traversal, while the latter, of course, refers to SPARQL endpoints.

But where is the middle ground? Linked Data alone doesn’t provide a meaningful compromise. It is inflexible and unable to evolve due to the inappropriate underlying model and the rigidity caused by the wrong level of granularity.

In Hypernotation, one doesn’t need to be prepared to parse a bunch of different formats when consuming data. Instead, one just looks up the data URL and gets the value. When needed, a minimum of (familiar) HTML syntax is used that can be easily parsed.
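As a rough illustration of how little parsing this requires, the following sketch uses only Python’s standard html.parser module to collect the links found inside list items. The sample markup is invented for the example, on the assumption that a node’s links are published as <a> elements inside <li> elements; the actual responses on hypernotation.org may be structured differently.

```python
from html.parser import HTMLParser

# Hypothetical response body; the real markup may differ.
SAMPLE = """
<ul>
  <li><a href="http://hypernotation.org/data__/dbpedia/Iceland/dbpedia2__areaKm">dbpedia2:areaKm</a></li>
  <li><a href="http://hypernotation.org/data__/dbpedia/Iceland/dbpedia2__callingCode">dbpedia2:callingCode</a></li>
</ul>
"""


class LinkCollector(HTMLParser):
    """Collect the href of every <a> that appears inside an <li>."""

    def __init__(self):
        super().__init__()
        self.in_item = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.in_item = True
        elif tag == "a" and self.in_item:
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_item = False


def extract_links(html):
    """Return the list of link targets found in list items, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


links = extract_links(SAMPLE)
for link in links:
    print(link)
```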

It is important that data is not just easily machine-parsable, but also easy for humans to consume. Hypernotation enables you to share the URL of the exact chunk of data you are interested in. The receiver will get readable results in his browser, together with the context of the data (encoded in the URL path) and the ability to interact with it simply by following the links.

Another benefit Hypernotation provides for the consumer is the use of predictable data locations, which makes data easier to find. Earlier in this post I described how the publisher is limited when it comes to minting URIs for data. The existing URIs (CURIEs) determine the contents of new URIs, while conventions are used when interoperability is needed.

However, the real benefit of this approach is on the consumer side. Imagine you deal with a lot of different data published on different websites. With Linked Data or a REST API you have no idea where the data is. You must go to each website separately, browse through it and find the location of the data. Each one does it differently and the process can’t be automated. In Hypernotation, if you know the website’s URL, you know where the data is.

For example, if you know the website’s homepage:

http://hypernotation.org

… where should you look for the published data? The answer:

http://hypernotation.org/data__

What vocabularies are used for describing it?

http://hypernotation.org/prefix__

Hypernotation also encourages the idea of ‘URL hacking’, i.e. guessing the URL based on the other URLs. Using familiar URL path segments, you can construct new logical URLs for the data you are looking for.

For instance, given that you know the following URI:

http://hypernotation.org/data__/dbpedia/Iceland/rdfs__comment/en

… what is the URI of Iceland’s rdfs:label in French? You guessed it:

http://hypernotation.org/data__/dbpedia/Iceland/rdfs__label/fr

And what is the homepage of Paris?

http://hypernotation.org/data__/dbpedia/Paris/foaf__homepage

Given that dbont:birthPlace is the property, give me the list of people born in Manchester.

http://hypernotation.org/data__/dbpedia/Manchester/is___dbont__birthPlace___of
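The guesses above follow a pattern regular enough to be written down as code. Here is a minimal sketch, assuming the layout used throughout this post (data__, then a group, then a key, then optional CURIE and language segments); the helper names are hypothetical.

```python
SITE = "http://hypernotation.org"


def curie_segment(curie):
    """Write a CURIE as a path segment: 'rdfs:label' -> 'rdfs__label'."""
    return curie.replace(":", "__", 1)


def inverse_segment(curie):
    """Segment for following a property backwards: 'is___<curie>___of'."""
    return f"is___{curie_segment(curie)}___of"


def resource_url(site, group, key, *tail):
    """Build a data URL from the default data__ location, a group and a key."""
    return "/".join([site.rstrip("/"), "data__", group, key, *tail])


label_fr = resource_url(SITE, "dbpedia", "Iceland", curie_segment("rdfs:label"), "fr")
homepage = resource_url(SITE, "dbpedia", "Paris", curie_segment("foaf:homepage"))
born_in = resource_url(SITE, "dbpedia", "Manchester", inverse_segment("dbont:birthPlace"))

for url in (label_fr, homepage, born_in):
    print(url)
```

Whether a guessed URL actually returns data still depends on what the publisher has made available; the sketch only constructs the candidate location.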

As you can see, the idea of default data locations in Hypernotation is very important. The cool part is that the locations of data can be guessed not just by humans, but by ‘machines’ as well, thanks to the fact that the meaning of the relations is well-defined.

Finally, it is important to acknowledge that the URI opacity axiom doesn’t work in the context of data, and that we need a ‘transparency axiom’ (more on that soon). Structured data forms a graph, and a graph is nearly useless without the ability to use paths. In order to use paths, we encode them in URIs, so the URIs must be transparent by definition.

I hope you find the DBpedia example interesting. Any questions and suggestions are welcome!