Problems of Linked Data (3/4): Publishing data

In the blog post What people find hard about Linked Data, Rob Styles covered the difficulties that people face when they first learn about publishing Linked Data. His analysis is based on the experience of teaching Linked Data to hundreds of people with different profiles and backgrounds. According to Rob, people find Linked Data hard to learn because of several steps along the way: certain things that are conceptually difficult to grasp.

One of these is understanding the difference between a URI and a URL:

First they [people on the course] have to recognise that they need different URIs for the document and the thing the document describes. It’s a leap to understand:

  • that they can just make these up
  • that no meaning should be inferred from the words in it (and yet best practice is to make them readable)
  • that they can say things about other peoples’ URIs (though those statements won’t be de-referencable)
  • that they can choose their own URIs and URI patterns to work to

The information/non-information resource distinction forms part of this difficulty too. While for naive cases this is easy to understand, how a non-information resource gets de-referenced and you get back a description of it is difficult.

Rob puts together HTTP, 303s and supporting custom URIs in a separate set of problems:

[...] Most web devs today will have had no reason to experience more than 200, 404 and 302 [HTTP status codes] — some will understand 401 if they’ve done some work with logins, but even then most of the framework will hide that for you.

So, the need to route requests to code using a mechanism other than filename in URL is something that, while simple, most people haven’t done before. Add into that the need to handle non-information resources, issue raw 303s and then handle the request for a very similar document URL and you have a bit of stuff that is out of the norm — and that looks complicated.
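The mechanics Rob describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `/id/<name>` and `/doc/<name>` URI pattern, the `DESCRIPTIONS` table and the `handle_get` function are all made up for this example and come from neither Rob's post nor any particular framework. A request for the thing's URI gets a raw 303 pointing at the very similar document URL, and only the document URL answers with 200.

```python
# Hypothetical URI layout (an assumption for this sketch):
#   /id/<name>   identifies the thing itself (a non-information resource)
#   /doc/<name>  identifies the document that describes the thing
DESCRIPTIONS = {
    "alice": "Alice is a person. This document describes her.",
}


def handle_get(path):
    """Return (status, headers, body) for a GET request to `path`."""
    prefix, _, name = path.lstrip("/").partition("/")
    if prefix not in ("id", "doc") or name not in DESCRIPTIONS:
        return 404, {}, "Not Found"
    if prefix == "id":
        # The thing cannot be sent over HTTP, so redirect to its description.
        return 303, {"Location": f"/doc/{name}"}, ""
    # The document is an ordinary web page and is served with 200.
    return 200, {"Content-Type": "text/plain; charset=utf-8"}, DESCRIPTIONS[name]
```

Note that routing here depends on the URI pattern rather than on a filename, which is exactly the part most people have not had to write before.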

When Richard Cyganiak asked about the Impractical features of the RDF stack, the highest-voted answer by far, from Ed Summers, pointed to the difficulty of telling information and non-information resources apart:

As a software developer the worst thing about Linked Data for me is trying to decide if something is an Information Resource or not… and minting identifiers and defining server side behavior accordingly. httpRange-14 is dead, long live httpRange-14! I personally have come to prefer REST’s laissez-faire approach to the nature of resources. URLs identify Resources. Resources can be anything. When you resolve a URL you get a Representation back. Does it really have to be more complicated than that?

According to the httpRange-14 resolution, a URI that identifies a non-information resource should not answer an HTTP request with “200 OK”; it should instead redirect with “303 See Other” to a URL where the resource is described. Many large websites, including Google, Yahoo, Bing, Facebook, the New York Times and Freebase, violate httpRange-14, which sends a clear message about its impracticality. Even within the Linked Data community the idea lacks strong support, and the topic remains controversial and frequently debated.
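Seen from the client side, the resolution boils down to a rule about what a status code tells you about the requested URI. The Python sketch below paraphrases that rule as I understand it; the function name `classify` and the wording of the returned labels are my own and not part of the resolution.

```python
def classify(status):
    """Say what a GET response status implies about the requested URI,
    following the three cases of the httpRange-14 resolution."""
    if 200 <= status < 300:
        # A 2xx answer means the URI identifies an information resource.
        return "information resource"
    if status == 303:
        # A 303 See Other means the URI could identify anything at all.
        return "any resource"
    if 400 <= status < 500:
        # A 4xx error leaves the nature of the resource unknown.
        return "unknown"
    # Other status codes are not addressed by the resolution.
    return "not covered"
```

A site that answers “200 OK” on the URI of a person or a place is therefore, by this reading, claiming that the person or place is a document.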

Content negotiation is another aspect that contributes to the complexity of Linked Data. In Frequently Observed Problems on the Web of Data, an entire chapter is dedicated to common mistakes in how documents are served on the Web, with particular reference to HTTP-related issues. It covers a significant number of errors: incorrect Content-Type, content negotiation problems, incorrect interpretation of the Accept header, a missing Vary header, and problems with caching.
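To show how many small decisions hide behind those errors, here is a simplified Python sketch of server-side content negotiation. Everything in it is an assumption made for illustration: the `SUPPORTED` list and the function names are hypothetical, and the matching ignores parts of the HTTP rules (for example, that a more specific media range should take precedence over a wildcard with the same quality value).

```python
# Media types this hypothetical server can produce, in order of preference.
SUPPORTED = ["text/html", "text/turtle", "application/rdf+xml"]


def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        ranges.append((media, q))
    # sorted() is stable, so equal q values keep their header order.
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def matches(media_range, media_type):
    """Check a media range such as text/* or */* against a concrete type."""
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.startswith(media_range[:-1])
    return media_range == media_type


def negotiate(accept_header, supported=SUPPORTED):
    """Pick a supported media type, or None if nothing is acceptable."""
    for media_range, q in parse_accept(accept_header or "*/*"):
        if q <= 0:
            continue
        for media_type in supported:
            if matches(media_range, media_type):
                return media_type
    return None


def response_headers(accept_header):
    """Return (status, headers); Vary tells caches the answer depends on Accept."""
    chosen = negotiate(accept_header)
    if chosen is None:
        return 406, {"Vary": "Accept"}
    return 200, {"Content-Type": chosen, "Vary": "Accept"}
```

For instance, `negotiate("text/html;q=0.2, text/turtle")` picks `text/turtle`, because the browser-style type was given a lower quality value. Forgetting the `Vary` header is the kind of slip that lets a cache hand the HTML version to a client that asked for RDF.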

Publishing Linked Data is often perceived as unduly difficult, which demotivates people interested in publishing data. An average potential publisher has been “spoiled” by much simpler solutions on the Web. She is used to getting quick explanations and learning from 5-minute tutorials. When it comes to Linked Data, you need 5 minutes just to (try to) explain the difference between information and non-information resources. People have no option other than to learn how to publish Linked Data from 100-page books and 3-hour lectures. It seems that Linked Data cannot be explained in less time, and that is what we should worry about.

It looks like the (selfish & lazy) nature of an average internet user is not well understood or exploited in Linked Data. For instance, explaining why the world will be a better place if one includes links to other things is not what motivates people. I don’t create hyperlinks on this blog because I will help in “connecting data islands into a global, interconnected data space”. I do it because the links add value to my blog and help my readers by providing context. The mere idea of telling people to make links in 2011 is just wrong. If one thing comes naturally on the Web (whether it is the Web of documents or the Web of data), it’s linking.

Linked Data tries to follow the principles of the original Web, but instead of focusing on the most important one, simplicity, it insists on implementing various relatively complex and geeky technologies of the Web architecture. One can argue that none of these technologies is individually that hard to understand and implement, but taken together they make publishing Linked Data complex, esoteric and different from what people are used to on the Web.
