Problems of the RDF model: Literals

Literals are nodes in an RDF graph, used to identify values such as numbers and dates by means of a lexical representation. Literals may be plain or typed:

  • A plain literal is a string combined with an optional language tag. It is considered to denote itself, so has a fixed meaning.
  • A typed literal is a string combined with a datatype URI. It denotes the member of the identified datatype’s value space obtained by applying the lexical-to-value mapping to the literal string. For instance, the typed literals <xsd:boolean, “true”> and <xsd:boolean, “1”> both denote the logical value TRUE: the strings “true” and “1” are mapped to that value by the lexical-to-value mapping defined in XML Schema.
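
The lexical-to-value mapping for xsd:boolean can be sketched in a few lines of Python. This is only an illustration: the function name boolean_value is made up here, and the sketch ignores whitespace normalisation. The four accepted lexical forms are the ones XML Schema defines for boolean.

```python
# Illustrative sketch of the xsd:boolean lexical-to-value mapping.
# The name boolean_value is hypothetical, not part of any RDF library.
XSD_BOOLEAN_LEXICAL = {"true": True, "1": True, "false": False, "0": False}


def boolean_value(lexical: str) -> bool:
    """Map a lexical form to the value it denotes, or raise if ill-formed."""
    try:
        return XSD_BOOLEAN_LEXICAL[lexical]
    except KeyError:
        raise ValueError(f"not a valid xsd:boolean lexical form: {lexical!r}")


# Two different lexical forms denote the same value.
same_value = boolean_value("true") == boolean_value("1")
```

Two typed literals with different strings can therefore denote one and the same value, which is not the case for plain literals.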

Problems regarding literals can be divided into two categories:

  • problems with literals in general
  • problems associated with typed literals

Problems with literals in general

A literal represents a value and thus differs from the other nodes (HTTP references and blank nodes) that represent a resource. However, the boundary between the concept of a resource and the concept of a value is thin:

Anything represented by a literal could also be represented by a URI, but it is often more convenient or intuitive to use literals.

For example, it’s easier to use the literal “7” (or “7”^^xsd:integer) than, for instance, the URI http://dbpedia.org/resource/7_(number). Literals are usually abstract values, and describing them is in most cases neither necessary nor practical. For instance, the fact that seven is an odd number is irrelevant in the context of the description of one’s age. Literals are end nodes in an RDF graph that don’t branch out. They cannot be subjects in RDF triples; they are always objects used to describe a resource.

To better understand literals, one can make an analogy between the RDF model and the object-oriented (OO) model. In this context, URI references would represent objects, while a literal would be the value of an object property. The properties that have literals as values usually belong to the so-called primitive (basic) datatype, which, unlike the object, is represented by a single value.

However, there are significant differences between these two models. First, in the OO model a literal is assigned to an object’s property, not to the object itself. In the RDF model, a literal is directly connected to a URI reference, forming a literal triple. Therefore, there is no “property” of a URI reference as a separate concept, that is, no mediator between a URI reference and a literal. The meaning of this property is contained in the predicate of the triple. Outside the triple, that information is lost and a literal is just data without any meaning.

Another difference is the way of identification. In the OO model, a literal is not identified by its value. Roughly speaking, one can say that a literal is “identified” as the value of an object’s property (which is uniquely identified in an object hierarchy). On the other hand, in the context of the RDF model, a literal uses its own method of identification, while the resource whose value it represents practically doesn’t exist.

A literal is identified by the value it represents (or by that value together with a language tag or a datatype URI, where one is present). This method of identification differs from the method of identification of URI references. A literal doesn’t have a URI, meaning that in the Web context it cannot be realised as a Web resource: it cannot be looked up or referred to.

Let’s look at the example of the literal triple describing the nickname of a person:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://carlosraynorris.com/data_/carlos> foaf:nick "Chuck" .

What is Carlos’ nickname? In everyday language, we would say “Chuck”. However, the precise answer is that “Chuck” is the value of his nickname. In the RDF model, a plain literal is interpreted as being of type “xsd:string”, therefore “Chuck” is a five-character string. The missing concept is the concept of this particular person’s nickname whose value is “Chuck”. This concept is a resource that can be represented by a new node, either an HTTP reference or a blank node:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<http://carlosraynorris.com/data_/carlos> foaf:nick [ rdf:value "Chuck" ] .

In this example, the blank node represents the instance of the class “Carlos’ nickname”, and its value “Chuck” is the literal realized with the property “rdf:value”. “Carlos’ nickname” is a permanent concept, while its instances may change over time, or there may be more of them. This new node is important because it contains meaning which previously required a separate RDF triple. Or, in plain English: “Carlos’ nickname” instead of the sentence “Carlos has a nickname.”
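
The difference between the two forms can be made concrete with a small Python model of triples. This is a minimal sketch, not an RDF library: the classes URIRef, BlankNode and Literal, the helper add_triple and the blank node label “nick1” are all names invented for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URIRef:
    uri: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str


def add_triple(graph, subject, predicate, obj):
    """Append a triple, enforcing that a literal is never a subject."""
    if isinstance(subject, Literal):
        raise TypeError("a literal cannot be the subject of a triple")
    graph.append((subject, predicate, obj))


CARLOS = URIRef("http://carlosraynorris.com/data_/carlos")
FOAF_NICK = URIRef("http://xmlns.com/foaf/0.1/nick")
RDF_VALUE = URIRef("http://www.w3.org/1999/02/22-rdf-syntax-ns#value")

# Direct form: the literal is attached straight to the person.
direct = []
add_triple(direct, CARLOS, FOAF_NICK, Literal("Chuck"))

# Indirect form: a blank node stands for "Carlos' nickname",
# and the literal becomes the value of that node.
indirect = []
nickname = BlankNode("nick1")
add_triple(indirect, CARLOS, FOAF_NICK, nickname)
add_triple(indirect, nickname, RDF_VALUE, Literal("Chuck"))
```

In the direct form nothing more can be said about the nickname, because add_triple rejects a Literal as subject. In the indirect form the node nickname is an ordinary subject and can carry further triples.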

[Image: An angry literal]

The existence of this node specifies more clearly the role of the literal, which becomes the node’s value, realized through the “rdf:value” property. This kind of “internode” doesn’t exist in the RDF model in general, so literals are practically values of non-existent concepts. That missing concept is the glue that connects a URI reference to a literal. It represents, so to speak, a kind of primitive-type variable whose value is a literal. This variable is conflated with its value (the literal), which is the key problem of literals in the RDF model.

Problems with typed literals

A typed literal is a literal containing more than one value: in addition to the data itself, it contains its datatype. In this way it deviates from the key idea of the RDF model, where all information is broken down to the level of atomized data represented by nodes and described by RDF triples.

Typed literals are a special case in the RDF model and require a special syntax. Here is an example of such a literal, denoting a floating-point number in Turtle notation:

@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<#somePlace> geo:lat "45.45385"^^xsd:float .

The typed literal represents data (45.45385) and the datatype identified by URI (http://www.w3.org/2001/XMLSchema#float). There is an implicit relation “rdf:type” between these two values, so practically the whole triple is contained in a single node.
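
One way to picture the triple hidden inside the node is to spell it out. The Python sketch below is only an illustration of that reading: RDF itself does not define such an expansion, and the names TypedLiteral and unpack and the node label “_:lat” are invented for the example.

```python
from dataclasses import dataclass

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD_FLOAT = "http://www.w3.org/2001/XMLSchema#float"


@dataclass(frozen=True)
class TypedLiteral:
    lexical: str   # the data, e.g. "45.45385"
    datatype: str  # the datatype URI


def unpack(literal, node_label="_:lat"):
    """Spell out the two facts packed into one typed-literal node."""
    return [
        (node_label, RDF + "value", literal.lexical),
        (node_label, RDF + "type", literal.datatype),
    ]


latitude = TypedLiteral("45.45385", XSD_FLOAT)
triples = unpack(latitude)
```

The single node carries both the value and a type statement, which elsewhere in RDF would each take a triple of their own.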

Typed literals are a frequent source of errors and confusion in practice. In the document “Frequently observed problems on the Web of data” by the Pedantic Web Group, an entire chapter is dedicated to typed literals.

[...] However, the definition of these datatypes, and their interpretation in RDF, is replete with gotchas — even the Web-standards boffins sometimes disagree on how datatypes should be defined and handled.

Authors then distinguish two types of problems:

  • Malformed datatype literals
  • Incompatibility with range datatype

The first problem arises because many datatype classes have a complex lexical representation; those related to the “dateTime” class are a particularly frequent source of errors.
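
To give a feeling for how easily such a literal goes wrong, here is a rough Python check for the lexical form of xsd:dateTime. It is a simplified sketch under stated assumptions: the function name is_valid_datetime is invented, and the pattern handles only four-digit positive years and rejects the 24:00:00 form that XML Schema allows.

```python
import re
from datetime import datetime

# Simplified xsd:dateTime pattern: YYYY-MM-DDThh:mm:ss, optional fraction,
# optional timezone (Z or +hh:mm / -hh:mm).
_DATETIME = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})(\.\d+)?"
    r"(Z|[+-]\d{2}:\d{2})?"
)


def is_valid_datetime(lexical: str) -> bool:
    """Return True if the string looks like a well-formed xsd:dateTime."""
    match = _DATETIME.fullmatch(lexical)
    if match is None:
        return False
    year, month, day, hour, minute, second = (
        int(group) for group in match.groups()[:6]
    )
    try:
        # Let the standard library reject impossible dates such as 30 February.
        datetime(year, month, day, hour, minute, second)
    except ValueError:
        return False
    return True
```

A string such as “2011-07-19 10:30:00”, with a space instead of the “T”, looks reasonable to a human reader but fails this check, and so does “2011-02-30T10:30:00”.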

The second problem refers to the fact that the range of a property can be a datatype class. In the previous example, the property “geo:lat” is part of the “geo” ontology, in which it is described. In this ontology, the property “rdfs:range” can be used to indicate the datatype of the values of the “geo:lat” property as follows:

geo:lat rdfs:range xsd:float .

Similarly, in many programming languages, when declaring a variable, its datatype is also defined, for example:

float lat;

This variable is then assigned a value:

lat = 45.45385;

In RDF, however, this is not possible, because a literal used without a datatype (or a language tag) can be considered equivalent to a literal of the datatype class “xsd:string”. As a literal can belong to only one datatype, it cannot be a string and a float at the same time. Therefore, despite the possibility of describing the range, i.e. making a sort of “declaration” of a property, the type of a literal must always be stated explicitly and a plain literal cannot be used.
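
The mismatch can be sketched as a simple range check in Python. The sketch assumes the range declaration from the example above (geo:lat with range xsd:float), and the names RANGES, effective_datatype and matches_range are invented for the illustration. It compares datatype URIs for exact equality and ignores datatype derivation.

```python
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"
GEO_LAT = "http://www.w3.org/2003/01/geo/wgs84_pos#lat"

# Declared ranges, mirroring: geo:lat rdfs:range xsd:float .
RANGES = {GEO_LAT: XSD + "float"}


def effective_datatype(datatype: Optional[str]) -> str:
    """A literal without a datatype is treated as xsd:string here."""
    return datatype if datatype is not None else XSD + "string"


def matches_range(predicate: str, datatype: Optional[str]) -> bool:
    """Check a literal's datatype against the declared range, if any."""
    declared = RANGES.get(predicate)
    return declared is None or effective_datatype(datatype) == declared


typed_ok = matches_range(GEO_LAT, XSD + "float")  # "45.45385"^^xsd:float
plain_ok = matches_range(GEO_LAT, None)           # "45.45385" as plain literal
```

Here typed_ok is True and plain_ok is False: the declared range does not lend its datatype to the plain literal, so the type has to be repeated on every value.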

Finally, literals with a language tag should be mentioned as another special case in the RDF model. These nodes, like typed literals, contain two values, but instead of a datatype URI, a language tag is used to indicate the language of the contained text. Literals with language tags are in practice implemented using yet another special syntax. This solution is even more in conflict with the principles of RDF because, compared to the typed literal, it doesn’t even rely on a URI, but on a syntax convention.

