Two types of links on the Web

In the last post I discussed the hierarchical aspect of the Web, suggesting that there are two types of links on the Web: tree links and graph links.

Tree links

The Web consists of web sites, which typically have a tree structure, i.e. one built from parent-child links between the levels of a hierarchy. A homepage is the root, connected to the top-level web pages, which are connected to the next level, and so on. Let’s call links of this kind „tree“ links. Such links can be implemented in a website in two ways: implicitly and explicitly.

Implicit tree links

Take, for instance, the following two web pages, assuming that a product with the ID „456“ belongs to a category identified by „123“.

http://website.com/?categoryID=123

http://website.com/?productID=456

These two web pages form a parent-child (category-product) relation, even though that’s not clear from the structure of the URLs. The URLs give no hint about how the two pages are related, so we have no idea what their relation is until we look them up.

Explicit tree links

Another method is the one using explicit tree links:

http://website.com/categories/123

http://website.com/categories/123/products/456

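Compared to implicit tree links, the hierarchical structure is clear from the URLs. The difference between the source (parent) and the target (child) URLs is the edge between the nodes in the tree. This difference (products/456) can be implemented as a relative hyperlink from the source to its immediate child, provided the source URL ends with a slash (http://website.com/categories/123/); otherwise the relative reference replaces the last segment of the source instead of extending it: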

<a href="products/456">products</a>
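
How such a relative reference resolves can be checked with a few lines of code. The following is only a minimal sketch, using Python's standard urllib.parse.urljoin (the choice of Python is an assumption made for illustration, not something the website itself depends on). It resolves the same href against the parent URL with and without a trailing slash:

```python
from urllib.parse import urljoin

href = "products/456"

# Parent URL with a trailing slash: the reference extends the parent path.
with_slash = urljoin("http://website.com/categories/123/", href)

# Parent URL without a trailing slash: the last segment ("123") is replaced.
without_slash = urljoin("http://website.com/categories/123", href)

print(with_slash)     # http://website.com/categories/123/products/456
print(without_slash)  # http://website.com/categories/products/456
```

Only the first form lands on the intended child, which is why the trailing slash on the parent matters when tree links are written as relative hyperlinks.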

Take another example:

http://website.com/guitars/electric-guitars

Here, the tree structure is clear as well:

http://website.com

http://website.com/guitars

http://website.com/guitars/electric-guitars

Each additional child’s URL contains an extra segment that distinguishes it from its parent. The difference between the URLs is encoded explicitly, as part of the target web page’s URL.
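
Because every level adds one segment, the chain from the root down to any page can be rebuilt from the URL alone. Here is a rough sketch of that idea in Python; the function name tree_path is hypothetical, and the sketch assumes that every path segment corresponds to one level of the hierarchy, which a real website may not guarantee:

```python
from urllib.parse import urlsplit

def tree_path(url: str) -> list[str]:
    """Return the URLs from the site root down to `url`, one per path segment."""
    parts = urlsplit(url)
    current = f"{parts.scheme}://{parts.netloc}"
    chain = [current]
    # Skip empty segments produced by leading or trailing slashes.
    for segment in filter(None, parts.path.split("/")):
        current = f"{current}/{segment}"
        chain.append(current)
    return chain

for node in tree_path("http://website.com/guitars/electric-guitars"):
    print(node)
```

Run on the example above, it prints the same three URLs, from the homepage down to the electric guitars page.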

Paths

In the last post I touched on the concept of paths in a graph. Paths allow one to refer to something indirectly, by traversing from a thing to its property. We can make Tim Berners-Lee’s example „?x.con:mailbox is x’s mailbox“ more general: x/property is x’s property, where x is the source node, which can be a website or any other node in the hierarchy [*].

Therefore,

<http://website.com/property> is <http://website.com>’s property

meaning that in

<a href="property">this property</a>

placed in http://website.com, we know not just the direction of the link, but also its name, encoded as the value of the href attribute.

In the context of a hierarchy, this property is limited to „has child“, or „is parent of“. However, as we see in the example

http://website.com/guitars/electric-guitars

the properties „guitars“ and „electric-guitars“ represent richer relations between the web pages, which people understand intuitively.

Explicit tree links therefore connect web pages having URLs in the form of paths. They have the following features:

  • The URL of a target web page contains the URL of the source web page
  • The target web page is the source’s property, and is dependent on it
  • The difference between the URLs encodes the name of the relation between the web pages

For example:

<http://website.com/guitars> links to <http://website.com/guitars/electric-guitars>

  • The URL of the source web page is obviously contained in the URL of the target web page (http://website.com/guitars/electric-guitars)
  • The target node is the source’s property „electric-guitars“. If http://website.com/guitars is deleted, the child is deleted too, so it’s existentially dependent on it.
  • The difference between URLs is „electric-guitars“, representing the relation between the web pages

The third point is the key. It tells us that tree links can have a name (or a type) without encoding a rel attribute in the <a> tag of the web page that links to the target page. This name denotes a relation between two nodes in a graph (tree), using nothing but a fundamental technology of the Web architecture – the URI.

In the hierarchical context, it has the meaning „has child“ (or „is parent of“). Not much, but the idea that the meaning of a relation between nodes is encoded in the path is exciting. It suggests that the meaning of other types of relations can be encoded as well, using the same principle.

The problem is that these relations are human-readable only. A machine has no clue what the string „guitars“ represents. But if we use a URI instead of a string, we can encode the explicit relation (predicate).
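
The name of an explicit tree link, as described by the three features above, can be read off mechanically as the difference between two URLs, even though the result is still just a string to the machine. A minimal sketch in Python follows; tree_link_name is a hypothetical helper, and comparing whole path segments (rather than raw string prefixes) is my own assumption about what „contains“ should mean:

```python
from typing import Optional
from urllib.parse import urlsplit

def tree_link_name(source: str, target: str) -> Optional[str]:
    """Return the URL difference that names a tree link, or None if there is none."""
    s, t = urlsplit(source), urlsplit(target)
    if (s.scheme, s.netloc) != (t.scheme, t.netloc):
        return None  # different sites: the source URL cannot be contained in the target
    s_segments = [seg for seg in s.path.split("/") if seg]
    t_segments = [seg for seg in t.path.split("/") if seg]
    # The target path must start with the whole source path and go further.
    if len(t_segments) > len(s_segments) and t_segments[:len(s_segments)] == s_segments:
        return "/".join(t_segments[len(s_segments):])
    return None

print(tree_link_name("http://website.com/guitars",
                     "http://website.com/guitars/electric-guitars"))
print(tree_link_name("http://website.com/categories/123",
                     "http://website.com/categories/123/products/456"))
```

The first call yields „electric-guitars“ and the second „products/456“. Working on segments means that a page such as http://website.com/guit is not mistaken for the parent of http://website.com/guitars/electric-guitars.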

A short history of tree links

Historically, explicit tree links (and paths) were common in the early phase of the Web, when static files were published and URLs simply reflected the directories on the server. Soon, dynamic pages with query string URLs emerged and the relations between web pages were no longer clear from their URLs.

Finally, Web 2.0 popularized URL rewriting, making the actual locations of files on the server irrelevant. At the same time, URLs based on friendly patterns, often encoding the hierarchical relations between web pages, have been gaining popularity again.

One of the important axioms of the Web architecture is the opacity axiom. It’s a rule requiring URLs to be opaque, i.e. treated as compact identifiers/addresses that don’t encode any data that could indicate the nature of the resource behind them. As a result, the value of paths went unrecognized, and it was not until the popularization of friendly and „hackable“ URLs that the rule was somewhat revised and a level of transparency allowed in certain situations.

The opacity axiom makes sense in the global Web context where, in general, one can’t rely on the string value of the URL (URI). However, the power of paths and indirect referencing, especially their ability to give a link a name, can’t be ignored any more. If explicitly identified and defined using URIs, these link names open up a completely new paradigm, which will require different rules. In that context, URIs are not just transparent but machine-readable, leaving the opacity axiom deprecated.

Graph links

The second type consists of links that are not limited by the hierarchical order, but enable teleporting to any web page on the Web. These links are the „real“ hyperlinks, which can also be called „graph“ links, to differentiate them from tree links.

Compared to explicit tree links, hyperlinks have the following features:

  • The URL of a target web page doesn’t contain the URL of the source web page
  • A target web page is not the property of the source web page and is not in any way dependent on it
  • They always hold the same meaning

For example:

<http://website.com/guitars> links to <http://anotherwebsite.com/guitar-parts>

  • The URL of the source web page is obviously not contained in the target’s URL
  • http://anotherwebsite.com/guitar-parts is not dependent on http://website.com/guitars. If http://website.com/guitars disappears, that doesn’t affect the existence of the target.
  • Because the URL of the source is not contained in the target URL, there is no difference between URLs, so the type of relation between them can’t be encoded.
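
The contrast between the two kinds of links can be turned into a simple test applied to the links found on a page. What follows is only a sketch under my own assumptions: the helper names are made up, the source URL is written with a trailing slash so that relative references extend it, and the „/drums“ link is a hypothetical same-site link that I count as a graph link because it does not descend from the source:

```python
from urllib.parse import urljoin, urlsplit

def path_segments(url: str) -> list[str]:
    """Split the path of a URL into its non-empty segments."""
    return [seg for seg in urlsplit(url).path.split("/") if seg]

def is_tree_link(source: str, target: str) -> bool:
    """True if the target URL contains the source URL and extends its path."""
    s, t = urlsplit(source), urlsplit(target)
    same_site = (s.scheme, s.netloc) == (t.scheme, t.netloc)
    src, dst = path_segments(source), path_segments(target)
    return same_site and len(dst) > len(src) and dst[:len(src)] == src

source = "http://website.com/guitars/"
hrefs = ["electric-guitars", "http://anotherwebsite.com/guitar-parts", "/drums"]

kinds = {}
for href in hrefs:
    target = urljoin(source, href)  # resolve relative references first
    kinds[target] = "tree" if is_tree_link(source, target) else "graph"

for target, kind in kinds.items():
    print(kind, target)
```

Only the first link is classified as a tree link; the other two carry no URL difference relative to the source, so nothing but their direction is left.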

Although graph links are capable of holding a meaning, this meaning is always the same. It just says: this web page points to another. Maybe it can be defined more specifically, using the verb „mention“ instead of „point“. But whatever common meaning people agree upon, it’s the same in all possible contexts.

The graph links, or hyperlinks, are directed edges of the Web graph, so it’s quite natural for them to hold only the information about the direction. The fact that one web page points to (or mentions) another tells us about the one-way direction of the hyperlink between them and nothing more.

Therefore, hyperlinks are the type of links on the Web that inherently can’t hold any information other than the direction. That’s why any effort to add a name to a hyperlink is ultimately doomed to fail. A hyperlink is just not intended for that purpose.

One example is using HTML tags and rel attributes to try to describe hyperlinks. Another is RDF links, which use predicates with explicitly described meaning and can be encoded in a number of special syntaxes. The problem is that both approaches fail to respect the fact that hyperlinks inherently can’t contain any information other than the direction.

If we want to make the Semantic Web (as an extension to the Web) a reality, first we have to fully understand the Web. And we can’t understand the Web without understanding its fundamental elements – links. There are two types of links – tree links and hyperlinks. The only way for the Web to evolve into the Semantic Web is to recognize that explicit tree links are the only ones able to encode names, and to use them together with hyperlinks, respecting the distinct but powerful nature of both.

[*] I am using the terms property and relation somewhat interchangeably here. In a future post I’ll cover the difference between the two in the context of paths in more detail.
