Saturday, April 21, 2012

Stability and Fragility of Namespaces

While working on a blog post which will soon be published (and linked to) on CITYTECH, Inc's site, I mentally ran across the subject of updating a namespace definition within a domain of data.  More concretely, I was considering why Apache Jackrabbit does not allow updates (or unregistrations for that matter) to namespaces once they are established within a given Repository.  It seemed to me initially that allowing changes to namespaces would be valuable, for example, as new versions of an ontology were published.  Considering the matter further however I began to realize how dangerous such a practice would be.

Consider the following scenario.  Let us say that I told you that I bought a new shirt, the color of which was blue.  However, instead of saying that, I said "I bought a new shirt, tar klonor blue."  You would look quizically at me and perhaps question your hearing of my statement, because what I didn't tell you was that I had contrived a new statement, "tar klonor" which meant "having the color".

This example is somewhat absurd in and of itself but it is essentially what would happen to a machine's ability to understand linked-data statements if a namespace were changed in the domain of the data being represented.

Consider now a more concrete example.  Let us say that I have created a food ontology identified by the URI  Now let us say that I have two documents containing food information.  I present these documents in listing 1 and listing 2 respectively.

@prefix food: <> .
@prefix ex: <> .

ex:americanCheese a food:Cheese .
ex:lambChop a food:Meat .
ex:apple a food:Fruit .
ex:provalone a food:Cheese .

Listing 1

@prefix food: <> .
@prefix me: <> .

me:camembert a food:Cheese . 
me:brusselSprouts a food:Vegetable .

Listing 2

If I were to search over this dataset for all resources which are, I would find three things.  Now, let us say that I create a new version of the ontology and identify it with the URI however I only update document 1 with the new namespace.  Now, if I perform the same search, I only find one thing.  I know in my heart of hearts that I meant for to be semantically equivalant to, however a system looking at this data has no reason to make this connection (nor should it).  It is equivalant to me creating the new phrase "tar klonor" and then assuming that you will understand the meaning of my sentances including said phrase.  One solution to the problem would be to update the second document along with the first, however this assumes that all documents and systems utilizing the URI of this ontology are under your control.  If your ontology is more widely used, this is not viable.

OWL does expose some mechanisms for handling this (see, however these seem cumbersome and rely on a system to implement the understanding of the versioning constraints.  Further, some of the more robust constraints are only available in OWL-Full, the implementation and usage of which is far from trivial.  And this only covers ontology versioning.  What about specifications which are not ontologies?

Some time ago, a version 1.0 of Dublin Core existed and there was talk of creating a version 2.0 after version 1.1 (some old notes on this and on translations of DC).  Imagine if you already had your data published in DC 1.0 when 1.1 was pushed out.  The change to version 1.1 updated the URI of the specification and as such, made your data obsolete for all intents and purposes.  Given this, it's clear why the RDF URI still has "1999" in it.  Also, on some specification sites (such as FOAF) you will find statements concerning the stability of the specification URI, specifically, it's not changing.

Coming to the end of this rather long winded discussion, I suppose the bottom line is, Jackrabbit does not need to support changes to namespaces, because namespaces shouldn't change.  Updating a namespace in your domain of data is equivalent to updating all nodes of data using that namespace, which should not be taken lightly.