Saturday, April 21, 2012

Stability and Fragility of Namespaces

While working on a blog post which will soon be published (and linked to) on CITYTECH, Inc's site, I found myself thinking about the subject of updating a namespace definition within a domain of data.  More concretely, I was considering why Apache Jackrabbit does not allow updates (or unregistrations, for that matter) to namespaces once they are established within a given Repository.  It seemed to me initially that allowing changes to namespaces would be valuable, for example, as new versions of an ontology were published.  Considering the matter further, however, I began to realize how dangerous such a practice would be.

Consider the following scenario.  Let us say that I wanted to tell you that I bought a new shirt, the color of which was blue.  However, instead of saying that, I said "I bought a new shirt, tar klonor blue."  You would look quizzically at me and perhaps question your hearing of my statement, because what I didn't tell you was that I had contrived a new phrase, "tar klonor", which means "having the color".

This example is somewhat absurd in and of itself but it is essentially what would happen to a machine's ability to understand linked-data statements if a namespace were changed in the domain of the data being represented.

Consider now a more concrete example.  Let us say that I have created a food ontology identified by the URI http://example.com/food/v1.0/.  Now let us say that I have two documents containing food information.  I present these documents in listing 1 and listing 2 respectively.

@prefix food: <http://example.com/food/v1.0/> .
@prefix ex: <http://somesite.com/things/> .


ex:americanCheese a food:Cheese .
ex:lambChop a food:Meat .
ex:apple a food:Fruit .
ex:provalone a food:Cheese .

Listing 1

@prefix food: <http://example.com/food/v1.0/> .
@prefix me: <http://self.com/items/> .


me:camembert a food:Cheese . 
me:brusselSprouts a food:Vegetable .

Listing 2

If I were to search over this dataset for all resources which are http://example.com/food/v1.0/Cheese, I would find three things.  Now, let us say that I create a new version of the ontology and identify it with the URI http://example.com/food/v2.0/, but I only update document 1 with the new namespace.  Now, if I perform the same search, I only find one thing: the camembert in document 2, which still uses the old namespace.  I know in my heart of hearts that I meant for http://example.com/food/v1.0/Cheese to be semantically equivalent to http://example.com/food/v2.0/Cheese; however, a system looking at this data has no reason to make this connection (nor should it).  It is equivalent to me creating the new phrase "tar klonor" and then assuming that you will understand the meaning of my sentences including said phrase.  One solution to the problem would be to update the second document along with the first, but this assumes that all documents and systems utilizing the URI of this ontology are under your control.  If your ontology is more widely used, this is not viable.
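The search described above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the "triple store" is a plain set of tuples rather than a real RDF library, and the helper names (listing_1, listing_2, instances_of) are made up for this post.  The data is the same as in listings 1 and 2.

```python
# Hypothetical in-memory "triple store": a set of (subject, predicate, object)
# tuples. A real system would use an RDF library; this is only a sketch.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOOD_V1 = "http://example.com/food/v1.0/"
FOOD_V2 = "http://example.com/food/v2.0/"
EX = "http://somesite.com/things/"
ME = "http://self.com/items/"


def listing_1(food_ns):
    """Triples of listing 1, typed against the given food namespace."""
    return {
        (EX + "americanCheese", RDF_TYPE, food_ns + "Cheese"),
        (EX + "lambChop", RDF_TYPE, food_ns + "Meat"),
        (EX + "apple", RDF_TYPE, food_ns + "Fruit"),
        (EX + "provalone", RDF_TYPE, food_ns + "Cheese"),
    }


def listing_2(food_ns):
    """Triples of listing 2, typed against the given food namespace."""
    return {
        (ME + "camembert", RDF_TYPE, food_ns + "Cheese"),
        (ME + "brusselSprouts", RDF_TYPE, food_ns + "Vegetable"),
    }


def instances_of(triples, class_uri):
    """Return the sorted subjects whose rdf:type is exactly class_uri."""
    return sorted(s for (s, p, o) in triples if p == RDF_TYPE and o == class_uri)


# Both documents on v1.0, then only document 1 moved to v2.0.
before = listing_1(FOOD_V1) | listing_2(FOOD_V1)
after = listing_1(FOOD_V2) | listing_2(FOOD_V1)

cheeses_before = instances_of(before, FOOD_V1 + "Cheese")
cheeses_after = instances_of(after, FOOD_V1 + "Cheese")

print(len(cheeses_before))  # 3
print(len(cheeses_after))   # 1
```

The comparison in instances_of is a plain string match on the full URI, which is exactly why the two "Cheese" classes are unrelated as far as the machine is concerned.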

OWL does expose some mechanisms for handling this (see http://www.w3.org/TR/2004/REC-owl-guide-20040210/#OntologyVersioning); however, these seem cumbersome and rely on a system to implement the understanding of the versioning constraints.  Further, some of the more robust constraints are only available in OWL Full, the implementation and usage of which is far from trivial.  And this only covers ontology versioning.  What about specifications which are not ontologies?

Some time ago, a version 1.0 of Dublin Core existed and there was talk of creating a version 2.0 after version 1.1 (some old notes on this and on translations of DC).  Imagine if you already had your data published in DC 1.0 when 1.1 was pushed out.  The change to version 1.1 updated the URI of the specification and, as such, made your data obsolete for all intents and purposes.  Given this, it's clear why the RDF URI still has "1999" in it.  Also, on some specification sites (such as FOAF) you will find statements concerning the stability of the specification URI, specifically, that it will not change.

Coming to the end of this rather long-winded discussion, I suppose the bottom line is that Jackrabbit does not need to support changes to namespaces, because namespaces shouldn't change.  Updating a namespace in your domain of data is equivalent to updating all nodes of data using that namespace, which should not be taken lightly.


DnL8Tar
-PCM

Friday, January 20, 2012

Automatic Shopping List - A Use Case for Linked Data

For some time now I've wanted shopping lists automatically generated from recipes.  In fact, I suspect there are sites which will perform this action on a single-recipe basis, though I don't have the patience to search for them now.  From a single recipe it is fairly trivial to generate a shopping list.  In fact, one could simply print the recipe - there is normally an ingredients list included.  Working this way, however, one would need to go shopping every single time one wanted to cook something, or one would need to print a number of recipes and reconcile overlap in the list manually.


Consider then an application which would take n recipes and aggregate the ingredients into a shopping list.  Conceptually this is of value in situations where one is disciplined enough to plan their meals for the whole week. In a family setting I imagine the value is increased as you can plan meals for the whole family for a period of time and make sure you are minimizing the trips to the grocery store.  


There is, however, a question of how the application would receive information about the recipes for which it is generating a list.  This is where open linked data comes in.  If recipe providers (Food Network, All Chefs, etc.) were to expose their recipe data as linked data, it could be collected into a single system and generally reasoned upon, presuming it followed or was coerced into a somewhat standard ontology (or set thereof).  A user would enter a URL for a recipe into the application, indicating when they planned to prepare the dish.  After entering a number of recipes, the user would elect to generate a shopping list encompassing a certain time period and the system would generate the list based on all of the recipes at once.
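As a rough sketch of the aggregation step, assuming the recipe data has already been collected and coerced into one simple shape, something like the following Python would do.  Everything here is hypothetical: the recipe URLs, the RECIPES dictionary standing in for the collected linked data, and the shopping_list function are made up for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical, already-normalized recipe data keyed by recipe URL.
# Each ingredient maps to a (quantity, unit) pair.
RECIPES = {
    "http://example.com/recipes/beef-stew": {
        "beef chuck": (2.0, "lb"),
        "carrot": (4.0, "each"),
        "worcestershire sauce": (1.0, "tbsp"),
    },
    "http://example.com/recipes/roast-carrots": {
        "carrot": (6.0, "each"),
        "olive oil": (2.0, "tbsp"),
    },
}


def shopping_list(plan, recipes, start, end):
    """Sum ingredient quantities for every recipe planned within [start, end].

    plan is a list of (recipe_url, planned_date) pairs entered by the user.
    Quantities are only merged when ingredient name and unit both match.
    """
    totals = defaultdict(float)
    for url, planned_on in plan:
        if start <= planned_on <= end:
            for name, (quantity, unit) in recipes[url].items():
                totals[(name, unit)] += quantity
    return dict(totals)


plan = [
    ("http://example.com/recipes/beef-stew", date(2012, 1, 21)),
    ("http://example.com/recipes/roast-carrots", date(2012, 1, 23)),
    ("http://example.com/recipes/beef-stew", date(2012, 2, 5)),  # outside the week
]

week = shopping_list(plan, RECIPES, date(2012, 1, 20), date(2012, 1, 27))
for (name, unit), quantity in sorted(week.items()):
    print(f"{quantity:g} {unit} {name}")
```

The sketch sidesteps the hard parts, namely recognizing that two providers mean the same ingredient and converting between units, which is precisely where a shared ontology would earn its keep.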


One can imagine a number of optimizations to the results, but one which comes to mind, and is most often made manifest in my personal life, is a reconciliation of the shopping list with the contents of the pantry.  Last weekend, I was preparing beef stew.  Knowing that I would need Worcestershire sauce, I picked up a bottle, not remembering if I already had some.  When I arrived at home I found that I already had an unopened bottle sitting in my cupboard.  Had I known this while I was at the store, I could have avoided the expenditure.  Similarly, if the system, which I am endeavoring to contrive with this post, had access to exposed data concerning the user's larder, it could adjust the list, making sure it included only items which the user would need to buy to supplement their current stock.
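The pantry reconciliation amounts to a subtraction.  Here is a minimal Python sketch, assuming both the needed items and the pantry contents are available as mappings from (ingredient, unit) to quantity; the reconcile function and the sample data are hypothetical.

```python
def reconcile(needed, pantry):
    """Return only the quantities that still have to be bought.

    Both arguments map (ingredient, unit) -> quantity. Items fully covered
    by the pantry are dropped from the result.
    """
    to_buy = {}
    for item, quantity in needed.items():
        shortfall = quantity - pantry.get(item, 0.0)
        if shortfall > 0:
            to_buy[item] = shortfall
    return to_buy


# Hypothetical data: what the planned recipes need and what is in the cupboard.
needed = {
    ("worcestershire sauce", "bottle"): 1.0,
    ("beef chuck", "lb"): 2.0,
    ("carrot", "each"): 10.0,
}
pantry = {
    ("worcestershire sauce", "bottle"): 1.0,
    ("carrot", "each"): 4.0,
}

to_buy = reconcile(needed, pantry)
for (name, unit), quantity in sorted(to_buy.items()):
    print(f"{quantity:g} {unit} {name}")
```

With this data the Worcestershire sauce drops off the list entirely, which would have saved me the extra bottle.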


Considering the concept in complete reverse, the system could also suggest recipes based on what you currently have "in stock."  This feature may be more useful than those previously described depending on your lifestyle.
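The reverse lookup could be sketched as a ranking of recipes by how many ingredients are missing from the pantry.  This Python is only one possible approach, with made-up recipe names and a hypothetical suggest function; it ignores quantities and looks only at whether an ingredient is present.

```python
def suggest(recipes, in_stock):
    """Rank recipes by the number of missing ingredients, fewest first.

    recipes maps a recipe name to the set of ingredient names it needs;
    in_stock is the set of ingredient names currently in the pantry.
    Ties are broken alphabetically by recipe name.
    """
    ranked = []
    for name, ingredients in recipes.items():
        missing = sorted(set(ingredients) - set(in_stock))
        ranked.append((len(missing), name, missing))
    ranked.sort()
    return [(name, missing) for _, name, missing in ranked]


# Hypothetical data for illustration.
recipes = {
    "beef stew": {"beef chuck", "carrot", "worcestershire sauce"},
    "roast carrots": {"carrot", "olive oil"},
    "omelette": {"egg", "butter"},
}
in_stock = {"carrot", "olive oil", "worcestershire sauce", "egg"}

suggestions = suggest(recipes, in_stock)
for name, missing in suggestions:
    print(name, "- missing:", ", ".join(missing) or "nothing")
```

Recipes with nothing missing come first, followed by those needing only a quick trip to the store.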


DnL8Tar
-PCM