Disclaimer: This document poses the question asked in the title without offering anything which can be reasonably called an answer. It is my hope that members of the relevant communities who know more than I do on the topic can provide some insight into potential answers.
Utilization of linked-data by applications is predicated upon the existence of accessible linked-data. In much the same way that publishers were once told they could put their content online in formats like HTML, we now tell them they can expose their information as linked-data using formats like RDFa and JSON-LD. However, where the former had the fairly obvious benefit of making the publisher's content visible to human consumers, the latter seems to lack any immediately realizable end.
Lofty visions of automated agents and reasoning engines operating over an ever-expanding web of linked-data have been touted since around the time the phrase "Semantic Web" was coined. The suggestion was that, by exposing their information as linked-data, publishers could "hook in" to these agents, making themselves visible to the agents' users. Such agents, however, have yet to materialize, and from my observation they are touted less and less, which I feel is unfortunate, but that's an entirely different post.
Many "Semantic Web Applications" which I have seen, either in writings online or at conferences, are in fact semantically-enabled applications: they use some Semantic technologies, some of which have been born of the forge of the Semantic Web, in combination with other technologies (AI, NLP, etc.) to build up a triple store and reason over or otherwise operate upon it. These have been interesting applications, but they are not Semantic Web applications, as they go well beyond the boundary of utilizing exposed linked-data. Further, they often operate in specialized domains over semantically enabled datasets, not over arbitrarily exposed information on publishers' sites. As such, in and of themselves, such applications provide no reward to the average content publisher.
Search Engines have taken up the torch to some extent in the form of Schema.org. This gives publishers a reason to expose their data as well as a concrete vocabulary to use in its exposition, but it positions the “Semantic Web” to be re-branded as “SEO 2.0,” which in my mind would be a loss of the initial vision. It is, however, from what I can find, the only realizable end of publishing linked-data along with your content.
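Concretely, the Schema.org route amounts to a publisher embedding a block like the following in a page. The recipe values here are invented for illustration, though the `https://schema.org` context and the `Recipe` type are part of the actual vocabulary:

```json
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Recipe",
  "name": "Beef Stew",
  "recipeIngredient": ["2 lbs beef", "4 carrots", "1 tbsp Worcestershire sauce"]
}
</script>
```

Search engines which understand the vocabulary can then surface the structured data, which is the "SEO 2.0" reward described above.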
When talking about or attempting to explain the Semantic Web to friends, family, and co-workers, I often employ the Chicken or the Egg metaphor in accounting for why this concept has not yet become ubiquitous (though I am sure some would disagree with the statement that it is not ubiquitous). If we take the Chicken to be the accessible data and the Egg to be the applications, we may be getting closer to the Chicken, with the help of efforts such as Schema.org, which would give the Egg a raison d'ĂȘtre. In my experience, the lack of a reasonable Egg to point to greatly complicates the task of encouraging publishers to expose their information as linked-data.
A final note: I would be very happy to be corrected on my observations and to be told that the Egg already exists (ideally by being pointed to such an Egg).
Showing posts with label linked-data.
Thursday, June 27, 2013
Saturday, April 21, 2012
Stability and Fragility of Namespaces
While working on a blog post which will soon be published (and linked to) on CITYTECH, Inc.'s site, I ran across the subject of updating a namespace definition within a domain of data. More concretely, I was considering why Apache Jackrabbit does not allow updates (or, for that matter, unregistrations) of namespaces once they are established within a given repository. Initially it seemed to me that allowing changes to namespaces would be valuable, for example as new versions of an ontology were published. Considering the matter further, however, I began to realize how dangerous such a practice would be.
Consider the following scenario. Let us say that I told you that I bought a new shirt, the color of which was blue. However, instead of saying that, I said "I bought a new shirt, tar klonor blue." You would look quizzically at me and perhaps question your hearing of my statement, because what I hadn't told you was that I had contrived a new phrase, "tar klonor," which meant "having the color".
This example is somewhat absurd in and of itself but it is essentially what would happen to a machine's ability to understand linked-data statements if a namespace were changed in the domain of the data being represented.
Consider now a more concrete example. Let us say that I have created a food ontology identified by the URI http://example.com/food/v1.0/. Now let us say that I have two documents containing food information. I present these documents in listing 1 and listing 2 respectively.
@prefix food: <http://example.com/food/v1.0/> .
@prefix ex: <http://somesite.com/things/> .
ex:americanCheese a food:Cheese .
ex:lambChop a food:Meat .
ex:apple a food:Fruit .
ex:provolone a food:Cheese .
Listing 1
@prefix food: <http://example.com/food/v1.0/> .
@prefix me: <http://self.com/items/> .
me:camembert a food:Cheese .
me:brusselSprouts a food:Vegetable .
Listing 2
If I were to search over this dataset for all resources which are http://example.com/food/v1.0/Cheese, I would find three things. Now, let us say that I create a new version of the ontology, identified by the URI http://example.com/food/v2.0/, but I only update document 1 with the new namespace. If I perform the same search, I now find only one thing. I know in my heart of hearts that I meant for http://example.com/food/v1.0/Cheese to be semantically equivalent to http://example.com/food/v2.0/Cheese, but a system looking at this data has no reason to make that connection (nor should it). It is equivalent to me coining the new phrase "tar klonor" and then assuming that you will understand the meaning of my sentences which include it. One solution would be to update the second document along with the first, but this assumes that all documents and systems utilizing the ontology's URI are under your control. If your ontology is more widely used, this is not viable.
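The breakage is easy to demonstrate. The following toy sketch models the triples from the two listings as plain (subject, predicate, object) tuples rather than a real RDF store:

```python
# Triples from listings 1 and 2, modeled as plain tuples.
FOOD_V1 = "http://example.com/food/v1.0/"
FOOD_V2 = "http://example.com/food/v2.0/"
TYPE = "rdf:type"

doc1 = [
    ("ex:americanCheese", TYPE, FOOD_V1 + "Cheese"),
    ("ex:lambChop", TYPE, FOOD_V1 + "Meat"),
    ("ex:apple", TYPE, FOOD_V1 + "Fruit"),
    ("ex:provolone", TYPE, FOOD_V1 + "Cheese"),
]
doc2 = [
    ("me:camembert", TYPE, FOOD_V1 + "Cheese"),
    ("me:brusselSprouts", TYPE, FOOD_V1 + "Vegetable"),
]

def find_of_type(dataset, type_uri):
    """Return the subjects of every triple typed with type_uri."""
    return [s for (s, p, o) in dataset if p == TYPE and o == type_uri]

# Searching for v1.0 Cheese across both documents finds three things.
print(find_of_type(doc1 + doc2, FOOD_V1 + "Cheese"))
# ['ex:americanCheese', 'ex:provolone', 'me:camembert']

# "Upgrade" only document 1 to the v2.0 namespace.
doc1_v2 = [(s, p, o.replace(FOOD_V1, FOOD_V2)) for (s, p, o) in doc1]

# The same search now finds only one thing; nothing tells the system
# that v1.0/Cheese and v2.0/Cheese were meant to be equivalent.
print(find_of_type(doc1_v2 + doc2, FOOD_V1 + "Cheese"))
# ['me:camembert']
```

The two cheeses in document 1 are stranded under a URI the searcher has no reason to connect to the old one.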
OWL does expose some mechanisms for handling this (see http://www.w3.org/TR/2004/REC-owl-guide-20040210/#OntologyVersioning); however, these seem cumbersome and rely on a system to implement an understanding of the versioning constraints. Further, some of the more robust constraints are only available in OWL Full, the implementation and usage of which is far from trivial. And this only covers ontology versioning. What about specifications which are not ontologies?
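For reference, the versioning annotations that the OWL Guide describes look roughly like this in an ontology header (reusing the food URIs from the example above):

```
@prefix owl: <http://www.w3.org/2002/07/owl#> .

<http://example.com/food/v2.0/> a owl:Ontology ;
    owl:versionInfo "2.0" ;
    owl:priorVersion <http://example.com/food/v1.0/> ;
    owl:backwardCompatibleWith <http://example.com/food/v1.0/> .
```

Even with these annotations present, a consuming system must understand and act on them for the connection between versions to have any effect, which is precisely the burden noted above.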
Some time ago, a version 1.0 of Dublin Core existed, and there was talk of creating a version 2.0 after version 1.1 (some old notes on this and on translations of DC). Imagine if you already had your data published in DC 1.0 when 1.1 was pushed out: the change to version 1.1 updated the URI of the specification and, for all intents and purposes, made your data obsolete. Given this, it's clear why the RDF namespace URI still has "1999" in it. Also, on some specification sites (such as FOAF) you will find statements concerning the stability of the specification URI; specifically, that it's not changing.
Coming to the end of this rather long-winded discussion, I suppose the bottom line is this: Jackrabbit does not need to support changes to namespaces, because namespaces shouldn't change. Updating a namespace in your domain of data is equivalent to updating all nodes of data using that namespace, which is not something to be taken lightly.
DnL8Tar
-PCM
Labels:
jackrabbit,
linked-data,
namespace,
ontology,
RDF,
Semantic-Web,
specifications,
thoughts
Friday, January 20, 2012
Automatic Shopping List - A Use Case for Linked Data
For some time now I've wanted shopping lists automatically generated from recipes. In fact, I suspect there are sites which will do this on a single-recipe basis, though I don't have the patience to search for them now. From a single recipe it is fairly trivial to generate a shopping list; one could simply print the recipe, which normally includes an ingredients list. Working this way, however, one would need to go shopping every single time one wanted to cook something, or print a number of recipes and reconcile the overlap in the lists manually.
Consider then an application which would take n recipes and aggregate the ingredients into a shopping list. Conceptually this is of value in situations where one is disciplined enough to plan their meals for the whole week. In a family setting I imagine the value is increased as you can plan meals for the whole family for a period of time and make sure you are minimizing the trips to the grocery store.
There is a question of how the application would receive information about the recipes for which it is generating a list however. This is where open linked data comes in. If recipe providers (Food Network, All Chefs, etc) were to expose their recipe data as linked data, it could be collected into a single system and generally reasoned upon presuming it followed or was coerced into a somewhat standard ontology (or set thereof). A user would enter a URL for a recipe into the application, indicating when they planned to prepare the dish. After entering a number of recipes, the user would elect to generate a shopping list encompassing a certain time period and the system would generate the list based on all of the recipes at once.
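The aggregation step itself is simple once the data is in hand. A minimal sketch, with hypothetical recipes reduced to ingredient-to-quantity maps (a real system would harvest these from the linked-data at each recipe URL, and would also need to reconcile units, which is ignored here):

```python
from collections import Counter

# Hypothetical recipe data keyed by name; in the envisioned system these
# maps would be built from linked-data exposed by the recipe providers.
recipes = {
    "beef stew": {"beef": 2, "carrot": 4, "onion": 2, "worcestershire sauce": 1},
    "pot roast": {"beef": 3, "carrot": 6, "potato": 5},
}

def shopping_list(selected):
    """Aggregate the ingredients of the selected recipes into one list."""
    total = Counter()
    for name in selected:
        total.update(recipes[name])  # Counter.update adds quantities
    return dict(total)

print(shopping_list(["beef stew", "pot roast"]))
# {'beef': 5, 'carrot': 10, 'onion': 2, 'worcestershire sauce': 1, 'potato': 5}
```

Overlapping ingredients (beef, carrot) are merged into single line items, which is exactly the manual reconciliation the list-printing approach requires.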
One can imagine a number of optimizations to the results, but the one which comes to mind, and is most often made manifest in my personal life, is reconciling the shopping list with the contents of the pantry. Last weekend I was preparing beef stew. Knowing that I would need Worcestershire sauce, I picked up a bottle, not remembering whether I already had some. When I arrived home I found an unopened bottle already sitting in my cupboard. Had I known this while I was at the store, I could have avoided the expenditure. Similarly, if the system which I am endeavoring to contrive with this post had access to exposed data concerning the user's larder, it could adjust the list, making sure it included only items the user actually needs to buy to supplement their current stock.
Considering the concept in complete reverse, the system could also suggest recipes based on what you currently have "in stock." This feature may be more useful than those previously described depending on your lifestyle.
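Both the pantry reconciliation and the reverse suggestion can be sketched with the same toy data (all recipe and pantry values here are invented):

```python
from collections import Counter

recipes = {
    "beef stew": {"beef": 2, "carrot": 4, "onion": 2, "worcestershire sauce": 1},
    "onion soup": {"onion": 3, "butter": 1},
}
pantry = Counter({"worcestershire sauce": 1, "onion": 3, "butter": 2})

# Reconciliation: Counter subtraction keeps only positive counts, so
# items already in stock drop off the shopping list.
to_buy = Counter(recipes["beef stew"]) - pantry
print(dict(to_buy))  # {'beef': 2, 'carrot': 4}

# Reverse: suggest recipes which can be prepared entirely from stock
# (i.e. subtracting the pantry leaves nothing left to buy).
def cookable(recipes, pantry):
    return [name for name, needed in recipes.items()
            if not (Counter(needed) - pantry)]

print(cookable(recipes, pantry))  # ['onion soup']
```

In the anecdote above, the Worcestershire sauce would have dropped off the list automatically, had the pantry data been exposed.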
DnL8Tar
-PCM
Labels:
linked-data,
recipe,
Semantic-Web,
shopping-list,
thoughts,
use-case