Thursday, June 27, 2013

How are Publishers Rewarded for Exposing Linked-Data?

Disclaimer: This document poses the question asked in the title without offering anything which can be reasonably called an answer.  It is my hope that members of the relevant communities who know more than I do on the topic can provide some insight into potential answers.

Utilization of linked-data by applications is predicated upon the existence of accessible linked-data.  In much the same way that publishers were told they could put their content online in formats like HTML, we now tell them they can expose their information as linked-data using formats like RDFa and JSON-LD.  However, where the former had the fairly obvious benefit of making the publisher’s content visible to human consumers, the latter seems to lack any immediately realizable end.

Lofty visions of automated agents and reasoning engines which would operate over the ever-expanding web of linked-data have been touted since around the time that the phrase “Semantic Web” was being coined.  It was indicated that, by exposing their information as linked-data, publishers could “hook in” to these agents, making themselves visible to the agents’ users.  Such agents, however, have yet to materialize and, from my observation, are touted less and less.  I find that unfortunate, but that’s an entirely different post.

Many “Semantic Web Applications” which I have seen, either in writings online or at conferences, are in fact Semantically-Enabled applications which use some Semantic technologies, some of which have been born of the forge of the Semantic Web, in combination with other technologies (AI, NLP, etc.), in order to build up a triple store and reason over it or operate upon it.  These have been interesting applications, but they are not Semantic Web applications, as they go well beyond the boundary of utilizing exposed linked-data.  Further, they are often operating in specialized domains over semantically enabled datasets and not over arbitrarily exposed information on publishers’ sites.  As such, in and of themselves, such applications are providing no reward to the average content publisher.

Search Engines have taken up the torch to some extent in the form of Schema.org.  This gives publishers a reason to expose their data as well as a concrete vocabulary to use in its exposition, but it positions the “Semantic Web” to be re-branded as “SEO 2.0,” which in my mind would be a loss of the initial vision.  It is, however, from what I can find, the only realizable end of publishing linked-data along with your content.

When talking about / attempting to explain the Semantic Web to friends, family, and co-workers, I often employ the Chicken or the Egg metaphor in accounting for why this concept has not yet become ubiquitous (though I am sure some would disagree with the statement that it is not ubiquitous).  If we take the Chicken to be the accessible data and the Egg to be applications, we may be getting closer to the Chicken, with the help of efforts such as Schema.org to an extent, which would give the Egg a raison d’être.  In my experience the lack of a reasonable Egg to point to greatly complicates the task of encouraging publishers to expose their information as linked-data.

A final note: I would be very happy to be corrected on my observations and to be told that the Egg already exists (ideally by being pointed to such an Egg).