Sunday, May 15, 2011

On Data Not Accessible in an Expected Tuple Format - A Continuation of the Magic Hat Discussion

In my previous post I introduced the concept of a "Magic Hat" mechanism for obtaining resources regardless of their "physical" location. This mechanism assumes that the resources to be retrieved are available in a standardized format, in this case data tuples. There are, however, many systems housing interesting and useful data that do not serve that data in such a format directly.

When we want to reach the data held in such a system through the Magic Hat, what we can do depends largely on the mechanisms the system offers for getting at its data in general. If the system provides no API for pulling the data but does let us edit the template code used to render it, RDFa could be added to the templates to attach semantics to the rendered data, making it consumable by a tuple store. This approach is quite limited, however: while it makes it possible to pull a single resource through the Magic Hat, it does little to help with asking a question about all of the data in the system. For example, consider the request "provide all data created by [A] which concerns the topic [B] and was written after the date [C]." Such a request would be hard to satisfy in such a system because we are largely limited to considering one resource at a time.
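
To make the RDFa idea a little more concrete, here is a minimal Python sketch of the consuming side: it pulls tuples out of a single page whose template has been annotated. The example URL and the `dc:` property names are hypothetical, and the class handles only a tiny subset of RDFa (`about`, `property`, `content`; no prefix resolution, `rel`, `resource` or datatypes), so it is an illustration rather than a conforming RDFa processor.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "wbr"}


class MiniRDFaExtractor(HTMLParser):
    """Collects (subject, predicate, object) tuples from a tiny subset of RDFa.

    `about` sets the subject for an element and its descendants, `property`
    names the predicate, and the object is the `content` attribute if present,
    otherwise the element's text. Nested `property` elements are not supported.
    """

    def __init__(self):
        super().__init__()
        self.triples = []
        self._stack = []  # one (subject, pending_predicate) entry per open element
        self._text = []   # text collected for the innermost pending `property`

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        inherited = self._stack[-1][0] if self._stack else None
        subject = attrs.get("about", inherited)
        predicate = attrs.get("property")
        if predicate and "content" in attrs:
            # The value is carried by the attribute; no text to wait for.
            self.triples.append((subject, predicate, attrs["content"]))
            predicate = None
        if tag in VOID_TAGS:
            return
        if predicate:
            self._text = []
        self._stack.append((subject, predicate))

    def handle_data(self, data):
        self._text.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._stack:
            return
        subject, predicate = self._stack.pop()
        if predicate:
            self.triples.append((subject, predicate, "".join(self._text).strip()))


# A hypothetical rendered blog post after RDFa was added to its template.
page = """
<div about="http://example.org/posts/1">
  <h1 property="dc:title">Magic Hat</h1>
  <span property="dc:creator">A</span>
  <meta property="dc:date" content="2011-05-15">
</div>
"""

extractor = MiniRDFaExtractor()
extractor.feed(page)
extractor.close()
print(extractor.triples)
```

Every tuple produced describes the one resource that was fetched. Answering the [A]/[B]/[C] request this way would mean fetching and parsing every page in the system before a single filter could be applied.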

Providing an API has, however, become quite standard for online systems, and any newer system or application that does not expose one is most likely either a) in beta with an API on the way, or b) not worth using. How requests are made to an API and the format of the data it returns are left to the API's designer and are not guaranteed to match the request and data formats the Magic Hat expects. As such, some coercion is necessary on both the request and the response end. I've taken this approach for retrieving Blogger data and will say more about it in my next post. For now, suffice it to say that such a mechanism allows for much more robust requests, but it is limited by the coercion required, which grows more complex as the requests themselves become more complex.
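
The coercion on each end can be pictured as a pair of translation functions. This is a minimal Python sketch, not the Blogger code mentioned above: the endpoint URL, the query parameters (`author`, `label`, `published_after`) and the JSON field names are all hypothetical stand-ins for whatever a real API defines.

```python
import json
from urllib.parse import urlencode

# Hypothetical mapping from Magic Hat predicates to a remote API's query parameters.
REQUEST_MAP = {"dc:creator": "author", "dc:subject": "label"}


def coerce_request(base_url, pattern, after=None):
    """Translate a tuple-style request into a query URL for a hypothetical API.

    Raises KeyError for any predicate the API offers no parameter for.
    """
    params = {REQUEST_MAP[predicate]: value for predicate, value in pattern.items()}
    if after is not None:
        params["published_after"] = after  # hypothetical date-filter parameter
    return base_url + "?" + urlencode(sorted(params.items()))


def coerce_response(body):
    """Flatten a hypothetical JSON response into (subject, predicate, object) tuples."""
    tuples = []
    for item in json.loads(body)["items"]:
        subject = item["id"]
        tuples.append((subject, "dc:creator", item["author"]))
        tuples.append((subject, "dc:date", item["published"]))
        for label in item.get("labels", []):
            tuples.append((subject, "dc:subject", label))
    return tuples


# The [A]/[B]/[C] request expressed as a tuple pattern plus a date bound.
url = coerce_request("https://api.example.com/posts",
                     {"dc:creator": "A", "dc:subject": "B"}, after="2011-01-01")

# A canned response standing in for what such an API might return.
body = '{"items": [{"id": "post-1", "author": "A", "published": "2011-05-15", "labels": ["B"]}]}'
tuples = coerce_response(body)
```

A predicate with no entry in the hypothetical `REQUEST_MAP` raises `KeyError`, which is the limitation described above in miniature: each new kind of question needs new coercion rules, and some questions have no equivalent in the remote API at all.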

Another approach, which I have seen or heard of some applications taking, is to retrieve all data of interest from a system and store it locally in a tuple format. While this allows for robust requests and combinations of data using the expected request and response formats, it carries the significant overhead of keeping the local copy consistent with the remote system, which remains the owner of the data.
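
A minimal Python sketch of that consistency work, assuming (hypothetically) that the remote system can list every item with an `id`, an `updated` stamp and a dict of `fields`. Each sync compares stamps, replaces the tuples of changed resources and drops resources that have disappeared remotely. This full-listing comparison is only one simple strategy, chosen for brevity.

```python
from collections import defaultdict


class LocalMirror:
    """Local tuple store mirroring a remote system that remains the data's owner.

    Remote items are assumed (hypothetically) to look like
    {"id": ..., "updated": ..., "fields": {predicate: object, ...}}.
    """

    def __init__(self):
        self.tuples = defaultdict(set)  # subject -> {(predicate, object), ...}
        self.versions = {}              # subject -> `updated` stamp seen at last sync

    def sync(self, remote_items):
        """Reconcile with a full remote listing; returns (changed, removed) counts."""
        seen = set()
        changed = 0
        for item in remote_items:
            subject = item["id"]
            seen.add(subject)
            if self.versions.get(subject) == item["updated"]:
                continue  # unchanged since the last sync
            # Replace the resource's tuples wholesale rather than diffing them.
            self.tuples[subject] = set(item["fields"].items())
            self.versions[subject] = item["updated"]
            changed += 1
        removed = [subject for subject in self.versions if subject not in seen]
        for subject in removed:  # deleted on the remote side, so drop locally too
            del self.versions[subject]
            del self.tuples[subject]
        return changed, len(removed)

    def subjects_with(self, predicate, obj):
        """Answer a simple tuple-pattern request entirely from the local copy."""
        return sorted(s for s, pairs in self.tuples.items() if (predicate, obj) in pairs)


mirror = LocalMirror()
first = mirror.sync([
    {"id": "post-1", "updated": "t1", "fields": {"dc:creator": "A"}},
    {"id": "post-2", "updated": "t1", "fields": {"dc:creator": "B"}},
])
# Later: post-1 was edited remotely and post-2 was deleted.
second = mirror.sync([
    {"id": "post-1", "updated": "t2", "fields": {"dc:creator": "C"}},
])
```

Queries are cheap once the data is local, but even this simple version must fetch the complete remote listing on every sync, and anything changed remotely between syncs is stale locally until the next one.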