Monday, April 25, 2011

A Magic Hat

I have not been to many magic shows in my life so my perspective here may not be ideal.  Looking back on the few I have attended however, I do not recall ever seeing a magician pull anything out of their iconic top hat, nor even wearing said hat.  The pervasiveness of this image in popular media however assures me that the concept is not lost on society.

Consider the magicians hat.  A simple, undecorated, top hat, from which the trained user is able to pull any number of objects.  With a wave and a wink the magician is able to make materialize whatever object suits their fancy, be it a bouquet of flowers, a deck of playing cards, or the infamous bunny rabbit.  If one breaks themselves from the mundane reality of the trick, one is free to consider where these objects must have originated from.  The bunny rabbit for example surely had been hopping along a field of grass and clover prior to its arrival at the hat.  At the will of the magician it was plucked from its prior location in time and space, traveling at the speed of thought across the gap in these dimensions between the field and the hat, arriving finally in the magicians trained hand, ideally no worse for wear. 

The consumer of the trick, namely the audience, need not concern themselves with where the object appearing from the hat originated, they need only concern themselves with the fact that the objects, which certainly were not in the hat to begin with, have been made manifest in its confines and have been produced to the satisfaction of all onlookers.  

An application of this concept to the procurement of resources by an application is one of the projects keeping me occupied at the moment.  The "Magic Hat" is what I call a "Resource Container."  The principal of the container is the same as the magic hat insomuch as the user of the container is able to bring fourth from the container a resource required, not needing to concern themselves with where that resource is coming from.  The user may require a rabbit and a deck of playing cards, and while the rabbit may be pulled from a grassy field and the playing cards from a drug store shelve the users only concern is that they started off not having these two resources and they now have both of them.

A Resource Container discovers sources of resources via some discovery mechanism (the simplest of which is user configuration however more robust auto-discovery mechanisms are conceptually possible).  When a particular resource is requested of the container, if the container does not have a cached version, or if the cached version is stale, it determines, based on its known sources of resources and its knowledge of those sources, which sources to probe for information about the requested resource.  The container then produces this information in a standardized format to the satisfaction of the requester who need not know the source of the information (so long as the source is trustworthy by the users definition of trust).

To make this description more concrete, consider a resume display application.  The user may define a layout in which they want their resume to appear, indicating where they want their work history to appear, their education history, their personal information, etc.  The user may then attach this application to a Resource Container in order to acquire the actual information necessary to populate the various sections.  Imagine further an ideal online environment in which every place you have worked hosts some open API via which you are able to query your work history so long as you have proper authority to do so.  Similarly imagine that educational institutions have such a system as well.  When you request all work history tied to yourself, the Resource Container would be tasked with finding all known sources which would have resources of type "Work History" and would further be tasked with acquiring these resources.  You as a user and your front end application would not care that your work history is coming from three, four, five, etc different sources.  Using the same mechanisms to pull education history from your various schools and personal information from, perhaps, some government site, or from another service such as linkedin, the Resource Container would be able to provide to the front end application, all necessary data acquired from, ideally, the canonical sources of said data, abstracting from the front end the mundane details of the data sources.

Immediate Difficulties 

In implementing the aforementioned system I've ran into a couple immediate difficulties which I'm working through now and which are worth documenting.

Requested Resources vs Containing Resource: When requesting a particular resource, the subject of the resource being requested is often not enough information to determine where to acquire data concerning the resource.  Going back to the prior example, consider a resume application where any number of users are able to format their resumes online, connecting the front end to sources of resource information applicable to them (ie, their particular employers and educators).  If the system were requested to look up all work history data tied to a particular user, one could imagine the system being able to determine based on pre-set user configuration, which sources to look at for this information.  However, if the system were asked to pull up data for a particular work history resource, without being told much more than the URI of that resource, the system would have a hard time determining where to pull this data from.  In such an application it would quickly become impractical to ask every known employer data source whether they had information about the URI.  As such, a scheme must be arrived at to give the Resource Container more information about the particular request than just the URI.  

Performance Concerns: While I have not attempted any type of benchmarking with what's been coded thus far, I presume that performance is not going to be amazing in situations where data is needed from multiple sources at one time.  If one has a home page which is pulling data from 15 different sources, one can imagine that this may not be quick, and the speed of the population of this page will largely be dependent on the speed of the sources providing the data.  In such situations it is probably prudent to populate large portions of the page asynchronously which, while not adding much, if any, complexity to the Resource Container itself, adds to the complexity of the implementing front ends.