Monday, July 12, 2010

RDF as a Database Solution

With "linked data" taking a front seat in my research efforts over the last few years, my use of RDF has grown from "I have no idea what RDF is" to "RDF is the backbone of my system architecture."  I have a tendency to go overboard with such things, especially when learning a new technology or technique.  Case in point: Recursive-Living.  I created the JavaScript framework that Recursive-Living is built on solely to deepen my knowledge of AJAX-style programming and its various pitfalls.  As a result, the entire site is delivered via AJAX and JavaScript.  This is admittedly overkill, but such a rigorous approach to building conceptual knowledge bore a lot of fruit.  Most of it details why such an architecture is ineffectual in practice.


I built two web applications that I have mentioned before, Transaction-Tracker and Comic-Post, wholly on an RDF backend.  If I did not mention them in this forum, I did so in Buzz.  Again, the goal was to deepen my knowledge of RDF, RDFa, and OWL.  In part, both applications were meant for studying and extrapolating what a suitable framework for building such applications would need.  That is the very same framework I have spoken of previously.  Delving deeper into this intention is fodder for a future post.  While developing these applications I have compiled a good deal of information about the conceptual use of such systems.


There are many distinct advantages to using an RDF database, or an RDF abstraction over a database schema, as a data store.  I weigh these advantages against more traditional data store architectures.  In those architectures, tables are designed around the data, and the relationships between tables are conceptualized on smooth whiteboards spanning long walls in stark rooms.  Below are the advantages I have formalized to date.

  • The only tables the developer needs to be concerned with are those of the RDF implementation.  Suppose new types of data are added to an existing application, or existing types of data are augmented or altered in any way.  The tables themselves need not change.  Only the data in them does.
  • Relationships between resources are concretely defined.  Relationships between tables, by contrast, are defined more abstractly, often based on key columns.  Resource relationships are defined by linking the resources' URIs together via a property or attribute.  Because these definitions are concrete, they can be traversed generically, without explicitly codifying each relationship.  This allows for processes such as a recursive resource lookup.  A resource is found, built, and returned to the user together with all of its related resources, all resources related to those, and so on.
  • RDFa output is made trivial if your backend is already in RDF.  This is not particularly important at the moment.  As more web applications are built around this concept, however, it will soon become crucial for promoting a site and for helping automated agents find and navigate it.
  • Type checking and existential validation are simplified in this model, since each URI is a resource in your data store.  Suppose RDF is used in conjunction with RDFS (and OWL, though OWL is not strictly necessary for this purpose).  The type, or class, of a URI can then be quickly verified via a SPARQL ASK query.  Again, the application need only concern itself with one set of tables, so only one such ASK query need be defined.  A classic data store, by comparison, requires a separate query for each table to check whether a row with a particular key exists.
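The first and last points can be made concrete with a small example.  What follows is a minimal sketch that assumes the RDF implementation sits on a single relational table of triples.  Python's built-in sqlite3 stands in for the real database.  The table layout, the `add` and `ask_type` helpers, and the example.org URIs are all hypothetical.  They are not the schema behind Transaction-Tracker or Comic-Post.  Adding a new type of data only adds rows.  One parameterized existence check covers every class, which is roughly what `ASK { <uri> rdf:type <Class> }` would do against a SPARQL endpoint.

```python
import sqlite3

# Full URI of rdf:type, the predicate RDF uses to say "X is an instance of class C".
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace for the sample data


def open_store() -> sqlite3.Connection:
    """Create an in-memory store whose only table holds generic triples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triples ("
        " subject TEXT NOT NULL,"
        " predicate TEXT NOT NULL,"
        " object TEXT NOT NULL,"  # URIs and literals share one column (simplification)
        " PRIMARY KEY (subject, predicate, object))"
    )
    return conn


def add(conn: sqlite3.Connection, s: str, p: str, o: str) -> None:
    """Insert one triple: new kinds of data become new rows, never new tables."""
    conn.execute("INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", (s, p, o))


def ask_type(conn: sqlite3.Connection, uri: str, cls: str) -> bool:
    """Rough equivalent of SPARQL `ASK { <uri> rdf:type <cls> }`, without RDFS inference."""
    (found,) = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM triples"
        " WHERE subject = ? AND predicate = ? AND object = ?)",
        (uri, RDF_TYPE, cls),
    ).fetchone()
    return bool(found)


conn = open_store()
add(conn, EX + "txn/1", RDF_TYPE, EX + "Transaction")
add(conn, EX + "txn/1", EX + "amount", "42.50")
# A brand-new type of data added later: same table, no schema migration.
add(conn, EX + "comic/7", RDF_TYPE, EX + "Comic")
add(conn, EX + "comic/7", EX + "title", "Pilot")

print(ask_type(conn, EX + "txn/1", EX + "Transaction"))    # True
print(ask_type(conn, EX + "comic/7", EX + "Transaction"))  # False
```

This sketch only matches explicit `rdf:type` triples.  Honoring `rdfs:subClassOf`, as an RDFS-aware store would, requires either inference or a recursive query on top of it.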
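The recursive resource lookup from the second point can be sketched in a few lines over an in-memory list of triples.  This illustration rests on a few assumptions.  The `describe` function and the `ex:` sample data are hypothetical.  An object counts as a linked resource simply when it has triples of its own, whereas a real store would distinguish URIs from literals explicitly.

```python
from collections import defaultdict

# Hypothetical sample data. "ex:" abbreviates a full URI prefix such as
# http://example.org/ to keep the example short.
TRIPLES = [
    ("ex:post1", "ex:title", "RDF as a Database Solution"),
    ("ex:post1", "ex:author", "ex:pcm"),
    ("ex:pcm", "ex:name", "PCM"),
    ("ex:pcm", "ex:wrote", "ex:post1"),  # points back at the post: a cycle
]


def index_by_subject(triples):
    """Group (predicate, object) pairs under their subject URI."""
    idx = defaultdict(list)
    for s, p, o in triples:
        idx[s].append((p, o))
    return idx


def describe(uri, idx, seen=None):
    """Build a nested description of `uri` and every resource reachable from it.

    An object is expanded when it has triples of its own. `seen` stops cycles:
    a resource that was already expanded comes back as its bare URI.
    """
    if seen is None:
        seen = set()
    if uri in seen:
        return uri
    seen.add(uri)
    node = {"uri": uri}
    for p, o in idx.get(uri, []):
        value = describe(o, idx, seen) if o in idx else o
        node.setdefault(p, []).append(value)
    return node


tree = describe("ex:post1", index_by_subject(TRIPLES))
print(tree)
```

The same generic function works for any resource type, with no join written per relationship.  The `seen` set keeps the author's link back to the post from recursing forever.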
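Now consider the RDFa point.  Once data is already in subject/predicate/object form, emitting RDFa is mostly a matter of mapping each triple onto HTML attributes.  The sketch below uses RDFa's `about`, `property`, `rel`, and `href` attributes, with an `xmlns:` prefix declaration in the style of XHTML+RDFa 1.0.  Dublin Core terms serve as the predicates.  The `to_rdfa` helper and the example URIs are hypothetical.

```python
from html import escape


def to_rdfa(subject, props, prefixes):
    """Render one resource as an RDFa-annotated HTML fragment.

    props:    list of (predicate CURIE, value, is_resource) tuples
    prefixes: CURIE prefix -> namespace URI, declared with xmlns: attributes
    """
    attrs = [f'xmlns:{p}="{escape(ns)}"' for p, ns in prefixes.items()]
    attrs.append(f'about="{escape(subject)}"')
    lines = [f"<div {' '.join(attrs)}>"]
    for pred, value, is_resource in props:
        if is_resource:
            # A link to another resource becomes rel + href.
            lines.append(
                f'  <a rel="{escape(pred)}" href="{escape(value)}">{escape(value)}</a>'
            )
        else:
            # A literal value becomes property + element content.
            lines.append(f'  <span property="{escape(pred)}">{escape(value)}</span>')
    lines.append("</div>")
    return "\n".join(lines)


html_out = to_rdfa(
    "http://example.org/post/1",
    [
        ("dc:title", "RDF as a Database Solution", False),
        ("dc:creator", "http://example.org/people/pcm", True),
    ],
    prefixes={"dc": "http://purl.org/dc/terms/"},
)
print(html_out)
```

A real page would embed such fragments in a full XHTML+RDFa document.  The principle stays the same, though: the backend triples supply every attribute value directly.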

Certainly there are disadvantages as well.  Performance comparisons are one topic I have not yet begun to tread upon.  These, again, I leave as fodder for a future post.


DnL8Tar
-PCM
