Saturday, October 30, 2010

Concerning mod_rewrite as Related to URI Manifestation

While not a tenet of the Semantic-Web concept, certainly an ideal is the substantial manifestation of resource URIs insomuch as a web-page can be "substantial."  As defined by Semantic-Web methodologies (and RDF in general), each Resource is identified by a URI.  These URIs, being identifiers and not true locations as URLs are, are not required to point to anything substantial.  It is, however, helpful if they do, as this exposes much more robust access to the Resource.


To exemplify this point, let us say that you have identified a particular resource (for the sake of example, a single comic strip) with the URI http://example.com/comics/1.2/ns/a-funny-comic-1.  As a URI this is completely valid, regardless of whether the identifier points to anything useful when typed into a web-browser.  Let us further say that, while attributing identifiers to comics, you also set yourself to the task of creating an application housing these identified resources and exposing access to them via a webpage.  A user of the webpage is to enter the URI of the resource they wish to see and, if said resource exists within your system, it is returned to them.  Here we have a situation where the resource itself is uniquely identified by a URI and is accessible by an application.


It would be much more convenient however if, instead of needing to know both the URI of the desired resource and the location and usage of the application via which said resource may be retrieved, the seeker of a resource could simply type the URI into a third application with which the user is readily familiar, such as a web-browser, in order to retrieve the desired resource.  In the particular example which we have been exploring this is somewhat trivial assuming you own the domain "http://example.com" (obviously, in practice, http://example.com would be replaced by a domain you do in fact own).  You could create a directory structure on your server equivalent to the URI requested and place an index file within the final child directory.  Then a request to the URI specified above would retrieve http://example.com/comics/1.2/ns/a-funny-comic-1/index.html.   


This is a reasonable starting place but the pitfalls are numerous and some not so obvious.  There is the clear downside of maintenance, be it manual or automatic.  There is also the less obvious downside of URI support.  Taking this approach forces you to use URLs as the URIs of all of your resources, limiting your naming potential.  To illustrate this, let us say that instead of the URI chosen above, you have identified your comic with the slightly different URI http://example.com/comics/1.2/ns#a-funny-comic-1.  The change is a single character but how a web-browser will handle this request is vastly different than how it would handle the previous request.  As per section 2.2 of RFC1738,


The character "#" is unsafe and should always be encoded because it is used in World Wide Web and in other systems to delimit a URL from a fragment/anchor identifier that might follow it.


You might be tempted to skirt around the issue by percent-encoding such characters.  This is not a viable solution either, however, since, per section 2.2 of RFC 3986 (coincidence that these notes are both from section 2.2 of their respective documents?), "#" is a reserved general delimiter and,


URIs that differ in the replacement of a reserved character with its corresponding percent-encoded octet are not equivalent.
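
Both points can be checked with a few lines of Python's standard urllib.parse module.  This is only a sketch: the helper names server_visible_part and percent_encode_hash are my own inventions for illustration, and the URIs are the ones from the example above.

```python
from urllib.parse import quote, urlsplit

SLASH_URI = "http://example.com/comics/1.2/ns/a-funny-comic-1"
HASH_URI = "http://example.com/comics/1.2/ns#a-funny-comic-1"


def server_visible_part(uri: str) -> str:
    """Return the URI with its fragment removed.

    The fragment (everything after "#") is kept by the client and is
    not part of the request sent to the server.
    """
    parts = urlsplit(uri)
    return parts._replace(fragment="").geturl()


def percent_encode_hash(uri: str) -> str:
    """Replace each "#" with its percent-encoded octet, "%23"."""
    return uri.replace("#", quote("#", safe=""))


if __name__ == "__main__":
    # The "#" form loses its final segment before the server ever sees it.
    print(server_visible_part(HASH_URI))
    # The encoded form reaches the server whole, but it is a different URI.
    print(percent_encode_hash(HASH_URI))
```

With the "#" form, the comic's name lands in the fragment and never reaches the server.  With the encoded form, the whole string arrives as a path, but under RFC 3986 it no longer identifies the same resource.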


Given this, let us return to a previous portion of the example.  What would be ideal is if you could leverage the URI's full scope of delimiters and if, instead of the URI typed into the web-browser being treated as a URL, it were passed to a subsequent application which would properly treat it as a URI and return to the user the identified resource.

mod_rewrite

mod_rewrite is an Apache module which allows you to rewrite a requested URL based on a series of conditions and rules.  The rewriting happens on the server side so the user is not made aware of the change in request (that is to say, the address which the user has entered into the address bar does not change).  This module gives us the ability to "catch" a requested URI before it is fully processed as a URL and rewrite the request into something the server will actually be able to process as the user expects.  mod_rewrite may be implemented at either the server or directory level and, while it is generally best to implement such modules at the server level, there are cases where it is more appropriate (or necessary, in the case of a shared server) to implement the module at a directory level via a .htaccess file.

Returning again to our example, we might write rewrite rules stating "any time a user makes a request to an address starting with http://example.com/comics/1.2/, hand that request to our resource retrieval application (introduced earlier in this post) passing the fully requested URI as a GET parameter."  Such a rule might look like this (assuming implementation at the directory level starting at the root directory of example.com):

RewriteEngine on

RewriteRule ^comics/1\.2/(.*)$ request-resource.php?uri=http://example.com/comics/1.2/$1 [L]

Breaking this down, the RewriteRule first specifies the condition under which the rule should fire.  This condition is presented as a Perl-compatible regular expression.  The regular expression presented in this example states that this rule will fire whenever it encounters a request "starting with comics/1.2/ and followed by any number of characters."  The RewriteRule then specifies the action which should be taken when the rule is fired.  In this case, the request is rewritten as a request to the script "request-resource.php" and the requested URI is passed as a GET parameter (the $1 within the URI indicates the remaining portion of the request which was captured by "(.*)" in the first portion of the RewriteRule).  Finally, a set of optional flags may be presented to the RewriteRule.  In this example we present [L], meaning that this is the last rule which should fire.
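
For those who would rather see the match-and-substitute step run than read about it, here is a rough Python imitation of a rule of this shape.  It is not how Apache works internally.  The function name rewrite is hypothetical, and I am assuming the substitution appends the captured remainder directly after /comics/1.2/ so that the original URI is reconstructed.

```python
import re
from typing import Optional

# In a per-directory (.htaccess) context the leading "/" has already been
# stripped from the path before the pattern is tested, hence no "/" here.
PATTERN = re.compile(r"^comics/1\.2/(.*)$")

# Assumed target: the retrieval script plus the rebuilt URI prefix.
TARGET = "request-resource.php?uri=http://example.com/comics/1.2/"


def rewrite(request_path: str) -> Optional[str]:
    """Return the rewritten request, or None if the rule does not fire."""
    match = PATTERN.match(request_path)
    if match is None:
        return None
    # match.group(1) plays the role of $1 in the RewriteRule.
    return TARGET + match.group(1)


if __name__ == "__main__":
    print(rewrite("comics/1.2/ns/a-funny-comic-1"))
    print(rewrite("about.html"))
```

The first call yields a request to request-resource.php carrying the full comic URI.  The second returns None, which stands in for "the rule did not fire and the request is processed as usual."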

Given this code (and the assumption that the previously discussed application is in fact accessible via the PHP file request-resource.php), users may now retrieve resources defined within the http://example.com/comics/1.2/ns/ namespace simply by entering the URI into a web browser.

Note: this technique will not handle all URIs, for example, "urn:oasis:names:specification:docbook:dtd:xml:4.1.2" would not be accessible via this methodology though it still stands as a valid URI.




DnL8Tar
-PCM

Tuesday, September 14, 2010

On Imagination and Experience

I posit the following query as a mental exercise.
Consider, if you will, a person, completely blind from birth, never having seen light of any type, knowing only sightlessness.  If asked, how would you describe color to this person?
I will reserve my own answer, for it is irrelevant to the overall point; consider yours, however.  Indeed it would be a difficult task, but why?  Color is a concept familiar to us.  We see color every day and use color to differentiate one thing from another, or one type from another.  Further, we can know of colors, their form, absent of their application to matter.  Color may be used in describing a thing known, or in conceiving a thing to be known.  Why then should so familiar a thing be so difficult to portray?

The answer lies in the blind person's lack of experience of color, or of sight at all.  Indeed color is perceived through the sense of sight and we only have experience of color through this sense.  The blind person, however, has no experience with this sense, no basis on which to know color.  And try as we may, we can only convey an abstraction of color, but cannot impart a knowledge of color itself.

Our knowledge is built of our collective experiences, our perception of those experiences, and our rumination thereon.  We can give ourselves more knowledge, or deeper knowledge, by building upon that gained by our experience, but we cannot give ourselves knowledge without experience.  That is to say, we cannot give ourselves knowledge from nothing.

What then is left to say of imagination?  For we do certainly contrive those things which do not exist and know them in our mind, and can give to others information about them so that they may know them as well.  But these things which are the objects of our imagination are not brought out of nothing, but are also brought of our own experience.  As an example, consider a dragon, a creature which, in this world, does not exist, but is brought forth from our imagination and made manifest to others.  Even this creature, however, is not brought forth from nothing, and we can consider its components and their familiarity to our experience (skin like a reptile, facial structure like a horse, wings like a bird, etc.).  The objects of our imagination, however far they are from reality, are never so far that they do not extend from our experience.


DnL8Tar
-PCM

Thursday, September 2, 2010

On Knowledge and Information

http://informationr.net/ir/8-1/paper144.html


A decent paper focusing on the misuse (and overall meaninglessness) of the phrase "Knowledge Management."  This has given me reason for pause in my own use of the phrase and after some consideration I believe it may be best to cease using the phrase altogether.  


In the interest of definition
Knowledge is that which is known to an individual as gained by the individual's experience and as colored by the individual's perception.  Any attempt to "transfer Knowledge" from one individual to another (as the phrase is used popularly) entails first the transformation of Knowledge into Information via some medium (i.e., writing, speaking, etc.) and second the consumption of the Information and its interpretation resulting in Knowledge.  It is important to note here, however, that the Knowledge transformed into Information is not and cannot be the same Knowledge known to the consumer of the Information.  It can be similar, and certainly if the Information is clear and the consumer is up to the task of consumption, it will be, but as it is imputed to the individual it will not be the same in two individuals.


DnL8Tar
-PCM

Saturday, July 24, 2010

A Brief Consideration of Continuity and its Implications on Computing

Consider a computer, stripped of its layers of abstraction which expose to users access to its facilities.  At the core of a computer is a binary processor acting on logical operations.  Each such operation is executed at a set increment, controlled by the computer's clock speed.  These small operations are combined together through layers of abstraction to produce the robust functionality we have come to expect from a machine such as this.

On such a system one with proper knowledge could produce an animation of, let's say, a ball rolling across a table.  During the animation the ball traverses the table, rolling from a point we shall call a to a second point which we shall call b.  Once produced the user can watch a playback of the animation and, assuming it was well produced, would not be surprised to see what they would naturally expect to see if a ball were indeed rolling across a table.

Further, consider a true ball in the physical world rolling across a true table from the same point a to the same point b.  Along the path between these points, as many points as you please may be observed  as well, and never can we observe so many points in between that there are not more which can be observed.  This continuity of movement, and of being, is innate to objects in the natural world.  The ball from our example moves continuously along the path from a to b, through the infinity of measurable (measurable either arithmetically or geometrically) points over an amount of time also exhibiting continuity.

In comparison, the animation which is produced consists of a series of frames displayed at a set rate.  Each frame can be said to be a measurement of the position of our virtual ball at a specific point in time.  From this animation select two adjacent frames.  In between these frames could be inserted a third showing the position of the ball at a point in time halfway between the point represented in the first frame and the point represented in the second frame.  Again, selecting this new frame and its adjacent frame, another new frame could be made to show the point in time representative of the midpoint of these two frames.  Such a process could be repeated as many times as we please without reaching a final result.
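
The frame-insertion process is easy to mimic in a few lines of Python.  This is only a sketch under assumptions of my own: the ball moves at a uniform speed, time runs from 0 to 1, and the names ball_position and insert_midframes are invented for the purpose.  Exact fractions are used so that the halving itself introduces no rounding.

```python
from fractions import Fraction
from typing import List


def ball_position(t: Fraction, a: Fraction = Fraction(0), b: Fraction = Fraction(1)) -> Fraction:
    """Position of the ball at time t in [0, 1], moving uniformly from a to b."""
    return a + (b - a) * t


def insert_midframes(times: List[Fraction]) -> List[Fraction]:
    """Insert a new frame time halfway between every pair of adjacent frames."""
    refined = [times[0]]
    for earlier, later in zip(times, times[1:]):
        refined.append((earlier + later) / 2)
        refined.append(later)
    return refined


# Start with only the first and last frames, then refine three times.
frames = [Fraction(0), Fraction(1)]
for _ in range(3):
    frames = insert_midframes(frames)

if __name__ == "__main__":
    for t in frames:
        print(t, ball_position(t))
```

Each pass roughly doubles the number of frames and halves the gap between them, yet every pass leaves gaps into which another frame could be placed.  The loop can be run as many times as memory allows and the list is still finite.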

This is representative of a large limitation of a binary representation of continuity.  While binary operations are able to perform logical processes with accuracy, the representation of continuity is limited to an approximation.  The producer of our ball and table animation is forced to select points in time which will be shown on the frames of the animation and then calculate where the ball would be at each such point in time.  While such an approximation is more than suitable for the domain of animation (as the eye can be tricked into seeing continuity), it is less desirable for a true simulation where the interaction of events across a continuous flow of time is to be observed, as opposed to the state of events at a discrete point in time.

DnL8Tar
-PCM

Monday, July 12, 2010

RDF as a Database Solution

With "linked data" taking a front seat in my research efforts over the last few years, my use of RDF has increased in state from "I have no idea what RDF is" to "RDF is the backbone of my system architecture."  I have a tendency to go overboard with such things, especially when learning a new technology or technique.  Case in point, Recursive-Living.  The JavaScript framework I created and built Recursive-Living on top of was made solely to deepen my knowledge of AJAX style programming and its various pitfalls.  This being the case, the entire site is delivered via AJAX and JavaScript.  While this is admittedly overkill, such a rigorous approach to the development of conceptual knowledge yielded many fruits, most detailing why such an architecture is ineffectual in practice.


Two web applications which I have spoken of before, if not in this forum then in Buzz, Transaction-Tracker and Comic-Post, I have built wholly on an RDF backend, again, for the sake of deepening my knowledge of RDF, RDFa, and OWL.  The intention for both of these applications was, in part, the study and extrapolation of the needs of a suitable framework upon which such applications could be built, the very same framework of which I have spoken previously.  Delving deeper into this intention is fodder for a future post which I might write.  During development of the aforementioned applications I have compiled a good deal of information around the conceptual use of such systems.


There are many distinct advantages to using an RDF database, or an RDF abstraction over a database schema, as a data store.  These advantages are considered in comparison to more traditional data store architectures, where tables are designed around the data and relationships between the tables are conceptualized on smooth white boards spanning long walls in stark rooms.  I list below the advantages which I have formalized to date.

  • The tables with which the developer need be concerned are limited to the RDF implementation.  If new types of data are added to an existing application or if existing types of data are augmented or altered in any way, the tables themselves need not change, only the data on the tables.
  • Relationships between resources are concretely defined.  This is as opposed to the more abstract definition of relationships between tables which often occurs based on key columns.  Resource relationships are defined by linking their URIs together via a property or attribute.  Since these definitions are concrete they can be easily traversed without the need for explicit codification of each relationship.  Such an implementation allows for processes such as a recursive resource lookup, where a resource, and all its related resources, and all resources related to those resources, etc., are found, built, and returned to the user.
  • RDFa output is made trivial if your backend is already in RDF.  While not particularly important at the moment, with more web applications being built around this concept, it will soon become crucial in promoting a site and helping automated agents find and navigate said site.
  • Type checking and existential validation are simplified in this model since each URI is a resource in your data store.  If using RDF in conjunction with RDFS (and OWL, though OWL is not immediately necessary for this purpose), the type of a URI (or class of a URI) can be quickly verified via a SPARQL ASK query.  Again, since the application need only trouble itself with one set of tables, only one such ASK query need be defined.  In comparison, one would need to define a query for each table in a classic data store infrastructure in order to see if a particularly keyed row existed.
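
To make the second and fourth points a little more tangible, here is a toy sketch in plain Python, with a set of (subject, property, object) tuples standing in for the triple store.  Nothing here is a real RDF library or real SPARQL: the ask function is only a loose analogue of an ASK query, and every URI, property name, and function name is hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"

# Hypothetical data: every URI and property below is invented for illustration.
TRIPLES: Set[Triple] = {
    ("ex:a-funny-comic-1", RDF_TYPE, "ex:Comic"),
    ("ex:a-funny-comic-1", "ex:drawnBy", "ex:artist-7"),
    ("ex:artist-7", RDF_TYPE, "ex:Artist"),
    ("ex:artist-7", "ex:memberOf", "ex:studio-2"),
    ("ex:studio-2", RDF_TYPE, "ex:Studio"),
}


def ask(triples: Iterable[Triple], subject: Optional[str] = None,
        predicate: Optional[str] = None, obj: Optional[str] = None) -> bool:
    """Return True if any triple matches the pattern; None acts as a wildcard."""
    return any(
        (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
        for s, p, o in triples
    )


def related_resources(triples: Set[Triple], uri: str,
                      seen: Optional[Set[str]] = None) -> Set[str]:
    """Recursively collect uri and every resource reachable through its properties."""
    seen = set() if seen is None else seen
    if uri in seen:
        return seen  # already visited; this also guards against cycles
    seen.add(uri)
    for s, p, o in triples:
        # Follow every property except the type link, which points at a class.
        if s == uri and p != RDF_TYPE:
            related_resources(triples, o, seen)
    return seen


if __name__ == "__main__":
    print(ask(TRIPLES, "ex:artist-7", RDF_TYPE, "ex:Artist"))
    print(sorted(related_resources(TRIPLES, "ex:a-funny-comic-1")))
```

Note that the single ask function serves for every type of resource, and the traversal never needs to be told which relationships exist.  Both follow from the data living in one uniform shape.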

Certainly there are disadvantages as well, and one topic upon which I have not yet begun to tread is performance comparison.  These, again, I leave as fodder for a future post.


DnL8Tar
-PCM

Wednesday, May 5, 2010

Powered By [enter name here]

I have been a long time advocate of home-brewing (code, not beer... unless you know what you're doing, though I suppose the same could be said for both).  I refused, perhaps from a desire to learn, or perhaps sheer arrogance, to use packages which were not included in a programming language's core.  For me, to download a library, which someone else had written, and use it in my own code was closer to blasphemy than my sensibilities would let me tread.  In retrospect the distinction between those things distributed with a language via its core and that which one would need to download seems paltry, though my adherence to a rejection of the latter has led to some interesting thoughts on levels of knowledge.

Once, I heard it said, "You do not need to know how to build a car in order to drive a car."  Statistically this is sound simply based on the observable myriad of car-drivers in comparison to the relatively few auto-repair workers.  The statement may also be applied to domains outside the realm of automotive engineering.  For our purposes, we will apply it to programming.  My prior personal decision to set aside external packages and, by inference, frameworks, forced me to learn how to build these packages and frameworks.  However, there was a limit to the depths to which I was willing to delve.  I did not, for instance, write my own programming language before using PHP.  Similarly I did not write an operating system before turning on my computer.  Essentially, the distinction between functionality provided by a language's core distribution and add-on functionality established the domain, or level, of knowledge I was endeavoring to internalize through work on my personal and academic projects.

Often I've said (sometimes to others but usually to myself) that a serious Computer Scientist should not use a built package or framework without having made a comparable package themselves.  The statement is too broad, however, and does not accurately reflect the concept of knowledge levels.  It would be better stated in the context of a particular domain, such as "someone who is serious about learning how content management systems work should not heavily use a content management system without building their own."  Similarly, one could say "someone who is serious about understanding how a programming language works should write their own programming language."  Creating such a system may seem a waste of time in so much as the system will more than likely be tossed aside at a later point in favor of a more robust and mature system of the same genre.  The experience, however, is edifying.

One should explore one's purpose before setting forth on an endeavor of this ilk.  Are you attempting to truly understand how a system, or genre of systems, works, or are you attempting to set up something that functions as expected without any grandiose visions of future changes outside the bounds of what a particular framework provides?  If your purpose is the latter, home-brewing is admittedly overkill and, assuming a first attempt at brewing said genre of system, would result in an application of questionable stability.

This line of thinking was largely inspired by my explorations in jQuery, a JavaScript library which I had given a wide berth until recently.  While the library is elegant, understanding and using it is not trivial, barring a willingness to code on blind faith and the kindness of support forum members.  Statements like "functions are first class citizens," with which the jQuery documentation is rife, hold little weight to someone who has not coded a JavaScript closure.

Thoughts?  

DnL8Tar
-PCM

Sunday, April 25, 2010

TTT Discourse

Rules of a Tic Tac Toe universe
  1. A statement is valid if it is made in order [rule 2] and if the square of the board indicated by the statement is un-owned [rule 3].
  2. A statement made by a player is considered "in order" if the prior valid statement was made by the opposite player or if the player is player 1 and their statement is the first of the game.
  3. When a player makes a valid statement, ownership of the square on the board coinciding with the statement is given to the player.
  4. If a player owns three squares all in the same row, column, or diagonal of the board, that player has "won" the current game. The opposite player has "lost".
  5. If all squares of the board are owned and neither player has won, the game is drawn. Neither player wins and neither player loses.
  6. Once a winner is established or the game is drawn, ownership of all squares on the board is revoked returning each square to a neutral state. This starts a new game or iteration/generation of the universe.


I submit for consideration a reflection on an abstraction of the game of Tic-Tac-Toe.  This abstraction considers the moves of a tic-tac-toe game as a universe of discourse.  The rules of tic-tac-toe are, along the same line, the rules of the universe, governing the reaction of the universe to each "statement" made.

Three distinct entities make up the universe: two players and a board upon which the players interact.  The players I will refer to as player 1 and player 2 when differentiation is necessary.  The board is, to those who are familiar with the game, a standard tic-tac-toe board, consisting of 9 squares laid out in a 3x3 grid.  Players are the only acting entities.  Their interaction is performed via the board using the vocabulary set forth by the universe.  For purposes of the abstraction the specific semantics of the vocabulary are inconsequential so long as a player can formulate any valid tic-tac-toe move via use of the vocabulary.  The following set is one such vocabulary: { (-1,1), (0,1), (1,1), (-1,0), (0,0), (1,0), (-1,-1), (0,-1), (1,-1) } where each element in the set represents a square on the board.  Using a standard Cartesian coordinate system, the element (-1,-1) represents a move to the bottom left square and similarly the element (1,1) represents a move to the top right square.

At any point a player may make a statement by selecting a single element from the vocabulary.  The universe then responds to the statement.  As noted the players may make any statement at any time.  The notion of "turn" or "valid move" is not instilled in the player but in the rules of the universe.  This being the case it is possible for the players to make statements which do not change the state of the board based on the rules of the universe.  For instance, assume player 1 makes the statement (1,1) and player 2 subsequently makes the same statement.  The second statement is, in a sense, rejected by the universe as the state of the board is not changed by the statement. 

Such statements I will call meaningless.  Thus the meaningfulness of a statement is determined by whether the statement results in a change of the state of the board.  If it does, then the statement is considered meaningful, otherwise it is considered meaningless.  Of course, the rules of the universe determine whether a given statement changes the state of the board.
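
As a rough illustration of how the six rules and the vocabulary fit together, the universe can be sketched in Python.  This is one possible reading, not a definitive one.  The class name Universe, the method name state, and the last_result attribute are all my own inventions, and I am assuming player 1 opens every new game.  The return value of state is exactly the meaningfulness of the statement.

```python
from typing import Dict, Optional, Tuple

Square = Tuple[int, int]

# The vocabulary: one (x, y) element per square, listed top row first.
VOCABULARY = [(x, y) for y in (1, 0, -1) for x in (-1, 0, 1)]

# The eight winning lines: three rows, three columns, two diagonals.
LINES = (
    [[(x, y) for x in (-1, 0, 1)] for y in (-1, 0, 1)]
    + [[(x, y) for y in (-1, 0, 1)] for x in (-1, 0, 1)]
    + [[(-1, -1), (0, 0), (1, 1)], [(-1, 1), (0, 0), (1, -1)]]
)


class Universe:
    def __init__(self) -> None:
        self.owner: Dict[Square, int] = {}       # square -> owning player (1 or 2)
        self.last_speaker: Optional[int] = None  # who made the prior valid statement
        self.last_result: Optional[str] = None   # outcome of the last finished game

    def state(self, player: int, square: Square) -> bool:
        """Apply a statement; return True if it changed the board (was meaningful)."""
        if self.last_speaker is None:
            in_order = player == 1               # rule 2: player 1 opens the game
        else:
            in_order = self.last_speaker != player
        if not in_order or square not in VOCABULARY or square in self.owner:
            return False                         # rule 1: the statement is not valid
        self.owner[square] = player              # rule 3: ownership is granted
        self.last_speaker = player
        if any(all(self.owner.get(sq) == player for sq in line) for line in LINES):
            self._reset(f"player {player} won")  # rules 4 and 6
        elif len(self.owner) == len(VOCABULARY):
            self._reset("draw")                  # rules 5 and 6
        return True

    def _reset(self, result: str) -> None:
        """Revoke all ownership, starting a new generation of the universe."""
        self.last_result = result
        self.owner = {}
        self.last_speaker = None
```

Under this sketch, if player 1 states (1,1) and player 2 then states (1,1) as well, the second call returns False and the board is left untouched, which is the rejection described above.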

Given this abstraction one can conceive of a learning algorithm applied to the players such that the players learn from each statement made.  In order to learn, the players must be directed toward some goal.  If they are not, then the algorithm has no bearing with which to process each move.  One obvious goal of the game would be to win.  A similar and secondary goal would be to not lose.  A third goal of some usefulness would be to make only meaningful statements.

Given these goals, coupled with well defined rules over the universe, the players could conceivably "learn" the rules of tic-tac-toe based on their observations of the reactions of the universe to their statements and the other player's statements.  Further, the players may develop stratagems coinciding with their goals of winning and not losing (though this latter item is of lesser interest to me at the moment).


DnL8Tar
-PCM

Thursday, April 1, 2010

Resource Driven Development - A Definition

As a thirty second Google (Topeka) search for the phrase "Resource Driven Development" did not turn up anything relevant to my purpose, and as I never allocate much more than thirty seconds to a Google search, I am going on blog-record with what I would define the phrase to mean.

Resource Driven Development refers to development done within a framework where the behavior of a system is based on resources defined within a system-specific ontology.  

This model of development is precisely what my framework hopes to deliver, abstracting presentation logic into resources and data population logic into the ontology.

DnL8Tar
-PCM