blog.rakeshpai.me: November 2007

I've learnt most of this primarily by reading through Google Calendar's JavaScript Client Library code. I've also picked up clues from lots of other material around the Internet. There are also some minor improvements I've added.

So, here's the use-case: You probably already have a REST (or similar) API for server to server communication. Having a JavaScript API would be a great idea (after all, JavaScript is the most deployed programming language available on almost every platform in the form of a browser runtime). This poses many problems. Most significantly, browsers are very strict about the same origin policy. You are aware of certain hacks out there to use JavaScript across domains, but at best they give you read access or rely on browser plugins. You can do writes using query string parameters, but you know that that's just plain wrong.

Whatever be your solution to this problem, you want to play within the browser's security model, and not depend on any browser-specific security loopholes. Another very important thing you want to achieve is to ensure that your API users do not have to do any setup at their end – be it in terms of installing a server-side proxy, or jumping through hoops of any other kind. If a setup is unavoidable, it should be very simple to do, requiring little or no effort. You might also add additional requirements of user authentication (after all, you are letting them do writes), preferably at your domain – OpenID style – and have access to your cookies even when your application is being used from another domain entirely.

People might point out solutions like CrossSafe and Subspace. From what I gather of both these ideas, their goal is to secure your site from any third-party script snippet. That is not a necessary goal in our case. Also, both these techniques rely very heavily on some form of setup at the API consumer's end (which aren't very easy to do either – may even be impossible for say shared hosting environments), which we don't want to have. The technique I'm suggesting here is very similar in it's operation to both Subspace and CrossSafe, but eliminates (or reduces drastically) the need for any setup at the user's end.

The JSONRequest specification also needs mentioning. Unfortunately, the spec itself is rather new. Needless to say, there's no native working implementation of it as of this writing. CrossSafe comes rather close as an implementation, but it's not complete. (To make matters worse, completing the implementation will require even more server-side co-operation at the API consumer's end.) That said, I don't know why Doug Crockford has decided to keep PUT and DELETE methods out of the spec, among others. I guess it might be for simplicity. However, I think in today's RESTful days not having those methods supported is not a good idea. If Crockford's spec ever becomes the standard, I will be a little unhappy that the additional methods are not supported. The API creation technique I'm mentioning here supports all the HTTP methods that the browser supports for HTML forms (which is only GET and POST for all major browsers to the best of my knowledge), but at least it's a browser limitation – not one imposed by this technique.¹

So, let's get started. Here's what you require to get cross-domain read write JavaScript APIs to work.

The "setup" required at the client's end is that he should have at least one static cacheable resource embedded in the page where he's consuming the API, which is loaded from the same domain as his page. This could be in the form of a static CSS file, or an image. If the page doesn't have either, it will be required to insert one – maybe in the form of a 1px image hidden away by using inline style attributes. This is usually not too much to ask for, considering that pages are either made up of spacer GIFs or CSS documents, usually loaded from within the same domain. The static resources I mentioned could even be from a different sub-domain within the same domain, but it might complicate scripts slightly to have it set up that way. If this setup is not possible at all (oh, come on!), you could still find a work around², but I think that this is the easiest way to get things up and running.
You will need to do some setup at your end, if you are the creator of the API. In particular, you will need to setup a "proxy" page that intercepts the requests from the JavaScript client API, conditions the data, and passes it along to the REST API. This proxy page also reads the response from the REST API, conditions the data to suit the client, and flushes it down to the JavaScript.

Now, let's go over the process of actually orchestrating the communication.

The API client library is included on the page by means of a script tag pointing to your domain (your domain being the host of the client library). This is similar to including the Google Maps API on the page.
Once included, the script scans the page for the static resource mentioned above. This is done by walking the DOM looking for link or img tags, and checking the value of the href/src attribute to ensure it lies within the same domain as the calling page. The URL of this resource is stored for use later. At this point, if required, the client library can signal to the developer that it is ready for communication with the server. If the resource is not found, the client-library should throw an error and terminate.
When a request requires to be made, the client library takes the request parameters and prepares the markup for a form. This form can have any method attribute value, and should have it's action attribute set to the proxy page on your domain. The parameters to be sent to the server should be enumerated as hidden fields within the form. The client library also specifies the resource (in a RESTful sense) that needs to be acted upon. Also, the name of the static resource we had hunted down earlier is passed on to the server. This form is not appended to the document yet. This markup is then wrapped into <html> and <body> tags. The body tag should have onload=”document.forms[0].submit();”.
The client library then creates a 0px x 0px iframe, without setting the src attribute, and appends it to the page's DOM. This makes the browser think that the iframe exists in the same domain as the calling page. Then, by using the iframe document object's open(), write() and close() methods the markup created in the previous step is dumped into the iframe. As soon as the close method is called, the form gets submitted to the proxy page on your domain because of the onload in the body tag. Also note that this gives the server access to any cookies it might have created from within it's domain, letting you do things like authentication. In this way one part of the communication is complete, and the data has been sent to the server across domains. However, the iframe's document.domain has now switched to point to your domain. The browser's security model now prevents any script access to most parts of the iframe.
The proxy page sitting on your server now queries your REST API – basically doing it's thing – and gets the response. Response in hand, the proxy is now ready to flush the response to the client.
If the response is rather large in size, as might be the case with a huge GET call for instance, the proxy breaks it up into chunks of not more than say 1.5k characters³.
The proxy is now ready to flush the response. The response consists of iframes – one iframe for each of these 1.5k chunks. The iframe's src attribute is set to the static resource we had discovered earlier. It is for exactly this purpose that we had hunted the resource down and passed on the URL to the server. At the end of each of these URLs, the proxy appends one of the chunks of the response, after a “#” symbol, so that it works as a URL fragment identifier. Also, the iframe tags are each given a name attribute, so that the client script can locate them.
Meanwhile, the client-side code is where it had left off at the end of step 4 above. The script then starts polling the iframe it created to check for the existance of child iframes. This check of iframes will need to based on the iframe name the server will be sending down. It will look something like this: window.frames[0].frames[“grandChildIframeName”]. Since the static resource we have loaded into the grandchild iframe is of the same domain as the parent page, the parent page now has access to it, even the intermediate iframe is of a different domain.
The client script now reads the src attributes of the iframe, isolates the URL fragments (iframe.location.hash), and reassembles the data. This data would typically be some JSON string. This JSON can then be eval'd and passed on to a success handler. This completes the down-stream communication from the server to the client, again across domains.
With the entire process complete, the client-library can now perform some cleanup actions, and destroy the child iframe it created. Though leaving the iframe around is not a problem, it is not necessary and simply adds to junk lying around in the DOM. It's best to get rid of it.

This was simply the outline of the process, and there are several additions/improvements that can be done. For example, better control on reading/writing HTTP headers, having a reliable readyState value, error handling in case of HTTP errors (4xx, 5xx errors), handling of HTTP timeouts, etc. are all desired. However, this should be enough to get you started.

If you haven't already realized the significance of this, we should now be able to build much more sophisticated mashups that do much more than the current breed of mashups on the web. It opens up the floodgates to entirely new kind of applications on the Internet – applications we haven't seen as yet.

Let's enable better mashups! Nothing should now stop you from being able to give open secure access to your site's functionality in JavaScript.

A little creative thinking will let you circumvent the problem of browser-restricted HTTP methods when querying your REST API. Send an extra parameter to the proxy page when you are creating the form to specify which method to use. Let the proxy page then hit your REST API with the specified method.
The work around to not having any same-domain static resource would be to ask the API user to have a blank HTML page on his domain, the URL for which should be manually provided by the user to the client script. I don't think this is a great idea since it is an extra step that the API user has to do. However this can be used for one of those if-all-else-fails situations.
This 1.5k restriction is to overcome a URL length restriction in Internet Explorer, though most other browsers allow much more. Note, HTTP itself does not impose any restriction on the URL length.