I've learnt most of this by reading through Google Calendar's JavaScript Client Library code, and picked up further clues from other material around the Internet. I've also added some minor improvements of my own.
So, here's the use-case: You probably already have a REST (or similar) API for server to server communication. Having a JavaScript API would be a great idea (after all, JavaScript is the most deployed programming language available on almost every platform in the form of a browser runtime). This poses many problems. Most significantly, browsers are very strict about the same origin policy. You are aware of certain hacks out there to use JavaScript across domains, but at best they give you read access or rely on browser plugins. You can do writes using query string parameters, but you know that that's just plain wrong.
Whatever your solution to this problem, you want to play within the browser's security model, and not depend on any browser-specific security loopholes. Another very important goal is to ensure that your API users do not have to do any setup at their end – be it installing a server-side proxy, or jumping through hoops of any other kind. If some setup is unavoidable, it should be very simple, requiring little or no effort. You might also add a requirement of user authentication (after all, you are letting them do writes), preferably at your domain – OpenID style – with access to your cookies even when your application is being used from another domain entirely.
People might point out solutions like CrossSafe and Subspace. From what I gather of both these ideas, their goal is to secure your site from any third-party script snippet. That is not a necessary goal in our case. Also, both these techniques rely very heavily on some form of setup at the API consumer's end (which isn't very easy to do either, and may even be impossible in, say, shared hosting environments), which we don't want to have. The technique I'm suggesting here is very similar in its operation to both Subspace and CrossSafe, but eliminates (or drastically reduces) the need for any setup at the user's end.
The JSONRequest specification also needs mentioning. Unfortunately, the spec itself is rather new, and there's no native working implementation of it as of this writing. CrossSafe comes rather close as an implementation, but it's not complete. (To make matters worse, completing the implementation will require even more server-side co-operation at the API consumer's end.) That said, I don't know why Doug Crockford has decided to keep PUT and DELETE, among other methods, out of the spec. I guess it might be for simplicity. However, I think that in today's RESTful days not supporting those methods is not a good idea. If Crockford's spec ever becomes the standard, I will be a little unhappy that the additional methods are not supported. The API creation technique I'm describing here supports all the HTTP methods that the browser supports for HTML forms (which is only GET and POST for all major browsers, to the best of my knowledge), but at least that's a browser limitation, not one imposed by this technique [1].
So, let's get started. Here's what you require to get cross-domain read/write JavaScript APIs to work.
The "setup" required at the client's end is that he should have at least one static cacheable resource embedded in the page where he's consuming the API, loaded from the same domain as his page. This could be a static CSS file or an image. If the page doesn't have either, one will need to be inserted – maybe in the form of a 1px image hidden away using inline style attributes. This is usually not too much to ask for, considering that most pages already use spacer GIFs or CSS documents, usually loaded from within the same domain. The static resource could even be from a different sub-domain within the same domain, but that might complicate the scripts slightly. If this setup is not possible at all (oh, come on!), you could still find a workaround [2], but I think that this is the easiest way to get things up and running.
You will need to do some setup at your end, if you are the creator of the API. In particular, you will need to set up a "proxy" page that intercepts the requests from the JavaScript client API, conditions the data, and passes it along to the REST API. This proxy page also reads the response from the REST API, conditions the data to suit the client, and flushes it down to the JavaScript.
Now, let's go over the process of actually orchestrating the communication.
1. The API client library is included on the page by means of a script tag pointing to your domain (your domain being the host of the client library). This is similar to including the Google Maps API on the page.

2. Once included, the script scans the page for the static resource mentioned above. This is done by walking the DOM looking for link or img tags, and checking the value of the href/src attribute to ensure it lies within the same domain as the calling page. The URL of this resource is stored for later use. At this point, if required, the client library can signal to the developer that it is ready for communication with the server. If the resource is not found, the client library should throw an error and terminate.

3. When a request needs to be made, the client library takes the request parameters and prepares the markup for a form. This form can have any method attribute value, and should have its action attribute set to the proxy page on your domain. The parameters to be sent to the server should be enumerated as hidden fields within the form. The client library also specifies the resource (in a RESTful sense) that needs to be acted upon. Also, the URL of the static resource we hunted down earlier is passed on to the server. This form is not appended to the document yet. This markup is then wrapped in <html> and <body> tags. The body tag should have onload="document.forms[0].submit();".

4. The client library then creates a 0px x 0px iframe, without setting the src attribute, and appends it to the page's DOM. This makes the browser think that the iframe exists in the same domain as the calling page. Then, by using the iframe document object's open(), write() and close() methods, the markup created in the previous step is dumped into the iframe. As soon as the close method is called, the form gets submitted to the proxy page on your domain because of the onload in the body tag. Also note that this gives the server access to any cookies it might have created from within its domain, letting you do things like authentication. In this way one part of the communication is complete, and the data has been sent to the server across domains. However, the iframe's document.domain has now switched to point to your domain. The browser's security model now prevents any script access to most parts of the iframe.

5. The proxy page sitting on your server now queries your REST API – basically doing its thing – and gets the response. Response in hand, the proxy is now ready to flush the response to the client.

6. If the response is rather large in size, as might be the case with a huge GET call for instance, the proxy breaks it up into chunks of not more than, say, 1.5k characters [3].

7. The proxy is now ready to flush the response. The response consists of iframes – one iframe for each of these 1.5k chunks. Each iframe's src attribute is set to the static resource we had discovered earlier. It is for exactly this purpose that we had hunted the resource down and passed its URL on to the server. At the end of each of these URLs, the proxy appends one of the chunks of the response, after a "#" symbol, so that it works as a URL fragment identifier. Also, the iframe tags are each given a name attribute, so that the client script can locate them.

8. Meanwhile, the client-side code is where it had left off at the end of step 4 above. The script then starts polling the iframe it created to check for the existence of child iframes. This check will need to be based on the iframe name the server will be sending down. It will look something like this: window.frames[0].frames["grandChildIframeName"]. Since the static resource we have loaded into the grandchild iframe is of the same domain as the parent page, the parent page now has access to it, even though the intermediate iframe is of a different domain.

9. The client script now reads the src attributes of the iframes, isolates the URL fragments (iframe.location.hash), and reassembles the data. This data would typically be some JSON string. This JSON can then be eval'd and passed on to a success handler. This completes the down-stream communication from the server to the client, again across domains.

10. With the entire process complete, the client library can now perform some cleanup actions, and destroy the child iframe it created. Though leaving the iframe around is not a problem, it is not necessary and simply adds to the junk lying around in the DOM. It's best to get rid of it.
This was simply the outline of the process, and there are several additions/improvements that can be made. For example, better control over reading/writing HTTP headers, a reliable readyState value, error handling for HTTP errors (4xx, 5xx), and handling of HTTP timeouts are all desirable. However, this should be enough to get you started.
If you haven't already realized the significance of this, we should now be able to build much more sophisticated mashups that do much more than the current breed of mashups on the web. It opens up the floodgates to an entirely new kind of application on the Internet – applications we haven't seen as yet.
Let's enable better mashups! Nothing should now stop you from being able to give open secure access to your site's functionality in JavaScript.
[1] A little creative thinking will let you circumvent the problem of browser-restricted HTTP methods when querying your REST API. Send an extra parameter to the proxy page when you are creating the form to specify which method to use. Let the proxy page then hit your REST API with the specified method.
[2] The workaround for not having any same-domain static resource would be to ask the API user to have a blank HTML page on his domain, the URL for which should be manually provided by the user to the client script. I don't think this is a great idea since it is an extra step that the API user has to do. However, this can be used for one of those if-all-else-fails situations.
[3] This 1.5k restriction is to overcome a URL length restriction in Internet Explorer, though most other browsers allow much more. Note that HTTP itself does not impose any restriction on the URL length.
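The proxy-side pieces touched on in these notes (the method override parameter and the 1.5k chunking into named iframes) might look roughly like the sketch below. It is written in JavaScript only to match the client code; the proxy could be in any server language. The function names, the "chunk0", "chunk1", ... naming scheme, and the idea of URL-encoding the payload before splitting it are assumptions of this sketch, not something the technique prescribes.

```javascript
// All names here are hypothetical and chosen for illustration.
const MAX_CHUNK = 1500; // the 1.5k limit mentioned above
const ALLOWED_METHODS = ["GET", "POST", "PUT", "DELETE"];

// Decide which HTTP method the proxy should use against the REST API:
// an explicit override parameter wins over the form's own method.
function resolveMethod(formMethod, override) {
  const method = String(override || formMethod).toUpperCase();
  if (!ALLOWED_METHODS.includes(method)) {
    throw new Error("Unsupported method: " + method);
  }
  return method;
}

// Split a string into pieces of at most `size` characters.
function splitIntoChunks(text, size) {
  const chunks = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  if (chunks.length === 0) {
    chunks.push(""); // always emit at least one iframe
  }
  return chunks;
}

// Escape a value for use inside a double-quoted HTML attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build one named iframe per chunk, each pointing at the consumer's static
// resource with the chunk appended as the URL fragment. The payload is
// URL-encoded first so that every chunk is safe inside a fragment.
function buildResponseMarkup(staticUrl, payload, size = MAX_CHUNK) {
  return splitIntoChunks(encodeURIComponent(payload), size)
    .map(function (chunk, i) {
      return '<iframe name="chunk' + i + '" src="' +
        escapeAttr(staticUrl) + "#" + chunk + '"></iframe>';
    })
    .join("");
}

// Client side: join the fragments in order, then decode once at the end.
// Decoding after joining matters because a "%XX" escape may straddle chunks.
function reassemble(fragments) {
  const joined = fragments
    .map(function (fragment) { return fragment.replace(/^#/, ""); })
    .join("");
  return decodeURIComponent(joined);
}
```

Since the static resource URL arrives from the client, the sketch escapes it before placing it in the markup. A real proxy would probably also want to check that it is a plain http(s) URL.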
19 comments:
You will probably want to check out "how to cross domain javascript" for a clean method of read/write. There's a couple of caveats, but the solution solves many cases.
Thanks, Alex. I actually did come across your site when researching about how to achieve cross-domain Ajax cleanly. The only reason I gave it a pass (and Subspace and CrossSafe), was because of the setup required for the API user.
I guess getting cross-domain Ajax to work has been discussed at length here and elsewhere, and there are various workable solutions at hand. This solution's USP is that there's absolutely no setup required by API users to start using the API. No DNS (re)mapping, no "put this file on your server", no "use XYZ toolkit to write your JavaScript", nothing. As a bonus, there are no additional resources requested in the form of bridges of any kind.
Hi,
Thanks for the great tutorial. I have run into a problem though. I am creating a new object in my api that has a key. Thus, the constructor takes the key as an argument, connects to the server, retrieves a bunch of information according to the key and stores it in various object variables.
function myObject(key) {
    //get info from server (cross domain)
    this.variable = info; //info is obtained from server
}
Now when you say that "polling the iframe it created to check for the existence of child iframes", I couldn't find any method other than using the setInterval function to do this.
function myObject(key) {
    //get info from server (cross domain)
    this.func = function () {
        //connect to server via iframe based proxy and get info
        clearInterval(id);
        this.variable = info; //info is obtained from server
    }
    id=setInterval(this.func,10);
}
But now, the constructor ends before the object variables are set!! Is there any way to avoid this? Is there some way to stall the constructor till all object variables are set?
Hi Ankit,
This seems like a fairly basic JavaScript question. AFAIK, there's no easy way to "stall" execution in JavaScript (though I'm fairly sure there can be complex ways of doing it). In any case, you should not need to - it's probably a bad design idea anyway.
Instead, you should use the power of closures here. Set all your variables and methods inside the setTimeout if you want to wait for computing results inside the setTimeout first.
Since the setTimeout function call is inside the closure, your variable values are "trapped", even if the execution of the constructor is complete, giving you access to all the variables, even letting you set them at a later time.
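To make the closure idea concrete, here is a minimal sketch. It assumes a hypothetical fetchInfo(key, callback) function standing in for the whole iframe round trip, and an optional onReady callback; neither name comes from the post. The constructor returns immediately, and the callback fills in the fields later because it still holds a reference to the object.

```javascript
// MyObject, fetchInfo and onReady are hypothetical names for illustration.
function MyObject(key, fetchInfo, onReady) {
  const self = this; // captured by the closure below
  self.key = key;
  self.ready = false;

  // fetchInfo stands in for the cross-domain request plus polling.
  // Its callback runs some time after the constructor has returned.
  fetchInfo(key, function (info) {
    self.variable = info;
    self.ready = true;
    if (onReady) {
      onReady(self);
    }
  });
}

// Usage, with some asynchronous fetchInfo implementation:
//   new MyObject("abc123", fetchInfo, function (obj) {
//     // obj.variable is set by the time this runs
//   });
```

Code that depends on the fetched values goes inside onReady rather than directly after the new MyObject(...) call, so nothing needs to "stall".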
Great blog post!
How do you solve the situation when: "the static resources I mentioned could even be from a different sub-domain within the same domain"?
Anonymous,
So, if the static resource is from a different subdomain within the same domain, you might face a problem in step 8 above. (I say "might" because I haven't tried this as yet. So, obviously, the rest of this comment is just a theory - but one I think should work perfectly.)
Now, when polling for the iframe, you won't directly have access to the iframe, since it's of a different document.domain (subdomain included). Moreover, the document.domain cannot be changed. It can, however, be made 'shorter' in some ways. Which means that images.example.com and www.example.com can both have their document.domain shortened to example.com.
Once that's done, the parent window and the grandchild iframe can communicate with each other seamlessly.
Thanks for the answer!
I assume that you can shorten document.domain in the parent window without any problem. But if you, for example, set the iframe src to a static resource image located at images.example.com, is it in some way possible to shorten the document.domain for the grandchild iframe as well?
It looks like the javascript in the parent window (at example.com) doesn't have permission to shorten the document.domain in the grandchild iframe (at images.example.com).
Is it perhaps possible to set the document.domain before the iframe src is loaded?
I'm not really a javascript programmer, but I'm always curious when it comes to corner cases like this... :)
Anonymous,
It sounds like you might be right about the domain switching thing in the context of sub domains. I will have to try it out to be sure. I shall get back to you about it soon.
In step 2 I am assuming that they are verifying the domain of the parent page so they can match it up against an API key on their end, to prevent evil sites from silently accessing people's calendars. You say that "The URL of this resource is stored for use later." How is it stored such that they prevent the parent page from substituting in a "trusted" domain that may already be authenticated? (a version of a CSRF attack)
My bad, I missed the part in step 7 where you explain how the static resource is used.
I still have a general question about how this approach prevents sites from using your API to gain access to your authenticated users' data.
Hi Eric.
The idea of this technique is to _give_ access to other sites that want authenticated user data.
The best (and probably sufficient) way to ensure that this data doesn't fall into unauthorized hands is to have the user opt-in to give the data to any site he wishes. This can be done using simple mechanisms - like what Google Calendar API currently does. It's kind of like an OpenID authentication of sorts, having a similar user experience.
True. But if your widget/mashup/API is being used on lots of sites it could be inconvenient for the user to have to grant access to each site. You would have to balance this against the sensitivity of the data being delivered by the API or the potential destructiveness/spamminess of the API on the server side.
But if you could securely verify that the domain of the hosting page matched the domain of the API key, and the API keys were granted in a secure fashion (ie, not just anyone can get one), then you could grant access to the site based on their API key.
I'm thinking about Facebook Beacon as an example here.
True. I agree completely.
Now if only we could find a decent way to decide who should get an API key.
Yes. In the case of Beacon it's only Beacon partners that get a key and that's a whole contractual process - not exactly scalable/viral but it doesn't have to be for them. Another route is to charge some small amount for a key and ask for a credit card. Or freely give keys that access your test environment but charge for production keys.
Thanks for your reply.
Hello, Rakesh, very nice post!
I'm trying to implement your technique and I'm almost there. I just can't figure out one thing though: the server builds as many iframes as needed to fragment the data. How do I make sure that all the response markup has finished loading so that I can access the iframes from my js API?
Because right after I call the close method of the iframe to send the request (as you describe in step 4) the next line gets executed (the one that tries to retrieve the data from the response iframes), and as the response hasn't yet arrived there is no data to collect, no iframes to be found...
I have searched a lot (you wouldn't believe) and couldn't figure that out. I've read about closures but still can't see how it fits into this scenario.
If you could provide a sample implementation of your technique, it would be most helpful.
Thanks in advance!
levidad,
I guess I should've explained that better. The only way I've been able to do this is to look for an identifier in the data that signals that the data is complete. For example, you could terminate the data with something like "||end||" or such to indicate that there's nothing more to be flushed down.
The client meanwhile, polls the iframe, assembles the data, and checks if the data ends with ||end||. If it doesn't, there's more to be read. If it does, the ||end|| can be stripped, and the rest of the data can be used.
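A minimal sketch of that check, assuming the "||end||" marker suggested above and a hypothetical readFragments() function that returns the fragments collected so far, in order (in a browser it would read location.hash from each named grandchild iframe). The timeout parameters are also assumptions.

```javascript
// END_MARKER follows the "||end||" suggestion; all function names are hypothetical.
const END_MARKER = "||end||";

// Returns the payload with the marker stripped once the data is complete,
// or null while more data is still expected.
function assembleIfComplete(fragments) {
  const data = fragments.join("");
  if (!data.endsWith(END_MARKER)) {
    return null;
  }
  return data.slice(0, -END_MARKER.length);
}

// Polls readFragments() until the marker shows up or the deadline passes.
function pollForResponse(readFragments, onDone, onTimeout,
                         intervalMs = 50, timeoutMs = 10000) {
  const started = Date.now();
  const timer = setInterval(function () {
    const data = assembleIfComplete(readFragments());
    if (data !== null) {
      clearInterval(timer);
      onDone(data);
    } else if (Date.now() - started > timeoutMs) {
      clearInterval(timer);
      onTimeout();
    }
  }, intervalMs);
  return timer;
}
```

Because the check is done on the joined string, it still works when the marker happens to be split across two chunks. One limitation of this sketch: a payload whose partial data ends in "||end||" by coincidence would be treated as complete, so the marker should be something the data cannot contain.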
It's nice to know you are using this technique. Would be great if you can write about your experience using it. I'll look forward to it. Would also be great if you can show what you are using it for.
Hi, Rakesh, thanks for the help! With your information I was able to make it work and finish the core functionality.
You asked where and how I was going to use that. My company is going to build a kind of JavaScript API for a client that does not want its users to have to download the javascript file. It must be as simple as including a reference to the library in the page and calling a function for displaying some ads in a div of their choice. I thought I would help and research some cross-domain techniques and found yours interesting compared to others I've seen.
The API is not complete yet, it lacks some tuning, but that's easy work now. When it's done I will post an article in a blog I'm going to start, publishing my source code so that others can improve and extend it, and when that happens I'll remember to point this page of yours as the source of the technique, and I can also post a comment here with the link to my source code.
Thanks again and take care!
levidad, awesome. Will be looking forward to your post. Do let me know.
You can actually learn A LOT developing for a CMS. Thanks