I was glancing through the list of fixed bugs in the Apex 3.1.2 release and noticed that one of my long-time ‘annoyances’ has been fixed.
Many people have had problems getting their Apex applications indexed by Google and other search engines. One reason was that Google (and other search engines…ok, let’s just say Google from here on in?) would often index the page using a different session id each time. Google does not ‘understand’ the format of Apex-style URLs and would therefore consider each URL unique.
In short, you could end up with the same page in your application indexed multiple times, with a different session id in each, and therefore Google would not treat it as the same page.
The Apex team introduced the idea of Session 0 specifically so that you can provide a link without having to specify a valid session id (or rather, 0 is considered a valid session id, and when the user requires a ‘real’ session id one is generated at that point).
So instead of links being indexed like this –
The link can be –
Where the 0 represents the session id, so all the links in Google would reference the same URL.
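To make the idea concrete, here is a minimal Python sketch of rewriting an Apex link so that it uses Session 0. It assumes the usual f?p=APP:PAGE:SESSION layout seen in the URL used later in this post; the function name and the long session ids in the usage lines are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def to_session_zero(url):
    """Return the Apex URL with its session id field replaced by 0.

    Assumes the query string looks like p=APP:PAGE:SESSION[:...].
    URLs that do not match that shape are returned unchanged.
    """
    parts = urlsplit(url)
    if not parts.query.startswith("p="):
        return url
    fields = parts.query[2:].split(":")
    if len(fields) < 3:
        return url
    fields[2] = "0"  # the third field is the session id
    return urlunsplit(parts._replace(query="p=" + ":".join(fields)))


# Two hypothetical links to the same page, each with its own session id.
first = to_session_zero("http://dbvm/pls/apex/f?p=101:1:1111111111111111")
second = to_session_zero("http://dbvm/pls/apex/f?p=101:1:2222222222222222")
print(first)
print(first == second)
```

Both links collapse to the same Session 0 URL, which is what lets a search engine treat them as one page.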
So, that’s all great isn’t it? Well…almost…unfortunately there was a problem with using Session 0, and it was to do with redirects.
First, let’s look at what happens when a user (or really a browser) requests a page using session 0 in Apex 3.1.1 and earlier (note that in the following output I’ve removed certain identifying things like IP addresses etc).
[jes@MBP ~]$ GET -d -e "http://dbvm/pls/apex/f?p=101:1:0"
Connection: close
Date: Fri, 29 Aug 2008 05:00:26 GMT
Location: f?p=101:1:0
Content-Length: 0
Content-Type: text/html; charset=UTF-8
Client-Date: Fri, 29 Aug 2008 05:00:26 GMT
Client-Peer: 192.168.0.100
Client-Response-Num: 1
Client-Warning: Redirect loop detected (max_redirect = 7)
Set-Cookie: WWV_PUBLIC_SESSION_101=1074339690918688
Here I am using the GET command from the libwww-perl toolkit to simulate a browser request for the page. The parameters I use tell the command to only show me the HTTP Response details (the -e parameter) and that I’m not interested in seeing the actual response (the -d parameter).
The key thing here is the ‘Redirect loop detected’ message. This is the GET command telling you that it has found a redirect back to the same URI. The message is a little misleading since it sort of implies an infinite loop (which you’d think would make your browser hang). However, if we simulate the same URL request using plain old telnet, you’ll see the real response:
[jes@MBP ~]$ telnet dbvm 80
Trying 192.168.0.100...
Connected to 192.168.0.100...
Escape character is '^]'
GET /pls/apex/f?p=101:1:0 HTTP/1.1
HOST: foo.com

HTTP/1.1 302 Found
Date: Fri, 29 Aug 2008 05:00:56 GMT
Location: f?p=101:1:0
Set-Cookie: WWV_PUBLIC_SESSION_101=507304029881630
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Connection: close
The line with “Location: f?p=101:1:0” is the killer line here, as it tells the browser to redirect back to the same page (using a relative link rather than an absolute one). However, also notice that a cookie is being set.
So the sequence of events is:
- Browser requests a page using session 0.
- Server responds with a redirect location and a cookie
- Browser requests the redirect location
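The sequence above can be sketched as a toy model in Python. This is only an illustration, not how Apex is implemented: it assumes that the old behaviour serves the page once the browser sends back the session cookie, which is my reading of why the browser does not hang. The function names and the cookie value are hypothetical.

```python
def old_session_zero(cookie):
    """Toy model of a session 0 request in Apex 3.1.1 and earlier.

    Assumption: without a session cookie the server redirects to the
    same URL and sets the cookie; with the cookie it serves the page.
    """
    if cookie is None:
        return 302, {"Location": "f?p=101:1:0",
                     "Set-Cookie": "WWV_PUBLIC_SESSION_101=123"}
    return 200, {}


def new_session_zero(cookie):
    """Toy model of the 3.1.2 behaviour: content and cookie in one response."""
    return 200, {"Set-Cookie": "WWV_PUBLIC_SESSION_101=123"}


def count_requests(server, max_redirects=7):
    """Count how many requests a cookie-aware client needs to get the page."""
    cookie = None
    for attempt in range(1, max_redirects + 2):
        status, headers = server(cookie)
        cookie = headers.get("Set-Cookie", cookie)  # remember any cookie we are given
        if status != 302:
            return attempt
    raise RuntimeError("redirect loop detected")


print(count_requests(old_session_zero))
print(count_requests(new_session_zero))
```

In this model the old behaviour costs two requests per page view and the new behaviour costs one.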
So, essentially, whenever a browser requested a URL using session 0, your webserver would actually be hit at least twice. For a small website this might not be a problem. However, for a large site with lots of users who bookmarked a Session 0 link to the home page, or who followed such a link from another site, this could add a big overhead to the number of requests your webserver had to handle (the redirect response is not a large amount of content to serve, but it is still a web request that needs to be handled).
This was also a potential problem for search engines. Many of them do not always handle redirects nicely: when they try to access (and index) a page and you send them somewhere else with a redirect, they may assume that something is ‘not quite right’. It’s certainly a factor in getting Google to nicely index Apex applications.
So, let’s take a look at how it works after patching to 3.1.2. Running the same request for the same session 0 URL, we get:
[jes@MBP ~]$ GET -d -e "http://dbvm/pls/apex/f?p=101:1:0"
Connection: close
Date: Fri, 22 Aug 2008 05:02:01 GMT
Content-Length: 13352
Content-Type: text/html; charset=UTF-8
Content-Type: text/html; charset=utf-8
Client-Date: Fri, 29 Aug 2008 05:02:01 GMT
Client-Peer: 192.168.0.100
Client-Response-Num: 1
Set-Cookie: WWV_PUBLIC_SESSION_101=8141285575191180
Notice how this time there is no redirect at all: the content is returned directly (note the Content-Length response header) and the cookie is automatically set.
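If you want to compare the two header dumps without eyeballing them, a small Python sketch like the following can do it. The function names are my own, and the two sample strings are trimmed copies of the headers shown in this post; the only rule applied is that a Location header means a redirect.

```python
def parse_headers(dump):
    """Parse 'Name: value' lines into a dict keyed by lower-cased name."""
    headers = {}
    for line in dump.strip().splitlines():
        name, sep, value = line.partition(":")
        if sep:
            # keep the first value if a header is repeated
            headers.setdefault(name.strip().lower(), value.strip())
    return headers


def describe(dump):
    """Say whether a header dump is a redirect or a direct response."""
    headers = parse_headers(dump)
    if "location" in headers:
        return "redirect to " + headers["location"]
    return "content served directly ({} bytes)".format(
        headers.get("content-length", "?"))


BEFORE_PATCH = """
Location: f?p=101:1:0
Content-Length: 0
Set-Cookie: WWV_PUBLIC_SESSION_101=1074339690918688
"""

AFTER_PATCH = """
Content-Length: 13352
Set-Cookie: WWV_PUBLIC_SESSION_101=8141285575191180
"""

print(describe(BEFORE_PATCH))
print(describe(AFTER_PATCH))
```

Both responses still set the session cookie; the difference is only whether the page content comes back with it.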
If you’ve never had first-hand experience of the problems the previous Session 0 behaviour could cause, then this might not look that interesting. However, the fact that it is now patched has huge consequences for most Apex applications out there, in two key areas:
- Your webserver will now not need to handle all those additional redirect requests, meaning that the webserver is freed up to support even more ‘real’ end user requests.
- Search engines can now more easily index Apex applications, without you having to do a single thing (well besides installing the patch).
In short, by applying this patch you have taken another step forward in making your Apex infrastructure much more scalable, and I also expect to start seeing many more Apex applications ranked higher in Google (and other search engines….had to say it, sorry).
I’ll hopefully post some more on the other implications in some of the patches if I get a chance…