Application Express 3.1.2 Upgrade – Session Zero and Redirects

I was glancing through the list of fixed bugs in the Apex 3.1.2 release and noticed that one of my long time ‘annoyances’ has been fixed.

Many people have had problems getting their Apex applications indexed by Google and other search engines. One reason is that Google (and other search engines…ok, let’s just say Google from here on in?) would often index the same page under a different session id each time. Google does not ‘understand’ the format of Apex-style URLs, so it treats each of those URLs as unique.

In short, you could end up with the same page in your application indexed multiple times, with a different session id in each and therefore Google would not treat it as the same page.

The Apex team introduced the idea of Session 0, specifically for the purpose of being able to provide a link without having to specify a valid session id (or rather 0 is considered a valid session id, however when the user requires a ‘real’ session id then one is generated at that point).

So instead of links being indexed like this –

http://foo.com/pls/apex/f?p=101:1:7678676125675

The link can be –

http://foo.com/pls/apex/f?p=101:1:0

Where the 0 represents the session id, so all the links in Google would reference the same URL.
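
If you already have links lying around with real session ids in them, the rewrite to session 0 is mechanical. Here is a minimal Python sketch of it, assuming the usual colon-separated f?p=application:page:session:... layout and nothing else in the query string. The function name to_session_zero is just something I made up for illustration; it is not part of Apex.

```python
def to_session_zero(url: str) -> str:
    """Return the URL with the Apex session component replaced by 0.

    Assumes the f?p= value is colon-separated with the session id in the
    third position (application:page:session:...). Hypothetical helper.
    """
    prefix, marker, p_value = url.partition("f?p=")
    if not marker:
        return url  # not an Apex-style f?p= URL, leave it alone

    parts = p_value.split(":")
    if len(parts) == 2:
        parts.append("0")   # application:page only, so add session 0
    elif len(parts) >= 3:
        parts[2] = "0"      # overwrite whatever session id was there
    return prefix + marker + ":".join(parts)


print(to_session_zero("http://foo.com/pls/apex/f?p=101:1:7678676125675"))
```

Anything after the session position (request, item names and values and so on) is left untouched, so only the part that made the URL look unique is changed.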

So, that’s all great isn’t it? Well…almost…unfortunately there was a problem with using Session 0, and it was to do with redirects.

First, let’s look at what happens when a user (or really a browser) requests a page using session 0 in Apex 3.1.1 and earlier (note that in the following output I’ve removed certain identifying things like IP addresses etc).

[jes@MBP ~]$ GET -d -e "http://dbvm/pls/apex/f?p=101:1:0"
Connection: close
Date: Fri, 29 Aug 2008 05:00:26 GMT
Location: f?p=101:1:0
Content-Length: 0
Content-Type: text/html; charset=UTF-8
Client-Date: Fri, 29 Aug 2008 05:00:26 GMT
Client-Peer: 192.168.0.100
Client-Response-Num: 1
Client-Warning: Redirect loop detected (max_redirect = 7)
Set-Cookie: WWV_PUBLIC_SESSION_101=1074339690918688

Here I am using the GET command from the libwww-perl toolkit to simulate a browser request for the page. The parameters I use tell the command to only show me the HTTP Response details (the -e parameter) and that I’m not interested in seeing the actual response (the -d parameter).

The key thing here is the ‘Redirect loop detected’ message. This is the GET command telling you that it has found a redirect back to the same URI. The message is a little misleading, since it sort of implies an infinite loop (which you’d think would make your browser hang). However, if we make the same URL request using plain old telnet, you’ll see the real response:

[jes@MBP ~]$ telnet dbvm 80
Trying 192.168.0.100...
Connected to 192.168.0.100...
Escape character is '^]'
GET /pls/apex/f?p=101:1:0 HTTP/1.1
HOST: foo.com

HTTP/1.1 302 Found
Date: Fri, 29 Aug 2008 05:00:56 GMT
Location: f?p=101:1:0
Set-Cookie: WWV_PUBLIC_SESSION_101=507304029881630
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Connection: close

The line with “Location: f?p=101:1:0” is the killer line here, as it tells the browser to redirect back to the same page (using a relative link rather than an absolute one). Notice, however, that a cookie is also being set.

So the sequence of events is:

  1. Browser requests a page using session 0.
  2. Server responds with a redirect location and a cookie.
  3. Browser requests the redirect location

So, essentially, whenever a browser requested a URL using session 0, your webserver would actually be hit at least twice. For a small website this might not be a problem. For a large site, however, with lots of users who bookmarked a Session 0 link to the home page, or who followed a link from another site etc, this could potentially add a big overhead to the number of requests your webserver had to handle. The redirect response is not a large amount of content to serve, but it is nonetheless a web request that needs to be handled.
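
If you want to test for this pattern yourself rather than eyeballing headers, the following is a rough Python sketch of the check. It assumes you already have the status code and the Location header from a single, non-followed request (for example, from the GET or telnet output above). The name is_self_redirect is hypothetical, and the relative Location is resolved against the request URL with the standard library’s urljoin.

```python
from typing import Optional
from urllib.parse import urljoin

# Status codes that tell the client to go and fetch another location.
REDIRECT_CODES = (301, 302, 303, 307, 308)


def is_self_redirect(request_url: str, status: int,
                     location: Optional[str]) -> bool:
    """True if the response redirects the client back to the URL it asked for.

    Hypothetical helper: the Location header may be relative (as in the
    pre-3.1.2 output), so it is resolved against the request URL first.
    """
    if status not in REDIRECT_CODES or not location:
        return False
    return urljoin(request_url, location) == request_url


url = "http://dbvm/pls/apex/f?p=101:1:0"
# Values taken from the telnet response shown above.
print(is_self_redirect(url, 302, "f?p=101:1:0"))
```

A response for which this returns True costs the webserver a second request before any page content is delivered.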

This was also a potential problem for search engines, since many of them do not always handle redirects nicely. When they try to access (and index) a page and you send them somewhere else with a redirect, they may assume that something is ‘not quite right’. It’s certainly a factor in getting Google to nicely index Apex applications.

So, let’s take a look at how it works after patching to 3.1.2. Running the same request for the same session 0 URL, we get:

[jes@MBP ~]$ GET -d -e "http://dbvm/pls/apex/f?p=101:1:0"
Connection: close
Date: Fri, 22 Aug 2008 05:02:01 GMT
Content-Length: 13352
Content-Type: text/html; charset=UTF-8
Content-Type: text/html; charset=utf-8
Client-Date: Fri, 29 Aug 2008 05:02:01 GMT
Client-Peer: 192.168.0.100
Client-Response-Num: 1
Set-Cookie: WWV_PUBLIC_SESSION_101=8141285575191180

Notice how this time there is no redirect at all. The content is returned directly (note the Content-Length response header) and the cookie is automatically set.

If you’ve never had first hand experience of the problems the previous Session 0 behaviour could cause, then this might not look that interesting. However, the fact it is now patched has huge consequences for most Apex applications out there, in two key areas:

  • Your webserver will no longer need to handle all those additional redirect requests, meaning the webserver is freed up to support even more ‘real’ end user requests.
  • Search engines can now more easily index Apex applications, without you having to do a single thing (well besides installing the patch).

In short, by applying this patch you have taken another step forward in making your Apex infrastructure much more scalable and I also expect to start seeing many more Apex applications ranked higher in Google (and other search engines….had to say it, sorry).

I’ll hopefully post some more on the other implications in some of the patches if I get a chance…

3 thoughts on “Application Express 3.1.2 Upgrade – Session Zero and Redirects”

  1. Jimbo

    Is there something I don’t understand or is this as useful as a fart in a hurricane as far as pagerank is concerned?

    If I find something interesting on asktom then I will paste the url, which includes the session_id, into my blog and google will consider the page to be unique. Now that we can use a session-zero those in the know can replace the session_id with a zero and the page will no longer be unique. But removing the session_id will do the same thing, so this has never been an issue for those in the know.

    For this feature to make an ApEx app more visible on the web, each page would have to have those little buttons which allow you to link from digg, facebook, reddit, etc etc…the number of little buttons on some sites grows by the month! And people would have to use the buttons rather than simply copy and paste the URL. I doubt this feature will push ApEx apps to the top of google’s listings.

    The session id does not belong in the url. That’s the real issue.

  2. Anton Nielsen

    Jimbo,

    It’s important in other ways as well. In general, search engines will only index a limited number of parameterized URLs (typically well less than 100). As all APEX URLs have params, most pages on a large site won’t get indexed. One way around this is to create a site index that dynamically generates all the URLs, and register that index with Google. Unfortunately, Google’s rules don’t allow ANY redirects in a site index. Hence the need for session zero and the need that it not do a redirect.

    One more note on this fix. It had an unfortunate side effect: on that very first page, when using session zero, you cannot programmatically add to the http header. I’m sure very very few people will try to do this, but I logged it as a bug and I think it will be fixed in the next release.

    Anton
