Default Pages

Posted on 2008-07-05

I like to end my URLs with a file name (just my personal preference). For SEO, I want to make sure that the default page for a directory (index.html in my case) is indexed only once, so I wrote a little RedirectIndex stub and mapped it to each of the possible directories.

Internally, I always link using the page name, so I figured that was good enough. However, Googlebot seems to guess that index.html is the default page, and I caught it requesting a page with a query string but without the file name. In other words, it tried to hit /tag/square/?page=2 instead of /tag/square/index.html?page=2.
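
As a rough illustration of the redirect involved, here is a minimal sketch in plain Python rather than the real handler. The function name index_redirect_target and the DEFAULT_PAGE constant are hypothetical, and carrying the query string over to the target is an assumption based on the URLs above.

```python
# Hypothetical helper, not the actual RedirectIndex code.
DEFAULT_PAGE = "index.html"  # assumed default page name


def index_redirect_target(path, query_string=""):
    """Build the canonical URL for a request that names only a directory."""
    # Make sure the directory part ends with a slash before appending.
    if not path.endswith("/"):
        path += "/"
    target = path + DEFAULT_PAGE
    # Keep the query string so /tag/square/?page=2 keeps its page number.
    if query_string:
        target += "?" + query_string
    return target


print(index_redirect_target("/tag/square/", "page=2"))
```

A handler built around something like this would answer the directory URL with a redirect to the returned target, so that only the index.html form ends up indexed.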

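Changing the end of each mapping from / to /(?:[?].*)? solved it. The group has to be non-capturing because webapp passes every captured group to the handler as a positional argument (the handler.get(*groups) call in the traceback), and my get() does not accept any. With a capturing group you get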

Traceback (most recent call last):
  File "/path/to/google_appengine/google/appengine/ext/webapp/__init__.py", line 499, in __call__
    handler.get(*groups)
TypeError: get() takes exactly 1 argument (2 given)
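
The difference between the two kinds of group can be reproduced with nothing but the re module. This is only a sketch: FakeHandler and dispatch are made-up stand-ins that imitate the handler.get(*groups) call from the traceback, not webapp itself, and the /tag/square/ patterns are just the example URL from above.

```python
import re

REQUEST = "/tag/square/?page=2"  # example request from the post

# Same pattern twice: once with a capturing group, once non-capturing.
capturing = re.compile(r"^/tag/square/([?].*)?$")
non_capturing = re.compile(r"^/tag/square/(?:[?].*)?$")


class FakeHandler(object):
    """Stand-in for a handler whose get() takes no URL arguments."""

    def get(self):
        return "ok"


def dispatch(pattern, url):
    """Imitate the framework: pass every captured group to get()."""
    groups = pattern.match(url).groups()
    return FakeHandler().get(*groups)


print(non_capturing.match(REQUEST).groups())  # ()
print(capturing.match(REQUEST).groups())      # ('?page=2',)
print(dispatch(non_capturing, REQUEST))       # ok

try:
    dispatch(capturing, REQUEST)
except TypeError as exc:
    # The extra captured group becomes an unexpected positional argument.
    print("TypeError:", exc)
```

Both patterns match the same URLs. The only difference is that the capturing version hands get() one extra argument, which is what triggers the TypeError.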

Tags: appengine