Posted on 2008-07-05
I like to end my URLs with a file name (just my personal preference). For SEO, I want to make sure that the default page (index.html in my case) for a directory is indexed only once, so I wrote a little RedirectIndex stub and mapped it to the various possible directories.
Internally, I always use the page name, so I figured that was good enough. However, Googlebot seems to guess that index.html is the default page, and I caught it trying to access a page with a query string but without the file name. In other words, it tried to hit /tag/square/?page=2 instead of /tag/square/index.html?page=2.
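Changing the mappings to end in /(?:[?].*)? instead of / solved it. The group has to be non-capturing, because webapp passes every captured group to the handler as a positional argument (the handler.get(*groups) call in the traceback). With a capturing group you get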
Traceback (most recent call last):
  File "/path/to/google_appengine/google/appengine/ext/webapp/__init__.py", line 499, in __call__
    handler.get(*groups)
TypeError: get() takes exactly 1 argument (2 given)
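To see the difference between the two kinds of group without running App Engine, here is a minimal sketch using only Python's re module. The dispatch function is a hypothetical stand-in for webapp's routing, not its actual code: it assumes the pattern is anchored and matched against the URL including the query string (as the behavior above suggests), and it copies the handler.get(*groups) call from the traceback. The RedirectIndex class here is a dummy whose get() takes no URL arguments.

```python
import re


class RedirectIndex(object):
    """Dummy handler (hypothetical); like the real stub, get() takes no URL arguments."""

    def get(self):
        return "redirect to index.html"


def dispatch(pattern, url, handler):
    """Hypothetical stand-in for webapp routing: match, then call get(*groups)."""
    match = re.match("^" + pattern + "$", url)
    if match is None:
        return None
    # Every captured group becomes a positional argument, as in the traceback.
    return handler.get(*match.groups())


url = "/tag/square/?page=2"
non_capturing = r"/tag/square/(?:[?].*)?"  # (?:...) captures nothing
capturing = r"/tag/square/([?].*)?"        # (...) captures the query string

# Non-capturing: groups() is empty, so get() is called with no extra arguments.
print(dispatch(non_capturing, url, RedirectIndex()))

# Capturing: groups() has one entry, so get() receives an unexpected argument.
try:
    dispatch(capturing, url, RedirectIndex())
except TypeError as exc:
    print("TypeError:", exc)
```

Note that the capturing version fails even for the bare /tag/square/ URL: an optional group that does not participate in the match still shows up in groups() as None, so get() is handed one argument too many either way.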
Tags: appengine