I'm working on a website running on a shared Apache v2.2 server, so all configuration is via .htaccess files, and I wanted to use mod_rewrite to map URLs to the filesystem in a less-than-completely-straightforward way. Just for example's sake, let's say that what I wanted to do was this:
- Map URL www.mysite.com/Alice to filesystem folder /public_html/Bob
- Map URL www.mysite.com/Bob to filesystem folder /public_html/Alice
Now, after several hours' work carefully designing the ruleset (the real one, not the Alice/Bob one!) I put all my carefully crafted rewriting rules in a .htaccess file in /public_html, and tested it out... only to get a 500 server error!
I'd been caught out by a well documented "gotcha!" in Apache: When mod_rewrite rules are used inside a .htaccess file, a re-written URL is re-submitted for another round of processing (as if it were an external request). That happens so that any rules in the target folder of the re-written request can be applied, but it can result in some very counter-intuitive behaviour by the webserver!
In the above example, that means that a request for www.mysite.com/Alice/foo.html gets rewritten to /Bob/foo.html, and then resubmitted (internally) to the server as a request for www.mysite.com/Bob/foo.html. This is then re-rewritten back to /Alice/foo.html and resubmitted, which causes it to get re-re-rewritten to /Bob/foo.html, and so on; an infinite loop ensues, broken only when Apache gives up and returns the 500 error (by default it aborts after 10 internal redirects, a limit set by the LimitInternalRecursion directive).
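For illustration, a ruleset along the following lines would show the looping behaviour described above. This is only a sketch: it assumes the rules live in /public_html/.htaccess and that mod_rewrite is enabled there. The two rules are the same Alice/Bob patterns used in the rest of this post, and the RewriteEngine line is an assumption about the surrounding setup.

```apache
# Naive swap (sketch): loops when placed in a .htaccess file
RewriteEngine On

# /Alice/... is served from /public_html/Bob/...
RewriteRule ^Alice(/.*)?$ Bob$1 [L]
# /Bob/... is served from /public_html/Alice/...
RewriteRule ^Bob(/.*)?$ Alice$1 [L]
```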
The question is, how to ensure that a .htaccess mod_rewrite ruleset only gets applied ONCE?
The [L] flag in a RewriteRule stops all further rewriting during a single pass through the ruleset, but doesn't stop the entire ruleset from being re-applied after the re-written URL is resubmitted to the server. According to the documentation, Apache v2.3.9+ (currently in Beta) contains an [END] flag that provides precisely this functionality. Unfortunately, the web host is still using Apache 2.2, and they declined my polite request to upgrade to the beta version!
What's needed is a workaround that provides similar functionality to the [END] flag. My first thought was that I could use an environment variable: Set a flag during the first rewriting pass that would tell subsequent passes to do no further rewriting. If I called my flag variable 'END', the code might look like this:
# Prevent further rewriting if 'END' is flagged
RewriteCond %{ENV:END} =1
RewriteRule .* - [L]
# Map /Alice to /Bob, and /Bob to /Alice, and flag 'END' when done
RewriteRule ^Alice(/.*)?$ Bob$1 [L,E=END:1]
RewriteRule ^Bob(/.*)?$ Alice$1 [L,E=END:1]
Unfortunately this code doesn't work: After a bit of experimentation, I discovered that environment variables don't survive the process of re-submitting the rewritten URL to the server. The last line on this Apache documentation page suggests that environment variables ought to survive internal redirects, but I found that not to be the case.
[EDIT: On some servers, it does work. If so, it's a better solution than what follows below. You'll have to try it for yourself on your own server to see.]
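One possible explanation for the flag seeming to vanish: when Apache performs an internal redirect, it carries the previous pass's environment variables over with a REDIRECT_ prefix, so a variable set as END would show up as REDIRECT_END on the next pass. A minimal sketch under that assumption follows. It has not been tested on the server described here, and a second internal redirect (for example, one triggered by a directory index) would add another prefix, so the check may not hold in every case.

```apache
# Prevent further rewriting if a previous pass flagged 'END'.
# Assumption: after the internal redirect the variable is visible
# as REDIRECT_END rather than END.
RewriteCond %{ENV:REDIRECT_END} =1
RewriteRule .* - [L]

# Map /Alice to /Bob, and /Bob to /Alice, and flag 'END' when done
RewriteRule ^Alice(/.*)?$ Bob$1 [L,E=END:1]
RewriteRule ^Bob(/.*)?$ Alice$1 [L,E=END:1]
```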
Still, the general idea can be salvaged. After many hours of hair-pulling, and some advice from a colleague, I realised that HTTP request headers are preserved across internal redirects, so if I could store my flag in one of those, it might work!
Here's my solution:
# This header flags that there's no more rewriting to be done.
# It's a kludge until use of the END flag becomes possible in Apache v2.3.9+
# ######## REMOVE this directive for Apache 2.3.9+, and change all [...,L,E=END:1]
# ######## to just [...,END] in all the rules below!
RequestHeader set SPECIAL-HEADER-STOP-FURTHER-REWRITES-kjhsdf87653vasj 1 env=END
# If our special end-of-rewriting header is set this rule blocks all further rewrites.
# ######## REMOVE this directive for Apache 2.3.9+, and change all [...,L,E=END:1]
# ######## to just [...,END] in all the rules below!
RewriteCond %{HTTP:SPECIAL-HEADER-STOP-FURTHER-REWRITES-kjhsdf87653vasj} =1 [NV]
RewriteRule .* - [L]
# Map /Alice to /Bob, and /Bob to /Alice, and flag 'END' when done
RewriteRule ^Alice(/.*)?$ Bob$1 [L,E=END:1]
RewriteRule ^Bob(/.*)?$ Alice$1 [L,E=END:1]
...and, it worked! Here's why: Inside a .htaccess file, directives associated with various Apache modules execute in the module order defined in the main Apache configuration (or, that's my understanding, anyway...). In this case (and critically for the success of this solution) mod_headers was set to execute after mod_rewrite, so the RequestHeader directive gets executed after the rewrite rules. That means the SPECIAL-HEADER-STOP-FURTHER-REWRITES-kjhsdf87653vasj header gets added to the HTTP request iff a RewriteRule with [E=END:1] in its flag list gets matched. On the next pass (after the re-written request is resubmitted to the server) the first RewriteRule detects this header, and aborts any further rewriting.
Some things to note about this solution are:
- It won't work if Apache is configured to run mod_headers before mod_rewrite. (I'm not sure if that's even possible, or if so, how unusual it'd be).
- If an external user includes a SPECIAL-HEADER-STOP-FURTHER-REWRITES-kjhsdf87653vasj header in their HTTP request to the server, it'll disable all URL rewriting rules, and that user will see the filesystem directory structure "as-is". That's the reason for the random string of ascii characters at the end of the header name - it's to make the header hard to guess. Whether this is a feature or a security vulnerability depends on your point of view!
The idea here was a workaround to mimic the use of the [END] flag in Apache versions that don't yet have it. If all you wanted was to ensure your ruleset only runs once, regardless of which rules are triggered, then you could probably drop the use of the 'END' environment variable and just do this:
RewriteCond %{HTTP:SPECIAL-HEADER-STOP-FURTHER-REWRITES-kjhsdf87653vasj} =1 [NV]
RewriteRule .* - [L]
RequestHeader set SPECIAL-HEADER-STOP-FURTHER-REWRITES-kjhsdf87653vasj 1
# Map /Alice to /Bob, and /Bob to /Alice
RewriteRule ^Alice(/.*)?$ Bob$1 [L]
RewriteRule ^Bob(/.*)?$ Alice$1 [L]
Or even better, this (though the REDIRECT_* variables are poorly documented in the Apache v2.2 documentation, where they seem to be mentioned only here, so I can't guarantee it'd work on all versions of Apache):
RewriteCond %{ENV:REDIRECT_STATUS} !^$
RewriteRule .* - [L]
# Map /Alice to /Bob, and /Bob to /Alice
RewriteRule ^Alice(/.*)?$ Bob$1 [L]
RewriteRule ^Bob(/.*)?$ Alice$1 [L]
However, once you're running Apache v2.3.9+, I expect that using the [END] flag would be more efficient than the above solution, because (presumably) it altogether avoids the rewritten URL being re-submitted to the server for another rewriting pass.
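For comparison, here is a sketch of how the Alice/Bob example might look once [END] is available. It assumes Apache 2.3.9 or later; the RewriteEngine line is an assumption about the surrounding setup, and none of the header or environment-variable guards are needed.

```apache
# Requires Apache 2.3.9+: [END] stops the current pass (like [L]) and also
# prevents any further rewrite processing in per-directory (.htaccess) context.
RewriteEngine On

# Map /Alice to /Bob, and /Bob to /Alice
RewriteRule ^Alice(/.*)?$ Bob$1 [END]
RewriteRule ^Bob(/.*)?$ Alice$1 [END]
```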
Note that you may also want to block rewriting of subrequests, in which case you can add a RewriteCond to the don't-do-any-more-rewriting rule, like this:
RewriteCond %{ENV:REDIRECT_STATUS} !^$ [OR]
RewriteCond %{IS_SUBREQ} =true
RewriteRule .* - [L]
The idea here was a workaround to replace the use of the [END] flag in Apache versions that don't yet have it. But in fact you can use this general approach to store more than just a single flag - you could store arbitrary strings or numbers that would persist across an internal server redirect, and design your rewrite rules to depend on them based on any of the test conditions RewriteCond provides. (I can't, off the top of my head, think of a reason why you'd want to do that... but hey, the more flexibility and control you have, the better, right?)
I guess anyone who's read this far has figured out that I'm not really asking a question here. It's more a matter of my having found my own solution to a problem I had, and wanting to post it up here for reference in case anyone else has run into the same problem. That's a big part of what this website is for, right?
...
But since this is supposed to be a question-and-answer forum, I'll ask:
- Can anyone see any potential problems with this solution (other than those I've already mentioned)?
- Or does anyone have a better way of achieving the same thing?