Allowing or Blocking based on Country with .htaccess

GeoIP Apache API

Requirements
In order to run this API, you need the following installed:

Download
Downloads are available for Apache 1.3.x and Apache 2.x.

Install
See the INSTALL file included with the mod_geoip API download for detailed instructions.

Usage
mod_geoip looks up the IP address of the client end user. If you need to input the IP address instead of simply using the client IP address, you will need to use another one of our APIs.

For the country database, mod_geoip sets two environment variables, GEOIP_COUNTRY_CODE and GEOIP_COUNTRY_NAME. For other databases, see the README file included with the mod_geoip API.

It also sets two entries in Apache’s notes table with the same names as above.

For more documentation, see the README file included with the mod_geoip API download.
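
As a rough illustration of how an application might consume these variables, here is a minimal Python sketch. It assumes the script runs as CGI or WSGI under Apache, so that the variables set by mod_geoip show up in the request environment dictionary; the function name `country_from_environ` is hypothetical and not part of mod_geoip.

```python
def country_from_environ(environ):
    """Return (code, name) from a CGI/WSGI-style environ dict.

    Assumption: mod_geoip has exported GEOIP_COUNTRY_CODE and
    GEOIP_COUNTRY_NAME into the request environment.
    """
    code = environ.get("GEOIP_COUNTRY_CODE")
    name = environ.get("GEOIP_COUNTRY_NAME")
    if not code:
        # No lookup result (for example a private or unknown address).
        return (None, None)
    return (code, name)
```

In a CGI script you would pass `os.environ`; in a WSGI application, the `environ` argument of the callable.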

Examples
Redirection with mod_geoip and mod_rewrite
Below are examples of how to perform redirection based on country with mod_geoip and mod_rewrite. This configuration should be added to your Apache httpd.conf or .htaccess file.

GeoIPEnable On
GeoIPDBFile /path/to/GeoIP.dat

# Redirect one country
RewriteEngine on
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^CA$
RewriteRule ^(.*)$ http://www.canada.com$1 [L]

# Redirect multiple countries to a single page
RewriteEngine on
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^(CA|US|MX)$
RewriteRule ^(.*)$ http://www.northamerica.com$1 [L]

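The first example redirects every page on your site to the corresponding page on www.canada.com; the second does the same for visitors from Canada, the United States, and Mexico, sending them to www.northamerica.com. For more details on how to use Apache’s redirection features, see the Apache 1.3 URL Rewriting Guide.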
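
The matching logic of the multi-country rule can be sketched in Python as follows. This is only a model of the RewriteCond/RewriteRule pair, not something Apache runs; `redirect_target` is a hypothetical name, and the path is assumed to start with a slash, as it does in server-level configuration.

```python
import re

# Same anchored pattern as the RewriteCond: exactly CA, US or MX.
NORTH_AMERICA = re.compile(r"^(CA|US|MX)$")

def redirect_target(country_code, path):
    """Return the redirect URL for a matching country, or None."""
    if country_code and NORTH_AMERICA.match(country_code):
        # Mirrors "http://www.northamerica.com$1" with $1 = requested path.
        return "http://www.northamerica.com" + path
    return None
```

Because the pattern is anchored with ^ and $, a value such as "USA" does not match.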

Blocking unwanted countries
The following Apache configuration directives use GeoIP Country to block traffic from China and Russia:

GeoIPEnable On
GeoIPDBFile /path/to/GeoIP.dat

SetEnvIf GEOIP_COUNTRY_CODE CN BlockCountry
SetEnvIf GEOIP_COUNTRY_CODE RU BlockCountry
# ... place more countries here

Deny from env=BlockCountry

# Optional - use if you want to allow a specific IP address from the country you denied
# (See http://httpd.apache.org/docs/1.3/mod/mod_access.html for more details)
Allow from 10.1.2.3

Allowing only specified countries
The following Apache configuration directives use GeoIP Country to allow traffic only from the US, Canada, and Mexico.

GeoIPEnable On
GeoIPDBFile /path/to/GeoIP.dat

SetEnvIf GEOIP_COUNTRY_CODE US AllowCountry
SetEnvIf GEOIP_COUNTRY_CODE CA AllowCountry
SetEnvIf GEOIP_COUNTRY_CODE MX AllowCountry
# ... place more countries here

Deny from all
Allow from env=AllowCountry

# Optional - use if you want to allow a specific IP address from the country you denied
# (See http://httpd.apache.org/docs/1.3/mod/mod_access.html for more details)
Allow from 10.1.2.3
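
The decision the allow-only configuration makes can be sketched in Python. This is a simplified model, assuming the default evaluation order in which Allow rules override Deny rules; the names `ALLOWED_COUNTRIES`, `ALLOWED_IPS` and `is_allowed` are hypothetical. Note that the sketch compares codes exactly, whereas the SetEnvIf patterns above are unanchored regular expressions.

```python
ALLOWED_COUNTRIES = {"US", "CA", "MX"}   # countries tagged AllowCountry
ALLOWED_IPS = {"10.1.2.3"}               # the optional per-address exception

def is_allowed(country_code, client_ip):
    """Deny everyone, then allow listed countries and listed addresses."""
    return country_code in ALLOWED_COUNTRIES or client_ip in ALLOWED_IPS
```

A request from an unlisted country is therefore refused unless it comes from one of the explicitly allowed addresses.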

Accessing a HostGator SVN repository via SVN+SSH on Windows

This information should be helpful to anyone trying to access an svn repository stored on a remote (shared) server which does not expose an svn server.

My host is HostGator (good speeds, reliable ssh, cgi-only, MyISAM-only, decent support, non-existent knowledgebase). HostGator runs SSH over port 2222 which presents a few problems when trying to use traditional methods to connect to an SVN repository via SSH.

For these steps you will need PuTTY. Just get the whole suite.

http://kjvarga.blogspot.com/2008/04/accessing-hostgator-svn-repository-via.html

X-Content-Type-Options: nosniff header

Over the past two months, we’ve received significant community feedback that using a new attribute on the Content-Type header would create a deployment headache for server operators. To that end, we have converted this option into a full-fledged HTTP response header.  Sending the new X-Content-Type-Options response header with the value nosniff will prevent Internet Explorer from MIME-sniffing a response away from the declared content-type.

For example, given the following HTTP-response:

HTTP/1.1 200 OK
Content-Length: 108
Date: Thu, 26 Jun 2008 22:06:28 GMT
Content-Type: text/plain;
X-Content-Type-Options: nosniff

<html>
<body bgcolor="#AA0000">
This page renders as HTML source code (text) in IE8.
</body>
</html>

Browsers sniff mime types of HTTP responses, initially because page authors frequently don’t get them right* and now because browsers have done it historically.

The worst instance related to mime sniffing is an old IE bug. As I understand it their sniffer tried some image formats and then HTML; then when they added PNG sniffing it was added to the sniff list after HTML, either by mistake or to maintain compatibility with pages that were currently being sniffed as HTML. The result of this is that even valid PNG images can be sniffed as HTML, converting a user-uploadable image into a Javascript (XSS) vector. The Chromium mime sniffer’s comments (which are quite readable, and tabulate various browsers’ behaviors) describe this as a “dangerous mime type”.

But there are plenty of other ways that sniffing can screw you as a site author. Your only defenses if you’re building a site are:

  • either make sure user-uploaded images are on a different origin than your site’s cookies;
  • or set the Content-disposition: attachment header, preventing people from displaying the image in their browser.

I believe this bug is why you cannot view images attached to gmail messages — if you click “view image” in gmail you instead get an HTML page with an <img> tag, and if you right-click on that image and pick “view image” you’ll get it served with the attachment header.

To solve this mess, IE introduced the X-Content-Type-Options: nosniff header, which means “don’t sniff the mime type”. It looks like a reasonable workaround to me: it lets new pages opt into sane behavior without breaking old ones. Chromium added support for it.
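
A minimal Python sketch of the response headers a site might send for a user-uploaded file, combining the Content-Disposition defense with the nosniff header. The function name `upload_response_headers` is hypothetical, and the header list is an illustration rather than a complete hardening recipe.

```python
def upload_response_headers(content_type):
    """Headers for serving a user-uploaded file defensively."""
    return [
        ("Content-Type", content_type),
        # Ask the browser to download the file instead of rendering it.
        ("Content-Disposition", "attachment"),
        # Ask the browser not to second-guess the declared type.
        ("X-Content-Type-Options", "nosniff"),
    ]
```

The list-of-tuples shape matches what a WSGI `start_response` call expects.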

It sounded good to developers of a Google-internal HTTP server as well; they added it by default to all responses. And then the bug reports started coming in: “Why does my page render in all browsers but Chromium?” It turned out many of these sites were sending no Content-type header, which, when coupled with the nosniff header, meant Chromium would pick the default of application/octet-stream, triggering a download box.

The fix (r8559) is to match IE for this corner case and default to text/plain instead; I made wisecracks about adding an X-Content-Type-Options-Options: no-really-none-of-these-mime-shenanigans header. Adam (master of content-type sniffing, and I believe editor of the HTML5 sniffing spec) also wrote r8257, which collects stats (aggregated, anonymized, and only from users who have opted in) on what fraction of the pages we would normally have sniffed were instead blocked by the header.
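
The corner case can be modeled with a toy Python function. This is not Chromium's code; it only encodes the behavior described above. The function names are hypothetical, header names are assumed to be given in their canonical spelling, and the application/octet-stream fallback for the no-header, no-sniff-result case is an assumption.

```python
def _media_type(value):
    """Strip parameters such as '; charset=...' from a Content-Type value."""
    return value.split(";")[0].strip().lower()

def effective_content_type(headers, sniffed_type=None):
    """Pick the type a browser would act on in this simplified model."""
    declared = headers.get("Content-Type")
    nosniff = headers.get("X-Content-Type-Options", "").strip().lower() == "nosniff"
    if nosniff:
        # Trust the declared type; with none declared, default to
        # text/plain (the IE-compatible choice) rather than a download.
        return _media_type(declared) if declared else "text/plain"
    if sniffed_type:
        # Without nosniff the sniffer's verdict may win.
        return sniffed_type
    return _media_type(declared) if declared else "application/octet-stream"
```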

* In fairness, the greater problem is that page authors sometimes don’t control HTTP headers. They’re frequently defined by server configuration, which often requires root on the server or at least a lot more technical know-how than “click on the upload button in your website creator program”

How to change configuration settings in php from .htaccess

When PHP runs as CGI, the handy module directives (such as php_value and php_flag) won't work. The workaround is to tell PHP to start with a custom php.ini file, configured the way you want.

With multiple custom php.ini files

 -------------------------------------------
 /site/ini/1/php.ini
 /site/ini/2/php.ini
 /site/ini/3/php.ini
 --
 


The trick is creating a wrapper script to set the location of the php.ini file that php will use. Then it exec's the php cgi.

 shell script /cgi-bin/phpini.cgi
 -------------------------------------------
 #!/bin/sh
 export PHPRC=/site/ini/1
 exec /cgi-bin/php5.cgi
 --


Now all you have to do is set up Apache to run PHP files through the wrapper script instead of executing the PHP CGI directly.

 In your .htaccess or httpd.conf file
 -------------------------------------------
 AddHandler php-cgi .php
 Action php-cgi /cgi-bin/phpini.cgi
 --


So to change the configuration of PHP you just need to change the PHPRC variable to point to a different directory containing your customized php.ini. You could also create multiple shell wrapper scripts and multiple handler/action pairs in .htaccess.

 in your .htaccess
 -------------------------------------------
 AddHandler php-cgi1 .php1
 Action php-cgi1 /cgi-bin/phpini-1.cgi
 
 AddHandler php-cgi2 .php2
 Action php-cgi2 /cgi-bin/phpini-2.cgi
 
 AddHandler php-cgi3 .php3
 Action php-cgi3 /cgi-bin/phpini-3.cgi
 --
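
If you maintain several of these wrappers, generating them is less error-prone than copying by hand. The following Python sketch only produces the text of the files shown above; the function names are hypothetical, and the paths are the ones used in this post.

```python
def wrapper_script(ini_dir, php_cgi="/cgi-bin/php5.cgi"):
    """Text of a shell wrapper that points PHP at ini_dir, then execs it."""
    return "#!/bin/sh\nexport PHPRC={}\nexec {}\n".format(ini_dir, php_cgi)

def htaccess_block(n):
    """AddHandler/Action pair for wrapper number n."""
    return ("AddHandler php-cgi{0} .php{0}\n"
            "Action php-cgi{0} /cgi-bin/phpini-{0}.cgi\n").format(n)
```

Writing the returned strings to disk and marking the wrappers executable is left to the caller.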


The only caveat here is that it seems like you would have to rename the file extensions, but there are ways around that too outlined by AskApache:  Custom PHP.INI with .htaccess tricks.


Multi-Language and Content-Negotiation for Mirror

AskApache Intro and Mirror Update

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi guys,

I just made some fixes to the mirrors I run at:

 http://tor.askapache.com/
 http://tor.askapache.com/dist/

This is the .htaccess file that I came up with, note I painstakingly
determined the languages..  (In the future it would be great to get each
corresponding charset for the AddCharset directive)

Note I also added compression for basic text/ plain/ type files and
added some simple Cache-Control through the mod_expires module.

#######################################################################
Options FollowSymLinks Indexes MultiViews
Order allow,deny
Allow from all
SetEnv SERVER_ADMIN webmaster@xxxxxxxxxxxxx
SetEnv TZ America/California
DirectoryIndex index

# German (de)
# English (en)
# Spanish (es)
# Farsi (fa)
# Suomi (fi)
# French (fr)
# Italian (it)
# Japanese (ja)
# Korean (ko)
# Dutch (nl)
# Norwegian (no)
# Russian (ru)
# Portugese (pt)
# Polish (pl)
# Svenska (se)
# Turkish (tr)
# Simplified Chinese (zh-CN)

AddLanguage de .de
AddLanguage en .en
AddLanguage es .es
AddLanguage fa .fa
AddLanguage fi .fi
AddLanguage fr .fr
AddLanguage it .it
AddLanguage ja .ja
AddLanguage ko .ko
AddLanguage nl .nl
AddLanguage no .no
AddLanguage pl .pl
AddLanguage pt .pt
AddLanguage ru .ru
AddLanguage se .se
AddLanguage tr .tr
AddLanguage zh-CN .zh-cn

# TODO: Get all the charsets for each lang
AddCharset ISO-8859-1 .iso8859-1 .nl .se
AddCharset UTF-8 .utf8

AddDefaultCharset UTF-8
DefaultType text/html
DefaultLanguage en

LanguagePriority en de es fr ja ko pt-br ru tr
ForceLanguagePriority Prefer Fallback

AddType text/html .tr
RemoveHandler .pl

<IfModule mod_deflate.c>
AddOutputFilterByType DEFLATE text/html text/plain text/css
AddOutputFilterByType DEFLATE text/xml application/javascript
</IfModule>

<IfModule mod_expires.c>
ExpiresActive On
ExpiresDefault M3600
</IfModule>

- --
AskApache

- -- 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJKZ908AAoJEMrKIrNtw6tfP60IALHBKMC8m+8WZ3cSKX2hySjH
xy4HcAmKcXHv0pyBayuh8v0QBLAMD/cLSnF1/NYdP2kWm5C2S2UPdF/lIykG3cvi
TtvKQ1jY0LGDLKm5b5DsS1goqm33ogmGxueyKJPb3j5lhQighAaPUniW5n3kp3P9
vstwl5zHfCEGi4NUPRjDkIbGHHO+fw+A1P6G/J8/T1XFrsFb6wMus6KZZUEGLoGU
39WwFQocq0qopXf/1eQEpE/BQHVO/nezlyhfxWLy21BGKwaFgF/0p1+xkCcuBqlq
py/LBVbDsRXnBnHZ+cDBRDuN68IamX1Ba56apuU4mb/stpXU6XwwsMSSmBGi+NM=
=F8dX
-----END PGP SIGNATURE-----
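
To make the effect of LanguagePriority and ForceLanguagePriority Fallback in the message above easier to picture, here is a heavily simplified Python sketch. It is not how mod_negotiation is implemented: it ignores wildcards, language prefixes and every other negotiation dimension, and the function names are hypothetical. It only shows the idea of trying the client's preferences first and falling back to the configured priority list.

```python
# Order taken from the LanguagePriority line in the .htaccess above.
LANGUAGE_PRIORITY = ["en", "de", "es", "fr", "ja", "ko", "pt-br", "ru", "tr"]

def parse_accept_language(header):
    """Return language tags from an Accept-Language header, best first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        lang, _, params = part.partition(";")
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        prefs.append((lang.strip().lower(), q))
    # sorted() is stable, so entries with equal q keep their header order.
    return [lang for lang, q in sorted(prefs, key=lambda p: -p[1]) if q > 0]

def choose_variant(accept_language, available):
    """Pick a language from the set of available variants, or None."""
    for lang in parse_accept_language(accept_language):
        if lang in available:
            return lang
    # Nothing acceptable: fall back to the server-side priority list.
    for lang in LANGUAGE_PRIORITY:
        if lang in available:
            return lang
    return None
```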

Proxy Authentication with Squid

How does Proxy Authentication work in Squid?

Users will be authenticated if squid is configured to use proxy_auth ACLs.

Browsers send the user’s authentication credentials in the Proxy-Authorization request header.

If Squid gets a request and the http_access rule list gets to a proxy_auth ACL, Squid looks for the Proxy-Authorization header. If the header is present, Squid decodes it and extracts a username and password.

If the header is missing, Squid returns an HTTP reply with status 407 (Proxy Authentication Required). The user agent (browser) receives the 407 reply and then prompts the user to enter a name and password. The name and password are encoded, and sent in the Proxy-Authorization header for subsequent requests to the proxy. Also see this example Authorization Header from .htaccess files.

NOTE: The name and password are encoded using “base64” (See section 11.1 of RFC 2616). However, base64 is a binary-to-text encoding only; it does NOT encrypt the information it encodes. This means that the username and password are essentially “cleartext” between the browser and the proxy. Therefore, you probably should not use the same username and password that you would use for your account login.
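
To see how little protection base64 offers, the following Python sketch decodes a Basic credential string using only the standard library. The function name `parse_basic_credentials` and the password are made up for illustration; the user name is borrowed from the ACL examples on this page.

```python
import base64

def parse_basic_credentials(header_value):
    """Decode a 'Basic <base64>' header value into (user, password)."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic" or not token:
        return None
    decoded = base64.b64decode(token.strip()).decode("utf-8")
    # The user name ends at the first colon; the password may contain more.
    user, sep, password = decoded.partition(":")
    return (user, password) if sep else None

# Build a sample header value the way a browser would.
example = "Basic " + base64.b64encode(b"lisa:secret").decode("ascii")
```

Anyone who can observe the header can run the same decoding, which is why the note above calls the credentials essentially cleartext.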

Authentication is actually performed outside of the main Squid process. When Squid starts, it spawns a number of authentication subprocesses. These processes read usernames and passwords on stdin, and reply with “OK” or “ERR” on stdout. This technique allows you to use a number of different authentication protocols (named “schemes” in this context). When multiple authentication schemes are offered by the server (Squid in this case), it is up to the User-Agent to choose one and authenticate using it. By RFC it should choose the safest one it can handle; in practice, Microsoft Internet Explorer usually chooses the first one it’s been offered that it can handle, and Mozilla browsers are bug-compatible with the Microsoft system in this field.
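
The stdin/stdout exchange can be sketched as a toy helper in Python. This is an illustration of the line protocol described above, not one of the helpers shipped with Squid. It assumes each request is a single line of the form "username password" with URL-style escaping, and it keeps passwords in a plain dictionary purely for demonstration; the names `check_line` and `serve` are hypothetical.

```python
from urllib.parse import unquote

def check_line(line, users):
    """Answer one 'username password' request with 'OK' or 'ERR'."""
    parts = line.strip().split(" ", 1)
    if len(parts) != 2:
        return "ERR"
    # Assumption: the two fields arrive URL-escaped.
    username, password = unquote(parts[0]), unquote(parts[1])
    return "OK" if users.get(username) == password else "ERR"

def serve(stdin, stdout, users):
    """Read requests until EOF, writing one reply line per request."""
    for line in stdin:
        stdout.write(check_line(line, users) + "\n")
        # Flush each reply so the parent process is not left waiting.
        stdout.flush()
```

A real helper would call `serve(sys.stdin, sys.stdout, users)` and check passwords against a hashed store.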

The Squid source code comes with a few authentication backends (“helpers”) for Basic authentication. These include:

  • LDAP: Uses the Lightweight Directory Access Protocol
  • NCSA: Uses an NCSA-style username and password file.
  • MSNT: Uses a Windows NT authentication domain.
  • PAM: Uses the Unix Pluggable Authentication Modules scheme.
  • SMB: Uses a SMB server like Windows NT or Samba.
  • getpwnam: Uses the old-fashioned Unix password file.
  • SASL: Uses SASL libraries.
  • mswin_sspi: Windows native authenticator
  • YP: Uses the NIS database

In addition, Squid also supports the NTLM, Negotiate and Digest authentication schemes, which provide more secure authentication methods in that the password is not exchanged in plain text over the wire. Each scheme has its own set of helpers and auth_param settings. Notice that helpers for different authentication schemes use different protocols to talk with Squid, so they can’t be mixed.

For information on how to set up NTLM authentication see NTLM config examples.

In order to authenticate users, you need to compile and install one of the supplied authentication modules found in the helpers/basic_auth/ directory, one of the others, or supply your own.

You tell Squid which authentication program to use with the auth_param option in squid.conf. You specify the name of the program, plus any command line options if necessary. For example:

auth_param basic program /usr/local/squid/bin/ncsa_auth /usr/local/squid/etc/passwd

How do I use authentication in access controls?

Make sure that your authentication program is installed and working correctly. You can test it by hand.

Add some proxy_auth ACL entries to your squid configuration. For example:

acl foo proxy_auth REQUIRED
http_access allow foo
http_access deny all

The REQUIRED term means that any authenticated user will match the ACL named foo.

Squid allows you to provide fine-grained controls by specifying individual user names. For example:

acl foo proxy_auth REQUIRED
acl bar proxy_auth lisa sarah frank joe
acl daytime time 08:00-17:00
http_access allow bar
http_access allow foo daytime
http_access deny all

In this example, users named lisa, sarah, joe, and frank are allowed to use the proxy at all times. Other users are allowed only during daytime hours.
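
The way these rules combine can be modeled in a few lines of Python. This is a simplified sketch, not Squid's evaluator: it assumes the user has already authenticated, treats the ACLs on one http_access line as ANDed together, lets the first matching line decide, and treats both ends of the time range as inclusive (an assumption). All names are hypothetical.

```python
from datetime import time

NAMED_USERS = {"lisa", "sarah", "frank", "joe"}

def acl_foo(user, now):
    # proxy_auth REQUIRED: any authenticated user matches.
    return user is not None

def acl_bar(user, now):
    # proxy_auth with an explicit list of user names.
    return user in NAMED_USERS

def acl_daytime(user, now):
    # time 08:00-17:00
    return time(8, 0) <= now <= time(17, 0)

# One entry per http_access line, in configuration order.
RULES = [
    ("allow", [acl_bar]),
    ("allow", [acl_foo, acl_daytime]),
    ("deny", []),  # "deny all": an empty list always matches
]

def http_access(user, now):
    """Return the action of the first line whose ACLs all match."""
    for action, acls in RULES:
        if all(acl(user, now) for acl in acls):
            return action
    return "deny"
```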

How do I ask for authentication of an already authenticated user?

If a user is authenticated at the proxy you cannot “log out” and re-authenticate. The user usually has to close and re-open the browser windows to be able to re-login at the proxy. A simple configuration will probably look like this:

acl my_auth proxy_auth REQUIRED
http_access allow my_auth
http_access deny all

But there is a trick which can force the user to authenticate with a different account in certain situations. This happens if you deny access with an authentication related ACL last in the http_access deny statement. Example configuration:

acl my_auth proxy_auth REQUIRED
acl google_users proxy_auth user1 user2 user3
acl google dstdomain .google.com
http_access deny google !google_users
http_access allow my_auth
http_access deny all

In this case, if the user requests www.google.com, the first http_access line matches and triggers re-authentication unless the user is one of the listed users. Remember: it’s always the last ACL on a http_access line that “matches”. If the matching ACL deals with authentication, a re-authentication is triggered. If you didn’t want that, you would need to switch the order of the ACLs so that you get http_access deny !google_users google.

You might also run into an authentication loop if you are not careful. Assume that you use LDAP group lookups and want to deny access based on an LDAP group (e.g. only members of a certain LDAP group are allowed to reach certain web sites). In this case you may trigger re-authentication although you don’t intend to. This config is likely wrong for you:

acl ldapgroup-allowed external LDAP_group PROXY_ALLOWED

http_access deny !ldapgroup-allowed
http_access allow all

The http_access deny line would force the user to re-authenticate time and again if he/she is not a member of the PROXY_ALLOWED group. This is perhaps not what you want; you wanted to deny access to non-members. So you need to rewrite this http_access line so that the last ACL on it has nothing to do with authentication. This is the correct example:

acl ldapgroup-allowed external LDAP_group PROXY_ALLOWED

http_access deny !ldapgroup-allowed all
http_access allow all

This way the http_access line still matches. But it’s the all ACL which is now last in the line. Since all is a static ACL (that always matches) and has nothing to do with authentication you will find that the access is just denied.
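
The "last ACL decides" rule can be captured in a tiny Python model. It is only a sketch of the behavior described in this section; the set of authentication-related ACL names is taken from the examples on this page, and the function name and return values are hypothetical.

```python
# ACLs from the examples above that involve authentication.
AUTH_RELATED = {"my_auth", "google_users", "ldapgroup-allowed"}

def deny_response(acl_names):
    """Outcome when an 'http_access deny' line with these ACLs matches.

    acl_names is the list of ACLs on the line, in order, with any
    leading '!' kept (for example ["google", "!google_users"]).
    """
    last = acl_names[-1].lstrip("!")
    # An authentication-related ACL in last position triggers a new
    # credentials prompt; anything else is a plain denial.
    return "challenge" if last in AUTH_RELATED else "deny"
```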

More Info

Example .htaccess

Send Custom Headers

Header set P3P "policyref=\"http://www.askapache.com/w3c/p3p.xml\""
Header set X-Pingback "http://www.askapache.com/xmlrpc.php"
Header set Content-Language "en-US"
Header set Vary "Accept-Encoding"

Blocking based on User-Agent Header

SetEnvIfNoCase ^User-Agent$ .*(craftbot|download|extract|stripper|sucker|ninja|clshttp|webspider|leacher|collector|grabber|webpictures) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(libwww-perl|aesop_com_spiderman) HTTP_SAFE_BADBOT
Deny from env=HTTP_SAFE_BADBOT

proxy_auth acl causing challenge loop
> Well, I really prefer the old behaviour, so I hope the behaviour is not
> hardcoded, but configurable.

It’s not hardcoded, instead it is dependent on how your http_access rules
are constructed.

Squid prompts for login credentials if the user is denied access by an
authentication related acl (proxy_auth, proxy_auth_regex, external using
%LOGIN).

http_access deny someacl authacl
prompts for new credentials if matched (denied by authacl)
http_access deny authacl someacl
does not prompt for new credentials (denied by someacl)

Original Source: ProxyAuthentication © Creative Commons Attribution Sharealike 2.5 License

Python working on Apache

This page gives some nice but advanced tricks for a moin Apache setup with php and .htaccess. The directives on this page assume that you have knowledge about Apache configuration; newbies should stick to the basic setup. This requires the Apache module “mod_rewrite” for rewriting (which should be standard).

mod_python is an Apache module that embeds the Python interpreter within the server. With mod_python you can write web-based applications in Python that will run many times faster than traditional CGI and will have access to advanced features such as ability to retain database connections and other data between hits and access to Apache internals.

List of .htaccess Examples

Advanced mod_rewrite Expert Tricks

Are you an advanced mod_rewrite expert or guru? This article is for YOU too!

The following undocumented techniques and methods will allow you to utilize mod_rewrite at an “expert level” by showing you how to unlock its secrets.

Most if not all web developers and server administrators struggle with Apache mod_rewrite. It’s very tough and only gets a little easier with practice. Until now! Get ready to explode your learning curve.

Decoding Mod_Rewrite Variables

So when I realized my problem was that I didn’t know the value of the variable being tested by the RewriteCond, I set out to discover how to view those variables. Keep in mind you can also use RewriteLog, but it’s only available to users who can edit httpd.conf (typically root), and this is .htaccess.

Setting Environment Variables with RewriteRule

I discovered a multitude of methods to set and view Apache environment variables, using various modules and some core tricks, but the method that allows me to view the most environment variables is RewriteRule. I wanted to use SetEnvIf more, but it’s just not as powerful as mod_rewrite.

This code sets the variable INFO_REQUEST_URI to have the value of REQUEST_URI.

RewriteEngine On
RewriteBase /
RewriteRule .* - [E=INFO_REQUEST_URI:%{REQUEST_URI},NE]

Saving the Apache Variable Values

Now the trick is how to view that environment variable. The method I came up with is nice: we send the environment variable value in an HTTP header, where there isn’t much data manipulation or validation, so you get an accurate look at the actual value. At first I tried adding the variable value to a redirection using the query string, but an HTTP_USER_AGENT value doesn’t play well as a query string.

Using RequestHeader in .htaccess

This code takes advantage of the incredible mod_headers Apache module to actually ADD a whole new header to YOUR request. Seriously one of the coolest tricks I’ve found yet. It’s almost the same as being able to spoof POST requests, since headers can be protected data, especially the HTTP_COOKIE header.

RequestHeader set INFO_REQUEST_URI "%{INFO_REQUEST_URI}e"

Viewing the Variable Values

Now you can use any kind of server-run interpreter like perl, php, ruby, etc., to view all the variable values. All cgi-script handlers like those are able to view request headers.

PHP Code to access Apache Variables

Works even in safe mode: any interpreter can view HTTP headers! Note that each of these variables is added as an HTTP header to the request for the script, which is kind of confusing. Each variable sent as a header is therefore prefixed with HTTP_ to denote that it was a header.

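<?php
// Dump the INFO_* request headers added by the RequestHeader directives.
header("Content-Type: text/plain");
$INFO=$MISS=array();
foreach($_SERVER as $v=>$r)
{
  // A request header named INFO_X shows up in $_SERVER as HTTP_INFO_X.
  if(substr($v,0,9)=='HTTP_INFO')
  {
    // Strip the 10-character "HTTP_INFO_" prefix from the key.
    // Non-empty values go to $INFO, empty ones to $MISS.
    if(!empty($r))$INFO[substr($v,10)]=$r;
    else $MISS[substr($v,10)]=$r;
  }
}

/* thanks Mike! */
// Sort by variable name so the output is easy to scan.
ksort($INFO);
ksort($MISS);
ksort($_SERVER);

echo "Received These Variables:\n";
print_r($INFO);

echo "Missed These Variables:\n";
print_r($MISS);

echo "ALL Variables:\n";
print_r($_SERVER);
?>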
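
Since any interpreter can read these headers, here is roughly the same filter as a Python sketch for a CGI or WSGI environment. It is an approximation of the PHP above rather than an exact port: it matches the full "HTTP_INFO_" prefix, and Python's notion of an empty value differs slightly from PHP's empty(). The function name is hypothetical.

```python
def split_info_headers(environ):
    """Split HTTP_INFO_* entries into (received, missed) dicts."""
    prefix = "HTTP_INFO_"
    info, missing = {}, {}
    for key, value in environ.items():
        if key.startswith(prefix):
            # Drop the prefix so keys read like the original variable names.
            target = info if value else missing
            target[key[len(prefix):]] = value
    # Sort by name, like the ksort() calls in the PHP version.
    return dict(sorted(info.items())), dict(sorted(missing.items()))
```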

Time to Get Crazy

Just create the above php file on your site as /test/index.php or whatever, then create /test/.htaccess which should contain the below .htaccess file snippet. Now just request /test/index.php and be amazed!

Ok, so I’ve prepared the .htaccess code you can use to view the values of all these variables. Just add it to a .htaccess file and make a request. For this test I created an index.php file that printed out all the $_SERVER variables, and made requests to it.

RewriteEngine On
RewriteBase /
RewriteRule .* - [E=INFO_API_VERSION:%{API_VERSION},NE]
RewriteRule .* - [E=INFO_AUTH_TYPE:%{AUTH_TYPE},NE]
RewriteRule .* - [E=INFO_CONTENT_LENGTH:%{CONTENT_LENGTH},NE]
RewriteRule .* - [E=INFO_CONTENT_TYPE:%{CONTENT_TYPE},NE]
RewriteRule .* - [E=INFO_DOCUMENT_ROOT:%{DOCUMENT_ROOT},NE]
RewriteRule .* - [E=INFO_GATEWAY_INTERFACE:%{GATEWAY_INTERFACE},NE]
RewriteRule .* - [E=INFO_HTTPS:%{HTTPS},NE]
RewriteRule .* - [E=INFO_HTTP_ACCEPT:%{HTTP_ACCEPT},NE]
RewriteRule .* - [E=INFO_HTTP_ACCEPT_CHARSET:%{HTTP_ACCEPT_CHARSET},NE]
RewriteRule .* - [E=INFO_HTTP_ACCEPT_ENCODING:%{HTTP_ACCEPT_ENCODING},NE]
RewriteRule .* - [E=INFO_HTTP_ACCEPT_LANGUAGE:%{HTTP_ACCEPT_LANGUAGE},NE]
RewriteRule .* - [E=INFO_HTTP_CACHE_CONTROL:%{HTTP_CACHE_CONTROL},NE]
RewriteRule .* - [E=INFO_HTTP_CONNECTION:%{HTTP_CONNECTION},NE]
RewriteRule .* - [E=INFO_HTTP_COOKIE:%{HTTP_COOKIE},NE]
RewriteRule .* - [E=INFO_HTTP_FORWARDED:%{HTTP_FORWARDED},NE]
RewriteRule .* - [E=INFO_HTTP_HOST:%{HTTP_HOST},NE]
RewriteRule .* - [E=INFO_HTTP_KEEP_ALIVE:%{HTTP_KEEP_ALIVE},NE]
RewriteRule .* - [E=INFO_HTTP_MOD_SECURITY_MESSAGE:%{HTTP_MOD_SECURITY_MESSAGE},NE]
RewriteRule .* - [E=INFO_HTTP_PROXY_CONNECTION:%{HTTP_PROXY_CONNECTION},NE]
RewriteRule .* - [E=INFO_HTTP_REFERER:%{HTTP_REFERER},NE]
RewriteRule .* - [E=INFO_HTTP_USER_AGENT:%{HTTP_USER_AGENT},NE]
RewriteRule .* - [E=INFO_IS_SUBREQ:%{IS_SUBREQ},NE]
RewriteRule .* - [E=INFO_ORIG_PATH_INFO:%{ORIG_PATH_INFO},NE]
RewriteRule .* - [E=INFO_ORIG_PATH_TRANSLATED:%{ORIG_PATH_TRANSLATED},NE]
RewriteRule .* - [E=INFO_ORIG_SCRIPT_FILENAME:%{ORIG_SCRIPT_FILENAME},NE]
RewriteRule .* - [E=INFO_ORIG_SCRIPT_NAME:%{ORIG_SCRIPT_NAME},NE]
RewriteRule .* - [E=INFO_PATH:%{PATH},NE]
RewriteRule .* - [E=INFO_PATH_INFO:%{PATH_INFO},NE]
RewriteRule .* - [E=INFO_PHP_SELF:%{PHP_SELF},NE]
RewriteRule .* - [E=INFO_QUERY_STRING:%{QUERY_STRING},NE]
RewriteRule .* - [E=INFO_REDIRECT_QUERY_STRING:%{REDIRECT_QUERY_STRING},NE]
RewriteRule .* - [E=INFO_REDIRECT_REMOTE_USER:%{REDIRECT_REMOTE_USER},NE]
RewriteRule .* - [E=INFO_REDIRECT_STATUS:%{REDIRECT_STATUS},NE]
RewriteRule .* - [E=INFO_REDIRECT_URL:%{REDIRECT_URL},NE]
RewriteRule .* - [E=INFO_REMOTE_ADDR:%{REMOTE_ADDR},NE]
RewriteRule .* - [E=INFO_REMOTE_HOST:%{REMOTE_HOST},NE]
RewriteRule .* - [E=INFO_REMOTE_IDENT:%{REMOTE_IDENT},NE]
RewriteRule .* - [E=INFO_REMOTE_PORT:%{REMOTE_PORT},NE]
RewriteRule .* - [E=INFO_REMOTE_USER:%{REMOTE_USER},NE]
RewriteRule .* - [E=INFO_REQUEST_FILENAME:%{REQUEST_FILENAME},NE]
RewriteRule .* - [E=INFO_REQUEST_METHOD:%{REQUEST_METHOD},NE]
RewriteRule .* - [E=INFO_REQUEST_TIME:%{REQUEST_TIME},NE]
RewriteRule .* - [E=INFO_REQUEST_URI:%{REQUEST_URI},NE]
RewriteRule .* - [E=INFO_SCRIPT_FILENAME:%{SCRIPT_FILENAME},NE]
RewriteRule .* - [E=INFO_SCRIPT_GROUP:%{SCRIPT_GROUP},NE]
RewriteRule .* - [E=INFO_SCRIPT_NAME:%{SCRIPT_NAME},NE]
RewriteRule .* - [E=INFO_SCRIPT_URI:%{SCRIPT_URI},NE]
RewriteRule .* - [E=INFO_SCRIPT_URL:%{SCRIPT_URL},NE]
RewriteRule .* - [E=INFO_SCRIPT_USER:%{SCRIPT_USER},NE]
RewriteRule .* - [E=INFO_SERVER_ADDR:%{SERVER_ADDR},NE]
RewriteRule .* - [E=INFO_SERVER_ADMIN:%{SERVER_ADMIN},NE]
RewriteRule .* - [E=INFO_SERVER_NAME:%{SERVER_NAME},NE]
RewriteRule .* - [E=INFO_SERVER_PORT:%{SERVER_PORT},NE]
RewriteRule .* - [E=INFO_SERVER_PROTOCOL:%{SERVER_PROTOCOL},NE]
RewriteRule .* - [E=INFO_SERVER_SIGNATURE:%{SERVER_SIGNATURE},NE]
RewriteRule .* - [E=INFO_SERVER_SOFTWARE:%{SERVER_SOFTWARE},NE]
RewriteRule .* - [E=INFO_THE_REQUEST:%{THE_REQUEST},NE]
RewriteRule .* - [E=INFO_TIME:%{TIME},NE]
RewriteRule .* - [E=INFO_TIME_DAY:%{TIME_DAY},NE]
RewriteRule .* - [E=INFO_TIME_HOUR:%{TIME_HOUR},NE]
RewriteRule .* - [E=INFO_TIME_MIN:%{TIME_MIN},NE]
RewriteRule .* - [E=INFO_TIME_MON:%{TIME_MON},NE]
RewriteRule .* - [E=INFO_TIME_SEC:%{TIME_SEC},NE]
RewriteRule .* - [E=INFO_TIME_WDAY:%{TIME_WDAY},NE]
RewriteRule .* - [E=INFO_TIME_YEAR:%{TIME_YEAR},NE]
RewriteRule .* - [E=INFO_TZ:%{TZ},NE]
RewriteRule .* - [E=INFO_UNIQUE_ID:%{UNIQUE_ID},NE]

RequestHeader set INFO_API_VERSION "%{INFO_API_VERSION}e"
RequestHeader set INFO_AUTH_TYPE "%{INFO_AUTH_TYPE}e"
RequestHeader set INFO_CONTENT_LENGTH "%{INFO_CONTENT_LENGTH}e"
RequestHeader set INFO_CONTENT_TYPE "%{INFO_CONTENT_TYPE}e"
RequestHeader set INFO_DOCUMENT_ROOT "%{INFO_DOCUMENT_ROOT}e"
RequestHeader set INFO_GATEWAY_INTERFACE "%{INFO_GATEWAY_INTERFACE}e"
RequestHeader set INFO_HTTPS "%{INFO_HTTPS}e"
RequestHeader set INFO_HTTP_ACCEPT "%{INFO_HTTP_ACCEPT}e"
RequestHeader set INFO_HTTP_ACCEPT_CHARSET "%{INFO_HTTP_ACCEPT_CHARSET}e"
RequestHeader set INFO_HTTP_ACCEPT_ENCODING "%{INFO_HTTP_ACCEPT_ENCODING}e"
RequestHeader set INFO_HTTP_ACCEPT_LANGUAGE "%{INFO_HTTP_ACCEPT_LANGUAGE}e"
RequestHeader set INFO_HTTP_CACHE_CONTROL "%{INFO_HTTP_CACHE_CONTROL}e"
RequestHeader set INFO_HTTP_CONNECTION "%{INFO_HTTP_CONNECTION}e"
RequestHeader set INFO_HTTP_COOKIE "%{INFO_HTTP_COOKIE}e"
RequestHeader set INFO_HTTP_FORWARDED "%{INFO_HTTP_FORWARDED}e"
RequestHeader set INFO_HTTP_HOST "%{INFO_HTTP_HOST}e"
RequestHeader set INFO_HTTP_KEEP_ALIVE "%{INFO_HTTP_KEEP_ALIVE}e"
RequestHeader set INFO_HTTP_MOD_SECURITY_MESSAGE "%{INFO_HTTP_MOD_SECURITY_MESSAGE}e"
RequestHeader set INFO_HTTP_PROXY_CONNECTION "%{INFO_HTTP_PROXY_CONNECTION}e"
RequestHeader set INFO_HTTP_REFERER "%{INFO_HTTP_REFERER}e"
RequestHeader set INFO_HTTP_USER_AGENT "%{INFO_HTTP_USER_AGENT}e"
RequestHeader set INFO_IS_SUBREQ "%{INFO_IS_SUBREQ}e"
RequestHeader set INFO_ORIG_PATH_INFO "%{INFO_ORIG_PATH_INFO}e"
RequestHeader set INFO_ORIG_PATH_TRANSLATED "%{INFO_ORIG_PATH_TRANSLATED}e"
RequestHeader set INFO_ORIG_SCRIPT_FILENAME "%{INFO_ORIG_SCRIPT_FILENAME}e"
RequestHeader set INFO_ORIG_SCRIPT_NAME "%{INFO_ORIG_SCRIPT_NAME}e"
RequestHeader set INFO_PATH "%{INFO_PATH}e"
RequestHeader set INFO_PATH_INFO "%{INFO_PATH_INFO}e"
RequestHeader set INFO_PHP_SELF "%{INFO_PHP_SELF}e"
RequestHeader set INFO_QUERY_STRING "%{INFO_QUERY_STRING}e"
RequestHeader set INFO_REDIRECT_QUERY_STRING "%{INFO_REDIRECT_QUERY_STRING}e"
RequestHeader set INFO_REDIRECT_REMOTE_USER "%{INFO_REDIRECT_REMOTE_USER}e"
RequestHeader set INFO_REDIRECT_STATUS "%{INFO_REDIRECT_STATUS}e"
RequestHeader set INFO_REDIRECT_URL "%{INFO_REDIRECT_URL}e"
RequestHeader set INFO_REMOTE_ADDR "%{INFO_REMOTE_ADDR}e"
RequestHeader set INFO_REMOTE_HOST "%{INFO_REMOTE_HOST}e"
RequestHeader set INFO_REMOTE_IDENT "%{INFO_REMOTE_IDENT}e"
RequestHeader set INFO_REMOTE_PORT "%{INFO_REMOTE_PORT}e"
RequestHeader set INFO_REMOTE_USER "%{INFO_REMOTE_USER}e"
RequestHeader set INFO_REQUEST_FILENAME "%{INFO_REQUEST_FILENAME}e"
RequestHeader set INFO_REQUEST_METHOD "%{INFO_REQUEST_METHOD}e"
RequestHeader set INFO_REQUEST_TIME "%{INFO_REQUEST_TIME}e"
RequestHeader set INFO_REQUEST_URI "%{INFO_REQUEST_URI}e"
RequestHeader set INFO_SCRIPT_FILENAME "%{INFO_SCRIPT_FILENAME}e"
RequestHeader set INFO_SCRIPT_GROUP "%{INFO_SCRIPT_GROUP}e"
RequestHeader set INFO_SCRIPT_NAME "%{INFO_SCRIPT_NAME}e"
RequestHeader set INFO_SCRIPT_URI "%{INFO_SCRIPT_URI}e"
RequestHeader set INFO_SCRIPT_URL "%{INFO_SCRIPT_URL}e"
RequestHeader set INFO_SCRIPT_USER "%{INFO_SCRIPT_USER}e"
RequestHeader set INFO_SERVER_ADDR "%{INFO_SERVER_ADDR}e"
RequestHeader set INFO_SERVER_ADMIN "%{INFO_SERVER_ADMIN}e"
RequestHeader set INFO_SERVER_NAME "%{INFO_SERVER_NAME}e"
RequestHeader set INFO_SERVER_PORT "%{INFO_SERVER_PORT}e"
RequestHeader set INFO_SERVER_PROTOCOL "%{INFO_SERVER_PROTOCOL}e"
RequestHeader set INFO_SERVER_SIGNATURE "%{INFO_SERVER_SIGNATURE}e"
RequestHeader set INFO_SERVER_SOFTWARE "%{INFO_SERVER_SOFTWARE}e"
RequestHeader set INFO_THE_REQUEST "%{INFO_THE_REQUEST}e"
RequestHeader set INFO_TIME "%{INFO_TIME}e"
RequestHeader set INFO_TIME_DAY "%{INFO_TIME_DAY}e"
RequestHeader set INFO_TIME_HOUR "%{INFO_TIME_HOUR}e"
RequestHeader set INFO_TIME_MIN "%{INFO_TIME_MIN}e"
RequestHeader set INFO_TIME_MON "%{INFO_TIME_MON}e"
RequestHeader set INFO_TIME_SEC "%{INFO_TIME_SEC}e"
RequestHeader set INFO_TIME_WDAY "%{INFO_TIME_WDAY}e"
RequestHeader set INFO_TIME_YEAR "%{INFO_TIME_YEAR}e"
RequestHeader set INFO_TZ "%{INFO_TZ}e"
RequestHeader set INFO_UNIQUE_ID "%{INFO_UNIQUE_ID}e"
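
Typing out that many near-identical lines invites mistakes, so you may prefer to generate them. The Python sketch below reproduces the pattern of the snippet above for any list of variable names; the function name `htaccess_lines` is hypothetical.

```python
def htaccess_lines(names):
    """Build the RewriteRule and RequestHeader lines for each variable."""
    rules = ["RewriteRule .* - [E=INFO_{0}:%{{{0}}},NE]".format(n) for n in names]
    headers = ['RequestHeader set INFO_{0} "%{{INFO_{0}}}e"'.format(n) for n in names]
    # Same layout as the snippet: engine setup, rules, blank line, headers.
    return ["RewriteEngine On", "RewriteBase /"] + rules + [""] + headers
```

For example, `print("\n".join(htaccess_lines(["REQUEST_URI", "TIME"])))` prints a two-variable version of the file.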

Mod_Rewrite Variables Decoded!

[API_VERSION] => 20020903:12
[AUTH_TYPE] => Digest
[DOCUMENT_ROOT] => /home/user/www_root/askapache.com
[HTTPS] => off
[HTTP_ACCEPT] => text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
[HTTP_COOKIE] => PHPSESSID=752ee6d56e15f305233e30045987e5ce568c034; __qca=1176541225-59967328-5223185;
[HTTP_HOST] => www.askapache.com
[HTTP_REFERER] => http://www.askapache.com/protest/index.php?askapache=awesomeness&you=rock
[HTTP_USER_AGENT] => Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.16) Gecko/20080702 Firefox/2.0.0.16
[IS_SUBREQ] => false
[QUERY_STRING] => e=404
[REMOTE_ADDR] => 22.162.144.211
[REMOTE_HOST] => 22.162.144.211
[REMOTE_PORT] => 4511
[REMOTE_USER] => administrator
[REQUEST_FILENAME] => /home/user/www_root/askapache.com/protest/index.php
[REQUEST_METHOD] => GET
[REQUEST_URI] => /protest/index.php
[SCRIPT_FILENAME] => /home/user/www_root/askapache.com/protest/index.php
[SCRIPT_GROUP] => daemonu
[SCRIPT_USER] => askapache
[SERVER_ADDR] => 208.113.134.190
[SERVER_ADMIN] => webmaster@askapache.com
[SERVER_NAME] => www.askapache.com
[SERVER_PORT] => 80
[SERVER_PROTOCOL] => HTTP/1.1
[SERVER_SOFTWARE] => Apache/2.0.61 (Unix) PHP/4.4.7 mod_ssl/2.0.61 OpenSSL/0.9.7e mod_fastcgi/2.4.2 DAV/2
[THE_REQUEST] => GET /protest/adf HTTP/1.1
[TIME] => 20080820014309
[TIME_DAY] => 20
[TIME_HOUR] => 01
[TIME_MIN] => 43
[TIME_MON] => 08
[TIME_SEC] => 09
[TIME_WDAY] => 3
[TIME_YEAR] => 2008

Request using HTTPS

[API_VERSION] => 20020903:12
[AUTH_TYPE] => Digest
[DOCUMENT_ROOT] => /home/user/www_root/askapache.com
[HTTPS] => on
[HTTP_ACCEPT] => text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
[HTTP_COOKIE] => PHPSESSID=752ee6d56e15f305233e30045987e5ce568c034; __qca=1176541225-59967328-5223185;
[HTTP_HOST] => www.askapache.com
[HTTP_REFERER] => http://www.askapache.com/protest/index.php?askapache=awesomeness&you=rock
[HTTP_USER_AGENT] => Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.16) Gecko/20080702 Firefox/2.0.0.16
[IS_SUBREQ] => false
[QUERY_STRING] => hi=you&whats=&amp;you
[REMOTE_ADDR] => 22.162.144.211
[REMOTE_HOST] => 22.162.144.211
[REMOTE_PORT] => 4605
[REMOTE_USER] => administrator
[REQUEST_FILENAME] => /home/user/www_root/askapache.com/protest/index.php
[REQUEST_METHOD] => GET
[REQUEST_URI] => /protest/index.php
[SCRIPT_FILENAME] => /home/user/www_root/askapache.com/protest/index.php
[SCRIPT_GROUP] => daemonu
[SCRIPT_USER] => askapache
[SERVER_ADDR] => 208.113.134.190
[SERVER_ADMIN] => webmaster@askapache.com
[SERVER_NAME] => www.askapache.com
[SERVER_PORT] => 443
[SERVER_PROTOCOL] => HTTP/1.1
[SERVER_SOFTWARE] => Apache/2.0.61 (Unix) PHP/4.4.7 mod_ssl/2.0.61 OpenSSL/0.9.7e mod_fastcgi/2.4.2 DAV/2
[THE_REQUEST] => GET /protest/index.php?hi=you&whats=&amp;you HTTP/1.1
[TIME] => 20080820015016
[TIME_DAY] => 20
[TIME_HOUR] => 01
[TIME_MIN] => 50
[TIME_MON] => 08
[TIME_SEC] => 16
[TIME_WDAY] => 3
[TIME_YEAR] => 2008

Emulating ErrorDocuments with Mod_Rewrite

The ErrorDocument directive is helpful because an ErrorDocument is called differently than a normal file, and it contains special variables to help an admin debug.

I’ve wanted to use a RewriteCond + a RewriteRule to cause an Apache ErrorDocument to be displayed for a long time… I finally figured it out. Simply use the HTTP STATUS CODE trick in combination with a simple RewriteRule to trigger an Apache ErrorDocument.

This code emulates the internal 404 process Apache goes through. If the file is not found, it internally requests /test/trigger-error/404, which triggers the 404 ErrorDocument.

source: Crazy Advanced Mod_Rewrite

Running CherryPy behind Apache using mod_rewrite

Here are some myths about running CherryPy behind mod_rewrite:

Myth 1: using mod_rewrite will make my site slower

If you’re talking about raw HTTP speed then yes, using mod_rewrite does add a little bit of overhead. On my current laptop, a benchmark of CherryPy exposed directly gave 460 requests/second (2.2ms/req), and a benchmark of CherryPy running behind Apache with mod_rewrite gave 320 requests/second (3.1ms/req). This means that mod_rewrite adds about 0.9ms per request. But for a typical web app, a page will take at least several tens or hundreds of milliseconds to build, so these extra 0.9ms won’t really matter much!
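As a rough check of the arithmetic above, the sketch below converts the two benchmark figures into per-request times and compares the difference with a page build time. The 100 ms build time is a hypothetical figure chosen for illustration, not a measurement from the benchmark.

```python
def ms_per_request(requests_per_second):
    """Convert a throughput figure into average milliseconds per request."""
    return 1000.0 / requests_per_second

direct = ms_per_request(460)    # CherryPy exposed directly
proxied = ms_per_request(320)   # CherryPy behind Apache with mod_rewrite
overhead = proxied - direct     # extra time added by the proxy hop

# Hypothetical build time for a typical dynamic page (assumption).
page_build_ms = 100.0
share = overhead / (page_build_ms + overhead)

print("direct:   %.2f ms/req" % direct)
print("proxied:  %.2f ms/req" % proxied)
print("overhead: %.2f ms/req" % overhead)
print("share of total response time: %.2f%%" % (share * 100))
```

With unrounded figures the overhead comes out slightly under 1 ms, which is under 1% of the total for a page that takes 100 ms to build.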

Also, keep in mind that Apache will serve static files directly, which will be faster than serving them from CherryPy.

Myth 2: I will lose some data about the client if I use mod_rewrite

When using mod_rewrite, requests to CherryPy will look like they’re coming from the local Apache server (the “Host” header will be “localhost:port” and the client IP address will be 127.0.0.1). However, if you use Apache2, it passes the original “Host” header along in the “X-Forwarded-Host” header, and the IP address of the remote client in the “X-Forwarded-For” header. So you still have access to all the data about the original request.
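A minimal sketch of how an application behind the proxy might recover those values. The function name original_client and the plain dict of headers are assumptions made for illustration; this is not CherryPy's own API, and a real framework may expose headers through a case-insensitive object instead.

```python
def original_client(headers, remote_addr):
    """Return (client_ip, host) as seen by Apache, falling back to local values.

    headers     -- dict of request headers (hypothetical plain dict)
    remote_addr -- address of the direct peer, 127.0.0.1 when proxied
    """
    forwarded_for = headers.get("X-Forwarded-For", "")
    # The header may hold a comma-separated chain of addresses;
    # the first entry is the original client.
    client_ip = forwarded_for.split(",")[0].strip() or remote_addr
    host = headers.get("X-Forwarded-Host") or headers.get("Host", "")
    return client_ip, host


proxied_headers = {
    "Host": "localhost:8000",
    "X-Forwarded-For": "22.162.144.211",
    "X-Forwarded-Host": "www.example.info",
}
print(original_client(proxied_headers, "127.0.0.1"))
```

These headers are set by the proxy, so they are only trustworthy when the application cannot be reached directly by outside clients.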

Configuring Apache

Let’s assume that the CherryPy application is listening on port 8000. I added the following lines to Apache’s config file (usually /etc/apache/httpd.conf or /etc/httpd/conf/httpd.conf); mod_rewrite also works from .htaccess if you cannot edit your httpd.conf:

RewriteEngine on
RewriteRule ^(.*) http://127.0.0.1:8000$1 [proxy]

These lines go in the proper VirtualHost directive. Be careful with Directory directives, because Apache strips the directory prefix before pattern matching and does not add it back. So the above configuration would result in Apache trying to proxy http://127.0.0.1:8000page instead of http://127.0.0.1:8000/page. You would remedy this situation by adding a ‘/’ or whatever prefix you need into the rewrite rule. For example:

RewriteEngine on
RewriteRule ^(.*) http://127.0.0.1:8000/$1 [proxy]
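The effect of the stripped prefix can be imitated with an ordinary regular expression. The apply_rule helper below is a hypothetical stand-in for a single RewriteRule, written only to show how the substitution string is built; it does not reproduce the rest of mod_rewrite's behaviour.

```python
import re

def apply_rule(pattern, substitution, path):
    """Mimic one RewriteRule: match the path, then fill in $1."""
    match = re.match(pattern, path)
    if match is None:
        return None
    return substitution.replace("$1", match.group(1))

# Server or VirtualHost context: the path keeps its leading slash.
print(apply_rule(r"^(.*)", "http://127.0.0.1:8000$1", "/page"))

# Directory context: the prefix, slash included, is already stripped.
print(apply_rule(r"^(.*)", "http://127.0.0.1:8000$1", "page"))

# Adding the slash back in the substitution repairs the target URL.
print(apply_rule(r"^(.*)", "http://127.0.0.1:8000/$1", "page"))
```

The second call produces the malformed http://127.0.0.1:8000page target, and the third shows the corrected rule.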

If you want to configure Apache to serve all your static files directly (and thus free CherryPy from this task), use a configuration like this:

RewriteEngine on
RewriteRule ^/static/(.*) /home/user/files/static/$1 [last]
RewriteRule ^(.*) http://127.0.0.1:8000$1 [proxy]

If you don’t want to (or cannot) use Apache’s Virtual Hosts, just add one line after RewriteEngine. For example, to map requests for the www.example.info host to your CherryPy application, you get:

RewriteEngine on
RewriteCond %{HTTP_HOST} www\.example\.info
RewriteRule ^(.*) http://127.0.0.1:8000$1 [proxy]

If your application is not running and a user tries to access it, Apache will return a 502 Proxy Error. There’s an easy way to start the application in that case: add an ErrorDocument directive that runs a CGI script which starts your application and redirects to it. You will also need to disable mod_rewrite for that script (otherwise Apache would try to get the CGI script from your CherryPy application, and get another 502 error). So, I added 2 more lines to my configuration, and it now looks like this:

RewriteEngine on
RewriteCond %{SCRIPT_FILENAME} !autostart\.cgi$
RewriteCond %{HTTP_HOST} www\.example\.info
RewriteRule ^(.*) http://127.0.0.1:8000/$1 [proxy]
ErrorDocument 502 /cgi-bin/autostart.cgi

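The autostart.cgi file is a short Python script:

#!/usr/local/bin/python
import os

# Emit the CGI header; the trailing \r\n plus print's own newline ends the header block.
print("Content-type: text/html\r\n")
print("""Restarting site ...""")

# Move this process into its own process group (see the note below).
os.setpgid(os.getpid(), 0)
# Launch the CherryPy server in the background.
os.system('/usr/local/bin/python2.4 webserver.py &')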

If you get "Forbidden - You don't have permission to access / on this server" errors, make sure the proxy module (mod_proxy) is enabled; the [proxy] flag depends on it.

Note: The os.setpgid(os.getpid(), 0) line seems to prevent Apache from killing the CherryPy process after a period of inactivity (many thanks to Matt Lewis for this trick).

Beware the encoding bug

URLs that are requested via HTTP must be escaped (%xx-encoded) before they are sent, but Apache2’s mod_rewrite unescapes path information, which may generate invalid HTTP requests. In particular, spaces (which should be escaped as “%20”) are passed through unescaped. If CherryPy receives a request with a raw space character in the URL, it chokes, because spaces are used to delimit the three parts of a request line (like “GET /path/to%20my/page HTTP/1.1”). A workaround is to add the following to your Apache configuration:

# RewriteMap cannot go in .htaccess (only in httpd.conf)
RewriteMap escape int:escape

# Then, when writing the RewriteRule, use ${escape:$1} instead of $1:
RewriteRule ^(.*)$ http://localhost:6674/${escape:$1} [proxy]

AFAIK, this is a bug in mod_rewrite/Apache: I’ve researched the HTTP/1.1 and URI RFCs, and they all state that there must be only 2 spaces on the HTTP request line, i.e., CherryPy is parsing the request line correctly and Apache is sending invalid HTTP requests. Either way, I think this workaround will help people using CherryPy under Apache’s mod_rewrite. I’ve only tested this on Apache2; I don’t know if RewriteMap int:escape exists in older versions of mod_rewrite. But the Apache people seem to be aware of this bug.
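To see why a raw space breaks the request line, the sketch below builds one with and without percent-encoding, using Python 3's urllib.parse.quote. It illustrates the principle only; it is not a reimplementation of Apache's int:escape function, and the request_line helper is a name made up for this example.

```python
from urllib.parse import quote

def request_line(method, path, version="HTTP/1.1", escape=True):
    """Build an HTTP request line, optionally percent-encoding the path."""
    if escape:
        path = quote(path)  # leaves "/" alone, turns " " into "%20"
    return "%s %s %s" % (method, path, version)

good = request_line("GET", "/path/to my/page")
bad = request_line("GET", "/path/to my/page", escape=False)

print(good, "->", len(good.split(" ")), "parts")
print(bad, "->", len(bad.split(" ")), "parts")
```

The escaped line splits into the expected three parts, while the unescaped one splits into four, so a parser that relies on the two delimiting spaces cannot tell where the path ends.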

Learn More at CherryPy