Switching robots.txt between HTTP and HTTPS mode
Published November 28th, 2008
Problem Statement
Google penalizes Web sites that have duplicate content. So when your Web site is reachable over both HTTP and HTTPS (SSL) and both serve the same content, you have a good chance of being downgraded for duplicate content, says an SEO expert whom one of our customers loves to swear by. So in addition to making sure that requests for https://server/pages are automatically redirected with an HTTP 301 (Moved Permanently), we thought making the robots.txt contents different for HTTP and HTTPS would also help. Here is how we switched robots.txt between HTTP and HTTPS access.
Step 1: Create a mod_rewrite rule for /robots.txt
Even though we dislike using mod_rewrite due to its performance issues, we use it from time to time on customer projects where performance is not too critical. We added the following rule to the virtual host configuration file for the server; it can also be added to an .htaccess file:
RewriteEngine on
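# Rule: When robots.txt is requested in HTTPS (port 443) mode, send robots_ssl.txt instead
# The optional leading slash lets the pattern match in virtual host context
# (where the path starts with "/") as well as in .htaccess (where it does not)
RewriteCond %{SERVER_PORT} =443
RewriteRule ^/?robots\.txt$ /robots_ssl.txt [L]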
Step 2: Create a HTTPS version of robots.txt
The HTTPS version of robots.txt is a separate file called robots_ssl.txt, which should have the following contents:
User-agent: Googlebot
Disallow: /
This file tells Google’s Web site crawler not to crawl anything on the site in HTTPS mode. Of course, if you wish to advise all crawlers the same thing, then change it to:
User-agent: *
Disallow: /
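If you prefer to create the file from a shell rather than an editor, something like the following should work. This is only a sketch: the function name write_robots_ssl and its directory argument are made up for illustration, and the directory is assumed to be the site's document root.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original setup):
# writes robots_ssl.txt into the directory given as $1,
# with each directive on its own line.
write_robots_ssl() {
    cat > "$1/robots_ssl.txt" <<'EOF'
User-agent: *
Disallow: /
EOF
}

# Example call; replace the path with your own document root:
# write_robots_ssl /path/to/htdocs
```

The quoted 'EOF' keeps the shell from expanding the * in the User-agent line.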
Step 3: Manually test the changes
Now, to test the setup, access the site as follows:
- Point your web browser to http://server/robots.txt and check that the browser shows the contents of the original robots.txt
- Point your web browser to https://server/robots.txt and check that the browser shows the contents of robots_ssl.txt
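The two checks above can also be run from the command line. The sketch below assumes curl is installed and that robots_ssl.txt contains a blanket "Disallow: /" line while the regular robots.txt does not; blocks_everything is a hypothetical helper name, and "server" is the same placeholder host used above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the robots.txt text on stdin contains
# a blanket "Disallow: /" line (carriage returns are stripped first).
blocks_everything() {
    tr -d '\r' | grep -Eq '^[Dd]isallow:[[:space:]]*/[[:space:]]*$'
}

# Usage against a live site (-k skips certificate checks, which is only
# reasonable on a test setup with a self-signed certificate):
# curl -s  http://server/robots.txt  | blocks_everything || echo "HTTP: not blocked"
# curl -sk https://server/robots.txt | blocks_everything && echo "HTTPS: blocked"
```

If both messages are printed, the rewrite rule is serving a different file for each protocol.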
If the above tests show the appropriate contents, you are done. If not, you might not have set up the mod_rewrite rule correctly; check the rule again. Also, make sure mod_rewrite is both installed and enabled in your Web server configuration.
If you are like us and compile mod_rewrite into the core Apache httpd binary, you can test if it is installed by running:
$ /path/to/bin/httpd -l | grep mod_rewrite
Example:
$ /home/apache/bin/httpd -l | grep mod_rewrite
This shows the mod_rewrite.c module as part of the httpd binary, which is exactly what needs to be there for mod_rewrite to work.
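If mod_rewrite is loaded as a shared module instead of being compiled in, httpd -l will not list it, but httpd -M (which lists all loaded modules) should show it as rewrite_module. A small sketch that accepts either form; has_mod_rewrite is a hypothetical helper name, and the httpd path is a placeholder as above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a module listing on stdin mentions
# mod_rewrite, either as "mod_rewrite.c" (httpd -l) or as
# "rewrite_module" (httpd -M).
has_mod_rewrite() {
    grep -Eq 'mod_rewrite\.c|rewrite_module'
}

# Usage (2>&1 because some httpd versions print the -M listing on stderr):
# /path/to/bin/httpd -l | has_mod_rewrite && echo "compiled in"
# /path/to/bin/httpd -M 2>&1 | has_mod_rewrite && echo "loaded"
```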