Janu 23Jan2007 21:56

Fun With Apache .htaccess
 
The Apache web server has a number of configuration options that are available to the server administrator. In a shared hosting environment, you don't have access to the main Apache configuration so you're stuck with the default configuration. However, it is possible to override some of the default settings by creating (or editing) a file named .htaccess.

The .htaccess file is a plain ASCII text file placed in your www directory or in a subdirectory of your www directory. You can create or edit this file in any text editor (such as Notepad) and then upload it to the directory for which you want to modify the settings. Be sure that the file is uploaded in ASCII (not BINARY) mode, and be sure that the file permissions are set to 644 (rw-r--r--).

Commands in the .htaccess file affect the directory that it's placed in and all subdirectories. If you place the .htaccess file in your www (root) directory, it will affect your entire web site. If you place it in a subdirectory of your www directory, it will affect only that directory plus any subdirectories of that directory.

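Most .htaccess commands are designed to be placed on one line. If your text editor wraps lines automatically, you should disable that function before saving and uploading your file. Also, note that while Apache does not care about the case of the command names themselves, the values you give them (file names, paths and patterns) are often case-sensitive, so type them exactly as they appear on the server.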
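
As a small illustration of these layout rules, the sketch below shows a comment line and one long command split across two lines. It assumes the standard Apache behaviour that a line starting with # is ignored and that a backslash as the very last character of a line continues the command on the next line. The file names are only placeholders.

Code:

# Comment lines start with a hash sign and must be on a line of their own
DirectoryIndex index.html index.shtml \
    index.php index.cgi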

Here are some of the things you can do with an .htaccess file:

Customize Error Messages



If you want to override the server's error pages, you can use .htaccess to define your own messages. This capability is discussed in the Custom Error Messages section of the manual. An example of the syntax is:

Code:

ErrorDocument 500 /server_error.html
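
The same command can be repeated for other status codes. The lines below are a sketch only: the page names not_found.html, forbidden.html and login_help.html are hypothetical and must exist in your www directory. Using a local path (starting with a slash) rather than a full http:// URL is usually the safer choice, because a full URL makes the server send a redirect and the original error status is lost.

Code:

# One ErrorDocument line per status code you want to customize
ErrorDocument 404 /not_found.html
ErrorDocument 403 /forbidden.html
ErrorDocument 401 /login_help.html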

Override SSI Settings



By default, only pages ending in the .shtml extension will parse server-side includes (SSI) on our servers. To make SSI work with .html documents as well, create a file named .htaccess, add the following lines to it, and upload it (in ASCII mode) to your main www directory:

Code:

AddType text/html .html
AddHandler server-parsed .html

If you want both .html and .htm documents to parse SSI, create your .htaccess file with these lines:

Code:

AddType text/html .html
AddHandler server-parsed .html
AddHandler server-parsed .htm
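
On Apache 2.x servers the server-parsed handler is kept mainly for backward compatibility, and SSI is normally switched on with an output filter instead. The following is a sketch under two assumptions: that your host runs Apache 2.x with mod_include loaded, and that it allows the Options command to be overridden from .htaccess. If either is not true, these lines may be ignored or cause a 500 error.

Code:

# Allow server-side includes in this directory
Options +Includes
# Serve .html and .htm as HTML and pass them through the SSI filter
AddType text/html .html .htm
AddOutputFilter INCLUDES .html .htm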

Change Your Default Home Page



In order to browse your site by specifying the domain name only (e.g., http://www.go4expert.com) instead of having to specify an exact page filename (e.g., http://www.go4expert.com/filename.html), you must have an index page in your www directory. Default acceptable file names for index pages include index.htm, index.html, index.cgi, index.shtml, index.php, etc. Note that they're all named index.*.

There is also a default order of precedence for these names. So if you have both a file named index.cgi and a file named index.html in your directory, the server will display index.cgi because that name takes a higher precedence than index.html.

Using .htaccess, you can define additional index filenames and/or change the order of precedence. To define your index page as myIndex.html, add the following line to your .htaccess file:

Code:

DirectoryIndex myIndex.html

This will cause the server to look for a file named myIndex.html. If it finds that file, it will display it. If it does not find that file, it will not fall back to index.html or any other default name; depending on the server configuration, the visitor sees either a directory listing or an error page (typically 403 Forbidden when directory browsing is disabled).

To change the order of precedence, enter a DirectoryIndex command with multiple file names on the same line. The order in which the file names are listed (from left to right) determines the order of precedence. For example,

Code:

DirectoryIndex myIndex.html index.cgi index.php index.html
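
The names in the list do not all have to live in the directory being requested. As a sketch, the last entry below is a path starting with a slash, which Apache treats as a fallback to use when none of the earlier files exists in the directory. The script /cgi-bin/index.pl is a hypothetical name used only for illustration.

Code:

# Try the local files first, then fall back to one site-wide script
DirectoryIndex myIndex.html index.html /cgi-bin/index.pl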

Enable Directory Browsing



Due to security concerns we have removed the default setting that allowed directory indexing. This is the option that allows the contents of a directory to be displayed in the browser when the directory does not contain an index page.

For example, if you make an HTTP request for a directory such as http://go4expert.com/images/, the server would list all the images in that directory without the need for an HTML page with links.

If you require this option on specific directories it is still available. You can reactivate it by adding the following line to your .htaccess file:

Code:

Options +Indexes

Once this line is in place, the server will again list the contents of that directory (and its subdirectories) whenever no index page is present.
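
Because the setting is inherited by subdirectories, you may want to hide some files from the listing or switch browsing off again further down the tree. The sketch below assumes the standard mod_autoindex module is what produces the listings on your server; the file patterns are examples only.

Code:

# In the directory you want to be browsable:
Options +Indexes
# Leave backup and log files out of the generated listing
IndexIgnore *.bak *.log

# In the .htaccess of a subdirectory that should stay private:
Options -Indexes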

Block Users from Accessing Your Web Site



If you want to deny access to a particular individual, and you know the IP address or domain name that the individual uses to connect to the Internet, you can use .htaccess to block that individual from your web site.

Code:

<Limit GET POST>
order allow,deny
allow from all
deny from 123.156.189.0
deny from 156.78.90.
deny from .aol.com
</Limit>

In the example above, a user from the exact IP address 123.156.189.0 would be blocked; all users within a range of IP numbers from 156.78.90.0 to 156.78.90.255 would be blocked; and all users connecting from America Online (aol.com) would be blocked. The order line matters: with order allow,deny the deny lines are evaluated last and override allow from all, whereas with order deny,allow the allow from all line would win and nobody would be blocked. When a blocked user attempts to browse your web site, they are presented with the 403 Forbidden error.
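
The order, allow and deny commands belong to the older Apache access control syntax. If your host runs Apache 2.4 or later (an assumption you would need to check with them), the same three blocks are usually written with Require lines instead. This is a sketch of the equivalent rules, not a drop-in replacement for every setup:

Code:

<RequireAll>
    # Let everyone in by default ...
    Require all granted
    # ... except one address, one address range and one domain
    Require not ip 123.156.189.0
    Require not ip 156.78.90
    Require not host aol.com
</RequireAll>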

Redirect Visitors to a New Page or Directory



Let's say you re-do your entire web site, renaming pages and directories. Visitors to the old pages will receive the 404 File Not Found error. You can solve this problem by redirecting calls to an old page to the new page. For example, if your old page was named oldpage.html and that page has been replaced by newpage.html, add this line to your .htaccess file:

Code:

Redirect permanent /oldpage.html http://www.go4expert.com/newpage.html

Of course, you want to replace go4expert.com with your actual domain name. Now, when the visitor types in http://www.go4expert.com/oldpage.html, they will be automatically redirected to http://www.go4expert.com/newpage.html.

If you've renamed a directory, you can use one redirect line to affect all pages within the directory:

Code:

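Redirect permanent /olddirectory http://www.go4expert.com/newdirectory

Note that the old page or directory is specified as a URL path beginning with a slash (relative to your www directory), while the new page or directory is specified by the absolute URL. Anything that follows /olddirectory in the requested address is appended to the new URL, so a request for /olddirectory/page.html is sent to http://www.go4expert.com/newdirectory/page.html.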
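
Redirect accepts other status keywords besides permanent, and its companion RedirectMatch takes a regular expression for cases where many addresses change in the same way. The lines below are a sketch; every path and file name in them is hypothetical.

Code:

# Temporary move (302): browsers and search engines keep the old address
Redirect temp /sale.html http://www.go4expert.com/offers/summer.html
# Page removed for good (410): no new URL is given
Redirect gone /retired.html
# Every .htm page under /docs/ now has a .html counterpart
RedirectMatch permanent ^/docs/(.*)\.htm$ http://www.go4expert.com/docs/$1.html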

Prevent Hot Linking and Bandwidth Leeching



What if another web site owner is stealing your images and your bandwidth by linking directly to your image files from his/her web site? You can prevent this by adding this to your .htaccess file:

Code:

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?go4expert\.com/.*$ [NC]
RewriteRule \.(gif|jpg|jpeg|png)$ - [F,NC]

Replace go4expert.com with your actual domain name. With this code in place, your images will only display when the visitor is browsing http://go4expert.com. Images linked from other domains will appear as broken images.

If you're feeling particularly nasty, you can even provide an alternative image to display on the hot linked pages -- for example, an image that says "Stealing is Bad ... visit http://go4expert.com to see the real picture that belongs here." Use this code to accomplish that:

Code:

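RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?go4expert\.com/.*$ [NC]
# Skip the replacement image itself, otherwise the redirect would loop
RewriteCond %{REQUEST_URI} !/dontsteal\.gif$
RewriteRule \.(gif|jpg|jpeg|png)$ http://www.go4expert.com/dontsteal.gif [R,NC,L]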

This time, replace go4expert.com with your domain name, and replace dontsteal.gif with the file name of the image you've created to discourage hot linking.
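
If you run more than one site and want them to share images, each extra domain needs its own condition. Conditions without [OR] are combined with AND, so the rule fires only when the referrer is non-empty and matches none of the allowed domains. In this sketch example.org stands in for a hypothetical second site of yours.

Code:

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?go4expert\.com/ [NC]
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.org/ [NC]
RewriteRule \.(gif|jpg|jpeg|png)$ - [F,NC]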

Prevent viewing of .htaccess or other files



To prevent visitors from seeing the contents of your .htaccess file, place the following code in the file:

Code:

<Files .htaccess>
order allow,deny
deny from all
</Files>

If you want to prevent visitors from seeing another file, just substitute that file's name for .htaccess in the Files specification.
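
To protect several files at once, the related FilesMatch block takes a regular expression instead of a single name. The sketch below hides anything ending in .inc, .bak or .log; those extensions are only examples, so adjust the list to the files you actually keep on the server.

Code:

<FilesMatch "\.(inc|bak|log)$">
order allow,deny
deny from all
</FilesMatch>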

Stopping the Email Collectors



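While you certainly want to encourage robot visitors from the search engines, there are other less benevolent robots you would prefer stayed away. Chief among these are those nasty 'bots that crawl around the web sucking email addresses from web pages and adding them to spam mail lists. The rules below check the User-Agent name that each visitor reports and send any robot on the list to a placeholder page, X.html, instead of the page it asked for.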

Code:

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} Wget [OR]
RewriteCond %{HTTP_USER_AGENT} CherryPickerSE [OR]
RewriteCond %{HTTP_USER_AGENT} CherryPickerElite [OR]
RewriteCond %{HTTP_USER_AGENT} EmailCollector [OR]
RewriteCond %{HTTP_USER_AGENT} EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ExtractorPro
RewriteRule ^.*$ X.html [L]

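Note that every line for a named robot except the last one ends with '[OR]'. If you add others to this list, put them above the last line and don't forget to include the '[OR]'. The last condition must not have it, because it is followed directly by the RewriteRule line.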

This is by no means foolproof. Many of these sniffers do not identify themselves, and it is almost impossible to create an exhaustive list of those that do. Still, it is worth a try if it keeps even some of them away. The above is as many as I could find.
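
A more compact variant, offered as a sketch rather than a replacement, puts several names into one pattern and answers with 403 Forbidden instead of serving a page. The [NC] flag makes the match ignore case; the three names are taken from the list above.

Code:

RewriteEngine on
# One condition covering several robots, matched case-insensitively
RewriteCond %{HTTP_USER_AGENT} (EmailCollector|EmailSiphon|EmailWolf) [NC]
# Refuse the request outright
RewriteRule ^ - [F]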

