Fun With Apache .htaccess

Discussion in 'Web Development' started by pradeep, Jan 23, 2007.

    The Apache web server has a number of configuration options that are available to the server administrator. In a shared hosting environment, you don't have access to the main Apache configuration so you're stuck with the default configuration. However, it is possible to override some of the default settings by creating (or editing) a file named .htaccess.

    The .htaccess file is a plain ASCII text file placed in your www directory or in one of its subdirectories. You can create or edit it in any text editor (such as Notepad) and then upload it to the directory whose settings you want to modify. Be sure to upload the file in ASCII (not BINARY) mode, and set its permissions to 644 (rw-r--r--).

    Commands in the .htaccess file affect the directory it is placed in and all of that directory's subdirectories. If you place the .htaccess file in your root directory, it will affect your entire web site. If you place it in a subdirectory of your www directory, it will affect only that directory plus any subdirectories of that directory.

    Most .htaccess commands are designed to be placed on one line. If your text editor wraps lines automatically, you should disable that function before saving and uploading your file. Also, note that while Apache directive names are not case-sensitive, their arguments (file names, paths, and so on) often are, so type them exactly as intended.

    Here are some of the things you can do with an .htaccess file:

    Customize Error Messages



    If you want to override the server's error pages, you can use .htaccess to define your own messages. This capability is discussed in the Custom Error Messages section of the manual. An example of the syntax is:

    Code:
    ErrorDocument 500 /server_error.html 
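    Several error codes can be mapped in the same file, one directive per line. A short sketch; the page names below are hypothetical, and each one is a URL path relative to your site root that must actually exist:

    Code:
    ErrorDocument 403 /errors/forbidden.html
    ErrorDocument 404 /errors/not_found.html
    ErrorDocument 500 /errors/server_error.html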

    Override SSI Settings



    By default, only pages ending in the .shtml extension will parse server-side includes (SSI) on our servers. You can override this restriction in your .htaccess file.

    To make SSI work with .html documents, create a file named .htaccess containing the following lines and upload it (in ASCII mode) to your main www directory:

    Code:
    AddType text/html .html
    AddHandler server-parsed .html
    If you want both .html and .htm documents to parse SSI, create your .htaccess file with these lines:

    Code:
    AddType text/html .html
    AddHandler server-parsed .html
    AddHandler server-parsed .htm
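
    The server-parsed handler is the older form. On Apache 2.x servers, SSI is usually enabled through the INCLUDES output filter instead, and the Includes option has to be permitted for the directory. Whether your host allows these overrides is an assumption you would need to check. A minimal sketch:

    Code:
    Options +Includes
    AddType text/html .html .htm
    AddOutputFilter INCLUDES .html .htm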

    Change Your Default Home Page



    In order to browse your site by specifying the domain name only (e.g., http://www.go4expert.com) instead of having to specify an exact page filename (e.g., http://www.go4expert.com/filename.html), you must have an index page in your www directory. Default acceptable file names for index pages include index.htm, index.html, index.cgi, index.shtml, index.php, etc. Note that they're all named index.*.

    There is also a default order of precedence for these names. So if you have both a file named index.cgi and a file named index.html in your directory, the server will display index.cgi because that name takes a higher precedence than index.html.

    Using .htaccess, you can define additional index filenames and/or change the order of precedence. To define your index page as myIndex.html, add the following line to your .htaccess file:

    Code:
    DirectoryIndex myIndex.html
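    This will cause the server to look for a file named myIndex.html. If it finds that file, it will display it. If it does not find that file, it will not fall back to index.html; it will return an error instead (404 Missing Page or 403 Forbidden, depending on the server configuration).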

    To change the order of precedence, enter a DirectoryIndex command with multiple file names on the same line. The order in which the file names are listed (from left to right) determines the order of precedence. For example,

    Code:
    DirectoryIndex myIndex.html index.cgi index.php index.html

    Enable Directory Browsing



    Due to security concerns we have removed the default setting that allowed directory indexing. This is the option that allows the contents of a directory to be displayed in the browser when the directory does not contain an index page.

    For example, if you make an HTTP request to a directory such as http://go4expert.com/images/, the server would list all the images in that directory without the need for an HTML page with links.

    If you require this option on specific directories it is still available. You can reactivate it by adding the following line to your .htaccess file:

    Code:
    Options +Indexes 
    Once this is added, the directory will fully index again.
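
    The reverse also works: because .htaccess settings carry down into subdirectories, a listing enabled higher up can be switched off again lower down. A sketch, assuming a hypothetical private subdirectory that has its own .htaccess file:

    Code:
    Options -Indexes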

    Block Users from Accessing Your Web Site



    If you want to deny access to a particular individual, and you know the IP address or domain name that the individual uses to connect to the Internet, you can use .htaccess to block that individual from your web site.

    Code:
    <Limit GET>
    order allow,deny
    allow from all
    deny from 123.156.189.0
    deny from 156.78.90.
    deny from .aol.com
    </Limit>
    In the example above, a user from the exact IP address 123.156.189.0 would be blocked; all users within a range of IP numbers from 156.78.90.0 to 156.78.90.255 would be blocked; and all users connecting from America Online (aol.com) would be blocked. When they attempted to browse your web site, they would be presented with the 403 Forbidden error. The order line matters: with order allow,deny the deny lines are evaluated last and win, whereas with order deny,allow the "allow from all" line would override every deny and nobody would be blocked.
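
    The order/allow/deny directives belong to Apache 2.2 and earlier. On Apache 2.4, access control is normally written with Require instead. A rough equivalent of the example above, assuming your host runs 2.4 and permits these overrides:

    Code:
    <RequireAll>
    Require all granted
    Require not ip 123.156.189.0
    Require not ip 156.78.90
    Require not host aol.com
    </RequireAll>
    Blocking by host name makes the server perform a DNS lookup for each visitor, so blocking by IP address is the cheaper option where you have the choice.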

    Redirect Visitors to a New Page or Directory



    Let's say you re-do your entire web site, renaming pages and directories. Visitors to the old pages will receive the 404 File Not Found error. You can solve this problem by redirecting calls to an old page to the new page. For example, if your old page was named oldpage.html and that page has been replaced by newpage.html, add this line to your .htaccess file:

    Code:
    Redirect permanent /oldpage.html http://www.go4expert.com/newpage.html
    Of course, you will want to replace go4expert.com with your actual domain name. Now, when the visitor types in http://www.go4expert.com/oldpage.html, they will be automatically redirected to http://www.go4expert.com/newpage.html.

    If you've renamed a directory, you can use one redirect line to affect all pages within the directory:

    Code:
    Redirect permanent /olddirectory http://www.go4expert.com/newdirectory/
    Note that the old page or directory is specified as a URL path relative to your site root (beginning with a slash), while the new page or directory is specified by the absolute URL.
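
    If the renamed pages follow a pattern, RedirectMatch accepts a regular expression in place of a fixed path. A sketch, assuming hypothetical pages that moved from /olddirectory/*.htm to /newdirectory/*.html:

    Code:
    RedirectMatch permanent ^/olddirectory/(.*)\.htm$ http://www.go4expert.com/newdirectory/$1.html
    The part captured by the parentheses is reused as $1 in the target URL.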

    Prevent Hot Linking and Bandwidth Leeching



    What if another web site owner is stealing your images and your bandwidth by linking directly to your image files from his/her web site? You can prevent this by adding this to your .htaccess file:

    Code:
    RewriteEngine on
    RewriteCond %{HTTP_REFERER} !^$
    RewriteCond %{HTTP_REFERER} !^https?://(www\.)?go4expert\.com/.*$ [NC]
    RewriteRule \.(gif|jpg|jpeg|png)$ - [F,NC]
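    Replace go4expert.com with your actual domain name. With this code in place, your images will display only when the request comes from a page on go4expert.com or carries no referer at all (the first RewriteCond lets empty referers through, since some browsers and proxies omit the header). Images linked from other domains will appear as broken images.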

    If you're feeling particularly nasty, you can even provide an alternative image to display on the hot linked pages -- for example, an image that says "Stealing is Bad ... visit http://go4expert.com to see the real picture that belongs here." Use this code to accomplish that:

    Code:
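    RewriteEngine on
    RewriteCond %{HTTP_REFERER} !^$
    RewriteCond %{HTTP_REFERER} !^https?://(www\.)?go4expert\.com/.*$ [NC]
    # Skip the replacement image itself, otherwise the redirect would loop
    RewriteCond %{REQUEST_URI} !/dontsteal\.gif$
    RewriteRule \.(gif|jpg|jpeg|png)$ http://www.go4expert.com/dontsteal.gif [R,L]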
    This time, replace go4expert.com with your domain name, and replace dontsteal.gif with the file name of the image you've created to discourage hot linking.

    Prevent viewing of .htaccess or other files



    To prevent visitors from seeing the contents of your .htaccess file, place the following code in the file:
    Code:
    <Files .htaccess>
    order allow,deny
    deny from all
    </Files>
    If you want to prevent visitors from seeing another file, just substitute that file's name for .htaccess in the Files specification.
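
    To cover several files with one block, FilesMatch takes a regular expression in place of a single name. A sketch, assuming you want to hide every file whose name starts with .ht plus hypothetical .bak and .inc files:

    Code:
    <FilesMatch "(^\.ht|\.(bak|inc)$)">
    order allow,deny
    deny from all
    </FilesMatch>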

    Stopping the Email Collectors



    While you positively want to encourage robot visitors from the search engines, there are other less benevolent robots you would prefer stayed away. Chief among these are those nasty 'bots that crawl around the web sucking email addresses from web pages and adding them to spam mail lists. The following rules match some of them by their user-agent string and send them to a page named X.html instead of the page they asked for:

    Code:
    RewriteEngine on
    RewriteCond %{HTTP_USER_AGENT} Wget [OR]
    RewriteCond %{HTTP_USER_AGENT} CherryPickerSE [OR]
    RewriteCond %{HTTP_USER_AGENT} CherryPickerElite [OR]
    RewriteCond %{HTTP_USER_AGENT} EmailCollector [OR]
    RewriteCond %{HTTP_USER_AGENT} EmailSiphon [OR]
    RewriteCond %{HTTP_USER_AGENT} EmailWolf [OR]
    RewriteCond %{HTTP_USER_AGENT} ExtractorPro
    RewriteRule ^.*$ X.html [L]
    Note that every line for a named robot except the last one ends with '[OR]'. If you add others to this list, make sure each new line carries it and that only the final RewriteCond goes without.

    This is by no means foolproof. Many of these sniffers do not identify themselves and it is almost impossible to create an exhaustive list of those that do. It's worth a try though if it even keeps some away. The above is as many as I could find.
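
    The same list can be written more compactly as a single alternation. This sketch also swaps in a different response, returning 403 Forbidden rather than rewriting to a page; that is a choice made here for illustration, not part of the rules above:

    Code:
    RewriteEngine on
    RewriteCond %{HTTP_USER_AGENT} (Wget|CherryPickerSE|CherryPickerElite|EmailCollector|EmailSiphon|EmailWolf|ExtractorPro) [NC]
    RewriteRule ^ - [F]
    With the names on one line there is no '[OR]' to forget, and [NC] makes the match case-insensitive.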
     
