Beginner's Guide to Apache Mod Rewrite Rules

Discussion in 'PHP' started by ManzZup, Sep 21, 2011.

    This time I'm taking a small detour from my usual subjects, though not a big one. I am writing this set of articles (more to come, including another one for beginners) because I felt there was no proper, complete guide to mod rewrite for pure beginners. I myself had great trouble understanding certain facts, even after searching through forums. Anyway, I hope this article saves many others from countless search entries :D

    Introduction



    A couple of things to note before proceeding:
    1. If you are a real beginner with no idea about mod rewrite and landed on this page after a Google search, I recommend reading the first article of the set on my blog first to get a better understanding of all the related stuff.
    2. No one masters writing rules just by reading; it is all about regular expressions, and what you get is what you earn through experience. So this is not a n00b->pr0 conversion article, but it is something complete enough to get you started. I went through various forums and countless articles and blogs to understand the subject [that was when I was on the blog moB project], and I want to make this a COMPLETE ALL IN ONE GUIDE :D
    Let us start

    RewriteRule



    First we need to know what you actually find in a .htaccess file. An example file is below.

    http://myoldsite.com/.htaccess (you can't access it through a URI though)

    Code:
    RewriteEngine On
    RewriteRule ^(.*) http://mysite.com/$1
    
    That's what will likely be inside, but of course it doesn't mean anything to you ATM
    But it WILL [soon :D]
    1. The first line is pretty self-explanatory: it turns on the RewriteEngine. Without it, the rules below it are ignored.
    2. The second line is the HERO of the story; the actual command lies here.
    3. It defines a rule for the behaviour of the engine (ex: this line redirects every browser going to myoldsite.com to the same path on mysite.com)
    The structure of a rule is:
    Code:
    RewriteRule Pattern Substitution [OptionalFlags]
    
    Now we'll split it up and analyse part-by-part

    Portion 1 : RewriteRule

    That is mandatory for defining a rule, basically it should be there for every rule [pretty stupid even to explain it :D]

    Portion 2 : Pattern

    Pattern is a Regular Expression that defines which URLs should be processed by the engine. In other words, every URL sent to the server that matches the Pattern gets rewritten. In a .htaccess file the Pattern is tested against the path without its leading slash, so http://mysite.com/man is seen as "man".

    Writing the Pattern is a bit tricky, so let's get to that part separately.

    Portion 3 : Substitution

    Plainly, this tells the engine what to put in place of the requested URL. It is the replacement used whenever the Pattern above matches.

    Portion 4 : Optional Flags

    As the name says they are optional, but it is good to know them. They are put at the end of a Rule (within square brackets [ ]) and affect the way the engine handles that Rule.
    The most common ones are as follows:
    • [F] - Forbidden. Refuses access, so the user sees an "Error 403".
    • [L] - Last Rule. If this rule matches, no rule further down the list is processed.
    • [R] - Visible Redirection. The user sees the URL change in the browser, which is often not what you want
      • [R=301] or [R=302] can be used to pick the status code (the default is 302)
    • [G] - Gone. Forces a "410 Gone" response
    • [N] - Next round. Reruns the rules from the start
    • [C] - Chains a rewrite rule together with the next rule.
    • [T] - Use T=MIME-type to force the file to be served as that MIME type
    • [NS] - No Subrequest. The rule is skipped when the request is an internal sub request
    • [NC] - Makes the rule case INsensitive
    • [QSA] - Query String Append. Adds to the existing query string instead of replacing it
    • [NE] - No Escape. Turns off the normal escaping of special characters in the result
    • [PT] - Pass Through to the next handler (used together with mod alias)
    • [S] - Skip. S=3 skips the next 3 rules
    • [E] - E=var:value sets an environment variable that can be read by other rules

    Remember that you can combine 2 or more flags in one rule by separating them with commas, ex: [L,R]
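
    To see a few of these flags working together, here is a rough sketch, not taken from any real site. The names old-page.html, new-page.html, profile.php and the secret/ directory are made up for illustration, and the file is assumed to sit in the document root.

    ```apache
    RewriteEngine On

    # Visible, permanent redirect; stop processing further rules on a match.
    RewriteRule ^old-page\.html$ /new-page.html [R=301,L]

    # Case-insensitive match answered with 403 Forbidden.
    # The "-" means "no substitution, leave the URL alone".
    RewriteRule ^secret/ - [F,NC]

    # Internal rewrite that keeps any query string the visitor sent
    # and appends it after user=... (hypothetical script name).
    RewriteRule ^profile/([a-z0-9]+)$ profile.php?user=$1 [QSA,L]
    ```

    With the last rule, a request for /profile/manz?tab=posts would be handled as profile.php?user=manz&tab=posts, while the address bar stays unchanged.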

    So, now that you know the structure, let's write the "Hello mod rewrite" first so that we can move further through examples [or you would surely kill me for keeping you waiting :D]

    Writing a .htaccess file


    1. First we need a case to write a .htaccess file for. Create two html files - come-here.html : add any content you like, and now-here.html : add something like "Hello mod rewrite? :D"
    2. Now put them in the root of your server [I am assuming that you have the required knowledge of servers and stuff]
    3. Open up your FAVORITE text editor and type this in. [case sensitive]
      Code:
      RewriteEngine On
      RewriteRule ^come-here\.html$ now-here.html
      
    4. Save it as .htaccess [if your editor does not allow you to save without a file name, put in a file name and remove it manually afterwards]
    5. Put it in the root/home directory along with the other two files, and navigate to come-here.html
      ex: http://mysite.com/come-here.html
      YEAH, you will see the contents of the "now-here.html" file, but the URL stays the same. VOILA!@!? Not yet :D
    Now you ask me WTF this code means. Hell, I was trying to tell you, but you wanted to code first. :) So let's learn the patterns.

    Writing mod rewrite Patterns



    Explaining common pattern characters
    • \ - Escape character. Like in real world programming languages, this removes the special meaning of regular expression characters like $ ^ .
      Code:
      \$ means the character $
      \^ means the character ^
      \. means the character .
    • ^ - Starts with. This defines how the expression starts
      Code:
      ^man : EVERYTHING starting with the word "man", so URLs like 
      http://mysite.com/man   http://mysite.com/maned.html would qualify
    • $ - Ends with. This defines the ending of the expression
      Code:
      ^man$ : ONLY the EXACT match of "man" would qualify
      http://mysite.com/man would qualify but
      http://mysite.com/maned.html would NOT   
    • . - Any character. This says "match any ONE character that is there"
      Code:
      ^man.   : http://mysite.com/maned qualifies
      http://mysite.com/man DOES NOT qualify as there should be at least one character after "man"
      ^man.$ : http://mysite.com/mane would qualify, but maned would not; because of the $, only ONE character can
      be there after "man"
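    • [ ] - Character Class. Matches exactly ONE character out of the ones listed inside the brackets.
      Code:
      [man]$ : Any URL ending with ONE of the letters "m", "a" or "n" (case sensitive), NOT the word "man".
      http://mysite.com/man qualifies, and so does http://mysite.com/pizza
      [^man]$ : Here the ^ character gives the action NOT, but only when it is the first character inside
      the brackets. The last character must be anything except "m", "a" or "n". http://mysite.com/man would NOT qualify
      [a-z]$  : Ends with any ONE lowercase letter
      [A-Za-z]$ : Ends with any ONE lowercase or uppercase letter (don't write [A-z], that range also lets in [ \ ] ^ _ and `)
      [A-Za-z0-9]$ : YUP you guessed right, ends with any ONE letter or a number from 0-9
      To match a whole word made of such characters, repeat the class: ^[a-z]+$ is any word of lowercase letters (+ is explained below)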
    • ( ) - Group / Back Reference Point. The parentheses group part of a pattern and make the engine remember whatever that part matched. You can get the matched string by using $1 in the substitution [will be explained later]
    • | - match this OR that
      Code:
      ^(aa|bb)$ : ONLY http://mysite.com/aa OR http://mysite.com/bb would qualify
      [NOTE don't add spaces between expressions and | ]
    • ? - Match the preceding expression 0 or 1 times.
      Code:
      ^(man)?$ : http://mysite.com/man (or nothing at all)
    • + - Match 1 or more times
      Code:
      ^(man)+$ : http://mysite.com/manmanman
    • { } - Match a given number of times
      Code:
      ^(man){0,2}$ : Match the word "man" 0, 1 or 2 times (that makes 3 possibilities, but never 3 times :D)
    • * - Match 0 or more times, with no upper limit
      Code:
      ^(man)*$ : http://mysite.com/manmanmanmanman
      But in all cases any other character/word interruption would break the match.
      [Use ( ) to repeat a whole word. ^[man]+$ repeats the class instead, so it also matches "nam" or "mmaa"]
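    • ! - Says NOT. Put it in front of a pattern to negate it, so the rule or condition applies when the pattern does NOT match
    • < > = - Comparing strings. These work in a RewriteCond pattern (covered later), not inside a regular expression
    • -d - Is a Directory. Used in a RewriteCond to test whether the path is an existing directory
    • -f - Is a File. Used in a RewriteCond to test whether the path is an existing regular file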
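
    Putting the pattern characters together, this is a sketch of how they might look inside real rules. article.php, image.php and about.html are hypothetical files, and the .htaccess is assumed to be in the document root.

    ```apache
    RewriteEngine On

    # [0-9]+ : one or more digits, captured by ( ) and reused as $1.
    # A request for /article/42 is served by article.php?id=42
    RewriteRule ^article/([0-9]+)$ article.php?id=$1 [L]

    # (jpg|png|gif) : alternation, \. : a literal dot.
    # The first ( ) becomes $1 and the second ( ) becomes $2.
    RewriteRule ^photos/([a-z0-9-]+)\.(jpg|png|gif)$ image.php?name=$1&type=$2 [L]

    # /? : the trailing slash may appear 0 or 1 times,
    # so both /about and /about/ qualify.
    RewriteRule ^about/?$ about.html [L]
    ```
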
    Now say !!VOILA/2!! :D

    Why divided by 2? Because there is another half and a ton more waiting to be explained. However, they will come up as we move on to some examples. The practical stuff.

    I won't be going through the "Writing the Rules" steps again; only the Rule itself will be discussed.

    Example 1:

    Our previous code
    Code:
    RewriteRule ^come-here\.html$ now-here.html
    
    It's your turn, guess it first! [obviously you aren't that dumb not to understand :D]. In plain English it says
    "serve now-here.html for any URL whose path is exactly 'come-here.html'". The \ in front of the dot makes it a literal dot; without it the dot would match any character. Pretty cool, huh?

    Example 2:

    Problem : All traffic going to http://mysite.com/forum should be redirected, as it is, to http://myforum.com
    Understanding the situation:
    1. http://mysite.com/forum/view.php -> http://myforum.com/view.php
    2. http://mysite.com/forum/show.php?1223 -> http://myforum.com/show.php?1223
    Solution:

    The best way is to make a directory called "forum" in mysite and put a .htaccess file in it to redirect all the traffic to the new host.

    Note that you can put a .htaccess file in any directory to have customized actions for that directory

    Rule:
    Code:
    RewriteRule (.*) http://myforum.com/$1
    
    NOTE: $1
    This is back referencing: the parentheses make the engine remember the matched String, and you use it by writing $1 in the substitution. A second pair of parentheses would be $2, and so on.
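
    Example 2 could also be written with the pattern anchored and the redirect spelled out. This is only a sketch: it assumes the file is saved as /forum/.htaccess on mysite.com and that a permanent (301) redirect is what you want, which the original rule does not say.

    ```apache
    # Assumed location: /forum/.htaccess on mysite.com
    RewriteEngine On

    # In a per-directory file the pattern sees the path relative to that
    # directory, so a request for /forum/view.php is matched as "view.php"
    # and $1 becomes "view.php".
    # The substitution has no query string of its own, so the original one
    # (the ?1223 in show.php?1223) is carried over unchanged.
    RewriteRule ^(.*)$ http://myforum.com/$1 [R=301,L]
    ```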

    There are many more real world problems, I will post the reference links at the end if you are curious about reading them.

    NEXT PART :::--

    Now that the big star RewriteRule has been introduced, here comes the sidekick RewriteCond
    Plainly, it is the "if" statement of mod rewrite. Let's see when we would need such a thing.

    -->Back to Example 2

    So all the traffic now goes to http://myforum.com/ and the big admin is really happy, but only until he finds out that his [my ;)] articles have links to Smiley images that sit in the forum's images directory. This is a good point to understand that the blanket redirect does not work out for hot-linked images; they just get an ugly error. So now the whole thing is screwed up and he needs a script which says:

    "redirect all traffic except the links to the 'images' directory "

    and you say GO TO HELL, this is not a freaking language, and you are kicked out.
    NO, you call up a pretty little RewriteCond

    RewriteCond



    Structure :
    Code:
    RewriteCond TestString CondPattern
    
    TestString : The String to be tested, usually a server variable written as %{NAME} (the list is below)
    CondPattern : The pattern the TestString is checked against

    See this example:
    Code:
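    RewriteCond %{REQUEST_URI} !^/forum/images/
    RewriteRule (.*) http://myforum.com/$1
    
    Saw the addition? REQUEST_URI holds the full path of the request, including the /forum part. The new condition says "if the Requested URI does NOT start with /forum/images/, go on with the rule"; if it does, do nothing. A RewriteCond only affects the first RewriteRule that follows it.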
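
    A rule can have more than one RewriteCond stacked above it. The sketch below shows the two usual ways they combine. index.php and the old-domain.example / new-domain.example hosts are assumptions for illustration only.

    ```apache
    RewriteEngine On

    # Conditions are ANDed by default: the request must be neither an
    # existing file (-f) nor an existing directory (-d) before the rule runs.
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^(.*)$ index.php?path=$1 [QSA,L]

    # With the [OR] flag either condition is enough.
    RewriteCond %{HTTP_HOST} ^old-domain\.example$ [NC,OR]
    RewriteCond %{HTTP_HOST} ^www\.old-domain\.example$ [NC]
    RewriteRule ^(.*)$ http://new-domain.example/$1 [R=301,L]
    ```

    The first block is the common "send everything that is not a real file to one script" setup; the !-f and !-d tests are what keep your images and stylesheets from being swallowed by it.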

    The RewriteCond is something you will have to read up on more, as many have no need of it in their cases, but it is REALLY IMPORTANT TO KNOW IT. Here's the list of variables you can use as the TestString. For more details refer to the official Apache documentation.

    • Http Headers
      • HTTP_USER_AGENT - The client's browser, by its userAgent name. (MSIE/Firefox/Chrome)
      • HTTP_REFERER - The URL of the page that referred the client here.
      • HTTP_COOKIE - Cookie
      • HTTP_FORWARDED - Not for beginner
      • HTTP_HOST - The Host name taken from the client's request (zontek.zzl.org)
      • HTTP_PROXY_CONNECTION - The Proxy of course
      • HTTP_ACCEPT - Not for beginner
    • Connection and Request
      • REMOTE_ADDR - The IP address of the client making the request (192.168.0.1)
      • REMOTE_HOST - The host name of the client, when the server looks it up
      • REMOTE_PORT - Port used by the client for the connection
      • REMOTE_USER - The user name, when the request is authenticated
      • REMOTE_IDENT - Not for beginner
      • REQUEST_METHOD - GET/POST and other methods
      • SCRIPT_FILENAME
      • PATH_INFO
      • QUERY_STRING
      • AUTH_TYPE
    • Server Internals
      • DOCUMENT_ROOT
      • SERVER_ADMIN
      • SERVER_NAME
      • SERVER_ADDR
      • SERVER_PORT
      • SERVER_PROTOCOL
      • SERVER_SOFTWARE
    • System Stuff
      • TIME_YEAR
      • TIME_MON
      • TIME_DAY
      • TIME_HOUR
      • TIME_MIN
      • TIME_SEC
      • TIME_WDAY
      • TIME
    • Specials
      • API_VERSION
      • THE_REQUEST
      • REQUEST_URI
      • REQUEST_FILENAME
      • IS_SUBREQ
      • HTTPS
    To find out what they do, I will put up some basic examples :)

    Example 1 : Redirecting a specific IP away from the site
    Code:
    RewriteCond %{REMOTE_ADDR} ^192\.168\.0\.2$
    RewriteRule (.*) get-the-hell-out-of-here.html
    
    That will send all the traffic from 192.168.0.2 [my experimental pc :D] to the mentioned html file

    Example 2 : Blocking any user using Internet Explorer from accessing your site [YUP I HATE IT :|]
    Code:
    RewriteCond %{HTTP_USER_AGENT} MSIE
    RewriteRule (.*) get-the-hell-out-of-here.html
    
    That's enough, I guess, so instead of reading examples, get yourself writing them.
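
    If you want something to start practising with, here are two more sketches that use variables from the list above (HTTPS and HTTP_REFERER). They assume the .htaccess sits in the document root and that your site answers on mysite.com; adjust both to your own setup.

    ```apache
    RewriteEngine On

    # Send plain-HTTP visitors to the HTTPS version of the same URL.
    # %{HTTPS} holds "on" or "off".
    RewriteCond %{HTTPS} off
    RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [R=301,L]

    # Refuse image requests whose referer is some other site (hotlinking).
    # An empty referer is let through, since some browsers and proxies omit it.
    RewriteCond %{HTTP_REFERER} !^$
    RewriteCond %{HTTP_REFERER} !^https?://(www\.)?mysite\.com/ [NC]
    RewriteRule \.(gif|jpe?g|png)$ - [F]
    ```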

    Other Directives



    There are other important directives. Brief explanations of each follow.

    RewriteBase

    Sets the base URL for a directory. Once defined, it is used as the prefix whenever a rule in that directory's .htaccess rewrites to a relative path.

    Structure :
    Code:
    RewriteBase URL-path
    
    Example :
    Code:
    RewriteBase /forum
    RewriteRule (.*) http://myforum.com/$1
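
    RewriteBase only shows its effect when the substitution is a relative path. A minimal sketch of that case, assuming the file is /forum/.htaccess and that thread.php is a script living in the same directory (both made up for illustration):

    ```apache
    # Assumed location: /forum/.htaccess
    RewriteEngine On
    RewriteBase /forum/

    # "thread.php" is relative, so the engine prefixes it with the base:
    # /forum/thread-15.html is served by /forum/thread.php?id=15
    RewriteRule ^thread-([0-9]+)\.html$ thread.php?id=$1 [L]
    ```

    When the substitution is a full URL starting with http://, as in the example above it, the base is not used.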
    
    RewriteOptions

    This defines certain special options that can be used.
    ex: inherit forces the .htaccess to inherit the rewrite configuration from its parent

    Structure :
    Code:
    RewriteOptions Options
    
    Example:
    Code:
    RewriteOptions inherit
    
    NOTE: There are a few other directives like RewriteMap, RewriteLog & RewriteLock, but these can only be set in the server config or the virtual host, not in .htaccess. So they aren't of much use here [but are important]. RewriteLog and RewriteLock belong to Apache 2.2 and older; in 2.4 their jobs are done by LogLevel and Mutex. If you would like to know about them, please refer to the official documentation

    HaaH that's ~~ALL~~ :D

    Some Stuff to remember:

    1. Rewriting takes PROCESSING power from your server, so make your rules as simple as possible. For example, if a rule should apply to a large number of cases with only a few exceptions, test for the few exceptions with a NOT (!) instead of listing every case; it saves the power as well as your head.
    2. AGAIN all of these are CasE SENsitive
    3. Only practice brings excellence
    4. None other than the creator can ever know the whole thing, so don't worry about being the GURU, there's Google for that :)
    5. Theory without practice == NULL+VOID
    6. Come to my blog from time to time to check for new updates and to give my ads some impressions so I can save a few $$ for a hard disk :D

    References



    1. For a complete list of common Flags - webforgers.net and for full list of rewrite rules see Official documentation
    2. For the immense reference and support [The official document(s)] - http://httpd.apache.org/docs/current/mod/mod_rewrite.html
      http://httpd.apache.org/docs/2.0/misc/rewriteguide.html
      http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html#rewritecond
    3. For some awesome 10+ tips to be added [MUST read] - noupe.com
    4. Some General Stuff - workingwith.me.uk
     
