
Robots.txt Files

Discussion in 'Internet Marketing' started by CircuitX, Feb 2, 2009.

  1. CircuitX

    CircuitX New Member

    Joined:
    Feb 2, 2009
    Messages:
    18
    Likes Received:
    1
    Trophy Points:
    0
    Occupation:
    Student
    Location:
    England, UK

    Introduction



    This is a very basic tutorial about robots.txt files. A lot of people on hackthis.co.uk have trouble with them, so I'm going to write a tutorial here for entry-level hackers all over the web.

    So let's get started.

    Background



    Search engines such as "Google" or "Yahoo" find the websites they show you using programs called "search bots" or "web crawlers". A crawler moves around the web from link to link looking for pages, and the search engine uses what it collects to build its index.
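    Roughly speaking, a crawler repeats one loop: download a page, pull out its links, queue them, and move on. Here is a minimal sketch of just the link-extraction step, using only Python's standard library. The function name `extract_links` and the sample HTML are hypothetical; a real crawler would also fetch pages over the network, skip URLs it has already seen, and check robots.txt before fetching anything.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the absolute URL of every <a href="..."> on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links such as "/forums/" against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return the links found in html, in the order they appear."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


page = '<p><a href="/forums/">Forums</a> <a href="http://example.com/">Out</a></p>'
print(extract_links("http://www.go4expert.com/", page))
# ['http://www.go4expert.com/forums/', 'http://example.com/']
```

    The crawler would then add each of these URLs to its queue and visit them in turn.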

    How This Is Related to Hacking



    If a website has pages which it doesn't want search engines to find, it can list the paths that search bots should stay out of in a plain-text file. This file is called "robots.txt".

    The pages listed in a "robots.txt" file could potentially contain information such as usernames, passwords, personal details etc. (the information would probably be encrypted, but this isn't a decryption tutorial :p). And because robots.txt is a public file that anyone can read, finding it gives us a list of the pages a particular site would rather keep hidden.

    The best bit is that each website (strictly, each host) has only one "robots.txt" file, and it always sits at the root of the site. So for go4expert, the URL would be http://www.go4expert.com/robots.txt
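    Since the file always sits at the root, its address can be worked out from any page on the site, and the Disallow lines can then be listed out. A minimal Python sketch, handy for a webmaster checking what their own robots.txt reveals; the function names and the sample rules (`/admin/`, `/private.html`) are hypothetical:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host (plus any port); replace path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def disallowed_paths(robots_text):
    """List every non-empty Disallow value, in file order."""
    paths = []
    for raw_line in robots_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


print(robots_url("http://www.go4expert.com/forums/some-thread/"))
# http://www.go4expert.com/robots.txt
print(disallowed_paths("User-agent: *\nDisallow: /admin/\nDisallow: /private.html\n"))
# ['/admin/', '/private.html']
```

    An empty "Disallow:" line means "nothing is blocked", so the sketch skips it.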

    The "robots.txt" would look a bit like this:
    Code:
    User-agent: *
    Disallow: /
    The "/" matches every path on the site, so this tells all search robots ("User-agent: *" means any robot) not to crawl any page on this website.
    However, it could also look like this:
    Code:
    User-agent: *
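    Disallow: /usernamespasswords.txt
    This tells search engines to skip just that one page (note the leading "/": paths are matched from the root of the site). Of course, it also tells anyone who reads robots.txt exactly where that file is...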
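    To see how a well-behaved crawler reads the two examples above, Python's standard `urllib.robotparser` module can parse the rules directly. A minimal sketch; the helper name `is_allowed` and the user-agent string "ExampleBot" are hypothetical:

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines, user_agent, url):
    """Check one URL against robots.txt rules given as a list of lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)


block_everything = ["User-agent: *", "Disallow: /"]
block_one_page = ["User-agent: *", "Disallow: /usernamespasswords.txt"]

site = "http://www.go4expert.com"
print(is_allowed(block_everything, "ExampleBot", site + "/index.html"))           # False
print(is_allowed(block_one_page, "ExampleBot", site + "/usernamespasswords.txt"))  # False
print(is_allowed(block_one_page, "ExampleBot", site + "/index.html"))              # True
```

    With this parser, a rule written without the leading slash ("Disallow: usernamespasswords.txt") never matches, because it is compared against the URL path, which always starts with "/".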

    Summary and Further Reading



    robotstxt.org - all about "robots.txt" files.
    hackthis.co.uk - a great website to learn hacking in a legal, user-friendly environment. Main Level 7 is all about "robots.txt".

    Remember guys: KEEP IT LEGAL

    DISCLAIMER - I WILL NOT BE HELD RESPONSIBLE FOR THE ACTIONS OF ANYONE WHO READS THIS TUTORIAL. IT IS FOR EDUCATIONAL PURPOSES ONLY.
     
  2. shabbir

    shabbir Administrator Staff Member

    Joined:
    Jul 12, 2004
    Messages:
    15,285
    Likes Received:
    364
    Trophy Points:
    83
    Good to see your first article, but I would have preferred this to be in the Search Engines forum, so I have moved it there and left a permanent redirect in the Hacking forum as well.
     
  3. CircuitX

    CircuitX New Member

    Joined:
    Feb 2, 2009
    Messages:
    18
    Likes Received:
    1
    Trophy Points:
    0
    Occupation:
    Student
    Location:
    England, UK
    Ok, sorry about that :nice:.
     
  4. shabbir

    shabbir Administrator Staff Member

    Joined:
    Jul 12, 2004
    Messages:
    15,285
    Likes Received:
    364
    Trophy Points:
    83
    It's just a question of which forum is more relevant and which is less. There is nothing to feel sorry about.
     
  5. stephen186

    stephen186 New Member

    Joined:
    Feb 17, 2009
    Messages:
    43
    Likes Received:
    0
    Trophy Points:
    0
    Anyway, even if it isn't supposed to be posted here, I think newbie webmasters should know this too. Maybe it can help them keep their secure pages from being hacked, at least in this way.
     
  6. CircuitX

    CircuitX New Member

    Joined:
    Feb 2, 2009
    Messages:
    18
    Likes Received:
    1
    Trophy Points:
    0
    Occupation:
    Student
    Location:
    England, UK
    Ha, I'm a bit of a newbie webmaster myself. I haven't bothered to sort out security yet. I think I'll wait till it's finished.

    So if you guys want to hack someone - hack me!!! :p
     
  7. stephen186

    stephen186 New Member

    Joined:
    Feb 17, 2009
    Messages:
    43
    Likes Received:
    0
    Trophy Points:
    0
    Well, that's good for you. When I started on the internet, I did not know this. I only came to know all this stuff after reading and participating in forums like this. Even now, I am still learning day by day.
     
  8. shabbir

    shabbir Administrator Staff Member

    Joined:
    Jul 12, 2004
    Messages:
    15,285
    Likes Received:
    364
    Trophy Points:
    83
  9. imrantechi

    imrantechi New Member

    Joined:
    Feb 12, 2008
    Messages:
    116
    Likes Received:
    4
    Trophy Points:
    0
    Will use it in the right way...
     
  10. shabbir

    shabbir Administrator Staff Member

    Joined:
    Jul 12, 2004
    Messages:
    15,285
    Likes Received:
    364
    Trophy Points:
    83
  11. DKS

    DKS New Member

    Joined:
    Mar 30, 2009
    Messages:
    9
    Likes Received:
    0
    Trophy Points:
    0
    Great info on the robots.txt file. It is definitely a tool that a lot of people don't utilise, most likely because they are not aware of it. Great post!
     
  12. yohan

    yohan New Member

    Joined:
    Jun 26, 2009
    Messages:
    150
    Likes Received:
    4
    Trophy Points:
    0
    It is easier for robots/spiders to crawl your site if you have this file.
     
  13. chathura

    chathura New Member

    Joined:
    Oct 31, 2009
    Messages:
    12
    Likes Received:
    0
    Trophy Points:
    0
    Good. Can you show how to write it when we want to mark a URL as nofollow?
     
  14. technica

    technica New Member

    Joined:
    Dec 15, 2007
    Messages:
    107
    Likes Received:
    0
    Trophy Points:
    0
    One should also include the sitemap path in the robots.txt file for better results.
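    The sitemap goes on its own "Sitemap:" line, given as a full URL, and it applies regardless of which User-agent group it sits near. A minimal Python sketch that writes such a file and reads it back; the helper `build_robots_txt`, the `/admin/` rule and the sitemap URL are hypothetical, and `RobotFileParser.site_maps()` needs Python 3.8 or later:

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(disallow_paths, sitemap_url):
    """Assemble a robots.txt body that blocks some paths and lists a sitemap."""
    lines = ["User-agent: *"]
    if disallow_paths:
        lines += ["Disallow: " + path for path in disallow_paths]
    else:
        lines.append("Disallow:")  # empty value: nothing is blocked
    lines.append("")  # blank line ends the User-agent group
    lines.append("Sitemap: " + sitemap_url)  # absolute URL of the sitemap
    return "\n".join(lines) + "\n"


body = build_robots_txt(["/admin/"], "http://www.go4expert.com/sitemap.xml")
print(body)

parser = RobotFileParser()
parser.parse(body.splitlines())
print(parser.site_maps())  # ['http://www.go4expert.com/sitemap.xml']
```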
     
  15. devinedbyzero

    devinedbyzero New Member

    Joined:
    Jan 25, 2010
    Messages:
    1
    Likes Received:
    0
    Trophy Points:
    0
    Hello bro,
    follow me: you need the letter "p". You will see me on hackthis.co.uk.
    Try to solve it.
     
  16. satyedra pal

    satyedra pal New Member

    Joined:
    Mar 26, 2010
    Messages:
    93
    Likes Received:
    1
    Trophy Points:
    0
    We use the robots meta tag on all pages of the website that we want indexed. It tells robots that they may index the page.
    The robots tag is implemented in the following way:
    <meta name="robots" content="index" />
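    Unlike robots.txt, the meta tag lives inside each page, so a crawler only sees it after downloading that page. Its content is a comma-separated list of directives such as "index", "noindex" or "nofollow" ("index, follow" is what search engines assume when the tag is absent). A minimal Python sketch of how a crawler might read it, using the standard `html.parser`; the names `RobotsMetaReader` and `robots_directives` are hypothetical:

```python
from html.parser import HTMLParser


class RobotsMetaReader(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)  # attribute names arrive lowercased
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Split "noindex, nofollow" into ["noindex", "nofollow"].
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


def robots_directives(html):
    reader = RobotsMetaReader()
    reader.feed(html)
    reader.close()
    return reader.directives


print(robots_directives('<head><meta name="robots" content="index" /></head>'))
# ['index']
```

    Self-closing tags like `<meta ... />` still reach `handle_starttag`, because the parser's default `handle_startendtag` forwards them there.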
     
  17. unni krishnan.r

    unni krishnan.r New Member

    Joined:
    Apr 20, 2010
    Messages:
    204
    Likes Received:
    3
    Trophy Points:
    0
    Occupation:
    education
    Location:
    Kerala
  18. unni krishnan.r

    unni krishnan.r New Member

    Joined:
    Apr 20, 2010
    Messages:
    204
    Likes Received:
    3
    Trophy Points:
    0
    Occupation:
    education
    Location:
    Kerala
    I got a vast description.
     
  19. dvdv882

    dvdv882 New Member

    Joined:
    Jul 10, 2010
    Messages:
    2
    Likes Received:
    0
    Trophy Points:
    0
    The robots.txt file is a special text file that is always located in your Web server's root directory. It contains restrictions for Web spiders, telling them where they have permission to crawl. A robots.txt file is like a set of rules telling search engine spiders (robots) what to follow and what not to. Note that Web robots are not required to respect robots.txt files, but most well-written Web spiders follow the rules you define.
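    Rules can also be aimed at one particular spider by naming it in the User-agent line. A minimal Python sketch with the standard `urllib.robotparser`, assuming a hypothetical bot called "BadBot" that is shut out while every other robot is allowed in:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named bot, allow everyone else.
RULES = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

url = "http://www.go4expert.com/forums/"
for agent in ("BadBot", "Googlebot"):
    # A well-behaved crawler asks first; nothing stops one that skips this check.
    print(agent, parser.can_fetch(agent, url))
# BadBot False
# Googlebot True
```

    The check happens entirely on the crawler's side, which is exactly why robots.txt is a request and not a lock.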
     
  20. lingoway

    lingoway New Member

    Joined:
    Aug 10, 2010
    Messages:
    24
    Likes Received:
    4
    Trophy Points:
    0
    good article
     
