Introduction
This is a very basic tutorial about robots.txt files. A lot of people on hackthis.co.uk have trouble with them, so I'm going to do a tutorial here for entry-level hackers all over the web.
So let's get started.
Background
Search engines such as "Google" or "Yahoo" find the websites they show you by using something called a "search bot" or "web crawler". A crawler basically follows links around the web looking for pages, and the search engine uses what it finds to build its index.
How This Is Related to Hacking
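If a website has pages which it doesn't want search engines to find, it can list the paths that search bots should stay out of in a plain ".txt" file. This file is called "robots.txt". It is only a request: well-behaved bots follow it, but the file itself is public and anyone can read it.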
The pages listed in a "robots.txt" file could potentially contain information concerning usernames, passwords, personal details etc. (the information would probably be encrypted, but this isn't a decryption tutorial). So basically, if we find the robots.txt file, then we find a list of pages that a particular site wanted to keep out of search results. The best bit is that there can only be one "robots.txt" file for each website, and it always sits at the root of the site. So say it was for go4expert. The URL would be - http://www.go4expert.com/robots.txt
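Since the file always sits at the root of the site, you can work out its address from any page on that site. Here is a minimal Python sketch of that idea. The function name robots_url and the forum thread URL are made up for illustration; only the go4expert address comes from this tutorial.

Code:
```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, drop the path, query and fragment,
    # and point at /robots.txt in the site root instead.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# The thread path below is a hypothetical example page.
print(robots_url("http://www.go4expert.com/forums/some-thread/"))
# prints http://www.go4expert.com/robots.txt
```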
The "robots.txt" would look a bit like this:
Code:
User-agent: *
Disallow: /
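To see what those rules mean to a crawler, you can feed them to Python's standard urllib.robotparser module, which is roughly what a polite bot does before fetching a page. This is just a sketch: the helper name is_allowed, the bot name "ExampleBot" and the /forums/ path are assumptions for the demo.

Code:
```python
from urllib.robotparser import RobotFileParser

# The same rules as above, one directive per line.
RULES = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "Disallow: /" matches every path, so every page is off limits to every bot.
print(is_allowed(RULES, "ExampleBot", "http://www.go4expert.com/"))         # prints False
print(is_allowed(RULES, "ExampleBot", "http://www.go4expert.com/forums/"))  # prints False
```

Note that this only tells you what a well-behaved bot would do. Nothing stops a person from opening the page in a browser anyway.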
However it could look like this:
Code:
User-agent: *
Disallow: /usernamespasswords.txt
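If you have the text of a robots.txt in front of you, pulling out the paths it hides is just a matter of reading the Disallow lines. Below is a rough Python sketch that does this on a pasted copy of the file. The function name disallowed_paths is made up here, and the sample text assumes the path in the example was meant to read /usernamespasswords.txt.

Code:
```python
def disallowed_paths(robots_txt):
    """Return every non-empty Disallow path found in robots_txt, in order."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop anything after a '#' comment marker, then surrounding spaces.
        line = line.split("#", 1)[0].strip()
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if path:  # an empty Disallow means "nothing is blocked"
                paths.append(path)
    return paths


SAMPLE = """\
User-agent: *
Disallow: /usernamespasswords.txt
"""

print(disallowed_paths(SAMPLE))
# prints ['/usernamespasswords.txt']
```

Each path in the list can then be added to the site address, the same way /robots.txt was.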
Summary and Further Reading
robotstxt.org - all about "robots.txt" files.
hackthis.co.uk - a great website to learn hacking in a legal, user-friendly environment. Main Level 7 is all about "robots.txt".
Remember guys: KEEP IT LEGAL
DISCLAIMER - I WILL NOT BE HELD RESPONSIBLE FOR THE ACTIONS OF ANYONE WHO READS THIS TUTORIAL. IT IS FOR EDUCATIONAL PURPOSES ONLY.


