Robots.txt Files

CircuitX · Feb 2, 2009

Introduction

This is a very basic tutorial about robots.txt files. A lot of people on hackthis.co.uk have trouble with them, so I'm going to do a tutorial here for entry-level hackers all over the web.

So let's get started.

Background

Search engines such as Google or Yahoo find the websites they show you using something called a "search bot" or "web crawler". A crawler follows links across the web to discover pages, and the search engine uses what it finds to build its index.

How This Is Related to Hacking

If a website has pages which it doesn't want search engines to find, it can list the paths that search bots should stay out of in a plain text file. This file is called "robots.txt".

The pages listed in a "robots.txt" file could potentially contain information concerning usernames, passwords, personal details etc. (the information would probably be encrypted, but this isn't a decryption tutorial :p). So basically, if we find the robots.txt file, then we find a list of the pages that a particular site is trying to hide from search engines.

The best bit is that there can only be one "robots.txt" file for each website, and it sits in the site's root directory. So say it was for go4expert. The URL would be http://www.go4expert.com/robots.txt
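
Since the file always lives at the root, its URL can be worked out from any page on the same site. Here is a minimal Python sketch of that idea. The function name `robots_url` and the article path in the example are made up for illustration; only the go4expert host comes from the tutorial.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; drop the path, query and fragment,
    # because robots.txt sits directly under the site root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# The path below is a hypothetical example page, not a real article URL.
print(robots_url("http://www.go4expert.com/articles/some-page/"))
# prints: http://www.go4expert.com/robots.txt
```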

The "robots.txt" would look a bit like this:
Code:
User-agent: *
Disallow: /
The "/" tells the search robots to ignore every page on this website.
However, it could look like this:
Code:
User-agent: *
Disallow: /usernamespasswords.txt
This would mean that search engines ignore just one page, /usernamespasswords.txt, and are free to crawl the rest.
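
To see how a well-behaved crawler reads these two files, here is a small sketch using Python's standard `urllib.robotparser` module. It assumes the second file writes the path with a leading slash (/usernamespasswords.txt). The helper `parse_rules`, the bot name "AnyBot" and the host www.example.com are placeholders, not anything from a real site.

```python
from urllib.robotparser import RobotFileParser


def parse_rules(text: str) -> RobotFileParser:
    """Build a parser from robots.txt text instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# The two example files from the tutorial, as plain strings.
block_all = parse_rules("User-agent: *\nDisallow: /\n")
block_one = parse_rules("User-agent: *\nDisallow: /usernamespasswords.txt\n")

base = "http://www.example.com"  # placeholder host
print(block_all.can_fetch("AnyBot", base + "/index.html"))              # False
print(block_one.can_fetch("AnyBot", base + "/index.html"))              # True
print(block_one.can_fetch("AnyBot", base + "/usernamespasswords.txt"))  # False
```

Note that `can_fetch` only reports what the file asks for. Nothing stops a person or a badly behaved bot from requesting the page anyway.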

Summary and Further Reading

robotstxt.org - all about "robots.txt" files.
hackthis.co.uk - a great website to learn hacking in a legal, user-friendly environment. Main Level 7 is all about "robots.txt".

Remember guys: KEEP IT LEGAL

DISCLAIMER - I WILL NOT BE HELD RESPONSIBLE FOR THE ACTIONS OF ANYONE WHO READS THIS TUTORIAL. IT IS FOR EDUCATIONAL PURPOSES ONLY.

shabbir · Feb 3, 2009

Good to see your first article, but I think it fits better under Search Engines, so I have moved it there and left a permanent redirect in the Hacking forum as well.

CircuitX · Feb 3, 2009

shabbir said: ↑

Good to see your first article, but I think it fits better under Search Engines, so I have moved it there and left a permanent redirect in the Hacking forum as well.

Ok, sorry about that :nice:.

shabbir · Feb 3, 2009

CircuitX said: ↑

Ok, sorry about that :nice:.

I guess it is just a matter of more or less relevancy. There is nothing to feel sorry about.

stephen186 · Feb 21, 2009

Anyway, even if it was not supposed to be posted here, I think some newbie webmasters should also know this. Maybe they can at least keep their secure pages from being hacked this way.

CircuitX · Feb 21, 2009

stephen186 said: ↑

Anyway, even if it was not supposed to be posted here, I think some newbie webmasters should also know this. Maybe they can at least keep their secure pages from being hacked this way.

Ha, I'm a bit of a newbie webmaster myself. I haven't bothered to sort out security yet. I think I'll wait till the site is finished.

So if you guys want to hack someone - hack me!!! :p

stephen186 · Feb 21, 2009

Well, that's good for you. When I started on the internet, I did not know this. I only came to know all this stuff after reading and participating in forums like this. Even now, I am still learning day by day.

shabbir · Mar 4, 2009

Nominate this article for Article of the month for February 2009

imrantechi · Mar 17, 2009

Will use it in right way...

shabbir · Mar 17, 2009

Vote for this article for Article of the month February 2009

DKS · Apr 15, 2009

Great info on the robots.txt file. It is definitely a tool that a lot of people don't utilise. Most likely it's because they are not aware of it. Great post!

yohan · Jul 29, 2009

It is easier for robots/spiders to crawl your site if you have this file.

chathura · Nov 3, 2009

Good. Can you show how to write it when we want to mark a URL as nofollow?

Deleted member 17220 · Dec 2, 2009

One should also include the sitemap path in the robots.txt file for better results.
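
As a rough illustration of the sitemap suggestion, a robots.txt file can carry a `Sitemap:` line next to its rules, and Python's standard `urllib.robotparser` (3.8 or later) can read it back. The rules, the /private/ path and the www.example.com sitemap URL below are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt with one rule and a sitemap entry.
rules = """\
User-agent: *
Disallow: /private/
Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# site_maps() returns the listed sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)  # ['http://www.example.com/sitemap.xml']
```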

devinedbyzero · Jan 25, 2010

CircuitX said: ↑

Ok, sorry about that :nice:.

Hello bro,
follow me, you need the letter "p". You will see me on hackthis.co.uk.
Try to solve it.

satyedra pal · Jun 1, 2010

We use the robots meta tag on all pages of the website that we want indexed. It instructs the robots to index the page.
The robots tag is implemented in the following way:
<meta name="robots" content="index" />
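
For anyone curious how a crawler might pick that tag up, here is a minimal sketch using Python's standard `html.parser`. The class `RobotsMetaParser` and the function `robots_directives` are names invented for this example. The second call uses `noindex, nofollow`, which are the usual values for asking robots not to index a page or follow its links.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # The content attribute is a comma-separated list of directives.
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


def robots_directives(html: str) -> list:
    """Return the robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


print(robots_directives('<meta name="robots" content="index" />'))           # ['index']
print(robots_directives('<meta name="robots" content="noindex, nofollow">'))  # ['noindex', 'nofollow']
```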

unni krishnan.r · Jun 1, 2010

great post

unni krishnan.r · Jun 1, 2010

I got a vast description.

dvdv882 · Jul 23, 2010

The robots.txt file is a special text file that is always located in your web server's root directory. It contains restrictions for web spiders, telling them where they have permission to search. In effect, robots.txt defines rules for search engine spiders (robots) about what to crawl and what to leave alone. It should be noted that web robots are not required to respect robots.txt files, but most well-written web spiders follow the rules you define.

lingoway · Aug 19, 2010

good article
