Go4Expert


CircuitX 3Feb2009 03:10

Robots.txt Files
 

Introduction



This is a very basic tutorial about robots.txt files. A lot of people on hackthis.co.uk have trouble with them, so I'm going to do a tutorial here for entry-level hackers all over the web.

So let's get started.

Background



Search engines such as "Google" or "Yahoo" find the websites they show you using something called a "search bot" or "web crawler". It basically travels the web looking for pages, and search engines use what it finds to build their index.

How This Is Related to Hacking



If a website has pages which it doesn't want search engines to find, then it can list the pages that search bots are excluded from in a ".txt" file. This is called a "robots.txt".

The pages listed in a "robots.txt" file could potentially contain information concerning usernames, passwords, personal details etc. (the information would probably be encrypted, but this isn't a decryption tutorial :p). So basically, if we find the robots.txt file, then we find a list of pages that a particular site would rather keep out of search results.

The best bit is that there can only be one "robots.txt" file for each website, and it always sits at the root of the site. So say it was for go4expert. The URL would be: http://www.go4expert.com/robots.txt
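
As a rough illustration of that rule, here is a small Python sketch that works out the robots.txt address from any page URL. The helper name robots_url is made up for this example, and strictly speaking the file is per host name, so a subdomain would have its own.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url.

    robots_url is a hypothetical helper name used only in this sketch.
    """
    parts = urlsplit(page_url)
    # Keep only the scheme and host; robots.txt sits at the root path.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.go4expert.com/articles/robotstxt-files-t16034/"))
```

Whatever page you start from, the path, query string and fragment are thrown away and only the root location is kept.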

The "robots.txt" would look a bit like this:
Code:

User-agent: *
Disallow: /

The "/" tells the search robots to ignore every page on this website.
However, it could look like this:
Code:

User-agent: *
Disallow: /usernamespasswords.txt

This would mean that search engines ignore just one page...
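
To see the difference between the two examples, Python's standard urllib.robotparser module can be fed the rules as text. This is only a sketch: ExampleBot and www.example.com are placeholder names, and the Disallow path is written with a leading slash, which is how paths in robots.txt are normally given.

```python
from urllib.robotparser import RobotFileParser


def parse_rules(text: str) -> RobotFileParser:
    """Build a parser from robots.txt text instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# The two rule sets from the tutorial.
block_all = parse_rules("User-agent: *\nDisallow: /\n")
block_one = parse_rules("User-agent: *\nDisallow: /usernamespasswords.txt\n")

site = "http://www.example.com"  # placeholder host
for name, rules in (("block_all", block_all), ("block_one", block_one)):
    for path in ("/index.html", "/usernamespasswords.txt"):
        # can_fetch() answers: may this user agent request this URL?
        print(name, path, rules.can_fetch("ExampleBot", site + path))
```

With the first rule set every path is refused. With the second, only the one named file is refused and the rest of the site stays open to crawlers.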

Summary and Further Reading



robotstxt.org - all about "robots.txt" files.
hackthis.co.uk - a great website to learn hacking in a legal, user-friendly environment. Main Level 7 is all about "robots.txt".

Remember guys: KEEP IT LEGAL

DISCLAIMER - I WILL NOT BE HELD RESPONSIBLE FOR THE ACTIONS OF ANYONE WHO READS THIS TUTORIAL. IT IS FOR EDUCATIONAL PURPOSES ONLY.

shabbir 3Feb2009 07:37

Re: Robots.txt Files
 
Good to see your first article, but I would have preferred this to be in Search Engines, so I have moved it there and left a permanent redirect in the Hacking forum as well.

CircuitX 3Feb2009 13:20

Re: Robots.txt Files
 
Quote:

Originally Posted by shabbir (Post 42339)
Good to see your first article, but I would have preferred this to be in Search Engines, so I have moved it there and left a permanent redirect in the Hacking forum as well.

Ok, sorry about that :nice:.

shabbir 3Feb2009 14:01

Re: Robots.txt Files
 
Quote:

Originally Posted by CircuitX (Post 42350)
Ok, sorry about that :nice:.

I guess it is just a matter of more or less relevance. There is nothing to feel sorry about.

stephen186 21Feb2009 12:48

Re: Robots.txt Files
 
Anyway, even if it is not supposed to be posted here, I think some newbie webmasters should also know this. Maybe they can keep their secure pages from being hacked, at least in this way.

CircuitX 21Feb2009 15:40

Re: Robots.txt Files
 
Quote:

Originally Posted by stephen186 (Post 43255)
Anyway, even if it is not supposed to be posted here, I think some newbie webmasters should also know this. Maybe they can keep their secure pages from being hacked, at least in this way.

Ha, I'm a bit of a newbie webmaster myself. I haven't bothered to sort out security yet. I think I'll wait till it's finished.

So if you guys want to hack someone - hack me!!! :p

stephen186 21Feb2009 16:10

Re: Robots.txt Files
 
Well, that's good for you. When I started on the internet, I did not know this. I only came to know all this stuff after reading and participating in forums like this. Even now, I am still learning day by day.

shabbir 4Mar2009 09:56

Re: Robots.txt Files
 
Nominate this article for Article of the month for February 2009

imrantechi 17Mar2009 10:27

Re: Robots.txt Files
 
Will use it in right way...

shabbir 17Mar2009 12:16

Re: Robots.txt Files
 
Vote for this article for Article of the month February 2009

DKS 15Apr2009 21:24

Re: Robots.txt Files
 
Great info on the robots.txt file. It is definitely a tool that a lot of people don't utilise. Most likely it's because they are not aware of it. Great post!

yohan 29Jul2009 06:44

Re: Robots.txt Files
 
It is easy for robots/spiders to crawl your site if you have this file..

chathura 3Nov2009 15:52

Re: Robots.txt Files
 
Good. Can you show how to write it when we want to mark a URL as nofollow?

technica 2Dec2009 12:30

Re: Robots.txt Files
 
One should also include the sitemap path in the robots.txt file for better results.

devinedbyzero 26Jan2010 03:39

Re: Robots.txt Files
 
Quote:

Originally Posted by CircuitX (Post 42350)
Ok, sorry about that :nice:.

Hello bro,
follow me, you need the letter "p". You will see me on hackthis.co.uk.
Try to solve it.

satyedra pal 1Jun2010 11:53

Re: Robots.txt Files
 
We use the robots meta tag on all pages of the website that we want indexed. This instructs the robots to index the page.
The robots tag is implemented in the following way:
<meta name="robots" content="index" />
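
For anyone who wants to check what a page actually says, the sketch below reads the robots meta tag with Python's built-in html.parser. The class and function names are invented for illustration. Note that "index" is generally what crawlers assume anyway, so the tag matters most when it carries values such as noindex or nofollow.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> tags (illustrative name)."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if tag == "meta" and name == "robots":
            content = attributes.get("content") or ""
            # Directives are comma separated, e.g. "noindex, nofollow".
            for directive in content.split(","):
                if directive.strip():
                    self.directives.append(directive.strip().lower())


def robots_directives(html: str) -> list:
    """Return the robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


print(robots_directives('<head><meta name="robots" content="index" /></head>'))
```

Unlike robots.txt, this tag lives inside each page, so a crawler has to fetch the page before it can see the instruction.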

unni krishnan.r 1Jun2010 14:06

Re: Robots.txt Files
 
great post

unni krishnan.r 1Jun2010 14:06

Re: Robots.txt Files
 
I got a vast description.

dvdv882 23Jul2010 15:20

Re: Robots.txt Files
 
The robots.txt file is a special text file that is always located in your web server's root directory. It contains restrictions for web spiders, telling them where they have permission to search. A robots.txt file is like a set of rules telling search engine spiders (robots) what to follow and what not to. It should be noted that web robots are not required to respect robots.txt files, but most well-written web spiders follow the rules you define.

lingoway 19Aug2010 17:05

Re: Robots.txt Files
 
good article

PradeepKr 10Sep2010 15:34

Re: Robots.txt Files
 
A few cents from me:
You can also add the sitemap location in the robots.txt file.

This would tell robots where exactly your sitemap is and which links to crawl.
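
A quick way to confirm that the sitemap line is picked up is Python's urllib.robotparser, which exposes it through site_maps() (Python 3.8 or later). The file contents and the example.com address below are assumptions made for the sketch.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the Sitemap line takes a full URL.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns a list of sitemap URLs, or None if there is no Sitemap line.
sitemaps = parser.site_maps()
print(sitemaps)
```

The Sitemap line is independent of the User-agent sections, so it can sit anywhere in the file.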

parryrater 14Sep2010 16:44

Re: Robots.txt Files
 
hi...

The robots.txt file really is useful for making a user-friendly site! Meet again.

raantint 17Sep2010 12:24

Re: Robots.txt Files
 
hi...

Is there any robots.txt generator tool, like a sitemap generator? Meet again.

shabbir 17Sep2010 13:51

Re: Robots.txt Files
 
Visit Google Webmaster Tools; they have an option for generating one for your domain.

parrytint 20Sep2010 12:32

Re: Robots.txt Files
 
hi...

What are the benefits of generating a robots.txt file? Meet again.

raanzen 23Sep2010 11:34

Re: Robots.txt Files
 
hi...

Which URLs on a site should not be eligible for crawling by search engines? Meet again.

vimlesh 24Feb2011 11:30

Re: Robots.txt Files
 
The robots.txt file is a set of instructions for visiting robots (spiders) that index the content of your web site pages. For those spiders that obey the file, it provides a map of what they can and cannot index. The file must reside in the root directory of your web site.

delhifirm 4Mar2011 11:21

Re: Robots.txt Files
 
Hello,
Thanks for sharing.
These are good basic notes on robots.txt files.

denishverma 4Mar2011 15:49

Re: Robots.txt Files
 
Robots.txt
Used to give crawl permission to Googlebot or other search engine bots for website pages and folders.
The robots.txt file is a simple text file where we set that permission for the whole website, including inner folders.
Robots.txt can allow the bots of every search engine.
The User-agent line of robots.txt is used to do this.

stacey 1Jun2011 16:56

Re: Robots.txt Files
 
Yep, the robots.txt file is useful for keeping a website's personal information out of public search results.

rebeccaasmit 7Jun2011 13:08

Re: Robots.txt Files
 
Thanks for posting this! A very apt and complete explanation of robots.txt.

benivolentsoft 14Jun2011 16:38

Re: Robots.txt Files
 
Thanks for the detailed information. Moreover, robots will follow both nofollow and dofollow links.

We can't command the Google robots to follow only the links we need, because the robot basically works out the whole linking structure of the website by itself.

seo-marketing 17Jun2011 13:41

Re: Robots.txt Files
 
I have seen that Google tends to ignore the robots.txt file sometimes. Pages which you have marked as "Disallow" in robots.txt are sometimes still seen crawled by Google.

tiwvinay 14Jul2011 17:11

Re: Robots.txt Files
 
Thanks for the information.

seoforums85 29Jul2011 15:24

Re: Robots.txt Files
 
When you do not want a page to be crawled, use a robots.txt file.

castorsandwheels 4Aug2011 17:11

Re: Robots.txt Files
 
Thanks for the information about robots.txt.

Creativepromotion 12Aug2011 18:04

Re: Robots.txt Files
 
useful info for everyone

denishverma 15Aug2011 07:22

Re: Robots.txt Files
 
Robots.txt is a permissions file used to allow or disallow crawlers from folders of a website.
If a folder is disallowed, well-behaved robots will not crawl it.
If it is allowed, they can crawl folders, inner folders, sub folders, cgi_bin, etc.

thanks

jhon786 12Oct2011 11:21

Re: Robots.txt Files
 
Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.

User-agent: *
Disallow: /

The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.
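
How the "User-agent" line picks a section can be tried out with Python's standard urllib.robotparser. The sketch below is an assumption-laden example: ExampleBot, OtherBot, the /drafts/ folder and www.example.com are all invented names. A robot that is named in its own section uses that section, and everyone else falls back to the "*" section.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: one section for a named bot, one for all others.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /drafts/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

home = "http://www.example.com/index.html"
draft = "http://www.example.com/drafts/a.html"

# ExampleBot matches its own section, so only /drafts/ is off limits.
examplebot_home = parser.can_fetch("ExampleBot", home)
examplebot_draft = parser.can_fetch("ExampleBot", draft)
# OtherBot has no section of its own and gets the "*" rules.
otherbot_home = parser.can_fetch("OtherBot", home)

print(examplebot_home, examplebot_draft, otherbot_home)
```

So a site can shut out robots in general while still letting one particular crawler in.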

mukeshsoftona 15Nov2011 15:38

Re: Robots.txt Files
 
Disallow or allow your website content for Google.


All times are GMT +5.5.