What is robots.txt?

mark joshef · Oct 13, 2011

What is robots.txt?

neeraj_77 · Oct 13, 2011

Robots.txt is a plain text (not HTML) file that you put on your site to tell search robots which pages you would like them not to visit. Search engines are by no means obliged to follow robots.txt, but they generally obey what they are asked not to do. It is important to clarify that robots.txt is not a way of preventing search engines from crawling your site (i.e. it is not a firewall or a kind of password protection). Putting up a robots.txt file is like putting a note saying “Please do not enter” on an unlocked door: you cannot prevent thieves from coming in, but the good guys will not open the door and enter. That is why, if you have really sensitive data, it is naïve to rely on robots.txt to keep it from being indexed and displayed in search results.
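To make the "note on an unlocked door" point concrete, here is a minimal Python sketch of the check a well-behaved crawler might run before requesting a page. It uses the standard library's urllib.robotparser; the rules, the ExampleBot name, and the example.com URLs are made up for illustration. The check runs entirely in the crawler's own code, so a crawler that skips it is not stopped by anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def urls_a_polite_crawler_would_fetch(robots_txt, user_agent, urls):
    """Return only the URLs that the given rules permit for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Hypothetical candidate URLs on a made-up site.
candidates = [
    "http://www.example.com/index.html",
    "http://www.example.com/private/report.html",
]
allowed = urls_a_polite_crawler_would_fetch(ROBOTS_TXT, "ExampleBot", candidates)
print(allowed)
```

Only the first URL survives the filter. The server never takes part in this decision, which is why sensitive pages need real access control rather than a Disallow line.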

shabbir · Oct 13, 2011

Pretty much meaningless posts just for signature spam. I had to ban you once again.

harrysom · Oct 15, 2011

The robots.txt file is simply a text file. Its purpose is to tell search engines not to crawl certain pages. Put simply, a search engine will not visit a page of your site if a rule in robots.txt disallows that page.

benivolentsoft · Oct 18, 2011

"Robots.txt is a text file that has a special meaning to the majority of "honorable" robots on the web. By defining a few rules in this text file, you can instruct robots to not crawl and index certain files, directories within your site, or at all. For example, you may not want Google to crawl the /images directory of your site, as it's both meaningless to you and a waste of your site's bandwidth. "Robots.txt" lets you tell Google just that."

bobwarner01 · Oct 18, 2011

Robots.txt is a text file, as its extension shows. The robots themselves simply act according to the instructions they are given. Some parts of a website are private, and we do not want them to be crawled by robots.

TM-Ali · Oct 19, 2011

Re: what is robots.txt?

Robots.txt is a file used to give instructions to search engine robots. We can allow or disallow search engine robots for a particular folder or page.

Example:

User-agent: *
Disallow: /folder-name/
Disallow: /page-name.html
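To show how such a file is structured, here is a simplified Python sketch that reads the rules into a dictionary. It is not a complete parser: it only understands User-agent and Disallow lines and "#" comments, and the function name and the placeholder paths are assumptions for illustration.

```python
def parse_disallow_rules(robots_txt):
    """Map each user-agent to the list of paths disallowed for it.

    Simplified sketch: handles only User-agent and Disallow lines
    and '#' comments; every other field is ignored.
    """
    rules = {}
    current_agents = []
    last_field = None
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line or ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            # Consecutive User-agent lines share one group of rules.
            if last_field != "user-agent":
                current_agents = []
            current_agents.append(value)
            rules.setdefault(value, [])
        elif field == "disallow" and value:
            # An empty Disallow value blocks nothing, so it is skipped.
            for agent in current_agents:
                rules[agent].append(value)
        last_field = field
    return rules


# Placeholder rules modeled on the example above.
example = """\
User-agent: *
Disallow: /folder-name/
Disallow: /page-name.html
"""
print(parse_disallow_rules(example))
```

The sketch accepts the fields with or without a space after the colon.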

ozsubasi · Apr 2, 2012

sandrajolly said:

Robots.txt file does not improve your search engine positioning.
It provides robots with information concerning which files you will not allow to be crawled and indexed in the search engines.
When the search engine robot crawls your site it looks for the robots.txt file.
If it doesn't find one it assumes automatically that it may crawl and index the entire site.

This allows all robots to crawl all files.
User-agent: *
Disallow:

This disallows all robots from crawling a folder called /cmsbuffet/.
User-agent: *
Disallow: /cmsbuffet/

The clue to where this is copied from is in the reference to "cmsbuffet"
It originally comes from:
http://www.cmsbuffet.com/robots-txt-check.php
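For anyone who wants to produce the two rule sets from the quoted post programmatically, a minimal Python sketch could look like this. The build_robots_txt helper is hypothetical, not part of any library, and it covers only a single User-agent group with Disallow lines.

```python
def build_robots_txt(user_agent, disallowed_paths):
    """Return robots.txt text for one user-agent group (hypothetical helper)."""
    lines = [f"User-agent: {user_agent}"]
    if disallowed_paths:
        lines.extend(f"Disallow: {path}" for path in disallowed_paths)
    else:
        # An empty Disallow value means nothing is blocked.
        lines.append("Disallow:")
    return "\n".join(lines) + "\n"


allow_all = build_robots_txt("*", [])
block_folder = build_robots_txt("*", ["/cmsbuffet/"])
print(allow_all)
print(block_folder)
```

The first call yields the "allow everything" file and the second yields the file that blocks the /cmsbuffet/ folder.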

ozsubasi · Apr 13, 2012

The OP was banned for asking this (and other) silly questions, and I think it has been fully answered.
So to anyone else who visits this thread, please read it first and only post to it if you have something new and relevant to say.

ozsubasi · Apr 11, 2012

sachinseo said:

robots.txt is the file that keeps crawlers away from a site. For that, you need to specify the disallow rule in Webmaster Tools, generate the txt file, and upload it to the root directory of your server.

Just to clarify this, these are the instructions from Google:

Generate a robots.txt file using the Create robots.txt tool

1. On the Webmaster Tools Home page, click the site you want.
2. Under Site configuration, click Crawler access.
3. Click the Create robots.txt tab.
4. Choose your default robot access. We recommend that you allow all robots, and use the next step to exclude any specific bots you don't want accessing your site. This will help prevent problems with accidentally blocking crucial crawlers from your site.
5. Specify any additional rules. For example, to block Googlebot from all files and directories on your site:
   - In the Action list, select Disallow.
   - In the Robot list, click Googlebot.
   - In the Files or directories box, type /.
   - Click Add. The code for your robots.txt file will be automatically generated.
6. Save your robots.txt file by downloading the file or copying the contents to a text file and saving as robots.txt. Save the file to the highest-level directory of your site.

The robots.txt file must reside in the root of the domain and must be named "robots.txt". A robots.txt file located in a subdirectory isn't valid, as bots only check for this file in the root of the domain. For instance, http://www.example.com/robots.txt is a valid location, but http://www.example.com/mysite/robots.txt is not.

(Source: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449)
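The rule about the file's location can be sketched in a few lines of Python: whatever page a bot is about to request, it looks for robots.txt at the root of that host. The robots_txt_url function name is an assumption for illustration; urlsplit and urlunsplit come from the standard library.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the root-level robots.txt URL for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.example.com/mysite/index.html"))
```

For a page under /mysite/ this still gives http://www.example.com/robots.txt, so a copy saved at /mysite/robots.txt would never be consulted.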

ozsubasi · May 5, 2012

Cole001 said:

It is great when search engines frequently visit your site and index your content but often there are cases when indexing parts of your online content is not what you want.

Yes, for example, my site has an admin section that is for my use only and is disallowed by robots.txt.

ozsubasi · Jul 14, 2012

I think we have exhausted the possibilities on this subject. Thread closed.


mark joshef Banned

neeraj_77 New Member

shabbir Administrator Staff Member

harrysom New Member

benivolentsoft New Member

bobwarner01 New Member

TM-Ali New Member

ozsubasi New Member
