what is robot.txt??

mark joshef's Avatar
Banned
what is robot.txt??
0
neeraj_77's Avatar, Join Date: Aug 2011
Go4Expert Member
Robots.txt is a text (not HTML) file you put on your site to tell search robots which pages you would like them not to visit. Robots.txt is by no means binding on search engines, but they generally obey what they are asked not to do. It is important to clarify that robots.txt is not a way of preventing search engines from crawling your site (i.e. it is not a firewall or a kind of password protection). Putting up a robots.txt file is like putting a note saying “Please, do not enter” on an unlocked door: you cannot prevent thieves from coming in, but the good guys will not open the door and enter. That is why, if you have really sensitive data, it is too naïve to rely on robots.txt to protect it from being indexed and displayed in search results.
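To make the "note on an unlocked door" point concrete, here is a minimal Python sketch of what a well-behaved robot does before fetching a page. The rules, the bot name ExampleBot, and the URLs are made up for illustration; only the standard-library urllib.robotparser module is used.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, for illustration only.
RULES = """\
User-agent: *
Disallow: /private/
"""

def polite_may_fetch(url: str, agent: str = "ExampleBot") -> bool:
    """Return True if a rule-following crawler would fetch the URL."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch(agent, url)

# A well-behaved robot asks first and skips the page when the answer is False.
print(polite_may_fetch("http://www.example.com/private/report.html"))
print(polite_may_fetch("http://www.example.com/index.html"))

# Nothing on the server enforces the rules: a crawler that never runs this
# check can still request /private/report.html, so sensitive data needs
# real access control such as authentication.
```

The check is entirely voluntary on the crawler's side, which is why the file only keeps out the "good guys".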
0
shabbir's Avatar, Join Date: Jul 2004
Go4Expert Founder
Pretty much meaningless posts just for signature spam. I had to ban you once again.
0
harrysom's Avatar, Join Date: Oct 2011
Newbie Member
The robots.txt file is simply a text file. Its purpose is to tell search engines not to crawl particular pages. Put simply, a search engine that honors the file will not visit a page of your site if a rule in robots.txt disallows that page.
0
benivolentsoft's Avatar, Join Date: Feb 2011
Go4Expert Member
"Robots.txt is a text file that has a special meaning to the majority of "honorable" robots on the web. By defining a few rules in this text file, you can instruct robots to not crawl and index certain files, directories within your site, or at all. For example, you may not want Google to crawl the /images directory of your site, as it's both meaningless to you and a waste of your site's bandwidth. "Robots.txt" lets you tell Google just that."
0
bobwarner01's Avatar, Join Date: Oct 2011
Light Poster
Robots.txt is a text file, as its extension shows. Robots simply act on the instructions they are given, so the file is useful when a website has private parts that we do not want robots to crawl.
0
TM-Ali's Avatar, Join Date: Oct 2011
Go4Expert Member
Robots.txt is a file which is used to give instructions to search engine robots. We can allow or disallow search engine robots for a particular folder or page.

Example:-

User-agent: *
Disallow: /folder-name/
Disallow: /page-name.html
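
As a rough illustration of how such a file is put together, the Python sketch below renders one User-agent group from a list of paths. The function name build_robots_txt and the paths are hypothetical, and the sketch covers only User-agent and Disallow lines.

```python
from urllib.robotparser import RobotFileParser

def build_robots_txt(agent: str, disallowed: list[str]) -> str:
    """Render one User-agent group with one Disallow line per path."""
    lines = [f"User-agent: {agent}"]
    if disallowed:
        lines += [f"Disallow: {path}" for path in disallowed]
    else:
        # An empty Disallow value means nothing is blocked.
        lines.append("Disallow:")
    return "\n".join(lines) + "\n"

# Hypothetical folder and page names, for illustration only.
text = build_robots_txt("*", ["/folder-name/", "/page-name.html"])
print(text)

# Round-trip check with the standard-library parser.
parser = RobotFileParser()
parser.parse(text.splitlines())
print(parser.can_fetch("AnyBot", "http://www.example.com/folder-name/a.html"))
```

Each Disallow line takes one path, so a folder and a page need two separate lines, as in the example above.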
0
ozsubasi's Avatar, Join Date: Jan 2012
Invasive contributor
Quote:
Originally Posted by sandrajolly View Post
Robots.txt file does not improve your search engine positioning.
It provides robots with information concerning which files you will not allow to be crawled and indexed in the search engines.
When the search engine robot crawls your site it looks for the robots.txt file.
If it doesn't find one it assumes automatically that it may crawl and index the entire site.

This allows all robots to crawl all files.
User-agent: *
Disallow:

This disallows all robots from crawling a folder called /cmsbuffet/.
User-agent: *
Disallow: /cmsbuffet/
The clue to where this is copied from is the reference to "cmsbuffet".
It originally comes from:
http://www.cmsbuffet.com/robots-txt-check.php
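
The difference between the two rule sets quoted above comes down to prefix matching on the URL path. The Python sketch below is a deliberately simplified model of that idea: the function is_allowed is hypothetical and ignores Allow lines, wildcards, and per-robot groups that real crawlers also handle.

```python
def is_allowed(path: str, disallow_rules: list[str]) -> bool:
    """Simplified check: a path is blocked if it starts with any non-empty rule."""
    return not any(rule and path.startswith(rule) for rule in disallow_rules)

allow_all = [""]                # "Disallow:" with an empty value blocks nothing
block_folder = ["/cmsbuffet/"]  # "Disallow: /cmsbuffet/"

print(is_allowed("/cmsbuffet/page.html", allow_all))     # True
print(is_allowed("/cmsbuffet/page.html", block_folder))  # False
print(is_allowed("/index.html", block_folder))           # True
```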
0
ozsubasi's Avatar, Join Date: Jan 2012
Invasive contributor
The OP was banned for asking this (and other) silly questions, and I think it has been fully answered.
So to anyone else who visits this thread, please read it first and only post to it if you have something new and relevant to say.

Last edited by ozsubasi; 13Apr2012 at 15:23..
0
ozsubasi's Avatar, Join Date: Jan 2012
Invasive contributor
Quote:
Originally Posted by sachinseo View Post
robots.txt is the file that keeps crawlers away from a site. To do that, you specify a Disallow rule in Webmaster Tools, generate the txt file, and upload it to the root directory of your server.
Just to clarify this, these are the instructions from Google:

Generate a robots.txt file using the Create robots.txt tool
On the Webmaster Tools Home page, click the site you want.
Under Site configuration, click Crawler access.
Click the Create robots.txt tab.
Choose your default robot access. We recommend that you allow all robots, and use the next step to exclude any specific bots you don't want accessing your site. This will help prevent problems with accidentally blocking crucial crawlers from your site.
Specify any additional rules. For example, to block Googlebot from all files and directories on your site:
In the Action list, select Disallow.
In the Robot list, click Googlebot.
In the Files or directories box, type /.
Click Add. The code for your robots.txt file will be automatically generated.
Save your robots.txt file by downloading the file or copying the contents to a text file and saving as robots.txt. Save the file to the highest-level directory of your site. The robots.txt file must reside in the root of the domain and must be named "robots.txt". A robots.txt file located in a subdirectory isn't valid, as bots only check for this file in the root of the domain. For instance, http://www.example.com/robots.txt is a valid location, but http://www.example.com/mysite/robots.txt is not.

(Source: http://support.google.com/webmasters...&answer=156449)
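
Two points from these instructions can be sketched in Python: the rule that the file is only looked for at the root of the domain, and the effect of blocking Googlebot from "/". The GENERATED text is my assumption of what such rules look like (the tool's exact output is not shown here), and the helper names robots_url and may_fetch are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Assumed rules of the kind the steps describe: block Googlebot everywhere.
GENERATED = "User-agent: Googlebot\nDisallow: /\n"

def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt location for the page's domain."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

def may_fetch(agent: str, url: str) -> bool:
    """Check the assumed rules with the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(GENERATED.splitlines())
    return parser.can_fetch(agent, url)

page = "http://www.example.com/mysite/page.html"
print(robots_url(page))              # http://www.example.com/robots.txt
print(may_fetch("Googlebot", page))  # False
print(may_fetch("OtherBot", page))   # True
```

Whatever page a robot is about to visit, the subdirectory is dropped and only /robots.txt at the root is consulted, which is why a copy under /mysite/ has no effect.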