Discussion in 'Search Engine Optimization (SEO)' started by mark joshef, Oct 13, 2011.
What is robots.txt?
Robots.txt is a plain text (not HTML) file you put on your site to tell search robots which pages you would like them not to visit. Robots.txt is by no means binding on search engines, but they generally obey what they are asked not to do. It is important to clarify that robots.txt is not a way of preventing search engines from crawling your site (i.e. it is not a firewall or a kind of password protection). Putting up a robots.txt file is something like putting a note “Please, do not enter” on an unlocked door: you cannot prevent thieves from coming in, but the good guys will not open the door and enter. That is why we say that if you have really sensitive data, it is naive to rely on robots.txt to keep it from being indexed and displayed in search results.
Pretty much meaningless posts just for signature spam. I had to ban you once again.
A robots.txt file is simply a text file. Its purpose is to tell search engines not to crawl certain pages. Put simply, well-behaved search engines will not visit a page of your site if a rule in your robots.txt file disallows it.
"Robots.txt is a text file that has a special meaning to the majority of "honorable" robots on the web. By defining a few rules in this text file, you can instruct robots to not crawl and index certain files, directories within your site, or at all. For example, you may not want Google to crawl the /images directory of your site, as it's both meaningless to you and a waste of your site's bandwidth. "Robots.txt" lets you tell Google just that."
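To make the /images example above concrete, here is a minimal sketch of what such a rule could look like and how a well-behaved robot might read it. It uses Python's standard urllib.robotparser module; the host www.example.com, the file names, and the helper name is_allowed are made up for illustration, and real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: ask Googlebot to stay out of /images/.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /images/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("Googlebot", "http://www.example.com/images/logo.png"),
        ("Googlebot", "http://www.example.com/index.html"),
        ("OtherBot", "http://www.example.com/images/logo.png"),
    ]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

Note that the rule only names Googlebot, so in this sketch any other robot is still free to fetch the images.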
Robots.txt is a text file, as its extension shows, and robots react according to the instructions in it. Some parts of a website are private, and we do not want them crawled by robots.
Re: what is robots.txt?
Robots.txt is a file used to give instructions to search engine robots. We can allow or disallow those robots on a particular folder or page.
The clue to where this is copied from is in the reference to "cmsbuffet"
It originally comes from:
The OP was banned for asking this (and other) silly questions, and I think it has been fully answered.
So to anyone else who visits this thread, please read it first and only post to it if you have something new and relevant to say.
Just to clarify this, these are the instructions from Google:
Generate a robots.txt file using the Create robots.txt tool
On the Webmaster Tools Home page, click the site you want.
Under Site configuration, click Crawler access.
Click the Create robots.txt tab.
Choose your default robot access. We recommend that you allow all robots, and use the next step to exclude any specific bots you don't want accessing your site. This will help prevent problems with accidentally blocking crucial crawlers from your site.
Specify any additional rules. For example, to block Googlebot from all files and directories on your site:
In the Action list, select Disallow.
In the Robot list, click Googlebot.
In the Files or directories box, type /.
Click Add. The code for your robots.txt file will be automatically generated.
Save your robots.txt file by downloading the file or copying the contents to a text file and saving as robots.txt. Save the file to the highest-level directory of your site. The robots.txt file must reside in the root of the domain and must be named "robots.txt". A robots.txt file located in a subdirectory isn't valid, as bots only check for this file in the root of the domain. For instance, http://www.example.com/robots.txt is a valid location, but http://www.example.com/mysite/robots.txt is not.
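As a rough illustration of the steps above, the sketch below shows the kind of file they would likely produce (allow all robots by default, then block Googlebot from everything) and how the root location of robots.txt can be worked out from any page URL. The exact text the tool generates is an assumption on my part, and the function name robots_txt_location is hypothetical; only Python's standard library is used.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Assumed output of the steps above: default access for all robots,
# plus one extra rule blocking Googlebot from every file and directory.
GENERATED_RULES = """\
User-agent: *
Allow: /

User-agent: Googlebot
Disallow: /
"""


def robots_txt_location(page_url):
    """Return the only place robots look for the file: the domain root."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def build_parser(rules):
    """Load robots.txt text into a parser without any network access."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


if __name__ == "__main__":
    print(robots_txt_location("http://www.example.com/mysite/page.html"))
    parser = build_parser(GENERATED_RULES)
    print(parser.can_fetch("Googlebot", "http://www.example.com/any/page.html"))
    print(parser.can_fetch("OtherBot", "http://www.example.com/any/page.html"))
```

Whatever subdirectory the page lives in, the computed location is always at the root of the domain, which matches the rule that a robots.txt file in a subdirectory isn't valid.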
Yes, for example my site has an admin section which is for my use only and which is disallowed by robots.txt
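For anyone wondering what that looks like, here is a small sketch of an admin rule and a check against it with Python's standard urllib.robotparser. The /admin/ path, the example URLs, and the helper name may_crawl are assumptions for illustration, not the poster's actual setup. Keep in mind that this only asks polite robots to stay away; the admin section still needs a real login.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set: ask every robot to skip the admin section.
ADMIN_RULES = """\
User-agent: *
Disallow: /admin/
"""


def may_crawl(url, user_agent="*"):
    """Return True if ADMIN_RULES allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ADMIN_RULES.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(may_crawl("http://www.example.com/admin/login"))
    print(may_crawl("http://www.example.com/blog/"))
```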
I think we have exhausted the possibilities on this subject. Thread closed.