How to build a search bot

noviceprogrammer's Avatar, Join Date: Jun 2007
Newbie Member
Can someone please tell me how to build a search bot like the one used for www.pricegrabber.com? It has to be able to data-mine selected websites.

Thank you,
novice
shabbir's Avatar, Join Date: Jul 2004
Go4Expert Founder
What is special about that search bot? It just crawls websites and stores data gathered from the crawl. I suggest you start with crawling the websites and then move on to parsing the pages.
noviceprogrammer's Avatar, Join Date: Jun 2007
Newbie Member
Quote:
Originally Posted by shabbir
What is special about that search bot? It just crawls websites and stores data gathered from the crawl. I suggest you start with crawling the websites and then move on to parsing the pages.
The bot would search different websites and pull specific information that it would then store in a database. Is this possible? For example, if I typed in Panasonic 50" TVs, it would search Circuit City and store the prices in a database.
shabbir's Avatar, Join Date: Jul 2004
Go4Expert Founder
So to start with, concentrate on getting the website pages into your database; then you can parse them.
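As a rough illustration of "getting the pages into your database", here is a minimal sketch. It uses Python's built-in sqlite3 module instead of MySQL only so that it runs without a server, and the table layout and the names open_store and save_page are assumptions made for this example. It also records a hash of each page so a later run can tell whether the content changed.

```python
import hashlib
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a small page store. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        " url TEXT PRIMARY KEY,"
        " html TEXT NOT NULL,"
        " sha256 TEXT NOT NULL,"
        " fetched_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def save_page(conn, url, html):
    """Store the HTML for a URL; return True if it is new or has changed."""
    digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
    row = conn.execute(
        "SELECT sha256 FROM pages WHERE url = ?", (url,)
    ).fetchone()
    if row is not None and row[0] == digest:
        return False  # same content as last time, nothing to re-parse
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, html, sha256) VALUES (?, ?, ?)",
        (url, html, digest),
    )
    conn.commit()
    return True
```

The return value lets a separate parsing step skip pages that have not changed since the previous crawl.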
noviceprogrammer's Avatar, Join Date: Jun 2007
Newbie Member
Quote:
Originally Posted by shabbir
So to start with, concentrate on getting the website pages into your database; then you can parse them.

Alright, thanks shabbir! One problem though, how do I do that? Do you have any links or good examples in mind?

Thanks again, you're a great help.
shabbir's Avatar, Join Date: Jul 2004
Go4Expert Founder
You should know the basics of the language, i.e. how to open a file on a web server and get the HTML of the file ...
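For instance, in Python the "open a file on a web server and get its HTML" step might look like the sketch below. The function name fetch_html and the User-Agent string are made up for this example, and a real bot should also respect each site's robots.txt and terms of use.

```python
import urllib.request


def fetch_html(url, timeout=10.0):
    """Download a page and return its HTML as text."""
    req = urllib.request.Request(
        url,
        headers={"User-Agent": "example-price-bot/0.1"},  # hypothetical name
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Use the charset the server declares, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Calling fetch_html("http://example.com/") returns the page source as one string, which can then be stored or parsed.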
noviceprogrammer's Avatar, Join Date: Jun 2007
Newbie Member
Shabbir, I appreciate your help with this topic. I know how to take HTML and put it into a MySQL database, and I know how to parse databases with SQL commands, but I need a way to constantly keep this information up to date. I need something that will automatically parse the data on a daily basis, so manually uploading the files into my database is not an option.
shabbir's Avatar, Join Date: Jul 2004
Go4Expert Founder
If you are good at HTML parsing then your job is half done. By HTML parsing I do not mean putting the whole text into the database; only the visible content should go into the database, something like what you get when you do a select-all and copy in the browser.
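A minimal sketch of pulling out only the visible content, assuming Python and its standard html.parser module. The names VisibleTextParser and visible_text are invented for this example, and skipping just script, style and title is a simplification of what a browser really shows.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes, ignoring tags whose content is never displayed."""

    SKIP = {"script", "style", "title"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data)


def visible_text(html):
    """Return the page text with markup removed and whitespace collapsed."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())
```

This does not evaluate CSS, so text hidden with display:none would still be included.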

You will need bots that cache the data at regular intervals using cron jobs. When a change is found in the database, another program or cron job parses the data. The parser should be written generically and not tied to the content of a particular website. For example, if you are looking for a price, find the word "price" and then look for a $.
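The "find the word price, then look for $" idea could be sketched in Python like this. The pattern, the 30-character gap between the word and the dollar sign, and the name find_prices are all assumptions chosen for illustration; real product pages will need tuning.

```python
import re
from decimal import Decimal

# The word "price", then up to 30 characters that are not "$", then a
# dollar amount such as $999 or $1,299.99.
PRICE_RE = re.compile(
    r"price[^$]{0,30}\$\s*(\d[\d,]*(?:\.\d{1,2})?)",
    re.IGNORECASE,
)


def find_prices(text):
    """Return every dollar amount that closely follows the word 'price'."""
    return [Decimal(m.replace(",", "")) for m in PRICE_RE.findall(text)]


if __name__ == "__main__":
    sample = 'Panasonic 50" TV Price: $1,299.99 Sale price $999'
    print(find_prices(sample))  # [Decimal('1299.99'), Decimal('999')]
```

To run such a script daily, a crontab entry along the lines of `0 3 * * * python3 /path/to/bot.py` would start it at 03:00 every day; the script path here is only a placeholder.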