How to build a search bot
Can someone please tell me how to build a search bot like the one used by www.pricegrabber.com? It has to be able to data-mine selected websites.
Thank you, novice |
Re: How to build a search bot
What is special about that search bot? It just crawls websites and builds its data from what it crawls. I would start with crawling the websites and then move on to parsing the pages.
|
Re: How to build a search bot
So to start with, concentrate on getting the website pages into your database. After that you can parse them.
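As a rough illustration of "getting the pages into your database", here is a minimal sketch. It assumes SQLite (from Python's standard library) as a stand-in for whatever database you actually use, and the table name `pages` and the helpers `open_store`, `save_page` and `load_page` are made up for this example.

```python
import sqlite3
from datetime import datetime, timezone


def open_store(path=":memory:"):
    """Open (or create) the page store. The 'pages' table is a hypothetical layout."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "url TEXT PRIMARY KEY, html TEXT NOT NULL, fetched_at TEXT NOT NULL)"
    )
    return conn


def save_page(conn, url, html):
    """Store the raw HTML for a URL, overwriting any earlier copy of that URL."""
    fetched_at = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO pages (url, html, fetched_at) VALUES (?, ?, ?)",
            (url, html, fetched_at),
        )


def load_page(conn, url):
    """Return the stored HTML for a URL, or None if it was never saved."""
    row = conn.execute("SELECT html FROM pages WHERE url = ?", (url,)).fetchone()
    return row[0] if row else None
```

Keeping one row per URL means a later crawl simply replaces the old copy, so the parsing step always sees the latest page.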
|
Re: How to build a search bot
Alright, thanks shabbir! One problem though: how do I do that? Do you have any links or good examples in mind? Thanks again, you're a great help. |
Re: How to build a search bot
You should know the basics of the language, i.e. opening a file on a web server and getting the HTML of that file ...
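For example, in Python (my choice for illustration, nobody in this thread named a language) getting the HTML of a page could look roughly like this. The function name `fetch_html` and the `User-Agent` string are assumptions for the sketch. Only standard-library calls are used.

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Download a URL and return its body decoded to text."""
    # Many sites reject requests without a User-Agent; this value is a placeholder.
    req = urllib.request.Request(url, headers={"User-Agent": "example-search-bot/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        raw = resp.read()
        # Use the charset declared by the server, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")
```

A real bot should also respect each site's robots.txt and avoid hammering the server with requests.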
|
Re: How to build a search bot
Shabbir, I appreciate your help with this topic. I know how to take HTML and put it into a MySQL database, and I know how to query databases with SQL commands, but I need a way to constantly keep this information up to date. I need something that will automatically parse the data on a daily basis, because I will not be able to manually upload the files into my database.
|
Re: How to build a search bot
If you are good at HTML parsing then your job is half done. What I mean by HTML parsing is not just putting the whole text into the database: only the visible content should be stored there, something like what you get when you do a select all and copy in the browser.
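A minimal sketch of that "visible content only" idea, assuming Python and its standard `html.parser` module. The class name `VisibleTextParser`, the helper `visible_text` and the list of skipped tags are my own choices for illustration, and it is only an approximation of what a browser would show (it knows nothing about CSS that hides elements, for instance).

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes, ignoring content that a browser would not display."""

    # Tags whose content is never shown on the page (an assumed, incomplete list).
    SKIP = {"script", "style", "head", "title", "noscript", "template"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside any skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth == 0 and text:
            self._chunks.append(text)

    def text(self):
        return " ".join(self._chunks)


def visible_text(html):
    """Return the roughly 'select all and copy' text of an HTML document."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.text()
```

For badly broken markup a more forgiving third-party parser may do better, but the idea stays the same: store the text, not the tags.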
You will need bots that cache the data at regular intervals using cron jobs. When a change is found in the database, another program or cron job parses the data. The parser should be written in a general way and not be based on the content of one particular website. For example, if you are looking for a price, find the word "price" and then look for a $ sign. |
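To make the last two points concrete, here is a rough Python sketch, not a finished parser. It assumes prices are written in dollars within about 40 characters after the word "price", and that a SHA-256 hash of the page text is good enough to tell whether anything changed since the last cron run. The names `find_prices`, `content_fingerprint` and `has_changed` are invented for this example.

```python
import hashlib
import re
from decimal import Decimal

# The word "price", then up to 40 non-$ characters, then "$" and an amount.
PRICE_NEAR_LABEL = re.compile(
    r"price\b[^$]{0,40}?\$\s*(\d[\d,]*(?:\.\d{1,2})?)",
    re.IGNORECASE,
)


def find_prices(text):
    """Return every dollar amount that follows the word 'price' in the text."""
    amounts = []
    for match in PRICE_NEAR_LABEL.finditer(text):
        # Drop thousands separators before converting, e.g. "1,299.00".
        amounts.append(Decimal(match.group(1).replace(",", "")))
    return amounts


def content_fingerprint(text):
    """Hash the page text so two crawls can be compared cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def has_changed(previous_fingerprint, text):
    """True if the text differs from what produced the stored fingerprint."""
    return content_fingerprint(text) != previous_fingerprint
```

The scheduled job would store the fingerprint next to each page and only run the price search again when `has_changed` reports a difference.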