Go4Expert

Go4Expert (http://www.go4expert.com/)
-   C (http://www.go4expert.com/forums/c/)
-   -   efficient search (http://www.go4expert.com/forums/efficient-search-t18710/)

ceus 27Jul2009 14:27

efficient search
 
Hi friends,
I have a doubt, i have a file which contains around 3000 elements, now i want to search for an element in that file....
what will be the Efficient way of searching ? without consuming much memory and time...
1) is this efficient if i take all the 3000 elements from file to a list or array and sort them and search ?
2) or is it efficient to search directly inside the file ??
Can anyone please suggest some algorithms for an efficient way to search in this case ??

I am using C for coding..

Thanks :)

shabbir 27Jul2009 14:37

Re: efficient search
 
Binary search would be best option if you have elements sorted.

dssr 27Jul2009 15:05

Re: efficient search
 
go for hashing for large number of elements .... for 3000 elements i think binary search would be quite useful, obviously if it's sorted, otherwise plain linear search is advised

xpi0t0s 27Jul2009 15:51

Re: efficient search
 
How often will you need to perform the search? Is it a one-off find operation, or is this something you'll be doing frequently?
I vaguely remember reading somewhere that Oracle recommends full table scans for tables <5000 rows, because the cost of maintaining the index exceeds the cost of the full scan.
But the way to determine the best solution is first to define "best" - include stuff like memory used, speed of search, programming time and any other relevant factors, and only you can do that, then you can run trials on the different approaches, taking everything into account: not just search time but also index maintenance, whether the table is read-only or sorted, how often it gets updated, how often you run searches, and this will eventually lead you to the most optimal solution.

ceus 28Jul2009 09:57

Re: efficient search
 
wow thanks for all the replies...:)
i will make it more clear... The elements are not sorted, the elements are not integers or floats, its all strings (having less than 300 charecters), and in my case i can define best as "It should not take much RAM and it should be performed in a minimal time possible (even if i want to compromise among one i would say even if it takes a second or two more it should not consume much RAM)"

xpi0t0s 28Jul2009 12:47

Re: efficient search
 
Also please answer the questions asked. No answer is possible without this.

ceus 28Jul2009 16:55

Re: efficient search
 
Sorry i dint observe that,
Yes the search will be frequent say atleast 50 times when it enters my function,
and the table is not sorted and read only, and regarding the updation time say the Table will get updated once in a day..

xpi0t0s 28Jul2009 20:51

Re: efficient search
 
How much RAM is the program allowed to use? Is it for instance allowed to store the whole file in memory?
Is it OK to create a second file (an index file)?

The *quickest* way will be to read the whole file into memory, sort it then do a binary search on the data. But if you can't do that due to memory limitations then we need to find another way. 3000 elements * 300 characters (worst case) =900,000 bytes, which on a modern machine with 3GB of RAM really isn't that much.

A hash file is another way; this takes a hash of each item in the table (which is a value calculated from it, for example a simple hash is just to add up all the ASCII codes of the letters in the string, so "hello"=104+101+108+108+111=532, so you might store in a new file the hash plus each line that hash occurs on - let's say lines 10,30,70 and 100 have a hash of 532, you could store "532 10 30 70 100" on a line, then when you want to search for a string, first calculate its hash, then lookup that value in the hash file, then you can check each of the listed line numbers for a match (i.e. 10 30 70 100), which will be a lot quicker than scanning the whole file. Ideally a hash will be unique, but this may be "good enough" for what you want.


All times are GMT +5.5. The time now is 03:19.