efficient search

Discussion in 'C' started by ceus, Jul 27, 2009.

  1. ceus

    ceus New Member

    Hi friends,
    I have a question: I have a file that contains around 3,000 elements, and I want to search for one element in that file.
    What would be an efficient way of searching, without consuming much memory or time?
    1) Is it efficient to read all 3,000 elements from the file into a list or array, sort them, and then search?
    2) Or is it more efficient to search directly inside the file?
    Can anyone please suggest some algorithms for an efficient search in this case?

    I am using C for coding.

    Thanks :)
     
  2. shabbir

    shabbir Administrator Staff Member

    Binary search would be the best option if you have the elements sorted.
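    Something like this minimal sketch (the array and search key here are just illustrative):

    Code:
    #include <stdio.h>
    #include <string.h>

    /* Binary search over a sorted array of strings.
       Returns the index of the match, or -1 if not found. */
    static int binary_search(char *items[], int count, const char *key)
    {
        int lo = 0, hi = count - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            int cmp = strcmp(key, items[mid]);
            if (cmp == 0)
                return mid;          /* found */
            if (cmp < 0)
                hi = mid - 1;        /* key is in the lower half */
            else
                lo = mid + 1;        /* key is in the upper half */
        }
        return -1;                   /* not present */
    }

    int main(void)
    {
        char *words[] = { "apple", "banana", "cherry", "mango" };  /* must be sorted */
        int idx = binary_search(words, 4, "cherry");
        printf("found at index %d\n", idx);
        return 0;
    }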
     
  3. dssr

    dssr New Member

    Go for hashing when you have a large number of elements. For 3,000 elements, though, I think binary search would be quite useful, provided the data is sorted; otherwise plain linear search is advised.
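    For comparison, a plain linear scan needs almost no memory: read the file one line at a time and compare. A minimal sketch, assuming one element per line (the filename and key are illustrative):

    Code:
    #include <stdio.h>
    #include <string.h>

    /* Linear search: scan the file one line at a time.
       Returns the 1-based line number of the match, or 0 if not found. */
    static long find_in_file(const char *path, const char *key)
    {
        char line[512];                  /* elements are under 300 characters */
        long lineno = 0;
        FILE *fp = fopen(path, "r");
        if (fp == NULL)
            return 0;
        while (fgets(line, sizeof line, fp) != NULL) {
            lineno++;
            line[strcspn(line, "\r\n")] = '\0';   /* strip the newline */
            if (strcmp(line, key) == 0) {
                fclose(fp);
                return lineno;
            }
        }
        fclose(fp);
        return 0;
    }

    int main(void)
    {
        long n = find_in_file("elements.txt", "needle");  /* illustrative names */
        if (n)
            printf("found on line %ld\n", n);
        else
            printf("not found\n");
        return 0;
    }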
     
  4. xpi0t0s

    xpi0t0s Mentor

    How often will you need to perform the search? Is it a one-off find operation, or is this something you'll be doing frequently?
    I vaguely remember reading somewhere that Oracle recommends full table scans for tables <5000 rows, because the cost of maintaining the index exceeds the cost of the full scan.
    But the way to determine the best solution is first to define "best": include factors such as memory used, search speed, programming time and anything else relevant, and only you can do that. Then run trials of the different approaches, taking everything into account, not just search time but also index maintenance, whether the table is read-only or sorted, how often it gets updated and how often you run searches. This will eventually lead you to the optimal solution.
     
  5. ceus

    ceus New Member

    Wow, thanks for all the replies... :)
    I will make it clearer: the elements are not sorted, and they are not integers or floats; they are all strings (each less than 300 characters). In my case I can define "best" as: it should not take much RAM, and it should run in the minimal time possible. If I had to compromise on one, I would say that even if it takes a second or two longer, it should not consume much RAM.
     
  6. xpi0t0s

    xpi0t0s Mentor

    Also, please answer the questions asked. No answer is possible without that information.
     
  7. ceus

    ceus New Member

    Sorry, I didn't notice that.
    Yes, the search will be frequent: at least 50 times each time it enters my function.
    The table is not sorted, and it is read-only; as for updates, the table gets updated about once a day.
     
  8. xpi0t0s

    xpi0t0s Mentor

    How much RAM is the program allowed to use? Is it, for instance, allowed to store the whole file in memory?
    Is it OK to create a second file (an index file)?

    The *quickest* way will be to read the whole file into memory, sort it, then do a binary search on the data. But if you can't do that due to memory limitations then we need to find another way. 3,000 elements * 300 characters (worst case) = 900,000 bytes, which on a modern machine with 3 GB of RAM really isn't that much.
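    A sketch of that approach using the standard library's qsort() and bsearch(); the filename, key and buffer sizes are assumptions, not anything from your code:

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define MAX_ELEMS 3000
    #define MAX_LEN   304      /* elements are under 300 characters */

    static char table[MAX_ELEMS][MAX_LEN];   /* ~900 KB, the estimate above */
    static int  nelems;

    /* Comparator shared by qsort() and bsearch(): each element is a row. */
    static int cmp_rows(const void *a, const void *b)
    {
        return strcmp((const char *)a, (const char *)b);
    }

    int main(void)
    {
        FILE *fp = fopen("elements.txt", "r");   /* illustrative filename */
        if (fp == NULL)
            return 1;
        while (nelems < MAX_ELEMS && fgets(table[nelems], MAX_LEN, fp) != NULL) {
            table[nelems][strcspn(table[nelems], "\r\n")] = '\0';
            nelems++;
        }
        fclose(fp);

        qsort(table, nelems, MAX_LEN, cmp_rows);   /* sort once... */

        const char *key = "needle";                /* ...then every lookup is O(log n) */
        char *hit = bsearch(key, table, nelems, MAX_LEN, cmp_rows);
        if (hit)
            printf("found: %s\n", hit);
        else
            printf("not found\n");
        return 0;
    }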

    A hash file is another way. This takes a hash of each item in the table, i.e. a value calculated from it; for example, a simple hash is just to add up the ASCII codes of the letters in the string, so "hello" = 104+101+108+108+111 = 532. You might then store in a new file the hash plus each line that hash occurs on: say lines 10, 30, 70 and 100 all have a hash of 532, then you could store "532 10 30 70 100" on one line. When you want to search for a string, first calculate its hash, then look up that value in the hash file, then check each of the listed line numbers (i.e. 10, 30, 70, 100) for a match, which will be a lot quicker than scanning the whole file. Ideally a hash would be unique, but this may be "good enough" for what you want.
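    A minimal sketch of the lookup side, assuming the index file holds one line per hash value in exactly that "532 10 30 70 100" form (the filenames are illustrative):

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* The simple additive hash described above: "hello" hashes to 532. */
    static unsigned long ascii_sum(const char *s)
    {
        unsigned long h = 0;
        while (*s)
            h += (unsigned char)*s++;
        return h;
    }

    /* Fetch line number `want` (1-based) from the data file into buf. */
    static int get_line(FILE *fp, long want, char *buf, size_t size)
    {
        long n = 0;
        rewind(fp);
        while (fgets(buf, size, fp) != NULL)
            if (++n == want) {
                buf[strcspn(buf, "\r\n")] = '\0';
                return 1;
            }
        return 0;
    }

    int main(void)
    {
        const char *key = "hello";                /* string we are searching for */
        unsigned long h = ascii_sum(key);
        FILE *idx  = fopen("index.txt", "r");     /* lines like "532 10 30 70 100" */
        FILE *data = fopen("elements.txt", "r");  /* one element per line */
        char line[1024], elem[512];

        if (idx == NULL || data == NULL)
            return 1;
        while (fgets(line, sizeof line, idx) != NULL) {
            char *p;
            if (strtoul(line, &p, 10) != h)
                continue;                         /* not our hash bucket */
            /* Check only the line numbers listed after the hash. */
            for (long ln = strtol(p, &p, 10); ln > 0; ln = strtol(p, &p, 10))
                if (get_line(data, ln, elem, sizeof elem) && strcmp(elem, key) == 0)
                    printf("found on line %ld\n", ln);
            break;                                /* one index line per hash value */
        }
        fclose(idx);
        fclose(data);
        return 0;
    }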
     
