Similarity between text files

Discussion in 'C' started by kens, Jul 17, 2007.

  1. kens

    kens New Member

    Joined:
    Jul 17, 2007
    Messages:
    9
    Likes Received:
    0
    Trophy Points:
    0
    Find the similarity between two text files.The similarity index needs to b defined by yrself. The similarity shud b content based! for example two docs talki abt tennis n cricket will hv a lower similarity index than both talkin abt the sam sport.

    Please suggest how to approach this problem.
     
  2. DaWei

    DaWei New Member

    Joined:
    Dec 6, 2006
    Messages:
    835
    Likes Received:
    5
    Trophy Points:
    0
    Occupation:
    Semi-retired EE
    Location:
    Texan now in Central NY
    Home Page:
    http://www.daweidesigns.com
    I realize that English might not be your first language. In that case, do not use AOL-speak guides as a suitable translator or dictionary. I would not give the sweat off my balls to a person so lazy as to use 'b' for 'be'. There are innumerable other examples in your post.

    Work. Break a sweat. Expend some energy. Post some ideas or some code. Then you will receive help.

    Don't post code without learning about code tags.
     
  3. shabbir

    shabbir Administrator Staff Member

    Joined:
    Jul 12, 2004
    Messages:
    15,375
    Likes Received:
    388
    Trophy Points:
    83
    You should compare the files using dynamic programming, which lets you measure the similarity between two strings. Use the Needleman-Wunsch algorithm or the Smith-Waterman algorithm for sequence comparison.
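For illustration, here is a minimal sketch of Needleman-Wunsch global alignment scoring in C. The function name `nw_score` and the scoring values (+1 match, -1 mismatch, -1 gap) are assumptions chosen for the example, not something given in this thread. It computes only the score, not the alignment itself, and keeps two rows of the table instead of the full matrix.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative scoring values (assumptions, not from the thread). */
#define MATCH_SCORE     1
#define MISMATCH_SCORE (-1)
#define GAP_SCORE      (-1)

static int max3(int a, int b, int c)
{
    int m = a > b ? a : b;
    return m > c ? m : c;
}

/*
 * Hypothetical helper: returns the Needleman-Wunsch global alignment
 * score of strings a and b, or INT_MIN if memory allocation fails.
 * Time is O(n*m); memory is O(m) because only two rows are kept.
 */
int nw_score(const char *a, const char *b)
{
    size_t n = strlen(a), m = strlen(b);
    size_t i, j;
    int result;
    int *prev = malloc((m + 1) * sizeof *prev);
    int *curr = malloc((m + 1) * sizeof *curr);

    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return INT_MIN;
    }

    /* Row 0: aligning the empty prefix of a against b costs only gaps. */
    for (j = 0; j <= m; j++)
        prev[j] = (int)j * GAP_SCORE;

    for (i = 1; i <= n; i++) {
        int *tmp;

        /* Column 0: aligning a prefix of a against the empty string. */
        curr[0] = (int)i * GAP_SCORE;

        for (j = 1; j <= m; j++) {
            int sub = (a[i - 1] == b[j - 1]) ? MATCH_SCORE : MISMATCH_SCORE;

            curr[j] = max3(prev[j - 1] + sub,        /* match or mismatch */
                           prev[j] + GAP_SCORE,      /* gap in b */
                           curr[j - 1] + GAP_SCORE); /* gap in a */
        }

        /* The row just filled becomes the previous row. */
        tmp = prev;
        prev = curr;
        curr = tmp;
    }

    result = prev[m];
    free(prev);
    free(curr);
    return result;
}
```

With these values, `nw_score("abc", "abc")` gives 3 and `nw_score("abc", "abd")` gives 1. A higher score means the two strings align more closely, character by character.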
     
  4. kens

    kens New Member

    I will take care to post it properly next time.

    The algorithms you suggested, as far as I know, are used for string matching and DNA matching, especially pairwise sequence matching. Can you also suggest another algorithm of lower complexity? Thanks for your help.
     
  5. shabbir

    shabbir Administrator Staff Member

    The Needleman-Wunsch algorithm is very simple and is a basic example of dynamic programming, but your requirement, matching documents by what they talk about, is not that simple to analyze.
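As a rough starting point for a content-based index, one lower-complexity measure that is not mentioned in this thread is word overlap (the Jaccard index): the number of distinct words the two texts share, divided by the number of distinct words in either. The sketch below is an assumption-laden illustration. The names `WordSet`, `set_fill` and `jaccard_similarity` and the limits `MAX_WORDS` and `MAX_LEN` are hypothetical, words are plain runs of letters, and the texts are assumed to be already read into memory as strings.

```c
#include <ctype.h>
#include <string.h>

/* Illustrative limits (assumptions of this sketch). */
#define MAX_WORDS 1024
#define MAX_LEN   32

typedef struct {
    char words[MAX_WORDS][MAX_LEN];
    size_t count;
} WordSet;

/* Linear search; fine for a sketch, a hash table would scale better. */
static int set_contains(const WordSet *s, const char *w)
{
    size_t i;

    for (i = 0; i < s->count; i++)
        if (strcmp(s->words[i], w) == 0)
            return 1;
    return 0;
}

/* Split text on non-letters, lowercase each word, keep unique words.
 * Words longer than MAX_LEN - 1 letters are truncated, and words past
 * MAX_WORDS are silently dropped. */
static void set_fill(WordSet *s, const char *text)
{
    char word[MAX_LEN];
    size_t len = 0;

    s->count = 0;
    for (;; text++) {
        unsigned char c = (unsigned char)*text;

        if (isalpha(c)) {
            if (len < MAX_LEN - 1)
                word[len++] = (char)tolower(c);
        } else {
            if (len > 0) {
                word[len] = '\0';
                if (!set_contains(s, word) && s->count < MAX_WORDS)
                    strcpy(s->words[s->count++], word);
                len = 0;
            }
            if (c == '\0')
                break;
        }
    }
}

/* Jaccard index: shared distinct words / all distinct words, in [0, 1].
 * Two texts with no words at all are treated as identical (1.0). */
double jaccard_similarity(const char *text_a, const char *text_b)
{
    static WordSet a, b; /* static: too large for some stacks */
    size_t i, common = 0, total;

    set_fill(&a, text_a);
    set_fill(&b, text_b);

    for (i = 0; i < a.count; i++)
        if (set_contains(&b, a.words[i]))
            common++;

    total = a.count + b.count - common;
    return total == 0 ? 1.0 : (double)common / (double)total;
}
```

This ignores word order, word frequency and synonyms, so a text about tennis and one about cricket only score low if they actually use different vocabulary. Dropping very common words ("the", "and") before comparing would likely make the index more meaningful.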
     
  6. DaWei

    DaWei New Member

    You might have a look at this discussion.
     
