Find the similarity between two text files. The similarity index needs to b defined by yrself. The similarity shud b content based! for example two docs talki abt tennis n cricket will hv a lower similarity index than both talkin abt the sam sport.
Please suggest how to approach this problem.
Team Leader | 18Jul2007, 03:08 | #2
I realize that English might not be your first language. In that case, do not use AOL-speak guides as a suitable translator or dictionary. I would not give the sweat off my balls to a person so lazy as to use 'b' for 'be'. There are innumerable other examples in your post.
Work. Break a sweat. Expend some energy. Post some ideas or some code. Then you will receive help. Don't post code without learning about code tags.
Go4Expert Founder | 18Jul2007, 09:07 | #3
You should compare the files using dynamic programming. That way you can find the similarity between two strings. Use the Needleman-Wunsch algorithm or the Smith-Waterman algorithm; both do sequence comparison with a dynamic programming approach.
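For reference, here is a minimal Python sketch of the Needleman-Wunsch scoring step. The function name and the scoring values (match=1, mismatch=-1, gap=-1) are illustrative assumptions, not something specified in this thread. It works on any two sequences, so a text file could be passed in as a list of words instead of a string of characters.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the global alignment score of sequences a and b.

    The scoring values are assumed defaults chosen for illustration.
    """
    # Row 0 of the DP table: aligning a prefix of b against nothing
    # costs one gap per element.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a prefix of a against nothing.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diag = prev[j - 1] + pair   # align a[i-1] with b[j-1]
            up = prev[j] + gap          # a[i-1] aligned with a gap
            left = curr[j - 1] + gap    # b[j-1] aligned with a gap
            curr.append(max(diag, up, left))
        prev = curr  # only the previous row is needed for the score
    return prev[-1]


if __name__ == "__main__":
    # Word-level comparison of two short texts.
    doc1 = "the player served an ace".split()
    doc2 = "the player hit a six".split()
    print(needleman_wunsch_score(doc1, doc2))
```

The table has (len(a)+1) x (len(b)+1) cells, so the running time is proportional to the product of the two lengths. Keep in mind that the score measures how well the two sequences line up in order, which is not the same thing as the two files being about the same subject.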
Light Poster | 18Jul2007, 13:19 | #4
I will take care to post it properly next time.
The algorithms you suggested, as far as I know, are used for string matching and DNA matching, especially pairwise sequence matching. Can you also suggest some other algorithm of lesser complexity? Thanks for your help.
Go4Expert Founder | 18Jul2007, 14:04 | #5
The Needleman-Wunsch algorithm is a very simple algorithm and is a basic example of dynamic programming, but your requirement, matching files by what they talk about, is not that simple to analyze.