Similarity between text files

Light Poster
18Jul2007,01:17   #1
kens
Find the similarity between two text files. The similarity index needs to b defined by yrself. The similarity shud b content based! for example two docs talki abt tennis n cricket will hv a lower similarity index than both talkin abt the sam sport.

Please suggest how to approach this problem.
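Since the index is yours to define, one simple starting point (not something proposed in this thread) is the Jaccard index over the sets of words in each file. It is the number of distinct words the files share divided by the number of distinct words they use in total. The sketch below assumes plain English text and a crude lower-case, letters-only tokenizer. The function names and sample sentences are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lower-case the text and return the set of alphabetic words in it."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard_similarity(text_a: str, text_b: str) -> float:
    """Jaccard index |A & B| / |A | B| over the two word sets (0.0 to 1.0)."""
    words_a, words_b = tokenize(text_a), tokenize(text_b)
    if not words_a and not words_b:
        return 1.0  # two empty documents are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)


tennis_1 = "Federer won the tennis match with a strong serve"
tennis_2 = "The tennis match was decided by a strong serve"
cricket = "The cricket match was decided by a late wicket"
print(jaccard_similarity(tennis_1, tennis_2))  # 0.5
print(jaccard_similarity(tennis_1, cricket))   # 0.2
```

Shared function words such as "the" and "a" inflate the score, so removing common stop words before comparing is a usual refinement.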
Team Leader
18Jul2007,03:08   #2
DaWei
I realize that English might not be your first language. In that case, do not use AOL-speak guides as a suitable translator or dictionary. I would not give the sweat off my balls to a person so lazy as to use 'b' for 'be'. There are innumerable other examples in your post.

Work. Break a sweat. Expend some energy. Post some ideas or some code. Then you will receive help.

Don't post code without learning about code tags.
Go4Expert Founder
18Jul2007,09:07   #3
shabbir
You should compare the files using dynamic programming, which lets you measure the similarity between two strings. Use the Needleman-Wunsch algorithm (global alignment) or the Smith-Waterman algorithm (local alignment) for dynamic-programming sequence comparison.
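Here is a minimal sketch of both algorithms applied to words instead of DNA bases. It assumes a simple scoring scheme of +1 for a matching word, -1 for a mismatch, and -1 per gap. These values, the helper names, and the sample sentences are hypothetical choices, not part of either algorithm's definition. Needleman-Wunsch scores an alignment of the whole texts, while Smith-Waterman scores the best-matching local stretch. Both fill an (n+1) x (m+1) table, so time and memory grow as O(nm) in the number of words.

```python
import re

# Assumed scoring scheme (not fixed by either algorithm); tune as needed.
MATCH, MISMATCH, GAP = 1, -1, -1


def words(text: str) -> list[str]:
    """Split text into lower-case words; each word plays the role of a base."""
    return re.findall(r"[a-z]+", text.lower())


def score(x: str, y: str) -> int:
    return MATCH if x == y else MISMATCH


def needleman_wunsch(a: list[str], b: list[str]) -> int:
    """Global alignment score of two word sequences."""
    n, m = len(a), len(b)
    # dp[i][j] = best score for aligning a[:i] with b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * GAP  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        dp[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # pair two words
                dp[i - 1][j] + GAP,  # skip a word of a
                dp[i][j - 1] + GAP,  # skip a word of b
            )
    return dp[n][m]


def smith_waterman(a: list[str], b: list[str]) -> int:
    """Best local alignment score: same recurrence, floored at 0."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                0,  # start a fresh local alignment at this cell
                dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                dp[i - 1][j] + GAP,
                dp[i][j - 1] + GAP,
            )
            best = max(best, dp[i][j])
    return best


doc1 = words("The tennis match was decided by a strong serve")
doc2 = words("The cricket match was decided by a late wicket")
print(needleman_wunsch(doc1, doc2))  # 3
print(smith_waterman(doc1, doc2))    # 5 ("match was decided by a")
```

Alignment rewards words that appear in the same order, so it measures how alike the wording is more than what the texts are about.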
Light Poster
18Jul2007,13:19   #4
kens
I will take care to post it properly next time.

The algorithms you suggested, as far as I know, are used for string matching and DNA matching, especially pairwise sequence alignment. Can you also suggest another algorithm with lower complexity? Thanks for your help.
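One lower-complexity option, not suggested in this thread, is cosine similarity over word counts (the "bag of words" model). Each file becomes a vector of word frequencies, and the score is the cosine of the angle between the two vectors. With hash-based counting this runs in roughly linear time in the total number of words, instead of the O(nm) table of the alignment algorithms. The tokenizer, function names, and sample sentences below are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    """Bag of words: how many times each lower-case word occurs."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between the two word-count vectors (0.0 to 1.0)."""
    a, b = term_counts(text_a), term_counts(text_b)
    # Only words present in both texts contribute to the dot product;
    # indexing a Counter with a missing key returns 0 without inserting it.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with anything
    return dot / (norm_a * norm_b)


tennis_1 = "Federer won the tennis match with a strong serve"
tennis_2 = "The tennis match was decided by a strong serve"
cricket = "The cricket match was decided by a late wicket"
print(round(cosine_similarity(tennis_1, tennis_2), 3))  # 0.667
print(round(cosine_similarity(tennis_1, cricket), 3))   # 0.333
```

Unlike alignment, this ignores word order entirely. Because it uses counts rather than sets, a topic word repeated throughout a file carries more weight, which suits content-based comparison.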
Go4Expert Founder
18Jul2007,14:04   #5
shabbir
The Needleman-Wunsch algorithm is very simple and is a basic example of dynamic programming, but your requirement of matching what the texts are about is not that simple to analyze.
Team Leader
18Jul2007,17:37   #6
DaWei
You might have a look at this discussion.