Similarity between text files

kens's Avatar, Join Date: Jul 2007
Light Poster
Find the similarity between two text files. The similarity index needs to be defined by yourself. The similarity should be content-based: for example, two documents talking about tennis and cricket will have a lower similarity index than two documents both talking about the same sport.

Please suggest how to approach this problem.
DaWei's Avatar, Join Date: Dec 2006
Team Leader
I realize that English might not be your first language. In that case, do not use AOL-speak guides as a suitable translator or dictionary. I would not give the sweat off my balls to a person so lazy as to use 'b' for 'be'. There are innumerable other examples in your post.

Work. Break a sweat. Expend some energy. Post some ideas or some code. Then you will receive help.

Don't post code without learning about code tags.
shabbir's Avatar, Join Date: Jul 2004
Go4Expert Founder
You should compare them using dynamic programming. That way you can find the similarity between two strings. Use the Needleman-Wunsch algorithm or the Smith-Waterman algorithm, both of which compare sequences using the dynamic programming approach.
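As a rough illustration of the dynamic programming idea, here is a minimal sketch of the Needleman-Wunsch global alignment score in Python. The function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example, not something specified in this thread. It accepts either strings or lists of words, so a text file could be split into words first.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the global alignment score of two sequences.

    a and b may be strings (compared character by character) or
    lists of tokens (compared word by word). The scoring values are
    illustrative defaults, not canonical ones.
    """
    rows, cols = len(a) + 1, len(b) + 1

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against an empty sequence costs one gap per item.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            same = a[i - 1] == b[j - 1]
            diagonal = score[i - 1][j - 1] + (match if same else mismatch)
            up = score[i - 1][j] + gap      # gap inserted in b
            left = score[i][j - 1] + gap    # gap inserted in a
            score[i][j] = max(diagonal, up, left)

    return score[-1][-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("GATTACA", "GATTACA"))
    print(needleman_wunsch_score("the cat sat".split(), "the dog sat".split()))
```

The table has one cell per pair of positions, so time and memory grow with the product of the two lengths. Note that the score measures how well the sequences line up in order, which is not the same thing as whether two documents are about the same topic.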
kens's Avatar, Join Date: Jul 2007
Light Poster
I will take care to post it properly next time.

The algorithms you suggested, as far as I know, are used for string matching and DNA matching, especially pairwise sequence matching. Can you also suggest some other algorithm of lesser complexity? Thanks for your help.
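One lower-cost option, which nobody in this thread names and which is offered here only as a sketch, is to ignore word order and compare word counts with cosine similarity. The helper names `word_counts` and `cosine_similarity` and the simple tokenizing pattern are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Count lowercase words; the regex is a deliberately simple tokenizer."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Return a value from 0.0 (no shared words) to 1.0 (same word mix)."""
    a, b = word_counts(text_a), word_counts(text_b)

    # Dot product over the words that appear in both texts.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())

    # Product of the Euclidean lengths of the two count vectors.
    norm = math.sqrt(sum(c * c for c in a.values())) * \
        math.sqrt(sum(c * c for c in b.values()))

    # An empty text has zero length; treat it as sharing nothing.
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    tennis_1 = "The tennis player won the match in straight sets"
    tennis_2 = "A tennis match can last five sets"
    cricket = "The cricket team scored two hundred runs"
    print(cosine_similarity(tennis_1, tennis_2))
    print(cosine_similarity(tennis_1, cricket))
```

Each text is scanned once, so the cost grows roughly linearly with file length rather than with the product of the two lengths. For files, the texts could be read with `open(path).read()`, where `path` is whatever file name applies. Common words such as "the" inflate the score, so a real similarity index would probably need stop-word removal or weighting on top of this.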
shabbir's Avatar, Join Date: Jul 2004
Go4Expert Founder
The Needleman-Wunsch algorithm is very simple and is a basic example of dynamic programming, but your requirement, matching the content of the text, is not that simple to analyze.
DaWei's Avatar, Join Date: Dec 2006
Team Leader
You might have a look at this discussion.