Find the similarity between two text files. The similarity index needs to b defined by yrself. The similarity shud b content based! for example two docs talki abt tennis n cricket will hv a lower similarity index than both talkin abt the sam sport. Please suggest how to approach this problem.
I realize that English might not be your first language. In that case, do not use AOL-speak guides as a suitable translator or dictionary. I would not give the sweat off my balls to a person so lazy as to use 'b' for 'be'. There are innumerable other examples in your post. Work. Break a sweat. Expend some energy. Post some ideas or some code. Then you will receive help. Don't post code without learning about code tags.
You should compare the files using dynamic programming, which lets you measure the similarity between two strings. Use the Needleman-Wunsch algorithm (global alignment) or the Smith-Waterman algorithm (local alignment); both do sequence comparison with a dynamic programming approach.
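To make the suggestion concrete, here is a minimal sketch of Needleman-Wunsch scoring in Python. The function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration, not something fixed by the algorithm or by this thread. It only computes the alignment score, not the alignment itself.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b.

    The scoring values are illustrative assumptions; tune them as needed.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against an empty sequence costs one gap per element.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            # Either pair a[i-1] with b[j-1], or put a gap in one sequence.
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(diag, up, left)

    return score[-1][-1]
```

The function only uses indexing and equality, so it accepts strings or lists of words. For text files you could pass `text.split()` for each file to align word by word instead of character by character. Note that the table has (len(a)+1) x (len(b)+1) cells, so time and memory grow with the product of the two lengths.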
I will take care to post properly next time. The algorithms you suggested, as far as I know, are used for string matching and DNA matching, especially pairwise sequence matching. Can you also suggest some other algorithm of lower complexity? Thanks for your help.
The Needleman-Wunsch algorithm is very simple and is a basic example of dynamic programming, but your requirement, matching what the documents talk about, is not that simple to analyze.