kens 18Jul2007 01:17

Similarity between text files
Find the similarity between two text files. The similarity index needs to be defined by yourself. The similarity should be content based! For example, two docs talking about tennis and cricket will have a lower similarity index than two docs both talking about the same sport.

Please suggest how to approach this problem.

DaWei 18Jul2007 03:08

I realize that English might not be your first language. In that case, do not use AOL-speak guides as a suitable translator or dictionary. I would not give the sweat off my balls to a person so lazy as to use 'b' for 'be'. There are innumerable other examples in your post.

Work. Break a sweat. Expend some energy. Post some ideas or some code. Then you will receive help.

Don't post code without learning about code tags.

shabbir 18Jul2007 09:07

You should compare them using the dynamic programming technique. That way you can find the similarity between two strings. Use the Needleman-Wunsch algorithm or the Smith-Waterman algorithm for sequence comparison using the dynamic programming approach.
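
A minimal sketch of the Needleman-Wunsch scoring step, to show what the dynamic programming looks like. The function name and the scoring values (match=1, mismatch=-1, gap=-1) are assumptions chosen for illustration, not anything fixed by the algorithm. It works on any two sequences, so you can pass strings (character level) or lists of words.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the global alignment score of sequences a and b.

    The scoring values are illustrative defaults (an assumption),
    not part of the algorithm itself.
    """
    # prev[j] = best score for aligning the rows of a seen so far with b[:j].
    # Row 0: aligning an empty prefix of a with b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] with an empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Either align a[i-1] with b[j-1], or insert a gap in one sequence.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap
            left = curr[j - 1] + gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Word-level comparison of two short sentences.
    print(needleman_wunsch_score("the cat sat".split(), "the cat ran".split()))
```

Only two rows of the table are kept because the score alone is wanted here. Recovering the alignment itself would need the full table and a traceback. The running time is proportional to len(a) * len(b).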

kens 18Jul2007 13:19

I will take care to post it properly next time.

The algorithms you suggested, as far as I know, are used for string matching and DNA matching, especially pairwise sequence matching. Can you also suggest some other algorithm of lesser complexity? Thanks for your help.

shabbir 18Jul2007 14:04

The Needleman-Wunsch algorithm is very simple and is a basic example of dynamic programming, but your requirement to match documents by content is not that simple to analyze.
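
To give an idea of what a content-based index could look like, here is a rough sketch using a different technique from sequence alignment: cosine similarity over word counts. This is an assumption about what would fit the requirement, not something settled in this thread, and the function names are made up for the example. It ignores word order and only measures vocabulary overlap, so it is a starting point rather than a real analysis of meaning.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Count lowercase alphanumeric words in text (a deliberately crude tokenizer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Return a similarity index in [0, 1] based on shared word counts."""
    a, b = word_counts(text_a), word_counts(text_b)
    # Dot product over the words the two texts have in common.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    tennis_1 = "the tennis player hit a serve"
    tennis_2 = "a tennis serve can be hard to return"
    cricket = "cricket bowlers aim at wickets"
    print(cosine_similarity(tennis_1, tennis_2))
    print(cosine_similarity(tennis_1, cricket))
```

With these sample sentences the two tennis texts share some words and the tennis/cricket pair shares none, so the first index comes out higher. Common words such as "the" and "a" inflate the score, so filtering them out would be a sensible next step.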

DaWei 18Jul2007 17:37

You might have a look at this discussion.

