First and foremost, we need to define what we mean by Natural Language Processing, or NLP. The most general definition is simply that NLP encompasses anything a computer needs to do in order to understand natural language (whether typed or spoken).
We'll concentrate on natural language understanding (NLU), the task of understanding and reasoning with a natural language input, and ignore the issues of natural language generation (the generation of natural language output), since both use the same processes, just in a different order.
The above points show the major steps necessary to decode a natural language sentence into a representation a computer can understand and then, ultimately, perform a suitable action.
1. input statement
6. output action
Each process in the diagram is progressively harder than the previous one, and less and less is known about the best method to tackle the problem.
Morphology
Morphology is the study of the formation of words: their stems, prefixes and suffixes.
A prefix is a small number of syllables that may be added to the start of an existing word to change its meaning in some way. Likewise, a suffix is added to the end of a word to alter its meaning.
For example, the word advantage can be changed with the prefix dis- to make the new word disadvantage. Similarly the word national can be changed with the suffix -ity to make nationality.
The original word is known as the stem. A class of words can be created using the same stem but different affixes (prefixes and suffixes).
Morphology attempts to find patterns and rules in the way that affixes are used. Using these rules and patterns, a computer can check that words given as input are real words, and have been used correctly, without resorting to storing a huge list of words and each of their uses.
Note that a computer cannot simply assume that every word beginning with the letters of a prefix is made up of a prefix and a stem. Taking the prefix de- as an example, a computer may logically determine that all words beginning with de- consist of that prefix followed by a stem, but this is of course not the case; the word dead, for example, starts with de, but de is not used as a prefix in this case.
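The idea of checking words with affix rules rather than a full word list can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tiny stem lexicon, the prefix and suffix lists, the word decode as an example of de- used as a real prefix, and the function name analyse are all hypothetical, and a real morphological analyser would need far richer rules (spelling changes, for instance). The sketch only accepts a prefix or suffix split when the remaining part is a known stem, which is one simple way to avoid treating dead as de- plus a stem.

```python
# Toy data, assumed for illustration only; not a real lexicon or rule set.
STEMS = {"advantage", "national", "code", "dead"}
PREFIXES = ("dis", "de")
SUFFIXES = ("ity",)


def analyse(word):
    """Return every (prefix, stem, suffix) split licensed by the toy lexicon.

    An empty string means "no prefix" or "no suffix". A split is accepted
    only if what remains after stripping the affixes is a known stem.
    """
    analyses = []
    for prefix in ("",) + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in ("",) + SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            stem = rest[:len(rest) - len(suffix)]
            if stem in STEMS:
                analyses.append((prefix, stem, suffix))
    return analyses


if __name__ == "__main__":
    for w in ("disadvantage", "nationality", "decode", "dead"):
        print(w, analyse(w))
```

With this toy data, disadvantage is split into dis- plus advantage and nationality into national plus -ity, while dead is recognised only as a bare stem, because stripping de- leaves ad, which is not in the stem list. Note that the stem list is much smaller than a list of every word form would be, since each stem covers all of the words built from it.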