Introduction to Natural Language Processing (NLP) - part 2
Once morphology has been used to confirm that the individual words of a sentence are correct, syntax can be used to check that they are properly combined.
The most widely used method of checking that a sentence is syntactically correct is to try to build a parse tree for it using a grammar.
A grammar is just a fancy name for a list of rules that rewrite symbols into other symbols or words, and a parse tree is simply a diagrammatic way of showing how these rules are used to build up a sentence.
For example, the following grammar starts at the symbol S and produces very simple sentences:
S ==> V N
N ==> this
N ==> that
V ==> do
These rules show that the symbol S should be rewritten, or expanded, into the symbols V and N (in that order). These symbols can in turn be rewritten using the right-hand sides of their own rules. The process repeats until only terminals (words) remain in the sentence.
A parse tree for the sentence "do this" would be as shown in figure 2, whereas the sentence "stop that" has no parse tree under the grammar we are using (no rule produces the word "stop") and is therefore syntactically incorrect.
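The rewriting process described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the grammar is stored as a dictionary (the name `GRAMMAR` is hypothetical) that maps each symbol to its possible right-hand sides; the hypothetical function `expand_all` repeatedly rewrites the leftmost non-terminal and yields every sentence the grammar can produce.

```python
# The toy grammar from the text: each symbol maps to its possible right-hand sides.
GRAMMAR = {
    "S": [["V", "N"]],
    "N": [["this"], ["that"]],
    "V": [["do"]],
}


def is_terminal(symbol):
    # Any symbol without a rule is treated as a terminal (a word).
    return symbol not in GRAMMAR


def expand_all(symbols):
    """Yield every sentence reachable by rewriting the leftmost non-terminal."""
    for i, symbol in enumerate(symbols):
        if not is_terminal(symbol):
            for rhs in GRAMMAR[symbol]:
                # Replace this non-terminal with one right-hand side and continue.
                yield from expand_all(symbols[:i] + rhs + symbols[i + 1:])
            return
    # No non-terminals are left: the sentence consists only of words.
    yield " ".join(symbols)


print(list(expand_all(["S"])))  # ['do this', 'do that']
```

Because this grammar has no recursive rules, the expansion stops after two sentences. A grammar with recursive rules can describe infinitely many sentences, so this sketch only suits small, non-recursive grammars.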
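A parse tree can also be represented as data rather than a diagram. The sketch below assumes a tree is written as nested Python tuples of the form `(symbol, child, ...)`, with words as plain strings; `DO_THIS`, `is_valid_tree` and `leaves` are hypothetical names. A tree is valid when every internal node matches one of the grammar's rules, and reading its leaves from left to right gives back the sentence.

```python
# The toy grammar from the text.
GRAMMAR = {
    "S": [["V", "N"]],
    "N": [["this"], ["that"]],
    "V": [["do"]],
}

# The parse tree for "do this": S has children V and N, which produce the words.
DO_THIS = ("S", ("V", "do"), ("N", "this"))


def label(node):
    # A subtree is labelled by its first element; a word labels itself.
    return node[0] if isinstance(node, tuple) else node


def is_valid_tree(node):
    """Check that every internal node expands by one of the grammar's rules."""
    if not isinstance(node, tuple):
        return True  # a word (terminal) needs no rule
    symbol, children = node[0], list(node[1:])
    if [label(child) for child in children] not in GRAMMAR.get(symbol, []):
        return False
    return all(is_valid_tree(child) for child in children)


def leaves(node):
    """Read the words back off the tree, left to right."""
    if not isinstance(node, tuple):
        return [node]
    return [word for child in node[1:] for word in leaves(child)]


print(is_valid_tree(DO_THIS), " ".join(leaves(DO_THIS)))  # True do this
```

A tree such as `("S", ("V", "stop"), ("N", "that"))` fails the check, because no V rule produces "stop".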
This stage of NLP is often simply called parsing, since its main aim is to build a parse tree. There are two main classes of parsing, top-down and bottom-up, each with its own advantages and disadvantages.
Top-down parsers begin with the start symbol of the grammar and try different combinations of rules until they find a sequence of rules that generates the sentence being parsed.
Bottom-up parsers start with the sentence itself and work backwards, replacing words (and groups of symbols) with the symbols that could have produced them, looking for a sequence of rules that could generate the list of words in question.
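A top-down parser can be sketched as a backtracking recursive-descent parser. This is an illustrative sketch, not a production parser: `parse`, `parse_sequence` and `parse_sentence` are hypothetical names, and the grammar is assumed to be stored as a dictionary from each symbol to its right-hand sides. Starting from S, the parser tries each rule in turn and backtracks to the next rule whenever one fails to match the words.

```python
# The toy grammar from the text.
GRAMMAR = {
    "S": [["V", "N"]],
    "N": [["this"], ["that"]],
    "V": [["do"]],
}


def parse(symbol, words, pos):
    """Yield (tree, next_pos) for every way `symbol` matches words from `pos`."""
    if symbol not in GRAMMAR:
        # Terminal: it must equal the next word exactly.
        if pos < len(words) and words[pos] == symbol:
            yield symbol, pos + 1
        return
    for rhs in GRAMMAR[symbol]:
        # Try this rule; if it yields nothing, the loop moves on (backtracking).
        for children, end in parse_sequence(rhs, words, pos):
            yield (symbol, *children), end


def parse_sequence(symbols, words, pos):
    """Match a sequence of symbols left to right, yielding (children, next_pos)."""
    if not symbols:
        yield [], pos
        return
    for first, mid in parse(symbols[0], words, pos):
        for rest, end in parse_sequence(symbols[1:], words, mid):
            yield [first] + rest, end


def parse_sentence(sentence, start="S"):
    words = sentence.split()
    for tree, end in parse(start, words, 0):
        if end == len(words):  # the tree must cover the whole sentence
            return tree
    return None  # no parse tree: syntactically incorrect under this grammar


print(parse_sentence("do this"))    # ('S', ('V', 'do'), ('N', 'this'))
print(parse_sentence("stop that"))  # None
```

One known weakness of this naive approach is left recursion: a rule such as N ==> N N would make `parse` call itself forever without consuming a word.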
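A bottom-up parser can be sketched as a shift-reduce recogniser. At each step it either shifts the next word onto a stack or reduces the top of the stack to the left-hand side of a matching rule, backtracking over these choices until the whole sentence has been reduced to S. This minimal sketch assumes a grammar with no empty rules and no cycles of single-symbol rules (either could let it reduce forever); `RULES`, `shift_reduce` and `recognise` are hypothetical names, and it only answers yes or no rather than building a tree.

```python
# The toy grammar from the text.
GRAMMAR = {
    "S": [["V", "N"]],
    "N": [["this"], ["that"]],
    "V": [["do"]],
}

# Flatten into (left-hand side, right-hand side) pairs for matching the stack.
RULES = [(lhs, tuple(rhs)) for lhs, options in GRAMMAR.items() for rhs in options]


def shift_reduce(stack, words, start="S"):
    """Return True if the stack plus the remaining words can be reduced to `start`."""
    if not words and stack == (start,):
        return True  # everything has been reduced to the start symbol
    # Reduce: replace the symbols on top of the stack with a rule's left-hand side.
    for lhs, rhs in RULES:
        n = len(rhs)
        if n <= len(stack) and stack[len(stack) - n:] == rhs:
            if shift_reduce(stack[:len(stack) - n] + (lhs,), words, start):
                return True
    # Shift: move the next word onto the stack.
    if words and shift_reduce(stack + (words[0],), words[1:], start):
        return True
    return False  # no sequence of shifts and reductions works


def recognise(sentence):
    return shift_reduce((), tuple(sentence.split()))


print(recognise("do this"))    # True
print(recognise("stop that"))  # False
```

For "do this" the recogniser shifts "do", reduces it to V, shifts "this", reduces it to N, and finally reduces V N to S.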
The problems with parsing become apparent when there is more than one choice of rule to expand while building a sentence. When more than one of these choices leads to a complete parse tree, the sentence is ambiguous. Ambiguity is a serious problem for many NLP applications: given a sentence with many possible meanings, how do you determine the intended one?
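The toy grammar above never gives a sentence more than one parse tree. As an illustration of ambiguity, the sketch below adds a hypothetical rule N ==> N N (not part of the original grammar) so that nouns can be grouped, and then collects every parse tree for a sentence. It records all trees for each span of words and reuses them through `functools.lru_cache`, a simple form of chart parsing; `all_parses`, `trees` and `sequences` are assumed names.

```python
from functools import lru_cache

# Hypothetical ambiguous grammar: the text's rules plus N ==> N N.
GRAMMAR = {
    "S": [("V", "N")],
    "N": [("N", "N"), ("this",), ("that",)],
    "V": [("do",)],
}


def all_parses(sentence, start="S"):
    """Return every parse tree (as nested tuples) covering the whole sentence."""
    words = tuple(sentence.split())

    @lru_cache(maxsize=None)
    def trees(symbol, i, j):
        """All trees for `symbol` spanning words[i:j], memoised per span."""
        if symbol not in GRAMMAR:
            # A terminal covers exactly one matching word.
            return [symbol] if j == i + 1 and words[i] == symbol else []
        found = []
        for rhs in GRAMMAR[symbol]:
            found.extend((symbol, *kids) for kids in sequences(rhs, i, j))
        return found

    def sequences(symbols, i, j):
        """All ways to cover words[i:j] with `symbols`, one or more words each."""
        if len(symbols) == 1:
            return [[t] for t in trees(symbols[0], i, j)]
        results = []
        # Leave at least one word for every symbol still to be matched.
        for k in range(i + 1, j - len(symbols) + 2):
            for first in trees(symbols[0], i, k):
                for rest in sequences(symbols[1:], k, j):
                    results.append([first] + rest)
        return results

    return trees(start, 0, len(words))


parses = all_parses("do this that this")
for tree in parses:
    print(tree)
print(len(parses))  # 2
```

The sentence "do this that this" gets two trees, one grouping "that this" together and one grouping "this that" together. Because every split uses strictly smaller spans, the left-recursive rule does not cause endless recursion here. The parser can report both trees, but syntax alone cannot say which grouping was intended.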