Sentence Matching

How then might we go about deciding whether a string is grammatical? And in particular, how might we do so in a way that is consistent with what we believe we know about human cognitive capacities? One (very simple-minded) way might be to attempt to put in a database all the grammatical sentences of English. Any possible sentence is then bound to be included in the database and can be checked against it, for example by running it as a Prolog query. Non-sentences, such as those in (2), will not be included, and so will not be recognised as grammatical English.
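The database approach can be made concrete with a minimal sketch. It assumes that each sentence is stored whole as a list of word atoms; the predicate name sentence/1 and the three stored sentences are illustrative choices of ours, not part of any fixed scheme.

```prolog
% A toy 'database of sentences': one fact per grammatical sentence.
% sentence/1 and the entries below are hypothetical examples.
sentence([the, house, fell, down]).
sentence([the, man, is, a, bank, robber]).
sentence([the, surveyor, valued, the, house]).

% Checking a string is a lookup: the query succeeds only if the
% exact word list has been stored as a fact.
%
% ?- sentence([the, house, fell, down]).    succeeds (stored)
% ?- sentence([down, fell, house, the]).    fails (not stored)
```

Note that a query for a perfectly grammatical string that happens not to be listed, such as [the, man, fell, down], also fails. The database cannot tell a non-sentence from a sentence it has simply never been given.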

This method has some severe drawbacks. In the first place, since we have already claimed that the number of sentences in a language is infinite, it follows that no finite database could ever be complete. For example, given a database containing N sentences, we could create an (N+1)th sentence by joining all the sentences so far with the word 'and', then an (N+2)th sentence by joining them with 'or', and so on. Most importantly, the grammars of natural languages are recursive (a concept which we shall return to later in the chapter), allowing syntactic units to be embedded to any depth, as in the sentences 'The house the surveyor the property developer called valued fell down' and 'The man in the sunglasses in the hotel in Spain in the photograph in the newspaper is a bank robber'.
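The conjunction argument can be sketched in the same terms. The following is an illustration only, assuming SWI-Prolog and a toy database of two stored sentences; the predicate names join_with/3 and new_sentence/2 are our own.

```prolog
:- use_module(library(lists)).   % append/3 (autoloaded in SWI-Prolog)

% A finite database of N = 2 sentences (hypothetical examples).
sentence([john, sleeps]).
sentence([mary, sings]).

% join_with(+Word, +Sentences, -Joined)
% Joined is all the word lists in Sentences concatenated, with Word
% placed between each adjacent pair. Fails for an empty list.
join_with(_, [S], S).
join_with(Word, [S|Rest], Joined) :-
    Rest = [_|_],
    join_with(Word, Rest, Tail),
    append(S, [Word|Tail], Joined).

% new_sentence(+Word, -New)
% New is built from every sentence currently stored, joined by Word.
new_sentence(Word, New) :-
    findall(S, sentence(S), All),
    join_with(Word, All, New).

% ?- new_sentence(and, S).
% S = [john, sleeps, and, mary, sings]
%
% ?- new_sentence(and, S), sentence(S).
% fails: the new sentence is not in the database
```

Whenever the database holds at least two sentences, the result is longer than any stored sentence and so cannot already be listed. Adding it as a new fact does not help, since the same query then builds a longer one still.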

The second drawback is that simply putting a large number of sentences into a database does not indicate what it is that distinguishes them from non-sentences. It does not account for our intuitions, on seeing a novel string of words, as to whether that string belongs to the list of sentences (i.e., to the language) or not. For the same reason, it does not account for the fact that, despite their apparently different structures, sentences (3a) and (3c) express the same proposition (or 'have the same meaning') while (3a) and (3b), despite their similar structures, do not:

(3) a. John is easy to please
    b. John is eager to please
    c. It is easy to please John

So a 'database of all the grammatical sentences' is both impossible and unsatisfactory. The two disadvantages listed above converge on a single general observation, which is relevant to our requirement that our account of language be cognitively plausible. The human brain contains only a finite, even if awesomely vast, number of neurons, so human beings have a strictly limited memory. No single human brain, nor even the totality of all human brains, could hold in memory all the sentences of a language, since a finite space cannot contain an infinite number of objects. By the same token, we could not hope to put an infinite number of sentences into a computer database.