Computer applications for natural language.

Why should we want to develop computational models of language? Six of the more important areas of research are described below, though the list is far from exhaustive. A concise and eminently readable historical overview of early research in natural language processing is David Waltz's "The State of the Art in Natural Language Understanding", the introductory chapter to Lehnert & Ringle (eds.), Strategies for Natural Language Processing, 1982. For more recent work, see Survey of the State of the Art in Human Language Technology (Studies in Natural Language Processing), 1998, edited by Giovanni Varile and Antonio Zampolli.

2.1. Machine translation

The first application area to receive significant attention (in the late 1940s) was the translation of texts, specifically scientific and technical papers, from one language to another. It was widely believed that the expansion of the international scientific community in the post-war years would produce far more documents than could ever be translated by hand, and that without machine translation it would be impossible to cope with them. Although there was work on a variety of languages, the main focus of research in the West was Russian-to-English and, in the East, English-to-Russian.

The quality of the translations produced, however, was extremely poor: automatic translation proved a far more formidable task than had been anticipated. The failure of the initial idea for machine translation -- that it was basically a process of dictionary look-up, plus substitution, plus grammatical re-ordering -- is well illustrated by the following (probably apocryphal) story of the sentence "The spirit is willing but the flesh is weak" translated into Russian and then back into English: the translation is said to have come out as "The vodka is strong but the meat is rotten".
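The naive model of translation described above (dictionary look-up, plus substitution, plus grammatical re-ordering) can be sketched in a few lines of Python. The sketch is purely illustrative: the tiny English-to-French lexicon, its part-of-speech tags and the single adjective-noun reordering rule are all hypothetical, and French stands in for Russian only to keep the output readable. It shows that word-by-word substitution works when every word is in the dictionary and has a single sense, and how it breaks down otherwise.

```python
# A toy sketch of the "dictionary look-up, plus substitution, plus
# grammatical re-ordering" model of translation. The lexicon and the
# reordering rule are hypothetical, chosen only for illustration.

# Hypothetical lexicon: English word -> (French word, part of speech)
LEXICON = {
    "the": ("le", "DET"),
    "red": ("rouge", "ADJ"),
    "wine": ("vin", "NOUN"),
    "is": ("est", "VERB"),
    "strong": ("fort", "ADJ"),
    "spirit": ("alcool", "NOUN"),  # only one sense stored: the drink, not the soul
}


def translate(sentence: str) -> str:
    words = sentence.lower().split()

    # Step 1: dictionary look-up and substitution, one word at a time.
    # Words missing from the lexicon are passed through unchanged.
    tagged = [LEXICON.get(w, (w, "UNK")) for w in words]

    # Step 2: grammatical re-ordering. French usually places adjectives
    # after the noun, so swap every ADJ NOUN pair (skipping past the
    # swapped pair so it is not swapped back).
    i = 0
    while i < len(tagged) - 1:
        if tagged[i][1] == "ADJ" and tagged[i + 1][1] == "NOUN":
            tagged[i], tagged[i + 1] = tagged[i + 1], tagged[i]
            i += 2
        else:
            i += 1

    return " ".join(word for word, _ in tagged)


print(translate("the red wine is strong"))  # le vin rouge est fort
print(translate("the spirit is willing"))   # le alcool est willing
```

The second call shows several failures in miniature: "spirit" has only one stored sense (the drink), "willing" is missing from the lexicon, and the program knows nothing about French elision, so it produces "le alcool" rather than "l'alcool".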

This early work on machine translation came to an ignominious end in the mid-1960s, after the failure to build any reasonably successful automatic translator. Only in the past few years has interest in the field reawakened, and there have recently been several research projects in this area, the most ambitious probably being the EEC's EUROTRA Project to develop MT systems for use in commercial communications within the European Community.

Another recent development has been the emergence of online translation programs on the World Wide Web. Given the transnational nature of the medium, there are HTML documents in many languages on the web. The Alta Vista search engine offers a translation service, Babelfish, at http://babelfish.altavista.digital.com, which will translate pages between English and a number of European languages.

2.2. Information retrieval

The amount of information to be stored and retrieved was, after the Second World War, growing even faster than the amount to be translated. So another early vision of computer possibilities was the 'library of the future'. Much more than simply cataloguing books and papers, computers would be able to store representations of their contents and either retrieve texts on the basis of information given by the user or use the stored information to generate answers to the user's specific questions.

It is especially the latter that has been a central topic in the development of computational models of language, including research into the formal structure of natural language, the connections between the formal structure and the meanings conveyed, the intentions of speakers in using certain forms of language, the importance of real-world knowledge, and so on.

A currently active area of research is the use of natural language interfaces for querying databases. Rick Watson, for example, in 'Data Management: An Organizational Perspective', notes that

"infrequent inquirers of a relational database may be reluctant to use SQL because they don't use it often enough to remain familiar with the language. While the QBE approach can make querying easier, a more natural approach is to use standard English. In this case, natural language processing (NLP) is used to convert ordinary English into SQL so the query can be passed to the relational database."

A query to a movie database such as "Which movies have won best foreign film sorted by year?" would generate the following SQL query for MS Access:

    SELECT DISTINCT [Year], [Title]
    FROM [Awards] INNER JOIN [Movies] ON [Movies].[Movie ID] = [Awards].[Movie ID]
    WHERE [Category]='Best Foreign Film' AND [Status]='Winner'
    ORDER BY [Year] ASC
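The conversion Watson describes can be illustrated, very crudely, with a single hand-written template that recognises one form of question and fills in a parameterised SQL query. The Python sketch below assumes a simplified, hypothetical schema (Movies and Awards tables with MovieID, Title, Year, Category and Status columns), uses made-up rows, and runs against SQLite through Python's built-in sqlite3 module rather than MS Access. The function name to_sql is likewise invented for illustration.

```python
import re
import sqlite3

# Hypothetical schema loosely modelled on the Movies/Awards tables in the
# example query; column names are simplified (MovieID rather than [Movie ID]).
SCHEMA = """
CREATE TABLE Movies (MovieID INTEGER PRIMARY KEY, Title TEXT, Year INTEGER);
CREATE TABLE Awards (MovieID INTEGER, Category TEXT, Status TEXT);
"""

# One hand-written template: "Which movies have won <category> sorted by year?"
PATTERN = re.compile(
    r"which movies have won (?P<category>.+?),? sorted by year\??$",
    re.IGNORECASE,
)


def to_sql(question):
    """Map an English question onto (sql, params), or return None if it doesn't fit."""
    m = PATTERN.match(question.strip())
    if m is None:
        return None  # the question falls outside the tiny "grammar" we understand
    # "best foreign film" -> "Best Foreign Film", to match the stored category name
    category = m.group("category").strip().title()
    sql = (
        "SELECT DISTINCT Movies.Year, Movies.Title "
        "FROM Awards INNER JOIN Movies ON Movies.MovieID = Awards.MovieID "
        "WHERE Awards.Category = ? AND Awards.Status = 'Winner' "
        "ORDER BY Movies.Year ASC"
    )
    return sql, (category,)


# In-memory database with made-up rows, purely for demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO Movies VALUES (?, ?, ?)",
                 [(1, "Film B", 1990), (2, "Film A", 1985), (3, "Film C", 1995)])
conn.executemany("INSERT INTO Awards VALUES (?, ?, ?)",
                 [(1, "Best Foreign Film", "Winner"),
                  (2, "Best Foreign Film", "Winner"),
                  (3, "Best Foreign Film", "Nominee")])

query = to_sql("Which movies have won best foreign film sorted by year?")
rows = conn.execute(*query).fetchall()
print(rows)  # [(1985, 'Film A'), (1990, 'Film B')]
```

A template like this handles exactly one phrasing of one question; any other wording falls through to None. That brittleness is one reason serious natural language front ends try to analyse the structure of the question and map it onto the database schema rather than matching fixed strings.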

In other application domains, Information Quest enables natural language querying of databases of scientific, technical, medical and business information, while the SATELITE system, developed by Software AG España and in use since 1990, provides natural language query access in Spanish to the corporate information of Telefónica de España in Madrid. The Squirrel system, developed at the University of Essex, is a general-purpose natural language front end to relational databases that translates the user's question into an optimized SQL query. See also, e.g., N. R. Adam & A. Gangopadhyay, 'A Form-Based Natural Language Front-End to a CIM Database', IEEE Transactions on Knowledge and Data Engineering, Vol. 9, No. 2, March-April 1997; or X. Wu, T. Ichikawa & N. Cercone, Knowledge-Base Assisted Database Retrieval Systems, 1996.

2.3. Information extraction

Scanning and summarizing newspaper articles, etc.

Very often one will be interested in certain key aspects of a news story, not in the minor and contingent details. A meteorological office, for example, might be interested in a volcanic eruption in the south Atlantic from the point of view of its possible consequences for future weather, but not from the point of view of, say, danger to shipping or effect on fishing. It is a waste of man-hours to have human beings laboriously skim through every story that comes in over a news-wire, selecting and summarizing the important ones, if a machine can be made to do just that, tirelessly and reliably. News skimming programs have been written, the most famous probably being FRUMP. That such systems are not yet perfect, however, is well illustrated by the classic mistake FRUMP made in response to the following headline:

POPE'S DEATH SHAKES THE WESTERN HEMISPHERE

FRUMP's summary:

THERE WAS AN EARTHQUAKE IN THE WESTERN HEMISPHERE. THE POPE DIED.

Can you guess how the mistake was made?

2.4. Human-machine interaction

As well as being used to provide information from some stored body of knowledge, a natural language understanding system is useful in situations where the computer is being used to perform some task. Consider the advantage to the non-programmer, or to the necessarily 'hands-free' user, of being able to use ordinary language -- preferably speech -- to give commands, ask questions, enter information, and so forth, to a machine which can in turn produce natural language descriptions of what is going on: explanations of why certain actions were taken, what state it is in at a given time, and the like. Those of you who have seen the film 2001: A Space Odyssey may be reminded of the 'user-friendliness' of the computer HAL.

An important use of such systems would be in 'knowledge acquisition'. Some knowledge-based systems, known as 'expert systems', are based on a large body of stored knowledge about a particular problem area, such as medical diagnosis or oil-well exploration. In building such systems, it would be useful to 'cut out the middle man' -- the computer programmer -- by having the expert in some domain feed his expertise directly into the expert system via textual descriptions.

A related research area is the development of systems that allow programmers to specify computer programs in natural language or in natural-language-like programming languages.

2.5. Computer-aided instruction

"When the computer understands more, the quality of our interaction with it can be greatly enhanced. A good example of this is in computer based learning. Without understanding, a computer is unlikely to suggest learning paths which suit a learner's needs. With better understanding, the computer can be made to target the specific needs of the learner who will find the course easier and more rewarding."
Lingualink, From Language Engineering to Human Language Technologies (1998), p. 5

A vogue area of research has been the building of systems that do not simply make use of 'canned' language (in the form of questions, answers, and explanations that are rigidly pre-planned) but which can deal intelligently with the content of both the pre-stored material and the student's queries and responses.

2.6. Cognitive modelling

"There are a number of questions that might lead one to undertake a study of language. Personally, I am primarily intrigued by the possibility of learning something, from the study of language, that will bring to light inherent properties of the human mind."
A.N. Chomsky, Language and Mind (1972), p.103

It has often been remarked that language is a 'mirror of the mind'. If we understood how language worked, we would be a long way towards understanding how the rest of the mind works in reasoning, learning and remembering. Much of the research in psycholinguistics has a double aim -- to understand language, and to understand the mind through its linguistic abilities. The development of cognitive theories of language plays a role in the development of more general theories of cognition.