
A Program That Acquires Language Using Positive and Negative Feedback

Abstract:
"Acquire" is an artificial intelligence application that learns language directly from examples. Sentences are input and classified by an informant as either good or bad. A pair of transition networks is built, one from the "good" input and one from the "bad" input using a pattern matching algorithm which states that a word can be classified by what comes immediately before and after it in a sentence. A sentence is recognized when it is accepted by the "good" network and rejected by the "bad" network. "Acquire" attempts to deal with problems such as overgeneration that arise in similar language learning programs.

KEYWORDS: Acquire, superstate, doublematch rule, before and after rule, informant, positive network, negative network
Introduction
Computer-assisted language instructional programs are designed to aid people in learning language. Unlike these programs, Acquire is a program that learns language itself.
What does it mean to say that a computer program learns language? What does it mean to say that a computer program understands language? Of what relevance are language learning programs to language teachers? These are some of the questions that this article will discuss. Along with this will be a detailed description of Acquire—what it learns, and how it learns it.
Natural Language Understanding, Artificial Intelligence, and Learning
People communicate via the wonderfully complex, subtle, and expressive mechanism called natural language. Computer programming languages, database query languages, and other formalisms, on the other hand, are highly constrained, rigid, and awkward for humans to use.
Natural language research has as its central goal the realization of computer programs that can deal with the full range of meaning of human languages. If computers could understand what is meant by a sentence, they would be much easier to use. In addition, by examining natural language programs AI researchers hope to extend the understanding of language, communication, and the human mind.
Language learning programs form a sub-field of both natural language understanding and the broader artificial intelligence field of machine learning. Like natural language programs in general, the goal of this research is two-fold: First, to gain more understanding of the process of human learning. And second, to produce programs that will allow computers to learn, thus extending the range of problems that a particular computer program could solve.
A simple definition states that learning is any process by which a system improves its performance (Barr, 1981). For language learning this is taken to mean that a system learns if, over time, through application of its internal algorithm and external input, it is able to handle a larger portion of the target language.
Survey of Natural Language Acquisition
The field of natural language acquisition can be roughly divided into several areas. First, there is the grammatical inference problem which grows out of mathematical linguistics and the theory of computation. This body of research attempts to prove in theory when it is possible to learn a language on the basis of a set of sentences from that language.
A second body of research attempts to construct programs that either learn language or simulate humans learning language. These attempts have grown out of artificial intelligence and cognitive simulation.
Grammatical Inference
A grammar is a mechanism that produces or identifies all the legal sentences in a language and only those sentences. The grammatical inference problem is the problem of acquiring such a mechanism based on a set of utterances. Gold (1967) proved mathematically that no such algorithm exists because there are an infinite number of different grammars that can generate any finite set of sentences in a language. He did, however, develop a concept called "identifiability in the limit" which allows for the identification of a grammar for a language when both sentences and non-sentences from the target language are available to the learner. In addition he presented an algorithm that embodied such a method.
Gold's model was an attempt to construct a system, regardless of its psychological validity, that satisfied what is known as the Learnability Condition, which accounts for the fact that languages can be learned by children. What he succeeded brilliantly in showing was that no simple grammatical induction model could account for a child's language learning ability. First, he showed that an informant was necessary (the learner must be told whether a sentence is in the language or not) to learn anything useful at all. And second, Gold's algorithm, which is proven to be as fast as any other, is nonetheless computationally and physically intractable. Applying the algorithm to induce even a very simple language would require testing over 10^100 possibilities (Pinker, 1979).
Computational Models of Language Learning
Computational models of language learning fall into two groups—those that attempt to empirically model a human child learning language and those that don't.
In the first group—those that draw fully on the latest theories from cognitive psychology, psycho-linguistics, and developmental psychology, seeking to model human language learning behavior through observing children learning to speak—there are two notable programs. Selfridge (1980) and Hill (1983) have built models that simulate the language learning ability of a human child through about age two.
These two programs are difficult to evaluate. Because they are much more concerned with the psychological and linguistic aspect of the model and less with the computational questions, their goals are somewhat out of the realm of a computer scientist. Therefore, I will give only a brief characterization.
Selfridge's (1980) "Process Model of Language Acquisition" models the acquisition of language comprehension in a two-year-old. The starting assumption is that a child learns to understand before learning to speak. Interestingly, the input to the learner (the program) consists not only of sentences, but also symbolic representations of gestures. Output is a representation of a child's actions. The learning system embodies a set of heuristics (encoded "rules of thumb" which allow a program to ignore many possible combinations in favor of others deemed more reasonable) for a child's focusing of attention and for word learning.
Hill's model, like Selfridge's, draws heavily on developmental psychology. She attempts to combine the psychological theories of Piaget with linguistic data available on language acquisition, including extensive empirical data such as her own observations of a child subject. The model is a repetition and response model producing child-like repetition of, or responses to, the input sentences, which are meant to simulate the communication of adults in the child's environment. Hill's model is very explicit about its assumptions, which are based on current psycho-linguistic theory. For instance, the model simulates a child classifying words by a method called classification through word use, by which words are grouped according to cognitive information (here Hill is referring to schema theory, which states that a child makes some orderly mapping between adult sounds, gestures, and actions and the events and actions in the child's world) and the relationship between words that the child uses. Hill's model does go on to learn syntactic categories, develop rules about syntax, and generalize to advance beyond the two-word sentence stage.
The other group of computational models consists of programs that are not explicit attempts to model human behavior and are more concerned with what is learned rather than how it is learned. These programs use an ad hoc collection of heuristics, semantic-syntactic correlations, and data structures. Some, notably LAS (Language Acquisition System), are quite impressive (Anderson, 1974, 1975). LAS, although not attempting to model human learning, did attempt to embody the cognitive theory of development, including heuristics that attempted to infer meaning from non-linguistic context and derive rules to convert the meaning into sentences and vice versa.
Kelley (1967) and Reeker (1971) were precursors of later efforts at language learning. Harris (1972) programmed a robot to acquire a grammar, giving the robot not only the ability to understand but also the ability to act on what it understood.
ACQUIRE—A Learning Program
Acquire, the language acquisition program described in the rest of this article, falls clearly into the realm of grammar induction. It is not intended to mimic human behavior nor resemble the language learning of a child.
Acquire's learning algorithm is based on a pattern matching scheme first presented by Katke (1985). By examining patterns of words, forming word groups based on those patterns, eliminating redundant groups and combining equivalent ones, Acquire attempts to forge a transition network based purely on the utterances of a user.
Transition Network—Machine to Recognize Legal Sentences
A transition network is a formal mechanism that defines what sentences are grammatically acceptable in a language. It is a simple mechanism that can handle a class of languages less powerful than natural language. For instance, a transition network cannot handle an embedded sentence. For these a more powerful machine, a recursive transition network, is called for.
Think of a transition network as a road map. Each city represents a lexical category and the roads connecting the cities represent routes from one city to another (later the cities will be called states and the roads called paths or arcs). A sentence represents a set of directions that a traveler attempts to follow while traversing the map (the goal of the traveler is always to get from the city called BEGIN to the city called END). Looking at the first direction (word of the sentence) the traveler attempts to find a route to a city that contains that word. If she finds such a route then she travels across it, looks at the next direction (next word in the sentence) and repeats the action. The traveler is unsuccessful (the sentence is rejected) if any one of the following occurs:
1. She arrives in a city other than END and has no more directions remaining.
2. She arrives at the city, END, and still has directions remaining.
3. She arrives at a city and discovers that it is impossible to reach any city containing the next word in the sentence.
Our traveler is successful only if she arrives at the city END, and has no more directions to follow.
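The traveler's procedure translates directly into code. The following is a minimal sketch, not from the original article: the dict-of-sets layout (a word set per state, an arc set per state) and the function name are assumptions, and the data shown is the network of Figure 1.

```python
# Assumed representation: each state holds a set of words, and arcs map
# a state to the set of states reachable from it in one step.
words = {"S1": {"the"}, "S2": {"dog"}, "S3": {"ate"},
         "S4": {"the"}, "S5": {"cat"}}
arcs = {"#Start": {"S1"}, "S1": {"S2"}, "S2": {"S3"},
        "S3": {"S4"}, "S4": {"S5"}, "S5": {"#End"}}

def recognize(sentence, words, arcs):
    """Follow the traveler's directions; True only on success."""
    state = "#Start"
    for word in sentence.lower().split():
        # Failure 3: no reachable city contains the next word.
        nxt = next((s for s in arcs.get(state, ())
                    if word in words.get(s, ())), None)
        if nxt is None:
            return False
        state = nxt
    # Success only if #End is reachable once the directions run out;
    # otherwise this is failure 1 (stranded short of END).
    return "#End" in arcs.get(state, ())

print(recognize("The dog ate the cat", words, arcs))  # True
print(recognize("The dog ate", words, arcs))          # False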
Pattern Matching Rules Build a Transition Network
Acquire's learning mechanism uses three rules and a parsing algorithm to build such a transition network from scratch using only input sentences. It classifies words into word groups, creates paths between word groups, combines similar groups into super-groups and eliminates little-used portions of the network. Eventually, after enough input has been received, the word groups (states of the transition network) begin to resemble classic groupings into parts of speech—nouns, verbs, articles, etc.
Before and After Rule
A word (Y) belongs in a state (X) if the following two conditions are true:
1. The word before Y in the input sentence can be found in a state with an arc leading to X.
AND
2. The word after Y in the input sentence can be found in a state that is reachable from X.
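As a sketch, under the same assumed dict-of-sets layout as the earlier example (the helper name is invented), the rule amounts to two existence checks:

```python
def fits_before_and_after(prev_word, next_word, x, words, arcs):
    """Before and after rule: word Y belongs in state X if Y's
    predecessor sits in some state with an arc into X, and Y's
    successor sits in some state reachable from X."""
    before_ok = any(prev_word in members and x in arcs.get(s, ())
                    for s, members in words.items())
    after_ok = any(next_word in words.get(s, ())
                   for s in arcs.get(x, ()))
    return before_ok and after_ok
```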
Doublematch Rule
A doublematch exists if the following two conditions are true:
1. A word (Y) in the input can be found in a state (X) anywhere in the existing network.
AND
2. The word after Y in the input can be found in a category reachable from X.
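In the same sketch style (the function name is an assumption), the doublematch search scans every existing state:

```python
def find_doublematch(word, next_word, words, arcs):
    """Doublematch rule: return a state X that already contains `word`
    and can reach a state containing `next_word`; None if no such X."""
    for x, members in words.items():
        if word in members and any(next_word in words.get(s, ())
                                   for s in arcs.get(x, ())):
            return x
    return None
```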
Garbage Collection Rule
If, at any time, two states (Y and Z) are found to have n common previous states (i.e., states with an arc leading to both Y and Z) and m common states that are reachable from both Y and Z, then Y and Z are equivalent and are therefore combined into one new state.
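The article leaves the thresholds n and m open; the worked examples below behave as if a single shared predecessor and a single shared successor suffice. A hedged sketch of the equivalence test, with those thresholds as parameters:

```python
def states_equivalent(y, z, arcs, n=1, m=1):
    """Garbage collection rule: Y and Z are equivalent (and merged)
    when they share at least n predecessor states and m successor
    states. A cycle arc makes a state its own successor, which matters
    in the S2 example later in the article."""
    preds_y = {s for s, targets in arcs.items() if y in targets}
    preds_z = {s for s, targets in arcs.items() if z in targets}
    shared_prev = preds_y & preds_z
    shared_next = arcs.get(y, set()) & arcs.get(z, set())
    return len(shared_prev) >= n and len(shared_next) >= m
```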
Acquire Learns as it Parses Sentences
Acquire uses the following algorithm to construct its networks (a code sketch follows the list):
1. Start with an empty network. Current state is START and the current word is the first word of the input sentence.
2. Repeat steps a-e until there is no more input.
a. If there is an arc from the current state to a state containing the current word then traverse it (i.e., make that state the new current state).
b. If no such path exists then
1) attempt to place the current word in a state according to the before and after rule.
2) search the whole network for a doublematch. If one exists create a new arc from the current state to that state and traverse the new arc.
c. If a and b fail then create a new state, add an arc from the current state to the new state and traverse it.
d. Apply the garbage collection rule.
e. Make the current word the next word in the input sentence.
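Putting the pieces together, here is a runnable condensation of steps a-e. It is a sketch under the same assumed data layout; the article publishes no source code, and the garbage collection step is left as a stub (see the earlier sketch).

```python
def learn(sentence, words, arcs, counter):
    """One learning pass over a sentence (steps a-e, condensed)."""
    tokens = sentence.lower().rstrip(".").split()
    state = "#Start"
    for i, word in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        # (a) traverse an existing arc to a state already holding the word
        target = next((s for s in arcs.get(state, ())
                       if word in words.get(s, ())), None)
        # (b1) before and after rule: the current state holds the previous
        # word, so any successor state that can reach the next word fits
        if target is None and nxt is not None:
            target = next((s for s in arcs.get(state, ())
                           if any(nxt in words.get(t, ())
                                  for t in arcs.get(s, ()))), None)
            if target is not None:
                words[target].add(word)
        # (b2) doublematch: link the current state to any state that
        # already holds the word and can reach the next word
        if target is None and nxt is not None:
            target = next((x for x in words
                           if word in words[x]
                           and any(nxt in words.get(t, ())
                                   for t in arcs.get(x, ()))), None)
            if target is not None:
                arcs.setdefault(state, set()).add(target)
        # (c) otherwise mint a new state and a new arc for the word
        if target is None:
            counter[0] += 1
            target = "S%d" % counter[0]
            words[target] = {word}
            arcs.setdefault(state, set()).add(target)
        # (d) the garbage collection rule would run here (earlier sketch)
        state = target  # (e) advance to the next word
    arcs.setdefault(state, set()).add("#End")

# Replaying the start of the worked example:
words, arcs, counter = {}, {}, [0]
learn("The dog ate the cat.", words, arcs, counter)  # builds S1-S5 (Figure 1)
learn("The man hit the cat.", words, arcs, counter)  # adds S6, S7; doublematch
                                                     # links S7 (hit) to S4 (the)
```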
The following example will demonstrate the different features of the learning algorithm.
Sentence 1 (Figure 1) creates five new categories, each containing a single word (step c of the algorithm). Sentence 2 creates new categories for (man) and (hit) and applies the doublematch rule to make the link from S7 (hit) to S4 (the).
Sentence 3 demonstrates the addition of a word to an existing word state. (Girl) is added to state S2 according to the before and after rule. In addition, states S8 and S9 are added to the network. Similarly, sentence 4 adds (ran) to S7 and (mile) to S5.
[Network diagram: #Start → S1 → S2 → S3 → S4 → S5 → #End]
Figure 1: Sentence 1: The dog ate the cat.
[Network diagram omitted: states #Start, S1–S7, #End]
Figure 2: Sentence 2: The man hit the cat.
[Network diagram omitted: states #Start, S1–S9, #End]
Figure 3: Sentence 3: The girl ate a hotdog.
[Network diagram omitted: states #Start, S1–S9, #End]
Figure 4: Sentence 4: The man ran the mile.
The next sentence, (The man ate a hotdog), causes a cascade of changes to the network as it exists in Figure 4. First (Figure 5), an arc will be created from S6 to S3 according to the doublematch rule. Note that states S2 and S6 now have a common previous state, S1, and a common following state, S3. These two states, S2 and S6, are equivalent and will be combined into one superstate via the garbage collection portion of the algorithm.
[Network diagram omitted: states #Start, S1–S9, #End]
Figure 5: Sentence 5: The man ate a hotdog.
The new situation is shown in Figure 6. Note, however, that states S7 and S3 now also have a common previous state, S2, and a common following state, S4, and will be combined as shown in Figure 7.
As more input is received by the learning algorithm the network continues to grow: new words are added to existing states, new states and arcs are added to the network, and states are combined as a result. In the example so far superstates S2 and S3 have started to resemble the classical lexical categories, noun and verb.
[Network diagram omitted: states #Start, S1–S5, S7–S9, #End]
Figure 6
[Network diagram omitted: states #Start, S1–S5, S8, S9, #End]
Figure 7
The forging of new paths and combining of states is the strength of this pattern matching approach. It is also its undoing. Figure 8 shows the results of adding three sentences to the network. A new state, S10, is added with words (big) and (brown), and the word (keeper) is added to S2.
Sentence 9 causes a new arc to be added from S2 to itself (a cycle arc) due to the doublematch rule. Now states S2 and S10 will be combined by the garbage collection rule since they have a common previous state, S1, and a common following state, S2 (the cycle arc makes S2 its own following state). The new situation is reflected in Figure 9a.
Much to the chagrin of grammarians, the network has now learned that (brown) and (man) belong in the same word category. Nonsense sentences like (The big brown ate the hotdog) and (The girl big hit the cat) will now be accepted by the network.
This problem of overgeneration is symptomatic of a much larger theoretical problem. Gold (1967) proved that it is not possible to induce a correct grammar from positive examples alone:
. . . (Gold) was able to prove that language learnability depends on the information available: if both sentences and non-sentences are available to a learner (informant presentation), the class of primitive recursive languages, and all its subclasses (which include natural languages) are learnable. But if only sentences are available (text presentation), no class of languages other than the finite cardinality languages is learnable. (Pinker, 1979)
Sentence 6: The big man ate a hotdog.
Sentence 7: The brown dog ate the cat.
Sentence 8: The keeper hit the cat.
[Network diagram omitted: states #Start, S1–S5, S8–S10, #End]
Figure 8
[Network diagram omitted: states #Start, S1–S5, S8–S10, #End]
Figure 9: Sentence 9: The dog keeper hit the cat.
[Network diagram omitted: states #Start, S1–S5, S8, S9, #End]
Figure 9a
Incorporating the Learning Algorithm into a Sentence Recognizing System
Acquire is an attempt to deal with the problem of overgeneration. It extends Katke's algorithm to learn a grammar from both sentences and non-sentences of a language.
With the aid of an informant, two transition networks are constructed. The first, or positive network, is learned from sentences that are declared part of the target language by the informant. The second, or negative network, is learned from the input sentences that the informant has declared not in the language.
A sentence can be submitted to the system in one of three modes:
1. LEARN MODE: The user enters sentences which are submitted to the current system. If a sentence is rejected, the system then requests information from the user. If the user okays the sentence it is entered into the positive network, otherwise it is entered into the negative network.
2. TEACH MODE: In this mode a user can directly train the system with any sequence of sentences.
3. REPORT MODE: In this mode a sentence is simply submitted to the system for approval or disapproval. Nothing is added to the system, that is, the networks are frozen in their current state.
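A sketch of how the three modes might dispatch. The article gives no implementation, so `recognize` and `train` below are stubs standing in for the network routines sketched earlier, and all names are assumptions:

```python
def recognize(net, sentence):   # stub: a network modeled as a set
    return sentence in net

def train(net, sentence):       # stub: learning is just remembering
    net.add(sentence)

def submit(sentence, mode, pos, neg, informant_says_good=None):
    """Dispatch one sentence according to the three modes."""
    verdict = recognize(pos, sentence) and not recognize(neg, sentence)
    if mode == "REPORT":
        return verdict                        # networks stay frozen
    if mode == "TEACH":
        # the user trains the chosen network directly
        train(pos if informant_says_good else neg, sentence)
    elif mode == "LEARN" and not verdict:
        # only rejected sentences go to the informant for a ruling
        train(pos if informant_says_good else neg, sentence)
    return verdict

pos, neg = set(), set()
submit("the boy walked to school", "LEARN", pos, neg, informant_says_good=True)
print(submit("the boy walked to school", "REPORT", pos, neg))  # True
```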
What Constitutes Acceptance By the System?
When a sentence is submitted to the system for approval it will be accepted according to the following truth-table:
   Positive Network   Negative Network   Result
1. False              False              Rejected
2. False              True               Rejected
3. True               False              Accepted
4. True               True               Rejected
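The table collapses to a single boolean expression; the small sketch below simply replays its four rows:

```python
def system_accepts(passes_positive, passes_negative):
    """Accept only in case 3: passed by the positive network and
    not flagged as bad (i.e., not passed) by the negative network."""
    return passes_positive and not passes_negative

for row, (pos, neg) in enumerate([(False, False), (False, True),
                                  (True, False), (True, True)], start=1):
    result = "Accepted" if system_accepts(pos, neg) else "Rejected"
    print(f"{row}. {pos!s:5} {neg!s:5} {result}")
```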
In case 1 the sentence is not recognized by the positive network, but neither is it explicitly recognized as a bad sentence (i.e., accepted by the negative network). This simulates the naïve learner lacking information about a sentence. Nonetheless, this sentence is rejected until later, when the positive network may have learned something more that allows the sentence to pass.
In case 2 the sentence has been explicitly rejected. There is no possibility that this sentence will ever be accepted in the future.
Case 3 provides the lone accepting result. Here a sentence explicitly passes the positive network and fails the negative. Again, however, what is accepted now could later be rejected as the negative network learns more.
Finally, there is case 4, perhaps the most interesting case. Here the sentence has explicitly passed both networks. These sentences are currently rejected but what to do with them is a matter of some conjecture. Case 4 can only arise as a result of overgeneration in one or both of the networks, since the informant would never knowingly claim a sentence to be both good and bad. So, a sentence could go from being False-True (false in the positive network and true in the negative) to being True-True, or from True-False to True-True. In either case some other mechanism is needed to handle this interesting case.
Learning To Reject a Previously Accepted Incorrect Sentence
Acquire should, and indeed does, have the ability to learn to reject a previously accepted sentence and to accept a previously rejected one without explicitly being told to do so. It does this, of course, on the basis of learning new things both positive and negative. The following example will illustrate this point.
Sentence 1: The boy walked to school.
Sentence 2: The boy ran to school.
Sentence 3: The girls walks to school.
POSITIVE NETWORK:
[Network diagram omitted: states #Start, S1–S5, #End]
NEGATIVE NETWORK:
[Network diagram omitted: states #Start, S1–S5, #End]
Figure 10
With the system in LEARN mode Acquire will report that neither Sentence 1 nor Sentence 2 is accepted (after all, the networks are empty at this point). On prompt, the user enters these sentences into the positive network. Sentence 3 is entered into the negative network and the resulting system is displayed in Figure 10.
The next two sentences, after approval by the user, alter the positive network as shown in Figure 11.
Sentence 4. The boys ran to school.
Sentence 5. The boy walks to school.
Sentence 6. The boys walks to school.
POSITIVE NETWORK:
[Network diagram omitted: states #Start, S1–S5, #End]
Figure 11
They don't change the negative network, so it is not shown again. Sentence 6, THE BOYS WALKS TO SCHOOL, is accepted by the system even though it is clearly incorrect. That is, it passes the positive network and fails the negative network. It can be seen that the current positive network, taken alone, overgenerates and will recognize sentences such as this with incorrect subject-verb agreement.
However, after the next two sentences, 7 and 8, are incorporated into the negative network (as shown in Figure 12) the situation has changed. This time when THE BOYS WALKS TO SCHOOL is submitted it is not accepted by the system because it will now pass the negative network and therefore gives a True-True result.
Sentence 7: The girls runs to school.
Sentence 8: The boys runs to school.
Sentence 9: The boys walks to school.
NEGATIVE NETWORK:
[Network diagram omitted: states #Start, S1–S5, #End]
Figure 12
Future Work on Acquire
If Acquire is to have practical significance, several problems must still be addressed. First of all, there is the problem of overgeneration. Acquire's negative network is one attempt to deal with this problem. In effect the negative network introduces a second grammar. By intersecting the grammar of non-sentences (the negative network) with the grammar of sentences (the positive network) a truth result is reached.
Even with the negative network, however, the problem of overgeneration still exists. Experiments (Brand, 1986) have shown that when large sets of random sentences are put into the system the negative network will overgenerate, producing a sort of stalemate in which everything is passed by both networks and therefore rejected by the system. Other experiments showed that when training sentences were planned carefully, aimed at specific features such as incorrect subject-verb agreement, much better results were achieved.
All of this suggests an approach that would build a series of networks (think of them as progressively finer sieves) that would each deal with a specific level of over-generation. If a sentence is passed by each of these networks then it would be accepted.
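One reading of that proposal, as a sketch: each sieve is a positive/negative network pair aimed at one kind of overgeneration, and a sentence must survive every sieve. The staged predicates below are invented stand-ins for trained networks.

```python
def sieve_accepts(sentence, stages):
    """A sentence is accepted only if every stage passes it, where each
    stage is a (positive, negative) pair of predicates standing in for
    a trained network pair."""
    return all(pos(sentence) and not neg(sentence) for pos, neg in stages)

stages = [
    # coarse sieve: basic word order (illustrative stub)
    (lambda s: s.startswith("the"), lambda s: "big brown ate" in s),
    # finer sieve: subject-verb agreement (illustrative stub)
    (lambda s: True,                lambda s: "boys walks" in s),
]
print(sieve_accepts("the boy walks to school", stages))    # True
print(sieve_accepts("the boys walks to school", stages))   # False
```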
