Wednesday, 22 December 2010

Computer-Assisted ESL Research

Abstract:
This paper describes a computer-assisted research project into writing errors of ESL college students. Sentences with error types and first language of students are entered in a database and analyzed to discover the most common errors for all students, and the most prevalent patterns within each language group, with the hope of more closely individualizing error identification and instruction. Results of the research into such areas as prepositions, verb agreement, part of speech, articles, verb tense and the use of be are presented.

KEYWORDS: ESL, writing errors, college, analyses, error identification, monitor hypothesis, first language interference, developmental interference
ACKNOWLEDGEMENTS
This is an expanded version of a paper presented at TESOL '84 in Houston, Texas. I would like to thank Assistant Provost Robert McDermott, Dean Martin Stevens and Professor Mary Hiatt of Baruch College for their support in this research, and Professor Elizabeth Riddle of Ball State University for her helpful comments and suggestions.
This paper describes an ongoing research project at Baruch College among college-level international students whose essays on standardized Writing Skills Assessment Tests (WATs) placed them in ESL writing courses. The failure rate of these students on this exam and on retests approximately 9 months later demonstrated to us the need for research in this area to determine the kinds of errors most commonly encountered, to see the effects of first language interference if any, and to individualize instruction for students by first language where appropriate. The idea is to focus more on the error-prone categories and help students monitor (in Krashen's sense) the rules of grammar as they write, especially in regard to particular first language interfering structures. Other aims of the research were to provide a basis for computer-derived exercises, produce a grammar-text and eventually a CAI program all based on the research. For now I wish to concentrate on the methodology, some statistical results, and some analyses of these statistics. Comments are particularly welcome on the number-crunching and the analyses, especially in regard to what constitutes an interfering structure.
METHODOLOGY
The sentences were taken from failing City University WATs, and also from students' essays written in class under similar time-limit constraints. At first, the CUNY system's mainframe IBM VM360-370 was accessed with a database system called FOCUS, and nearly 3,000 sentences were entered along with the students' first language. The totals for each language are in Figure 1.
These numbers fairly accurately represent the percentages of students enrolled in ESL classes speaking those languages: the overwhelming majority being Chinese, followed by Spanish, Russian, Korean and Greek. Since the number of sentences for languages other than these is rather small, I have decided to use only those five languages as the basis for any analyses and subcategorization of the data in this paper, but work is continuing on the other languages.
The FOCUS file contains a field for first language, one for error type or types, and another for sentences or text. The standard kinds of searches, counts and reports could be made, but various shortcomings of this system, notably difficulty of access to terminals and frequent, unexplained crashes, led me to supplement this work with a Franklin 1000 microcomputer using a database/word-processing software package called The Incredible Jack. The biggest stumbling block in this kind of analysis is the lack of access to a computer-driven parser like Writer's Workbench or EPISTLE. But although the human parser is slower, it can handle the data better, especially in the semantic domain. Analyzing the error type and entering the first language, error type and text (without unconsciously making the correction) is a slow process.
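The three-field record described above can be sketched in a few lines of Python. This is only an illustration and not the FOCUS or microcomputer file format: the class name ErrorRecord, its field names, and the helper count_errors_by_language are assumptions introduced here, and the sample records are the four sentences shown in Figure 2.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ErrorRecord:
    """One sentence record: first language, error type(s), uncorrected text."""
    first_language: str
    error_types: Tuple[str, ...]  # a sentence may carry more than one error type
    text: str


# Sample records copied from Figure 2 (each has exactly one error type).
RECORDS = [
    ErrorRecord("Chinese", ("voc/idiom",), "We consist the New Year as an important day."),
    ErrorRecord("Japanese", ("prep",), "Religions really answer to these questions."),
    ErrorRecord("Greek", ("adj-n",), "Some of their children become violences."),
    ErrorRecord("Polish", ("article",),
                "In order to decrease death rate, the speed limit was imposed."),
]


def count_errors_by_language(records):
    """Count (first language, error type) pairs across all records."""
    counts = Counter()
    for record in records:
        for error_type in record.error_types:
            counts[(record.first_language, error_type)] += 1
    return counts


if __name__ == "__main__":
    for (language, error_type), n in sorted(count_errors_by_language(RECORDS).items()):
        print(f"{language:10s} {error_type:10s} {n}")
```

Storing the error types as a tuple keeps sentences with several errors in a single record, so each error type is counted once per sentence in which it occurs.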
We are now at the stage of beginning to look at the numbers and try to see how to analyze the error types. The error types in general are errors of meaning (semantically wrong lexical choices or idioms, labeled as voc/idiom) and errors of form (number, part of speech, subject-verb agreement, past participle), but distinctions are not easy to maintain here when it comes to such categories as articles and determiners. See Figure 2 for relatively clear-cut examples of error types, and an idea of the structure of the file.
It should be pointed out that these examples are atypical in that only one error type occurs in each sentence, but they are presented here for simplicity. In terms of sheer numbers, and when taken for all languages, sentences containing vocabulary or idiom mistakes and preposition mistakes that were due to idiomaticity outnumber all grammatical categories. There will be a further explanation of what I mean by prepositional-idiomatic errors below. Articles are the single grammatical error type most often encountered, coming out ahead of verb forms for all languages except Spanish. These findings differ slightly from Reid (1983).
SOME RESULTS
In what follows I will present numbers and analyses of errors in prepositions, verb agreement, part of speech, the article system, verb tense, and the use of be. The errors will be examined with an eye to considering what bearing these results might have on the current controversy concerning the dominance of developmental language-learning strategies vs. that of interference as the factor(s) causing errors.
What is striking is that for many of these error types, the percentages do not vary greatly cross-linguistically. That is, if we look just at the numbers for our five languages for a particular error type, the percentages of the total errors are similar. And when we scrutinize the nature of some of these error types, percentages in many cases remain consistently similar across language groups. This statement will be qualified as we go along, and not all the facts are in yet.
PREPOSITIONAL ERRORS
Prepositional errors are a good example of the type of error whose percentages do not vary much cross-linguistically no matter how we break up or dissect the data. The percentage totals for our five languages are given in Figure 3. Figure 4 lists some error examples.
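The percentages in Figure 3 can be recomputed from the raw counts, which makes the denominator explicit: each percentage is the number of preposition-error sentences divided by the total number of sentences for that language (Figure 1). The following is a minimal sketch; the dictionary and function names are illustrative assumptions, and only the counts come from the figures.

```python
# Sentence totals (Figure 1) and preposition-error counts (Figure 3).
SENTENCE_TOTALS = {"Chinese": 1061, "Greek": 260, "Korean": 279, "Russian": 315, "Spanish": 395}
PREP_ERRORS = {"Chinese": 201, "Greek": 47, "Korean": 64, "Russian": 41, "Spanish": 69}


def error_rate_percent(errors, total):
    """Return errors as a whole-number percentage of the total sentences."""
    return round(100 * errors / total)


PREP_PERCENT = {
    language: error_rate_percent(PREP_ERRORS[language], SENTENCE_TOTALS[language])
    for language in PREP_ERRORS
}

if __name__ == "__main__":
    for language, percent in PREP_PERCENT.items():
        print(f"{language:8s} {PREP_ERRORS[language]:4d} ({percent}%)")
```

The recomputed values agree with the percentages printed in Figure 3.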
For sheer numbers, as you see, preposition totals are roughly consonant cross-linguistically. When we scrutinize the errors one by one (recall that the databases do not have the power to do this kind of analysis, unless one knows beforehand what is likely to be relevant), we still find similarities cross-linguistically. If we look at just those preposition errors in which the preposition was incorrectly omitted (an example of this kind of error is given in Figure 4, Sentence 1), the percentages, displayed in Figure 5, are close.
Note that Greek writers seemed to omit less and commit more (i.e., give the wrong preposition or include an unnecessary preposition more). Since prep errors in which a preposition was included, but was the wrong choice, (as in Figure 4, Sentence 2) are simply the difference of the total prep mistakes (100%) minus the omitted errors, the totals of
SENTENCE TOTALS
Language      Total Number of Sentences
Amharic       6
Arabic        30
Chinese       1061
Farsi         55
French        134
Greek         260
Hindi         44
Indonesian    18
Japanese      14
Korean        279
Malay         25
Polish        21
Russian       315
Spanish       395
Tagalog       81
Vietnamese    85
Yoruba        11
Figure 1
EXAMPLES OF SENTENCES
Language    Error        Sentence
Chinese     Voc/idiom    We consist the New Year as an important day.
Japanese    prep         Religions really answer to these questions.
Greek       adj-n        Some of their children become violences.
Polish      article      In order to decrease death rate, the speed limit was imposed.
Figure 2
PREPOSITIONS
Language    Number of Preposition Errors
Chinese     201 (19%)
Greek       47 (18%)
Korean      64 (23%)
Russian     41 (13%)
Spanish     69 (17%)
Figure 3
confused prep errors are roughly similar as well. And, when we consider all the prep error sentences in which idiomatic usage was the reason for the error (note that this can be an error of commission, as in Figure 4, Sentence 3, or of omission, as in Figure 4, Sentence 4), again the percentages are close cross-linguistically, as you can see in Figure 6.
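The subtraction described above for preposition errors that were not omissions can be made concrete with the Figure 5 percentages. This is a small sketch only; the names OMITTED_PERCENT and NOT_OMITTED_PERCENT are assumptions introduced for illustration.

```python
# Percent of preposition errors in which the preposition was omitted (Figure 5).
OMITTED_PERCENT = {"Chinese": 30, "Greek": 17, "Korean": 37, "Russian": 27, "Spanish": 29}

# Everything that is not an omission (a wrong or an unnecessary preposition)
# is the remainder of the 100% of preposition errors for that language.
NOT_OMITTED_PERCENT = {language: 100 - pct for language, pct in OMITTED_PERCENT.items()}

if __name__ == "__main__":
    for language, pct in NOT_OMITTED_PERCENT.items():
        print(f"{language:8s} omitted {OMITTED_PERCENT[language]}%  not omitted {pct}%")
```

Greek has the largest remainder, which is the sense in which Greek writers omit less and commit more.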
Finally, I checked the number of prepositional-error sentences where the mistake (of commission or omission) was due to grammatical-syntactic causes (as in Figure 4, Sentence 5). As Figure 7 shows, the percentages are somewhat divergent, though less noticeably so as a percentage of all the sentences for each language.
We now need to analyze the first language structures corresponding to these errors to see if they are closer to developmental or interfering types. The category where there is some noticeable quantitative difference among the groups is the grammatical-syntactic category, where L1 interference is likeliest. On a close look, Figure 4, Sentence 5 seems to be an attempt to render the Spanish en donde (literally, in where) pattern into English. It is still possible that within the preposition errors of commission or idiomaticity, individual errors may be due to interference. One such example is the Greek sentence, Figure 4, Sentence 3. The verb complain in Greek is paraponyeme, which is followed by ya, for.
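The effect of the denominator on how divergent the Figure 7 numbers look can be shown by comparing the range of each column. The sketch below copies both columns from Figure 7; the function name spread and the choice of the range (maximum minus minimum) as a measure of divergence are assumptions made here for illustration.

```python
# Both columns copied from Figure 7 (grammatical-syntactic preposition errors).
SHARE_OF_PREP_ERRORS = {"Chinese": 12, "Greek": 21, "Korean": 33, "Russian": 24, "Spanish": 30}
SHARE_OF_ALL_SENTENCES = {"Chinese": 2, "Greek": 4, "Korean": 7, "Russian": 3, "Spanish": 5}


def spread(percentages):
    """Range in percentage points between the highest and lowest language."""
    values = percentages.values()
    return max(values) - min(values)


if __name__ == "__main__":
    print("as % of preposition errors:", spread(SHARE_OF_PREP_ERRORS), "points")
    print("as % of all sentences:     ", spread(SHARE_OF_ALL_SENTENCES), "points")
```

Measured against preposition errors the languages are 21 points apart, while measured against all sentences they are 5 points apart.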
VERB AGREEMENT
For subject-verb agreement, there was again great similarity across our five languages, as in Figure 8.
It was not the case that speakers of languages with a verbal agreement system found this syntactic structure any easier than speakers without such a system; note the similarity between the Spanish and Chinese totals. Furthermore, percentages for the use of the base form instead of the -s form were similar; all groups used that form more than they should have. This turns out to be a recurrent pattern in some of the other error types discussed below, by the way. Russian had the highest error percentage for this misuse, with Greek a close second. Chinese and Spanish tied, with Korean last. Greek, Korean and Spanish had problems when prepositional phrases appeared between subject and verb, presumably interfering with determination of the subject and hence verb form choice; Chinese and Russian had more troubles when adverbs appeared between subject and verb. How these findings relate to the decision between developmental and first language interference is not clear.
PART OF SPEECH
In Figure 9 you can see how close the percentages for part of speech confusion errors are, and when these are broken down in terms of adjective-noun confusion, noun-verb confusion, and adjective-verb confusion, we don't
PREPOSITION ERROR SENTENCES
1. Chinese: Since this society is concerned time very much,...
2. Korean: Upon my experience, I know this is good.
3. Greek: They never complain for their occupation.
4. Russian: I can see you, regardless where you live.
5. Spanish: In here the trains and buses are cheaper.
Figure 4

OMISSION OF PREPOSITIONS ERRORS TOTALS (%)
Language    % of Omitted-Prep Error Types
Chinese     30%
Greek       17%
Korean      37%
Russian     27%
Spanish     29%
Figure 5
IDIOMATIC TYPE PREPOSITION ERRORS TOTALS (%)
Language    % of Preposition Mistakes Due to Idiom    % of TOTAL Sentences
Chinese     54%                                       10%
Greek       49%                                       9%
Korean      47%                                       11%
Russian     49%                                       6%
Spanish     52%                                       9%
Figure 6
GRAMMATICAL-PREPOSITIONAL ERRORS TOTALS
Language    % of Prepositional Errors Due to Grammar    % of TOTAL Sentences
Chinese     12%                                         2%
Greek       21%                                         4%
Korean      33%                                         7%
Russian     24%                                         3%
Spanish     30%                                         5%
Figure 7
find striking differences. (See Figure 10).
Adjective-noun confusion was more common for Greek and Korean speakers, while Chinese and Russian speakers made more noun-verb confusion errors. Spanish had a tie for these errors. In all languages, adjective-verb confusion was the lowest percentage of these types.
The improper use of, or the omission of, an uninflected base form for all the part of speech confusion categories, was the most encountered morphological category. Oddly, Chinese, a language with no part-of-speech morphology, had fewer problems in the base vs. inflected category of error when compared to other languages. But note that, as in subject-verb agreement, the morphologically simpler or base form is overgeneralized in part of speech errors.
Confusion over the use of -ent/-ence and its variants, a phonologically based error that is also common in non-ESL writing, was very prevalent in all languages.
ARTICLES
A list of the breakdown for Articles, Determiners, Number and Mass-Count Confusion is given in Figure 11. Article mistakes alone are listed in Figure 12.
Here we find that Russian writers had the greatest problems in article use; Russian ranks highest there and is a very close second for all noun article-determiner systems (in Figure 11). Although we may often think of Chinese when we think of article errors, it is fourth from the top of that list. However, in terms of omitting articles, Chinese is in fact the leader: a whopping 75% of article mistakes for Chinese involve the omission of a or the. Russian and Korean are the second and third worst for omitting articles. Chinese and Korean were similar in omitting the more often than they omitted a. Spanish, Russian and Greek omitted a more often than they omitted the.
When articles were improperly included, the article the was incorrectly included more often than the article a. Greek inserted articles more often than any other language while Spanish was second. This must be directly attributable to the use of the before mass nouns in those languages.
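The rankings mentioned in this section can be read off mechanically from Figures 11 and 12. The following sketch sorts the languages by percentage; the dictionary names and the helper rank_languages are illustrative assumptions, and the percentages are copied from the two figures.

```python
# Percentages copied from Figure 12 (article errors) and Figure 11
# (article, determiner, number and mass-count errors combined).
ARTICLE_ERRORS = {"Chinese": 18, "Greek": 20, "Korean": 35, "Russian": 42, "Spanish": 10}
NOUN_SYSTEM_ERRORS = {"Chinese": 34, "Greek": 28, "Korean": 53, "Russian": 52, "Spanish": 28}


def rank_languages(percentages):
    """Return the languages ordered from highest to lowest error percentage."""
    return sorted(percentages, key=percentages.get, reverse=True)


if __name__ == "__main__":
    article_rank = rank_languages(ARTICLE_ERRORS)
    print("Article errors:", article_rank)
    print("Chinese is number", article_rank.index("Chinese") + 1, "on that list")
    print("Top two, all noun-system errors:", rank_languages(NOUN_SYSTEM_ERRORS)[:2])
```

Russian heads the article list with Chinese in fourth place, and Russian sits just behind Korean on the combined list.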
NUMBER CONFUSION
Korean had the highest percentage of errors for confusion over number (18%); Spanish was a close second (17%). Greek had the least trouble with number (9%). When number was confused, it is interesting to note that the singular, or base form, was improperly substituted for the plural form overwhelmingly in all languages, as Figure 13 shows.
VERB TENSE
As Figure 14 shows, verb tense error totals in terms of percentages are very close across the board.
In Figure 15 a partial breakdown of the verb tense data is given.
With the exception of Russian, the most error-prone category for all languages was the improper substitution of the present tense where the past tense was required (note again the preference for the morphologically simpler form). For all languages except Russian, present vs. past tense confusion (i.e., the total of columns two and three, Present for Past and Past for Present) was the single hardest distinction to make. Russian had the most difficulty with the distinction between present continuous and simple present, but did not have so much trouble with another aspectual problem, the distinction between present perfect and past.
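Both comparisons in the paragraph above can be checked against the Figure 15 percentages. The sketch below is illustrative only: the key names and the two helper functions are assumptions, and the unexplained "(+2%)" annotation on the Russian continuous figure is left out, so the Russian value is taken as 41.

```python
# Percent of verb tense errors per category, copied from Figure 15.
TENSE_BREAKDOWN = {
    "Chinese": {"present_for_past": 37, "past_for_present": 17,
                "continuous_for_simple": 6, "perfect_vs_past": 5},
    "Greek":   {"present_for_past": 42, "past_for_present": 4,
                "continuous_for_simple": 11, "perfect_vs_past": 18},
    "Korean":  {"present_for_past": 31, "past_for_present": 25,
                "continuous_for_simple": 6, "perfect_vs_past": 14},
    "Russian": {"present_for_past": 14, "past_for_present": 22,
                "continuous_for_simple": 41, "perfect_vs_past": 5},
    "Spanish": {"present_for_past": 47, "past_for_present": 30,
                "continuous_for_simple": 13, "perfect_vs_past": 4},
}


def largest_single_category(row):
    """Name of the single column with the highest percentage."""
    return max(row, key=row.get)


def hardest_distinction(row):
    """Treat present/past confusion as one distinction (both directions summed)."""
    distinctions = {
        "present_vs_past": row["present_for_past"] + row["past_for_present"],
        "continuous_vs_simple": row["continuous_for_simple"],
        "perfect_vs_past": row["perfect_vs_past"],
    }
    return max(distinctions, key=distinctions.get)


if __name__ == "__main__":
    for language, row in TENSE_BREAKDOWN.items():
        print(f"{language:8s} {largest_single_category(row):22s} {hardest_distinction(row)}")
```

For Russian the two present/past columns sum to 36%, which is still below the 41% for continuous vs. simple present, so Russian is the exception on both counts.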
ERRORS WITH BE
The final portion of the paper deals with errors involving the use of the verb be in any of its forms.
VERB AGREEMENT TOTALS
Language    % of TOTAL Number of Sentences
Chinese     11%
Greek       7%
Korean      7%
Russian     6%
Spanish     14%
Figure 8

PART OF SPEECH CONFUSION ERRORS
Language    % of TOTAL Number of Sentences
Chinese     8%
Greek       8%
Korean      7%
Russian     4%
Spanish     7%
Figure 9
BREAKDOWN, PART OF SPEECH TYPES
Language    Adjective-Noun    Noun-Verb    Adjective-Verb
Chinese     32%               43%          20%
Greek       61%               30%          0%
Korean      42%               33%          33%
Russian     18%               45%          9%
Spanish     44%               44%          11%
Figure 10
ARTICLES, DETERMINERS, NUMBER, MASS-COUNT
Language    Article, Mass-Count, Determiner and Number Errors (%)
Chinese     34%
Greek       28%
Korean      53%
Russian     52%
Spanish     28%
Figure 11

ARTICLES
Language    Article Errors (%)
Chinese     18%
Greek       20%
Korean      35%
Russian     42%
Spanish     10%
Figure 12
As Figure 16 shows, Chinese and Russian had the biggest problems with be, but we shall see that the nature of the troubles was somewhat different.
As Figure 17 shows, all languages omitted be when they made errors in this category more often than they improperly included be.
In Figure 18 there is a breakdown of the environments where be was omitted most frequently for all languages.
Largely because of the Greek percentages, it seems that be was omitted most frequently before a predicate adjective; omission before a predicate nominative is third. Omission of the copula before the progressive form of the verb is the second highest error-prone category; omission in passives is fourth. In addition, Chinese showed 5% omission of be before prepositional phrases, while no other language had such an error. Note also that Chinese and Korean had problems in omitting be and substituting have improperly. This turns out to occur most often in the existential there structure, and seems almost certainly to be a language-interference type error.
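The paper does not state how the per-language percentages in Figure 18 were combined into an overall ranking of environments. One reading that reproduces the ranking given above is a simple unweighted mean across the five languages, and the sketch below works under that assumption. The key names and the functions mean_by_environment and rank_environments are hypothetical.

```python
# Percent of be-omission errors by environment, copied from Figure 18.
BE_OMITTED = {
    "Chinese": {"passive": 18, "pred_adj": 16, "pred_nom": 17, "ing": 6, "be_have": 11, "be_do": 3},
    "Greek":   {"passive": 14, "pred_adj": 43, "pred_nom": 14, "ing": 7, "be_have": 0, "be_do": 7},
    "Korean":  {"passive": 7, "pred_adj": 7, "pred_nom": 0, "ing": 7, "be_have": 21, "be_do": 0},
    "Russian": {"passive": 10, "pred_adj": 27, "pred_nom": 17, "ing": 23, "be_have": 0, "be_do": 3},
    "Spanish": {"passive": 6, "pred_adj": 0, "pred_nom": 29, "ing": 35, "be_have": 0, "be_do": 0},
}


def mean_by_environment(table, exclude=()):
    """Unweighted mean percentage per environment (an assumed way of combining rows)."""
    rows = [row for language, row in table.items() if language not in exclude]
    environments = rows[0].keys()
    return {env: sum(row[env] for row in rows) / len(rows) for env in environments}


def rank_environments(table, exclude=()):
    """Environments ordered from most to least frequent omission of be."""
    means = mean_by_environment(table, exclude)
    return sorted(means, key=means.get, reverse=True)


if __name__ == "__main__":
    print("All five languages:", rank_environments(BE_OMITTED))
    print("Without Greek:     ", rank_environments(BE_OMITTED, exclude=("Greek",)))
```

Under this assumption the order is predicate adjective, progressive, predicate nominative, passive. With Greek left out, the progressive moves to first place and predicate adjective falls to third, which illustrates how much the Greek figure contributes.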
CONCLUSION
We have seen what sorts of results the computerized collecting and sorting of ESL sentences are capable of bringing to us, and some of the typical shortcomings in the methodology of computerized research (specifically, the fact that the researcher has to know in advance which areas of investigation are likely to be fruitful). It is hoped that this study has shed some light on the questions of developmental vs. first language interference as factors causing errors. The data still need to be interpreted further, but we are on the road. The applicability of this kind of study toward improving our teaching by focussing our instruction on these error-prone categories should not be overlooked. This kind of research also paves the road to individualizing by first language the content of our instruction, emphasizing that where certain language groups make certain errors, different lesson content should be developed.
One immediate use for this kind of research is the creation of proofreading exercises, where the teacher can, after a lesson on articles, assign students to correct a list of sentences with these errors as hard copy. I have done this, and there is no denying the authenticity of the sample. In fact, students have remarked that a particular sentence or two could have been their own mistake; in some sense, they recognize their patterns. Also, I have noticed that when students are asked to correct errors from languages other than their own, they do better than when correcting errors from their own first languages. Although these findings are anecdotal, they seem to indicate that certain error types
NUMBER CONFUSION BREAKDOWN
Language    Sing. for Plural    Plural for Sing.
Chinese     74%                 26%
Greek       61%                 38%
Korean      72%                 22%
Russian     67%                 33%
Spanish     81%                 18%
Figure 13
VERB TENSE
Language    Total Number of Sentences    Number of Tense Errors
Chinese     1061                         129 (12%)
Greek       260                          28 (11%)
Korean      279                          36 (13%)
Russian     315                          37 (12%)
Spanish     395                          47 (12%)
Figure 14
VERB TENSE BREAKDOWN
(% of verb tense errors)
Language    Present for Past    Past for Present    Present Contin. for Simple    Pres. Perf./Past Confusion
Chinese     37%                 17%                 6%                            5%
Greek       42%                 4%                  11%                           18%
Korean      31%                 25%                 6%                            14%
Russian     14%                 22%                 41% (+2%)                     5%
Spanish     47%                 30%                 13%                           4%
Figure 15
BE Errors
Language    Total Sentences    Number of BE Errors
Chinese     1061               104 (9%)
Greek       260                14 (5%)
Korean      279                14 (5%)
Russian     315                30 (9%)
Spanish     395                17 (4%)
Figure 16
BE Omitted vs. BE Improperly Added
Language    BE-omitted    BE improperly added
Chinese     64%           28%
Greek       85%           14%
Korean      71%           21%
Russian     80%           20%
Spanish     76%           18%
Figure 17
BE Omitted Breakdown
Language    Passive    Pred. Adj.    Pred. Nom.    -ING    BE/HAVE    BE/DO
Chinese     18%        16%           17%           6%      11%        3%
Greek       14%        43%           14%           7%      0%         7%
Korean      7%         7%            0%            7%      21%        0%
Russian     10%        27%           17%           23%     0%         3%
Spanish     6%         0%            29%           35%     0%         0%
Figure 18
are specific to certain language groups, that certain errors are more opaque to speakers of certain languages, and that in some sense there is a psychological reality to the differences in error types and the monitoring hypothesis.
