Identifying Text Genres Using Phrasal Verbs
Kyle B. Dempsey, Philip M. McCarthy, and Danielle S. McNamara
Department of Psychology
Memphis, TN 38152
{kdempsey, pmccarthy, d.mcnamara}@mail.psyc.memphis.edu
Understanding the textual distinction between spokenness-informality and writtenness-formality serves many purposes. It can facilitate text mining, improve parser accuracy, offer better appraisals of student writing, and may also facilitate better interpretations of experimental data. Previous studies of such textual variation (e.g., Biber, 1988; Louwerse et al., 2004) have failed to produce a simple and effective method for computationally distinguishing these text types. Indeed, Biber (1988), using 67 lexical features, could not determine any spoken/written dimension, and Louwerse et al. (2004), using over 200 textual indices, could not identify a formal/informal dimension.

In this study, we tested the hypothesis that phrasal verbs could distinguish such text types because the presence of this verb construction is often claimed to be indicative of both spoken and less formal discourse (e.g., McWhorter, 2001).

To test our hypothesis, we used Coh-Metrix (Graesser et al., 2004) to calculate the incidence of various phrasal verb forms across two corpora: the Biber corpus (Biber, 1988; Louwerse et al., 2004) and a larger, yet structurally identical, second corpus. For the spoken/written distinction, we used texts identified in the Louwerse et al. first dimension (LSWD). For the formal/informal distinction, we used texts identified in the Biber fifth dimension (BFID).

Results and Discussion

We conducted a series of ANOVAs on the incidence of phrasal verbs across both text distinctions in both corpora. We also examined correlations between the incidence of phrasal verbs and the degrees of spokenness and informality for each corpus. Overall, we found a significant difference in the incidence of phrasal verbs in both the LSWD texts, F(1,480) = 100.469, MSE = 27.188, p < .001, and the BFID texts, F(1,480) = 23.103, MSE = 31.369, p < .001. We also found a significant correlation between the rank ordering of texts by incidence of phrasal verbs and the order of degree of spokenness in the LSWD texts (r = .464, p < .001), as well as between the incidence of phrasal verbs and the degree of informality in the BFID texts (r = .579, p < .001). The results suggest that phrasal verbs are significant markers of both the spoken/written and the formal/informal distinctions.

In a second experiment, we performed the same analyses on a larger mirror corpus containing 1028 texts, compared with 482 texts in the Biber corpus. Overall, we found a significant difference in the incidence of phrasal verbs in both the LSWD texts, F(1,1026) = 441.359, MSE = 28.616, p < .001, and the BFID texts, F(1,1026) = 206.210, MSE = 34.077, p < .001. We also found a significant correlation between the rank ordering of texts by incidence of phrasal verbs and the order of degree of spokenness in the LSWD texts (r = .611, p < .001), as well as between the incidence of phrasal verbs and the degree of informality in the BFID texts (r = .656, p < .001). The results supported the findings from Experiment 1 and suggest that phrasal verbs are significant markers for identifying spokenness and informality in texts.

Our study suggests that phrasal verbs offer an efficacious and computationally inexpensive approach to identifying the degree of textual spokenness and informality. Such an index stands to benefit research in both text mining and text analysis tools. A better understanding of textual composition serves the learning community by increasing the accuracy of textual appraisals and facilitating better feedback to researchers, students, and authors alike.

Acknowledgements

This research was supported by the Institute for Education Sciences (IES R3056020018-02).

References

Biber, D. (1988). Variation across speech and writing. Cambridge: Cambridge University Press.

Graesser, A. C., McNamara, D. S., Louwerse, M. M., & Cai, Z. (2004). Coh-Metrix: Analysis of text on cohesion and language. Behavior Research Methods, Instruments, & Computers, 36, 193-202.

Louwerse, M. M., McCarthy, P. M., McNamara, D. S., & Graesser, A. C. (2004). Variation in language and cohesion across written and spoken registers. In K. Forbus, D. Gentner, & T. Regier (Eds.), Proceedings of the 26th Annual Meeting of the Cognitive Science Society (pp. 843-848). Mahwah, NJ: Erlbaum.

McWhorter, J. H. (2001). The power of Babel: A natural history of language. New York: Times Books, Henry Holt and Company.
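
As an illustration of the rank-order correlation reported in Results and Discussion, the following is a minimal sketch, not the Coh-Metrix implementation. It assumes that incidence is expressed as occurrences per 1,000 words and that the rank correlation is a Spearman coefficient; the function names, the phrasal verb counts, and the spokenness scores are all hypothetical and serve only to make the example runnable.

```python
from statistics import mean


def incidence_per_1000(phrasal_verb_count, word_count):
    """Occurrences per 1,000 words (this normalization is an assumption)."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return 1000.0 * phrasal_verb_count / word_count


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: (phrasal verb count, word count) for five texts,
# plus a made-up spokenness score for each text.
texts = [(12, 2000), (3, 1500), (20, 2500), (1, 1800), (9, 2200)]
spokenness = [0.7, 0.2, 0.9, 0.1, 0.4]

incidences = [incidence_per_1000(count, words) for count, words in texts]
rho = spearman(incidences, spokenness)
print(f"Spearman rho = {rho:.3f}")
```

In this toy data the two rank orderings agree exactly, so the coefficient is at its maximum; real corpora would give intermediate values such as those reported above.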