The Uppsala Corpus: Short description
The Uppsala Corpus (Upsal'skij korpus russkix tekstov) consists
of some 600 Russian texts with a total of one million running
words (word tokens), equally divided between informative and
literary prose. The informative texts are from between 1985
and 1989, while the literary texts, whose vocabulary does not
date as quickly, cover a longer period, 1960-88. The corpus
does not include poetry or drama.
Within the given framework, considerable effort
has been made to ensure as representative and varied a corpus
as possible. The informative texts are drawn from 25 different
subject areas: economics, foreign affairs / foreign policy,
ideology / domestic policy, party matters, Soviet society, social
issues, defence, education, law, history, culture, linguistics,
medicine / health care, psychology, environment / ecology, agriculture,
engineering, information technology, space research, energy,
biology, geology / geography, physics, chemistry and sport.
Certain areas which were felt to be more important are represented
by a larger volume of texts.
The literary half of the corpus comprises work
by the following 40 authors: Abramov, Ajtmatov, Astaf'ev, Baklanov,
Bek, Belov, Bitov, Bondarev, Dubov, Ganin, Gladyshev, Granin,
Grekova, Goncharov, Iskander, Kaverin, Kazakov, Kochnev, Kozhevnikova,
Nagibin, Lixanov, Lidin, Paustovskij, Pogodin, Pristavkin, Troepol'skij,
Rasputin, Shcherbakova, Simonov, Solouxin, Shmelev, Tendrjakov,
Tokareva, Tolstaja, Trifonov, Vasil'ev, Vorob'ev, Zalygin and
Zorin. Here, too, there is unequal representation, with a larger
amount of writing by the better-known authors.
For further details about the corpus, see Lönngren,
Lennart (ed.), 1993. Chastotnyj slovar' sovremennogo russkogo
jazyka. (A Frequency Dictionary of Modern Russian. With a Summary
in English.) Acta Universitatis Upsaliensis, Studia Slavica
Upsaliensia 32. 188 pp. Uppsala. ISBN 91-554-3134-8.
Distributor:
Almqvist & Wiksell International, Box 4627, S-116 91 Stockholm