Specialized language data base incorporating the texts of the regional Belarusian newspapers
Main principles of creation of the full-text database incorporating the texts of regional Belarusian mass-media being elaborated as an experimental basis within the framework of the research project «Functioning of the Belarusian language in bilingual regional mass media» are described. The choic...
Date: | 2018 |
---|---|
Main authors: | Rychkova, L.V., Stankevich, A.Yu. |
Format: | Article |
Language: | English |
Published: |
Інститут мовознавства ім. О.О. Потебні НАН України
2018
|
Series: | Мовознавство |
Online access: | http://dspace.nbuv.gov.ua/handle/123456789/184379 |
Journal title: | Digital Library of Periodicals of National Academy of Sciences of Ukraine |
Cite: | Specialized language data base incorporating the texts of the regional Belarusian newspapers / L.V. Rychkova, A.Yu. Stankevich // Мовознавство. — 2018. — № 6. — С. 39-45. — англ. |
fulltext |
ISSN 0027-2833. Мовознавство, 2018, № 6 39
© L.V. RYCHKOVA, A. YU STANKEVICH, 2018
L. V. RYCHKOVA, A. YU. STANKEVICH
SPECIALIZED LANGUAGE DATA BASE INCORPORATING
THE TEXTS OF THE REGIONAL BELARUSIAN
NEWSPAPERS
The main principles of creating a full-text database of regional Belarusian mass-media
texts, developed as an experimental basis within the research project «Functioning of the
Belarusian language in bilingual regional mass media», are described. The choice of
bilingual regional newspaper texts as authentic language material is justified, and the
annotation system is explained.
Keywords: linguistic resources, Belarusian language, interaction of languages,
regional newspapers, full-text database, meta-annotation system, mixed-type corpus.
The creation of representative linguistic resources based on authentic Belarusian
language material is an urgent task for language planning in the Republic of Belarus.
Firstly, the communicative matrix of the Belarusian language is de facto incomplete
in terms of its real use, since Belarus is an example of a predominantly
Russian-speaking region that does not belong to Russia itself. Secondly, delimiting
authentic primary Belarusian-language texts from secondary texts translated into
Belarusian is rather difficult in practice: as a rule, only the texts already
translated into Belarusian are visible, while the original Russian texts remain behind
the scenes, serving as intermediate language material in the creation of the final
Belarusian texts. Thirdly, the real distribution of languages across various discourses
requires investigation, given the complex context formed by official bilingualism, by
the interaction of two East European languages closely related by origin, and by the
interaction of the corresponding linguocultures.
Within the framework of the research project «Functioning of the Belarusian
language in bilingual regional mass media», carried out under the State Programme of
Scientific Research of the Republic of Belarus (agreement A70-16), an original
electronic language resource is being created: a full-text database of regional
Belarusian mass media (RBM). It is a meta- and thematically annotated, regionally
distributed corpus of texts, which in its final version should be regionally
volume-balanced. As the main aim of the project is to study the peculiarities of real
Belarusian language usage in the regional newspapers of Belarus, RBM is planned to
include the full texts of the newspapers, reflecting the interaction of the two state
languages in the context of the Belarusian-Russian linguocultural community. Since a
further purpose is to reveal regional peculiarities, namely those that result from the
influence of bordering languages and cultures, the mass media are chosen from all the
regions of the Republic of Belarus that border other countries, so newspapers from the
Minsk region are not included.
It is known that a language functioning in permanent contact with other language(s)
has its own characteristics, which under certain conditions can become fixed in the
usual norm as the language adapts to functioning in different conditions, especially in
the oral and written speech of bilinguals where cultures interact (see, for instance,
[Smułkowa, 2000] and numerous other publications on the problem of language contact).
From this point of view, the Belarusian language has not been investigated, and the
question of regional variation of its literary form has not been raised at all.
Nevertheless, this task is very relevant for language building in the Republic of
Belarus, and it can be solved only by creating and using targeted representative
language resources built on authentic texts as an experimental evidence base for this
kind of research.
We should mention here that in traditional academic lexicography the texts of
newspaper publications were considered «sources of an unconventional type»
[Korovanenko, 1995, p. 35] (here and further the citations are translated by
L. Rychkova). However, in the 1990s the attitude of lexicographers to the print media
changed dramatically. The inclusion of media in the range of important language
material for the creation of language resources is due to a number of factors, among
which the following should be noted as the most significant: 1) reassessment of the
«language standard»; 2) consideration of newspapers as «inexhaustible donors of
neology»; 3) the reflection of «standardized phrases ready for lexicographical use as
words» in media texts; 4) the wide range of «expressive-stylistic means» presented in
newspaper texts; 5) their function as a «conductor of terminology»; 6) the trend of
«resurrection of obsolete vocabulary» evident in newspapers; 7) the function of the
newspaper as a «supplier of regional geographic and international vocabulary»
[Korovanenko, 1995, p. 35–37]. For Belarusian regional newspapers, one more feature is
of utmost importance: the character of their bilingualism, whereby any author can
freely choose Belarusian and / or Russian as the language of his / her publication.
This feature makes it unnecessary to translate any text and confirms the authenticity
of the Belarusian texts in such newspapers. Besides, texts that reflect the interaction
of languages within the same newspaper publication represent «mixed language material»,
first singled out during the creation of the first mixed-type corpus, namely the
illustrative linguistic corpus of Grodno region mass media [Rychkova, Stankevich,
2017]. This corpus, based on sources that included authentic texts in both Belarusian
and Russian, made it possible to show the peculiarities of the Russian language in the
bilingual (Russian-Belarusian) regional media of Belarus and to outline a research
direction for studying, besides Belarusian, the regional features of the Russian
language in Belarus as well. The illustrative linguistic corpus of Grodno region mass
media is an open electronic (digital) resource, freely accessible through the regional
module (Corpus of regional and foreign press) of the media corpus within the Russian
National Corpus. To work only with the illustrative linguistic corpus of Grodno region
mass media, one must go to the search window of the regional module at
http://ruscorpora.ru/search-regional.html and create the appropriate subcorpus using
the meta-characteristics of country and region.
In the modern world, «the creation and effective use of information resources»
is considered «as the most important factor in the social and economic development
of mankind» [Antopolski, 2004, p. 8]. As a rule, information resources are
understood as «separate documents and separate arrays of documents, documents
and arrays of documents in information systems, data banks, other information
systems» [Antopolski, 2004, p. 12]. Citing this definition, A. Antopolski rightly
notes that the data circulating in information systems cannot always be identified
only as documents, i.e. texts and data with a different structure, supplied with
unique details for searching them and distinguishing them from other similar
documents. Nevertheless, it is exactly the document approach that was taken as the
basis for developing the meta-annotation of full-text electronic language resources.
Combinations of tags, attributed to texts as information objects, make it possible to
identify each specific text and, accordingly, to select only those sets of texts
(= information arrays) that possess a specific set of meta-text characteristics,
expressed as tag values.
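The selection of information arrays by tag values can be illustrated with a minimal Python sketch. It is not the RBM implementation: the function name `select_texts`, the dictionary keys and the three sample records are hypothetical and serve only to show how a combination of tag values narrows the full set of texts to a subset.

```python
from typing import Iterable


def select_texts(texts: Iterable[dict], **criteria) -> list[dict]:
    """Return the texts whose meta-tags match every given criterion."""
    return [
        text for text in texts
        if all(text.get(tag) == value for tag, value in criteria.items())
    ]


# Hypothetical records: the ids and tag values are invented for illustration.
SAMPLE = [
    {"id": 1, "region": "Grodno region", "language": "be", "topic": "sports"},
    {"id": 2, "region": "Grodno region", "language": "ru", "topic": "sports"},
    {"id": 3, "region": "Brest region", "language": "be", "topic": "law"},
]

# An "information array": Belarusian-language texts from one region.
subset = select_texts(SAMPLE, region="Grodno region", language="be")
print([text["id"] for text in subset])
```

Calling the function without criteria returns the whole resource, which mirrors the idea of working either with all texts or with a tagged subset.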
The infological model of RBM takes into account its complex nature as a full-text
annotated electronic language resource. Given the goal of the research, the model
provides for an important characteristic, the region, which, once implemented, makes
it possible to work both with the resource as a whole and with data sets representing
the newspapers of specific region(s).
The system of RBM meta-annotation represents the infological model of the
resource and includes the following characteristics of the texts as search objects:
A. Structural markup (limited only to paragraphs);
B. Meta-markup, which includes the following set of characteristics
(meta-parameters):
1. Author (an open list of values);
2. Title (an open list of values);
3. Date (an open list of values, though they should be given in the format
DD.MM.YYYY);
4. Topic (a closed list of values, given below);
5. The name of the newspaper (an open list of values);
6. Region (a closed list of values, which includes the following items: «Grodno
region», «Gomel region», «Brest region», «Vitebsk region», «Mogilev region»);
7. The quantity of tokens (an open list of values);
8. The Internet address of the source to which a certain article belongs
(an open list of values);
9. Language of the text of the article (a closed list of values, which includes
the following options: «ru» (‘Russian’), «be» (‘Belarusian’), «mfr» (‘Russian
with Belarusian inclusions’));
10. Language of the headline of the article (with the same list of options as for
meta-parameter 9, «Language of the text of the article»).
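One possible way to represent a single annotated article is sketched below in Python. This is only an assumption about how the ten meta-parameters could be stored and checked. The class name `ArticleMeta`, its field names and the sample record are hypothetical; the closed lists of regions and language codes and the DD.MM.YYYY date format are taken from the list above.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Closed lists as given in the meta-annotation description.
REGIONS = {"Grodno region", "Gomel region", "Brest region",
           "Vitebsk region", "Mogilev region"}
LANGUAGES = {"ru", "be", "mfr"}


@dataclass
class ArticleMeta:
    """Hypothetical container for meta-parameters 1-10."""
    author: str
    title: str
    date: str               # expected as DD.MM.YYYY
    topics: list[str]
    newspaper: str
    region: str
    tokens: int
    url: str
    text_language: str
    headline_language: str

    def problems(self) -> list[str]:
        """List violations of the date format and of the closed lists."""
        found = []
        try:
            if not re.fullmatch(r"\d{2}\.\d{2}\.\d{4}", self.date):
                raise ValueError(self.date)
            datetime.strptime(self.date, "%d.%m.%Y")  # rejects e.g. 31.02
        except ValueError:
            found.append("date is not DD.MM.YYYY")
        if self.region not in REGIONS:
            found.append("unknown region")
        if self.text_language not in LANGUAGES:
            found.append("unknown text language")
        if self.headline_language not in LANGUAGES:
            found.append("unknown headline language")
        return found


# Invented sample record; only the newspaper title comes from the attachment.
example = ArticleMeta(
    author="(hypothetical author)", title="(hypothetical title)",
    date="05.03.2018", topics=["sports"], newspaper="Праца",
    region="Grodno region", tokens=350, url="http://example.org/article",
    text_language="be", headline_language="ru",
)
print(example.problems())
```

Open lists (author, title, newspaper name, address) are left unchecked here, since any value is admissible for them.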
The list of values for meta-parameter 4, «Topic», is divided into 2 sets:
1) general values (used for general, non-specialized content); 2) specialized values
(directly related to special fields of knowledge).
The general values of the meta-parameter «Topic» are as follows:
1) «administration and management»; 2) «the army and armed conflicts»;
3) «astrology, parapsychology, esoterics»; 4) «business, commerce, economics, finance»;
5) «home and household»; 6) «health and medicine»; 7) «leisure, sights and
entertainment»; 8) «art and culture»; 9) «crime»; 10) + «forestry»; 11) «science
and technology»; 12) + «education»; 13) «politics and public life»; 14) «law»;
15) + «nature»; 16) + «industry»; 17) «accidents»; 18) «travelling»; 19) «religion»;
20) + «agriculture»; 21) «sports»; 22) + «construction and architecture»;
23) + «engineering»; 24) + «transport»; 25) «private life».
The value «science and technology» can be used on its own in annotating texts,
or in combination with any general value marked above with «+», as well as in
combination with a value from the list of specialized values (see below).
The closed list of the specialized values of the meta-parameter «Topic» is
divided into 3 subsets: a) natural sciences; b) applied sciences; c) humanities. All
the specialized values can be used only in combination with the general value
«science and technology».
The values belonging to the subset of natural sciences are as follows: 26) «as-
tronomy»; 27) «biology»; 28) «geography»; 29) «geology»; 30) «computer sci-
ence»; 31) «mathematics»; 32) «statistics»; 33) «physics»; 34) «chemistry».
The subset of values belonging to applied sciences includes the following
items: 35) «military affairs»; 36) «medicine»; 37) «energy».
Finally, the humanities subset comprises 10 more values: 38) «art history»;
39) «history»; 40) «cultural studies»; 41) «political science»; 42) «psychology»;
43) «religious studies»; 44) «sociology»; 45) «philology»; 46) «philosophy»;
47) «economy».
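The combination rules for «Topic» values can be sketched as a small Python check. The function name `topic_problems` and the message wording are hypothetical. The sketch also assumes, beyond what the text states explicitly, that «science and technology» may not be combined with general values that carry no «+» mark.

```python
SCIENCE = "science and technology"

# General values marked with «+» in the list of general values.
PLUS_MARKED = {
    "forestry", "education", "nature", "industry", "agriculture",
    "construction and architecture", "engineering", "transport",
}

# Specialized values 26-47 (natural sciences, applied sciences, humanities).
SPECIALIZED = {
    "astronomy", "biology", "geography", "geology", "computer science",
    "mathematics", "statistics", "physics", "chemistry",
    "military affairs", "medicine", "energy",
    "art history", "history", "cultural studies", "political science",
    "psychology", "religious studies", "sociology", "philology",
    "philosophy", "economy",
}


def topic_problems(topics: set[str]) -> list[str]:
    """Return rule violations for one text's set of «Topic» values."""
    problems = []
    if topics & SPECIALIZED and SCIENCE not in topics:
        problems.append("specialized value used without «science and technology»")
    if SCIENCE in topics:
        # Assumption: only «+»-marked general values may accompany SCIENCE.
        not_allowed = topics - {SCIENCE} - SPECIALIZED - PLUS_MARKED
        if not_allowed:
            problems.append(
                "«science and technology» combined with unmarked value(s): "
                + ", ".join(sorted(not_allowed))
            )
    return problems


print(topic_problems({SCIENCE, "physics"}))      # a valid combination
print(topic_problems({"physics"}))               # specialized value alone
print(topic_problems({SCIENCE, "agriculture"}))  # a «+»-marked general value
```

Keeping the specialized value «medicine» distinct from the general value «health and medicine» matters here, because only the former requires «science and technology».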
Comparison of the lists of general and specialized values shows the explicit
correlations between certain items:
2) «the army and armed conflicts» — 35) «military affairs»;
4) «business, commerce, economics, finance» — 47) «economy»;
6) «health and medicine» — 36) «medicine»;
8) «art and culture» — 38) «art history»/ 40) «cultural studies»;
13) «politics and public life» — 41) «political science»;
19) «religion» — 43) «religious studies».
Such a correlation is a very valuable tool for the study of consubstantial lexis,
as it makes it possible to find thematically correlated texts that belong to general
and / or scientific discourse. The analysis of the variety of texts in contemporary
newspapers shows that specialized texts no longer represent exceptions to the rule
[Rychkova 2014, 2015].
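The correlations listed above can be used as a lookup table for retrieving thematically related texts. The Python sketch below is only illustrative: the function name `correlated_texts`, the record layout and the sample records are hypothetical, while the pairs of values follow the list of correlations.

```python
# General value -> correlated specialized value(s), as listed above.
CORRELATED = {
    "the army and armed conflicts": {"military affairs"},
    "business, commerce, economics, finance": {"economy"},
    "health and medicine": {"medicine"},
    "art and culture": {"art history", "cultural studies"},
    "politics and public life": {"political science"},
    "religion": {"religious studies"},
}


def correlated_texts(texts: list[dict], general_value: str) -> list[dict]:
    """Return texts tagged with a general value or its specialized counterpart(s)."""
    wanted = {general_value} | CORRELATED.get(general_value, set())
    return [text for text in texts if wanted & set(text["topics"])]


# Hypothetical records, invented for illustration.
SAMPLE = [
    {"id": 1, "topics": ["religion"]},
    {"id": 2, "topics": ["science and technology", "religious studies"]},
    {"id": 3, "topics": ["sports"]},
]

print([text["id"] for text in correlated_texts(SAMPLE, "religion")])
```

A query for «religion» thus returns both the general-discourse text and the scientific-discourse text, which is the kind of paired material needed for studying consubstantial lexis.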
As in the illustrative linguistic corpus of Grodno region mass media, the system
of meta-annotation used for the infological modeling of RBM is based on the set of
meta-annotation parameters elaborated for the Corpus of regional and foreign press
within the Russian National Corpus. However, it is supplemented with a unique
parameter (meta-parameter 10, «Language of the headline of the article»). This
meta-parameter is very important for RBM, as it makes it possible to find the not so
rare cases of texts written in Belarusian but with headlines in Russian, as well as
the quite rare cases of Russian texts with Belarusian headlines. Therefore, each
headline in RBM has a parallel representation in the form of a set of lemmas, which
can be considered one more distinguishing feature of the infological model of RBM.
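How meta-parameters 9 and 10 work together can be shown with a short Python sketch that counts texts whose headline language differs from the language of the text. The function name `headline_mismatches`, the dictionary keys and the sample records are hypothetical. The language codes are those of the closed list.

```python
from collections import Counter


def headline_mismatches(records: list[dict]) -> Counter:
    """Count (text language, headline language) pairs that differ."""
    return Counter(
        (r["text_language"], r["headline_language"])
        for r in records
        if r["text_language"] != r["headline_language"]
    )


# Hypothetical records, invented for illustration.
SAMPLE = [
    {"text_language": "be", "headline_language": "ru"},
    {"text_language": "be", "headline_language": "ru"},
    {"text_language": "ru", "headline_language": "be"},
    {"text_language": "be", "headline_language": "be"},
]

for (text_lang, headline_lang), count in headline_mismatches(SAMPLE).items():
    print(f"text={text_lang}, headline={headline_lang}: {count}")
```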
The current state of RBM is far from being regionally balanced in the volume of
language material (see the diagram in Picture 1 below), and this especially concerns
the texts published in Belarusian (see the diagram in Picture 2). The list of all the
newspapers processed in RBM, distributed by region, is given in the attachment to the
article.
Picture 1. Distribution of the whole volume of language material in RBM
by the regions of Belarus
This means that further attention should mainly be paid to replenishing RBM in
order to ensure the regional balance of the language material and to enlarge the
volume of Belarusian language material proper.
Picture 2. Distribution of Belarusian language material in RBM by the
regions of Belarus
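Regional balance of the kind shown in the two diagrams could be monitored with a computation like the following Python sketch. It relies on meta-parameters 6, 7 and 9 (region, quantity of tokens, language of the text). The function name `regional_shares`, the dictionary keys and all numbers are hypothetical; they are invented for illustration and do not reproduce the data behind Pictures 1 and 2.

```python
from typing import Optional


def regional_shares(records: list[dict], language: Optional[str] = None) -> dict:
    """Share of tokens per region, optionally for one text language only."""
    totals: dict[str, int] = {}
    for r in records:
        if language is None or r["text_language"] == language:
            totals[r["region"]] = totals.get(r["region"], 0) + r["tokens"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {region: n / grand_total for region, n in totals.items()}


# Invented token counts; not the actual RBM figures.
SAMPLE = [
    {"region": "Grodno region", "text_language": "be", "tokens": 600},
    {"region": "Grodno region", "text_language": "ru", "tokens": 200},
    {"region": "Brest region", "text_language": "be", "tokens": 200},
]

print(regional_shares(SAMPLE))                 # whole volume of material
print(regional_shares(SAMPLE, language="be"))  # Belarusian material only
```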
When completed, RBM can be used independently for various scientific and
educational purposes, and its infological model described above makes it clear that
RBM can also be easily included in the Corpus of regional and foreign press within
the Russian National Corpus for open access via the Internet.
REFERENCES
Antopolski, 2004 — Антопольский А. Б. Информационные ресурсы
России. — М., 2004. — 423 с.
Korovanenko, 1995 — Корованенко Т. А. Источники нового
академического словаря // Очередные задачи
русской академической лексикографии. —
СПб., 1995. — С. 31–43.
Rychkova, Stankevich, 2017 — Рычкова Л. В. Лингвистический корпус
СМИ Гродненщины: технология создания,
направления использования / Под науч. ред.
Л. В. Рычковой. — Гродно, 2017. — 115 с.
Rychkova, 2015 — Рычкова Л. В. Лексіка рэлігійнай тэматыкі ў
беларускамоўных тэкстах СМІ Гродзеншчыны
і яе лексікаграфічнае адлюстраванне //
Беларуская мова і літаратура ў славянскім
этнакультурным кантэксце : Матэрыялы ІІ
Рэсп. навук.-практ. канф. (Віцебск, 19–20
лістапада 2015 года) / Редкал.: Г. А. Арцямёнак
(адк. рэд.) і інш. — Віцебск, 2015. — С. 134–
138.
Rychkova, 2014 — Рычкова Л. В. Специальная лексика в языке
региональных СМИ // Терминология и знание :
Материалы IV Междунар. симпозиума (Москва,
6–8 июня 2014 г.) / Отв. ред. С. Д. Шелов. — М.,
2014. — Вып. 4. — С. 157–172.
Smułkowa, 2000 — Smułkowa E. Dwujęzyczność po białorusku:
bilingwizm, dyglosja, czy coś innego? // Język i
tożsamość na pograniczu kultur. — Białystok,
2000. — S. 90–100.
ATTACHMENT
List of newspapers in the RBM distributed by region (regions in alphabetical order)
Region of Belarus  Original newspaper title  Transliterated newspaper title
Brest region       Заря над Бугом            Zarja nad Bugom
Brest region       Навіны Камянеччыны        Naviny Kamjanjechchyny
Gomel region       Дняпровец                 Dnjaprovjets
Gomel region       Светлагорскія навіны      Svjetlagorskija naviny
Gomel region       Светлае жыццё             Svjetlaje zhytstsjo
Gomel region       Хойнiцкiя навiны          Hojnitskija naviny
Grodno region      Астравецкая праўда        Astravetskaja prawda
Grodno region      Бераставіцкая газета      Bjerastavitskaja gazjeta
Grodno region      Вечерний Гродно           Vjechjernij Grodno
Grodno region      Воранаўская газета        Voranawskaja gazjeta
Grodno region      Іўеўскі край              Iwjewski kraj
Grodno region      Перспектива               Pjerspjektiva
Grodno region      Праца                     Pratsa
Grodno region      Свіслацкая газета         Svislatskaja gazjeta
Mogilev region     Прыдняпроўская ніва       Prydnjaprowskaja niva
Mogilev region     Родная ніва               Rodnaja niva
Vitebsk region     Герой працы               Gjeroj pratsy
Vitebsk region     Жыццё Прыдзвіння          Zhytstsjo Prydzvinnja
(Grodno, Belarus)
RYCHKOVA L.V., STANKEVICH A.YU.
SPECIALIZED LANGUAGE DATABASE USING THE TEXTS OF BELARUSIAN REGIONAL NEWSPAPERS
The article describes the main principles of creating a full-text database that
contains texts of regional Belarusian mass media and was developed as an experimental
basis within the research project «Functioning of the Belarusian language in bilingual
regional mass media». The choice of bilingual regional newspapers as sources of
authentic language material is justified, and the annotation system is explained.
Keywords: language resources, Belarusian language, interaction of languages,
regional newspapers, full-text database, meta-annotation system, mixed-type corpus.
|