Building Directed Weighted Networks of Terms with Applying Part-of-speech Tagging
The paper addresses the topical task of conceptualizing and subsequently formalizing unstructured text data contained in thematic information flows distributed on the Internet. A network built from the key terms of a text (key words and phrases) is treated as an ontology model of the su...
Date: | 2020 |
---|---|
Main Author: | Дмитренко, О. О. |
Format: | Article |
Language: | Ukrainian |
Published: |
Інститут проблем реєстрації інформації НАН України
2020
|
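Subjects: | text corpus, natural language processing, part-of-speech (PoS) tagging, terminological ontology, network of terms |
Online Access: | http://drsp.ipri.kiev.ua/article/view/225914 |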
Journal Title: | Data Recording, Storage & Processing |
Institution
Data Recording, Storage & Processing
id |
drspiprikievua-article-225914 |
---|---|
record_format |
ojs |
spelling |
drspiprikievua-article-225914 2021-03-10T16:12:52Z Building Directed Weighted Networks of Terms with Applying Part-of-speech Tagging; Побудова направлених зважених мереж термінів із застосуванням Part-of-speech tagging; Дмитренко, О. О.; text corpus, natural language processing, part-of-speech (PoS) tagging, terminological ontology, network of terms. A new method is considered for building terminological ontologies in the form of networks of key terms (key words and phrases) of texts that are meaningfully related to a certain subject domain. The extraction of key words and phrases from thematic text flows and the subsequent building of a directed weighted network of terms are carried out using more extensive natural language processing based on splitting words into parts of speech (Part-of-speech tagging). Computer processing of text corpora and the building of directed weighted networks of terms are presented as a holistic methodology. The article demonstrates the testing of the proposed methodology on the well-known European folk fairy tale «Little Red Cap» and builds a directed weighted network of words and phrases that correspond to individual key concepts in the work under study. Інститут проблем реєстрації інформації НАН України 2020-12-29 Article application/pdf http://drsp.ipri.kiev.ua/article/view/225914 10.35681/1560-9189.2020.22.4.225914 Data Recording, Storage & Processing; Vol. 22 No. 4 (2020); 47-55 1560-9189 uk http://drsp.ipri.kiev.ua/article/view/225914/226089 Copyright (c) 2021 Data Recording, Storage & Processing |
institution |
Data Recording, Storage & Processing |
baseUrl_str |
|
datestamp_date |
2021-03-10T16:12:52Z |
collection |
OJS |
language |
Ukrainian |
topic |
text corpus, natural language processing, part-of-speech (PoS) tagging, terminological ontology, network of terms |
spellingShingle |
text corpus, natural language processing, part-of-speech (PoS) tagging, terminological ontology, network of terms; Дмитренко, О. О.; Building Directed Weighted Networks of Terms with Applying Part-of-speech Tagging |
topic_facet |
text corpus, natural language processing, part-of-speech (PoS) tagging, terminological ontology, network of terms |
format |
Article |
author |
Дмитренко, О. О. |
author_facet |
Дмитренко, О. О. |
author_sort |
Дмитренко, О. О. |
title |
Building Directed Weighted Networks of Terms with Applying Part-of-speech Tagging |
title_short |
Building Directed Weighted Networks of Terms with Applying Part-of-speech Tagging |
title_full |
Building Directed Weighted Networks of Terms with Applying Part-of-speech Tagging |
title_fullStr |
Building Directed Weighted Networks of Terms with Applying Part-of-speech Tagging |
title_full_unstemmed |
Building Directed Weighted Networks of Terms with Applying Part-of-speech Tagging |
title_sort |
building directed weighted networks of terms with applying part-of-speech tagging |
title_alt |
Побудова направлених зважених мереж термінів із застосуванням Part-of-speech tagging |
description |
The paper addresses the topical task of conceptualizing and subsequently formalizing unstructured text data contained in thematic information flows distributed on the Internet. A network built from the key terms of a text (key words and phrases) is treated as an ontology model of the subject domain to which the texts are meaningfully related. A new method is proposed and applied for extracting key words and phrases from thematic information flows, together with a new method for determining the directions of links between nodes, which converts an undirected network of terms into a directed weighted one. The proposed methods rely on more extensive natural language processing, based on classifying words into parts of speech (part-of-speech tagging). An approach to determining the weights of links between nodes in the already built directed network of terms is also presented. Computer processing of text corpora and the building of directed weighted networks of terms, which are first extracted by part-of-speech tagging and then statistically weighted, are presented as a holistic methodology. The methodology is tested on the well-known European folk fairy tale «Little Red Cap» as retold by the Brothers Grimm. Applying the proposed method, key terms were extracted and a directed weighted network of words and phrases corresponding to individual key concepts in the studied text was built. Within the proposed ontological model, as expected, the key terms correspond to the title of the fairy tale, and the most important links correspond to the connections between these terms. Tabl.: 2. Fig.: 4. Refs: 15 titles. |
publisher |
Інститут проблем реєстрації інформації НАН України |
publishDate |
2020 |
url |
http://drsp.ipri.kiev.ua/article/view/225914 |
work_keys_str_mv |
AT dmitrenkooo buildingdirectedweightednetworksoftermswithapplyingpartofspeechtagging AT dmitrenkooo pobudovanapravlenihzvaženihmerežtermínívízzastosuvannâmpartofspeechtagging |
first_indexed |
2025-07-17T10:58:10Z |
last_indexed |
2025-07-17T10:58:10Z |
_version_ |
1837891470174978048 |
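
The abstract in this record describes a pipeline (part-of-speech tagging, key-term extraction, then a directed weighted network of terms) without giving its rules. The following is a minimal Python sketch of that general idea, not the author's method. Everything specific in it is an assumption made for illustration: the pre-tagged toy input `TAGGED`, the tag name `NOUN`, the choice of frequent nouns as key terms, the direction rule (from the term that occurs earlier in a sentence to the one that occurs later), and the weight (the number of sentences in which that ordered pair occurs). The function names `extract_key_terms` and `build_directed_network` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: sentences already PoS-tagged as (word, tag) pairs.
# A real tagger would produce these; the tag names are an assumption.
TAGGED = [
    [("wolf", "NOUN"), ("met", "VERB"), ("girl", "NOUN")],
    [("girl", "NOUN"), ("visited", "VERB"), ("grandmother", "NOUN")],
    [("wolf", "NOUN"), ("ate", "VERB"), ("grandmother", "NOUN")],
    [("wolf", "NOUN"), ("saw", "VERB"), ("girl", "NOUN")],
]


def extract_key_terms(tagged_sentences, keep_tags=("NOUN",), top_n=3):
    """Return the top_n most frequent words whose tag is in keep_tags.

    Assumption: frequency among nouns stands in for statistical weighting.
    """
    counts = Counter(
        word
        for sentence in tagged_sentences
        for word, tag in sentence
        if tag in keep_tags
    )
    return [term for term, _ in counts.most_common(top_n)]


def build_directed_network(tagged_sentences, key_terms):
    """Return {(source, target): weight} over the given key terms.

    Assumed rule: an edge points from the term that appears earlier in a
    sentence to the term that appears later; the weight counts how often
    that ordered pair occurs across sentences.
    """
    keys = set(key_terms)
    edges = Counter()
    for sentence in tagged_sentences:
        terms = [word for word, _ in sentence if word in keys]
        # combinations() keeps positional order, so a precedes b.
        for a, b in combinations(terms, 2):
            if a != b:
                edges[(a, b)] += 1
    return dict(edges)


if __name__ == "__main__":
    terms = extract_key_terms(TAGGED)
    for (src, dst), weight in build_directed_network(TAGGED, terms).items():
        print(f"{src} -> {dst} (weight {weight})")
```

Taking pre-tagged input keeps the sketch independent of any particular tagger; in practice the `(word, tag)` pairs would come from whatever part-of-speech tagger is used for the language of the corpus.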