Building Directed Weighted Networks of Terms with Applying Part-of-speech Tagging

Bibliographic details
Date: 2020
Author: Дмитренко, О. О.
Format: Article
Language: Ukrainian
Published: Інститут проблем реєстрації інформації НАН України (Institute for Information Recording of the National Academy of Sciences of Ukraine), 2020
Online access: http://drsp.ipri.kiev.ua/article/view/225914
Journal title: Data Recording, Storage & Processing

Institution: Data Recording, Storage & Processing
Alternative title (Ukrainian): Побудова направлених зважених мереж термінів із застосуванням Part-of-speech tagging
Keywords: text corpus, natural language processing, part-of-speech (PoS) tagging, terminological ontology, network of terms
Publication date: 2020-12-29
Journal: Data Recording, Storage & Processing, Vol. 22, No. 4 (2020), pp. 47-55
ISSN: 1560-9189
DOI: 10.35681/1560-9189.2020.22.4.225914
Full text (PDF): http://drsp.ipri.kiev.ua/article/view/225914/226089
Copyright: (c) 2021 Data Recording, Storage & Processing

Abstract:
The paper addresses the topical task of conceptualizing and then formalizing unstructured text data contained in thematic information flows distributed on the Internet. A network built from the key terms of a text (its key words and phrases) is treated as an ontology model of the subject domain to which the texts are meaningfully related. The paper proposes and applies a new method for extracting key words and phrases from thematic information flows, and a new method for determining the directions of links between nodes, which turns an undirected network of terms into a directed weighted one. Both methods rely on more extensive natural language processing based on classifying words by part of speech (part-of-speech tagging). An approach to assigning weights to the links between nodes of an already built directed network of terms is also presented. Computer processing of text corpora, extraction of terms by part-of-speech tagging, their subsequent statistical weighting, and the building of directed weighted networks of terms are presented together as a holistic methodology. The methodology is tested on the well-known European folk fairy tale «Little Red Cap», as retold by the Brothers Grimm. Applying the proposed method, key terms were extracted and a directed weighted network of words and phrases corresponding to individual key concepts of the text was built. As expected, within the resulting ontological model the key terms correspond to the title of the fairy tale, and the most important links correspond to the connections between these terms. Tables: 2. Figures: 4. References: 15.
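The abstract does not spell out the exact extraction, direction, or weighting rules, so the following Python sketch only illustrates the general pipeline under stated assumptions; it is not the author's method. Assumptions: sentences are already PoS-tagged with Penn Treebank tags; candidate terms are maximal adjective/noun runs ending in a noun; key terms are the `top_k` most frequent candidates; an edge points from the key term that appears earlier in a sentence to the one that appears later, and its weight is the number of sentences containing that ordered pair. The functions `extract_terms` and `build_network` and the hand-tagged toy sentences are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Assumption: tokens carry Penn Treebank tags (NN, NNS, NNP, JJ, VBD, ...).
NOUN_PREFIX = "NN"  # NN, NNS, NNP, NNPS
ADJ_PREFIX = "JJ"   # JJ, JJR, JJS


def extract_terms(tagged_sentence):
    """Return candidate terms in order of appearance: maximal runs of
    adjectives/nouns that end in a noun (e.g. 'little red cap', 'wolf')."""
    terms, run = [], []
    # A sentinel tag flushes the last run at the end of the sentence.
    for word, tag in list(tagged_sentence) + [("", "END")]:
        if tag.startswith((NOUN_PREFIX, ADJ_PREFIX)):
            run.append((word.lower(), tag))
            continue
        # Drop trailing adjectives so that every phrase ends with a noun.
        while run and not run[-1][1].startswith(NOUN_PREFIX):
            run.pop()
        if run:
            terms.append(" ".join(w for w, _ in run))
        run = []
    return terms


def build_network(tagged_sentences, top_k=5):
    """Build a directed weighted network over the top_k most frequent terms."""
    sentence_terms = [extract_terms(s) for s in tagged_sentences]
    # Weighting assumption: rank candidate terms by raw corpus frequency.
    freq = Counter(t for terms in sentence_terms for t in terms)
    key_terms = {t for t, _ in freq.most_common(top_k)}
    edges = Counter()
    for terms in sentence_terms:
        # First occurrence of each key term, in sentence order.
        ordered = list(dict.fromkeys(t for t in terms if t in key_terms))
        # Direction assumption: earlier term -> later term in the same sentence;
        # the weight counts the sentences that contain this ordered pair.
        for src, dst in combinations(ordered, 2):
            edges[(src, dst)] += 1
    return key_terms, edges


# Hypothetical toy input, tagged by hand.
sentences = [
    [("Little", "NNP"), ("Red", "NNP"), ("Cap", "NNP"), ("met", "VBD"),
     ("the", "DT"), ("wolf", "NN"), ("in", "IN"), ("the", "DT"),
     ("wood", "NN"), (".", ".")],
    [("The", "DT"), ("wolf", "NN"), ("ran", "VBD"), ("to", "TO"),
     ("the", "DT"), ("grandmother", "NN"), ("'s", "POS"), ("house", "NN"),
     (".", ".")],
    [("Little", "NNP"), ("Red", "NNP"), ("Cap", "NNP"), ("saw", "VBD"),
     ("the", "DT"), ("wolf", "NN"), (".", ".")],
]

key_terms, edges = build_network(sentences, top_k=2)
for (src, dst), weight in edges.items():
    print(f"{src} -> {dst} (weight {weight})")
```

On this toy input the two key terms are "wolf" and "little red cap", and the script prints `little red cap -> wolf (weight 2)`. Sentence order is used for direction only because it is simple and deterministic; a rule that exploits PoS structure (for example, subject before a verb pointing to the object after it) would be a natural refinement, but whether the paper does this is not stated in the abstract.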