Building Directed Weighted Networks of Terms with Applying Part-of-speech Tagging

Bibliographic Details
Date: 2020
Main Author: Дмитренко, О. О.
Format: Article
Language: Ukrainian
Published: Інститут проблем реєстрації інформації НАН України 2020
Online Access: http://drsp.ipri.kiev.ua/article/view/225914
Journal Title: Data Recording, Storage & Processing

Institution

Data Recording, Storage & Processing
Description
Summary: The paper addresses the topical task of conceptualizing and subsequently formalizing unstructured text data contained in thematic information flows distributed on the Internet. A network built from the key terms of a text (key words and phrases) is treated as an ontological model of the subject domain to which the texts are meaningfully related. A new method for extracting key words and phrases from thematic information flows is proposed and applied, together with a new method for determining the directions of links between nodes, which turns an undirected network of terms into a directed weighted network of terms. Both methods rely on deeper natural language processing based on classifying words by part of speech (part-of-speech tagging). The idea behind determining the weights of links between nodes in the resulting directed network of terms is also presented. Computer processing of text corpora and the building of directed weighted networks from terms extracted by part-of-speech tagging and subsequent statistical weighting are presented as a single methodology. The methodology is tested on the well-known European folk fairy tale «Little Red Cap» as retold by the Brothers Grimm. Using the proposed method, key terms were extracted and a directed weighted network of words and phrases corresponding to individual key concepts of the text was built. Within the resulting ontological model, as expected, the key terms correspond to the title of the fairy tale, and the most important links correspond to the connections between these terms. Tabl.: 2. Fig.: 4. Refs: 15 titles.
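The pipeline outlined in the summary (part-of-speech tagging, key term extraction, then a directed weighted network) can be illustrated with a minimal Python sketch. This record does not give the article's actual rules, so everything here is an assumption made for illustration: the toy lexicon `TOY_POS` stands in for a real tagger, key terms are taken to be nouns with any preceding adjectives, a link is directed from the earlier term to the later one within a sentence, and a link's weight is the number of sentences in which that ordered pair occurs side by side.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for a real part-of-speech tagger.
TOY_POS = {
    "little": "ADJ", "red": "ADJ", "cap": "NOUN", "wolf": "NOUN",
    "grandmother": "NOUN", "forest": "NOUN", "met": "VERB", "ate": "VERB",
    "the": "DET", "in": "ADP",
}

def tag(tokens):
    """Attach a part-of-speech label to each token ("X" if unknown)."""
    return [(tok, TOY_POS.get(tok, "X")) for tok in tokens]

def key_terms(tagged):
    """Assumed extraction rule: a noun plus any adjectives directly before it."""
    terms, adjectives = [], []
    for word, pos in tagged:
        if pos == "ADJ":
            adjectives.append(word)
        elif pos == "NOUN":
            terms.append(" ".join(adjectives + [word]))
            adjectives = []
        else:
            adjectives = []
    return terms

def build_network(sentences):
    """Assumed linking rule: an edge from each key term to the next key term
    in the same sentence; the weight counts how often that pair occurs."""
    edges = Counter()
    for sentence in sentences:
        terms = key_terms(tag(sentence.lower().split()))
        for source, target in zip(terms, terms[1:]):
            if source != target:
                edges[(source, target)] += 1
    return edges

sentences = [
    "Little Red Cap met the wolf in the forest",
    "The wolf met the grandmother",
    "The wolf ate the grandmother",
    "The wolf ate Little Red Cap",
]
network = build_network(sentences)
for (source, target), weight in sorted(network.items()):
    print(f"{source} -> {target} (weight {weight})")
```

In this toy input the phrase "little red cap" is kept as a single node, and the pair ("wolf", "grandmother") receives a higher weight than the other links because it occurs in two sentences. A real implementation would replace `TOY_POS` with a trained tagger and would likely use a more refined statistical weighting.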