Text Information Ontological Analysis in the Computer Simulation Systems

The paper proposes an approach to automating the ontological analysis of text information using the generalized iterative algorithm (GIA) of inductive modeling. It describes a technology for collecting and sorting information that includes four phases: input data formalization (extracting data from external source...

Bibliographic details
Date: 2014
Author: Bulgakova, O.
Format: Article
Language: English
Published: Міжнародний науково-навчальний центр інформаційних технологій і систем НАН та МОН України 2014
Publication title: Індуктивне моделювання складних систем
Online access: http://dspace.nbuv.gov.ua/handle/123456789/83988
Journal title: Digital Library of Periodicals of National Academy of Sciences of Ukraine
Cite: Text Information Ontological Analysis in the Computer Simulation Systems / O. Bulgakova // Індуктивне моделювання складних систем: collection of scientific papers. – Kyiv: МННЦ ІТС НАН та МОН України, 2014. – Issue 6. – P. 5-10. – Bibliography: 9 titles. – In English.

Repositories

Digital Library of Periodicals of National Academy of Sciences of Ukraine
id irk-123456789-83988
record_format dspace
spelling irk-123456789-83988 2015-07-03T03:01:43Z XXXX-0044 004.9 en
institution Digital Library of Periodicals of National Academy of Sciences of Ukraine
collection DSpace DC
language English
description The paper proposes an approach to automating the ontological analysis of text information using the generalized iterative algorithm (GIA) of inductive modeling. It describes a technology for collecting and sorting information that includes four phases: input data formalization (extracting data from external sources, transforming them and loading them into the repository); data analysis (using the GIA of inductive modeling); assignment of information to a specific ontology (ontology instance); creation of new ontologies (instances) based on the analyzed information.
format Article
author Bulgakova, O.
title Text Information Ontological Analysis in the Computer Simulation Systems
publisher Міжнародний науково-навчальний центр інформаційних технологій і систем НАН та МОН України
publishDate 2014
url http://dspace.nbuv.gov.ua/handle/123456789/83988
citation_txt Text Information Ontological Analysis in the Computer Simulation Systems / O. Bulgakova // Індуктивне моделювання складних систем: collection of scientific papers. – Kyiv: МННЦ ІТС НАН та МОН України, 2014. – Issue 6. – P. 5-10. – Bibliography: 9 titles. – In English.
series Індуктивне моделювання складних систем
work_keys_str_mv AT bulgakovao textinformationontologicalanalysisinthecomputersimulationsystems
first_indexed 2025-07-06T10:52:51Z
last_indexed 2025-07-06T10:52:51Z
_version_ 1836894568513011712
fulltext Oleksandra Bulgakova, Індуктивне моделювання складних систем, issue 6, 2014

UDC 004.9

TEXT INFORMATION ONTOLOGICAL ANALYSIS IN THE COMPUTER SIMULATION SYSTEMS

Oleksandra Bulgakova
Mykolaiv V.O.Suhomlynsky National University, Nikolska str., 24, Mykolaiv, 54030, Ukraine
sashabulgakova@list.ru

The paper proposes an approach to automating the ontological analysis of text information using the generalized iterative algorithm (GIA) of inductive modeling. It describes a technology for collecting and sorting information that includes four phases: input data formalization (extracting data from external sources, transforming them and loading them into the repository); data analysis (using the GIA of inductive modeling); assignment of information to a specific ontology (ontology instance); creation of new ontologies (instances) based on the analyzed information.

Keywords: data mining, domain ontology, ontological information, generalized iterative algorithm, inductive modeling, data structures, information processing and storage.

Introduction. As accumulated information databases grow, new data mining methods, algorithms and software are required to provide access to information; many of these should be classified as artificial intelligence systems, that is, knowledge processing systems. Developing adequate and relatively simple programs that "extract" knowledge from data will greatly facilitate human work. One of the most effective approaches to detecting and processing the meaning of text documents is the use of ontologies [1]. An ontology defines the terms used to describe and represent the knowledge of a particular subject area. Ontologies include computer-processable definitions of the basic concepts in the domain and of the relationships between them [2]. A database of ontologies and their models can be obtained by inductive self-organization of models from experimental data (inductive modeling).
This approach to modeling replaces the traditional deductive path "from the general laws of the object's operation to a particular mathematical model" with an inductive path "from specific observations to a general model": the researcher hypothesizes a class of possible models and sets the criteria for choosing the best model in this class. Computer processing minimizes the influence of subjective factors and yields the model as an objective result [3-4]. The ontology model is obtained as the result of the algorithm.

1. Domain ontology

Formally, an ontology can be defined as the tuple $O = (L, C, F_l, F_C, R_h)$, where $L = \{(w_i, x_i)\},\ i = \overline{1,n}$ is the domain glossary, $w_i$ is a term, $x_i$ is the rating of the term relative to the other terms, $C$ is the set of concepts, $F_l(L) \to C$ is the concept interpretation function, which associates with each concept a set of terms from the glossary, and $R_h$ is the hierarchy relation between the concepts [4].

The domain ontology describes a domain of scientific knowledge defined by a specific subject. It may include a hierarchy of defined concepts built on the concepts of the ontology. All these hierarchies can be linked through associative relationships, some of which are inherited from the basic technologies and some of which reflect the specifics of the subject area. By introducing formal descriptions of the concepts and of the problem domain in the form of concepts and relations between them, the ontology sets the structure for representing the real-world objects and relationships that make up the knowledge base. Thus, the data are represented as a set of information objects of different types and the relationships between them. By an information object we mean data representing a set of text information from a specific area that is relevant to some concept of the ontology. To determine the appropriate ontology, the text information is analyzed using the generalized iterative algorithm of inductive modeling.

2. Technology for collecting and sorting information

The technology for collecting and sorting information includes the following steps:
1. Formalization of the input data (extracting data from external sources, transforming them and loading them into the repository).
2. Data analysis/mining (using the GIA of inductive modeling).
3. Assignment of information to a specific ontology (ontology instance).
4. Creation of a new ontology (instances) on the basis of the analyzed information.

Step 1. Formalization of the input data

Data are a presentation of facts and ideas in a formalized form suitable for transmission and processing in some process [6]. At this step, each document is represented as a set of terms, and the set of documents is divided into subsets of documents on a similar topic (clusters), which yields the terms of one subject group. This makes it possible to establish relationships between terms and concepts. Each term is characterized by its frequency of occurrence (weight). The problem is solved using inductive modeling algorithms. To apply these algorithms, the inputs must be strictly formalized and reduced to tabular form. To this end, the data need to be extracted from external sources, transformed and loaded into the repository. Data extraction is the copying of data from operational systems, documents and other sources while ensuring data integrity and uniqueness. Transformation brings the data to a common form, removes errors and binds the data to dimensions. The transformed data are transferred to the storage at the loading stage. Data integrated into the system can be used both for building reports directly and for further analysis with data mining algorithms.

We then analyze the characteristics of the input data in inductive modeling tasks with respect to various parameters. In [7], set theory was used to formalize the representation of data at each stage of model construction with GMDH algorithms. We have the following components (built using the analysis of the structural identification method [8]):

$W = (X, Y)$ – data set (a sequence of $N$ values of the random variable $Y$ characterized by $M$ features $X$), $W = \{w_j\},\ j = \overline{1,J},\ J = n \cdot m,\ n = \overline{1,N},\ m = \overline{1,M}$;
$NW$ – normalized data set, $NW = \{w_j\},\ j = \overline{1,J}$;
$F$ – set of model classes, $F = \{f_k\},\ k = \overline{1,K}$;
$G$ – set of model structure generators, $G = \{g_l\},\ l = \overline{1,L}$;
$P$ – set of methods for estimating structure parameters, $P = \{p_r\},\ r = \overline{1,R}$;
$CR$ – set of model criteria, $CR = \{cr_q\},\ q = \overline{1,Q}$;
$V$ – set of model classifications, $V = \{v_t\},\ t = \overline{1,T}$.

The process of constructing the set of all possible models can then be represented as the direct product of the component sets:

$$Z = W \times NW \times F \times G \times P \times CR \times V.$$

The elements of the set $Z$, described as $z_i = \{w_j, w_j, f_k, g_l, p_r, cr_q, v_t\}$, $j = \overline{1,J}$, $k = \overline{1,K}$, $l = \overline{1,L}$, $r = \overline{1,R}$, $q = \overline{1,Q}$, $t = \overline{1,T}$, $i = \overline{1,I}$, $I = J \cdot K \cdot L \cdot R \cdot Q \cdot T$, are treated as the specific data stored in the environment during a particular pass of the full simulation cycle.
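The direct product above can be illustrated with a minimal Python sketch. All component values below (sample names, model classes, generator and criterion labels) are hypothetical placeholders chosen for illustration and are not taken from the paper. The sketch pairs each data sample with its own normalized counterpart, so the number of elements is J·K·L·R·Q·T, matching the index range given for $z_i$.

```python
from itertools import product

# Hypothetical toy components; names and sizes are illustrative assumptions.
W = ["w1", "w2"]                      # data samples w_j, J = 2
NW = {w: f"norm({w})" for w in W}     # normalized counterpart of each w_j
F = ["linear", "quadratic"]           # model classes f_k, K = 2
G = ["combinatorial", "iterative"]    # structure generators g_l, L = 2
P = ["least_squares"]                 # parameter estimation methods p_r, R = 1
CR = ["regularity", "unbiasedness"]   # model criteria cr_q, Q = 2
V = ["regression"]                    # model classifications v_t, T = 1


def enumerate_configurations():
    """Return every z_i = (w_j, nw_j, f_k, g_l, p_r, cr_q, v_t)."""
    return [
        (w, NW[w], f, g, p, cr, v)
        for w, f, g, p, cr, v in product(W, F, G, P, CR, V)
    ]


Z = enumerate_configurations()
print(len(Z))  # number of stored configurations, I = J*K*L*R*Q*T
```

Each tuple in `Z` corresponds to one full modeling pass whose settings and results could be stored; keeping the enumeration lazy (iterating over `product` directly) would be preferable if the component sets were large.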
In [7] used set theory to formalize the presentation of data at each construction models stage using GMDH algorithms. We have the following components (built using analysis method of structural identification [8]): ),( YXW = – data set (sequence N values random variable Y , that characterized M features X ) { } MmNnmnJJjw j ,1,,1,,,1,W ==⋅=== ; NW – norm data set { } JjwNW j ,1, == ; F – classes of models set { } KkfF k ,1, == ; G – generators structures models set { } LlgG l ,1, == ; P – set of parameter estimation structures methods { } RrpP r ,1, == ; CR – models criteria set { } Qqcrq ,1,CR == ; V – classification models set { } Ttvt ,1,V == . Then constructing set process of all possible models can be represented as a direct product of components sets VCRPGFNWWZ ××××××= . Some set elements Z , as described { }tqrlkjji vcrpgfwwz ,,,,,,= , TtQqRrLlKkJj ,1,,1,,1,,1,,1,,1 ====== , TQRLKJIIi ⋅⋅⋅⋅⋅== ,,1 , will be considered as specific data that have been stored in an environment at a particular passage full cycle simulation. Step 2: Analysis of data Documents clustering will be made on the basis of generalized iterative algorithm inductive modeling (GIA). Text Information Ontological Analysis in the Computer Simulation System Індуктивне моделювання складних систем, випуск 6, 2014 8 Let us briefly consider the iterative structure of algorithm used for solving the general problem of search for a better model under such formulation: )),ˆ,(,(minarg* ff XfyCRf θ Φ∈ = (1) where fθ̂ is an estimation of parameters for any partial model f∈Φ , CR is a model quality criterion for selection of optimal model. The set Φ of models being compared can be formed by various generators of model structures of diverse complexities. All structure generators developed within the GMDH framework naturally divided into two main groups – sorting�out and iterative ones which differ by techniques of variants generation and organization of search of a given criterion minimum. 
The simulation uses the generalized iterative algorithm, GIA GMDH (Fig. 1) [9]. Formally, in the general case the GIA GMDH for layer $r$ is defined as follows:

1) the input matrix is $X_{r+1} = (y_1^r, \ldots, y_F^r, x_1, \ldots, x_m)$;

2) the following operators are applied:

$$y_l^{r+1} = f(y_i^r, y_j^r), \quad l = 1, 2, \ldots, C_F^2, \quad i, j = \overline{1,F} \qquad (2)$$

and

$$y_l^{r+1} = f(y_i^r, x_j), \quad l = 1, 2, \ldots, Fm, \quad i = \overline{1,F}, \quad j = \overline{1,m} \qquad (3)$$

with partial descriptions of up to quadratic form:

$$\begin{aligned} z &= f(u,v) = a_0 + a_1 u + a_2 v; \\ z &= f(u,v) = a_0 + a_1 u + a_2 v + a_3 uv; \\ z &= f(u,v) = a_0 + a_1 u + a_2 v + a_3 uv + a_4 u^2 + a_5 v^2. \end{aligned} \qquad (4)$$

3) for each description the optimal structure is found (an example for the linear partial description):

$$f(u,v) = a_0 d_1 + a_1 d_2 u + a_2 d_3 v, \qquad (5)$$

where $d_k$, $k = 1, 2, 3$, $d_k \in \{0, 1\}$, are the elements of the binary structure vector $d = (d_1 d_2 d_3)$, taking the value 1 or 0 (the corresponding argument is included or not). The best model is then described by $f(u, v, d_{opt})$, where

$$d_{opt} = \arg\min_{l = \overline{1,q}} CR_l, \quad q = 2^p - 1, \quad f_{opt}(u,v) = f(u, v, d_{opt}); \qquad (6)$$

4) the algorithm checks the stopping condition $CR_r > CR_{r-1}$, where $CR_r$ and $CR_{r-1}$ are the criterion values for the best models of the $r$-th and $(r-1)$-th layers respectively. If the condition holds, the algorithm stops; otherwise it proceeds to the next layer.

Fig. 1. Schema of the generalized iterative algorithm

We define the GIA GMDH as a family of iterative and iterative-combinatorial algorithms described by a vector of three elements: DM (Dialogue Mode), IC (Iterative-Combinatorial) and MR (Multilayered-Relaxative); that is, any iterative algorithm is a special case of the generalized one, GIA (DM, IC, MR). This is made possible by a specialized software complex for modeling based on iterative algorithms of the group method of data handling, which implements the following features: automatic and interactive modes of the user interface, management through a web interface, and multi-user access.
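The structure search of (5) and (6) and the stopping rule of item 4 can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation from [9]: the criterion is assumed to be the mean squared error on a checking subsample, parameters are estimated by ordinary least squares, and every function name and the toy data are hypothetical.

```python
from itertools import product


def solve(A, b):
    """Solve the small linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                k = M[r][c] / M[c][c]
                M[r] = [a - k * p for a, p in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def design(u, v, d):
    """Keep the columns of (1, u, v) where the structure vector d has a 1."""
    full = [[1.0, ui, vi] for ui, vi in zip(u, v)]
    return [[x for x, keep in zip(row, d) if keep] for row in full]


def fit(X, y):
    """Least squares via the normal equations (X^T X) a = X^T y."""
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def mse(X, y, a):
    """Assumed criterion CR: mean squared error of the fitted model."""
    return sum((yi - sum(c * x for c, x in zip(a, r))) ** 2
               for r, yi in zip(X, y)) / len(y)


def best_structure(u, v, y, n_train):
    """Try all 2**3 - 1 non-zero vectors d and return (CR, d_opt)."""
    results = []
    for d in product((0, 1), repeat=3):
        if not any(d):
            continue  # the all-zero structure contains no terms
        X = design(u, v, d)
        a = fit(X[:n_train], y[:n_train])
        results.append((mse(X[n_train:], y[n_train:], a), d))
    return min(results)


def should_stop(cr_r, cr_prev):
    """Stopping rule of item 4: stop once the criterion starts to grow."""
    return cr_r > cr_prev


# Toy data: y = 2 + 3*u exactly, v carries no information about y.
u = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
v = [1.0, 0.0, 2.0, 1.0, 0.0, 2.0, 2.0, 1.0]
y = [2.0 + 3.0 * ui for ui in u]
cr, d_opt = best_structure(u, v, y, n_train=5)
print(d_opt, cr)
```

On this toy data the search keeps the constant term and `u` and drops `v`. In a full GIA run, `best_structure` would be applied to every pair produced by operators (2) and (3) on each layer, and `should_stop` would compare the best criterion values of consecutive layers.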
The best model constructed is presented by the system for graphic and semantic analysis; the system determines the effect of the arguments on the target factor and analyzes and selects the most informative arguments [10].

Step 3. Assignment of information to a specific ontology (ontology instance)

After the GIA finishes, the "ontology model" is obtained. At this step, the text information is analyzed using the models obtained for each ontology (ontology instance) and sorted. Each model has its own threshold values (minimum and maximum) based on the simulation error. Thus, this phase determines not only the set of knowledge-area sections to which the text belongs, but also the degree to which the document conforms to the relevant sections, which gives grounds to stop or continue the analysis.

Step 4. Creation of a new ontology

At this stage, new ontology instances that are not in the current knowledge base can be created. After the input data have been formalized and analyzed, some documents may remain that were not assigned to any category. Such documents are stored in a special data warehouse and analyzed at regular intervals; on this basis a glossary of terms is compiled. The glossary stores semantic information that links its elements and at the same time identifies a new class of problem and domain.

3. Conclusion

The article describes an approach to automating the ontological analysis of text information using the generalized iterative algorithm (GIA) of inductive modeling.
The technology for collecting and sorting information includes four phases: input data formalization (extracting data from external sources, transforming them and loading them into the repository); data analysis (using the GIA of inductive modeling); assignment of information to a specific ontology (ontology instance); creation of new ontologies (instances) based on the analyzed information.

References

1. Baeza-Yates R., Ribeiro-Neto B. Modern Information Retrieval. ACM Press, 1999.
2. Gruber T.R. A translation approach to portable ontology specifications // Knowledge Acquisition. – 1993. – 5(2).
3. Stepashko V.S. Theoretical aspects of GMDH as a method of inductive modeling (in Russian) // USiM. – 2003. – No. 2. – P.
4. Bulgakova O., Kordik P. Methods of true data mining model selection – with experimental results // Proceedings of 3rd International Workshop on Inductive Modelling IWIM-2009, 14-19 September 2009, Krynica, Poland. – Prague: Czech Technical University, 2009. – P. 23-27.
5. Zakharova I.V., Melnikov A.V., Vokhmitsev J.A. An approach to automated ontology building in text analysis problems // Workshop on Computer Science and Information Technologies CSIT'2006, Karlsruhe, Germany, 2006. – P. 177-178.
6. http://wikipedia.org/
7. Shcherbakova N.V. Formalization of information storage structures in inductive modeling tasks (in Ukrainian) // Моделювання та керування станом еколого-економічних систем регіону. Collection of papers. – Kyiv: МННЦІТС, 2009. – P. 229-234.
8. Efimenko S.N., Stepashko V.S. Simulation experiment as a means of studying the effectiveness of methods for modeling from observation data (in Russian) // USiM. – 2009. – No. 1. – P. 69-78.
9. Stepashko V.S., Bulgakova O.S. Generalized iterative algorithm of the group method of data handling // USiM. – 2013. – No. 2. – P. 5-18.
10. Bulgakova O.S., Zosimiv V.V., Stepashko V.S. Program complex modeling of complex systems based on iterative algorithms with the ability of GMDH network access // 14th International conference SAIT 2012, Kyiv, Ukraine. – P. 176-178.