Text Information Ontological Analysis in the Computer Simulation Systems
Date: | 2014 |
---|---|
Author: | Bulgakova, O. |
Format: | Article |
Language: | English |
Published: | International Research and Training Center for Information Technologies and Systems of the NAS and MES of Ukraine, 2014 |
Publication: | Індуктивне моделювання складних систем (Inductive Modeling of Complex Systems) |
Online access: | http://dspace.nbuv.gov.ua/handle/123456789/83988 |
Repository: | Digital Library of Periodicals of National Academy of Sciences of Ukraine |
Cite as: | Text Information Ontological Analysis in the Computer Simulation Systems / O. Bulgakova // Індуктивне моделювання складних систем: Collection of scientific papers. Kyiv: IRTC ITS of the NAS and MES of Ukraine, 2014. Issue 6. P. 5-10. Bibliography: 9 titles. In English. |
Індуктивне моделювання складних систем (Inductive Modeling of Complex Systems), Issue 6, 2014
UDC 004.9
TEXT INFORMATION ONTOLOGICAL ANALYSIS IN THE
COMPUTER SIMULATION SYSTEMS
Oleksandra Bulgakova
Mykolaiv V.O.Suhomlynsky National University, Nikolska str., 24, Mykolaiv, 54030, Ukraine
sashabulgakova@list.ru
The paper proposes an approach to automating the ontological analysis of text information
using the generalized iterative algorithm (GIA) of inductive modeling. It describes a technology for
collecting and sorting information that includes four phases: input data formalization (extracting
data from external sources, transforming it and loading it into the repository); data analysis (using
the GIA of inductive modeling); assignment of information to a specific ontology (ontology instance);
and creation of new ontologies (instances) based on the analyzed information.
Keywords: data mining, domain ontology, ontological information, generalized iterative
algorithm, inductive modeling, data structures, information handling and storage.
Introduction. The growth of accumulated information databases requires new data
mining methods, algorithms and software to provide access to information, many of
which should be classified as artificial intelligence systems, i.e. knowledge
processing systems. The development of adequate and relatively simple programs that
"extract" knowledge from data would greatly facilitate human work.
One of the most effective approaches to detecting and processing the meaning of
text documents is the use of ontologies [1]. An ontology defines the terms used to
describe and represent knowledge of a particular subject area. Ontologies provide
computer-processable definitions of the basic concepts of a domain and the
relationships between them [2]. To obtain a database of ontologies and their models,
inductive self-organizing models built from experimental data (inductive modeling)
can be used. Instead of the traditional deductive path "from the general laws of the
object's operation to a particular mathematical model", this approach follows the
inductive path "from specific observations to a general model": the researcher
hypothesizes about the class of possible models and sets the criteria for choosing the
best model in this class. Computer processing makes it possible to minimize the
influence of subjective factors and to obtain the model as an objective result [3-4].
The ontology model is obtained as the result of the algorithm.
1. Domain ontology
Formally, an ontology can be defined as the tuple
$$O = (L, C, F_l, F_C, R_h), \qquad L = \{(w_i, x_i)\},\ i = \overline{1, n},$$
where:
$L$ – the domain glossary;
$w_i$ – a term;
$x_i$ – the rating of the term relative to the other terms;
$C$ – the set of concepts;
$F_l: C \to \mathcal{P}(L)$ – the concept interpretation function, which associates each concept with a set of terms from the glossary;
$R_h$ – the hierarchy relation between the concepts [4].
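As an illustration only, the tuple $O$ can be held in a small Python data structure. This is a sketch under assumptions the text does not state: the glossary $L$ is a dict from term $w_i$ to rating $x_i$, $F_l$ is a dict from a concept to a set of terms, $R_h$ is a set of (parent, child) pairs, and $F_C$ is left out because the text does not define it. The class `Ontology` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Hypothetical container for O = (L, C, F_l, R_h); F_C is omitted (undefined in the text)."""
    glossary: dict[str, float]                                         # L: term w_i -> rating x_i
    concepts: set[str]                                                 # C
    interpretation: dict[str, set[str]] = field(default_factory=dict)  # F_l: concept -> subset of L
    hierarchy: set[tuple[str, str]] = field(default_factory=set)       # R_h: (parent, child) pairs

    def terms_of(self, concept: str) -> set[str]:
        """Terms that F_l assigns to a concept, restricted to the glossary L."""
        return {t for t in self.interpretation.get(concept, set()) if t in self.glossary}

    def children(self, concept: str) -> set[str]:
        """Direct sub-concepts of a concept under R_h."""
        return {child for parent, child in self.hierarchy if parent == concept}


onto = Ontology(
    glossary={"regression": 0.9, "criterion": 0.7, "ontology": 0.8},
    concepts={"modeling", "gmdh"},
    interpretation={"gmdh": {"regression", "criterion", "unknown"}},
    hierarchy={("modeling", "gmdh")},
)
print(sorted(onto.terms_of("gmdh")))   # ['criterion', 'regression']
print(onto.children("modeling"))       # {'gmdh'}
```

Restricting `terms_of` to the glossary reflects the requirement that $F_l$ maps into terms of $L$; a term such as `"unknown"` that is not in the glossary is ignored.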
The domain ontology describes a field of scientific knowledge defined by a
specific subject. It may include a hierarchy of defined concepts built on the ontology
concepts. All these hierarchies can be linked through associative relationships, some
of which are inherited from the basic technologies, while others reflect the specifics
of the subject area. By introducing formal descriptions of concepts and representing
the problem domain as concepts and the relations between them, the ontology should
define a structure for representing real-world objects and their relationships, which
makes up the knowledge base.
Thus, the data will be presented as a set of information objects of different
types and the relationships between them. By an information object we mean data
representing a set of text information from a specific area that is relevant to some
concept of the ontology. To determine the appropriate ontology, the text information
is analyzed using the generalized iterative algorithm of inductive modeling.
2. Technology for collecting and sorting information
The technology for collecting and sorting information includes the following steps:
1. Formalization of the input data (extracting data from external sources,
transforming it and loading it into the repository);
2. Data analysis/mining (using the GIA of inductive modeling);
3. Assignment of information to a specific ontology (ontology instance);
4. Creation of a new ontology (instances) based on the analyzed information.
Step 1: Formalization of the input data
Data is a representation of facts and ideas in a formalized form suitable for
transmission and processing in some information process [6].
At this step, each document is represented as a set of terms, and the set of
documents is divided into subsets of documents on similar topics (clusters); as a
result, the terms are grouped by subject. This makes it possible to establish
relationships between terms and concepts. Each term is characterized by its frequency
of occurrence (weight).
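A minimal sketch of this document representation, assuming a naive lowercase word tokenizer and taking a term's weight to be its relative frequency in the document (the text only says "frequency of occurrence"). `term_weights` is a hypothetical helper, not part of the described system.

```python
import re
from collections import Counter


def term_weights(document: str) -> dict[str, float]:
    """Map each term of a document to its relative frequency (assumed weighting)."""
    terms = re.findall(r"[a-z]+", document.lower())   # naive tokenizer: runs of a-z
    counts = Counter(terms)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}  # empty document -> {}


w = term_weights("Model selection: the model with the best criterion.")
print(w["model"])   # 0.25
```

Raw counts would serve equally well as weights; relative frequencies simply make documents of different lengths comparable.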
The problem is solved using inductive modeling algorithms. To apply these
algorithms, the inputs must be strictly formalized and reduced to tabular form. For
this, the data need to be extracted from external sources, transformed and loaded into
the repository. Extraction is the copying of data from operational systems, documents
and other sources while ensuring data integrity and uniqueness. Transformation
involves converting the data to a common format, removing errors and binding the data
to dimensions. The transformed data are transferred to storage at the loading stage.
The data integrated into the system can be used both for building direct reports and
for further analysis with data mining algorithms.
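The extract, transform and load sequence described above can be sketched as follows. The assumptions are ours, not the paper's: sources are plain strings, "uniqueness" means dropping exact duplicates, the "common format" is lowercase word tokens, and the required tabular form is a document-term count matrix. The functions `extract`, `transform` and `load` are hypothetical names.

```python
import re
from collections import Counter


def extract(sources: list[str]) -> list[str]:
    """Extract: copy source texts, skipping empty ones and exact duplicates."""
    seen: set[str] = set()
    docs = []
    for text in sources:
        key = text.strip()
        if key and key not in seen:
            seen.add(key)
            docs.append(key)
    return docs


def transform(doc: str) -> Counter:
    """Transform: bring text to a common form (lowercase word tokens) and count terms."""
    return Counter(re.findall(r"[a-z]+", doc.lower()))


def load(counts: list[Counter]) -> tuple[list[str], list[list[int]]]:
    """Load: build a strictly tabular document-term matrix with fixed columns."""
    vocabulary = sorted(set().union(*counts))
    return vocabulary, [[c[t] for t in vocabulary] for c in counts]


docs = extract(["GMDH model", "gmdh  model", "GMDH model", "  "])
columns, table = load([transform(d) for d in docs])
print(columns)  # ['gmdh', 'model']
print(table)    # [[1, 1], [1, 1]]
```

Deduplication happens on the raw text, before transformation, so two sources that differ only in case or spacing remain separate rows; the fixed column order is what makes the result usable as the input table of an inductive modeling algorithm.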
Next, we analyze the characteristics of the input data in inductive modeling tasks
with respect to various parameters.
In [7], set theory was used to formalize the presentation of data at each stage of
model construction with GMDH algorithms. We have the following components (built
using the method of structural identification analysis [8]):
$W = (X, Y)$ – the data set (a sequence of $N$ values of a random variable $Y$
characterized by $M$ features $X$), $W = \{w_j\},\ j = \overline{1, J},\ J = N \cdot M,\ n = \overline{1, N},\ m = \overline{1, M}$;
$NW$ – the set of normalized data, $NW = \{nw_j\},\ j = \overline{1, J}$;
$F$ – the set of model classes, $F = \{f_k\},\ k = \overline{1, K}$;
$G$ – the set of model structure generators, $G = \{g_l\},\ l = \overline{1, L}$;
$P$ – the set of methods for estimating the parameters of structures, $P = \{p_r\},\ r = \overline{1, R}$;
$CR$ – the set of model criteria, $CR = \{cr_q\},\ q = \overline{1, Q}$;
$V$ – the set of model classifications, $V = \{v_t\},\ t = \overline{1, T}$.
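The process of constructing the set of all possible models can then be represented
as the direct product of the component sets, $Z = W \times NW \times F \times G \times P \times CR \times V$.
The elements of $Z$, written as $z_i = \{w_j, nw_j, f_k, g_l, p_r, cr_q, v_t\}$,
$j = \overline{1, J}$, $k = \overline{1, K}$, $l = \overline{1, L}$, $r = \overline{1, R}$, $q = \overline{1, Q}$, $t = \overline{1, T}$, $i = \overline{1, I}$, $I = J \cdot K \cdot L \cdot R \cdot Q \cdot T$,
are considered as the specific data stored in the environment during a particular
pass of the full simulation cycle.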
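Because each element $z_i$ carries a data element and its normalized counterpart under the same index $j$, the sketch below pairs each data set with its normalized version and crosses that pair with the remaining component sets, which yields exactly $I = J \cdot K \cdot L \cdot R \cdot Q \cdot T$ elements. The function name `simulation_space` and all component values are made-up labels for illustration, not taken from the paper.

```python
from itertools import product


def simulation_space(W, NW, F, G, P, CR, V):
    """List all z_i = (w_j, nw_j, f_k, g_l, p_r, cr_q, v_t).

    w_j and its normalized version nw_j share the index j, so they are paired
    (zip) rather than crossed; the result has I = J*K*L*R*Q*T elements.
    """
    if len(W) != len(NW):
        raise ValueError("W and NW must both have J elements")
    return [(w, nw, f, g, p, cr, v)
            for (w, nw), f, g, p, cr, v in product(zip(W, NW), F, G, P, CR, V)]


Z = simulation_space(
    W=["w1", "w2"], NW=["nw1", "nw2"],      # J = 2
    F=["linear", "quadratic"],              # K = 2
    G=["GIA"],                              # L = 1
    P=["least_squares"],                    # R = 1
    CR=["regularity", "unbiasedness"],      # Q = 2
    V=["v1"],                               # T = 1
)
print(len(Z))   # 8
print(Z[0])     # ('w1', 'nw1', 'linear', 'GIA', 'least_squares', 'regularity', 'v1')
```

Crossing $W$ and $NW$ as independent sets would instead give $J^2 \cdot K \cdot L \cdot R \cdot Q \cdot T$ elements, most of which would combine a data set with the normalization of a different one.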
Step 2: Data analysis
Documents are clustered on the basis of the generalized iterative algorithm of
inductive modeling (GIA).
Let us briefly consider the iterative structure of the algorithm used to solve the
general problem of finding the best model in the following formulation:
$$f^* = \arg\min_{f \in \Phi} CR\left(y, f(X, \hat{\theta}_f)\right), \tag{1}$$
where $\hat{\theta}_f$ is the parameter estimate for a partial model $f \in \Phi$ and $CR$ is the
model quality criterion used to select the optimal model.
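Problem (1) can be illustrated with a small selection routine. This is a sketch under stated assumptions: $\hat{\theta}_f$ is a least-squares estimate on a training subsample, and $CR$ is taken to be the mean squared error on a separate checking subsample (a regularity-type criterion; the text does not fix the criterion here). The candidate set $\Phi$ is three hand-written linear structures, and `regularity` and `select_best` are hypothetical names.

```python
import numpy as np


def regularity(y_check, y_pred):
    """Mean squared error on the checking subsample (assumed form of CR)."""
    return float(np.mean((y_check - y_pred) ** 2))


def select_best(candidates, X, y, n_train):
    """f* = argmin over f in Phi of CR(y, f(X, theta_hat_f)).

    `candidates` maps a model name to a function that builds its regressor
    matrix from X; theta_hat_f is the least-squares estimate on the first
    n_train rows and CR is computed on the remaining rows.
    """
    scores = {}
    for name, regressors in candidates.items():
        A = regressors(X)
        theta, *_ = np.linalg.lstsq(A[:n_train], y[:n_train], rcond=None)
        scores[name] = regularity(y[n_train:], A[n_train:] @ theta)
    return min(scores, key=scores.get), scores


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = 1.0 + 2.0 * X[:, 0] + 0.01 * rng.normal(size=40)    # y depends on x1 only
Phi = {
    "const": lambda X: np.ones((len(X), 1)),
    "x1": lambda X: np.column_stack([np.ones(len(X)), X[:, 0]]),
    "x2": lambda X: np.column_stack([np.ones(len(X)), X[:, 1]]),
}
best, scores = select_best(Phi, X, y, n_train=30)
print(best)   # x1
```

Scoring on rows not used for estimation is what lets the criterion penalize structures that fit the training rows but do not generalize.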
The set $\Phi$ of compared models can be formed by various generators of model
structures of different complexity. All structure generators developed within the
GMDH framework naturally fall into two main groups, sorting-out and iterative ones,
which differ in how variants are generated and how the search for the minimum of a
given criterion is organized. The simulation uses the generalized iterative algorithm
GIA GMDH (Fig. 1) [9].
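Formally, in the general case, the GIA GMDH for layer $r$ is defined as follows:
1) the input matrix is $X_{r+1} = (y_1^r, \ldots, y_F^r, x_1, \ldots, x_m)$;
2) the following operators are applied:
$$y_l^{r+1} = f(y_i^r, y_j^r),\quad l = 1, 2, \ldots, C_F^2,\quad i, j = \overline{1, F}, \tag{2}$$
and
$$y_l^{r+1} = f(y_i^r, x_j),\quad l = 1, 2, \ldots, Fm,\quad i = \overline{1, F},\quad j = \overline{1, m}, \tag{3}$$
with the linear, bilinear or quadratic partial descriptions
$$\begin{aligned}
z &= f(u, v) = a_0 + a_1 u + a_2 v;\\
z &= f(u, v) = a_0 + a_1 u + a_2 v + a_3 uv;\\
z &= f(u, v) = a_0 + a_1 u + a_2 v + a_3 uv + a_4 u^2 + a_5 v^2;
\end{aligned} \tag{4}$$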
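3) for each partial description, the optimal structure is found (shown here for the
linear partial description):
$$f(u, v) = d_1 a_0 + d_2 a_1 u + d_3 a_2 v, \tag{5}$$
where $d_k \in \{0, 1\}$, $k = 1, 2, 3$, are the structural elements of the binary vector
$d = (d_1\, d_2\, d_3)$, taking the value 1 or 0 (the corresponding argument is included or not). The
best model is then $f(u, v, d_{opt})$, where
$$d_{opt} = \arg\min_{l = \overline{1, q}} CR_l,\quad q = 2^p - 1,\qquad f_{opt}(u, v) = f(u, v, d_{opt}), \tag{6}$$
and $p$ is the number of terms in the partial description ($p = 3$ for (5), so $q = 7$);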
4) the algorithm checks the condition $CR_r > CR_{r-1}$, where $CR_r$ and $CR_{r-1}$ are the
criterion values of the best models of the $r$-th and $(r-1)$-th layers, respectively.
If the condition holds, the algorithm stops; otherwise it proceeds to the next layer.
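Steps 1) to 4) can be put together in a simplified sketch. It is not the author's software and omits the DM, IC and MR options; the assumptions are: parameters are estimated by least squares on the first `n_train` rows, $CR$ is the mean squared error on the remaining rows, the $F$ best outputs of each layer are passed on, the structure search of (5)-(6) is applied to every partial description by enumerating all non-empty binary vectors $d$, and operator (3) is used only from layer 2 onward because layer-1 inputs are the $x_j$ themselves. The names `design`, `optimal_structure` and `gia` are hypothetical, and at least two input columns are required.

```python
from itertools import combinations, product

import numpy as np


def design(u, v, kind):
    """Regressor matrix of a partial description from (4)."""
    cols = [np.ones_like(u), u, v]                 # a0 + a1*u + a2*v
    if kind in ("bilinear", "quadratic"):
        cols.append(u * v)                         # + a3*u*v
    if kind == "quadratic":
        cols += [u ** 2, v ** 2]                   # + a4*u^2 + a5*v^2
    return np.column_stack(cols)


def optimal_structure(A, z, n):
    """Step 3 / (6): try all q = 2**p - 1 non-empty binary vectors d over the p
    columns of A; fit on rows [:n] by least squares, score CR on rows [n:]."""
    best = None
    for d in product((False, True), repeat=A.shape[1]):
        if not any(d):
            continue                               # d = (0, ..., 0) is excluded
        sub = A[:, np.array(d)]
        a, *_ = np.linalg.lstsq(sub[:n], z[:n], rcond=None)
        pred = sub @ a
        cr = float(np.mean((z[n:] - pred[n:]) ** 2))
        if best is None or cr < best[0]:
            best = (cr, pred)
    return best


def gia(X, y, n_train, F=3, kind="bilinear", max_layers=8):
    """Simplified layer loop; returns the best criterion value of each accepted layer."""
    m = X.shape[1]                                 # needs m >= 2
    outputs = [X[:, j] for j in range(m)]          # layer-1 inputs are the x_j
    history = []
    for r in range(1, max_layers + 1):
        pairs = list(combinations(outputs, 2))     # operator (2): f(y_i, y_j)
        if r > 1:                                  # operator (3): f(y_i, x_j)
            pairs += [(yi, X[:, j]) for yi in outputs for j in range(m)]
        cands = sorted((optimal_structure(design(u, v, kind), y, n_train)
                        for u, v in pairs), key=lambda c: c[0])
        if history and cands[0][0] > history[-1]:  # step 4: stop if CR_r > CR_{r-1}
            break
        history.append(cands[0][0])
        outputs = [pred for _, pred in cands[:F]]  # y_1^r..y_F^r feed layer r + 1
    return history


rng = np.random.default_rng(0)
X = rng.normal(size=(80, 3))
y = X[:, 0] * X[:, 1] + 0.5 * X[:, 2] + 0.05 * rng.normal(size=80)
history = gia(X, y, n_train=60)
print([round(c, 4) for c in history])             # non-increasing by construction
```

The loop keeps a layer only while its best criterion does not exceed that of the previous layer, so the returned sequence is non-increasing and its last value belongs to the model the algorithm would report.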
Fig. 1. Scheme of the generalized iterative algorithm
The GIA GMDH is defined as a family of iterative and iterative-combinatorial
algorithms described by a vector of three elements, DM (Dialogue Mode), IC
(Iterative-Combinatorial) and MR (Multilayered-Relaxative); that is, any iterative
algorithm is defined as a special case of the generalized one, GIA(DM, IC, MR). This is
made possible by a specialized software package for modeling based on iterative GMDH
algorithms, which implements the following features: automatic and interactive modes
of the user interface, control through a web interface, and multi-user access. The
system presents the constructed best models for graphical and semantic analysis,
determines the effect of the arguments on the target factor, and analyzes and selects
the most informative arguments [10].
Step 3: Assignment of information to a specific ontology (ontology instance)
When the GIA has finished, an "ontology model" is obtained. At this step, the text
information is analyzed with the models obtained for each ontology (ontology
instance) and sorted accordingly. Each model has its own thresholds (minimum and
maximum values) based on the simulation error. Thus, this phase determines not only
the set of knowledge areas to which a text belongs, but also the degree to which the
document conforms to the relevant sections, which provides grounds to stop or
continue the analysis.
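The text does not specify how a document is compared with a model's thresholds, so the following is only a sketch under explicit assumptions: each ontology model produces an error value for the document, the document is accepted by that ontology when the error lies within the model's [minimum, maximum] band, and the degree of conformity falls linearly from 1 at the minimum to 0 at the maximum. `OntologyModel` and `assign` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class OntologyModel:
    """Hypothetical: name of an ontology instance and its accepted error band."""
    name: str
    err_min: float
    err_max: float


def assign(doc_errors: dict[str, float], models: list[OntologyModel]) -> list[tuple[str, float]]:
    """Ontologies whose error band contains the document's error, best match first.

    Conformity is an assumed score: 1.0 at err_min falling linearly to 0.0 at
    err_max. An empty list means the document is left for Step 4.
    """
    matches = []
    for m in models:
        e = doc_errors[m.name]
        if m.err_min <= e <= m.err_max:
            span = m.err_max - m.err_min
            conformity = 1.0 if span == 0 else (m.err_max - e) / span
            matches.append((m.name, round(conformity, 3)))
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


models = [OntologyModel("gmdh", 0.0, 0.2), OntologyModel("ontology", 0.1, 0.5)]
print(assign({"gmdh": 0.05, "ontology": 0.3}, models))  # [('gmdh', 0.75), ('ontology', 0.5)]
print(assign({"gmdh": 0.9, "ontology": 0.8}, models))   # []
```

Returning every matching ontology with its conformity, rather than a single label, mirrors the text's point that a document may belong to several knowledge areas to different degrees.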
Step 4: Creation of a new ontology
At this stage, new ontology instances that are not yet in the current knowledge
base can be created. After the input data are formalized and analyzed, some
documents may remain that were not assigned to any category. Such documents are
stored in a special data warehouse and analyzed at regular intervals, and a glossary
of terms is compiled from them. The glossary stores semantic information that links
its elements, thereby identifying a new class of problems and a new domain.
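One possible reading of this step, sketched under assumptions not stated in the text: a term enters the new glossary if it occurs in at least `min_count` stored documents, and the "semantic information" linking glossary elements is approximated by counting how often two glossary terms occur in the same document. `build_glossary` and the sample documents are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def build_glossary(unassigned: list[str], min_count: int = 2):
    """Compile a glossary from documents that no ontology accepted, plus term links.

    A term is kept if it occurs in at least min_count documents (assumed rule);
    two glossary terms are linked once for every document containing both.
    """
    doc_terms = [set(re.findall(r"[a-z]+", d.lower())) for d in unassigned]
    df = Counter(t for terms in doc_terms for t in terms)      # document frequency
    glossary = {t for t, c in df.items() if c >= min_count}
    links = Counter()
    for terms in doc_terms:
        for a, b in combinations(sorted(terms & glossary), 2):
            links[(a, b)] += 1
    return glossary, links


warehouse = ["neural network training", "network training data", "soil erosion map"]
glossary, links = build_glossary(warehouse)
print(sorted(glossary))   # ['network', 'training']
print(dict(links))        # {('network', 'training'): 2}
```

Running this periodically over the warehouse, as the text suggests, lets a cluster of recurring, mutually linked terms emerge as a candidate new domain.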
3. Conclusion
The article describes an approach to automating the ontological analysis of text
information using the generalized iterative algorithm (GIA) of inductive modeling. The
proposed technology for collecting and sorting information includes four phases:
input data formalization (extracting data from external sources, transforming it and
loading it into the repository); data analysis (using the GIA of inductive modeling);
assignment of information to a specific ontology (ontology instance); and creation of
new ontologies (instances) based on the analyzed information.
References
1. Baeza-Yates R., Ribeiro-Neto B. Modern Information Retrieval. ACM Press, 1999.
2. Gruber T.R. A translation approach to portable ontology specifications // Knowledge
Acquisition. 1993. Vol. 5, No. 2.
3. Степашко В.С. Теоретические аспекты МГУА как метода индуктивного
моделирования [Theoretical aspects of GMDH as a method of inductive modeling] //
УСиМ. 2003. No. 2.
4. Bulgakova O., Kordik P. Methods of true data mining model selection – with
experimental results // Proceedings of the 3rd International Workshop on Inductive
Modelling IWIM-2009, 14-19 September 2009, Krynica, Poland. Prague: Czech
Technical University, 2009. P. 23-27.
5. Zakharova I.V., Melnikov A.V., Vokhmitsev J.A. An approach to automated ontology
building in text analysis problems // Workshop on Computer Science and Information
Technologies CSIT'2006, Karlsruhe, Germany, 2006. P. 177-178.
6. http://wikipedia.org/
7. Щербакова Н.В. Формалізація структур зберігання інформації в задачах
індуктивного моделювання [Formalization of information storage structures in
inductive modeling problems] // Моделювання та керування станом еколого-економічних
систем регіону [Modeling and control of the state of ecological and economic systems
of a region]. Collection of papers. Kyiv: IRTC ITS, 2009. P. 229-234.
8. Ефименко С.Н., Степашко В.С. Имитационный эксперимент как средство для
исследования эффективности методов моделирования по данным наблюдений
[Simulation experiment as a means of investigating the effectiveness of methods for
modeling from observational data] // УСиМ. 2009. No. 1. P. 69-78.
9. Stepashko V.S., Bulgakova O.S. Generalized iterative algorithm of the group method
of data handling // USiM. 2013. No. 2. P. 5-18.
10. Bulgakova O.S., Zosimiv V.V., Stepashko V.S. Program complex modeling of complex
systems based on iterative algorithms with the ability of GMDH network access // 14th
International Conference SAIT 2012, Kyiv, Ukraine. P. 176-178.