A method for extracting data from semistructured documents

The paper develops, tests, and describes in detail a linguistic method for extracting data from weakly structured documents. Sample data were taken from the thesis catalogue of the Vernadsky National Library of Ukraine. The sequence of all stages is described: document collection...

Bibliographic details
Date:	2020
Main authors:	Kudim, K.A., Proskudina, G.Yu.
Format:	Article
Language:	rus
Published:	Інститут програмних систем НАН України 2020
Keywords:	weakly structured documents, information extraction, linguistic analyzer, syntactic analyzer, morphological analysis, context-free grammar, UDC 004.82
Online access:	https://pp.isofts.kiev.ua/index.php/ojs1/article/view/388
Journal title:	Problems in programming

Institution

Problems in programming

Similar items

Methods and tools for extracting personal data from theses abstracts
by: Kudim, K.A., et al.
Published: (2019)

Extracting structure from text documents based on machine learning
by: Kudim, K.A., et al.
Published: (2023)

About technologies of use of external data on creating and editing of encyclopedic texts
by: Proskudina, G.Yu., et al.
Published: (2018)

Mixed topic-entity ontology for enhanced topic vector-spaced model
by: Shabinskiy, A.S.
Published: (2025)

Overview of global open access resource aggregation services and their requirements for data providers
by: Proskudina, G.Yu., et al.
Published: (2025)

Global open access resource aggregation services and their requirements for data providers
by: Proskudina, G.Yu., et al.
Published: (2024)

Decompositional Extraction and Retrieval of Conceptual Knowledge
by: Terletskyi, D.O., et al.
Published: (2023)

Use of domain ontology for homonymy clarification into the natural language texts
by: Lesko, O.N., et al.
Published: (2018)

Review of methods of events extraction «from the stream of news»
by: Pryshchepa, S. V.
Published: (2015)

A method of tuning programs on .Net platform with rewriting rules
by: Mamedov, T.A., et al.
Published: (2019)

The technology of new events extraction on a defined topic from Twitter social network
by: Pryshchepa, S. V.
Published: (2017)

Extracting structure from text documents based on machine learning
by: Kudim, K.A., et al.
Published: (2022)

CREATING THE RT-32 RADIO TELESCOPE ON THE BASIC OF MARK-4B ANTENNA SYSTEM. 2. ESTIMATION OF THE POSSIBILITY FOR MAKING SPECTRAL OBSERVATIONS OF RADIO ASTRONOMICAL OBJECTS
by: Antyufeyev, A. V., et al.
Published: (2019)

INTERSTELLAR MEDIUM AND DECAMETER RADIO SPECTROSCOPY
by: Stepkin, S. V., et al.
Published: (2021)

Automated extraction of structured information from a variety of web pages
by: Pogorilyy, S.D., et al.
Published: (2018)

Ontological similar systems for analysis of texts of natural language
by: Kryvyi, S.L., et al.
Published: (2018)

The definition of formal languages in the meta language of normal forms of knowledge
by: Kurgaev, A.F., et al.
Published: (2018)

The main functional blocks of the test bench for the archival electronic documents validation
by: Melaschenko, A.O., et al.
Published: (2018)

PROSPECTS TO THERMAL WATERS EXTRACTION AT ILLICHIVSK OF ODESA REGION
by: DIDKIVSKA, G.G., et al.
Published: (2013)

UWN: The ontological base of knowledge of the Ukrainian language
by: Anisimov, A.V., et al.
Published: (2015)

Analysis of formal models and standards for structured electronic document in corporate informational system
by: Sharypanov, A.V., et al.
Published: (2018)

A method for extracting data from semistructured documents
by: K. A. Kudim, et al.
Published: (2020)

Actual problems of long-term preservation of documentation in insurance fund of documentation of Ukraine
by: Podorozhnyi, V. I.
Published: (2016)

DIRECTIVITY OF ANTENNA ARRAYS
by: Bulgakova, A. A., et al.
Published: (2016)

On equivalence of some subcategories of modules in Morita contexts
by: Kashu, A. I.
Published: (2018)

Estimation Method for Compatibility of Normative Documents
by: Mezentsev, O. V.
Published: (2014)

Anti-proliferative effects of a blueberry extract on a panel of tumor cell lines of different origin
by: Lamdan, H., et al.
Published: (2023)

The implementation of legal electronic documents
by: Melaschenko, A.O., et al.
Published: (2015)

Performance analysis of a new LP stage located upstream the extraction point in a 225 MW turbine
by: Шиманяк, М., et al.
Published: (2016)

Methods and software for significant indicators determination of the natural language texts author profile
by: Shynkarenko, V.I., et al.
Published: (2023)

Some issues of registration and reproduction of information touching upon objects of material and spiritual culture using technologies of the state insurance documentation fund of Ukraine.
by: Babenko, V. V., et al.
Published: (2019)

Морфологiчнi характеристики трифторидiв залiза рiзного ступеня гiдратацiї, отриманих гiдротермальним методом
by: Moklyak, V. V., et al.
Published: (2019)

An approach of intelligent searching of information in texts
by: Chebanuyk, O.V.
Published: (2023)

Method of information obtaining from ontology on the basis of a natural language phrase analysis
by: Litvin, A.A., et al.
Published: (2020)

Scientific documents metadata as a component of the system of the “open science” information resources
by: Zakharova, O.V.
Published: (2023)

Antiproliferative and apoptotic effect of ethanolic extract of Calocybe indica on PANC-1 and MIAPaCa2 cell lines of pancreatic cancer
by: Ghosh, S.K., et al.
Published: (2023)

Fuzzy system for determining the quality of digital images of documents to be microfilmed
by: Egorov, P. N.
Published: (2016)

Methods and tools for extracting personal data from theses abstracts
by: K. A. Kudim, et al.
Published: (2019)

Metastatic cardiac tumors: literature review and own observation of testicular tumor metastasis in the right ventricle of the heart
by: Zakhartseva, L.M., et al.
Published: (2018)