Extracting structure from text documents based on machine learning

This study presents a method that facilitates extracting structure from text documents using an artificial neural network. The method consists of data preparation, building and training the model, and evaluating the results. Data preparation includes collecting corpora of documents...

Bibliographic Details
Date:	2023
Main Authors:	Kudim, K.A., Proskudina, G.Yu.
Format:	Article
Language:	English
Published:	Інститут програмних систем НАН України 2023
Subjects:	natural language processing; information extraction; machine learning; neural network; UDC 004.82
Online Access:	https://pp.isofts.kiev.ua/index.php/ojs1/article/view/517
Journal Title:	Problems in programming

Institution

Problems in programming

Description
Summary:	This study presents a method that facilitates extracting structure from text documents using an artificial neural network. The method consists of data preparation, building and training the model, and evaluating the results. Data preparation includes collecting corpora of documents, converting a variety of file formats into plain text, and manually labeling the structure of each document. The documents are then split into tokens and paragraphs. The text paragraphs are represented as feature vectors that serve as input to the neural network. The model is trained and validated on selected data subsets. An evaluation of the trained model is presented. The final performance is calculated per label using precision, recall, and F1 measures, together with an overall average. The trained model can be used to extract sections from documents with a similar structure.
Problems in programming 2022; 3-4: 154-160
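The per-label evaluation mentioned in the summary can be illustrated with a minimal sketch. It is not the authors' code: the label names and sample data are hypothetical, and the "overall average" is assumed here to be an unweighted (macro) average over labels, which the record does not specify.

```python
from collections import Counter


def per_label_scores(gold, predicted):
    """Return {label: (precision, recall, f1)} for two equal-length label lists."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # label p was predicted but is wrong
            fn[g] += 1  # gold label g was missed
    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        predicted_count = tp[label] + fp[label]
        gold_count = tp[label] + fn[label]
        # Define a score as 0.0 when its denominator is zero.
        precision = tp[label] / predicted_count if predicted_count else 0.0
        recall = tp[label] / gold_count if gold_count else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of precision, recall, and F1 over all labels (assumed averaging)."""
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


# Hypothetical paragraph labels, for illustration only.
gold = ["title", "abstract", "body", "body", "references"]
predicted = ["title", "body", "body", "body", "references"]

scores = per_label_scores(gold, predicted)
for label, (p, r, f1) in scores.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print("macro average:", macro_average(scores))
```

Each paragraph counts once toward its gold label and once toward its predicted label, so a single misclassified paragraph lowers recall for one label and precision for another.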
