Semantic Indexing and Cluster Analysis of Cybersecurity Documents


Bibliographic details
Date: 2024
Main authors: Ланде, Д. В., Рибак, О. О.
Format: Article
Language: Ukrainian
Published: Інститут проблем реєстрації інформації НАН України 2024
Online access: http://drsp.ipri.kiev.ua/article/view/316711
Journal title: Data Recording, Storage & Processing

Institution

Data Recording, Storage & Processing
Description
Summary: This study examines methods for extracting concepts from textual messages and constructing semantic networks for text data analysis, specifically in the context of cyberthreats. Semantic networks are essential tools for identifying key concepts and the relationships between them, and they help uncover critical data such as hacker group names, malicious programs, vulnerabilities, and other threats. This approach can be applied in cybersecurity, where textual information can contain data vital for preventing and responding to cyberthreats. The focus is on large language models (LLMs), which enable automated extraction of entities and construction of concept networks. Using LLMs to extract information from text data helps create networks of relationships that can be used to analyze causal links between events and objects, detect interdependencies, and structure information. These networks can then be used for cluster analysis, which automatically groups nodes by similarity and identifies new patterns in the data. The research also addresses the construction of document proximity networks, which assess the degree of similarity between texts based on their semantic structures. This makes it possible to identify thematically related documents that may contain significant information for analysis, and to detect informational chains and key trends within large textual datasets. The methods described in the article make it possible to structure and analyze large volumes of textual information in cybersecurity effectively, facilitating quicker threat detection and the formulation of prevention strategies. The approach also streamlines many stages of analytical work, thereby enhancing the efficiency of big data analysis. Fig.: 3. Refs: 11 titles.
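
The summary does not specify how the concept and document proximity networks are computed, so the following is only a minimal sketch under stated assumptions. The concept sets in `doc_concepts` are hypothetical stand-ins for entities an LLM might extract, Jaccard similarity with a threshold of 0.3 is an assumed proximity measure, and connected components serve as a simple stand-in for cluster analysis. None of these choices is taken from the article.

```python
from collections import Counter
from itertools import combinations

# Hypothetical concept sets standing in for LLM-extracted entities.
# Document names and concepts are illustrative, not data from the article.
doc_concepts = {
    "doc1": {"ransomware", "phishing", "vulnerability"},
    "doc2": {"ransomware", "phishing", "botnet"},
    "doc3": {"ddos", "botnet"},
    "doc4": {"supply chain", "backdoor"},
}


def concept_network(docs):
    """Weight each concept pair by the number of documents containing both."""
    edges = Counter()
    for concepts in docs.values():
        for a, b in combinations(sorted(concepts), 2):
            edges[(a, b)] += 1
    return edges


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed proximity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def proximity_network(docs, threshold=0.3):
    """Link documents whose concept sets are at least `threshold` similar."""
    edges = {}
    for d1, d2 in combinations(sorted(docs), 2):
        score = jaccard(docs[d1], docs[d2])
        if score >= threshold:
            edges[(d1, d2)] = score
    return edges


def clusters(nodes, edges):
    """Group nodes into connected components of the given edge set."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, groups = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(adjacency[current] - component)
        seen |= component
        groups.append(sorted(component))
    return groups


concept_edges = concept_network(doc_concepts)
doc_edges = proximity_network(doc_concepts)
doc_clusters = clusters(doc_concepts, doc_edges)
```

With this toy data, `doc1` and `doc2` share two of four distinct concepts and are linked, while `doc3` and `doc4` remain separate clusters. A real pipeline would likely replace the threshold and the component grouping with a weighted community detection method.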