APPROACHES TO PARSING NON-STANDARDLY ORGANIZED GEOMETRIC DATA
Abstract
This article investigates the structural gap between non-standardly organized geometric datasets and the input requirements of software tools for statistical analysis. It reviews scientific sources on the topic of the study and on adjacent research areas. The methodology and approaches described in the paper are implemented using the Canonical Polyhedra dataset, a Wolfram Mathematica dataset that contains metric and topological characteristics of 2,907 polyhedra. The defining feature of this dataset is its structure: spatial objects are represented not as traditional tables but as abstract syntax trees. The article analyzes the dataset architecture in detail, identifying its principal data nodes and the symbolic mathematical expressions that may complicate parsing. Some characteristics of the geometric figures are stored symbolically rather than numerically. This guarantees exact mathematical precision, but it also makes the data unsuitable for automated computation without prior transformation. On this basis, the article substantiates the limitations of analyzing such a dataset with standard data-processing libraries. The core of the methodology is a recursive parsing algorithm, developed and demonstrated on this specific dataset but described in a general form that makes it applicable to a broader range of related tasks. The paper outlines how the algorithm traverses syntax-tree nodes, identifies their heads, and normalizes data types. The result is the transformation of an abstract syntax tree into a normalized table suitable for further statistical analysis.
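The parsing approach summarized above can be illustrated with a minimal Python sketch. It assumes that a record has been exported as Mathematica FullForm text (for example, `Association[Rule["Name", "Cube"], ...]`). The actual export format and field names of the Canonical Polyhedra dataset may differ, and the functions `tokenize`, `parse`, `normalize`, and `flatten`, the `HANDLERS` table, and the sample record are all hypothetical. The sketch tokenizes the text, recursively builds a tree of heads and arguments, dispatches on each head to turn symbolic values (here only `Rational`, `Power`, `Times`, `Plus`, and `Sqrt`) into floating-point numbers, and flattens nested associations into a single table row.

```python
import math
import re
from dataclasses import dataclass

# Hypothetical in-memory form of a FullForm expression: Head[arg1, arg2, ...]
@dataclass
class Node:
    head: str
    args: list

@dataclass
class Symbol:
    name: str

# Strings, integers/reals, symbols, and the delimiters [ ] ,
# (precision marks such as 1.5`16 and *^ exponents are not handled in this sketch)
TOKEN_RE = re.compile(r'"(?:[^"\\]|\\.)*"|-?\d+(?:\.\d*)?|[A-Za-z$][A-Za-z0-9$]*|[\[\],]')

def tokenize(text):
    """Split FullForm text into tokens, skipping whitespace."""
    pos, tokens = 0, []
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(m.group())
        pos = m.end()
    return tokens

def parse(tokens, i=0):
    """Recursively parse the expression starting at tokens[i]; return (expr, next_index)."""
    tok = tokens[i]
    if tok.startswith('"'):
        return tok[1:-1], i + 1
    if tok[0].isdigit() or tok[0] == "-":
        return (float(tok) if "." in tok else int(tok)), i + 1
    if i + 1 < len(tokens) and tokens[i + 1] == "[":
        args, i = [], i + 2
        while tokens[i] != "]":
            arg, i = parse(tokens, i)
            args.append(arg)
            if tokens[i] == ",":
                i += 1
        return Node(tok, args), i + 1
    return Symbol(tok), i + 1

# Dispatch table: head -> function applied to the already-normalized arguments.
# Power with a negative base and fractional exponent would yield a complex number.
HANDLERS = {
    "List": list,
    "Rule": tuple,
    "Association": dict,
    "Rational": lambda a: a[0] / a[1],
    "Power": lambda a: a[0] ** a[1],
    "Times": math.prod,
    "Plus": sum,
    "Sqrt": lambda a: math.sqrt(a[0]),
}
SYMBOLS = {"True": True, "False": False, "Pi": math.pi}

def normalize(expr):
    """Post-order traversal: normalize children first, then apply the node's head."""
    if isinstance(expr, Node):
        if expr.head not in HANDLERS:
            raise ValueError(f"unsupported head: {expr.head}")
        return HANDLERS[expr.head]([normalize(a) for a in expr.args])
    if isinstance(expr, Symbol):
        return SYMBOLS.get(expr.name, expr.name)
    return expr  # strings and numbers are already plain values

def flatten(record, prefix=""):
    """Flatten nested dicts into a single table row with dotted column names."""
    row = {}
    for key, value in record.items():
        if isinstance(value, dict):
            row.update(flatten(value, f"{prefix}{key}."))
        else:
            row[f"{prefix}{key}"] = value
    return row

# Hypothetical record; the circumradius of a unit-edge cube is Sqrt[3]/2.
record = ('Association[Rule["Name", "Cube"], Rule["Faces", 6], '
          'Rule["Circumradius", Times[Rational[1, 2], Power[3, Rational[1, 2]]]], '
          'Rule["Properties", Association[Rule["Convex", True]]]]')
row = flatten(normalize(parse(tokenize(record))[0]))
print(row)
```

Converting exact expressions to floats trades the dataset's symbolic precision for compatibility with numeric tools; a variant could instead pass the parsed tree to a computer algebra library. Heads missing from `HANDLERS` raise an error rather than being silently dropped, which keeps gaps in the head-to-type mapping visible.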
The practical significance of the study lies in identifying universal approaches to parsing complex nested data hierarchies. Such hierarchies often hold valuable information about spatial geometric figures for further research and for applications in computational geometry, machine learning, and related disciplines.
Keywords: dataset, statistical analysis, parsing, computer mathematics system, polyhedron.




