COMPOSITIONAL GEOMETRIC METHOD OF INFORMATION ANALYSIS AND ITS APPLICATION WHEN WORKING WITH BIG DATA
Abstract
The article proposes a compositional geometric method for analysing information in Big Data sets at the stage of their primary processing and “cleaning”. The method is based on the Baluba-Naydysh point calculus (BN-calculus) and serves as a preparatory stage for the structural geometric modelling of Big Data.
Effective analysis of Big Data requires suitable algorithms that sort the database points into clusters (groups) and rank the clusters by the number of points they contain. Within each cluster, the points share the same defining characteristics-coordinates, up to a specified tolerance. Clusters with a large number of points make it possible to determine the course of the process and to identify trends in its development. Clusters with relatively few points can, as a result of the analysis, be excluded from consideration as having no significant effect on how the situation develops. The representation of the objects of any database as points whose coordinates, in their number and nature, fully correspond to the objects' properties and characteristics will be called the compositional geometrization of Data.
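The abstract does not specify which sorting or clustering algorithm is used, so the following is only a minimal sketch under stated assumptions. Each coordinate is binned onto a grid of cell width `tol`, so points in the same cell agree to within `tol` in every coordinate. The clusters are then ranked by size, and clusters with fewer than `min_size` points are dropped. The function names and the parameters `tol` and `min_size` are hypothetical and do not come from the article.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
CellKey = Tuple[int, ...]


def cluster_by_tolerance(points: Sequence[Point], tol: float) -> Dict[CellKey, List[Point]]:
    """Hypothetical helper: group points that fall into the same grid cell of width `tol`.

    Points sharing a cell differ by less than `tol` in each coordinate.
    """
    clusters: Dict[CellKey, List[Point]] = defaultdict(list)
    for p in points:
        # Quantize every coordinate to the index of its grid cell.
        key = tuple(math.floor(c / tol) for c in p)
        clusters[key].append(p)
    return dict(clusters)


def rank_clusters(clusters: Dict[CellKey, List[Point]]) -> List[List[Point]]:
    """Sort clusters by the number of points they contain, largest first."""
    return sorted(clusters.values(), key=len, reverse=True)


def drop_small_clusters(clusters: Dict[CellKey, List[Point]], min_size: int) -> Dict[CellKey, List[Point]]:
    """Exclude clusters with fewer than `min_size` points from further analysis."""
    return {k: v for k, v in clusters.items() if len(v) >= min_size}


if __name__ == "__main__":
    data = [(1.02, 2.01), (1.04, 2.03), (1.01, 2.08), (5.55, 0.23), (9.15, 9.15)]
    clusters = cluster_by_tolerance(data, tol=0.1)
    for group in rank_clusters(clusters):
        print(len(group), group)
    kept = drop_small_clusters(clusters, min_size=2)
    print("clusters kept:", len(kept))
```

Grid binning is a deliberate simplification: it runs in a single pass over the data, but two points closer than `tol` can still land in neighbouring cells if they straddle a cell boundary. A threshold-based linkage clustering avoids that boundary effect at a higher computational cost.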
Data properties can differ completely in nature and content. When a database is geometrized by the methods of compositional geometric modelling, two coordinate systems are used simultaneously. The first is the three-dimensional coordinate system of the object space in which the process takes place, supplemented by a fourth coordinate, time. The second is the n-dimensional coordinate system of the parameter space, in which the properties and characteristics of each database element are parameterized and serve as that element's coordinates.
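The two coordinate systems can be illustrated with a small data structure. The following is a minimal sketch, not the article's procedure. It assumes that each record carries object-space fields named `x`, `y`, `z` and `t`. Numeric properties become parameter-space coordinates directly, and text-valued (categorical) properties are mapped to integer codes in order of first appearance. The names `GeometrizedPoint` and `geometrize`, the field names and the encoding rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple, Union

Value = Union[int, float, str]


@dataclass(frozen=True)
class GeometrizedPoint:
    # Object space: three spatial coordinates plus time as the fourth coordinate.
    x: float
    y: float
    z: float
    t: float
    # Parameter space: one coordinate per property (n-dimensional).
    params: Tuple[float, ...]


def geometrize(records: Sequence[Dict[str, Value]], property_names: Sequence[str]) -> List[GeometrizedPoint]:
    """Hypothetical helper: turn database records into points in both coordinate systems."""
    # Per-property code tables for categorical (text) values.
    codes: Dict[str, Dict[str, int]] = {name: {} for name in property_names}
    points: List[GeometrizedPoint] = []
    for rec in records:
        params: List[float] = []
        for name in property_names:
            value = rec[name]
            if isinstance(value, str):
                # Assumption: categories are coded 0, 1, 2, ... in order of first appearance.
                table = codes[name]
                params.append(float(table.setdefault(value, len(table))))
            else:
                # Numeric properties are used as coordinates unchanged.
                params.append(float(value))
        points.append(GeometrizedPoint(
            float(rec["x"]), float(rec["y"]), float(rec["z"]), float(rec["t"]), tuple(params)
        ))
    return points


if __name__ == "__main__":
    records = [
        {"x": 0, "y": 0, "z": 0, "t": 0, "temp": 21.5, "state": "normal"},
        {"x": 1, "y": 0, "z": 0, "t": 1, "temp": 22.0, "state": "alarm"},
        {"x": 1, "y": 1, "z": 0, "t": 2, "temp": 21.8, "state": "normal"},
    ]
    for p in geometrize(records, ["temp", "state"]):
        print((p.x, p.y, p.z, p.t), p.params)
```

Keeping the object-space coordinates separate from the parameter vector mirrors the distinction drawn above. The parameter-space vectors are then the points on which grouping within a tolerance can be carried out.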
Geometrization of Data greatly simplifies the next stage of work, the development of compositional geometric models. In particular, because it keeps the use of machine resources to a minimum when working with Big Data, it significantly reduces the cost of obtaining valuable conclusions and forecasts.
Keywords: Big Data, cleaning, primary processing, point BN-calculus, compositional method of geometric modelling.