Int. J. of Applied Mathematics, Computational Science and Systems Engineering



Multidimensional Data Indexing in BIG DATA

Author(s): Aridj Mohamed

Abstract: The multidimensional trie hashing (MTH) access method extends trie hashing to dynamic multi-key files (or databases). It maintains d separate tries in main memory, each indexing one attribute. The data file is a d-dimensional array stored in order, linearly, on disk, and a mapping function translates the indexes produced by the tries into physical bucket addresses. On average, a record can be found in one disk access, which places MTH among the most efficient known methods. However, MTH has two drawbacks: low bucket occupancy (40-50%) and a large main-memory footprint relative to the file size, since the tries are kept in memory. We propose a two-level refinement of MTH: first, using the compact trie representations suggested in [23]; second, applying delayed splitting (partial expansion), as introduced in the first dynamic hashing methods and as used in [25]. A performance analysis of the new scheme, carried out mainly by simulation, shows a high load factor (70-80%) with an access time practically equal to one disk access, and a twofold increase in the file size that can be handled with the same memory space used by MTH.
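As a rough illustration of how MTH turns d per-attribute indexes into a single disk address, the sketch below makes two simplifying assumptions that are not taken from the paper. First, each trie is replaced by a hypothetical `AttributeIndex` holding sorted split keys, which partitions one attribute's key space the way a trie's leaves would. Second, the mapping function is plain row-major linearization of a fixed-size array; the actual MTH mapping function is designed for an array that grows dynamically, which row-major order cannot do without relocating buckets.

```python
from bisect import bisect_left
from typing import Sequence


class AttributeIndex:
    """Hypothetical stand-in for one MTH trie (assumption, not the paper's
    structure): sorted split keys partition one attribute's key space into
    len(splits) + 1 slices along that dimension."""

    def __init__(self, splits: Sequence[str]) -> None:
        self.splits = sorted(splits)

    @property
    def slices(self) -> int:
        return len(self.splits) + 1

    def index_of(self, key: str) -> int:
        # Keys <= splits[0] fall in slice 0, keys in (splits[i-1], splits[i]]
        # fall in slice i, and keys above every split fall in the last slice.
        return bisect_left(self.splits, key)


def linear_address(coords: Sequence[int], sizes: Sequence[int]) -> int:
    """Row-major linearization of a d-dimensional cell (assumed mapping
    function; MTH's own mapping function supports dynamic growth)."""
    if len(coords) != len(sizes):
        raise ValueError("coords and sizes must have the same length")
    addr = 0
    for i, n in zip(coords, sizes):
        if not 0 <= i < n:
            raise ValueError(f"coordinate {i} out of range for size {n}")
        addr = addr * n + i
    return addr


def bucket_address(keys: Sequence[str], indexes: Sequence[AttributeIndex]) -> int:
    """Compute a bucket address purely in memory: one lookup per attribute,
    then the mapping function; only the bucket read itself touches disk."""
    coords = [idx.index_of(k) for idx, k in zip(indexes, keys)]
    sizes = [idx.slices for idx in indexes]
    return linear_address(coords, sizes)


if __name__ == "__main__":
    name_idx = AttributeIndex(["g", "p"])  # 3 slices on attribute "name"
    city_idx = AttributeIndex(["m"])       # 2 slices on attribute "city"
    for record in [("adam", "lyon"), ("kim", "paris"), ("zoe", "rome")]:
        print(record, "->", bucket_address(record, [name_idx, city_idx]))
```

Running the example prints bucket addresses 0, 3 and 5 for the three records. Because every step before the bucket read happens in main memory, a lookup costs one disk access, which is the property the abstract attributes to MTH; the price is keeping all d indexes in memory, the overhead the proposed compact tries aim to reduce.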

Keywords: Data structure, Big Data, hashing, multidimensional data, data storage

Pages: 12-18


Int. J. of Applied Mathematics, Computational Science and Systems Engineering (published by International Academic Publications)


"International Academic Publications", 1666 Kennedy Causeway #412, North Bay Village, Miami, Florida, United States of America.


