Hierarchical Satellite System Graph for Approximate Nearest Neighbor Search on Big Data

Abstract

Approximate nearest neighbor search is a classical problem in data science, which is widely applied in many fields. With the rapid growth of data in the real world, it becomes more and more important to speed up the nearest neighbor search process. Satellite System Graph (SSG) is one of the state-of-the-art methods to solve the problem. However, with the further increase of the data scale of problems, SSG still needs a considerable amount of time to finish the search due to the limitation of step length and start point locations. To solve the problem, we propose Hierarchical Satellite System Graph (HSSG) and present its index algorithm and search algorithm. The index process can be distributed deployed due to the good parallelism of our designed hierarchical structure. The theoretical analysis reveals that HSSG decreases the search steps and reduces the computational cost and reduces the search time by searching on the hierarchical structure with a similar indexing time compared with SSG, hence reaches a better search efficiency. The experiments on multiple datasets present that HSSG reduces the distance computations, accelerates the search process, and increases the search precision in the real tasks, especially under the tasks with large scale and crowded distributions, which presents a good application prospect of HSSG.

Publication
ACM/IMS Transactions on Data Science (TDS)
Jiaru Zhang
Jiaru Zhang
Incoming postdoc researcher at Purdue University

My research interests include Bayesian Neural Networks, Adversarial Attack and Defense, and Causal Discovery.