Exploiting sub-space distance equalities -- A novel concept for efficient k-nearest neighbor computation 

 

Abstract

Today, almost all information system for storing and analyzing relational data use a column store back-end. By design, they offer impressive OLAP analytics performance, but miss efficient analytics like knn analytics having a row-wise access pattern. Alternatively, for high-performance knn analytics, one could use traditional row-stores, optionally equipped with a knn indexing approach, sacrificing OLAP performance. As, either way, one does not achieve good performance for OLAP and knn queries in the same system, we envision a storage structure serving as back-end achieving this. Towards this, we study how to exploit the sub-space distance equalities observation, saying that points projected to the same sub space have the same distance to any third point, for accelerating knn query performance. To this end, we investigate on two effects resulting from this observation. The first effect is a variant of lower-bound based pruning. The second offers a distance function-independent worst-case execution bound. We contribute knn algorithms exploiting both effects integrated into a storage structure already know to work well for OLAP queries. Our investigations reveal that the effects are robust to increasing e.g., the dimensionality, suggesting generally good knn performance. Comparing our knn algorithms to well-known competitors reveals that they deliver at least comparable performance, suffering even less from the curse of dimensionality.

 

Supplementary Material

16-dimensional Data Sets

Materialized Elfs

Euclid Code

Last Modification: 12.05.2020 - Contact Person: