Probabilistic nearest neighbor query processing on distributed uncertain data

Distributed and Parallel Databases

Abstract

A nearest neighbor (NN) query, which returns the object most similar to a user-specified query object, plays an important role in a wide range of applications and has therefore received considerable attention. In many such applications, e.g., sensor data collection and location-based services, objects are inherently uncertain. Furthermore, with the ever-increasing generation of massive datasets, distributed databases that manage such data objects have become increasingly important. One emerging challenge is to efficiently process probabilistic NN queries over distributed uncertain databases. The straightforward approach, in which each local site forwards its entire database to a central server, incurs a high communication cost, so the communication required to retrieve the NN objects must be minimized. In this paper, we focus on two important queries, namely top-k probable NN queries and probabilistic star queries, and propose efficient algorithms to process them over distributed uncertain databases. Extensive experiments on both real and synthetic data demonstrate that our algorithms significantly reduce communication cost.
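
To make the query semantics concrete, the following is a minimal sketch of a top-k probable NN computation in the centralized setting, i.e., after every site has forwarded its data to one server (the straightforward approach mentioned above). It is not the distributed algorithm proposed in the paper. It assumes a discrete instance model that the abstract does not specify: each uncertain object is a list of (point, probability) instances whose probabilities sum to 1, objects are independent, distances are Euclidean, and ties in distance are ignored. The names `nn_probabilities` and `top_k_probable_nn` are hypothetical.

```python
import heapq
import math


def nn_probabilities(objects, q):
    """Return {name: probability that the object is the NN of q}.

    objects: dict mapping a name to a list of ((x, y), prob) instances.
    Assumed model (not from the paper): independent objects, discrete
    instances, Euclidean distance, distance ties ignored.
    """
    probs = {}
    for name, instances in objects.items():
        total = 0.0
        for point, p in instances:
            d = math.dist(point, q)
            # Probability that every other object is strictly farther from q
            # than this instance.
            farther = 1.0
            for other, other_instances in objects.items():
                if other == name:
                    continue
                farther *= sum(
                    op for opoint, op in other_instances
                    if math.dist(opoint, q) > d
                )
            total += p * farther
        probs[name] = total
    return probs


def top_k_probable_nn(objects, q, k):
    """Return the k (name, probability) pairs with the highest NN probability."""
    probs = nn_probabilities(objects, q)
    return heapq.nlargest(k, probs.items(), key=lambda kv: kv[1])


# Toy data invented for illustration only.
objects = {
    "a": [((1.0, 0.0), 0.6), ((4.0, 0.0), 0.4)],
    "b": [((2.0, 0.0), 0.5), ((3.0, 0.0), 0.5)],
    "c": [((5.0, 0.0), 1.0)],
}
result = top_k_probable_nn(objects, (0.0, 0.0), k=2)
print(result)
```

In this toy input, object a is the NN whenever its closer instance occurs, and object b is the NN otherwise, so the two probabilities sum to 1 and c never qualifies. The cost of shipping every instance to one server before running such a computation is what motivates minimizing communication in the distributed setting.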

Notes

  1. http://mkt.unwto.org.

  2. Our algorithms work as long as a distance between objects can be computed, so any distance metric can be applied.

  3. http://www.chorochronos.org/.


Acknowledgments

This research is partially supported by the Grant-in-Aid for Scientific Research (A) (26240013) of MEXT, Japan, and JST, Strategic International Collaborative Research Program, SICORP.

Author information

Corresponding author

Correspondence to Daichi Amagata.

About this article

Cite this article

Amagata, D., Sasaki, Y., Hara, T. et al. Probabilistic nearest neighbor query processing on distributed uncertain data. Distrib Parallel Databases 34, 259–287 (2016). https://doi.org/10.1007/s10619-015-7183-0

  • DOI: https://doi.org/10.1007/s10619-015-7183-0
