AN EFFICIENT DENSITY PARAMETER-LIGHT IN ENHANCED SUBSPACE CLUSTERING IN HIGH DIMENSIONAL DATA

Published 31 Jul 2019 • Vol. 128


Authors:

 

Rama Devi Jujjuri, Department of Computer Science and Engineering, Gandhi Institute of Technology and Management (GITAM), Visakhapatnam, India
M. Venkateswara Rao, Department of Information Technology, Gandhi Institute of Technology and Management (GITAM), Visakhapatnam, India

Abstract:

 

Subspace clustering identifies clusters hidden in subspaces of a high-dimensional dataset. In full-dimensional clustering, density-based strategies have proven able to mine clusters of arbitrary shape even in the presence of noise. The performance and results of a subspace clustering algorithm depend heavily on the parameter values with which it is run, so determining proper parameter values is crucial for both clustering quality and runtime performance. This matters increasingly as high-dimensional data becomes more prevalent in real-world applications owing to advances in big data technologies. Density-based subspace clustering in particular has gained importance because of its ability to identify arbitrarily shaped subspace clusters. Density divergence is an essential issue in high-dimensional data clustering: region densities differ across subspace cardinalities. In this research, we propose an efficient enhanced subspace clustering model named Efficient-EnSubClu (an enhancement of ENSUBCLU) for discovering precise parameter values in subspace clustering. To overcome the density divergence problem, Efficient-EnSubClu discovers clusters using different epsilon density thresholds in different subspaces. It allows neighboring core points to be clustered efficiently and finds quality subspace clusters that satisfy specific qualitative and quantitative properties.
Furthermore, post-processing clustering steps are applied to each subspace found. A merging model is applied at the first step of evaluating the clusters connected by the DBSCAN algorithm. The model also finds the number of subspace clusters in each dimension and calculates the mean dimensionality of the subspace clusters. Every cluster is represented with fewer dimensions, as is visible from the low value of mean dimensionality. Hence, knowledge can be obtained more concisely, with enhanced cluster quality in terms of Accuracy and Silhouette Coefficient.
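The two quantities the abstract relies on, a subspace-specific epsilon and the mean dimensionality of the resulting clusters, can be illustrated with a small sketch. This is not the Efficient-EnSubClu procedure itself, which the abstract does not specify. The k-th nearest neighbor heuristic used here for choosing epsilon is a common stand-in and an assumption, as are the function names `kth_neighbor_distance` and `mean_dimensionality`, the synthetic uniform data, and the hand-written cluster list.

```python
import math
import random


def kth_neighbor_distance(points, dims, k):
    """Mean Euclidean distance from each point to its k-th nearest neighbor,
    measured only along the attributes listed in dims (hypothetical helper)."""
    total = 0.0
    for i, p in enumerate(points):
        dists = sorted(
            math.sqrt(sum((p[d] - q[d]) ** 2 for d in dims))
            for j, q in enumerate(points)
            if j != i
        )
        total += dists[k - 1]
    return total / len(points)


def mean_dimensionality(subspace_clusters):
    """Average number of attributes over a list of (dims, members) clusters."""
    return sum(len(dims) for dims, _ in subspace_clusters) / len(subspace_clusters)


# Synthetic data: 200 points drawn uniformly from a 4-dimensional unit cube.
random.seed(0)
data = [[random.random() for _ in range(4)] for _ in range(200)]

# The same points look sparser as attributes are added, so a single
# global epsilon cannot suit every subspace cardinality.
eps_1d = kth_neighbor_distance(data, dims=(0,), k=4)
eps_2d = kth_neighbor_distance(data, dims=(0, 1), k=4)
eps_4d = kth_neighbor_distance(data, dims=(0, 1, 2, 3), k=4)

# Hand-written example clusters as (attribute indices, member ids).
clusters = [((0,), [1, 2, 3]), ((0, 1), [4, 5]), ((1, 2, 3), [6, 7, 8])]
avg_dims = mean_dimensionality(clusters)
```

Because adding an attribute can only increase the distance between two points, the estimated epsilon never shrinks as the subspace grows, which is one way to see why a per-subspace threshold is needed. A lower `avg_dims` means each cluster is described by fewer attributes.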

Keywords:

 

Subspace Clustering; Density divergence; Parameter epsilon; High Dimensional Data Mining; Density based Subspace Clustering; Quality Subspace Clusters

Citations:

 

APA:
Jujjuri, R. D., & Rao, M. V. (2019). An Efficient Density Parameter-Light in Enhanced Subspace Clustering in High Dimensional Data. International Journal of Advanced Science and Technology (IJAST), ISSN: 2005-4238(Print); 2207-6360 (Online), NADIA, 128, 31-44. doi: 10.33832/ijast.2019.128.04.

MLA:
Jujjuri, Rama Devi, and M. Venkateswara Rao. “An Efficient Density Parameter-Light in Enhanced Subspace Clustering in High Dimensional Data.” International Journal of Advanced Science and Technology, ISSN: 2005-4238(Print); 2207-6360 (Online), NADIA, vol. 128, 2019, pp. 31-44. IJAST, http://article.nadiapub.com/IJAST/Vol128/4.html.

IEEE:
[1] R. D. Jujjuri and M. V. Rao, “An Efficient Density Parameter-Light in Enhanced Subspace Clustering in High Dimensional Data.” International Journal of Advanced Science and Technology (IJAST), ISSN: 2005-4238(Print); 2207-6360 (Online), NADIA, vol. 128, pp. 31-44, Jul. 2019.