CLUSTERING THE PHYSICO-CHEMICAL PROPERTIES OF SEVENTEEN APPROVED BREAST CANCER DRUGS WITH K-MEANS AND FUZZY K-MEANS

Published 30 June 2020 • vol 13 • no 1

Authors:

V MNSSVKR Gupta, Dept of Computer Science and Engineering, KLEF, Vaddeswaram, Guntur, AP, India
Ch. V. Phani Krishna, Dept of Computer Science and Engineering, TKR Engineering College, Hyderabad, TS, India

Abstract:

Breast cancer is the most common cancer among women and the second most commonly diagnosed cancer overall; about 2 million new cases were reported in 2018. When breast cancer is detected at an early stage, more than 90% of the women diagnosed survive at least 5 years, compared with around 15% of women diagnosed at the most advanced stage. This makes the area an important focus of ongoing research. Given this importance, k-means and fuzzy k-means clustering are applied to the physico-chemical properties of 17 approved breast cancer drugs extracted from the DrugBank database. The fuzziness parameter m was chosen based on the dataset and the variation among the objects it contains. Before the analysis, the Hopkins statistic was used to assess the normalization and clusterability of the dataset. The elbow method was used to determine K for k-means. The resulting clusters were validated with several measures, namely the connectivity, Dunn and silhouette measures and the adjusted Rand, Jaccard and Fowlkes-Mallows validity indices. Such a preliminary study can support doctors in treating patients and in making better decisions when prescribing a suitable drug.
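
The role of the fuzziness parameter m and of the membership values can be illustrated with a minimal sketch of fuzzy k-means (also known as fuzzy c-means). This is not the authors' implementation: the function name `fuzzy_kmeans`, its defaults, and the six toy points are assumptions made for illustration only, and the points are not taken from the DrugBank data. The sketch alternates the two standard updates: cluster centres are means weighted by membership raised to the power m, and memberships are proportional to distance raised to the power -2/(m - 1).

```python
import math
import random


def fuzzy_kmeans(points, k, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Fuzzy k-means on a list of equal-length numeric tuples.

    Returns (centers, u), where u[i][j] is the membership of point i
    in cluster j and every row of u sums to 1.
    """
    if m <= 1.0:
        raise ValueError("fuzziness parameter m must be greater than 1")
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, normalised so that each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(k)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Centre update: mean of the points weighted by u[i][j] ** m.
        centers = []
        for j in range(k):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )

        # Membership update: u[i][j] proportional to dist ** (-2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            dists = [math.dist(points[i], c) for c in centers]
            if min(dists) == 0.0:
                # The point sits exactly on one or more centres:
                # share full membership among those centres.
                hits = sum(1 for d in dists if d == 0.0)
                new_row = [1.0 / hits if d == 0.0 else 0.0 for d in dists]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once no membership changed by more than tol.
        if shift < tol:
            break
    return centers, u


# Hypothetical toy data: two well-separated groups of three points each.
toy = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers, u = fuzzy_kmeans(toy, k=2, m=2.0)

# A hard (k-means-like) labelling assigns each point to its
# highest-membership cluster.
hard = [max(range(2), key=lambda j: row[j]) for row in u]
print(hard)
for row in u:
    print([round(v, 3) for v in row])
```

Values of m close to 1 push the memberships towards 0 or 1, so the result approaches a hard k-means partition, while larger values of m spread each object's membership more evenly across clusters.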

Keywords:

Clustering, K-means, Fuzzy K-means, Validation, Membership Values

Citations:

APA:
MNSSVKR Gupta, V., & Phani Krishna, Ch. V. (2020). Clustering the Physico-Chemical Properties of Seventeen Approved Breast Cancer Drugs with K-Means and Fuzzy K-Means. International Journal of Grid and Distributed Computing (IJGDC), ISSN: 2005-4262 (Print); 2207-6379 (Online), NADIA, 13(1), 23-52. doi: 10.33832/ijgdc.2020.13.1.02.

MLA:
Gupta, V MNSSVKR, et al. “Clustering the Physico-Chemical Properties of Seventeen Approved Breast Cancer Drugs with K-Means and Fuzzy K-Means.” International Journal of Grid and Distributed Computing (IJGDC), ISSN: 2005-4262 (Print); 2207-6379 (Online), NADIA, vol. 13, no. 1, 2020, pp. 23-52. IJGDC, http://article.nadiapub.com/IJGDC/vol13_no1/2.html.

IEEE:
[1] V. MNSSVKR Gupta and C. V. Phani Krishna, "Clustering the Physico-Chemical Properties of Seventeen Approved Breast Cancer Drugs with K-Means and Fuzzy K-Means," International Journal of Grid and Distributed Computing (IJGDC), ISSN: 2005-4262 (Print); 2207-6379 (Online), NADIA, vol. 13, no. 1, pp. 23-52, June 2020.