ONTO-BTM: ONTOLOGY-BASED BITERM TOPIC MODELLING FOR AUTOMATIC TOPIC LABELLING

Published 30 November 2020 • Vol. 144


Authors:

 

Shiva Prasad KM, Department of Computer Science and Engineering, Rao Bahadur Y Mahabaleswarappa Engineering College, Affiliated to VTU, Belagavi, Karnataka, India
T. Hanumantha Reddy, Department of Computer Science and Engineering, Rao Bahadur Y Mahabaleswarappa Engineering College, Affiliated to VTU, Belagavi, Karnataka, India

Abstract:

 

A huge amount of text is available in digital form. Topic modelling, which assigns meaningful labels to the topics discovered in collections of documents, has gained a lot of attention in this area of research. Digital books and complex theses are written and made available for use, and searching for relevant content inside such a large corpus is difficult. It is possible to check whether keywords are present, but the relative importance of those keywords within the corpus is still difficult to establish. In this paper, an Ontology-based Biterm Topic Modelling approach (O-BTM) is proposed that embeds ontology concepts, rather than words alone, into squat-text topic models to identify important topics within an automatically summarized context of large texts. The outcomes of our experiments, carried out on thesis-oriented datasets accessed from Google Drive, show that incorporating ontological notions as additional, richer relations between subjects and terms, and for defining concept themes, provides an adequate method of creating meaningful labels for the learned topics. The experimental analysis shows that the O-BTM model is more effective, with an efficiency of 85% compared to traditional models such as R-BTM, BTM and LDA.
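To make the underlying mechanism concrete, the sketch below illustrates the general idea of biterm topic modelling over ontology concepts: words in short documents are first mapped to ontology concepts, unordered co-occurring pairs (biterms) are extracted, and topics are inferred with collapsed Gibbs sampling in the style of the standard BTM of Yan et al. [26]. This is a minimal illustrative sketch, not the authors' O-BTM implementation; the concept_map lookup, the toy corpus and the hyperparameter values are assumptions made only for this example.

# A minimal sketch of biterm topic modelling over ontology concepts.
# Not the authors' O-BTM implementation: concept_map, the toy corpus and the
# hyperparameters below are illustrative assumptions.
import random
from itertools import combinations
from collections import defaultdict

random.seed(0)

# Hypothetical ontology lookup: surface words -> ontology concepts.
concept_map = {
    "thesis": "Document", "dissertation": "Document",
    "topic": "Theme", "label": "Annotation", "labelling": "Annotation",
    "ontology": "KnowledgeModel", "corpus": "TextCollection",
}

docs = [
    "thesis topic labelling with ontology",
    "dissertation corpus topic label",
    "ontology corpus labelling",
]

# Step 1: replace words by ontology concepts where possible, then extract
# biterms (unordered pairs of distinct terms co-occurring in a short document).
def to_concepts(doc):
    return [concept_map.get(w, w) for w in doc.split()]

biterms = []
for doc in docs:
    terms = to_concepts(doc)
    biterms.extend(tuple(sorted(p)) for p in combinations(set(terms), 2))

vocab = sorted({t for b in biterms for t in b})
W, K, alpha, beta = len(vocab), 2, 1.0, 0.01

# Step 2: collapsed Gibbs sampling for BTM over the concept biterms.
z = [random.randrange(K) for _ in biterms]
n_k = [0] * K                       # number of biterms assigned to each topic
n_kw = defaultdict(int)             # (topic, term) counts
for b, k in zip(biterms, z):
    n_k[k] += 1
    n_kw[(k, b[0])] += 1
    n_kw[(k, b[1])] += 1

for _ in range(200):
    for i, b in enumerate(biterms):
        k = z[i]
        n_k[k] -= 1; n_kw[(k, b[0])] -= 1; n_kw[(k, b[1])] -= 1
        weights = [
            (n_k[t] + alpha)
            * (n_kw[(t, b[0])] + beta) * (n_kw[(t, b[1])] + beta)
            / ((2 * n_k[t] + W * beta) * (2 * n_k[t] + 1 + W * beta))
            for t in range(K)
        ]
        k = random.choices(range(K), weights=weights)[0]
        z[i] = k
        n_k[k] += 1; n_kw[(k, b[0])] += 1; n_kw[(k, b[1])] += 1

# Top concepts per topic serve as candidate material for topic labels.
for k in range(K):
    top = sorted(vocab, key=lambda w: n_kw[(k, w)], reverse=True)[:3]
    print(f"topic {k}: {top}")

In the proposed O-BTM, the word-to-concept mapping would come from the ontology rather than a hand-written dictionary, and the top-ranked concepts per topic would feed the automatic topic labelling step; the sketch only shows the shape of that pipeline.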

Keywords:

 

Squat Text, Summarization, Context Analysis, Topic Modelling, Topic Labelling, Statistical Learning, Ontology

References:

 

[1] Zhao, W. X., Jiang, J., Weng, J., He, J., Lim, E.-P., Yan, H. and Li, X., “Comparing Twitter and Traditional Media Using Topic Models”, Eur. Conf. Advances in Information Retrieval, Dublin, Ireland, Springer, Berlin, (2011) April 18-21, pp. 338-349.
[2] Lin, C. X., Zhao, B., Mei, Q. and Han, J., “PET: A Statistical Model for Popular Events Tracking in Social Communities”, Int. Conf. Knowledge Discovery and Data Mining, Washington, DC, USA, ACM, NY, USA, (2010) July 25-28, pp. 929-938.
[3] Song, Y., Zhou, D. and He, L., “Query Suggestion by Constructing Term-Transition Graphs”, Int. Conf. Web Search and Data Mining, Seattle, Washington, USA, ACM, NY, USA, (2012) February 8-12, pp. 353-362.
[4] Blei, D. M., Ng, A. Y. and Jordan, M. I., “Latent Dirichlet allocation”, J. Mach. Learn. Res., vol. 3, (2003), pp. 993-1022.
[5] Wang, X. and McCallum, A., “Topics Over Time: A Non-Markov Continuous-Time Model of Topical Trends”, Int. Conf. Knowledge Discovery and Data Mining, Philadelphia, PA, ACM, NY, USA, (2006) 20-23 August, pp. 424-433.
[6] Hong, L., and Davison, B. D., “Empirical Study of Topic Modeling on Twitter”, Proc. First Workshop on Social Media Analytics, Washington, DC, ACM, NY, USA, (2010) July 25-28, pp. 80-88.
[7] Hofmann, T., “Probabilistic latent semantic indexing”, In SIGIR, ACM, (1999), pp. 50-57.
[8] Blei, D., Ng, A. and Jordan, M., “Latent Dirichlet allocation”, The Journal of Machine Learning Research, vol. 3, (2003), pp. 993-1022.
[9] Boyd-Graber, J. and Blei, D. M., “Syntactic topic models”, Technical Report arXiv:1002.4665, (2010) February.
[10] Wang, X. and McCallum, A., “Topics over time: a non-Markov continuous-time model of topical trends”, In Proceedings of the 12th ACM SIGKDD, New York, NY, USA, ACM, (2006), pp. 424-433.
[11] Agirre, E., Alfonseca, E., Hall, K., Kravalova, J., Pasca, M. and Soroa, A., “A Study on Similarity and Relatedness Using Distributional and Wordnet-Based Approaches”, Annu. Conf. North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Boulder, CO, USA, ACL, Stroudsburg, PA, USA, (2009) May 31- June 5, pp. 19-27.
[12] Mikolov, T., Yih, W. and Zweig, G., “Linguistic Regularities in Continuous Space Word Representations”, Annu. Conf. North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Atlanta, USA, ACL, Stroudsburg, PA, USA, (2013) June 10-12, pp. 746-751.
[13] Quan, X., “Short and sparse text topic modelling via self-aggregation”, Twenty-Fourth International Joint Conference on Artificial Intelligence, (2015).
[14] Diao, Q., “Finding bursty topics from microblogs”, Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1. Association for Computational Linguistics, (2012).
[15] Cheng, X., “BTM: Topic modelling over short texts”, IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 12, (2014), pp. 2928-2941.
[16] Kataria, S. and Agarwal, A., “Supervised topic models for microblog classification”, 2015 IEEE International Conference on Data Mining. IEEE, (2015).
[17] Zuo, Y., Zhao, J. and Xu, K., “Word network topic model: a simple but general solution for short and imbalanced texts”, Knowledge and Information Systems, vol. 48, no. 2, (2016), pp. 379-398.
[18] Barbieri, N., “Probabilistic topic models for sequence data”, Machine learning, vol. 93, no. 1, (2013), pp. 5-29.
[19] Sridhar, V. K. R., “Unsupervised topic modelling for short texts using distributed representations of words”, Proceedings of the 1st workshop on vector space modelling for natural language processing, (2015).
[20] Nguyen, D. Q., “Improving topic models with latent feature word representations”, Transactions of the Association for Computational Linguistics, vol. 3, (2015), pp. 299-313.
[21] Wang, B., “Topic selection in latent Dirichlet allocation”, 2014 11th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), IEEE, (2014).
[22] Blei, D. M. and Jordan, M. I., “Variational inference for Dirichlet process mixtures”, Bayesian analysis, vol. 1, no. 1, (2006), pp. 121-143.
[23] Griffiths, T. L. and Steyvers, M., “Finding scientific topics”, The National Academy of Sciences of the USA, vol. 101, no. 1, (2004), pp. 5228-5235.
[24] He, J., “Efficient correlated topic modelling with topic embedding”, Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, (2017).
[25] Blei, D. M. and Lafferty, J. D., “A correlated topic model of science”, The Annals of Applied Statistics, vol. 1, no. 1, (2007), pp. 17-35.
[26] Yan, X., “A biterm topic model for short texts”, Proceedings of the 22nd international conference on World Wide Web. ACM, (2013).
[27] Li, X., “Relational Biterm Topic Model: Short-Text Topic Modeling using Word Embeddings”, The Computer Journal, vol. 62, no. 3, (2018), pp. 359-372.
[28] Vojnović, M., “Ranking and suggesting popular items”, IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 8, (2009), pp. 1133-1146.
[29] Allahyari, M., and Krys, K., “Automatic topic labelling using ontology-based topic models”, 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA). IEEE, (2015).
[30] Gutiérrez‐Batista, K., “An ontology‐based framework for automatic topic detection in multilingual environments”, International Journal of Intelligent Systems, vol. 33, no. 7, (2018), pp. 1459-1475.
[31] Allahyari, M. and Kochut, K., “OntoLDA: An Ontology-based Topic Model for Automatic Topic Labeling”, (2009).
[32] Blei, D. M., Ng, A. Y. and Jordan, M. I., “Latent Dirichlet allocation”, The Journal of Machine Learning Research, vol. 3, (2003), pp. 993-1022.
[33] Boyd-Graber, J. L., Blei, D. M. and Zhu, X., “A topic model for word sense disambiguation”, In EMNLP-CoNLL, Citeseer, (2007), pp. 1024-1033.
[34] Hu, Y., Boyd-Graber, J., Satinoff, B. and Smith, A., “Interactive topic modelling”, Machine Learning, vol. 95, no. 3, (2014), pp. 423-469.
[35] Newman, D., Lau, J. H., Grieser, K. and Baldwin, T., “Automatic Evaluation of Topic Coherence”, Conf. North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Los Angeles, California, ACL, Stroudsburg, PA, USA, (2010) June 02-04, pp. 100-108.

Citations:

 

APA:
Prasad KM, S., & Reddy, T. H. (2020). Onto-Btm: Ontology-Based Biterm Topic Modelling for Automatic Topic Labelling. International Journal of Advanced Science and Technology (IJAST), ISSN: 2005-4238 (Print); 2207-6360 (Online), NADIA, 144, 17-34. doi: 10.33832/ijast.2020.144.02.

MLA:
Prasad KM, Shiva, et al. “Onto-Btm: Ontology-Based Biterm Topic Modelling for Automatic Topic Labelling.” International Journal of Advanced Science and Technology, ISSN: 2005-4238 (Print); 2207-6360 (Online), NADIA, vol. 144, 2020, pp. 17-34. IJAST, http://article.nadiapub.com/IJAST/Vol144/2.html.

IEEE:
[1] S. Prasad KM and T. Hanumantha Reddy, "Onto-Btm: Ontology-Based Biterm Topic Modelling for Automatic Topic Labelling," International Journal of Advanced Science and Technology (IJAST), ISSN: 2005-4238 (Print); 2207-6360 (Online), NADIA, vol. 144, pp. 17-34, November 2020.