AUTOMATIC SUMMARIZATION OF KOREAN NEWS ARTICLES USING EXTRACTIVE-ABSTRACTIVE BINDING MODEL

Published 31 Mar 2019 • vol. 124


Authors:

Meiying Ren, Department of Computer & Information Engineering, Daegu University
Sinjae Kang, Department of Computer & Information Engineering, Daegu University

Abstract:

Approaches to text summarization can be broadly divided into extractive and abstractive summarization. The extractive method selects summary sentences directly from the original text, while the abstractive method generates new summary sentences through learned models. Extractive summarization is easy to implement, but the resulting summaries read unnaturally; abstractive summarization produces natural summaries but is difficult to implement. Moreover, the encoder-decoder model, the most commonly used abstractive approach, degrades in performance when the input is too long. In this paper, we therefore propose a method that first extracts the important sentences from the input document with an extractive model and then feeds the resulting extractive summary into an abstractive model to produce the final summary. The extractive model is based on the TextRank graph method, and the abstractive summarization system is based on the recurrent neural network (RNN) encoder-decoder model. We applied both the abstractive model alone and the extractive-abstractive binding model to Korean newspaper articles, using a dataset of 15,000 online news articles: 13,500 for training and 1,500 for testing. News highlights written by human reporters serve as the gold summaries. Experimental results show that the proposed system outperforms abstractive summarization alone.
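
To make the two-stage design concrete, here is a minimal sketch of the binding pipeline in Python. It is illustrative, not the paper's implementation: it assumes TF-IDF cosine similarity for the TextRank sentence graph (via scikit-learn and networkx) and a plain Keras LSTM encoder-decoder, while the paper's actual Korean preprocessing, similarity function, vocabulary, and hyperparameters are not given in this abstract; names such as textrank_extract, build_seq2seq, top_k, emb_dim, and hidden are hypothetical.

import numpy as np
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from tensorflow import keras
from tensorflow.keras import layers

def textrank_extract(sentences, top_k=3):
    # Stage 1 (extractive): rank sentences by PageRank over a
    # sentence-similarity graph, as in TextRank [5]; the top_k sentences,
    # restored to document order, become the shortened encoder input.
    tfidf = TfidfVectorizer().fit_transform(sentences)  # assumed similarity basis
    sim = cosine_similarity(tfidf)
    np.fill_diagonal(sim, 0.0)                          # drop self-similarity
    scores = nx.pagerank(nx.from_numpy_array(sim))
    ranked = sorted(range(len(sentences)), key=scores.get, reverse=True)
    return [sentences[i] for i in sorted(ranked[:top_k])]

def build_seq2seq(vocab_size, emb_dim=128, hidden=256):
    # Stage 2 (abstractive): an LSTM encoder-decoder [15][18] trained with
    # teacher forcing; the decoder starts from the encoder's final state.
    enc_in = keras.Input(shape=(None,))
    enc_emb = layers.Embedding(vocab_size, emb_dim, mask_zero=True)(enc_in)
    _, h, c = layers.LSTM(hidden, return_state=True)(enc_emb)
    dec_in = keras.Input(shape=(None,))
    dec_emb = layers.Embedding(vocab_size, emb_dim, mask_zero=True)(dec_in)
    dec_seq, _, _ = layers.LSTM(hidden, return_sequences=True,
                                return_state=True)(dec_emb, initial_state=[h, c])
    probs = layers.Dense(vocab_size, activation="softmax")(dec_seq)
    model = keras.Model([enc_in, dec_in], probs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    return model

# Binding: the abstractive model summarizes the extractive summary,
# not the full article, which keeps the encoder input short.
# condensed = " ".join(textrank_extract(article_sentences, top_k=3))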

Keywords:

Text Summarization, RNN Encoder-Decoder, LSTM, TextRank, Binding Model

References:

[1] A. M. Rush, S. Chopra and J. Weston, “A Neural Attention Model for Abstractive Sentence Summarization”, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, (2015), pp. 379-389, available online: http://www.emnlp2015.org/proceedings/EMNLP/pdf/EMNLP044.pdf, last visit: 21.12.2018.
[2] R. Nallapati, B. Zhou, C. dos Santos, C. Gulcehre and B. Xiang, “Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond”, Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning (CoNLL), (2016), pp. 280-290, available online: http://www.aclweb.org/anthology/K16-1028, last visit: 21.12.2018.
[3] A. See, P. J. Liu and C. D. Manning, “Get to the Point: Summarization with Pointer-Generator Networks”, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vol. 1: Long Papers, (2017), pp. 1073-1083, http://dx.doi.org/10.18653/v1/P17-1099.
[4] D. Bahdanau, K. Cho and Y. Bengio, “Neural Machine Translation by Jointly Learning to Align and Translate”, ICLR 2015: International Conference on Learning Representations, (2015), pp. 1-15, available online: https://arxiv.org/pdf/1409.0473.pdf, last visit: 20.12.2018.
[5] R. Mihalcea and P. Tarau, “TextRank: Bringing Order into Texts”, Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, (2004), pp. 1-8, available online: http://aclweb.org/anthology/W04-3252, last visit: 21.12.2018.
[6] L. Page, S. Brin, R. Motwani and T. Winograd, “The PageRank Citation Ranking: Bringing Order to the Web”, Stanford InfoLab, (1999), pp. 1-17, available online: http://ilpubs.stanford.edu:8090/422/, last visit: 21.12.2018.
[7] F. Barrios, F. López, L. Argerich and R. Wachenchauzer, “Variations of the Similarity Function of TextRank for Automated Summarization”, Proceedings of the Argentine Symposium on Artificial Intelligence (ASAI), (2015), pp. 65-72, available online: http://sedici.unlp.edu.ar/bitstream/handle/10915/52082/Documento_completo.pdf-PDFA.pdf?sequence=1&isAllowed=y, last visit: 21.12.2018.
[8] Y. Wen, H. Yuan and P. Zhang, “Research on Keyword Extraction Based on Word2Vec Weighted TextRank”, 2016 2nd IEEE International Conference on Computer and Communications (ICCC), (2016), pp. 2109-2113, http://dx.doi.org/10.1109/CompComm.2016.7925072.
[9] J. Kupiec, J. Pedersen and F. Chen, “A Trainable Document Summarizer”, Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, (1995), pp. 68-73, available online: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.452.7100&rep=rep1&type=pdf, last visit: 21.12.2018.
[10] D. O'Leary and J. Conroy, “Text Summarization via Hidden Markov Models and Pivoted QR Matrix Decomposition”, Proceedings of SIGIR 2001 (the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval), (2001), http://dx.doi.org/10.1145/383952.384042.
[11] A. Krizhevsky, I. Sutskever and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks”, Communications of the ACM, Vol. 60, No. 6, (2017), pp. 84-90, http://dx.doi.org/10.1145/3065386.
[12] C. N. dos Santos and M. Gatti, “Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts”, Proceedings of the 25th International Conference on Computational Linguistics: Technical Papers (COLING 2014), (2014), pp. 69-78, available online: http://anthology.aclweb.org/C/C14/C14-1008.pdf, last visit: 21.12.2018.
[13] Y. LeCun, Y. Bengio and G. Hinton, “Deep Learning”, Nature, Vol. 521, (2015), pp. 436-444, https://doi.org/10.1038/nature14539.
[14] F. A. Gers, J. Schmidhuber and F. Cummins, “Learning to Forget: Continual Prediction with LSTM”, Neural Computation, Vol. 12, No. 10, (2000), pp. 2451-2471, http://dx.doi.org/10.1162/089976600300015015.
[15] S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory”, Neural Computation, Vol. 9, No. 8, (1997), pp. 1735-1780, http://dx.doi.org/10.1162/neco.1997.9.8.1735.
[16] Y. Wu, M. Schuster, Z. Chen, Q. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, J. Klingner, A. Shah, M. Johnson, X. Liu, L. Kaiser, S. Gouws, Y. Kato, T. Kudo, H. Kazawa, K. Stevens, G. Kurian, N. Patil, W. Wang, C. Young, J. Smith, J. Riesa, A. Rudnick, O. Vinyals, G. Corrado, M. Hughes and J. Dean, “Google's Neural Machine Translation System: Bridging the Gap Between Human and Machine Translation”, arXiv:1609.08144v2, (2016), pp. 1-23.
[17] M. Galeso, “Apple Siri for Mac: An Easy Guide to the Best Features”, First Rank Publishing, (2016).
[18] K. Cho, B. van Merrienboer, D. Bahdanau and Y. Bengio, “On the Properties of Neural Machine Translation: Encoder-Decoder Approaches”, Proceedings of SSST-8 (Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation), (2014), pp. 103-111, available online: https://www.aclweb.org/anthology/W14-4012, last visit: 21.12.2018.
[19] C. Lin, “ROUGE: A Package for Automatic Evaluation of Summaries”, Proceedings of the Post-Conference Workshop of ACL 2004 (Text Summarization Branches Out), (2004), pp. 74-82, available online: http://www.aclweb.org/anthology/W04-1013, last visit: 21.12.2018.

Citations:

APA:
Ren, M., & Kang, S. (2019). Automatic Summarization of Korean News Articles Using Extractive-Abstractive Binding Model. International Journal of Advanced Science and Technology (IJAST), ISSN: 2005-4238 (Print); 2207-6360 (Online), NADIA, 124, 59-68. doi: 10.33832/ijast.2019.124.05.

MLA:
Ren, Meiying, and Sinjae Kang. “Automatic Summarization of Korean News Articles Using Extractive-Abstractive Binding Model.” International Journal of Advanced Science and Technology, ISSN: 2005-4238 (Print); 2207-6360 (Online), NADIA, vol. 124, 2019, pp. 59-68. IJAST, http://article.nadiapub.com/IJAST/Vol124/5.html.

IEEE:
M. Ren and S. Kang, “Automatic Summarization of Korean News Articles Using Extractive-Abstractive Binding Model,” International Journal of Advanced Science and Technology (IJAST), ISSN: 2005-4238 (Print); 2207-6360 (Online), NADIA, vol. 124, pp. 59-68, Mar. 2019.