A new corpus-based convolutional neural network for big data text analytics


  • Wedjdane Nahili
  • Kahled Rezeg
  • Okba Kazar




Convolutional neural networks, deep learning, natural language processing, NLP, user reviews, sentiment analysis, text classification


Companies market their services and products on social media platforms, taking advantage of today's easy access to the internet. As a result, they receive feedback and reviews from their users directly on their social media sites. Reading every text is time-consuming and resource-demanding. With technology-based solutions, analyzing the sentiment of all these texts gives companies an overview of how positive or negative users are on specific subjects, which can help minimize losses. In this paper, we propose a deep learning approach to perform sentiment analysis on reviews using a convolutional neural network model, because such models have shown remarkable results for text classification. We validate our convolutional neural network model using large-scale data sets: the IMDB movie reviews and Reuters data sets, with a final accuracy score of ~86% for both data sets.
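To make the idea of a convolutional text classifier concrete, the following is a minimal sketch of a forward pass only: token embeddings, one convolutional layer with ReLU, max-over-time pooling, and a sigmoid output. It is not the architecture evaluated in this paper. The function names, the sizes (DIM, WIDTH, N_FILTERS), the zero vector used for unknown tokens, and the random, untrained weights are all assumptions made for illustration.

```python
import math
import random


def conv1d_relu(seq, kernel, bias):
    """Slide one filter over a sequence of embedding vectors, then apply ReLU."""
    width = len(kernel)
    activations = []
    for start in range(len(seq) - width + 1):
        total = bias
        for offset in range(width):
            total += sum(w * x for w, x in zip(kernel[offset], seq[start + offset]))
        activations.append(max(0.0, total))
    return activations


def predict_positive(tokens, embeddings, kernels, biases, out_weights, out_bias):
    """Return a score in (0, 1) for one tokenized review."""
    dim = len(kernels[0][0])
    zero = [0.0] * dim
    # Unknown tokens map to a zero vector (a simplification for this sketch).
    seq = [embeddings.get(tok, zero) for tok in tokens]
    # Pad short reviews so that every filter fits at least once.
    widest = max(len(k) for k in kernels)
    seq += [zero] * max(0, widest - len(seq))
    # Max-over-time pooling keeps the strongest response of each filter.
    features = [max(conv1d_relu(seq, k, b)) for k, b in zip(kernels, biases)]
    logit = out_bias + sum(w * f for w, f in zip(out_weights, features))
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
DIM, WIDTH, N_FILTERS = 8, 3, 4  # hypothetical sizes, not taken from the paper


def rand_vec(n):
    """Draw n small random weights; stands in for trained parameters."""
    return [rng.uniform(-0.5, 0.5) for _ in range(n)]


review = "the film was surprisingly good".split()
embeddings = {tok: rand_vec(DIM) for tok in review}
kernels = [[rand_vec(DIM) for _ in range(WIDTH)] for _ in range(N_FILTERS)]
biases = [0.0] * N_FILTERS
out_weights = rand_vec(N_FILTERS)

score = predict_positive(review, embeddings, kernels, biases, out_weights, 0.0)
print(f"untrained score: {score:.3f}")
```

Because the weights are random, the printed score carries no meaning; in practice the embeddings, filters, and output weights would be learned from labeled reviews, typically with a deep learning framework rather than hand-written loops.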


