Content Accessibility and Semantic Networks Processed on Foreign Natural Language Analysis

Bernard Dousset, Anass El haddadi, Josiane Mothe


In this paper we present a methodology that makes it possible to mine a document collection from a domain without knowing the language in which the documents are written. We describe in detail a method, tools and results that can be used within a digital library context for Science Watch and Competitive Intelligence. We consider a collection from the aquaculture domain, written in Chinese and extracted from a digital library. Relying on the original encoding of the data (Unicode) and on the tags that mark the structure of the documents, we extract key elements (authors, phrases, etc.) from within the domain and analyse them. The results are displayed in the form of graphs and networks. We extract people networks and semantic networks, then examine their evolution over a period of several years. The principles developed in this paper can be applied to any language.


Text mining, Graph, Semantic network, Social network, Weak signals, Competitive Intelligence
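
The idea stated in the abstract, extracting key elements from the document structure alone and linking them into networks, can be illustrated with a minimal sketch. It is not the authors' tool: the tag names `AU` and `KW`, the `;` separator, and the two sample records (with made-up names) are assumptions chosen only for illustration. The point is that field values are handled as opaque Unicode strings, so no knowledge of the language is needed to count co-occurrences.

```python
from collections import Counter
from itertools import combinations


def parse_record(text, multi_sep=";"):
    """Split a tagged record into {tag: [values]} without interpreting the values.

    Assumes a hypothetical 'TAG: value; value' layout, one field per line.
    """
    fields = {}
    for line in text.splitlines():
        tag, sep, value = line.partition(":")
        if not sep:
            continue  # line carries no tag, skip it
        items = [v.strip() for v in value.split(multi_sep) if v.strip()]
        fields.setdefault(tag.strip(), []).extend(items)
    return fields


def cooccurrence(records, tag):
    """Count how often two values of `tag` appear together in one record."""
    edges = Counter()
    for rec in records:
        # Sort by code point so each pair has a single canonical order.
        values = sorted(set(rec.get(tag, [])))
        edges.update(combinations(values, 2))
    return edges


# Illustrative records only; names and keywords are invented.
records = [
    parse_record("AU: 王伟; 李娜\nKW: 水产养殖; 鱼类"),
    parse_record("AU: 李娜; 王伟; 张敏\nKW: 水产养殖"),
]

coauthors = cooccurrence(records, "AU")
for (a, b), weight in coauthors.most_common():
    print(a, b, weight)
```

The resulting weighted pairs form the edge list of a co-authorship graph; calling `cooccurrence(records, "KW")` in the same way gives a simple keyword network, and restricting `records` to one year at a time would give a snapshot per period.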

