Analysis of Competition in Chinese Automobile Industry based on an Opinion and Sentiment Mining System

This paper introduces a methodology for an opinion and sentiment mining system and describes its architecture. The system is used to analyze competition in the auto industry. The results show the advantages of each of the two cars used in this study. Rather than offering theory, the paper takes a hands-on approach and describes a complex process in order to help solve specific problems.


Introduction
The internet has become the main source for Competitive Intelligence (CI). The reason is that internet users express their opinions and attitudes towards products and the images of enterprises online. This paper presents a concept for analyzing the competition in the automobile industry. The focus is on what is called opinion and sentiment mining. A comparative analysis of two auto brands in China is shown as an example.
First, the role of opinion and sentiment mining in CI is introduced. We then present the methodology for this study as well as the key issues of opinion and sentiment mining. Finally, we discuss the architecture of the opinion and sentiment mining system and how to use this system to analyze the competition in the auto industry.

The Role of Opinion and Sentiment Mining in CI
As shown in Table 1, the number of internet users has increased dramatically with the development of the internet over the past years. It has reached close to 2 billion, which is about 30% of the world's population. The share is higher in developed countries and developed areas.

Table 1. World internet users and population statistics

Users express their thoughts online, making the internet the main channel for distributing and accessing information. This provides new opportunities and challenges for the development of CI as a discipline. It opens up user preferences and topics such as:
- How do users evaluate the products?
- Do users like the products?
- Which properties of the products make users like or dislike them?
- How do internet users perceive the image of the enterprise?
- Which practices of the enterprises do users like or dislike?
- How do users choose between different products?
- What properties make users buy the products?
Opinion and sentiment mining provides the views and preferences of internet users regarding different companies. The users' comments are important for companies and for product development.
Take Windows Vista as an example. Vista was selected by Time magazine as one of the 10 biggest tech failures. Mr. Nash, Windows vice president of product, confirmed the hesitation to launch the product, based on early users' opinions. Early users said that it was not user-friendly, which in turn influenced other users in a negative way.
Users of products are an important information source for CI. Their opinions provide companies with rich content, making them an important reference for enterprises.

The Methodology
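Opinion and sentiment mining goes through five major steps, as shown in Figure 1: object analysis, determination of information sources, evaluation index system configuration, collection and integration of information, and intelligence analysis.

Object analysis
In the object analysis stage we answer the following questions:
- Which competitors should be analyzed?
- What are the products, brands and services of the competitors?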
According to our needs, these aspects are defined as our objects.

Determine information sources
In the stage of determining information sources, a candidate list of information sources can be created, containing authoritative forums, websites, and blogs. It can be filtered according to the influence and quality of the information. It can also be filtered and complemented with help from industry experts.

Evaluation index system configuration
The third step is to build an evaluation index system to describe the properties of our objects. For example, in an auto industry analysis the index system may contain engine, computer screen, wheel, seat and so on. The index system yields a candidate property list. A sentiment vocabulary also needs to be built, which describes the "sentiment" towards the properties with words like good, excellent, terrible and so on. In this step, the participation of industry experts is necessary: they help us filter and complement the property list and the sentiment vocabulary. Finally, the relationships between properties and their weights are determined, after which the index system is complete.
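The structure described above can be sketched in code. This is only an illustration under stated assumptions: the class name Property, the example properties, their weights, the query words and the three vocabulary entries are hypothetical placeholders, not the index system used in this study.

```python
from dataclasses import dataclass, field


@dataclass
class Property:
    """One entry of the evaluation index system (hypothetical structure)."""
    name: str
    weight: float  # relative importance, assumed to be set by industry experts
    query_words: list = field(default_factory=list)


# Placeholder index system; names and weights are invented for illustration.
INDEX_SYSTEM = [
    Property("engine", 0.4, ["engine", "motor"]),
    Property("seat", 0.3, ["seat"]),
    Property("wheel", 0.3, ["wheel", "tire"]),
]

# Placeholder sentiment vocabulary: word -> polarity (+1 positive, -1 negative).
SENTIMENT_VOCABULARY = {"good": 1, "excellent": 1, "terrible": -1}


def normalized_weights(properties):
    """Rescale the weights so that they sum to 1."""
    total = sum(p.weight for p in properties)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {p.name: p.weight / total for p in properties}


def weighted_score(property_scores, properties):
    """Combine per-property scores in [-1, 1] into one weighted score.

    Raises KeyError if a scored property is missing from the index system.
    """
    weights = normalized_weights(properties)
    return sum(weights[name] * score for name, score in property_scores.items())
```

Keeping the weights separate from the vocabulary lets experts adjust either one without touching the other.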

Collection and integration of information
The properties in the index system are used as query words to retrieve content from the information sources. At the same time, the opinion and sentiment words are extracted. This information is then integrated into the opinion and sentiment database.
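As a rough sketch of this step, the function below matches one post against the query words and pulls out the sentiment words it contains. The function name match_post, the record layout and the assumption that the post is already split into tokens are ours and are not taken from the system itself.

```python
def match_post(tokens, query_words, sentiment_words):
    """Find which properties a post mentions and which sentiment words it uses.

    tokens          -- the post, already segmented into words (assumed)
    query_words     -- dict mapping a property name to its query words
    sentiment_words -- set of words from the sentiment vocabulary
    """
    lowered = [t.lower() for t in tokens]
    # A property counts as mentioned if any of its query words occurs.
    properties = sorted(
        prop for prop, words in query_words.items()
        if any(w in lowered for w in words)
    )
    # Keep the sentiment words in the order in which they appear.
    opinions = [t for t in lowered if t in sentiment_words]
    return {"properties": properties, "opinion_words": opinions}
```

A record of this shape could then be stored in the opinion and sentiment database together with the source and date of the post.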

Intelligence analysis
The final step is to analyze the data. Before the analysis, some preparation is needed, including error correction and elimination of duplicates. Then we identify the emotion tendency, which can be positive, neutral or negative. Intelligence analysis methods such as association, comparative and trend analysis are then used to study the competitive situation further.
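The preparation and tallying described here might look like the following sketch. The normalization rule (lowercasing and collapsing whitespace) and the function names are assumptions made for illustration; the system's actual duplicate detection is not described in this paper.

```python
from collections import Counter


def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies match."""
    return " ".join(text.lower().split())


def drop_duplicates(comments):
    """Keep the first occurrence of each comment, compared after normalization."""
    seen = set()
    unique = []
    for comment in comments:
        key = normalize(comment)
        if key not in seen:
            seen.add(key)
            unique.append(comment)
    return unique


def tendency_shares(labels):
    """Return the share of positive, neutral and negative labels."""
    kinds = ("positive", "neutral", "negative")
    counts = Counter(labels)
    total = sum(counts[k] for k in kinds)
    if total == 0:
        return {k: 0.0 for k in kinds}
    return {k: counts[k] / total for k in kinds}
```

The shares returned by tendency_shares can be computed per car, per property or per time period, which gives the inputs for comparative and trend analysis.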

Key Issues
The steps above form the framework of the methodology. Almost every step involves some key issues, including:
- How to select the more authoritative information sources?
- How to obtain and integrate heterogeneous information?
- How to build index systems that describe our objects comprehensively?
- Which opinion and sentiment mining algorithm to choose?
(1) Selection of authoritative information sources
In source selection, methods such as web metrics can be used to evaluate the information sources, and input from industry experts is essential.
(2) Acquisition and integration of multiple heterogeneous information sources
During the acquisition and integration of multiple heterogeneous information sources, spam and noise should be filtered out with the help of metadata standards, and segmentation algorithms should be used to process unstructured and semi-structured information.
(3) Evaluation index system
The index system differs between CI tasks. Building it is a semi-automated process and some of the work must be done manually. In order to improve efficiency, software was developed to help industry experts build or modify the index system.
(4) Opinion and sentiment mining algorithms
The core of the opinion and sentiment mining system is its algorithms, which include the corpus-based approach, the dictionary-based approach, supervised machine learning methods, image segmentation algorithms and other opinion extraction algorithms. During the development of this system, we found a dictionary-based algorithm to be more suitable for Chinese information processing, with an accuracy of about 82%. That is acceptable for commercial operation.
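A dictionary-based approach can be sketched as follows. This is a minimal illustration and not the algorithm used in the system: the word lists, the simple negation rule and the function name classify are assumptions, and the input is assumed to be already segmented into tokens (for Chinese text, by a word segmentation step).

```python
# Placeholder word lists; a real dictionary would be far larger and
# would be reviewed by industry experts.
POSITIVE = {"good", "excellent", "quiet"}
NEGATIVE = {"terrible", "noisy", "bad"}
NEGATIONS = {"not", "never"}


def classify(tokens):
    """Label a tokenized comment as positive, neutral or negative.

    Each dictionary word adds +1 or -1 to the score. A negation word flips
    the polarity of the token that directly follows it (a deliberate
    simplification: "not very good" is not handled).
    """
    score = 0
    negate = False
    for token in tokens:
        word = token.lower()
        if word in NEGATIONS:
            negate = True
            continue
        polarity = (word in POSITIVE) - (word in NEGATIVE)
        if polarity:
            score += -polarity if negate else polarity
        negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Because the rules are explicit, misclassified comments can be traced back to a missing or wrong dictionary entry, which is one practical attraction of the dictionary-based approach.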

Architecture of the Opinion and Sentiment Mining System
The Opinion and Sentiment Mining System was developed to gather data about opinions and sentiments related to products and services. The system consists of four parts: data acquisition, data pretreatment, data analysis and user interface, as shown in Figure 2.

Figure 2. Architecture of opinion and sentiment mining system

- The data acquisition part performs information selection, information extraction and information integration.
- The data pretreatment part eliminates duplicate information and performs error correction, emotion tendency judgment and so on.
- The data analysis part performs association research, comparative research and trend research.
- The results of the analysis are shown through different types of terminals.

Analysis of Competition in the Chinese Auto Industry
A case study illustrates how to use this system to analyze the competition in China's auto industry. In this case, the Peugeot 307 and the Ford Focus (shown in Figure 3) are used as examples.
Both cars sell well and the competition between them is fierce. We analyzed the competition between the two cars by analyzing the comments of internet users.
(1) Information source
The information was mainly collected from auto forums by the system and saved in databases, which provided information about the targeted cars. The information sources are shown in Table 2.

After combining the positive and the negative analysis, we conclude that negative comments make up a much larger proportion of the users' comments on the Peugeot 307 than on the Ford Focus. Other properties are compared in a similar way to reach the overall result. Compared with the Peugeot 307, users prefer the Ford Focus overall, but they prefer the appearance and trim of the Peugeot 307 to those of its rival.

Peugeot 307 is better on: Skylight, Fuel consumption, Seat, Appearance, Trim, Headlight, Door, RKE, Cruise Control System, ABS, Electronic anti-theft, Speaker
Ford Focus is better on: Engine, Air conditioning, Rear suspension, Tire

We came to the conclusion that the advantage of the Ford Focus is the car's power and performance, which is embodied in the engine, air conditioning, rear suspension and tires. The Peugeot 307, on the other hand, has an advantage in appearance and design, which is embodied in the skylight, fuel consumption, seat and so on.
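A property-by-property comparison of this kind can be sketched as below. The scoring rule (positive minus negative comments, divided by their sum) and the function names are assumptions chosen for illustration; the paper does not state the exact rule behind the comparison table.

```python
def net_sentiment(positive, negative):
    """Net share of positive comments in [-1, 1]; 0.0 if there are no comments."""
    total = positive + negative
    return 0.0 if total == 0 else (positive - negative) / total


def compare(counts_a, counts_b):
    """Compare two products property by property.

    counts_a, counts_b -- dicts mapping a property name to a
    (positive_count, negative_count) pair. Only properties present
    for both products are compared.
    Returns (better_for_a, better_for_b, tied) as sorted lists of names.
    """
    better_a, better_b, tied = [], [], []
    for prop in sorted(set(counts_a) & set(counts_b)):
        a = net_sentiment(*counts_a[prop])
        b = net_sentiment(*counts_b[prop])
        if a > b:
            better_a.append(prop)
        elif b > a:
            better_b.append(prop)
        else:
            tied.append(prop)
    return better_a, better_b, tied
```

Using a share rather than a raw count keeps the comparison fair when one car attracts many more posts than the other.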

Peugeot 307: Increase the PR about appearance and design. Fix engine deficiencies.
Ford Focus: Let consumers understand the importance of vehicle performance. Strengthen the design of appearance and trim.

Figure 1. Framework of opinion and sentiment mining

Figure 3. Auto products used in the case study

Figure 5 compares the attention received by the two targeted cars. Attention is measured by the number of posts about the given car. The red line is

Figure 5. Attention comparison between Peugeot 307 and Ford Focus

Figure 6. Positive comments of target cars

Figure 7. Negative comments of target cars

Table 4. Comparison result of Peugeot 307 and Ford Focus

Table 5. Recommendations according to opinion and sentiment mining

Outlook
Further research in this field could include:
- Using the Opinion and Sentiment Mining System to analyze other industries, such as the cosmetics and health industries, to see where it is best applied.
- Improving the accuracy of opinion extraction and sentiment judgment.
- Embedding natural language processing algorithms for other languages, so that the system can analyze information in several languages at the same time.