Users’ perceptions of Data as a Service (DaaS)

In this study, 190 market intelligence (MI), competitive intelligence (CI) and business intelligence (BI) professionals and experts were asked about Data as a Service (DaaS). Findings show there we ...


INTRODUCTION
Intelligence today is inseparable from information technology (IT) systems, special software (business intelligence) and big data. Now one can buy or rent data, and this is referred to as Data as a Service (DaaS). Many suppliers only want users to see the actual intelligence or end analysis, not the raw data, as they are afraid that customers could sell it on or make their own analyses. Like many analysts, DaaS providers are hesitant to describe their scientific method and calculations, hoping instead that users will accept their business models and trust them.
DaaS is a cloud-assisted service that delivers data on demand through an Application Programming Interface (API) (Vu et al., 2012). DaaS can also be described as a shift in philosophy from data ownership to data stewardship (Rajesh et al., 2012, p. 26). DaaS was first used primarily in web mashups (Rajesh et al., 2012). A mashup in this context is a web page or web application that uses content from more than one source to create a single new service displayed in a single graphical interface. Many early business intelligence companies, like Agent24 in Sweden, are built on the same technology.
DaaS can be seen as ready-made or tailor-made intelligence packages. The connection to intelligence is strong for vendors, for example at Oracle. For them DaaS is "intelligence from external sources", to create "action", meant as something wider than decisions. DaaS can also be seen as a logical step from previous aaS products: Infrastructure aaS (Amazon Web Services), Platform aaS, Software aaS (Google Email, Google Docs) and Database aaS. For DBaaS see Curino et al., 2011 and Seibold et al., 2012. Database as a Service (DBaaS) was brought forward as traditional relational database systems proved unable to manage big data sets efficiently. It was only with cloud computing that the opportunity arose, especially with the model known as DBaaS (Abourezq and Idrissi, 2016). With DBaaS one still owns the data. This is not so with DaaS. Having one's own database feels safer than placing data in the cloud, so the question remains open as to just how bright the future of DaaS is. When it comes to valuable information, consumers are particularly concerned about privacy protection. The problem has been studied and a solution was suggested by Canard and Devigne (2016).
There is also Business Intelligence as a Service (Chang, 2014). It offers data access through a web interface, where the implementation and details are hidden from users. The business processes are orchestrated in a simpler and faster manner (Sano, 2014).
What has created the right conditions for DaaS is the growing desire to seek competitive advantage from the use of big data and the challenge of managing increasingly complex and heterogeneous data landscapes (Pringle et al., 2014, p. 29). DaaS is being brought forward by advances in cloud computing as it avoids the overly scaled computer infrastructure that includes not only dedicated space, but expensive hardware and software (Sharma, 2015).
Users' perceptions of business intelligence (BI) have been studied many times, for example by Sabanovic and Solberg Søilen (2012) and by Nyblom et al. (2012). No one has studied customers' perceptions of DaaS empirically. It is essential for suppliers to know how to package and sell different DaaS products. Before that can happen, suppliers need to know what potential customers think about DaaS. First they must understand what it is, and what its potential, challenges and future may be. For this an exploratory study is required. For intelligence studies it is of interest to know how MI, CI and BI experts see DaaS today and how they see it developing in the future. Another study should examine whether MI, CI and BI experts see these questions differently from other analysts and IT experts.
State and military intelligence organizations have become efficient at sharing intelligence, especially since September 11th and the appearance of the new global threat of Islamic fundamentalist terrorism. These organizations are sharing and exchanging intelligence not only at national levels but also internationally. New and faster performing information technology in the form of networks (infrastructure), hard disks (storage) and devices (workstations) is making these interactions easier and more attractive. Private organizations too are realizing the potential value in sharing intelligence, even though the most common way of obtaining intelligence so far is to buy data from a third party, not to share intelligence with competitors and third parties. In the future, we can imagine that private organizations will mark documents, reports and analyses that they want to sell to others and make them available on the web. Companies that excel in intelligence work will be able to finance part of their own capabilities through the sales of their own intelligence reports, much like consultancy companies (such as KPMG) or journals (such as EIU) today. Instead of conducting their own research, which is costly and demands special competencies, companies are more often looking to buy or rent that information.
The most common product to sell is credit reports. The most common analysis is for target marketing, placing consumers into segments.
Companies that either sit on large amounts of data, like social media sites, or that send this data around, like Ericsson and Huawei, are eager to enter this new business segment. We hear companies talking about redefining their business models. Ericsson, for example, is now afraid that Huawei will overrun it if it focuses only on its core business.
Facebook, LinkedIn and Twitter are all in the same business, making money by capitalizing on our personal data. What they sell, namely connections to friends, colleagues or anyone who cares to listen and follow us, is less important for these companies than the amount of traffic (user activities) they gather. Their income is related to how well they package and present this data to advertisers. So far they have had significant success as users, like you and me, are telling them everything about ourselves in terms of what we search for, making segmentation easier and more accurate. As a consequence, they are becoming experts in getting us to "check in" several times a day. On the surface it is all about friends, work or political debates, but as a business the data we leave can be packaged and sold. Moreover, there is little information for the user about what is done with their data.
In the market of market intelligence this kind of data is nothing new. For decades there have been data brokers: companies that gather data in secret and sell it off, mostly without direct interaction with consumers. Data brokers gather data from hundreds of millions of consumers, including data about characteristics, preferences, health and financial situation. They gather data not only about home addresses and phone numbers, but also about what car consumers drive, how much and what they watch on TV and on the internet, and what sports they participate in. They sell products that identify financially vulnerable consumers divided into categories such as "Rural and Barely Making It," "Ethnic Second-City Strugglers," "Retiring on Empty: Singles," "Tough Start: Young Single Parents," and "Credit Crunched: City Families" and score each person accordingly. Data brokers have been systematically criticized for not disclosing their sources. Examples of such companies today are Acxiom, Experian, and Epsilon.
From the point of view of a researcher producing science it is unthinkable not to disclose sources or to give a detailed description of the method for gathering data. The scientific article will simply not pass the review process. Serious journalists also have some rules of thumb when it comes to the truth, like checking with two independent sources. The same issue of reliability and validity that we see among data brokers is also found in other industries, for example among consultancy companies and among survey companies. These organizations are not primarily focused on disclosing the truth, but instead on selling and profits.
Many survey companies, like Novus in Sweden, refuse to disclose their scientific method, viewing it as a trade secret. In a country like Sweden, a handful of survey companies set much of the political agenda, which in turn shapes political opinion as their findings and publications form the backbone of TV news and debates in the established newspapers.
Many survey companies pay respondents to fill in e-surveys as the response rate is otherwise too low. This development is increasing as internet users are less willing to take time to fill in questionnaires. Thus we have a situation today where respondents who are attracted to e-surveys for the money are overrepresented. As the method is not described and data are not shown, the reader never learns that respondents are not representative of the population, even though many companies have banned respondents from certain countries in Western Africa to avoid more blatant biases. The problem is that these surveys are likely to yield different answers from another group of respondents, which is referred to as a problem of reliability. There is no one to redo surveys and research. By the time the reports are out they are soon forgotten and replaced by new ones, but the damage to the democratic system is already done as politicians are quick to take on new results from the news and shape their policies accordingly. Surveys are hardly ever retracted and apologies for survey errors are never made by news organizations. This is the same problem we face with DaaS, as suppliers are selling and renting data without giving the customer the possibility to investigate the scientific method or the raw data and its calculations. This leads to higher chances of manipulation.

RESEARCH QUESTIONS
Among the research problems mentioned in the literature we find the question of what types of vendors are available for DaaS. Ovum (2014) distinguishes among three types: (1) large technology vendors like IBM, Microsoft, Oracle and SAP with substantial experience in the management of data, (2) full-service advertising agencies, like Dentsu/Aegis Media, Havas, Interpublic, Publicis Omnicom and WPP, who combine technological capabilities with business consulting, and (3) data players like Acxiom, Experian and Neustar with a substantial track record in managing vast and varied data sets. Companies see an interesting business model in combining business know-how with technological capabilities, as in the cooperation between Qlik, HP and Intel. This year the Swedish BI company Qlik was sold to Thoma Bravo for three billion USD. The question becomes: how do you best bundle data and software?
To that end, what we do not find in the literature today is what users and customers exist for DaaS, what they are looking for and what they see as strengths and weaknesses with the products available today. Intelligence professionals of all kinds would be potential customers for DaaS, just as they represent a major group of customers for business intelligence products and are working with many of the same issues around quality of data and analysis. It would therefore be of interest for researchers to contact MI, BI, and CI professionals to get their ideas.
Another research question of interest is: what kind of data sets and software do these customers want? DaaS addresses a number of long-standing concerns in the CI field. For example, DaaS could be said to be a response to those who think companies spend too much time and money building and maintaining their own systems and data. Companies need to focus more time on creating value with the data instead, it is often said in boardrooms. As we have seen there is one major assumption in this equation: that the data DaaS providers supply and the analyses they perform are good. The DaaS providers are basically asking us to trust them, which from a critical point of view is impossible if they do not show their method, raw data or analyses. However, many companies are ready to place that trust and many will receive intelligence that is good. Provided that the price is not too high, DaaS will be attractive to certain groups of consumers or users. Identifying and locating these groups then becomes an important question.
"Garbage in garbage out" (GIGO) is becoming a big problem for big data. Big data can be divided into transaction data (ERP, CRM), interaction data (logs, social feeds, click streams) and observation data (internet of things such as sensors, RFID chips, ATM machines).
When we look at the large quantity of big data produced today, most comes from social media, e-commerce, the internet of things and sensors. This includes YouTube (1000 TB of new data per day), Facebook (600 TB), eBay (100 TB), and Twitter (100 TB) (Abourezq and Idrissi, 2016, p. 159). Yet with all its computer power, Amazon is still not able to tell me what book I will buy next.
What DaaS vendors offer first is this data, GIGO, not intelligence. What the customer wants, on the other hand, is the opposite: intelligence, or strategic and actionable information. This is a major challenge for suppliers in this industry. It's not an impossible equation, but it's clear that intelligence has little to do with the sheer quantity of data. If data brokers have been able to do it so can DaaS companies. The question is how.
In many cases, another challenge is to get customers to accept receiving not the raw data itself but a graph or some other output in which that raw data is simply used.
Another challenge is to get buyers to accept the idea of renting, not owning, the data.
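The two delivery terms described above (output instead of raw data, and access instead of ownership) can be illustrated with a minimal sketch. The class name `AggregateOnlyProvider`, its method `segment_counts` and the sample records are all hypothetical and are not taken from any vendor's product; the sketch only shows the idea that a customer can query a summary without ever receiving the underlying rows.

```python
from collections import Counter


class AggregateOnlyProvider:
    """Hypothetical DaaS provider that keeps raw records private."""

    def __init__(self, records):
        # Raw rows stay inside the provider; no public method returns them.
        self._records = list(records)

    def segment_counts(self, field):
        """Return how many records fall into each value of `field`."""
        return dict(Counter(row[field] for row in self._records))


# Made-up records standing in for the provider's raw data.
provider = AggregateOnlyProvider([
    {"segment": "urban", "age_band": "18-34"},
    {"segment": "urban", "age_band": "35-54"},
    {"segment": "rural", "age_band": "35-54"},
])

# The customer sees only the aggregated output.
summary = provider.segment_counts("segment")
print(summary)
```

In Python the leading underscore is only a naming convention, so this is an illustration of the contract rather than an enforcement mechanism. A real service would enforce the restriction on the server side.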
So research should try to find out what types of buyers may accept these different terms and what they are willing to pay for it. For many customers DaaS will make sense. Most businesses don't have all that many trade secrets. They succeeded because they were first, built loyalty and delivered customer value, or simply because they never gave up. Now they are looking for better demographic data. They can try to get it themselves, but it takes too much time and they are unsure about statistics.
Many of these companies will rent the data if it's much cheaper. It will be good enough for a presentation at work. The next question then is how low the price must be given the drawbacks of DaaS listed above. From the supplier's side the question becomes how they can produce products that are more cost efficient. There are obvious advantages in this business with economies of scale, but what does this business model look like? Suppliers will probably be tempted to explore lock-ins and develop sophisticated schemes for up-selling, a bit like Apple does: if you have the hardware you can only access their data through their store. DaaS companies can offer you the hardware, the software and the data, and so become the total IT provider. A possible advantage of this is that customers can move from one dataset to another more easily, as long as they move within the system. For some this will be fine.
From the perspective of intelligence studies, Intelligence as a Service (IaaS) may be a more interesting domain to explore than Data as a Service (DaaS): an open web-based service where intelligence is bought or exchanged. From a CI perspective a market with a few big vendors seems far less ideal. Ideally we would like a marketplace for intelligence where everyone is a buyer and a seller, not least because every company has some intelligence to sell and there should be no middlemen to take a profit or delay the process, but the development is not there yet.
Another problem with the term DaaS is that it can stand for two separate phenomena, as it also covers Desktop as a Service (DaaS), and to make things worse the latter meaning is, for the moment, more popular than the former. Another problem is what to do with stolen data, which is a market in itself. Data breaches are sometimes referred to as Hacking as a Service (HaaS) (McAfee). It can be individual hackers operating as lone cowboys or hackers engaged by companies or states. Most popular are financial data: credit cards and information regarding users. This market is so large today that it has already been segmented and products priced. According to the McAfee report a credit card and information about its user in the US will cost you 15 USD. The same in the EU costs 35 USD. The second most popular data are login access, followed by identities. There are thousands of hackers trying to get this intelligence from us right now through various techniques, everything from phishing to old-fashioned theft. Market intelligence and CI professionals have a constant demand for this kind of data. As a result, companies specialize in these murky waters, like Kroll and its offspring, K2 Intelligence. These companies work on both sides of the table, helping to advise how to protect data from attackers and gathering data by dubious means. Thus the learning curve is just steeper. They do not solve the ethical dilemma, but hide it under a veil of secrecy. This is also the realm of private information warfare. DaaS is, by its very definition, a part of this world and we have to make ethical choices accordingly.
We cannot tackle all of these research questions here, but must start from the bottom. Based on the problems and research questions mentioned above we can define six questions for this study (Table 1).

THE METHOD
The population is defined as possible users of DaaS. The sample is defined as a particularly strong group of possible users of DaaS, namely CI, BI and MI experts and professionals.
Five larger groups of users on LinkedIn were selected related to business intelligence, competitive intelligence, market intelligence and intelligence studies. These were: 1. Business Intelligence Professionals (BI, Big Data, Analytics, IoT), 2. Veille Stratégique, e-réputation et Intelligence Economique, 3. Strategic and Competitive Intelligence Professionals (SCIP), 4. Competitive/Market Intelligence Professionals and 5. Journal of Intelligence Studies in Business (JISIB). For the first four groups the survey was posted as a "conversation" in the dataflow. For the last group the survey was sent as an in-mail to all users registered for the group. The five groups have 222,000 users, but many are the same so it can be estimated that there are no more than 150,000-200,000 unique users.
The five groups in more detail, including their self-descriptions:
There are reasons to think that we would get the same result if we studied the same sample again (reliability), even though these are questions to which the answers change with time as DaaS develops. The questions listed in Table 1 correspond to the answers we are looking for (validity). As the research is primarily exploratory a qualitative method was chosen. At this stage we are more interested in understanding a phenomenon.
The questionnaire was pretested and no weaknesses were detected, so no changes were made to the final questionnaire. Once launched, the initial response rates were very low, partly because it was summer vacation but perhaps more because social media users have become more reluctant to answer surveys. The survey was therefore sent out four times to each network during the next two months. At the end we obtained about 206 responses. Out of these, 16 were removed because of incomplete or illogical answers.
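The sample arithmetic above can be checked with a short sketch. The counts are those reported in the text; the variable names are illustrative, and the estimate of 150,000 to 200,000 unique members is used only as a rough denominator, since the number of members who actually saw the survey is not known.

```python
# Figures reported in the method section.
responses_received = 206   # responses obtained after four mailings
responses_removed = 16     # incomplete or illogical answers
usable_responses = responses_received - responses_removed

# Estimated unique members across the five LinkedIn groups (rough bounds).
unique_members_low = 150_000
unique_members_high = 200_000

# Share of the estimated unique members who gave a usable response.
rate_if_low = usable_responses / unique_members_low
rate_if_high = usable_responses / unique_members_high

print(f"Usable responses: {usable_responses}")
print(f"Share of estimated members: {rate_if_high:.3%} to {rate_if_low:.3%}")
```

Under these assumptions the usable responses amount to roughly one member in a thousand, which is in line with the low response rates described above.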
Respondents, especially on e-surveys, tend to answer whether or not they know a topic. As we wanted experts and professionals, we started the survey with two control questions. We asked if the respondent knows what DaaS is (Q1). If they did not, no further answers were collected from that respondent. To be sure that the respondent answered correctly he or she was also asked to define what DaaS is (Q2). If he or she did not answer correctly, given a broad margin for interpretation, the rest of their answers were taken out of the analysis.
E-surveys are an easy way to gather data when they work, but they have become more problematic. Respondents seem to be less interested in completing e-surveys as these become more frequent. Chances are they do it quickly and without much reflection on the actual questions. Longer surveys are not completed. In many cases anonymous internet users are less sincere, are opinionated, promote their own interests, and do not answer questions directly. This may be related to the way the internet has developed. For our purpose it has meant that we have had to discard a large number of responses. In future research other methods should be explored, like interviews at conferences.

FINDINGS AND ANALYSIS
The analysis builds on 190 complete responses, summarized in Table 2.
Respondents' definitions of DaaS (Q2):
Access to multiple data sets irrespective of the platform it is stored on, or the platform that you use for analysis, it is an access to a data warehouse through an interface, it is related to cloud computing, it might be about accessing huge amounts of data about a sector for example, paid access to data, it is a distribution model that disintermediates data from the platform/software allowing you to integrate it into your own web applications, data can be provided as on demand, a way to keep together, in a framework, the same data about a topic, pay to save our data in a safe place, provisioning of data via the cloud in a protected and affordable way to users that they can work with it on demand, data used as a service for decision making, its sharing of information, buying information from supplier, buzzword.
3. What kind of data could you imagine buying through DaaS?
Market information, demographics, information about competitors, financial developments, market changes, specific products consumed each minute with a cross section of colors and geography, text, statistics, raw data of any kind, video, all data that is captured and stored digitally, documents, photos, records, videos, codes, programmes, economic, tourism, politics, company information and profiles, news and publication subscriptions, data from custom webscapes, geolocation & metadata enrichment, all kinds of quant data, social media data, any data that is collected by others; spend data, geographical data, company information, personal information, any kind of structured data, products prices, data related to the behavior of consumers, principally consumer data and multiple transaction data, analytics Operational, qualitative information about B2B customer needs, or competitor intentions, more personal and private data, specific fine-tuned data, data not collected, like illicit drug use, anything that is not on the deep web, military, competitors' plans, new planned products, secret info, HUMINT, really valuable information that will give you an edge.
Data that is more sensitive, personal and private will be difficult to buy, users think. Company secrets and data for B2B will be especially scarce.
5. What are the biggest challenges you see with DaaS from an intelligence perspective?
Connectivity and performance of the various data sources, it has limited B2B applications since the quantity of information may be limited, secrecy of the companies, to create understanding/insight from data, data homogenization, overcoming privacy rights, updating patterns might be late of managed to be late by the acknowledged user, manipulation is also possible to generate false leads, knowing what to look for in your aggregated and combined data, counterintelligence = your activity is registered from which intel requirements can be inferred, data quality, the level of collecting, mapping, keeping and distributing, big data, bank of data, speed and accuracy, confidentiality, quality, reliability, security, accessibility, pricing, what happens when everyone has the same info? then competition will increase.
The major concerns from users' perspectives are confidentiality, quality, reliability, security, and accessibility. Besides, when everyone has much of same data competition will increase.
6. How would you like to see DaaS develop from an intelligence perspective?
More information about B2B transactions and company metrics, cheaper, secure, flexible, first it is interesting to develop methods to create intelligence through the acquired data to help decision making. secondly the legislation should follow the development of DaaS to protect users and private data, more data mining oriented, more focus on field verification, object-based production / activity-based intelligence using resource description framework metadata models will better exploit DaaS, become more comprehensive, moving from renting to buying and owning data, develop connectivity based on formats between data to connect data silos and enrich the basis for analysis, more useful and timely info, more tailor made data, great flexibility from DaaS companies, nonstandard deliveries.
Users want to see more on company metrics, less expensive, more secure and more flexible data.
In more detail, we find a number of concerns: How do you as a user measure the value of the data you are thinking about buying or renting? By the time the company's financial results are recorded it may be difficult to go back and see where in the value chain the added value was created. Marketing departments may become lazy, preferring to rent the data instead of getting it themselves. Field work will suffer. The risk is that marketers and other users forget the craft of obtaining good data and analyzing it. Thus chances are that those who present the figures become less critical and make wrong inferences. Chances are users will defend DaaS not because it is better for the company, but because it makes their jobs easier.
Legal issues are a set of problems by themselves and already of great concern in some industries, like health care. In health care there already is some legislation in place as to how to handle private data, but it has proven difficult to enforce so far.
As competitors subscribe to the same data they can expect to arrive at similar conclusions, even when these conclusions are wrong. Thus we get a situation of higher competition but also a risk of systematic failure in analyses.
The skills of how to produce good data and analysis are in jeopardy. With a few large DaaS providers, these skills will be placed in the hands of a few people.
The chances of manipulation increase, as these statistics and analyses are not checked by outsiders.
Big data itself is worrying as there is confusion about what it can do and what it cannot do. Big data is good at sorting existing data, such as when it comes up with the algorithm for a Google search, but is poor at predicting the future, such as when Amazon suggests what you may want to buy. The risk is that DaaS providers will not tell customers about the difference, promising too much from the data they are selling. The reason for this has to do with probability statistics, R.A. Fisher and the math of small numbers (Ellenberg, 2015). With plenty of data we can predict the course of an asteroid, but we can only predict the weather for the next week or two and we have very little chance of predicting human behavior at all. As an example there is a very small chance that the NSA can find a terrorist by looking at our internet behavior. The chances are much greater that they will suspect innocent people. The same logic goes for commercial data. DaaS providers will make false predictions about who our customers are.
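The argument about suspecting innocent people can be made concrete with Bayes' rule. The sketch below is illustrative only: the base rate, sensitivity and false positive rate are hypothetical numbers chosen for the example and are not taken from Ellenberg (2015) or from any real system.

```python
def positive_predictive_value(base_rate, sensitivity, false_positive_rate):
    """Probability that a flagged person truly belongs to the target group."""
    true_positives = base_rate * sensitivity
    false_positives = (1.0 - base_rate) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs, chosen only for illustration.
base_rate = 1 / 100_000       # 1 person in 100,000 is in the target group
sensitivity = 0.99            # the model flags 99% of true targets
false_positive_rate = 0.01    # and wrongly flags 1% of everyone else

ppv = positive_predictive_value(base_rate, sensitivity, false_positive_rate)
print(f"Share of flagged people who are true targets: {ppv:.4%}")
```

With these assumed inputs only about one flagged person in a thousand is a true target, even though the model looks accurate on paper. The rarer the behavior being predicted, the more the false positives dominate, which is the same logic that applies when a DaaS provider predicts who a company's customers are.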

FUTURE STUDIES
In our discussion numerous research projects have been suggested, primarily related to the user perspective. It would be of interest to see if there are differences between groups of users, where MI, CI and BI experts belong to one group. It may be that they see these questions differently from other analysts and IT experts. What data do companies want to share? What data do companies not want to share? Will there be a future Amazon or Facebook of DaaS, one dominating company in a winner-takes-all market, or a large group of suppliers? Economies of scale and big data may suggest large players have an advantage. There are already some "super aggregators" among national signals intelligence agencies for the same reason, like the NSA. On the private side, Oracle offers 7.5 trillion marketing data transactions delivered per month and 200 billion social data operations processed per hour. Do customers accept only renting data, while not being able to download it? How short a time do customers accept renting data for? In many cases renting data only means being allowed to read the data. This is different from traditional data delivery. How will customers react to this new packaging? How much are they willing to pay for it? These are some of the questions that future studies could address.