Geospatial analysis of census data for targeting new businesses using geoeconomics

Geoeconomics plays a vital role in promoting goods and services in new marketplaces. Selecting a “sweet-spot” for a new business is one of the biggest challenges for new entrepreneurs, enterprises, and investors, especially in the restaurant industry. This paper presents a novel geospatial methodological approach for new businesses that uses census data to answer an important business question: Where should I start my new Asian cuisine restaurant? State and zip code tabulation area (ZCTA) level data on race and income, downloaded from the US census website, were used for the analysis. ArcGIS software was used as the geospatial analytics tool for hotspot analysis and for producing maps. Based on the state level standard deviation map, California was found to have the second-highest relative Asian population as gauged by the standard deviation (Std. Dev.) from the mean (1.5-2.5 Std. Dev.), after Hawaii (>2.5 Std. Dev.), and followed by New Jersey, New York, Nevada, and Washington. The state of California was selected for further investigation. Seventeen of 58 counties were found to be Asian community hotspots in California. Nearly half (48%, 854 of 1763) of the ZCTAs in the state were found to be statistically significant Asian community hotspots. Only 9% (163 of 1763) of the ZCTAs were not statistically significant as either hotspots or coldspots, while 43% of the ZCTAs were found to be statistically significant coldspots of Asian communities in California. Among the 17 hotspot counties of Asian communities, 14 were also identified as hotspots of mean income. The road layer map revealed that these ZCTAs are well connected to major roads in the state. New entrepreneurs, enterprises, and investors who are willing to open and/or invest in new restaurants but are not sure about the location could target hotspot ZCTAs in these counties for Asian cuisine. 
Integrating ArcGIS with census data for producing maps of statistically significant potential business locations could be used as an important decision-making tool for opening new businesses.


INTRODUCTION
Geoeconomics is described as a theoretical and applied science and a methodical trend in socioeconomic geography, and it can be applied in temporal, spatial, and political economic systems to encourage goods and services in new marketplaces (Alayev 1983, Anokhin and Lachininskii 2015). It is also considered a multidisciplinary science investigating economic activities and is defined as "the study of spatial, cultural, and strategic aspects of resources, with the aim of gaining a sustainable competitive advantage" (Søilen 2012). Since geoeconomics rests on three scientific domains, sociology, geography, and economics, each component plays a vital role in promoting new economic activities at local and global levels (Renner 1942, Lachininskii 2013, Anokhin and Lachininskii 2015). Therefore, the geoeconomic space, which is a complex transborder network system, is vital in promoting new business activities (Gay 2012, Søilen 2012, Anokhin and Lachininskii 2015). Furthermore, local economic groups may strongly influence regional economic performance (Porter 2003). In addition, economic, legal, political, infrastructural, ecological, technical, cultural, and social factors form the macroenvironment that informs business decision-making processes (Søilen 2012).
Selecting a "sweet-spot" for a new business is one of the biggest challenges for new entrepreneurs, enterprises, and investors. In this study, a "sweet-spot" is defined as a geographic location where the likelihood of maximizing the benefits is the highest for a new business, considering the local environment. Location information plays a vital role in the establishment of new industries, as it affects the economic growth of firms as well as the socioeconomic, environmental, and political status of the establishment area (Bhat et al. 2014, Demiriz and Ekizoğlu 2015, Mishra et al. 2015). Locational information could help in retail site selection (Karadeniz 2009), preventing retail banking fraud (Demiriz and Ekizoğlu 2015), financing commercial real estate acquisitions by real estate investment trusts (Conklin et al. 2016), the fast-food industry (Austin et al. 2005), and much more. Moreover, location could also affect business-level innovation (Jordan 2015). Taxes, subsidies and incentives, environmental regulations, quality of life and amenities, labor costs and availability, technical infrastructure, transportation, and accessibility have been reported as the most important factors in assessing potential locations for new businesses (Kimelberg and Williams 2013, Bhat et al. 2014). However, the socioeconomics and demographics of potential customers are largely ignored in most studies. This omission could adversely affect a newly established business or firm. For example, in the restaurant industry, the theme, food quality, ambiance, aesthetics, service, and economic shifts play a vital role in the success or failure of the business (Murillo 2010, Kimelberg and Williams 2013). Nevertheless, selecting the wrong customer neighborhood, poor accessibility, and a sparse surrounding population may unfavorably affect new establishments (Murillo 2010).
According to location theory, firms or enterprises tend to assess where and why economic activities happen so that they can maximize benefits (North 1955, Kimelberg and Williams 2013, Dubé et al. 2016). In most cases, non-spatial data from small or large surveys have been used to assess potential locations for establishing a new business (Kimelberg and Williams 2013). These surveys are very expensive and have inherent reliability challenges. Therefore, targeting a location for a new business based on the analysis of survey data with a small sample size could be a big concern for entrepreneurs, investors, and enterprises.
The United States (US) census captures socioeconomic, demographic, and business information on the US population that could be used in business decision-making. However, these non-spatial data still lack a geographical/spatial context. Spatial data have the advantage of showing patterns on maps and letting users connect the dots, taking neighborhood geographies into consideration. Historically, presenting facts and figures on a map has played a vital role in both political and economic contexts (Søilen 2012). Geospatial analytics has emerged as an important method for the spatial and temporal analysis of data in various domains for informed decision-making (Prato et al. 1995, Boulos et al. 2011, Rey et al. 2015, Singh 2015, Singh and Vedwan 2015, Supak et al. 2015, Singh et al. 2016). However, in business, it is still in the rudimentary stages. The reason could be the lack of vision for integrating geospatial tools, such as Arc Geographical Information System (ArcGIS), in business decision-making. Similarly, census data have always been available for public and private use, though they have not been integrated and/or used in business decision-making.
The main goal of this paper is to bridge the above-mentioned gaps by integrating US census data with ArcGIS to help new businesses answer an important business question: Where should I start my new Asian cuisine restaurant? Several similar business questions could be addressed in the same way.
The state of California in the USA has been a center of economic growth with maximum wages and opportunities (Porter 2003). Because of the high rate of diverse immigration, the state became a hotspot for ethnic cuisine, especially restaurants (Porter 2003, Capps 2007). The restaurant industry is one of the fastest growing industries, and small to large business entities could be affected if a poor site is chosen to start a business. Small business entities may not have enough resources to evaluate site selection. Therefore, the approach described in this paper could be a cost-effective and more efficient way of selecting a site for new businesses within the restaurant industry. The approach could also be adapted for other industries.

MATERIALS AND METHODS
State and zip code level population data for Asian immigrants and household mean income in the zip code areas were used. State and zip code tabulation area (ZCTA) shapefiles were used to perform geospatial analyses and to create maps.

Data collection

State and ZCTA level Asian population data
The state level race data were downloaded from the American Community Survey (ACS). A detailed description of the ACS and the data is found in the 2016 US Census (US-Census 2016). The ACS offers a total of nine different combinations of races at the ZCTA level. For the current study, only HD01_VD05 (i.e., Asian alone) was selected. A detailed description of the ZCTA can be found at the US census site (US-Census 2016). In brief, the "ZCTAs are generalized areal representations of the United States Postal Service (USPS) ZIP Code service areas," however, the "USPS ZIP Codes are not areal features but a collection of mail delivery routes" (US-Census 2016).

ZCTA level income data
The ZCTA level mean income data were downloaded from the ACS. The ACS offers a total of 27 different combinations of mean income at the ZCTA level. For the current study, only HC02_EST_VC02 (i.e., estimated mean income in dollars for all households) was selected (US-Census 2016).

State and ZCTA shapefiles
State and ZCTA shapefiles were downloaded from the US census website (US-Census 2014, 2015, 2016). These shapefiles were used to produce maps for geospatial analysis.

Data integration to ArcGIS, analysis, and mapping
The state and the ZCTA data were joined to the shapefiles within the ArcGIS environment using GEOID as a join key (ESRI 2012).
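The attribute join described above can be illustrated outside ArcGIS. The following is a minimal sketch, not the authors' workflow: it assumes the census table and the shapefile attribute table have each been read into a list of dictionaries, and the field names other than GEOID (`asian_pct`, `NAME`, `matched`) and the function `join_on_geoid` are hypothetical. The three records are illustrative only.

```python
# Illustrative stand-ins for the ACS table and the shapefile attribute table.
# Field names other than GEOID are hypothetical.
acs_rows = [
    {"GEOID": "06", "asian_pct": 13.0},
    {"GEOID": "15", "asian_pct": 38.0},
]
shape_rows = [
    {"GEOID": "15", "NAME": "Hawaii"},
    {"GEOID": "06", "NAME": "California"},
    {"GEOID": "32", "NAME": "Nevada"},
]


def join_on_geoid(features, table, key="GEOID"):
    """Left-join table attributes onto features using a shared key field."""
    lookup = {row[key]: row for row in table}
    joined = []
    for feature in features:
        match = lookup.get(feature[key], {})
        # Copy every attribute from the matching table row except the key itself.
        merged = {**feature, **{k: v for k, v in match.items() if k != key}}
        # Flag features that found no matching record so they can be reviewed.
        merged["matched"] = feature[key] in lookup
        joined.append(merged)
    return joined


joined = join_on_geoid(shape_rows, acs_rows)
for row in joined:
    print(row)
```

Keeping every feature and flagging the unmatched ones, rather than dropping them, makes it easier to spot polygons whose key has no counterpart in the census table before any mapping is done.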
ArcGIS is a mapping software, developed by

Standard deviation mapping
There are seven standard classification methods (manual interval, defined interval, equal interval, geometrical interval, quantile, natural breaks, and standard deviation) available in ArcGIS to spatially display numerical data on a map (ESRI 2012, 2014, 2016). The state level Asian population data used in this study are available in absolute numbers and in percentages. The standard deviation classification method was applied to produce a classification map of Asian populations at the state level. In this method, ArcMap derives the mean and standard deviation and produces maps displaying which feature polygons deviate (positively and negatively) from the mean (ESRI 2014, 2015).
Based on the positive deviation and the highest standard deviation values, California was selected for further analysis.
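The standard deviation classification can be sketched in a few lines. This is an illustration under stated assumptions, not ArcMap's implementation: the class breaks at ±0.5, ±1.5, and ±2.5 standard deviations are assumed from the class ranges reported in this paper, the population standard deviation is assumed, and the input values and the function names `std_dev_class` and `classify` are hypothetical.

```python
from statistics import mean, pstdev

# Assumed class breaks, expressed in standard deviations from the mean.
BREAKS = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
LABELS = [
    "< -2.5 Std. Dev.",
    "-2.5 to -1.5 Std. Dev.",
    "-1.5 to -0.5 Std. Dev.",
    "-0.5 to 0.5 Std. Dev.",
    "0.5 to 1.5 Std. Dev.",
    "1.5 to 2.5 Std. Dev.",
    "> 2.5 Std. Dev.",
]


def std_dev_class(value, mu, sigma):
    """Return the class label for one value given the mean and standard deviation."""
    z = (value - mu) / sigma
    for upper, label in zip(BREAKS, LABELS):
        if z <= upper:
            return label
    return LABELS[-1]  # z is above the last break


def classify(values_by_area):
    """Classify every area by its deviation from the mean of all areas."""
    values = list(values_by_area.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    return {area: std_dev_class(v, mu, sigma) for area, v in values_by_area.items()}


# Hypothetical percentages for five areas, for illustration only.
example = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0, "E": 5.0}
classes = classify(example)
for area, label in classes.items():
    print(area, label)
```

Because the classes are defined relative to the mean, the same value can fall in a different class when the set of areas changes, which is worth keeping in mind when comparing maps built from different extents.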

Hotspot analysis
The hotspot analysis is one of the spatial statistical analysis tools in ArcMap and was applied to map statistically significant spatial clusters of high values (hotspots) and low values (coldspots) (ESRI 2014, 2016).
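The ArcGIS Hot Spot Analysis tool is documented as computing the Getis-Ord Gi* statistic for each feature. The following is a minimal sketch of that statistic, assuming a simple binary spatial weights matrix in which each feature is a neighbor of itself; the five values, the weights, and the function name `getis_ord_gi_star` are hypothetical, and the tool's own weighting options are not reproduced here.

```python
from math import sqrt


def getis_ord_gi_star(values, weights, i):
    """Return the Gi* z-score for feature i.

    values  -- attribute value of every feature
    weights -- weights[i][j] is the spatial weight between features i and j,
               with weights[i][i] included (the "star" variant)
    """
    n = len(values)
    x_bar = sum(values) / n
    s = sqrt(sum(x * x for x in values) / n - x_bar ** 2)
    w = weights[i]
    sum_w = sum(w)
    sum_w2 = sum(wij * wij for wij in w)
    # Observed local sum minus the sum expected under spatial randomness.
    numerator = sum(wij * xj for wij, xj in zip(w, values)) - x_bar * sum_w
    denominator = s * sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
    return numerator / denominator


# Five hypothetical areas along a line; each area neighbors itself and the
# areas directly next to it (binary weights).
values = [10.0, 9.0, 1.0, 1.0, 1.0]
weights = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
]

z_first = getis_ord_gi_star(values, weights, 0)
z_last = getis_ord_gi_star(values, weights, 4)
print(round(z_first, 3), round(z_last, 3))
```

A positive z-score marks a feature whose neighborhood sum is higher than expected (a candidate hotspot), and a negative z-score marks the opposite. The sketch does not guard against the degenerate case in which a feature neighbors every other feature with equal weight, where the denominator becomes zero.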
The output feature class is a shapefile with Gi z-score, Gi p-value, and Gi_Bin fields. The Gi z-score and Gi p-value measure statistical significance, and the Gi_Bin identifies hotspots and coldspots at the 90, 95, and 99% confidence levels (ESRI 2014, 2016).
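How a z-score maps to a confidence bin can be sketched as follows. The sketch assumes the conventional two-sided critical values (1.65, 1.96, and 2.58) and the ±1, ±2, ±3 bin coding; it does not apply any multiple-testing correction, and the function name `gi_bin` is hypothetical.

```python
def gi_bin(z):
    """Map a Gi* z-score to a signed confidence bin.

    +3 / -3 : hotspot / coldspot at 99% confidence
    +2 / -2 : hotspot / coldspot at 95% confidence
    +1 / -1 : hotspot / coldspot at 90% confidence
     0      : not statistically significant
    """
    magnitude = abs(z)
    if magnitude >= 2.58:
        level = 3
    elif magnitude >= 1.96:
        level = 2
    elif magnitude >= 1.65:
        level = 1
    else:
        level = 0
    return level if z > 0 else -level


for z in (2.7, 1.7, 0.3, -2.0):
    print(z, gi_bin(z))
```

Under this coding, any positive bin counts as a hotspot and any negative bin as a coldspot, which is how the hotspot and coldspot tallies in a results table could be produced from the output field.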

State level distribution of Asian population in the US
The analysis revealed that the highest relative percentage (38%) of people identifying as Asian live in Hawaii, followed by 13% in California, 8.6% in New Jersey, 7.6% in New York, 7.4% in Nevada, and 7.3% in Washington. Although Hawaii has the highest relative Asian population, California was used for this case study because of the availability of other relevant data and the interest of new entrepreneurs, enterprises, and investors in California (Figure 1). The further investigation therefore focused on California and all the ZCTAs in the state.

ZCTA level distribution of Asian populations in the US
The ZCTA is the smallest census unit and may offer more specific information on the socioeconomics, demographics, and businesses of those who live within the ZCTA boundaries.
The hotspot analysis of Asian communities clearly indicated two hotspots in California (Figure 2). There are 1763 ZCTAs in California. The ZCTA level hotspot analysis of the Asian population revealed that nearly half (48%, 854 of 1763) of the ZCTAs were statistically significant Asian community hotspots (Figure 2). Only 9% (163 of 1763) of the ZCTAs were not statistically significant as either hotspots or coldspots, while 43% of the ZCTAs were found to be statistically significant coldspots of Asian communities in California (Figure 2). This suggests that the 854 hotspot ZCTAs have high concentrations of Asian communities and could be potential locations for opening new Asian cuisine restaurants.
California has 58 counties (Figure 2), of which 17 are hotspots of Asian communities. Contra Costa, Los Angeles, Marin, Orange, San Francisco, San Mateo, Santa Clara, and Santa Cruz counties were found to be hotspots of Asian communities, whereas Merced, Monterey, Riverside, San Bernardino, San Diego, San Joaquin, Solano, Stanislaus, and Ventura counties were partially categorized as hotspots of Asian communities in California (Figure 2).
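The reported ZCTA shares can be reproduced from the reported counts with simple arithmetic. The sketch below uses only the counts stated above (854 hotspot and 163 not significant ZCTAs out of 1763); the names `TOTAL_ZCTAS`, `reported_counts`, and `share` are hypothetical.

```python
TOTAL_ZCTAS = 1763

# Counts as reported in the text for the Asian community hotspot analysis.
reported_counts = {
    "hotspot": 854,
    "not significant": 163,
}


def share(count, total):
    """Return the percentage of the total, rounded to a whole percent."""
    return round(100 * count / total)


shares = {label: share(count, TOTAL_ZCTAS) for label, count in reported_counts.items()}
for label, pct in shares.items():
    print(f"{label}: {pct}%")
```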

ZCTA level distribution of mean income in California, US
The mean income in California ZCTAs ranges between $9,471 and $413,643. Furthermore, the hotspot analysis of ZCTA level mean income revealed that a majority (51%, 894 of 1763) of the ZCTAs in the state were statistically significant hotspots of mean income (Figure 3).
Thirty-nine percent (746 of 1763) of the ZCTAs were found to be statistically significant coldspots of mean income, while 10% (170 of 1763) of the ZCTAs were not statistically significant as either hotspots or coldspots of mean income in California.
The hotspots of mean income cover a total of 20 counties (Figure 3). Marin, San Francisco, San Mateo, Santa Clara, Santa Cruz, Napa, Solano, Contra Costa, Ventura, Los Angeles, and Orange counties were found to be hotspots of mean income in the ZCTAs in the state. On the other hand, Sonoma, Monterey, Lake, Yolo, San Joaquin, Stanislaus, Santa Barbara, Riverside, and San Diego counties were partially indicated as hotspots of mean income in the ZCTAs in the state. The counties that are hotspots of both Asian communities and mean income were then derived (Table 1). Out of 17 counties with hotspots of Asian communities (Figure 2), 14 counties were either fully or partially identified as hotspots of mean income (Table 1, Figures 2 and 3) in California. Only three counties with hotspots of Asian communities, Merced, San Bernardino, and Stanislaus, were not hotspots of mean income (Table 1). Therefore, the above 14 counties could be targeted for new Asian cuisine restaurants (Table 1).

Hotspot ZCTAs with road availability in California, US
As discussed in the introduction, accessibility plays an important role in the success of a newly established business. Therefore, the hotspot map of the Asian population was overlaid with a road layer to check whether the hotspot ZCTAs are close enough to roads that accessibility is not a constraint (Figure 4). Both hotspots were found to have a good network of roads (Figure 4). Consequently, the 14 listed counties (Table 1) meet three major criteria for opening an Asian cuisine restaurant in California: a high Asian population, a higher mean income, and a good road network.
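The three criteria can be combined into a simple screening rule. The following is a hedged sketch rather than the procedure used in this study, where road access was assessed visually from the overlay map: it assumes each ZCTA record carries the Gi_Bin values from the two hotspot analyses plus a distance to the nearest major road, and the record fields, the 2 km threshold, the identifiers, and the function name `candidate_zctas` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ZctaRecord:
    zcta: str                # ZCTA identifier (placeholder labels below)
    asian_gi_bin: int        # Gi_Bin from the Asian population hotspot analysis
    income_gi_bin: int       # Gi_Bin from the mean income hotspot analysis
    km_to_major_road: float  # assumed distance measure for road access


def candidate_zctas(records, max_road_km=2.0):
    """Return ZCTAs that are hotspots on both measures and close to a major road."""
    return [
        r.zcta
        for r in records
        if r.asian_gi_bin > 0
        and r.income_gi_bin > 0
        and r.km_to_major_road <= max_road_km
    ]


# Hypothetical records for illustration only.
records = [
    ZctaRecord("ZCTA_A", 3, 3, 0.4),   # hotspot on both, near a road
    ZctaRecord("ZCTA_B", 2, -1, 0.8),  # income coldspot
    ZctaRecord("ZCTA_C", 3, 2, 6.5),   # too far from a major road
    ZctaRecord("ZCTA_D", 0, 3, 0.2),   # Asian population not significant
]

candidates = candidate_zctas(records)
print(candidates)
```

Requiring all three conditions is the strictest way to combine the criteria; a weighted score would be an alternative if an investor is willing to trade, for example, income for road access.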

CONCLUSIONS
Geoeconomics can be examined at the lowest, low, middle, upper, and top geographical levels (Anokhin and Lachininskii 2015). In this study, the lowest geographic unit, the ZCTA, was used to show how social, economic, and geographic components could potentially affect new businesses. Additionally, ArcGIS offers several tools for geospatial analysis and could be used as a decision-making tool for informed business management within geoeconomics. Moreover, census data offer various significant insights for establishing new businesses as well as for existing businesses. Integrating ArcGIS with census data could help businesses accomplish their goals, from locating new sites to predicting their business growth. Socioeconomic and demographic data are freely available at the ZCTA level and could be applied for informed business decision-making through geospatial analytics using ArcGIS. In this study, only race, mean income, and road network data were used for the analysis. However, the US census offers several other important attributes, such as gender, age, occupation, household, housing, and business data. These attributes could be applied in the selection of new business sites and/or for other business needs. The geospatial approach described in this paper is a cost-effective, easy, and efficient way of selecting new business sites and could be used by small as well as large business entities within and outside the restaurant industry.

Figure 1 A standard deviation map of the Asian population in the United States.

Figure 2 Hotspot and coldspot map of Asian communities in the Zip Code Tabulation Areas of California, United States.

Figure 3 Hotspot and coldspot map of mean income in the Zip Code Tabulation Areas of California, United States.

Table 1 Hotspot counties and the coverage of Asian population and mean income in California, United States. *NHS = not a hotspot.