Hot Spot Spatial Analysis
Overview
Hotspot analysis is a spatial analysis and mapping technique concerned with identifying clusters of spatial phenomena. These phenomena are depicted as points on a map and refer to the locations of events or objects.
Description
A hotspot can be defined as an area that has a higher concentration of events than would be expected under a random distribution of events. Hotspot detection evolved from the study of point distributions, or spatial arrangements of points in a space (Chakravorty, 1995). When examining point patterns, the density of points within a defined area is compared against a complete spatial randomness (CSR) model, which describes a process in which point events occur completely at random (i.e., a homogeneous spatial Poisson process). Beyond assessing the density of points in a given area, hotspot techniques also measure the extent of interaction between point events to understand spatial patterns (Baddeley, 2010).
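To make the complete spatial randomness model concrete, the sketch below simulates a homogeneous Poisson process on a rectangular study area: the number of events is Poisson-distributed with mean equal to intensity × area, and each event is placed uniformly at random. The function names, the rectangular study area, and the intensity value are illustrative assumptions, not part of any particular hotspot package.

```python
import random

def poisson_count(rng, mean):
    """Draw a Poisson(mean) count by counting unit-rate arrivals before `mean`."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

def simulate_csr(intensity, width, height, seed=None):
    """Hypothetical helper: homogeneous Poisson process on a width x height box.

    intensity is the expected number of events per unit area.
    """
    rng = random.Random(seed)
    n = poisson_count(rng, intensity * width * height)
    # Under CSR every location is equally likely, so coordinates are uniform
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

pts = simulate_csr(intensity=0.01, width=100, height=100, seed=1)
print(len(pts))  # random; about 100 events on average
```

Point patterns simulated this way can serve as a reference against which an observed pattern is compared.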
The application of hotspot analysis within public health and epidemiological research as well as in other disciplines (e.g., a great deal of the literature on hotspot analysis comes from crime mapping and research) has increased significantly in the past couple of decades mainly due to the advent of geographic information systems (GIS)-based software.
Selected examples of applications in public health practice and research include:
-
Where and when do violent trauma hotspots occur in Vancouver, Canada?
-
Are the characteristics of the victim, the injury, the physical environment, and the socioeconomic deprivation level of the neighborhood where violent trauma events took place associated with violent trauma patterns in Vancouver, Canada?
-
Where is future reassortment between human influenza A subtype H3N2 and avian influenza subtype H5N1, which could generate a novel influenza virus, most likely to occur?
Data Elements
Hotspot analysis involves the following types of data:
-
Points – Locations of objects/events occurring in a study area (e.g., violent trauma, crime, earthquake epicenters, cases of avian influenza)
Depending on the research question, the hotspot analysis may involve the following types of data:
-
Attributes – Categorical or continuous variables that further describe the objects/events
-
Period – Date or time of events
-
Other covariates – Explanatory variables of any kind
A base map, which is a reference map that usually contains basic geographic features, is used for overlaying the data. Base maps are generally freely available on the Internet in different formats, and the software chosen for the hotspot analysis will determine the format you need. Map clearinghouses established by governments or private organizations also provide free access to a range of base maps for use in research. Depending on the mapping technique selected, the base map may also consist of user-defined quadrats or thematic boundaries. For instance, if you are interested in assessing incident intensity, you can aggregate your data using a polygon grid of equal-size quadrats or thematic units such as ZIP codes, counties, or states (see the section below on Mapping Techniques).
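Aggregating events into a grid of equal-size quadrats amounts to assigning each point to a cell and counting. The sketch below assumes projected coordinates (e.g., metres) and a square grid anchored at a hypothetical origin (xmin, ymin); aggregating to ZIP codes or counties would instead require a point-in-polygon test, which is not shown.

```python
import math
from collections import Counter

def quadrat_counts(points, xmin, ymin, cell_size):
    """Hypothetical helper: count points in each square cell of a regular grid.

    points: iterable of (x, y) in projected units.
    Returns a Counter keyed by (column, row) cell index.
    """
    counts = Counter()
    for x, y in points:
        col = math.floor((x - xmin) / cell_size)
        row = math.floor((y - ymin) / cell_size)
        counts[(col, row)] += 1
    return counts

# Illustrative event locations in a 200 x 200 m study area
events = [(12, 15), (18, 22), (25, 30), (150, 160), (155, 170), (30, 10)]
cells = quadrat_counts(events, xmin=0, ymin=0, cell_size=100)
print(cells)  # Counter({(0, 0): 4, (1, 1): 2})
```

Cells with no events simply do not appear in the Counter (looking them up returns 0).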
Hotspot Analysis Steps
1. Create or identify data set
2. Identify base map file/download
3. Test for spatial autocorrelation/clustering in data
4. Create the hotspot map
5. Define the hotspot map legend threshold
Spatial Patterns and Clustering Tests
There are different methods for analyzing spatial patterns and detecting hotspots, including spatial autocorrelation and cluster analysis. The nearest neighbor index (NNI) is an indicator of clustering, calculated by comparing the observed distribution of events against an expected random distribution. Spatial autocorrelation analysis examines how similar the values at nearby locations are to each other. Measures of spatial autocorrelation can be global or local; local measures are known as local indicators of spatial association (LISA). Moran’s I and Geary’s C are examples of global spatial autocorrelation statistics. The Getis-Ord Gi* statistic is an example of a LISA statistic. Other statistics, known as centrographic statistics (mean center, standard distance, and standard deviational ellipse), measure centrality and dispersion, which is useful for describing spatial patterns in the data or for comparing two distributions.
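Two of the centrographic statistics are simple to compute directly. The sketch below takes the mean center as the average of the x and y coordinates and the standard distance as the root-mean-square distance of points from that center, using n in the denominator; this divisor is an assumption, and some software applies a small-sample correction, so values may differ slightly between packages.

```python
import math

def mean_center(points):
    """Arithmetic mean of the x and y coordinates."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def standard_distance(points):
    """Root-mean-square distance of points from their mean center (divisor n)."""
    cx, cy = mean_center(points)
    n = len(points)
    ss = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)
    return math.sqrt(ss / n)

pts = [(0, 0), (2, 0), (0, 2), (2, 2)]
print(mean_center(pts))        # (1.0, 1.0)
print(standard_distance(pts))  # 1.4142135623730951
```

Comparing the standard distance of two event types (or two time periods) gives a quick sense of which is more dispersed.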
Nearest Neighbor Index
The NNI is based on the average distance between each point and its nearest neighbor. The index is the ratio of the observed mean nearest-neighbor distance to the expected mean distance, where the expected distance is based on a hypothetical random distribution with the same number of features covering the same total area. If the NNI is less than 1, the points can be considered clustered. If the NNI is greater than 1, there is evidence of a dispersed, more uniform pattern in the point distribution (NIJ, 2005).
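The significance of the difference between the observed and the random expectation can be tested using a Z score (Clark & Evans, 1954). The formula for the Z score is the following (Levine, 2013):

$$Z = \frac{\bar{d}_{obs} - \bar{d}_{ran}}{SE_{\bar{d}}}, \qquad \bar{d}_{ran} = 0.5\sqrt{\frac{A}{N}}, \qquad SE_{\bar{d}} = \frac{0.26136}{\sqrt{N^{2}/A}}$$

where $\bar{d}_{obs}$ is the observed mean nearest-neighbor distance, $\bar{d}_{ran}$ is the mean nearest-neighbor distance expected under complete spatial randomness, $N$ is the number of points, and $A$ is the area of the study region.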
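The NNI and its Z score can be computed in a few lines. The sketch below follows the Clark and Evans expressions (expected distance 0.5·√(A/N), standard error 0.26136/√(N²/A)); it uses a brute-force nearest-neighbor search, which is fine for small data sets, and it assumes projected coordinates and a known study-area size. It does not apply any edge correction.

```python
import math

def nearest_neighbor_index(points, area):
    """Clark-Evans nearest neighbor index and Z score (no edge correction).

    points: list of (x, y) in projected units; area: study-area size in the
    same squared units. Returns (nni, z).
    """
    n = len(points)
    nn_dists = []
    for i, (xi, yi) in enumerate(points):
        # distance from point i to its closest other point
        d = min(math.hypot(xi - xj, yi - yj)
                for j, (xj, yj) in enumerate(points) if j != i)
        nn_dists.append(d)
    d_obs = sum(nn_dists) / n
    d_exp = 0.5 * math.sqrt(area / n)        # mean NN distance under CSR
    se = 0.26136 / math.sqrt(n ** 2 / area)  # standard error under CSR
    return d_obs / d_exp, (d_obs - d_exp) / se

# Four tightly grouped points inside a 10 x 10 study area
nni, z = nearest_neighbor_index([(0, 0), (1, 0), (0, 1), (1, 1)], area=100)
print(nni)  # 0.4 (less than 1, so clustered); z is negative
```

A Z score below about -1.96 would indicate significant clustering at the 5% level under the usual normal approximation.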
Spatial Autocorrelation
Many statistics are available to evaluate whether points (i.e., events or objects) that are close together are related to each other (i.e., are not spatially independent) as compared to those points farther apart. Spatial autocorrelation can be defined globally as well as locally based on the level of the spatial analysis units.
Global Moran’s I and Geary’s C are two statistics that estimate the overall degree of spatial autocorrelation. In other words, they measure the tendency of events to cluster, or the extent to which points close together have more similar values, on average, than those farther apart. There are versions of these two statistics that measure local spatial autocorrelation; these differ from the global statistics in that they are computed for each spatial analysis unit.
Before Moran’s I can be calculated, the points need to be aggregated by imposing a structure on the data (a grid or geographic units) that constrains the number of neighbors to be considered. This makes it possible to calculate a spatial weight matrix, which can be a measure of contiguity between cells or a distance-based weight.
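As one example of a distance-based weight, the sketch below builds a binary distance-band matrix: two units are neighbors (weight 1) if their centroids lie within a chosen threshold distance, and a unit is not its own neighbor. The function name and threshold are hypothetical; contiguity weights (shared edges or corners) and inverse-distance weights are common alternatives. Row standardization, which divides each row by its sum, is shown as an option.

```python
import math

def distance_band_weights(coords, threshold, row_standardize=False):
    """Hypothetical helper: binary distance-band spatial weights.

    w[i][j] = 1 if i != j and dist(i, j) <= threshold, else 0.
    With row_standardize=True each non-empty row is divided by its sum.
    """
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= threshold:
                w[i][j] = 1.0
    if row_standardize:
        for row in w:
            s = sum(row)
            if s > 0:
                row[:] = [v / s for v in row]
    return w

# Centroids of three cells in a row, 1 unit apart
w = distance_band_weights([(0, 0), (1, 0), (2, 0)], threshold=1.0)
print(w)  # [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
```

The choice of threshold defines the scale of "neighborhood" and can change the resulting statistics noticeably.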
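Global Moran’s I is calculated as follows:

$$I = \frac{n}{S_0}\,\frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{i,j}\, z_i z_j}{\sum_{i=1}^{n} z_i^{2}}, \qquad S_0 = \sum_{i=1}^{n}\sum_{j=1}^{n} w_{i,j}$$

where $z_i$ is the deviation of the attribute at location $i$ from the global mean, $z_j$ is the deviation of the attribute at neighboring location $j$ from the global mean, $w_{i,j}$ is the spatial weight between features $i$ and $j$, $n$ is the total number of features, and $S_0$ is the sum of all the spatial weights. The products $z_i z_j$ form a matrix measuring how similar the values at locations $i$ and $j$ are, which is then multiplied element by element by the weight matrix so that each pair’s contribution is weighted by its proximity.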
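Global Moran’s I, I = (n/S₀)·ΣᵢΣⱼ wᵢⱼ zᵢ zⱼ / Σᵢ zᵢ², translates almost directly into code. The sketch below assumes the weight matrix is supplied as a list of lists with zeros on the diagonal; the toy weights (four cells in a row, each neighboring the adjacent cells) are illustrative only.

```python
def morans_i(values, w):
    """Global Moran's I for attribute values and a spatial weight matrix w."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]              # deviations from the mean
    s0 = sum(sum(row) for row in w)             # sum of all weights
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return n * num / (s0 * den)

# Four cells in a row; each cell neighbours the cell(s) next to it
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(morans_i([1, 2, 3, 4], w))  # about 0.333: similar values sit together
print(morans_i([1, 0, 1, 0], w))  # -1.0: perfectly alternating values
```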
Values of Moran’s I typically range between –1.0 and +1.0. A value near zero (strictly, near its expected value of –1/(n – 1)) can be interpreted as random spatial ordering, a value greater than zero as positive spatial autocorrelation, and a value less than zero as negative spatial autocorrelation (NIJ, 2005).
Similar to the NNI, the observed value of Moran’s I can be compared with the value expected under a random distribution by calculating a Z score.
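One common way to obtain this Z score, shown here as a swapped-in alternative to the analytical formulas used by packages such as CrimeStat or ArcGIS, is a permutation test: the attribute values are randomly reshuffled across locations many times, Moran’s I is recomputed each time, and the observed value is standardized against the resulting reference distribution. The number of permutations (999) and the fixed seed are arbitrary choices for this sketch.

```python
import random
import statistics

def morans_i(values, w):
    """Global Moran's I for attribute values and a spatial weight matrix w."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return n * num / (s0 * sum(zi * zi for zi in z))

def moran_permutation_test(values, w, n_perm=999, seed=0):
    """Standardize observed I against I computed on shuffled values."""
    rng = random.Random(seed)
    observed = morans_i(values, w)
    shuffled = list(values)
    sims = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)                  # random relabelling of locations
        sims.append(morans_i(shuffled, w))
    z = (observed - statistics.mean(sims)) / statistics.stdev(sims)
    expected = -1 / (len(values) - 1)          # analytical E[I] under randomness
    return observed, expected, z

# Eight cells in a row with a smooth trend; neighbours share an edge
w = [[1 if abs(i - j) == 1 else 0 for j in range(8)] for i in range(8)]
obs, exp_i, z = moran_permutation_test([1, 2, 3, 4, 5, 6, 7, 8], w)
print(round(obs, 3), round(exp_i, 3), round(z, 2))
```

The mean of the simulated values should sit close to the analytical expectation –1/(n – 1), which is a useful sanity check.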
Geary’s C is similar to Moran’s I except for the cross-product term in the numerator: instead of multiplying the deviations of each pair of values from the mean, it uses the squared difference between the values at each pair of locations. The values of this statistic typically range between 0 and 2, where a value of 1 would be expected if each location is unrelated to the others. A value less than 1 indicates positive spatial autocorrelation, and a value greater than 1 indicates negative spatial autocorrelation (NIJ, 2005).
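A sketch of Geary’s C using the commonly cited form C = (n – 1)·ΣᵢΣⱼ wᵢⱼ(xᵢ – xⱼ)² / (2·S₀·Σᵢ(xᵢ – x̄)²), with S₀ the sum of all weights. The weight matrix format (list of lists, zero diagonal) and the toy data are assumptions for illustration.

```python
def gearys_c(values, w):
    """Geary's C for attribute values and a spatial weight matrix w."""
    n = len(values)
    mean = sum(values) / n
    s0 = sum(sum(row) for row in w)
    # squared differences between every weighted pair of locations
    num = sum(w[i][j] * (values[i] - values[j]) ** 2
              for i in range(n) for j in range(n))
    den = sum((v - mean) ** 2 for v in values)
    return (n - 1) * num / (2 * s0 * den)

w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(gearys_c([1, 2, 3, 4], w))  # 0.3: below 1, positive autocorrelation
print(gearys_c([1, 0, 1, 0], w))  # 1.5: above 1, negative autocorrelation
```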
LISA Statistics
Anselin (1995) defines a LISA as any statistic that meets the following two requirements:
-
“the LISA for each observation gives an indication of the extent of significant spatial clustering of similar values around that observation;
-
the sum of LISAs for all observations is proportional to a global indicator of spatial association.” (p. 94).
The need for LISA statistics arose from the limitations of mapping methods that use geographic boundary areas, such as census blocks, or uniform grid cells (quadrats) as the unit of spatial analysis (e.g., thematic maps) to depict patterns of spatial clustering. Thematic units and quadrats assume spatial homogeneity within each unit, which limits the detection of local hotspots within them.
In general, LISA statistics measure the extent to which points close to a given point have similar values, based on a measure of contiguity among units within a specified radius, and are thus useful for identifying local spatial autocorrelation (Anselin, 1995). An example is the Getis-Ord Gi* statistic, which calculates a Z score and p-value for each grid cell or unit of spatial analysis (statistical significance is usually set at 99.9%). Generally, the Gi* statistic can be described as the ratio of the total of the values in a specified area to the global total.
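The sketch below computes Gi* as a Z score for every unit, following the form documented for Esri’s Hot Spot Analysis tool: Gi* = (Σⱼ wᵢⱼxⱼ – x̄Σⱼ wᵢⱼ) / (S·√[(nΣⱼ wᵢⱼ² – (Σⱼ wᵢⱼ)²)/(n – 1)]), where S = √(Σⱼ xⱼ²/n – x̄²). For Gi* the unit itself is part of its own neighborhood, so the weight matrix here includes the diagonal; the binary weights and the toy values are assumptions. Computing p-values and correcting for multiple testing are not shown.

```python
import math

def getis_ord_gi_star(values, w):
    """Gi* z-scores for every unit; w[i][i] should be non-zero (e.g., 1)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for i in range(n):
        wi = w[i]
        sw = sum(wi)                      # sum of weights for unit i
        sw2 = sum(x * x for x in wi)      # sum of squared weights
        num = sum(wij * xj for wij, xj in zip(wi, values)) - mean * sw
        den = s * math.sqrt((n * sw2 - sw ** 2) / (n - 1))
        scores.append(num / den)
    return scores

# Five cells in a row; weights include the cell itself and its edge neighbours
w = [[1 if abs(i - j) <= 1 else 0 for j in range(5)] for i in range(5)]
scores = getis_ord_gi_star([10, 9, 1, 1, 1], w)
print([round(s, 2) for s in scores])  # [1.99, 1.33, -0.43, -1.99, -1.33]
```

Positive scores mark clusters of high values (hotspots) and negative scores clusters of low values (coldspots).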
In the case of local Moran’s I, a value is calculated for each point (or spatial unit), reflecting the degree to which similar values cluster around it within a specified area. A high positive local Moran’s I indicates that a unit is surrounded by similar values (high values near high values, or low values near low values), whereas a negative value indicates a unit that differs from its neighbors. In the global version of Moran’s I, only one estimate is calculated for the study area as a whole.
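A minimal sketch of local Moran’s I, assuming the form Iᵢ = (zᵢ/m₂)·Σⱼ wᵢⱼzⱼ with m₂ = Σᵢ zᵢ²/n and a zero-diagonal weight matrix. With this scaling the local values sum to S₀ times the global Moran’s I, which illustrates Anselin’s second LISA requirement (the sum of local values is proportional to a global indicator). Significance testing for each unit is not shown.

```python
def local_morans_i(values, w):
    """Local Moran's I for each unit (w has zeros on the diagonal)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(zi * zi for zi in z) / n
    return [(z[i] / m2) * sum(w[i][j] * z[j] for j in range(n))
            for i in range(n)]

w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
local = local_morans_i([1, 2, 3, 4], w)
print([round(v, 2) for v in local])  # [0.6, 0.4, 0.4, 0.6]
print(round(sum(local), 6))          # 2.0 = S0 (6) x global I (1/3)
```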
Mapping Techniques
Different mapping techniques are used to depict hotspots, including point mapping, spatial ellipses, thematic and quadrat mapping, interpolation, and kernel density estimation (Chainey, Tompson & Uhlig, 2008; NIJ, 2005). In point mapping, the distribution of discrete geographic phenomena (i.e., an object or event) is depicted using identical dots. In this case, the hotspot is a dot at a specific address. Identifying clusters by visually inspecting individual events as points can be problematic because it depends on the perception of the observer. Spatial ellipses apply standard deviational ellipses around point clusters to display hotspots on a map. However, ellipses may not be as informative as other mapping techniques when the hotspot under analysis does not follow an elliptical form (NIJ, 2005).
Thematic maps use geographic boundaries or quadrats (e.g., census blocks or uniform grids, respectively), effectively aggregating the data and spatial detail to the thematic area. An example of a thematic map is a choropleth map. Hotspots detected through this approach are restricted to the shape of the thematic units, which complicates interpretation, particularly when the characteristics of each area (e.g., population density) are not accounted for.
A hotspot mapping technique that depicts hotspots as a smooth density surface is the kernel density estimation (KDE). It is a popular mapping method because of its visual impact.
Kernel Density Estimation
KDE creates a smooth, continuous surface map showing gradients of the variation in the intensity of events across the study area without being limited to thematic boundaries. It differs from the other mapping methods in that the surface is a nonparametric estimate of the intensity function across grid cells, computed with a weighting function based on a constant bandwidth or search radius (Waller & Gotway, 2004; Chainey et al., 2008). The bandwidth can be obtained through a Moran’s I analysis, which will determine the smallest distance at which clustering of events is most intense and significant, therefore representing an appropriate scale of analysis. Selecting larger bandwidths increases the degree of smoothing.
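The sketch below estimates density at the center of each grid cell by summing a kernel over all events within the bandwidth. It assumes a quartic kernel, K(d) = 3/(πh²)·(1 – d²/h²)² for d < h, which integrates to 1 so the surface is in events per unit area; the kernel shape, the function name, and the grid parameters are assumptions, and other kernels (e.g., Gaussian) are also used. Increasing `bandwidth` spreads each event’s contribution further and produces a smoother surface.

```python
import math

def kde_grid(points, xmin, ymin, ncols, nrows, cell_size, bandwidth):
    """Hypothetical helper: quartic-kernel density at grid-cell centres.

    Returns a list of rows (indexed by y), each a list of densities (by x).
    """
    h2 = bandwidth ** 2
    norm = 3.0 / (math.pi * h2)          # makes each kernel integrate to 1
    grid = []
    for r in range(nrows):
        cy = ymin + (r + 0.5) * cell_size
        row = []
        for c in range(ncols):
            cx = xmin + (c + 0.5) * cell_size
            dens = 0.0
            for x, y in points:
                d2 = (x - cx) ** 2 + (y - cy) ** 2
                if d2 < h2:              # only events within the search radius
                    dens += norm * (1 - d2 / h2) ** 2
            row.append(dens)
        grid.append(row)
    return grid

events = [(40, 40), (42, 45), (45, 41), (80, 20)]
grid = kde_grid(events, xmin=0, ymin=0, ncols=10, nrows=10,
                cell_size=10, bandwidth=15)
peak = max(max(row) for row in grid)
```

Multiplying each cell’s density by the cell area and summing should approximately recover the number of events, provided the grid covers the full kernel footprint.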
As in thematic mapping, defining hotspot legend thresholds for KDE maps is an arbitrary process that is influenced by the researcher’s experience and often involves considerable trial and error. As an alternative, standardization approaches (e.g., standard deviations from the mean) can be used to set hotspot threshold levels (NIJ, 2005).
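A standard-deviation classification can be sketched as follows: each cell is labeled by how many standard deviations above the mean of all cell values it lies. The cut-offs of 1, 2, and 3 standard deviations and the use of the population standard deviation are illustrative assumptions, not values prescribed by NIJ (2005).

```python
import statistics

def sd_classes(densities, cuts=(1, 2, 3)):
    """Label each value by the number of SD cut-offs it meets or exceeds."""
    mu = statistics.mean(densities)
    sd = statistics.pstdev(densities)
    labels = []
    for d in densities:
        # 0 = below mean + 1 SD; 3 = at or above mean + 3 SD
        labels.append(sum(1 for c in cuts if d >= mu + c * sd))
    return labels

cells = [0, 0, 0, 0, 0, 0, 0, 0, 0, 10]
print(sd_classes(cells))  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 3]
```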
Although KDE is a more flexible approach for visualizing hotspots in a map, it still suffers from the same limitations as thematic mapping in that it does not identify statistically significant hotspots and coldspots. Therefore, KDE can be used in conjunction with LISA statistics to be able to distinguish more clearly the hotspots from non-hotspot areas.
Accounting for Hotspot-Related Factors
The application of Bayesian approaches to hotspot identification has been increasing, and these approaches have been shown to reduce the rates of false-positive and false-negative hotspots by 50 percent compared with classical methods of hotspot identification, such as classical confidence intervals (Cheng & Washington, 2005). Research on crime hotspots has also produced new methods that improve hotspot identification by ‘mining’ spatial patterns in hotspot-related factors, which can help distinguish hotspots from normal areas (Wang et al., 2013).
Table 1. Advantages and disadvantages of different spatial pattern analytical tools and clustering tests
| Spatial pattern/clustering test | Advantage | Disadvantage |
|---|---|---|
| Nearest neighbor index | Simple method for testing clustering in the data | Does not account for spatial autocorrelation of events |
| Moran’s I or Geary’s C | Provides a global average useful for comparison among smaller portions of the area; can be standardized; variations exist for testing autocorrelation at the local level | Optimal weights for the calculation of Moran’s I can be difficult to specify; seems to work better when there is little spatial dependence in the data, although alternatives have been developed to address this problem [a] |
| Gi* statistic | Adds definition to maps by estimating the density distribution of events at the local level; allows assessment of spatial association in a study area or of a particular observation; identifies statistically significant hotspots/coldspots | Clusters composed of few observations may inflate Gi*, although other methods that allow the selection of only the most robust clusters are available [b] |
| Standard deviation ellipses | The size and shape of ellipses provide easy visualization of differences in point dispersion | Large areas defined by ellipses are not very informative for prioritizing intervention areas; clusters may not follow an elliptical shape, so interpretations may be incorrect |
| Thematic maps | Allow flexibility for visualizing hotspots for different audiences using themes (i.e., physical or political boundaries) as the spatial units | Spatial units based on themes can separate true clusters that would be observable if these boundaries did not exist |
| Quadrat maps | User-defined data scale and ranges | Do not consider autocorrelation of events in adjacent cells |
| Kernel density estimation | Creates a smooth, continuous surface of the density of observations, which is visually appealing | User must specify the grid cell size, bandwidth, and thematic thresholds, which can lead to different results depending on the values chosen |

[a] Li et al., 2007.
Readings
Textbooks & Chapters
Chainey, S.P., & Ratcliffe, J. (2005). GIS and crime mapping. Hoboken, NJ: John Wiley & Sons, Inc.
Full textbook available online through the Columbia Catalog.
Chainey, S.P., Reid, S., & Stuart, N. (2002). When is a hotspot a hotspot? A procedure for creating statistically robust hotspot maps of crime. In D. Kidner, G. Higgs, & S. White (Eds.), Socio-economic Applications of Geographic Information Science. Innovations in GIS (9) (pp. 21-36). London: Taylor & Francis.
Cromley, E.K., & McLafferty, S.L. (2002). GIS and public health. New York, NY: Guilford Press.
Waller, L.A., & Gotway, C.A. (2004). Applied spatial statistics for public health data. Hoboken, NJ: John Wiley & Sons, Inc.
Full textbook available online through the Columbia Catalog.
Reports
NIJ (National Institute of Justice). (2005). Mapping crime: Understanding hotspots. J.E. Eck, S. Chainey, J.G. Cameron, M. Leitner, and R.E. Wilson (Eds). Washington, DC: NIJ.
Methodological Articles
Anselin, L. (1995). Local Indicators of Spatial Association—LISA. Geographical Analysis, 27(2), 93–115.
Chakravorty, S. (1995). Identifying crime clusters: The spatial principles. Middle States Geographer, 28, 53-58.
Cheng, W., & Washington, S. P. (2005). Experimental evaluation of hotspot identification methods. Accident Analysis and Prevention, 37(5), 870-881.
Chainey, S., Tompson, L. & Uhlig, S. (2008). The Utility of Hotspot Mapping for Predicting Spatial Patterns of Crime. Security Journal, 21, 4–28.
Clark, P.J. & Evans, F.C. (1954). Distance to nearest neighbor as a measure of spatial relationships in populations. Ecology, 35(4), 445-453.
El-Basyouny, K., & Sayed, T. (2013). Depth-based hotspot identification and multivariate ranking using the full Bayes approach. Accident Analysis and Prevention, 50, 1082-1089.
Getis, A., & Ord, J.K. (1992). The analysis of spatial association by use of distance statistics. Geographical Analysis, 24(3), 189–206.
Li, H., Calder, C.A., & Cressie, N. (2007). Beyond Moran’s I: Testing for spatial dependence based on the SAR model. Geographical Analysis, 39(4), 357–375.
Karlström, A., & Ceccato, V. (2002). A new information theoretical measure of global and local spatial association: S. The Review of Regional Research, 22, 13–40.
Wang, D., Ding, W., Lo, H., Stepinski, T., Salazar, J., & Morabito, M. (2013). Crime hotspot mapping using the crime related factors – A spatial data mining approach. Applied Intelligence, 39(4), 772-781.
Application Articles
2.1. Injury Prevention
Walker, B.B., Schuurman, N., & Hameed, S.M. (2014). A GIS-based spatiotemporal analysis of violent trauma hotspots in Vancouver, Canada: identification, contextualization and intervention. BMJ Open, 4(2):e003642.
2.2. Access to Healthcare Services
Stopka, T.J., Krawczyk, C., Gradziel, P., & Geraghty, E.M. (2014). Use of spatial epidemiology and hot spot analysis to target women eligible for prenatal women, infants, and children services. Am J Public Health, 104(Suppl 1), S183-189.
2.3. Emerging Infectious Diseases
Jones, K.E., Patel, N.G., Levy, M.A., Storeygard, A., Balk, D., Gittleman, J.L., & Daszak, P. (2008). Global trends in emerging infectious diseases. Nature, 451(7181), 990-993.
The first study in which researchers used hotspot methodology to create a hotspot map of the origins of emerging infectious disease (EID) events.
Fuller, T.L., Gilbert, M., Martin, V., Cappelle, J., Hosseini, P., Njabo, K.Y., Abdel, A.S., Xiao, X., Daszak, P., & Smith, T.B. (2013). Predicting hotspots for influenza virus reassortment. Emerg Infect Dis, 19(4), 581-588.
This is another good example of the use of hotspot mapping to predict the location of the next EID event. Here, investigators used a methodology similar to that of Jones et al. (2008) to identify geographic areas where agricultural production systems are conducive to influenza A virus reassortment.
Websites
Chow, J. (2013, June 24). Visualising crime hotspots in England and Wales using {ggmap}. Retrieved from: http://www.r-bloggers.com/visualising-crime-hotspots-in-england-and-wales-using-ggmap-2/.
DiMaggio, C. (2013). P9489 Practicals and exercises. Part III: Spatial analysis in R. Retrieved from: http://www.columbia.edu/~cjd11/charles_dimaggio/DIRE/resources/R/practicalsBookNoAns.pdf.
Part III introduces R users to the sp package and other spatial data analysis functions in R, including practical examples for creating choropleth maps.
Esri (Environmental Systems Research Institute, Inc). (n.d.) ArcGIS How Hot Spot Analysis (Getis-Ord Gi*) works. Retrieved from:
http://resources.arcgis.com/en/help/main/10.2/index.html#/How_Hot_Spot_Analysis_Getis_Ord_Gi_works/005p00000011000000/.
Esri. (n.d.). ArcGIS Spatial Statistical Resources. Retrieved from:
http://blogs.esri.com/esri/arcgis/2010/07/13/spatial-statistics-resources/.
Levine, N. (2013). CrimeStat IV: A spatial statistics program for the analysis of crime incident locations. Ned Levine & Associates, Houston, TX and the National Institute of Justice, Washington, DC. Retrieved from: https://www.icpsr.umich.edu/CrimeStat/download.html.
CrimeStat Version 4.0 documentation including chapters on spatial autocorrelation and distance statistics.
Nelson, J. (2011, October 25). Mapping hotspots with R: The GAM. Retrieved from: http://www.r-bloggers.com/mapping-hotspots-with-r-the-gam/.
Map Resources
Columbia University’s spatial data catalog. Available at: https://guides.library.columbia.edu/GIS/us.
NYC Department of City Planning “Bytes of Big Apple” project. Available at: http://www.nyc.gov/html/dcp/html/bytes/applbyte.shtml.
US Geological Survey. The national map. Available at: https://www.usgs.gov/programs/national-geospatial-program/national-map
Wikipedia’s List of GIS data sources. Available at: http://en.wikipedia.org/wiki/List_of_GIS_data_sources.