Hands-on Exercise 2: Spatial Weights & Spatial Autocorrelation

On this page, I address Hands-On Exercise for Chapter 02:

1. Spatial Weights & Applications

  • Creating nb matrix using QUEEN/ROOK Contiguity
  • Creating wm_d62 weight matrix with fixed distance
    • Fixed distance found by creating nb matrix using knn=1
  • Creating rwsm_q with style="W", row-standardised weight matrix
    • this creates Spatial lag with row-standardized weights
    • also creates Spatial Window Average
  • Creating rwsm_ids with style="B", binary weights,
    • this creates Spatial lag as a sum of neighbouring values
    • also creates Spatial Window Sum

1.2 Import Hunan Shapefile datasets

1.2.1 Geospatial Data Sets

  • /data/geospatial/Hunan.###: This is a geospatial data set in ESRI shapefile format.

1.2.2 Aspatial Data Sets

  • /data/aspatial/Hunan_2012.csv: This csv file contains selected Hunan’s local development indicators in 2012.

1.3 Import packages & files

  • hunan: sf data.frame
  • hunan2012: tbl_df data.frame
show code
print("Importing packages...")
[1] "Importing packages..."
show code
pacman::p_load(sf, spdep, tmap, tidyverse, knitr)

print("\nLoading dataset packages...")
[1] "\nLoading dataset packages..."
show code
hunan <- st_read(dsn = "data/geospatial", 
                 layer = "Hunan")
Reading layer `Hunan' from data source 
  `C:\1darren\ISSS624\Hands-on_Ex02\data\geospatial' using driver `ESRI Shapefile'
Simple feature collection with 88 features and 7 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 108.7831 ymin: 24.6342 xmax: 114.2544 ymax: 30.12812
Geodetic CRS:  WGS 84
show code
hunan2012 <- read_csv("data/aspatial/Hunan_2012.csv")
Rows: 88 Columns: 29
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (2): County, City
dbl (27): avg_wage, deposite, FAI, Gov_Rev, Gov_Exp, GDP, GDPPC, GIO, Loan, ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
  • Left join hunan2012 to hunan, select only specific columns
show code
hunan <- left_join(hunan,hunan2012)%>%
  select(1:4, 7, 15)
Joining with `by = join_by(County)`
show code
print("Previewing first 5 rows of joined, filtered hunan df")
[1] "Previewing first 5 rows of joined, filtered hunan df"
show code
kable(head(hunan, 5))
NAME_2 ID_3 NAME_3 ENGTYPE_3 County GDPPC geometry
Changde 21098 Anxiang County Anxiang 23667 POLYGON ((112.0625 29.75523…
Changde 21100 Hanshou County Hanshou 20981 POLYGON ((112.2288 29.11684…
Changde 21101 Jinshi County City Jinshi 34592 POLYGON ((111.8927 29.6013,…
Changde 21102 Li County Li 24473 POLYGON ((111.3731 29.94649…
Changde 21103 Linli County Linli 25554 POLYGON ((111.6324 29.76288…

1.4 Visualisation with qtm next to basemap

show code
basemap <- tm_shape(hunan) +
  tm_polygons() +
  tm_text("NAME_3", size=0.5) +
  tm_layout(main.title = "basemap of Hunan")

gdppc <- qtm(hunan, "GDPPC")  +
  tm_layout(main.title = "GDPPC Quintile Map")
tmap_arrange(basemap, gdppc, asp=1, ncol=2)

1.5 Contiguity Spatial Weights

  • There are 2 types of contiguity, based on chess pieces
    • QUEEN two regions are contiguous if they share a vertex;
    • ROOK two regions are contiguous if they share an edge;
  • Literature suggests they are mostly similar, but QUEEN is more robust at capturing neighbouring/contiguity more consistently

1.5.1 QUEEN Contiguity Neighbours

show code
wm_q <- poly2nb(hunan, queen=TRUE)
summary(wm_q)
Neighbour list object:
Number of regions: 88 
Number of nonzero links: 448 
Percentage nonzero weights: 5.785124 
Average number of links: 5.090909 
Link number distribution:

 1  2  3  4  5  6  7  8  9 11 
 2  2 12 16 24 14 11  4  2  1 
2 least connected regions:
30 65 with 1 link
1 most connected region:
85 with 11 links

Identifying all neighbours of most connected region:

show code
cat("Most connected region, 85:", hunan$County[85], "\nNeighbours:\n\t")
Most connected region, 85: Taoyuan 
Neighbours:
    
show code
wm_q[[85]]
 [1]  1  2  3  5  6 32 56 57 69 75 78
show code
cat("\nPreviewing neighbours:\n\n>> ID\t|  Name \t| GDPPC  \t|  Neighbours:")

Previewing neighbours:

>> ID   |  Name     | GDPPC     |  Neighbours:
show code
for (value in wm_q[[85]]){
  cat("\n>> ", value, "\t|", hunan$County[value], "  \t|", hunan$GDPPC[value], "  \t|", wm_q[[value]])
  }

>>  1   | Anxiang       | 23667     | 2 3 4 57 85
>>  2   | Hanshou       | 20981     | 1 57 58 78 85
>>  3   | Jinshi    | 34592     | 1 4 5 85
>>  5   | Linli     | 25554     | 3 4 6 85
>>  6   | Shimen    | 27137     | 4 5 69 75 85
>>  32  | Yuanling      | 24194     | 24 31 50 54 55 56 75 85
>>  56  | Anhua     | 14567     | 8 31 32 36 78 80 85
>>  57  | Nan       | 21311     | 1 2 58 64 76 85
>>  69  | Cili      | 18714     | 6 75 85
>>  75  | Sangzhi       | 14624     | 6 32 53 55 69 85
>>  78  | Taojiang      | 19509     | 2 8 9 56 58 68 85
  • Print adjacency matrix with str (warning: long!)
str(wm_q)
List of 88
 $ : int [1:5] 2 3 4 57 85
 $ : int [1:5] 1 57 58 78 85
 $ : int [1:4] 1 4 5 85
 $ : int [1:4] 1 3 5 6
 $ : int [1:4] 3 4 6 85
 $ : int [1:5] 4 5 69 75 85
 $ : int [1:4] 67 71 74 84
 $ : int [1:7] 9 46 47 56 78 80 86
 $ : int [1:6] 8 66 68 78 84 86
 $ : int [1:8] 16 17 19 20 22 70 72 73
 $ : int [1:3] 14 17 72
 $ : int [1:5] 13 60 61 63 83
 $ : int [1:4] 12 15 60 83
 $ : int [1:3] 11 15 17
 $ : int [1:4] 13 14 17 83
 $ : int [1:5] 10 17 22 72 83
 $ : int [1:7] 10 11 14 15 16 72 83
 $ : int [1:5] 20 22 23 77 83
 $ : int [1:6] 10 20 21 73 74 86
 $ : int [1:7] 10 18 19 21 22 23 82
 $ : int [1:5] 19 20 35 82 86
 $ : int [1:5] 10 16 18 20 83
 $ : int [1:7] 18 20 38 41 77 79 82
 $ : int [1:5] 25 28 31 32 54
 $ : int [1:5] 24 28 31 33 81
 $ : int [1:4] 27 33 42 81
 $ : int [1:3] 26 29 42
 $ : int [1:5] 24 25 33 49 54
 $ : int [1:3] 27 37 42
 $ : int 33
 $ : int [1:8] 24 25 32 36 39 40 56 81
 $ : int [1:8] 24 31 50 54 55 56 75 85
 $ : int [1:5] 25 26 28 30 81
 $ : int [1:3] 36 45 80
 $ : int [1:6] 21 41 47 80 82 86
 $ : int [1:6] 31 34 40 45 56 80
 $ : int [1:4] 29 42 43 44
 $ : int [1:4] 23 44 77 79
 $ : int [1:5] 31 40 42 43 81
 $ : int [1:6] 31 36 39 43 45 79
 $ : int [1:6] 23 35 45 79 80 82
 $ : int [1:7] 26 27 29 37 39 43 81
 $ : int [1:6] 37 39 40 42 44 79
 $ : int [1:4] 37 38 43 79
 $ : int [1:6] 34 36 40 41 79 80
 $ : int [1:3] 8 47 86
 $ : int [1:5] 8 35 46 80 86
 $ : int [1:5] 50 51 52 53 55
 $ : int [1:4] 28 51 52 54
 $ : int [1:5] 32 48 52 54 55
 $ : int [1:3] 48 49 52
 $ : int [1:5] 48 49 50 51 54
 $ : int [1:3] 48 55 75
 $ : int [1:6] 24 28 32 49 50 52
 $ : int [1:5] 32 48 50 53 75
 $ : int [1:7] 8 31 32 36 78 80 85
 $ : int [1:6] 1 2 58 64 76 85
 $ : int [1:5] 2 57 68 76 78
 $ : int [1:4] 60 61 87 88
 $ : int [1:4] 12 13 59 61
 $ : int [1:7] 12 59 60 62 63 77 87
 $ : int [1:3] 61 77 87
 $ : int [1:4] 12 61 77 83
 $ : int [1:2] 57 76
 $ : int 76
 $ : int [1:5] 9 67 68 76 84
 $ : int [1:4] 7 66 76 84
 $ : int [1:5] 9 58 66 76 78
 $ : int [1:3] 6 75 85
 $ : int [1:3] 10 72 73
 $ : int [1:3] 7 73 74
 $ : int [1:5] 10 11 16 17 70
 $ : int [1:5] 10 19 70 71 74
 $ : int [1:6] 7 19 71 73 84 86
 $ : int [1:6] 6 32 53 55 69 85
 $ : int [1:7] 57 58 64 65 66 67 68
 $ : int [1:7] 18 23 38 61 62 63 83
 $ : int [1:7] 2 8 9 56 58 68 85
 $ : int [1:7] 23 38 40 41 43 44 45
 $ : int [1:8] 8 34 35 36 41 45 47 56
 $ : int [1:6] 25 26 31 33 39 42
 $ : int [1:5] 20 21 23 35 41
 $ : int [1:9] 12 13 15 16 17 18 22 63 77
 $ : int [1:6] 7 9 66 67 74 86
 $ : int [1:11] 1 2 3 5 6 32 56 57 69 75 ...
 $ : int [1:9] 8 9 19 21 35 46 47 74 84
 $ : int [1:4] 59 61 62 88
 $ : int [1:2] 59 87
 - attr(*, "class")= chr "nb"
 - attr(*, "region.id")= chr [1:88] "1" "2" "3" "4" ...
 - attr(*, "call")= language poly2nb(pl = hunan, queen = TRUE)
 - attr(*, "type")= chr "queen"
 - attr(*, "sym")= logi TRUE

1.5.2 ROOK Contiguity Neighbours

show code
wm_r <- poly2nb(hunan, queen=FALSE)
summary(wm_r)
Neighbour list object:
Number of regions: 88 
Number of nonzero links: 440 
Percentage nonzero weights: 5.681818 
Average number of links: 5 
Link number distribution:

 1  2  3  4  5  6  7  8  9 10 
 2  2 12 20 21 14 11  3  2  1 
2 least connected regions:
30 65 with 1 link
1 most connected region:
85 with 10 links
  • Which of Taoyuan’s neighbours is now missing?
setdiff(wm_q[[85]], wm_r[[85]])
[1] 57
hunan$County[[57]]
[1] "Nan"
  • This is the difference between ROOK & QUEEN: Taoyuan & Nan share only a vertex, no edges Alt text

1.5.3 Exploring Contiguity Weights

  • Get latitude, longtiude by
    • map_dbl retrieving double-precision datatype via map function on geometry column of hunan
    • use st_centroid to find centroid of each row
    • indexing via [[1]], [[2] for long,lat of centroid
    • retrieving a vector of regions
  • cbind combines separate vectors back into single dataframe (coords)with two columns
show code
longitude <- map_dbl(hunan$geometry, ~st_centroid(.x)[[1]])
latitude <- map_dbl(hunan$geometry, ~st_centroid(.x)[[2]])
coords <- cbind(longitude, latitude)
cat("Printing first 6 rows of `coords`:\n")
Printing first 6 rows of `coords`:
show code
head(coords)
     longitude latitude
[1,]  112.1531 29.44362
[2,]  112.0372 28.86489
[3,]  111.8917 29.47107
[4,]  111.7031 29.74499
[5,]  111.6138 29.49258
[6,]  111.0341 29.79863
  • now using centroid vertices, we plot:
    • plot QUEEN-contiguity map
    • plot ROOK-contiguity map
    • plot differences (i.e. ROOK overlapping Queen)
show code
par(mfrow=c(1,3), lty = 2, lwd = 2)
plot(hunan$geometry, border="lightgrey")
plot(wm_q, coords, pch = 19, cex = 0.6, add = TRUE, col= "red", main="Queen Contiguity")
title("QUEEN Contiguity")
box()
plot(hunan$geometry, border="lightgrey")
plot(wm_r, coords, pch = 19, cex = 0.6, add = TRUE, col = "blue", main="Rook Contiguity")
title("ROOK Contiguity")
box()
plot(hunan$geometry, border="lightgrey")
plot(wm_q, coords, pch = 19, cex = 0.6, add = TRUE, col= "red", main="Queen Contiguity")
plot(wm_r, coords, pch = 19, cex = 0.6, add = TRUE, col = "blue", main="Rook Contiguity")
title("Differences:")
box()


1.6 Distance-based neighbours

1.6.1 Identifying max inter-neighbour distance

  • k1 created by parsing
    • knearneigh returns matrix of k (default=1) nearest neighbours’s index based on coords, apparently in knn object
    • knn2nb converts k-nearest-neighbours to neighbours-list in nb class
  • unlist unbinds list structure of output into vector
    • nbdists takes in nb neighbours list and returns euclidean distances between neighbours in same structure
  • all this searches the greatest neighbour distance (max 61.79 below) to ensure each region has at least one neighbour
show code
# coords <- coordinates(hunan) #following previous steps
k1 <- knn2nb(knearneigh(coords))
k1dists <- unlist(nbdists(k1, coords, longlat = TRUE))
cat("Printing summary stats for k-1 distances\n")
Printing summary stats for k-1 distances
show code
summary(k1dists)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  24.79   32.57   38.01   39.07   44.52   61.79 

1.6.2 Creating fixed distance weight matrix

  • dnearneigh returns list of vectors of regions satisfying distance criteria (eg within max neighbour distance)
show code
wm_d62 <- dnearneigh(coords, 0, 62, longlat = TRUE)
cat("Printing details of distance weight matrix\n")
Printing details of distance weight matrix
show code
wm_d62
Neighbour list object:
Number of regions: 88 
Number of nonzero links: 324 
Percentage nonzero weights: 4.183884 
Average number of links: 3.681818 
show code
cat("\nInspecting first six rows of [wm_d62] obj \n")

Inspecting first six rows of [wm_d62] obj 
show code
cat(str(head(wm_d62, n=6)))
List of 6
 $ : int [1:5] 3 4 5 57 64
 $ : int [1:4] 57 58 78 85
 $ : int [1:4] 1 4 5 57
 $ : int [1:3] 1 3 5
 $ : int [1:4] 1 3 4 85
 $ : int 69
show code
cat("\n^ Note how 6th row only has one neighbour, i.e. region 69")

^ Note how 6th row only has one neighbour, i.e. region 69
Quiz: What is the meaning of “Average number of links: 3.681818” shown above?
  • Each region has 3.68 links on average, i.e. (total number of links) / (total number of regions)
  • Alternative structure (warning: long!)
    • this uses table to combine the column name from hunan$Country with wm_d62
    • card apparently looks at the length of the neighbour list and prints 1 if yes, 0 if no (i.e. Anhua has 1 neighobur, Anren has 4)
  • Warning, long table!
table(hunan$County, card(wm_d62))
               
                1 2 3 4 5 6
  Anhua         1 0 0 0 0 0
  Anren         0 0 0 1 0 0
  Anxiang       0 0 0 0 1 0
  Baojing       0 0 0 0 1 0
  Chaling       0 0 1 0 0 0
  Changning     0 0 1 0 0 0
  Changsha      0 0 0 1 0 0
  Chengbu       0 1 0 0 0 0
  Chenxi        0 0 0 1 0 0
  Cili          0 1 0 0 0 0
  Dao           0 0 0 1 0 0
  Dongan        0 0 1 0 0 0
  Dongkou       0 0 0 1 0 0
  Fenghuang     0 0 0 1 0 0
  Guidong       0 0 1 0 0 0
  Guiyang       0 0 0 1 0 0
  Guzhang       0 0 0 0 0 1
  Hanshou       0 0 0 1 0 0
  Hengdong      0 0 0 0 1 0
  Hengnan       0 0 0 0 1 0
  Hengshan      0 0 0 0 0 1
  Hengyang      0 0 0 0 0 1
  Hongjiang     0 0 0 0 1 0
  Huarong       0 0 0 1 0 0
  Huayuan       0 0 0 1 0 0
  Huitong       0 0 0 1 0 0
  Jiahe         0 0 0 0 1 0
  Jianghua      0 0 1 0 0 0
  Jiangyong     0 1 0 0 0 0
  Jingzhou      0 1 0 0 0 0
  Jinshi        0 0 0 1 0 0
  Jishou        0 0 0 0 0 1
  Lanshan       0 0 0 1 0 0
  Leiyang       0 0 0 1 0 0
  Lengshuijiang 0 0 1 0 0 0
  Li            0 0 1 0 0 0
  Lianyuan      0 0 0 0 1 0
  Liling        0 1 0 0 0 0
  Linli         0 0 0 1 0 0
  Linwu         0 0 0 1 0 0
  Linxiang      1 0 0 0 0 0
  Liuyang       0 1 0 0 0 0
  Longhui       0 0 1 0 0 0
  Longshan      0 1 0 0 0 0
  Luxi          0 0 0 0 1 0
  Mayang        0 0 0 0 0 1
  Miluo         0 0 0 0 1 0
  Nan           0 0 0 0 1 0
  Ningxiang     0 0 0 1 0 0
  Ningyuan      0 0 0 0 1 0
  Pingjiang     0 1 0 0 0 0
  Qidong        0 0 1 0 0 0
  Qiyang        0 0 1 0 0 0
  Rucheng       0 1 0 0 0 0
  Sangzhi       0 1 0 0 0 0
  Shaodong      0 0 0 0 1 0
  Shaoshan      0 0 0 0 1 0
  Shaoyang      0 0 0 1 0 0
  Shimen        1 0 0 0 0 0
  Shuangfeng    0 0 0 0 0 1
  Shuangpai     0 0 0 1 0 0
  Suining       0 0 0 0 1 0
  Taojiang      0 1 0 0 0 0
  Taoyuan       0 1 0 0 0 0
  Tongdao       0 1 0 0 0 0
  Wangcheng     0 0 0 1 0 0
  Wugang        0 0 1 0 0 0
  Xiangtan      0 0 0 1 0 0
  Xiangxiang    0 0 0 0 1 0
  Xiangyin      0 0 0 1 0 0
  Xinhua        0 0 0 0 1 0
  Xinhuang      1 0 0 0 0 0
  Xinning       0 1 0 0 0 0
  Xinshao       0 0 0 0 0 1
  Xintian       0 0 0 0 1 0
  Xupu          0 1 0 0 0 0
  Yanling       0 0 1 0 0 0
  Yizhang       1 0 0 0 0 0
  Yongshun      0 0 0 1 0 0
  Yongxing      0 0 0 1 0 0
  You           0 0 0 1 0 0
  Yuanjiang     0 0 0 0 1 0
  Yuanling      1 0 0 0 0 0
  Yueyang       0 0 1 0 0 0
  Zhijiang      0 0 0 0 1 0
  Zhongfang     0 0 0 1 0 0
  Zhuzhou       0 0 0 0 1 0
  Zixing        0 0 1 0 0 0

1.6.2x Unfinished Disjoint subgraph plot

  • this section was tucked right above 8.6.2.1 without explanation
  • n.comp.nb() finds the number of disjoint connected subgraphs [see source]
show code
n_comp <- n.comp.nb(wm_d62)
cat("Number of disjoint subgraphs:", n_comp$nc)
Number of disjoint subgraphs: 1
show code
cat("\nTable of disjoint subgraphs by region:\n")

Table of disjoint subgraphs by region:
show code
table(n_comp$comp.id)

 1 
88 
show code
cat("^ i.e. 88 regions all report 1 distjoint subgraph, i.e. no region is disjoint")
^ i.e. 88 regions all report 1 distjoint subgraph, i.e. no region is disjoint

1.6.2.1 Plotting fixed distance weight matrix

  • Plot background of hunan$Geometry
  • Plot points of centroids in coords, connected by black lines
  • Plot k=1-nearest-neighbours (i.e. show nearest neighbours as in k1) in red lines on top;
show code
plot(hunan$geometry, border="lightgrey")
plot(wm_d62, coords, add=TRUE)
plot(k1, coords, add=TRUE, col="red", length=0.08)

title("Comparing fixed-distance neighbours (black)\nvs 1st-nearest-neighbours (red)")
box()

  • Side-by-side comparison:
show code
par(mfrow=c(1,2))
plot(hunan$geometry, border="lightgrey")
plot(k1, coords, add=TRUE, col="red", length=0.08, main="1st nearest neighbours")
title("1st Nearest Neighbours")
box()
plot(hunan$geometry, border="lightgrey")
plot(wm_d62, coords, add=TRUE, pch = 19, cex = 0.6, main="Distance link")
title("Distance-Based Neighbours")
box()

1.6.3 Exploring Contiguity Weights

  • Now calculating 6 nearest neighbours via knn algorithm
show code
knn6 <- knn2nb(knearneigh(coords, k=6))
cat("Printing details of knn neighbour matrix, k=6 \n")
Printing details of knn neighbour matrix, k=6 
show code
knn6
Neighbour list object:
Number of regions: 88 
Number of nonzero links: 528 
Percentage nonzero weights: 6.818182 
Average number of links: 6 
Non-symmetric neighbours list
show code
cat("\nInspecting first six rows of [knn6] obj \n")

Inspecting first six rows of [knn6] obj 
show code
cat(str(head(knn6, n=6)))
List of 6
 $ : int [1:6] 2 3 4 5 57 64
 $ : int [1:6] 1 3 57 58 78 85
 $ : int [1:6] 1 2 4 5 57 85
 $ : int [1:6] 1 3 5 6 69 85
 $ : int [1:6] 1 3 4 6 69 85
 $ : int [1:6] 3 4 5 69 75 85
show code
cat("\n^ Note how every row now has 6 neighbours exactly.")

^ Note how every row now has 6 neighbours exactly.
  • Here’s what it looks like instead:
show code
par(mfrow=c(1,2))
plot(hunan$geometry, border="lightgrey", main="1st Nearest Neighbours")
plot(k1, coords, pch = 19, cex = 0.6, add=TRUE, col="red", length=0.08)
box()


plot(hunan$geometry, border="lightgrey", main ="6st Nearest Neighbours")
plot(knn6, coords, pch = 19, cex = 0.6, add = TRUE, col = "red")
box()

1.7 Using Inversed Distance to plot neighbour

  • Starting with wm_q for queen contiguity, coords for centroids
    • nbdists takes in nb neighbours list and returns euclidean distances between neighbours in same structure
    • longlat uses Great Circle distances i.e. distance on a round earth instead of flat map
  • lapply function(x) applies inverse (1/x) to all output distances
show code
dist <- nbdists(wm_q, coords, longlat = TRUE)
ids <- lapply(dist, function(x) 1/(x))
cat("\nInspecting first five rows of [ids] obj \n\n")

Inspecting first five rows of [ids] obj 
show code
head(ids, 5)
[[1]]
[1] 0.01535405 0.03916350 0.01820896 0.02807922 0.01145113

[[2]]
[1] 0.01535405 0.01764308 0.01925924 0.02323898 0.01719350

[[3]]
[1] 0.03916350 0.02822040 0.03695795 0.01395765

[[4]]
[1] 0.01820896 0.02822040 0.03414741 0.01539065

[[5]]
[1] 0.03695795 0.03414741 0.01524598 0.01618354

1.7.1 Creating row-standardised weight matrix

  • style "W" gives equal weight to each neighbour (e.g. 0.125 for 8 neighbours, below)
show code
rswm_q <- nb2listw(wm_q, style="W", zero.policy = TRUE)
rswm_q
Characteristics of weights list object:
Neighbour list object:
Number of regions: 88 
Number of nonzero links: 448 
Percentage nonzero weights: 5.785124 
Average number of links: 5.090909 

Weights style: W 
Weights constants summary:
   n   nn S0       S1       S2
W 88 7744 88 37.86334 365.9147
show code
cat("\nInspecting weights for region 10, with 8 neighbours: \n")

Inspecting weights for region 10, with 8 neighbours: 
show code
rswm_q$weights[10]
[[1]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125
  • style "B" performs standardisation based on row distance
show code
rswm_ids <- nb2listw(wm_q, glist=ids, style="B", zero.policy=TRUE)
rswm_ids
Characteristics of weights list object:
Neighbour list object:
Number of regions: 88 
Number of nonzero links: 448 
Percentage nonzero weights: 5.785124 
Average number of links: 5.090909 

Weights style: B 
Weights constants summary:
   n   nn       S0        S1     S2
B 88 7744 8.786867 0.3776535 3.8137
show code
cat("\nPrinting summary stats for row-standardised weights matrix \n")

Printing summary stats for row-standardised weights matrix 
show code
summary(unlist(rswm_ids$weights))
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
0.008218 0.015088 0.018739 0.019614 0.022823 0.040338 
show code
cat("\nInspecting weights for region 10, with 8 neighbours: \n")

Inspecting weights for region 10, with 8 neighbours: 
show code
rswm_ids$weights[10]
[[1]]
[1] 0.02281552 0.01387777 0.01538326 0.01346650 0.02100510 0.02631658 0.01874863
[8] 0.01500046

1.8 Making use of spatial weight matrix

1.8.1 Spatial lag with row-standardized weights

Quiz: Can you see the meaning of Spatial lag with row-standardized weights now?
  • Spatial lag as a concept describes how spatially-neighbouring regions affect each other
    • Use of row-standardised weights assigns weights to neighbours based on proximity (i.e. nearer neighbour affects more)
    • It’s one way to calculate spatial lag, using distance to weight importance of neighbours
  • Calculating spatially lagged GDPPC values
show code
GDPPC.lag <- lag.listw(rswm_q, hunan$GDPPC)


cat("\nInspecting first 5 values for `Average Neighbour GDPPC`: \n")

Inspecting first 5 values for `Average Neighbour GDPPC`: 
show code
head(GDPPC.lag, 5)
[1] 24847.20 22724.80 24143.25 27737.50 27270.25
  • Creating lag.res dataframe with regionname and lag GDPPC value
    • NAME_3 column created for ease of left-join with hunan
  • left_join to create table of rows of region-neighbour-lag GDPPC-geometry
show code
lag.list <- list(hunan$NAME_3, lag.listw(rswm_q, hunan$GDPPC))
lag.res <- as.data.frame(lag.list)
colnames(lag.res) <- c("NAME_3", "lag GDPPC")

cat("\nInspecting first row for `lag.res` : \n")

Inspecting first row for `lag.res` : 
show code
cat(str(lag.res[1]))
'data.frame':   88 obs. of  1 variable:
 $ NAME_3: chr  "Anxiang" "Hanshou" "Jinshi" "Li" ...
show code
cat("\nShowing first 6 rows for joined `hunan +lag.res` : \n")

Showing first 6 rows for joined `hunan +lag.res` : 
show code
hunan <- left_join(hunan,lag.res)
Joining with `by = join_by(NAME_3)`
show code
head(hunan)
Simple feature collection with 6 features and 7 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 110.4922 ymin: 28.61762 xmax: 112.3013 ymax: 30.12812
Geodetic CRS:  WGS 84
   NAME_2  ID_3  NAME_3   ENGTYPE_3  County GDPPC lag GDPPC
1 Changde 21098 Anxiang      County Anxiang 23667  24847.20
2 Changde 21100 Hanshou      County Hanshou 20981  22724.80
3 Changde 21101  Jinshi County City  Jinshi 34592  24143.25
4 Changde 21102      Li      County      Li 24473  27737.50
5 Changde 21103   Linli      County   Linli 25554  27270.25
6 Changde 21104  Shimen      County  Shimen 27137  21248.80
                        geometry
1 POLYGON ((112.0625 29.75523...
2 POLYGON ((112.2288 29.11684...
3 POLYGON ((111.8927 29.6013,...
4 POLYGON ((111.3731 29.94649...
5 POLYGON ((111.6324 29.76288...
6 POLYGON ((110.8825 30.11675...
  • Visual comparison of regional GDPPC and spatial lag GDPPC
    • Adjusted colour breaks to use comparable scale – MAPS CAN LIE! as Prof Kam says
show code
gdppc <- qtm(hunan, "GDPPC") +
  tm_layout(main.title = "GDPPC", main.title.position = "right")
lag_gdppc <- qtm(hunan, "lag GDPPC",
                  fill.style="fixed",fill.breaks=c(0,20000,40000,60000,80000,100000)) +
  tm_layout(main.title = "lag GDPPC", main.title.position = "right")
tmap_arrange(gdppc, lag_gdppc, asp=1, ncol=2)

1.8.2 spatial lag as a sum of neighbouring values

  • Using binary weights (0/1) to create spatial lag as a simple unweighted sum

    • use of nb2listw, style = "B" from before:
    show code
    b_weights <- lapply(wm_q, function(x) 0*x + 1)
    b_weights2 <- nb2listw(wm_q, 
                           glist = b_weights, 
                           style = "B")
    b_weights2
    Characteristics of weights list object:
    Neighbour list object:
    Number of regions: 88 
    Number of nonzero links: 448 
    Percentage nonzero weights: 5.785124 
    Average number of links: 5.090909 
    
    Weights style: B 
    Weights constants summary:
       n   nn  S0  S1    S2
    B 88 7744 448 896 10224
show code
lag_sum <- list(hunan$NAME_3, lag.listw(b_weights2, hunan$GDPPC))
lag.res <- as.data.frame(lag_sum)
colnames(lag.res) <- c("NAME_3", "lag_sum GDPPC")

cat("Printing first five rows of lag_sum\n")
Printing first five rows of lag_sum
show code
for (i in 1:5) {
  print(str(c(i, lag_sum[[1]][[i]], lag_sum[[2]][[i]])))
}
 chr [1:3] "1" "Anxiang" "124236"
NULL
 chr [1:3] "2" "Hanshou" "113624"
NULL
 chr [1:3] "3" "Jinshi" "96573"
NULL
 chr [1:3] "4" "Li" "110950"
NULL
 chr [1:3] "5" "Linli" "109081"
NULL
show code
hunan <- left_join(hunan, lag.res)
Joining with `by = join_by(NAME_3)`
show code
gdppc <- qtm(hunan, "GDPPC",
             fill.style="fixed", fill.breaks=c(0,20000,40000,60000,80000,100000, 200000, 300000, 400000, 500000)) +
  tm_layout(main.title = "GDPPC", main.title.position = "right")

lag_sum_gdppc <- qtm(hunan, "lag_sum GDPPC",
                     fill.style="fixed", fill.breaks=c(0,20000,40000,60000,80000,100000, 200000, 300000, 400000, 500000)
                     ) +
  tm_layout(main.title = "lag_sum GDPPC", main.title.position = "right")
tmap_arrange(gdppc, lag_sum_gdppc, asp=1, ncol=2)

Quiz: Can you understand the meaning of Spatial lag as a sum of neighboring values now?
  • Unlike before, here the spatial lag GDPP is calculated simply as a sum of neighbouring regions; this looks less accurate
    • more neighbours, more spatial lag; leads to huge disparity if one region has 8 neighbours and one region has only 1
    • note that often the scale is up to 10x the GDPPC; harder to compare values

1.8.3 Spatial window average

  • Spatial Window Average is row-standardised weights + self (“diagonal element”)
    • area[1] now has an additional ‘neighbour’, itself, for use in calculating row-standardised weights
show code
wm_qs <- include.self(wm_q)
wm_qs <- nb2listw(wm_qs)
wm_qs
Characteristics of weights list object:
Neighbour list object:
Number of regions: 88 
Number of nonzero links: 536 
Percentage nonzero weights: 6.921488 
Average number of links: 6.090909 

Weights style: W 
Weights constants summary:
   n   nn S0       S1       S2
W 88 7744 88 30.90265 357.5308
  • Now we create lag variable, as before;
    • lag.listw calculates lag value
    • as.data.frame() and list() converts into dataframe
    • colnames renames columns to NAME_3, lag_window_avg GDPPC for ease of left joining
  • Maybe I should’ve used kable() to display my values instead of using cat to print:
show code
lag_w_avg_gpdpc <- lag.listw(wm_qs, 
                             hunan$GDPPC)
lag.list.wm_qs <- list(hunan$NAME_3, lag.listw(wm_qs, hunan$GDPPC))
lag_wm_qs.res <- as.data.frame(lag.list.wm_qs)
colnames(lag_wm_qs.res) <- c("NAME_3", "lag_window_avg GDPPC")
hunan <- left_join(hunan, lag_wm_qs.res)
Joining with `by = join_by(NAME_3)`
show code
cat("Displaying table of modified values:")
Displaying table of modified values:
show code
hunan %>%
  select("County", 
         "lag GDPPC", 
         "lag_sum GDPPC",
         "lag_window_avg GDPPC") %>%
  kable()
County lag GDPPC lag_sum GDPPC lag_window_avg GDPPC geometry
Anxiang 24847.20 124236 24650.50 POLYGON ((112.0625 29.75523…
Hanshou 22724.80 113624 22434.17 POLYGON ((112.2288 29.11684…
Jinshi 24143.25 96573 26233.00 POLYGON ((111.8927 29.6013,…
Li 27737.50 110950 27084.60 POLYGON ((111.3731 29.94649…
Linli 27270.25 109081 26927.00 POLYGON ((111.6324 29.76288…
Shimen 21248.80 106244 22230.17 POLYGON ((110.8825 30.11675…
Liuyang 43747.00 174988 47621.20 POLYGON ((113.9905 28.5682,…
Ningxiang 33582.71 235079 37160.12 POLYGON ((112.7181 28.38299…
Wangcheng 45651.17 273907 49224.71 POLYGON ((112.7914 28.52688…
Anren 32027.62 256221 29886.89 POLYGON ((113.1757 26.82734…
Guidong 32671.00 98013 26627.50 POLYGON ((114.1799 26.20117…
Jiahe 20810.00 104050 22690.17 POLYGON ((112.4425 25.74358…
Linwu 25711.50 102846 25366.40 POLYGON ((112.5914 25.55143…
Rucheng 30672.33 92017 25825.75 POLYGON ((113.6759 25.87578…
Yizhang 33457.75 133831 30329.00 POLYGON ((113.2621 25.68394…
Yongxing 31689.20 158446 32682.83 POLYGON ((113.3169 26.41843…
Zixing 20269.00 141883 25948.62 POLYGON ((113.7311 26.16259…
Changning 23901.60 119508 23987.67 POLYGON ((112.6144 26.60198…
Hengdong 25126.17 150757 25463.14 POLYGON ((113.1056 27.21007…
Hengnan 21903.43 153324 21904.38 POLYGON ((112.7599 26.98149…
Hengshan 22718.60 113593 23127.50 POLYGON ((112.607 27.4689, …
Leiyang 25918.80 129594 25949.83 POLYGON ((112.9996 26.69276…
Qidong 20307.00 142149 20018.75 POLYGON ((111.7818 27.0383,…
Chenxi 20023.80 100119 19524.17 POLYGON ((110.2624 28.21778…
Zhongfang 16576.80 82884 18955.00 POLYGON ((109.9431 27.72858…
Huitong 18667.00 74668 17800.40 POLYGON ((109.9419 27.10512…
Jingzhou 14394.67 43184 15883.00 POLYGON ((109.8186 26.75842…
Mayang 19848.80 99244 18831.33 POLYGON ((109.795 27.98008,…
Tongdao 15516.33 46549 14832.50 POLYGON ((109.9294 26.46561…
Xinhuang 20518.00 20518 17965.00 POLYGON ((109.227 27.43733,…
Xupu 17572.00 140576 17159.89 POLYGON ((110.7189 28.30485…
Yuanling 15200.12 121601 16199.44 POLYGON ((110.9652 28.99895…
Zhijiang 18413.80 92069 18764.50 POLYGON ((109.8818 27.60661…
Lengshuijiang 14419.33 43258 26878.75 POLYGON ((111.5307 27.81472…
Shuangfeng 24094.50 144567 23188.86 POLYGON ((112.263 27.70421,…
Xinhua 22019.83 132119 20788.14 POLYGON ((111.3345 28.19642…
Chengbu 12923.50 51694 12365.20 POLYGON ((110.4455 26.69317…
Dongan 14756.00 59024 15985.00 POLYGON ((111.4531 26.86812…
Dongkou 13869.80 69349 13764.83 POLYGON ((110.6622 27.37305…
Longhui 12296.67 73780 11907.43 POLYGON ((110.985 27.65983,…
Shaodong 15775.17 94651 17128.14 POLYGON ((111.9054 27.40254…
Suining 14382.86 100680 14593.62 POLYGON ((110.389 27.10006,…
Wugang 11566.33 69398 11644.29 POLYGON ((110.9878 27.03345…
Xinning 13199.50 52798 12706.00 POLYGON ((111.0736 26.84627…
Xinshao 23412.00 140472 21712.29 POLYGON ((111.6013 27.58275…
Shaoshan 39541.00 118623 43548.25 POLYGON ((112.5391 27.97742…
Xiangxiang 36186.60 180933 35049.00 POLYGON ((112.4549 28.05783…
Baojing 16559.60 82798 16226.83 POLYGON ((109.7015 28.82844…
Fenghuang 20772.50 83090 19294.40 POLYGON ((109.5239 28.19206…
Guzhang 19471.20 97356 18156.00 POLYGON ((109.8968 28.74034…
Huayuan 19827.33 59482 19954.75 POLYGON ((109.5647 28.61712…
Jishou 15466.80 77334 18145.17 POLYGON ((109.8375 28.4696,…
Longshan 12925.67 38777 12132.75 POLYGON ((109.6337 29.62521…
Luxi 18577.17 111463 18419.29 POLYGON ((110.1067 28.41835…
Yongshun 14943.00 74715 14050.83 POLYGON ((110.0003 29.29499…
Anhua 24913.00 174391 23619.75 POLYGON ((111.6034 28.63716…
Nan 25093.00 150558 24552.71 POLYGON ((112.3232 29.46074…
Yuanjiang 24428.80 122144 24733.67 POLYGON ((112.4391 29.1791,…
Jianghua 17003.00 68012 16762.60 POLYGON ((111.6461 25.29661…
Lanshan 21143.75 84575 20932.60 POLYGON ((112.2286 25.61123…
Ningyuan 20435.00 143045 19467.75 POLYGON ((112.0715 26.09892…
Shuangpai 17131.33 51394 18334.00 POLYGON ((111.8864 26.11957…
Xintian 24569.75 98279 22541.00 POLYGON ((112.2578 26.0796,…
Huarong 23835.50 47671 26028.00 POLYGON ((112.9242 29.69134…
Linxiang 26360.00 26360 29128.50 POLYGON ((113.5502 29.67418…
Miluo 47383.40 236917 46569.00 POLYGON ((112.9902 29.02139…
Pingjiang 55157.75 220631 47576.60 POLYGON ((113.8436 29.06152…
Xiangyin 37058.00 185290 36545.50 POLYGON ((112.9173 28.98264…
Cili 21546.67 64640 20838.50 POLYGON ((110.8822 29.69017…
Chaling 23348.67 70046 22531.00 POLYGON ((113.7666 27.10573…
Liling 42323.67 126971 42115.50 POLYGON ((113.5673 27.94346…
Yanling 28938.60 144693 27619.00 POLYGON ((113.9292 26.6154,…
You 25880.80 129404 27611.33 POLYGON ((113.5879 27.41324…
Zhuzhou 47345.67 284074 44523.29 POLYGON ((113.2493 28.02411…
Sangzhi 18711.33 112268 18127.43 POLYGON ((110.556 29.40543,…
Yueyang 29087.29 203611 28746.38 POLYGON ((113.343 29.61064,…
Qiyang 20748.29 145238 20734.50 POLYGON ((111.5563 26.81318…
Taojiang 35933.71 251536 33880.62 POLYGON ((112.0508 28.67265…
Shaoyang 15439.71 108078 14716.38 POLYGON ((111.5013 27.30207…
Lianyuan 29787.50 238300 28516.22 POLYGON ((111.6789 28.02946…
Hongjiang 18145.00 108870 18086.14 POLYGON ((110.1441 27.47513…
Hengyang 21617.00 108085 21244.50 POLYGON ((112.7144 26.98613…
Guiyang 29203.89 262835 29568.80 POLYGON ((113.0811 26.04963…
Changsha 41363.67 248182 48119.71 POLYGON ((112.9421 28.03722…
Taoyuan 22259.09 244850 22310.75 POLYGON ((112.0612 29.32855…
Xiangtan 44939.56 404456 43151.60 POLYGON ((113.0426 27.8942,…
Dao 16902.00 67608 17133.40 POLYGON ((111.498 25.81679,…
Jiangyong 16930.00 33860 17009.33 POLYGON ((111.3659 25.39472…
  • Now comparing lag_gdppc and w_ave_gdppc on the same colour scale
show code
lag_gdppc <- qtm(hunan, "lag GDPPC",
                  fill.style="fixed",fill.breaks=c(10000,20000,30000,40000,50000,60000),
                 legend.text.size = 0.5,legend.title.size = 0.5
                 ) +
  tm_layout(main.title = "Lag GDPPC", main.title.position = "right")

w_avg_gdppc <- qtm(hunan, "lag_window_avg GDPPC",
                   fill.style="fixed", fill.breaks=c(10000,20000,30000,40000,50000,60000),
                   legend.text.size = 0.5,legend.title.size = 0.5
                   ) +
  tm_layout(main.title = "lag_window_avg GDPPC", main.title.position = "right")
tmap_arrange(lag_gdppc, w_avg_gdppc, asp=1, ncol=2)

1.8.4 Spatial Window Sum

  • Simple sum of neighbouring values including self/diagonal value
    • lapply to create matrix of ones in shape of nb structure
    • use of nb2listw to assign a weights list object according to nb shape
  • use of lag.listw() to create lag variable for each region
show code
wm_qs <- include.self(wm_q) # Run above 
b_weights <- lapply(wm_qs, function(x) 0*x + 1)
b_weights2 <- nb2listw(wm_qs, 
                       glist = b_weights, 
                       style = "B")
w_sum_gdppc <- list(hunan$NAME_3, lag.listw(b_weights2, hunan$GDPPC))

cat("Printing first five rows of w_sum_gdppc\n")
Printing first five rows of w_sum_gdppc
show code
for (i in 1:5) {
  print(str(c(i, w_sum_gdppc[[1]][[i]], w_sum_gdppc[[2]][[i]])))
}
 chr [1:3] "1" "Anxiang" "147903"
NULL
 chr [1:3] "2" "Hanshou" "134605"
NULL
 chr [1:3] "3" "Jinshi" "131165"
NULL
 chr [1:3] "4" "Li" "135423"
NULL
 chr [1:3] "5" "Linli" "134635"
NULL
  • Then, just as before, convert using as.data.frame(), rename with col_names and left_join into huge hunan “sf” data.frame
show code
w_sum_gdppc.res <- as.data.frame(w_sum_gdppc)
colnames(w_sum_gdppc.res) <- c("NAME_3", "w_sum GDPPC")
hunan <- left_join(hunan, w_sum_gdppc.res)
Joining with `by = join_by(NAME_3)`
show code
hunan %>%
  select("County", 
         "GDPPC", 
         "lag GDPPC", 
         "lag_window_avg GDPPC",
         "lag_sum GDPPC",
         "w_sum GDPPC") %>%
  kable()
County GDPPC lag GDPPC lag_window_avg GDPPC lag_sum GDPPC w_sum GDPPC geometry
Anxiang 23667 24847.20 24650.50 124236 147903 POLYGON ((112.0625 29.75523…
Hanshou 20981 22724.80 22434.17 113624 134605 POLYGON ((112.2288 29.11684…
Jinshi 34592 24143.25 26233.00 96573 131165 POLYGON ((111.8927 29.6013,…
Li 24473 27737.50 27084.60 110950 135423 POLYGON ((111.3731 29.94649…
Linli 25554 27270.25 26927.00 109081 134635 POLYGON ((111.6324 29.76288…
Shimen 27137 21248.80 22230.17 106244 133381 POLYGON ((110.8825 30.11675…
Liuyang 63118 43747.00 47621.20 174988 238106 POLYGON ((113.9905 28.5682,…
Ningxiang 62202 33582.71 37160.12 235079 297281 POLYGON ((112.7181 28.38299…
Wangcheng 70666 45651.17 49224.71 273907 344573 POLYGON ((112.7914 28.52688…
Anren 12761 32027.62 29886.89 256221 268982 POLYGON ((113.1757 26.82734…
Guidong 8497 32671.00 26627.50 98013 106510 POLYGON ((114.1799 26.20117…
Jiahe 32091 20810.00 22690.17 104050 136141 POLYGON ((112.4425 25.74358…
Linwu 23986 25711.50 25366.40 102846 126832 POLYGON ((112.5914 25.55143…
Rucheng 11286 30672.33 25825.75 92017 103303 POLYGON ((113.6759 25.87578…
Yizhang 17814 33457.75 30329.00 133831 151645 POLYGON ((113.2621 25.68394…
Yongxing 37651 31689.20 32682.83 158446 196097 POLYGON ((113.3169 26.41843…
Zixing 65706 20269.00 25948.62 141883 207589 POLYGON ((113.7311 26.16259…
Changning 24418 23901.60 23987.67 119508 143926 POLYGON ((112.6144 26.60198…
Hengdong 27485 25126.17 25463.14 150757 178242 POLYGON ((113.1056 27.21007…
Hengnan 21911 21903.43 21904.38 153324 175235 POLYGON ((112.7599 26.98149…
Hengshan 25172 22718.60 23127.50 113593 138765 POLYGON ((112.607 27.4689, …
Leiyang 26105 25918.80 25949.83 129594 155699 POLYGON ((112.9996 26.69276…
Qidong 18001 20307.00 20018.75 142149 160150 POLYGON ((111.7818 27.0383,…
Chenxi 17026 20023.80 19524.17 100119 117145 POLYGON ((110.2624 28.21778…
Zhongfang 30846 16576.80 18955.00 82884 113730 POLYGON ((109.9431 27.72858…
Huitong 14334 18667.00 17800.40 74668 89002 POLYGON ((109.9419 27.10512…
Jingzhou 20348 14394.67 15883.00 43184 63532 POLYGON ((109.8186 26.75842…
Mayang 13744 19848.80 18831.33 99244 112988 POLYGON ((109.795 27.98008,…
Tongdao 12781 15516.33 14832.50 46549 59330 POLYGON ((109.9294 26.46561…
Xinhuang 15412 20518.00 17965.00 20518 35930 POLYGON ((109.227 27.43733,…
Xupu 13863 17572.00 17159.89 140576 154439 POLYGON ((110.7189 28.30485…
Yuanling 24194 15200.12 16199.44 121601 145795 POLYGON ((110.9652 28.99895…
Zhijiang 20518 18413.80 18764.50 92069 112587 POLYGON ((109.8818 27.60661…
Lengshuijiang 64257 14419.33 26878.75 43258 107515 POLYGON ((111.5307 27.81472…
Shuangfeng 17755 24094.50 23188.86 144567 162322 POLYGON ((112.263 27.70421,…
Xinhua 13398 22019.83 20788.14 132119 145517 POLYGON ((111.3345 28.19642…
Chengbu 10132 12923.50 12365.20 51694 61826 POLYGON ((110.4455 26.69317…
Dongan 20901 14756.00 15985.00 59024 79925 POLYGON ((111.4531 26.86812…
Dongkou 13240 13869.80 13764.83 69349 82589 POLYGON ((110.6622 27.37305…
Longhui 9572 12296.67 11907.43 73780 83352 POLYGON ((110.985 27.65983,…
Shaodong 25246 15775.17 17128.14 94651 119897 POLYGON ((111.9054 27.40254…
Suining 16069 14382.86 14593.62 100680 116749 POLYGON ((110.389 27.10006,…
Wugang 12112 11566.33 11644.29 69398 81510 POLYGON ((110.9878 27.03345…
Xinning 10732 13199.50 12706.00 52798 63530 POLYGON ((111.0736 26.84627…
Xinshao 11514 23412.00 21712.29 140472 151986 POLYGON ((111.6013 27.58275…
Shaoshan 55570 39541.00 43548.25 118623 174193 POLYGON ((112.5391 27.97742…
Xiangxiang 29361 36186.60 35049.00 180933 210294 POLYGON ((112.4549 28.05783…
Baojing 14563 16559.60 16226.83 82798 97361 POLYGON ((109.7015 28.82844…
Fenghuang 13382 20772.50 19294.40 83090 96472 POLYGON ((109.5239 28.19206…
Guzhang 11580 19471.20 18156.00 97356 108936 POLYGON ((109.8968 28.74034…
Huayuan 20337 19827.33 19954.75 59482 79819 POLYGON ((109.5647 28.61712…
Jishou 31537 15466.80 18145.17 77334 108871 POLYGON ((109.8375 28.4696,…
Longshan 9754 12925.67 12132.75 38777 48531 POLYGON ((109.6337 29.62521…
Luxi 17472 18577.17 18419.29 111463 128935 POLYGON ((110.1067 28.41835…
Yongshun 9590 14943.00 14050.83 74715 84305 POLYGON ((110.0003 29.29499…
Anhua 14567 24913.00 23619.75 174391 188958 POLYGON ((111.6034 28.63716…
Nan 21311 25093.00 24552.71 150558 171869 POLYGON ((112.3232 29.46074…
Yuanjiang 26258 24428.80 24733.67 122144 148402 POLYGON ((112.4391 29.1791,…
Jianghua 15801 17003.00 16762.60 68012 83813 POLYGON ((111.6461 25.29661…
Lanshan 20088 21143.75 20932.60 84575 104663 POLYGON ((112.2286 25.61123…
Ningyuan 12697 20435.00 19467.75 143045 155742 POLYGON ((112.0715 26.09892…
Shuangpai 21942 17131.33 18334.00 51394 73336 POLYGON ((111.8864 26.11957…
Xintian 14426 24569.75 22541.00 98279 112705 POLYGON ((112.2578 26.0796,…
Huarong 30413 23835.50 26028.00 47671 78084 POLYGON ((112.9242 29.69134…
Linxiang 31897 26360.00 29128.50 26360 58257 POLYGON ((113.5502 29.67418…
Miluo 42497 47383.40 46569.00 236917 279414 POLYGON ((112.9902 29.02139…
Pingjiang 17252 55157.75 47576.60 220631 237883 POLYGON ((113.8436 29.06152…
Xiangyin 33983 37058.00 36545.50 185290 219273 POLYGON ((112.9173 28.98264…
Cili 18714 21546.67 20838.50 64640 83354 POLYGON ((110.8822 29.69017…
Chaling 20078 23348.67 22531.00 70046 90124 POLYGON ((113.7666 27.10573…
Liling 41491 42323.67 42115.50 126971 168462 POLYGON ((113.5673 27.94346…
Yanling 21021 28938.60 27619.00 144693 165714 POLYGON ((113.9292 26.6154,…
You 36264 25880.80 27611.33 129404 165668 POLYGON ((113.5879 27.41324…
Zhuzhou 27589 47345.67 44523.29 284074 311663 POLYGON ((113.2493 28.02411…
Sangzhi 14624 18711.33 18127.43 112268 126892 POLYGON ((110.556 29.40543,…
Yueyang 26360 29087.29 28746.38 203611 229971 POLYGON ((113.343 29.61064,…
Qiyang 20638 20748.29 20734.50 145238 165876 POLYGON ((111.5563 26.81318…
Taojiang 19509 35933.71 33880.62 251536 271045 POLYGON ((112.0508 28.67265…
Shaoyang 9653 15439.71 14716.38 108078 117731 POLYGON ((111.5013 27.30207…
Lianyuan 18346 29787.50 28516.22 238300 256646 POLYGON ((111.6789 28.02946…
Hongjiang 17733 18145.00 18086.14 108870 126603 POLYGON ((110.1441 27.47513…
Hengyang 19382 21617.00 21244.50 108085 127467 POLYGON ((112.7144 26.98613…
Guiyang 32853 29203.89 29568.80 262835 295688 POLYGON ((113.0811 26.04963…
Changsha 88656 41363.67 48119.71 248182 336838 POLYGON ((112.9421 28.03722…
Taoyuan 22879 22259.09 22310.75 244850 267729 POLYGON ((112.0612 29.32855…
Xiangtan 27060 44939.56 43151.60 404456 431516 POLYGON ((113.0426 27.8942,…
Dao 18059 16902.00 17133.40 67608 85667 POLYGON ((111.498 25.81679,…
Jiangyong 17168 16930.00 17009.33 33860 51028 POLYGON ((111.3659 25.39472…
  • Side-by-side comparison plot with lag_sum GDPPC
show code
lag_sum_gdppc <- qtm(hunan, "lag_sum GDPPC",
                     fill.style="fixed", fill.breaks=c(0,100000, 200000, 300000, 400000, 500000),
                     legend.text.size = 0.5,legend.title.size = 0.5
                     ) +
  tm_layout(main.title = "lag_sum GDPPC", main.title.position = "right")
w_sum_gdppc <- qtm(hunan, "w_sum GDPPC",
                   fill.style="fixed", fill.breaks=c(0,100000, 200000, 300000, 400000, 500000),
                  legend.text.size = 0.5,legend.title.size = 0.5
                   ) +
  tm_layout(main.title = "w_sum GDPPC", main.title.position = "right")
tmap_arrange(lag_sum_gdppc, w_sum_gdppc, asp=1, ncol=2)

2. Global Measures of Spatial Autocorrelation

  • Visualizing difference between equal interval and equal quantile plots
  • Use of Moran's I test to test global spatial autocorrelation
    • Important to note, we are calculating Global Moran’s I, vs Local Moran’s I, later
    • Use of Monte Carlo Moran's I simulations
  • Use of Geary's C test to test global spatial autocorrelation
    • Use of Monte Carlo Geary's C simulations
  • Drawing Spatial Correlogram for both Moran's I and Geary's C

2.1 Analytical Question

  • Identify if development is equally distributed geographically in Hunan province
  • If NO, then ask: Is there signs of spatial clustering?
  • If YES, then ask: Where is the spatial clustering?

2.2 Import Datasets

2.3 Import Packages

  • Actually, the same datasets are used from above; see Section 1.2
    • /data/geospatial/Hunan.###: This is a geospatial data set in ESRI shapefile format.
    • /data/geospatial/Hunan.###: This is a geospatial data set in ESRI shapefile format.
  • Similarly, the same packages are used, see Section 1.3
    • sf, spdep, tmap, tidyverse
# import can be ignored, loaded from before
# pacman::p_load(sf, spdep, tmap, tidyverse, knitr)

# We run here to reload hunan dataframe, remove spatial lag columns added in earlier
hunan <- st_read(dsn = "data/geospatial", 
                 layer = "Hunan")
Reading layer `Hunan' from data source 
  `C:\1darren\ISSS624\Hands-on_Ex02\data\geospatial' using driver `ESRI Shapefile'
Simple feature collection with 88 features and 7 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 108.7831 ymin: 24.6342 xmax: 114.2544 ymax: 30.12812
Geodetic CRS:  WGS 84
hunan2012 <- read_csv("data/aspatial/Hunan_2012.csv")
Rows: 88 Columns: 29
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (2): County, City
dbl (27): avg_wage, deposite, FAI, Gov_Rev, Gov_Exp, GDP, GDPPC, GIO, Loan, ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
hunan <- left_join(hunan,hunan2012) %>%
  select(1:4, 7, 15)
Joining with `by = join_by(County)`
kable(head(hunan, 5))
NAME_2 ID_3 NAME_3 ENGTYPE_3 County GDPPC geometry
Changde 21098 Anxiang County Anxiang 23667 POLYGON ((112.0625 29.75523…
Changde 21100 Hanshou County Hanshou 20981 POLYGON ((112.2288 29.11684…
Changde 21101 Jinshi County City Jinshi 34592 POLYGON ((111.8927 29.6013,…
Changde 21102 Li County Li 24473 POLYGON ((111.3731 29.94649…
Changde 21103 Linli County Linli 25554 POLYGON ((111.6324 29.76288…
  • Now we create a basemap and chloropleth to look at GDPPC values for 2023
  • Note the difference in scales!
show code
equal <- tm_shape(hunan) +
  tm_fill("GDPPC",
          n = 5,
          style = "equal") +
  tm_borders(alpha = 0.5) +
  tm_layout(main.title = "Equal interval classification", main.title.position = "right")

quantile <- tm_shape(hunan) +
  tm_fill("GDPPC",
          n = 5,
          style = "quantile") +
  tm_borders(alpha = 0.5) +
  tm_layout(main.title = "Equal quantile classification", main.title.position = "right")

tmap_arrange(equal, 
             quantile, 
             asp=1, 
             ncol=2)

2.4 Calculating Global Spatial Autocorrelation

  1. Create QUEEN contiguity weight matrix as in Section 1.5, Contiguity Spatial Weights
  2. Create row-standardised weight matrix as inSection 1.7.1, Creating row-standardised weight matrix
    • use of style="W" for equal weights here for example, but Style="B" more robust
show code
wm_q <- poly2nb(hunan, 
                queen=TRUE)
# summary(wm_q)
rswm_q <- nb2listw(wm_q, 
                   style="W", 
                   zero.policy = TRUE)
rswm_q
Characteristics of weights list object:
Neighbour list object:
Number of regions: 88 
Number of nonzero links: 448 
Percentage nonzero weights: 5.785124 
Average number of links: 5.090909 

Weights style: W 
Weights constants summary:
   n   nn S0       S1       S2
W 88 7744 88 37.86334 365.9147

2.4 “M”: Moran’s I test

  • Moran's I evaluates spatial autocorrelation and returns whether pattern is clustered, dispersed, or random (i.e. no autocorrelation)
show code
moran.test(hunan$GDPPC, 
           listw=rswm_q, 
           zero.policy = TRUE, 
           na.action=na.omit)

    Moran I test under randomisation

data:  hunan$GDPPC  
weights: rswm_q    

Moran I statistic standard deviate = 4.7351, p-value = 1.095e-06
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.300749970      -0.011494253       0.004348351 
Quiz: What statistical conclusion can you draw from the output above?
  • Moran’s I Statistic: of ~0.3 indicates weak spatial correlation
    • (+) means similar values are closer (e.g. high with high); (-) means dissimilar values cluster (e.g. high with low)
    • Closer to 0 means random, closer to 1 or -1 indicates strong correlation
    • Expected value is -1/(N-1), i.e. -1/87 here, which is close enough to 0 for estimation
  • alternative: greater: Suggests that the alternative hypothesis is true, that GDPPC value has is spatially correlated (e.g. neighbouring regions affect value)
    • null hypothesis is that the GDPPC is randomly distributed in space
  • p-Value: of magnitude 1e-06 suggests confidence/statistical significance of result
    • p value < 0.05 suggests result is not due to random chance

2.4.1 “M”: Monte Carlo Moran’s I test

  • Not sure where the is necessary, but R documentation suggests also using the Monte Carlo version
  • Moran's I evaluates spatial autocorrelation and returns whether pattern is clustered, dispersed, or random (i.e. no autocorrelation)
    • even using a different seed from Prof, values are similar
show code
set.seed(42)
bperm= moran.mc(hunan$GDPPC, 
                listw=rswm_q, 
                nsim=999, 
                zero.policy = TRUE, 
                na.action=na.omit)
bperm

    Monte-Carlo simulation of Moran I

data:  hunan$GDPPC 
weights: rswm_q  
number of simulations + 1: 1000 

statistic = 0.30075, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater
Quiz: What statistical conclusion can you draw from the output above?
  • Moran’s I Statistic: of 0.30075 indicates weak spatial correlation, as before;
    • alternative: greater: Suggests that the alternative hypothesis is true, that GDPPC value has is spatially correlated (e.g. neighbouring regions affect value)
    • p-Value: of 0.001 suggests confidence/statistical significance of result; 0.001 probability of observing results like that
    • observed rank according to documentations suggests the observed statistic is ranked 1000th of 1000 simulations, but I am not sure of what this means.

2.4.4.2 Visualising Monte Carlo Moran’s I test

  • Extracting key statistics from $res column
show code
cat("Printing values from simulation:")
Printing values from simulation:
show code
cat("\n>> mean\t: ", mean(bperm$res[1:999]))

>> mean :  -0.007182342
show code
cat("\n>> var\t: ", var(bperm$res[1:999]))

>> var  :  0.004295062
show code
cat("\n>> summary:\n")

>> summary:
show code
summary(bperm$res[1:999])
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-0.200131 -0.052501 -0.013190 -0.007182  0.034529  0.228826 
  • plotting res column in histogram:
show code
hist(bperm$res, 
     freq=TRUE, 
     breaks=20, 
     xlab="Simulated Moran's I",
     main="Histogram of Moran's I statistics, 1000 simulations")
abline(v=0, 
       col="red") 

Quiz: What statistical observation can you draw from the output above?
  • Monte Carlo results in normal distribution; most values are close to zero or slightly negative – close to expected value
    • most values within 0.2; relatively weak spatial correlation
  • observed rank The Moran’s I statistic of 0.30075 is ranked 1000/1000 simulations, and the highest statistic obtained
Challenge: Instead of using Base Graph to plot the values, plot the values by using ggplot2 package.
  • Doesn’t generate the exact same histogram, but close enough
  • Not sure why breaks = 20 for geom_histogram doesn’t work right, have to manually compute data_binwidth instead
show code
library(ggplot2)

# Create a sample dataset
set.seed(123)
data <- as.data.frame(bperm$res)
data_binwidth <- (max(bperm$res) - min(bperm$res)) / 20

ggplot(data, aes(x = bperm$res)) +
  geom_histogram(binwidth = data_binwidth, fill = "grey", color = "black", alpha = 0.7) +
  labs(title = "ggplot2 Histogram Counter-Example", x = "Simulated Moran's I", y = "Frequency") + 
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", size = 1)
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.

2.4.5 Geary’s C Statistics

  • Geary's C evaluates spatial autocorrelation at global level;
    • Geary’s C is inversely related to Moran’s I;
    • Geary’s C uses sum of squared distances, less sensitive to linear associations
    • Moran’s I uses standardized spatial covariance
show code
geary.test(hunan$GDPPC, listw=rswm_q)

    Geary C test under randomisation

data:  hunan$GDPPC 
weights: rswm_q 

Geary C statistic standard deviate = 3.6108, p-value = 0.0001526
alternative hypothesis: Expectation greater than statistic
sample estimates:
Geary C statistic       Expectation          Variance 
        0.6907223         1.0000000         0.0073364 
Quiz: What statistical conclusion can you draw from the output above?
  • Geary’s C Statistic: of 0.69 suggests spatial correlation, i.e. GDPPC is related to neighbours
    • Geary’s c < 1 indicates positive spatial autocorrelation, while >1 suggests spatial dispersion (negative auto-correlation)
    • alternative hypothesis: Expectation greater than statistic: Suggests that the alternative hypothesis is true, that GDPPC value has is positively spatially correlated (e.g. neighbouring regions affect value)
    • p-Value: of 0.0001526 suggests confidence/statistical significance of result; very very low probability that result obtained is due to pure chance
  • corresponding Monte Carlo permutation test for Geary’s C; even with a different seed, the values are similar
show code
set.seed(42)
bperm=geary.mc(hunan$GDPPC, 
               listw=rswm_q, 
               nsim=999)
bperm

    Monte-Carlo simulation of Geary C

data:  hunan$GDPPC 
weights: rswm_q 
number of simulations + 1: 1000 

statistic = 0.69072, observed rank = 2, p-value = 0.002
alternative hypothesis: greater
Quiz: What statistical conclusion can you draw from the output above?
  • observed rank The Geary’s C statistic of 0.69072 is ranked 2/1000 simulations, and the second-lowest; i.e. most values are higher.
  • p-value of 0.002 is still very small, results unlikely to be due to randomness; alternative hypothesis: greater suggests there exists positive auto-correlation

2.4.5.3 Visualising Monte Carlo Geary’s C test

  • Extracting key statistics from $res column
show code
cat("Printing values from Geary's C simulation:")
Printing values from Geary's C simulation:
show code
cat("\n>> mean\t: ", mean(bperm$res[1:999]))

>> mean :  0.9953715
show code
cat("\n>> var\t: ", var(bperm$res[1:999]))

>> var  :  0.00723939
show code
cat("\n>> summary:\n")

>> summary:
show code
summary(bperm$res[1:999])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.6827  0.9404  0.9960  0.9954  1.0535  1.2888 
  • plotting res column in histogram:
show code
hist(bperm$res, 
     freq=TRUE, 
     breaks=20, 
     xlab="Simulated Geary's c",
     main="Histogram of Geary's c statistics, 1000 simulations")
abline(v=1, 
       col="red") 

Quiz: What statistical conclusion can you draw from the output above?
  • observed rank The Geary’s C statistic of 0.69072 is ranked 2/1000 simulations, and the second-lowest; i.e. most values are higher.
    • as before, Monte Carlo simulations seem to generate normal distribution
    • positive spatial correlation exists but not strong; mostly near expected value 1.0

2.5 Spatial Correlogram

2.5.1 Moran’s I Correlogram

show code
MI_corr <- sp.correlogram(wm_q, 
                          hunan$GDPPC, 
                          order=6, 
                          method="I", 
                          style="W")
print(MI_corr)
Spatial correlogram for hunan$GDPPC 
method: Moran's I
         estimate expectation   variance standard deviate Pr(I) two sided    
1 (88)  0.3007500  -0.0114943  0.0043484           4.7351       2.189e-06 ***
2 (88)  0.2060084  -0.0114943  0.0020962           4.7505       2.029e-06 ***
3 (88)  0.0668273  -0.0114943  0.0014602           2.0496        0.040400 *  
4 (88)  0.0299470  -0.0114943  0.0011717           1.2107        0.226015    
5 (88) -0.1530471  -0.0114943  0.0012440          -4.0134       5.984e-05 ***
6 (88) -0.1187070  -0.0114943  0.0016791          -2.6164        0.008886 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
show code
plot(MI_corr)

Quiz: What statistical conclusion can you draw from the plot/output above?
  • Correlogram At smaller lags (i.e. smaller distance away) there is positive spatial auto-correlation, fading away to randomness/no autocorrelation at intermediate lags, and negative at higher (5-6) lags
  • Observing printout, lags at 3 and 4 are not strongly statistically significant, best results at lags 1, 2, 5.

2.5.2 Geary’s C Correlogram

show code
GC_corr <- sp.correlogram(wm_q, 
                          hunan$GDPPC, 
                          order=6, 
                          method="C", 
                          style="W")
print(GC_corr)
Spatial correlogram for hunan$GDPPC 
method: Geary's C
        estimate expectation  variance standard deviate Pr(I) two sided    
1 (88) 0.6907223   1.0000000 0.0073364          -3.6108       0.0003052 ***
2 (88) 0.7630197   1.0000000 0.0049126          -3.3811       0.0007220 ***
3 (88) 0.9397299   1.0000000 0.0049005          -0.8610       0.3892612    
4 (88) 1.0098462   1.0000000 0.0039631           0.1564       0.8757128    
5 (88) 1.2008204   1.0000000 0.0035568           3.3673       0.0007592 ***
6 (88) 1.0773386   1.0000000 0.0058042           1.0151       0.3100407    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
show code
plot(GC_corr)

Quiz: What statistical conclusion can you draw from the plot/output above?
  • Correlogram At smaller lags (i.e. smaller distance away) there is positive spatial auto-correlation, fading away to randomness/no autocorrelation at intermediate lags, and negative at higher (5-6) lags
  • Observing printout, lags at 3, 4, 6 are not strongly statistically significant, best results at lags 1, 2, 5.
    • Probably because few regions are 6-regions away; results probably most accurate for 1- or 2-regions away

3. Local Measures of Spatial Autocorrelation

Moved to new page: Hands-on_Ex02_local