Malcomb, Weaver and Krakowka (2014) published one of the first sub-national geographic climate change vulnerability models for a developing country (1.4). The authors intended for the study to be replicable across space (other African countries with similar data available) (7.1), time (when new survey data is published) (4.5 and 7.1), and vulnerability stimuli (7.1). The study’s intended social impacts are addressing extreme vulnerability to climate change (1.3) and assisting in the allocation and evaluation of foreign aid (1.2). The methodology was designed to be “transparent and easily replicable” (2.1) in its use of “locally derived indicators and granular data” (2.1). The study was designed to address critiques of vulnerability models aimed at their uncertainty and sensitivity due to problems of scale and spatial aggregation, normative and subjective modelling decisions, and data availability, and challenges in model comparability (2.1). The model uses household adaptive capacity data from the United States Agency for International Development (USAID) Demographic and Health Surveys (DHS) (1.4 and 4.1) available in 44 African countries (7.1), livelihood sensitivity data from the USAID / Famine Early Warning Systems Network (FEWSnet) livelihood zones baseline surveys available in 23 African countries (3.6), and global physical exposure data from the United Nations Environment Programme (UNEP) Global Risk Data Platform.
This replication study is motivated by three factors. First, there is an urgent need to evaluate the reproducibility of research in human-environment and geographical sciences (HEGS) and to establish protocols and infrastructure for conducting and publishing reproduction/replication studies and reproducible research in HEGS. Second, a fully reproducible publication can be more readily replicated in new geographic, temporal, and thematic contexts, and tested for uncertainty due to data constraints and subjective modelling decisions. Third, climate change is causing increasingly severe impacts in Africa. Improving the reproducibility and replicability of climate vulnerability research will hopefully enhance the potential for research to inform policy and reduce harm caused by climate change.
Malcomb et al (2014) produce two models of interest for Malawi. Figure 4, labelled “Malawi Household Resilience”, visualizes the average adaptive capacity score of households in each traditional authority. Figure 5, labelled “Malawi Composite Vulnerability Index”, visualizes vulnerability scores by locations (cells) in a continuous raster grid. In this study, we will attempt to identically reproduce figure 4 (adaptive capacity by traditional authority) and figure 5 (vulnerability grid) using The R Project for Statistical Computing and the same data sources cited in the original publication. We will visually compare our resulting reproduction figures with the original figures. Comparison will be aided by digitizing and joining the original figure results to the reproduction results for each model, and then calculating any differences between them. Differences will be visualized with thematic maps for both models, a confusion matrix for figure 4 (adaptive capacity by traditional authority), and a scatterplot for figure 5 (vulnerability grid). An exact reproduction should produce exact replicas of the rank order of traditional authorities by adaptive capacity and grid cells by vulnerability. We will test this with the Spearman’s Rho Correlation Coefficient, expecting values of 1 for perfect correlation.
The original study is a descriptive geographic multi-criteria analysis based on local expert opinion, and therefore has no testable hypotheses or effects.
The replication study data and code will be made available in a GitHub repository to the greatest extent that licensing and file sizes permit. The repository will be made public at github.com/HEGSRR/RPr-Malcomb-2014.
Malcomb, D. W., E. A. Weaver, and A. R. Krakowka. 2014. Vulnerability modeling for sub-Saharan Africa: An operationalized approach in Malawi. Applied Geography 48:17–30. DOI:[10.1016/j.apgeog.2014.01.004](https://doi.org/10.1016/j.apgeog.2014.01.004).
Reproducibility, Vulnerability, GIS, Climate Change, Africa
The reproduction study design will first implement the original study as closely as possible to reproduce the 2010 Household Resilience map (F4) and Malawi Vulnerability Map (F5). Our two confirmatory hypotheses are that we will be able to independently reproduce results for both maps.
The working hypotheses are therefore:
H1: There is no perfect positive correlation between Malcomb et al’s ranking of traditional authorities by household resilience and our reproduction study’s ranking of traditional authorities by household resilience.
H2: There is no perfect positive correlation between Malcomb et al’s ranking of locations by climate vulnerability and our reproduction study’s ranking of locations by climate vulnerability.
We will evaluate each of these hypotheses using a Spearman’s Rho Correlation. A failure to reject these hypotheses would indicate that our results do not exactly match those of the original authors. A positive correlation approaching 1 would indicate a partial reproduction.
The original study is observational and descriptive, with no hypotheses or effect sizes. The study is a multi-criteria analysis using geographic information systems (GIS) to implement a hierarchical geographic model of climate change vulnerability in Malawi.
The spatial extent of the study was the country of Malawi. The spatial scale of the study was the third administrative level (traditional authorities) and a raster grid of unknown spatial resolution. The temporal extent of the study was explicitly 2004 to 2010 (4.5), but the study contains secondary data collected earlier (3.6 and F5).
The model themes, indicators, and weights were selected based upon 70 interviews and 11 village focus groups from field trips to Malawi in March and August of 2011 (1.4, 4.2 and A1). Themes and indicators were also contextualized in literature (3.3 through 3.7) and adjusted based on redundancy and representativeness across the country (4.3). The model and weights were adjusted through “several iterations of the model using alternative weighting schemes” (4.3) to produce a “final product that reflects Malawi’s contextual and perceptual vulnerability” (4.3). Each theme was constructed of indicators from a single data provider: adaptive capacity is measured with USAID DHS surveys, livelihood sensitivity is measured with FEWSnet/Malawi Vulnerability Assessment Committee (MVAC) livelihood zones baseline data, and physical exposure is measured with UNEP Global Risk Data Platform data (T1 and T2). Although the authors emphasize a grounded local evidence-based selection of indicators and weights (2.1, 4.2, 5.1 and 7.1), other evidence in the publication suggests a model design based on a more pragmatic combination of factors including expert local opinion, deductive theory, and the availability and characteristics of data.
The study did not use any randomization.
The original study was conducted using STATA™ (4.4) and ArcGIS™ (4.6, F3 and F4) with unspecified software versions, by 2012 according to creation dates on map figures (F3, F4 and F5).
The study was originally conducted using ArcGIS and STATA, with unspecified software versions. This reproduction study uses R, including the rdhs package for DHS survey data, the sf package for vector analysis, the stars package for raster analysis, and the tmap package for cartography.
```r
# here() builds file paths relative to the project root
library(here)

# set up default knitr parameters
knitr::opts_chunk$set(
  echo = FALSE,
  fig.width = 8,
  fig.path = paste0(here("results", "figures"), "/")
)

# these values allow you to access private and public raw data more efficiently
private_r <- here("data", "raw", "private")
public_r <- here("data", "raw", "public")
public_d <- here("data", "derived", "public")
scratch <- here("data", "scratch")
```
Major lakes were downloaded from MASDAP, the Malawi Spatial Data Platform.
Dissolve lakes into a single multi-part feature with one field `EA` containing the value `Lake`.
Livelihood zones geographic data may be downloaded from the FEWS NET Data Center at https://fews.net/fews-data/335.
Livelihood zones attribute data was provided by FEWS NET in the form of three spreadsheets describing typical livelihood profiles for each zone, with one sheet for `poor` households, one for `middle` income households, and one for `rich` households. This data was based on focus groups with stakeholders in each livelihood zone. The authors have summarized the individual `poor` household spreadsheets into one comprehensive table of variables relevant to the study.
In order to prepare geographic livelihood zone data for analysis, geometry errors are fixed, national parks are removed, and the coordinate reference system is transformed to EPSG:4326 (WGS 1984) geographic coordinates. Livelihood zone attribute data is then joined to the geographic data by livelihood zone code `LZCODE`.
The UNEP Global Risk Data Platform used for this research is no longer available online. The data is provided with the research compendium.
## Household DHS data

Skip this section (load DHS data) if you have no DHS account.
Geographic USAID Demographic and Health Survey (DHS) data requires pre-approved access clearance and login credentials from the DHS Program. For this reproduction study, the following procedure was used to gain access:
The `rdhs` package can be used to download the data, provided a login email and project name via console and password via pop-up dialogue.
Download the Malawi 2010 survey data and geographic points.
Load tabular data of household surveys
Load geographic data of household survey clusters. Some household survey points are erroneously placed at the WGS 1984 coordinate reference system origin (Equator and Prime Meridian).
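A minimal sketch of how points at the coordinate origin could be screened out before the spatial join. The table is made up, and the column names `DHSCLUST`, `LATNUM` and `LONGNUM` are assumptions about the cluster file layout rather than names confirmed by this report.

```r
# Hypothetical cluster table; column names are assumptions
clusters <- data.frame(
  DHSCLUST = 1:5,
  LATNUM   = c(-13.98, 0, -15.79, -11.46, 0),
  LONGNUM  = c(33.78, 0, 35.01, 34.02, 0)
)

# A point at exactly (0, 0) cannot be in Malawi, so treat it as unlocated
at_origin <- clusters$LATNUM == 0 & clusters$LONGNUM == 0
clusters_valid <- clusters[!at_origin, ]

nrow(clusters_valid)
```

Dropping these clusters also drops their households from any per-area summary, so it may be worth reporting how many were removed.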
In order to simultaneously maximize reproducibility while avoiding direct redistribution of DHS GPS data, we spatially join the GPS data to the Traditional Authority enumeration areas. Adaptive capacity is ultimately mapped by traditional authority, but the data comes from household-level surveys. Surveys are grouped into clusters with one geographic point. Therefore, the traditional authority to which each survey will be assigned must be spatially joined to the cluster point, and then joined by attribute to the household survey. The adaptive capacity calculation at the household level also requires urban/rural status, which is stored in the cluster.
Many household surveys contain inconclusive answers (e.g. “I don’t know”) or are missing data for survey questions used in the adaptive capacity calculation. The livestock variable will be calculated as a sum of four livestock types, so we remove any household with uncertain answers about any of the livestock types and remove households with missing data for all livestock types. Households with answers about some livestock types and missing data for others are still included in the data.
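The livestock rule above can be sketched in base R as follows. The column names, the use of `98` as the code for an uncertain answer, and the choice to count a partially missing livestock type as zero are all assumptions made for illustration.

```r
# Hypothetical household table with four livestock counts.
# Assumption: 98 codes an uncertain answer; NA is missing data.
households <- data.frame(
  id      = 1:5,
  cattle  = c(2, 98, NA, NA, 0),
  goats   = c(1, 0, NA, 3, 0),
  sheep   = c(0, 0, NA, NA, 0),
  poultry = c(5, 4, NA, 2, 98)
)
livestock_cols <- c("cattle", "goats", "sheep", "poultry")
counts <- as.matrix(households[, livestock_cols])

# Rule 1: drop households with an uncertain answer for any livestock type
uncertain <- rowSums(counts == 98, na.rm = TRUE) > 0
# Rule 2: drop households with missing data for all livestock types
all_missing <- rowSums(is.na(counts)) == length(livestock_cols)

kept <- households[!uncertain & !all_missing, ]
# Partially missing households stay; missing types are counted as zero (assumption)
kept$livestock <- rowSums(kept[, livestock_cols], na.rm = TRUE)
kept[, c("id", "livestock")]
```

In this toy table, households 2 and 5 are dropped for uncertain answers and household 3 for having no livestock data at all, while household 4 is kept despite two missing types.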
We remove incomplete household surveys.
Calculate percent rank for each component of household adaptive capacity. We had to make many assumptions about calculating individual components, e.g. about how to aggregate different forms of livestock, and which values to invert such that high numbers correspond to low capacity (e.g. number of orphans or sick members of the household). Rescaling to a quintile rank as described in the original study is unclear, especially considering the number of discrete or even binary inputs. We have made a judgement call to do this by calculating percent rank and multiplying by 4, producing a theoretical domain of 0 to 4 similar to that of quintiles.
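The percent-rank rescaling described above might look like the sketch below. The helper names `pct_rank` and `rescale_0_4` and the sample values are hypothetical. The percent rank is defined here as (rank - 1) / (n - 1) with tied values sharing the lowest rank, which is one common definition and not necessarily the exact one used in the analysis.

```r
# Percent rank as (rank - 1) / (n - 1), with ties sharing the lowest rank
pct_rank <- function(x) {
  (rank(x, ties.method = "min", na.last = "keep") - 1) / (sum(!is.na(x)) - 1)
}

# Rescale an indicator to 0-4; set invert = TRUE when high values mean low capacity
rescale_0_4 <- function(x, invert = FALSE) {
  if (invert) x <- -x
  pct_rank(x) * 4
}

land    <- c(0.5, 2.0, 1.0, 4.0, 1.0)  # hypothetical hectares of arable land
orphans <- c(0, 2, 0, 1, 0)            # hypothetical number of orphans

rescale_0_4(land)
rescale_0_4(orphans, invert = TRUE)
```

With the heavily tied `orphans` input, the inverted scores top out at 2 rather than 4. This illustrates why quintile-style rescaling is hard to interpret for discrete or binary indicators.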
Calculate household-level adaptive capacity scores based on original study Table 2 weights. The indicators have already been rescaled to a possible domain of `0` to `4`, and the weights sum to `0.4`, giving a possible domain of adaptive capacity scores from `0.0` to `1.6`.
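To make the domain arithmetic concrete, here is a small sketch of a weighted sum over rescaled indicators. The indicator names and weights are illustrative placeholders, not the Table 2 values; they were chosen only so that the weights sum to 0.4.

```r
# Hypothetical indicator scores for three households, each already on 0-4
indicators <- data.frame(
  land       = c(0, 2, 4),
  livestock  = c(0, 3, 4),
  radio      = c(0, 4, 4),
  water_time = c(0, 1, 4)
)

# Illustrative weights only (not Table 2 values), chosen to sum to 0.4
weights <- c(land = 0.15, livestock = 0.10, radio = 0.05, water_time = 0.10)
stopifnot(isTRUE(all.equal(sum(weights), 0.4)))

# Weighted sum per household
capacity <- as.vector(as.matrix(indicators[, names(weights)]) %*% weights)
capacity
```

A household scoring 0 on every indicator gets 0, and one scoring 4 on every indicator gets 4 × 0.4 = 1.6, which are the bounds of the domain stated above.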
Summary statistics of adaptive capacity and its components at the household level.
Join adaptive capacity data to geographic TAs and rescale in an attempt to match the original publication. The original publication figure 4 shows ranges from 11.48 to 25.77, but after rescaling indicators to domains of 0 to 4 and multiplying by percentages in table 2 (which sum to 0.4), the theoretical domain is only 0 to 1.6. We might suppose that the authors had rescaled adaptive capacity to a possible domain of 0 to 40 in accordance with the 40% weight of adaptive capacity in the overall vulnerability model. Therefore, we may multiply our possible domain of 0 to 1.6 by 25 to achieve a possible domain of 0 to 40.
statistic | rpac_unscaled | rpac |
---|---|---|
nbr.val | 215.00 | 215.00 |
nbr.na | 24.00 | 24.00 |
min | 0.30 | 7.41 |
max | 0.68 | 16.90 |
range | 0.38 | 9.48 |
median | 0.43 | 10.66 |
mean | 0.44 | 10.99 |
std.dev | 0.07 | 1.80 |
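The aggregation and rescaling behind `rpac_unscaled` and `rpac` can be sketched as below, assuming that the traditional authority score is the mean of its household scores. The household values and the `ta_id` column are hypothetical.

```r
# Hypothetical household scores (0 to 1.6) with their traditional authority ID
hh <- data.frame(
  ta_id    = c("A", "A", "B", "B", "B"),
  capacity = c(0.40, 0.48, 0.30, 0.60, 0.36)
)

# Mean score per traditional authority
ta <- aggregate(capacity ~ ta_id, data = hh, FUN = mean)
names(ta)[2] <- "rpac_unscaled"

# 40 / 1.6 = 25 stretches the 0-1.6 domain to 0-40
ta$rpac <- ta$rpac_unscaled * (40 / 1.6)
ta
```

Because the multiplier is a constant, it changes the reported range but not the rank order of traditional authorities, so it does not affect the Spearman comparison.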
The original publication uses the Jenks Natural Breaks method to classify the data.
rpac_class | n |
---|---|
1 | 67 |
2 | 80 |
3 | 53 |
4 | 15 |
NA | 24 |
Map reproduction results for comparison to figure 4.
Ordinal data from figure 4 was digitized in QGIS with the following procedure:

1. `pdf` file using Adobe Acrobat Pro.
2. `png` file with pixel dimensions 1982 by 2811.
3. `ta_v.gpkg` using WGS 84 geographic coordinates (epsg:4326). Use linear georeferencing with points in `metadata\malcomb_fig4.png.points`.
4. Reproject `ta_v` to UTM 36S epsg:32736: `ta_v_fig4.gpkg:utm36s`.
5. Buffer by `-600m`: `ta_v_fig4.gpkg:utm36s`.
6. `ta_v_fig4.gpkg:buffer_wgs84`.
7. `ta_v_fig4.gpkg:r`, `ta_v_fig4.gpkg:rb` and `ta_v_fig4.gpkg:rbg`.
8. Join to the `ta_v` layer by the `ID_2` attribute: `ta_v_fig4.gpkg:ta_v_fig4`.
9. Calculate `orac` (original adaptive capacity) using the field calculator and CASE statements, choosing break points that classify most traditional authorities correctly.
10. `orac` attribute for any mis-classified area.
11. `ta_v_fig4.gpkg:fig4_errors`.

Other areas are coded as follows:

code | description |
---|---|
-3 | polygon too small to discern color or pattern fill |
-2 | white fill not matching any legend item |
-1 | pattern fill for “missing DHS data” |
1 | lowest adaptive capacity |
2 | … |
3 | … |
4 | highest adaptive capacity |
Load digitized figure 4 data and display counts of results. Convert all forms of missing data to `NA` to be excluded from mapping and statistics. Join original figure 4 adaptive capacity results to `ta_v`.
orac | n |
---|---|
-3 | 3 |
-2 | 30 |
-1 | 3 |
1 | 38 |
2 | 56 |
3 | 72 |
4 | 37 |
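Converting the negative codes to `NA` can be done in one step, as in this sketch. The `orac` values here are hypothetical; the assumption is simply that every negative code marks an area with no usable class.

```r
# Hypothetical digitized classes, including negative codes for unusable areas
orac <- c(1, -2, 3, 4, -1, 2, -3, 3)

# Any negative code means "no usable value", so convert it to NA
orac_clean <- ifelse(orac < 0, NA, orac)

# Counts per class, with NA tallied separately
table(orac_clean, useNA = "ifany")
```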
Map original figure 4.
Calculate and map difference between the two maps.
##
## 1 2 3 4
## 1 34 27 6 0
## 2 4 26 44 5
## 3 0 0 19 29
## 4 0 0 0 3
##
## Spearman's rank correlation rho
##
## data: ta_v$rpac_class and ta_v$orac
## S = 268637, p-value < 2.2e-16
## alternative hypothesis: true rho is greater than 0
## sample estimates:
## rho
## 0.7891711
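The comparison above combines a confusion matrix, a class difference, and a one-sided Spearman test. A minimal sketch with made-up class vectors follows; the variable names mirror those in the output, but the data are hypothetical, and `exact = FALSE` is an assumption used to handle the many ties in class data.

```r
# Hypothetical class assignments for eight traditional authorities
rpac_class <- c(1, 1, 2, 2, 3, 3, 4, NA)  # reproduction
orac       <- c(1, 2, 2, 3, 3, 4, 4, 2)   # digitized original

# Confusion matrix: rows are reproduction classes, columns are original classes
conf_matrix <- table(rpac_class, orac)

# Class difference: negative where the reproduction is lower than the original
difference <- rpac_class - orac

# One-sided Spearman test; exact = FALSE because class data is full of ties
rho_test <- cor.test(rpac_class, orac, method = "spearman",
                     alternative = "greater", exact = FALSE)

conf_matrix
range(difference, na.rm = TRUE)
unname(rho_test$estimate)
```

Pairs with a missing value on either side are left out of both the table and the test. A perfect reproduction would put all counts on the diagonal and give a rho of 1.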
Create bounding box representing the spatial extent of Malawi. Create a raster grid frame matching the extent of the bounding box and the spatial resolution of the drought exposure raster, which is `0.041667` decimal degrees. Although the flood risk raster has a coarser spatial resolution, visual inspection of the original figure 5 suggests that the finer spatial resolution of drought exposure was used for the original analysis.
Convert adaptive capacity to raster grid.
Clip and warp drought exposure to match our extent and spatial resolution.
Create a mask with the adaptive capacity results so that lakes, conservation areas, and traditional authorities with no data will not skew the classification / rescaling of drought exposure. Apply this mask to drought exposure. Masking is our own decision based on intuition: it is not specified in the original publication.
Classify drought exposure into quintile classes (0 to 4). Then rescale to 20% by multiplying by 4.
Clip and warp flood risk to match our extent and spatial resolution.
Mask and rescale flood. Since flood is already on a scale from 0 to 4, simply multiply by 5 to achieve the 20% weight.
Calculate livelihood sensitivity indicators from FEWSnet livelihood zone baseline profiles of poor households according to table 2.
Rescale livelihood sensitivity indicators into quantiles.
## pctOwnCrop pctIncWage pctIncCashCrops pctDisasterCope ownCrop
## nbr.val 18.0 18.0 18.0 18.0 18.0
## nbr.null 0.0 0.0 13.0 1.0 1.0
## nbr.na 0.0 0.0 0.0 0.0 0.0
## min 29.4 9.7 0.0 0.0 0.0
## max 88.0 50.3 75.1 71.9 4.0
## range 58.6 40.6 75.1 71.9 4.0
## sum 1059.3 489.6 171.8 236.5 36.0
## median 55.0 24.7 0.0 8.8 2.0
## mean 58.9 27.2 9.5 13.1 2.0
## SE.mean 3.1 2.6 5.3 3.7 0.3
## CI.mean.0.95 6.6 5.5 11.2 7.9 0.6
## var 176.8 121.3 507.8 251.0 1.6
## std.dev 13.3 11.0 22.5 15.8 1.3
## coef.var 0.2 0.4 2.4 1.2 0.6
## wageIncome cashCropIncome disasterCope
## nbr.val 18.0 18.0 18.0
## nbr.null 1.0 1.0 1.0
## nbr.na 0.0 0.0 0.0
## min 0.0 0.0 0.0
## max 4.0 1.2 4.0
## range 4.0 1.2 4.0
## sum 36.0 17.6 36.0
## median 2.0 1.2 2.0
## mean 2.0 1.0 2.0
## SE.mean 0.3 0.1 0.3
## CI.mean.0.95 0.6 0.2 0.6
## var 1.6 0.1 1.6
## std.dev 1.3 0.4 1.3
## coef.var 0.6 0.4 0.6
Calculate aggregate livelihood sensitivity score
## sensitivity
## nbr.val 18.00
## nbr.null 0.00
## nbr.na 0.00
## min 5.88
## max 14.00
## range 8.12
## sum 161.65
## median 8.65
## mean 8.98
## SE.mean 0.56
## CI.mean.0.95 1.18
## var 5.65
## std.dev 2.38
## coef.var 0.26
Convert livelihood sensitivity into raster grid
According to the original Malcomb et al. (2014) study’s Figure 5, the following equation was used to calculate vulnerability scores.
\[ \text{Vulnerability} = \text{assets} + \text{access} + \text{livelihoods} - \text{exposure} \] This contradicts both the map in the original Figure 5 and the methodology section of the study, which states that this expression is equal to the ‘household resilience score.’ We will attempt to reproduce Figure 5 using this equation to demonstrate why the original equation is inconsistent with the original figure.
The presence of negative vulnerability scores suggests that the equation cannot be the method behind the figure. This error shows that the equation printed on the original Figure 5 does not reflect what the figure represents. We will then compare the result to the original Malcomb et al. (2014) study’s Figure 5.
Comparing the reproduction of figure 5 with the original figure 5 requires first digitizing the original figure 5 (unclassified choropleth map with yellow to red gradient) in QGIS as follows:

1. `pdf` file using Adobe Acrobat Pro.
2. `png` file with pixel dimensions 1949 by 2811.
3. `ta_v.gpkg` using WGS 84 geographic coordinates (epsg:4326). Use linear georeferencing with points in ...
4. `ta_capacity.tif` raster to vector polygons
5. zonal statistics
6. `georef_bg.gpkg`.

To approximate data values from the yellow to red gradient of the original map, the blue and green bands are then added, inverted, and rescaled to a range from 0 to 100.
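One way to read "added, inverted, and rescaled" is sketched below with made-up band values. The inversion (subtracting from the maximum band sum) and the min-max rescale are assumptions about the exact arithmetic, which is not spelled out in the text. The name `orv` follows the variable shown in the test output.

```r
# Hypothetical mean band values (0-255) for four sampled zones,
# ordered from pale yellow to dark red
green <- c(255, 200, 120, 0)
blue  <- c(178, 100, 40, 0)

# Yellow has high green (and some blue); red has little of either,
# so a lower band sum means a redder, more vulnerable cell
band_sum <- green + blue
inverted <- max(band_sum) - band_sum

# Min-max rescale to 0-100
orv <- (inverted - min(inverted)) / (max(inverted) - min(inverted)) * 100
round(orv, 1)
```

The result is only an ordinal approximation of the mapped values. Colors recovered from a figure say nothing reliable about the original numeric scale, which is one reason a rank-based comparison suits these data.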
##
## Spearman's rank correlation rho
##
## data: vulnerability_p$orv and vulnerability_p$rpv
## S = 1.1195e+10, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.2677035
Note: The vulnerability difference has a range of +100 to -100 and the Spearman’s Rho correlation is negative.
Since the equation printed on the original Figure 5 is not consistent with the figure, we will use a new equation that more accurately represents the figure. We will calculate an aggregated vulnerability score by adding low adaptive capacity (adaptive capacity inverted by subtracting it from the maximum score of 40), livelihood sensitivity, drought exposure, and flood risk. This equation follows a formulation that is common in the wider literature, including that of the Intergovernmental Panel on Climate Change (IPCC).
\[ \text{Vulnerability} = (40 - \text{Adaptive Capacity}) + \text{Livelihood Sensitivity} + \text{Drought Exposure} + \text{Flood Risk} \]
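As a worked example of this equation, the sketch below scores three hypothetical grid cells. It assumes adaptive capacity is on a 0 to 40 scale and each of the other three components on a 0 to 20 scale, which gives a possible vulnerability range of 0 to 100; the 0 to 20 scale for livelihood sensitivity is inferred from the remaining model weight and is not stated outright.

```r
# Hypothetical component scores for three grid cells
adaptive_capacity <- c(40, 11.5, 0)  # 0 to 40, high = more capacity
sensitivity       <- c(0, 8.65, 20)  # assumed 0 to 20
drought           <- c(0, 12, 20)    # 0 to 20
flood             <- c(0, 5, 20)     # 0 to 20

# Invert capacity so every term increases with vulnerability
vulnerability <- (40 - adaptive_capacity) + sensitivity + drought + flood
vulnerability
```

Unlike the equation printed on Figure 5, every term here is non-negative, so the score cannot fall below zero.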
##
## Spearman's rank correlation rho
##
## data: vulnerability_p$orv and vulnerability_p$rpv
## S = 7087504387, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.1974578
Map differences in Figure 5
Note: The vulnerability difference has a range of +84 to -84 and the Spearman’s Rho correlation is positive.
For the reproduction of Figure 4, we could not perfectly reproduce the original figure. The reproduced figure had differences ranging from +1 to -2 in adaptive capacity.
For the reproduction of Figure 5, we also could not perfectly reproduce the original figure. The reproduced figure had differences ranging from +84 to -84 in vulnerability scores.
Neither of the Spearman’s Rho correlation tests of the reproduced figures showed a perfect positive correlation. Therefore, we fail to reject our two working hypotheses, indicating that our results do not match those of the original authors.
While loading the metadata from the original Malcomb et al. (2014) study, we found that the traditional authority (TA) data erroneously included some populated areas of land in water body features. We then extracted the features and found areas of Lake Malawi that are actually land by buffering lakes by 500 meters and clipping the Lake Malawi TA features. Then we calculated new unique second-level IDs as 1000 times the row number and removed splinter polygons by selecting polygons over 4 km² with centroids intersecting livelihood zones. Lastly, we merged the corrected features back into the rest of the TA data.
The UNEP Global Risk Data used in the original Malcomb et al. (2014) study could not be accessed online; therefore, we used the data provided within the research compendium. We likewise loaded the aggregated public adaptive capacity data from the compendium.
In order to replicate Figure 4 and match the rescaled range of 0 to 40, we supposed that the original authors rescaled with the 40% weight of adaptive capacity in the vulnerability model. We multiplied the theoretical domain of 0 to 1.6 by 25 to achieve a possible domain of 0 to 40.
In order to replicate Figure 5, we rasterized adaptive capacity. We created a mask with the adaptive capacity results so that lakes, conservation areas, and traditional authorities with no data would not skew the classification / rescaling of drought exposure.
Additionally, we clipped, warped, classified (into quintile classes), and rescaled the variables drought exposure and flood risk.
For the livelihood sensitivity score, we calculated the indicators from FEWSnet livelihood zone baseline profiles of poor households according to table 2, rescaled to quintiles, calculated aggregate scores, and converted to a raster grid.
When loading the metadata for the reproduction, we created visualizations of the lakes, livelihood zones, and the traditional authorities. These visualizations also helped to identify that the traditional authority (TA) data includes conservation areas and water bodies that do not contain populated villages.
In order to compare the reproduced Figure 4 and the original Figure 4, we calculated a map difference between the two maps. The comparative map showcased differences ranging from +1 to -2. Additionally, we conducted a Spearman’s Rho correlation test, and the results did not showcase a perfect positive correlation.
We then demonstrated the original study’s inconsistent equation presented on Figure 5. By recreating the Figure 5 with the original (wrong) equation, we showcased why the equation presented was not used to make the original figure.
For both the ‘wrong’ and ‘fixed’ equation reproductions of Figure 5, we calculated Spearman’s Rho correlation tests, plotted the points in relation to the original Figure 5, and plotted a difference map between the reproduction and the original. For the ‘wrong’ equation, the vulnerability difference had a range of +100 to -100 and the Spearman’s Rho correlation was negative. This signifies that not only did the reproduction not match, but the results were negatively correlated. For the ‘fixed’ equation, the vulnerability difference had a range of +84 to -84 and the Spearman’s Rho correlation was positive. Though the results were positively correlated, the correlation was far from perfect and the results did not match the original study.
We found that neither of the reproduced figures perfectly matched those of the original study. This result suggests that we could not reproduce the original figures from the Malcomb et al (2014) study.
Malcomb et al’s (2014) construction of a social vulnerability model for populations throughout Malawi is an important piece of research, especially as one of the first sub-national geographic climate change vulnerability models for a developing country. Because the study is so important to the development of the social vulnerability model as a tool for vulnerable populations, it would benefit from greater reproducibility, transparency, and clarity. This reproduction study’s inability to reproduce Figures 4 and 5 suggests an unclear or ambiguous definition of ‘vulnerability’ within the wider literature and the Malcomb et al. (2014) study. In order to improve this study as well as the wider literature surrounding social vulnerability models, the definition of ‘vulnerability’ should be uniform and clear.
The increased transparency in this study as well as other studies can help to improve reproductions. This will strengthen social vulnerability models which serve an important social and ethical role in how we address the most at-risk populations throughout the world.
Modifications to the reproduction of Malcomb et al (2014) focused on the addition of a results, discussion, conclusion and rationale for the updated report. Additionally, this new reproduction also more directly addressed the inconsistent labeling and equation in the original study.
The addition of comprehensive results, discussion, conclusion, and rationale sections promotes a better understanding of the reproduction and gives the study a clearer and more organized analysis for future reproductions.
The addition of figures and annotations improves the reproduction’s explanations of the discrepancies and inconsistencies of the figures in the original study. These figures and annotations are accompanied by additions to the discussion section to strengthen the clarity and the transparency of the reproduction.