This section presents the proposed people-centered decision-making framework for future urban infrastructure expansion. The framework comprises two main calculation stages, as it couples infrastructure modeling (first stage) with a bespoke ABM (second stage) that accounts for implications of variations in infrastructure expansion on dynamic land values and related residential location decision making. The “Infrastructure modeling” section reviews infrastructure performance modeling that underpins the first stage (and the performance-oriented expansion as described in the “Results” section). The “Agent-based model for residential location decision making” section describes the ABM that represents the second stage. The “Holistic infrastructure development” section presents the end-to-end framework that integrates the two calculation stages and facilitates the holistic expansion described in the “Results” section.
Infrastructure modeling
This section briefly reviews the mathematical formulation for modeling the time-varying performance of infrastructure using graph theory12,14. Graphs are mathematical structures that represent pairwise relations between objects called nodes (points or vertices) via edges (arcs, lines, or links). We define a graph as \(G=(V,E)\), where \(V\) is the set of nodes and \(E\) is the set of edges. Networks are graphs in which the nodes and edges also possess additional attributes, such as names, types, and state variables14.
Infrastructure is represented as a collection of networks, where each network captures a specific feature or function of the infrastructure14. The collection of all networks is written as \({\mathcal{G}}=\{{G}^{[k]}=\left({V}^{[k]},{E}^{[k]}\right):k=1,\ldots ,K\}\), where superscript \([k]\) denotes the feature or function captured by \({G}^{[k]}\). At any given time, the state of each network is quantified by a unique set of vectors \([{\bf{C}}^{[k]}(t),{\bf{D}}^{[k]}(t),{\bf{S}}^{[k]}(t)]\) that represent the basic performance measures of \({G}^{[k]}\), i.e., (i) capacity measures \({\bf{C}}^{[k]}(t)\), (ii) demand measures \({\bf{D}}^{[k]}(t)\), and (iii) supply measures \({\bf{S}}^{[k]}(t)\); these vectors are used to compute an overall performance measure \({\bf{Q}}^{[k]}(t)\) of \({G}^{[k]}\). In general, \({\bf{C}}^{[k]}(t)\), \({\bf{D}}^{[k]}(t)\), and \({\bf{S}}^{[k]}(t)\) depend on a set of variables \({\bf{x}}^{[k]}(t)\) that describe the dynamic state of the infrastructure, accounting, for instance, for deterioration41 or repair42.
We derive an overall infrastructure performance measure \(Q(t)\) as an aggregate of the underlying network performances \({{{{\bf{Q}}}}}^{\left[k\right]}(t)\); this aggregate can be determined in various ways depending on the infrastructure of interest. For example, \(Q(t)\) could be estimated from a topology-based approach in the case of road infrastructure43 or from a flow-based approach in the case of potable water infrastructure44.
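As an illustration of what a topology-based measure might look like, the sketch below computes the global efficiency of a road network (the mean inverse shortest-path distance over all ordered node pairs). This particular measure, the adjacency-dictionary representation, and the function names are assumptions chosen for illustration; the section does not prescribe a specific topology-based formula.

```python
import heapq

# Hypothetical representation: node -> {neighbour: edge length in metres}.
# Every node must appear as a key and edge lengths are assumed strictly positive.
Graph = dict[str, dict[str, float]]


def shortest_paths(graph: Graph, source: str) -> dict[str, float]:
    """Dijkstra's algorithm: road distance from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry: a shorter path was already found
        for v, length in graph[u].items():
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def global_efficiency(graph: Graph) -> float:
    """Mean of 1/d(u, v) over ordered node pairs; unreachable pairs contribute 0."""
    nodes = list(graph)
    if len(nodes) < 2:
        return 0.0
    total = 0.0
    for u in nodes:
        dist = shortest_paths(graph, u)
        total += sum(1.0 / dist[v] for v in nodes if v != u and v in dist)
    return total / (len(nodes) * (len(nodes) - 1))
```

Removing edges (for example, roads made impassable by a hazard) can only lengthen or break shortest paths, so under this assumed measure a disruption shows up as a drop in \(Q(t)\).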
\({\mathfrak{R}}\left[Q\left(t\right)\right]\) denotes a specific societal implication of infrastructure performance, e.g., a measure of the distance between households and locations of interest that depends on a specific road infrastructure. In addition, \({\mathfrak{R}}\left[Q\left(t\right)\right]\) can be disaggregated by socioeconomic factors (e.g., income, age, gender) to capture, at higher resolution, the effects of infrastructure performance (or non-performance) on different population segments.
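A minimal sketch of that disaggregation, assuming \({\mathfrak{R}}\) has already been evaluated for each household and that each household carries a single group label (e.g., its income level). The function and argument names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def disaggregate(r_values: dict[str, float], group_of: dict[str, str]) -> dict[str, float]:
    """Average a household-level measure R_i within each socioeconomic group."""
    by_group: dict[str, list[float]] = defaultdict(list)
    for household, value in r_values.items():
        by_group[group_of[household]].append(value)
    return {group: mean(values) for group, values in by_group.items()}
```

Comparing the group averages makes it visible when a given layout serves, say, low-income households worse than high-income ones, which a single aggregate would hide.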
Agent-based model for residential location decision making
This section describes the ABM for residential location decision making45,46, which is structured following Gamal et al.47. The ABM includes agents (buyers and sellers) that interact in a spatial context, where each agent represents one household. Here, buyers correspond to renters of residential units and sellers correspond to owners of residential units.
Following Alonso36, the benefit an agent gains from a residential unit (or, equivalently, the unit's attractiveness to the agent) is quantified using a utility-based approach, where utility is a function of the residential unit's attributes (e.g., distance from a specific location, access to water supply or sanitation infrastructure) and the individual agent's unique preferences towards those attributes. Utility is expressed as
$${U}_{r,i}={\sum }_{j=1}^{n}{\alpha }_{i,j}\cdot u({\lambda }_{r,j})$$
(1)
where \({U}_{r,i}\) is the total utility of residential unit \(r\) for the \({i}^{{th}}\) agent, \({\lambda }_{{r,j}}\) is a measurement of the \({j}^{th}\) attribute, \(u({\lambda }_{r,j})\) is an objective representation of the benefit associated with the \({j}^{{th}}\) attribute, \({\alpha }_{i,j}\) is the weight representing the subjective preference of the \({i}^{{th}}\) agent towards attribute \(j\), and \(n\) is the total number of attributes considered. \(u({\lambda }_{r,j})\) is written as
$$u({\lambda }_{r,j})=\begin{cases}\dfrac{{\lambda }_{r,j}}{{\max }_{r}\left({\lambda }_{r,j}\right)} & {\rm{if}}\ {\lambda }_{r,j}\in \Lambda \\ 1-\dfrac{{\lambda }_{r,j}}{{\max }_{r}\left({\lambda }_{r,j}\right)} & {\rm{otherwise}}\end{cases}$$
(2)
where \(\Lambda\) is the set of desirable attributes.
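Eqs. (1) and (2) can be sketched as follows, assuming the attribute measurements are stored as a units-by-attributes matrix and that \(\Lambda\) is given as a set of column indices. All names are illustrative, and each column maximum is assumed to be positive.

```python
def benefit_matrix(lam: list[list[float]], desirable: set[int]) -> list[list[float]]:
    """Eq. (2): u(lambda_{r,j}) for every unit r (row) and attribute j (column).

    Each column is scaled by its maximum over all units (assumed positive).
    """
    n_attr = len(lam[0])
    col_max = [max(row[j] for row in lam) for j in range(n_attr)]
    return [
        [row[j] / col_max[j] if j in desirable else 1.0 - row[j] / col_max[j]
         for j in range(n_attr)]
        for row in lam
    ]


def utility(alpha: list[float], u_row: list[float]) -> float:
    """Eq. (1): U_{r,i} = sum_j alpha_{i,j} * u(lambda_{r,j})."""
    if len(alpha) != len(u_row):
        raise ValueError("one preference weight per attribute is required")
    return sum(a * u for a, u in zip(alpha, u_row))


# Two units; distances (m) to a hospital and a school, both undesirable attributes.
u = benefit_matrix([[500.0, 1000.0], [1000.0, 250.0]], desirable=set())
# u -> [[0.5, 0.0], [0.0, 0.75]]
```

With preferences \(\alpha=[0.8,0.2]\), the first unit (close to the hospital) scores 0.4 and the second scores 0.15, so an agent who mainly values hospital proximity prefers the first.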
Agents are either buyers, \(b\), or sellers, \(s\), i.e., \(i\in \{b,s\}\). The maximum price that \(b\) would pay for a residential unit is based on \({U}_{r,i}\): units with higher \({U}_{r,i}\) values (i.e., higher \(u({\lambda }_{r,j})\) and/or \({\alpha }_{i,j}\)) yield a higher willingness to pay. Accordingly, the \({b}^{{th}}\) buyer’s willingness to pay for the \({r}^{{th}}\) residential unit, \({{{\rm{WT}}}}{{{{\rm{P}}}}}_{r,b}\), is written as
$${{{\rm{WT}}}}{{{{\rm{P}}}}}_{r,b}=\frac{{H}_{r,b}\cdot {U}_{r,b}^{2}}{{\beta }_{r,b}+{U}_{r,b}^{2}}$$
(3)
where \({H}_{r,b}\) is the available budget of the \({b}^{{th}}\) buyer, which can be expressed as a raw monetary value or a relative purchase capacity48, and \({\beta }_{r,b}\) is a parameter controlling the convexity of \({{{\rm{WT}}}}{{{{\rm{P}}}}}_{r,b}\) that reflects the risk appetite of the buyer. The range of \({\beta }_{r,b}\) is the same as that of \({U}_{r,b}\), where high values of \({\beta }_{r,b}\) indicate risk-averse behavior and low values indicate risk-taking behavior.
The rental price of the \({r}^{{th}}\) residential unit set by the \({s}^{{th}}\) seller, \({P}_{r,s}\), is based on the benefit of the unit to the seller, \({U}_{r,s}\), and is expressed as
$${P}_{r,s}=\frac{{H}_{r,s}\cdot {U}_{r,s}^{2}}{{\beta }_{r,s}+{U}_{r,s}^{2}}$$
(4)
where \({H}_{r,s}\) is the buyer’s budget as perceived by the seller, analogous to \({H}_{r,b}\), and \({\beta }_{r,s}\) is analogous to \({\beta }_{r,b}\,\).
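Eqs. (3) and (4) share the same saturating form, so a sketch needs only one helper. The function names below are hypothetical wrappers, not part of the described ABM.

```python
def hill_value(h: float, u: float, beta: float) -> float:
    """Shared form of Eqs. (3) and (4): H * U**2 / (beta + U**2)."""
    u2 = u * u
    return h * u2 / (beta + u2)


def willingness_to_pay(h_b: float, u_b: float, beta_b: float) -> float:
    """Eq. (3): WTP_{r,b} from the buyer's budget, utility, and risk parameter."""
    return hill_value(h_b, u_b, beta_b)


def asking_price(h_s: float, u_s: float, beta_s: float) -> float:
    """Eq. (4): P_{r,s} from the perceived budget, seller utility, and risk parameter."""
    return hill_value(h_s, u_s, beta_s)
```

Setting \(U^{2}=\beta\) gives exactly half of the budget, so \(\beta\) marks where the curve crosses \(H/2\); for a fixed utility, a larger \(\beta\) (more risk-averse) yields a lower value.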
Modeling details
The ABM represents agents’ behaviors in the form of relocations, which occur when a buyer can no longer afford to pay their rent (i.e., \({P}_{r,s} > {{{\rm{WT}}}}{{{{\rm{P}}}}}_{r,b}\)). Relocations are triggered by changes in \({P}_{r,s}\) and/or \({{{\rm{WT}}}}{{{{\rm{P}}}}}_{r,b}\), which are the result of changes in \({\lambda }_{r,j}\) that ultimately stem from the infrastructure development process. The number of triggered relocations \(\varepsilon\) due to changes in (expansions of) the infrastructure (i.e., the number of times \({P}_{r,s} > \,{{{{\rm{WTP}}}}}_{r,b}\)) can be considered a proxy for gentrification and represents the unintended socioeconomic consequences of infrastructure development in this study.
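The relocation trigger can be sketched as a filter over occupied units, assuming \({{\rm{WTP}}}_{r,b}\) and \({P}_{r,s}\) have already been re-evaluated after the infrastructure change. The dictionary layout and function names are hypothetical.

```python
def relocating_renters(occupied: dict[str, tuple[float, float]]) -> list[str]:
    """Units r whose renter must relocate because P_{r,s} > WTP_{r,b}.

    Each value is (WTP_{r,b}, P_{r,s}) evaluated after the infrastructure change.
    """
    return [r for r, (wtp, price) in occupied.items() if price > wtp]


def epsilon(occupied: dict[str, tuple[float, float]]) -> int:
    """Number of triggered relocations, the gentrification proxy."""
    return len(relocating_renters(occupied))
```

The comparison is strict, so a renter whose willingness to pay exactly matches the asking price stays put.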
Residential location decision making is modeled by assigning to relocating renters a choice set of \({\theta }_{N}^{* }\) vacant residential units in their current neighborhood \(N\). Relocating renters move to the closest vacant residential unit \({r}^{* }\) within this choice set that satisfies the following conditions: (i) \({{{\rm{WT}}}}{{{{\rm{P}}}}}_{{r}^{* },b}{\ge}{P}_{{r}^{* },{{{\rm{s}}}}}\) and (ii) \({U}_{{r}^{* },b}{\ge}{U}_{b}^{* }\), where \({U}_{b}^{* }\) is the average utility of the \({\theta }_{N}^{* }\) residential units. If none of the \({\theta }_{N}^{* }\) residential units satisfies these conditions, an alternative neighborhood within the urban area is randomly selected, and the process of identifying a satisfactory residential unit (or another alternative neighborhood) is repeated. If no satisfactory residential unit is found, the relocating renters emigrate from the urban system, and the affected household is no longer considered in the analysis.
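The search described above can be sketched as a loop over neighborhoods. This is a minimal sketch under several assumptions: every vacant unit in a neighborhood forms the choice set (drawing a fixed-size sample of \({\theta }_{N}^{* }\) units would be a one-line change), "closest" means straight-line distance, and each alternative neighborhood is tried at most once. The `Unit` class and all function names are hypothetical.

```python
import math
import random
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Optional


@dataclass
class Unit:
    uid: str
    neighbourhood: str
    x: float
    y: float
    vacant: bool = True


def find_new_home(
    origin: tuple[float, float],
    home: str,
    units: list[Unit],
    wtp: Callable[[Unit], float],      # WTP_{r,b} of the relocating renter b
    price: Callable[[Unit], float],    # P_{r,s} set by each unit's owner
    utility: Callable[[Unit], float],  # U_{r,b} of the relocating renter b
    rng: random.Random,
) -> Optional[Unit]:
    """Return the chosen unit r*, or None if the household emigrates."""
    untried = sorted({u.neighbourhood for u in units} - {home})
    neighbourhood = home
    while True:
        # Choice set theta*_N: vacant units in the neighbourhood under consideration.
        choice_set = [u for u in units if u.vacant and u.neighbourhood == neighbourhood]
        if choice_set:
            u_star = mean(utility(u) for u in choice_set)  # U*_b
            ok = [u for u in choice_set if wtp(u) >= price(u) and utility(u) >= u_star]
            if ok:
                # Closest satisfactory unit, using straight-line distance (assumption).
                return min(ok, key=lambda u: math.dist(origin, (u.x, u.y)))
        if not untried:
            return None  # nothing satisfactory anywhere: the household leaves the system
        neighbourhood = untried.pop(rng.randrange(len(untried)))
```

Passing a seeded `random.Random` keeps the random choice of alternative neighborhoods reproducible across Monte Carlo runs.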
Holistic infrastructure development
This section presents the proposed end-to-end approach for achieving a holistic infrastructure expansion that is both performance-oriented (risk-informed) and accounts for unintended socioeconomic consequences of infrastructure development. First, we leverage the theory of the “Infrastructure modeling” section to formulate the performance-oriented expansion as an optimization problem, where the objective function accounts for infrastructure performance during day-to-day operations, in the immediate aftermath of a disrupting event (i.e., response phase), and during the long-term recovery phase. Then, we discuss the approach for solving the optimization problem while accounting for unintended consequences that are quantified using the utility-based residential location ABM in the “Agent-based model for residential location decision making” section.
Mathematical formulation
The objective function is expressed as
$$Z={\max }_{{\bf{g}}}\,{\mathbb{E}}\left[{\gamma }_{1}\cdot {Z}_{1}+{\gamma }_{2}\cdot {Z}_{2}+{\gamma }_{3}\cdot {Z}_{3}\right]$$
(5)
where \({\mathbb{E}}\left[\cdot \right]\) is the expected value operator and \({\gamma }_{1}\), \({\gamma }_{2}\), and \({\gamma }_{3}\) are weights that control the relative importance of infrastructure performance on a day-to-day basis (\({Z}_{1}\)), during the immediate post-hazard response period (\({Z}_{2}\)), and during the longer-term recovery phase (\({Z}_{3}\)), respectively. The values of \({\gamma }_{1}\), \({\gamma }_{2}\), and \({\gamma }_{3}\) are defined in consultation with relevant stakeholders, following a participatory, people-centered approach to risk-informed decision making. \({Z}_{1}\) is written as
$${Z}_{1}=\frac{1}{{n}_{a}}{\sum}_{a=1}^{{n}_{a}}{\omega }_{a}\frac{1}{{N}_{{\rm{H}}}}{\sum}_{i=1}^{{N}_{{\rm{H}}}}{w}_{i}{{\mathfrak{R}}}_{i,a}\left[Q\left({t}_{{0}^{-}},{\bf{g}}\right)\right]$$
(6)
where \({n}_{a}\) is the number of infrastructure needs (types) considered, \({\omega }_{a}\) is the weight (priority) placed on the \({a}^{{th}}\) infrastructure need, \({N}_{{{{\rm{H}}}}}\) is the number of household agents in the community, and \({w}_{i}\) is the weight (priority) placed on meeting the \({i}^{{th}}\) household’s infrastructure needs. \({{\mathfrak{R}}}_{i,a}\left[Q\left({t}_{{0}^{-}},{{{\bf{g}}}}\right)\right]\) describes a specific household-level implication of infrastructure performance at \({t}_{{0}^{-}}\) (i.e., before the hazard event occurs), and \({{{\bf{g}}}}\) is the set of \({G}^{\left[k\right]}\) to be added as part of the infrastructure expansion. For example, in the case of a topology-based analysis of road infrastructure that is used for accessing hospitals, schools, and workplaces, \({{\mathfrak{R}}}_{i,a}\left[Q\left({t}_{{0}^{-}},{{{\bf{g}}}}\right)\right]\) for the \({i}^{{th}}\) household is written as
$${{\mathfrak{R}}}_{i,a}\left[Q\left({t}_{{0}^{-}},{{{\bf{g}}}}\right)\right]=\frac{{\eta }_{i,a}^{\left(H\right)}\left({t}_{{0}^{-}},{{{\bf{g}}}}\right)}{{\eta }_{i,a}^{* }\left({t}_{{0}^{-}},{{{\bf{g}}}}\right)}=\frac{1}{N\left(i\right)}{\sum}_{m=1}^{N\left(i\right)}\frac{{d}_{i,a(m)}^{* }}{{d}_{i,a\left(m\right)}}$$
(7)
where \(N(i)\) is the number of individuals in household \(i\) that have infrastructure need \(a\) (i.e., access to a hospital, school, or workplace) and \({d}_{i,a\left(m\right)}\) is the distance from the residence of household agent \(i\) to the activity (location) of interest of the \({m}^{{th}}\) individual in household \(i\). \({\eta }_{i,a}^{* }\left({t}_{{0}^{-}},{{{\bf{g}}}}\right)\) is a reference value used to normalize \({\eta }_{i,a}^{\left(H\right)}\left({t}_{{0}^{-}},{{{\bf{g}}}}\right)\), and \({d}_{i,a(m)}^{* }\) is the corresponding reference value in terms of road distance; this normalization makes each component of the objective function in Eq. (5) dimensionless, so that the components can be added together. In the limiting case in which the distance between origin and destination is infinite (i.e., the destination is unreachable), \({\eta }_{i,a}^{\left(H\right)}({t}_{{0}^{-}},{{{\bf{g}}}})=0\).
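Eqs. (6) and (7) can be sketched directly from precomputed distances. The reference distances \({d}_{i,a(m)}^{* }\) are treated here as given inputs, since the section does not specify how they are obtained; the nested-list layout and function names are assumptions for illustration.

```python
import math


def household_access(d_ref: list[float], d: list[float]) -> float:
    """Eq. (7): mean over the N(i) members of d*_{i,a(m)} / d_{i,a(m)}.

    An unreachable destination (d = inf) contributes 0; N(i) >= 1 and d > 0 are assumed.
    """
    if len(d_ref) != len(d) or not d:
        raise ValueError("need one reference and one actual distance per member")
    return sum(0.0 if math.isinf(dm) else ref / dm for ref, dm in zip(d_ref, d)) / len(d)


def z1(omega: list[float], w: list[float], r: list[list[float]]) -> float:
    """Eq. (6): r[a][i] holds R_{i,a}[Q(t_0-, g)] for need a and household i."""
    n_a, n_h = len(omega), len(w)
    return sum(
        omega[a] * sum(w[i] * r[a][i] for i in range(n_h)) / n_h for a in range(n_a)
    ) / n_a
```

A household whose only school is cut off entirely contributes zero for that need, which matches the limiting case stated above.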
\({Z}_{2}\) is expressed as
$${Z}_{2}=\frac{1}{{n}_{a^{\prime} }}{\sum}_{\begin{array}{c}a^{\prime} =1\end{array}}^{{n}_{a^{\prime} }}{\omega }_{{a}^{{\prime} }}\frac{1}{{N}_{{{{\rm{H}}}}}}{\sum}_{\begin{array}{c}i=1,\\ p\left(i\right)\subseteq i\in {\Omega }_{{a}^{{\prime} }}\end{array}}^{{N}_{{{{\rm{H}}}}}}{w}_{i}{{\mathfrak{R}}}_{i,{a}^{{\prime} }}\left[Q\left({t}_{{0}^{+}},{{{\bf{g}}}} \right) \right]$$
(8)
where \({\omega }_{a^{\prime} }\) is the weight of the \({{a}^{\prime} }^{{th}}\) infrastructure need in the response phase \({t}_{{0}^{+}}\), \(p\left(i\right)\subseteq i\in {\Omega }_{{a}^{\prime} }\) identifies the individuals (in household \(i\)) associated with the \({{a}^{\prime} }^{{th}}\) infrastructure need, and \({{\mathfrak{R}}}_{i,{a}^{\prime} }\left[Q\left({t}_{{0}^{+}},{{{\bf{g}}}}\right)\right]\) describes some aspect of infrastructure performance at \({t}_{{0}^{+}}\). For example, in the case of road infrastructure, \({a}^{\prime}\) may refer to access to hospitals (for immediate treatment) or shelters (in case of post-event displacement), and \(p\left(i\right)\subseteq i\in {\Omega }_{{a}^{\prime} }\) would identify the individuals in the \({i}^{{th}}\) household who are injured or displaced. When a topology-based approach is used to measure the performance of road infrastructure, \({{\mathfrak{R}}}_{i,{a}^{\prime} }\left[Q\left({t}_{{0}^{+}},{{{\bf{g}}}}\right)\right]\) is defined as
$${{\mathfrak{R}}}_{i,a^{\prime} }\left[Q\left({t}_{{0}^{+}},{{{\bf{g}}}}\right)\right]=\frac{{\eta }_{i,a^{\prime} }^{\left(H\right)}({t}_{{0}^{+}},{{{\bf{g}}}})}{{\eta }_{i,a^{\prime} }^{\left(H\right)}({t}_{{0}^{-}},{{{\bf{g}}}})}$$
(9)
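capturing the relative change in the household’s accessibility to each location of interest at \({t}_{{0}^{+}}\) compared with \({t}_{{0}^{-}}\); because \({\eta }_{i,a^{\prime} }^{\left(H\right)}\) decreases as distances grow, values below one indicate increased post-event distances.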
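A sketch of Eq. (9) for a single household, assuming the post-event network is obtained by deleting road segments whose flood depth exceeds a passability limit (0.3 m is used as the default, matching the value adopted later for the case study). The graph layout, the `frozenset` edge keys for flood depths, and all function names are hypothetical; home and destinations are assumed distinct.

```python
import heapq
import math

Graph = dict[str, dict[str, float]]  # hypothetical: node -> {neighbour: length in m}


def road_distance(graph: Graph, source: str, target: str) -> float:
    """Dijkstra shortest-path length; math.inf if target is unreachable."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        if u == target:
            return d
        for v, length in graph.get(u, {}).items():
            nd = d + length
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf


def remove_flooded(graph: Graph, depth: dict[frozenset, float], limit: float = 0.3) -> Graph:
    """Drop road segments whose flood depth (m) exceeds the passability limit."""
    return {
        u: {v: length for v, length in nbrs.items()
            if depth.get(frozenset((u, v)), 0.0) <= limit}
        for u, nbrs in graph.items()
    }


def eta(graph: Graph, home: str, targets: list[str], d_ref: list[float]) -> float:
    """Mean of d*/d over a household's members (same form as Eq. (7))."""
    return sum(ref / road_distance(graph, home, t) for t, ref in zip(targets, d_ref)) / len(targets)


def response_ratio(graph: Graph, depth, home: str, targets: list[str], d_ref: list[float]) -> float:
    """Eq. (9): eta on the flooded network (t_0+) over eta on the intact network (t_0-)."""
    return eta(remove_flooded(graph, depth), home, targets, d_ref) / eta(graph, home, targets, d_ref)
```

The ratio is undefined if a destination was already unreachable before the event, so a real implementation would need to handle that case explicitly.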
\({Z}_{3}\) is expressed as
$${Z}_{3}=\frac{1}{{T}_{{\rm{R}}}}{\sum}_{\tau ={t}_{{0}^{+}}}^{{T}_{{\rm{R}}}}\frac{1}{{n}_{a}}{\sum}_{a=1}^{{n}_{a}}{\omega }_{a}\frac{1}{{N}_{{\rm{H}}}}{\sum}_{i=1}^{{N}_{{\rm{H}}}}{w}_{i}{{\mathfrak{R}}}_{i,a}\left[Q\left(\tau ,{\bf{g}}\right)\right]$$
(10)
where \({T}_{{{{\rm{R}}}}}\) is the time at which recovery activities are completed, and \({{\mathfrak{R}}}_{i,a}\left[Q\left(\tau ,{{{\bf{g}}}}\right)\right]\) is a dynamic measure of some aspect of infrastructure performance related to the \({i}^{{th}}\) household during the recovery process. In the case of a topology-based analysis of road infrastructure, \({{\mathfrak{R}}}_{i,a}\left[Q\left(\tau ,{{{\bf{g}}}}\right)\right]\) is expressed as
$${{\mathfrak{R}}}_{i,a}\left[Q\left(\tau ,{{{\bf{g}}}}\right)\right]=\frac{{\eta }_{i,a}^{\left(H\right)}(\tau ,{{{\bf{g}}}})}{{\eta }_{i,a}^{\left(H\right)}({t}_{{0}^{-}},{{{\bf{g}}}})}$$
(11)
and is analogous to \({{\mathfrak{R}}}_{i,{a}^{{\prime} }}\left[Q\left({t}_{{0}^{+}},{{{\bf{g}}}}\right)\right]\), where the locations of interest are the same as those captured by \({{\mathfrak{R}}}_{i,a}\left[Q\left({t}_{{0}^{-}},{{{\bf{g}}}}\right)\right]\) in Eq. (7). Note that although the measures \({{\mathfrak{R}}}_{i,{a}^{* }}[\cdot ]\) have been described in terms of a topology-based approach for quantifying road infrastructure performance, the proposed formulation is general enough to be applied to any infrastructure and performance measurement approach44,49.
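Eqs. (6) and (10) share the same weighted double sum, and Eq. (5) combines the three components. The sketch below assumes recovery is discretized into \({T}_{{\rm{R}}}\) unit time steps and that the expectation in Eq. (5) is estimated as a sample mean over simulated hazard scenarios; both are assumptions, as are the function names.

```python
from statistics import mean


def need_score(omega: list[float], w: list[float], r: list[list[float]]) -> float:
    """Inner double sum shared by Eqs. (6) and (10): r[a][i] = R_{i,a}[Q(., g)]."""
    n_a, n_h = len(omega), len(w)
    return sum(
        omega[a] * sum(w[i] * r[a][i] for i in range(n_h)) / n_h for a in range(n_a)
    ) / n_a


def z3(omega: list[float], w: list[float], r_by_step: list[list[list[float]]]) -> float:
    """Eq. (10), assuming recovery is split into T_R unit steps tau = 1, ..., T_R."""
    return sum(need_score(omega, w, r) for r in r_by_step) / len(r_by_step)


def objective(gammas: tuple[float, float, float],
              scenarios: list[tuple[float, float, float]]) -> float:
    """Eq. (5) for one layout g: sample mean of gamma1*Z1 + gamma2*Z2 + gamma3*Z3,
    with one (Z1, Z2, Z3) tuple per simulated hazard scenario."""
    g1, g2, g3 = gammas
    return mean(g1 * s1 + g2 * s2 + g3 * s3 for s1, s2, s3 in scenarios)
```

\({Z}_{2}\) in Eq. (8) can reuse `need_score` by setting \({\mathfrak{R}}=0\) for households with no affected members: the sum over the restricted set, divided by \({N}_{{\rm{H}}}\), is unchanged.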
The constraints of the optimization are now presented. The first constraint is
$${C}_{{{{\rm{p}}}}}\le {M}_{{{{\rm{p}}}}}$$
(12)
where \({C}_{{{{\rm{p}}}}}\) is the cost of implementing a specific infrastructure expansion and \({M}_{{{{\rm{p}}}}}\) is the budget allocated to the infrastructure development process. The resulting \({G}^{\left[k\right]}\) must be a connected network, i.e.,
$$\forall {v}_{1},{v}_{2}\in {V}^{\left[k\right]},\exists \, \varphi ({v}_{1},{v}_{2})\,$$
(13)
where \({v}_{1}\) and \({v}_{2}\) are two generic nodes in \({V}^{\left[k\right]}\), and \(\varphi ({v}_{1},{v}_{2})\) is a path between them. The degree of every node in the resulting \({G}^{\left[k\right]}\) must not exceed a specified threshold, to avoid impracticable infrastructure expansions (e.g., an intersection of more than four roads in the case of road infrastructure); this is written as
$$\Delta \left({G}^{\left[k\right]}\right)\le \delta \left({G}^{\left[k\right]}\right)$$
(14)
where \(\Delta \left({G}^{\left[k\right]}\right)\) is the maximum degree of \({G}^{\left[k\right]}\) and \(\delta \left({G}^{\left[k\right]}\right)\) is the corresponding specified threshold. Additional non-negativity constraints are
$${\omega }_{a}\ge 0,\forall a$$
(15)
$${w}_{i}\ge 0,\,\forall i$$
(16)
$${\omega }_{a^{\prime} }\ge 0,\,\forall a^{\prime}$$
(17)
$${\gamma }_{1},{\gamma }_{2},{\gamma }_{3}\ge 0$$
(18)
Each set of weights, i.e., \({\omega }_{a}\), \({w}_{i}\), \({\omega }_{{a}^{{\prime} }}\), and \({\gamma }_{1}\), \({\gamma }_{2}\), \({\gamma }_{3}\), must sum to one, written as
$${\sum}_{a=1}^{{n}_{a}}{\omega }_{a}=1$$
(19)
$${\sum}_{i=1}^{{N}_{{{{\rm{H}}}}}}{w}_{i}=1$$
(20)
$${\sum}_{a^{\prime} =1}^{{n}_{a^{\prime} }}{\omega }_{a^{\prime} }=1$$
(21)
$${\gamma }_{1}+{\gamma }_{2}+{\gamma }_{3}=1$$
(22)
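The constraints in Eqs. (12) to (22) can be checked for a candidate layout as sketched below. The sketch assumes that \({C}_{{\rm{p}}}\) is proportional to the total length of added edges and that the expanded network is given as an undirected adjacency map; the function names are hypothetical.

```python
from collections import deque


def expansion_cost(lengths_m: list[float], unit_cost: float) -> float:
    """C_p for a layout, assuming cost is proportional to added road length."""
    return unit_cost * sum(lengths_m)


def is_connected(adj: dict[str, set[str]]) -> bool:
    """Eq. (13): breadth-first search reaches every node from an arbitrary start."""
    if not adj:
        return True
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        for v in adj[queue.popleft()] - seen:
            seen.add(v)
            queue.append(v)
    return len(seen) == len(adj)


def is_feasible(adj: dict[str, set[str]], lengths_m: list[float], unit_cost: float,
                budget: float, max_degree: int) -> bool:
    """Eqs. (12)-(14): budget, connectivity, and maximum-degree constraints."""
    within_budget = expansion_cost(lengths_m, unit_cost) <= budget
    max_deg = max((len(nbrs) for nbrs in adj.values()), default=0)
    return within_budget and is_connected(adj) and max_deg <= max_degree


def weights_valid(weights: list[float], tol: float = 1e-9) -> bool:
    """Eqs. (15)-(22): a weight set is non-negative and sums to one."""
    return all(x >= 0 for x in weights) and abs(sum(weights) - 1.0) <= tol
```

A small tolerance is used for the sum-to-one check because weights such as \(1/3\) are not exactly representable in floating point.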
The final constraint of the optimization is
$$\frac{{\eta }_{k,a}^{\left(H\right)}({\tau }^{* },{{{\bf{g}}}})}{{\eta }_{k,a}^{\left(H\right)}({t}_{{0}^{-}},{{{\bf{g}}}})}\ge \xi \left({\tau }^{* }\right),\,\forall {\tau }^{* }$$
(23)
where \(\xi \left({\tau }^{* }\right)\) is a lower threshold on infrastructure performance at time \({\tau }^{* }\); this constraint can be used to require the infrastructure to be restored to (a specified fraction of) its pre-hazard performance level within a certain period after the hazard event.
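Eq. (23) can be checked as sketched below for one household and need, given performance ratios at a set of checkpoint times. The step-shaped threshold (no requirement before a deadline, full restoration afterwards) is a hypothetical example of \(\xi\), not a value taken from the study.

```python
from typing import Callable


def meets_recovery_targets(ratio_at: dict[float, float], xi: Callable[[float], float]) -> bool:
    """Eq. (23): eta(tau*)/eta(t_0-) >= xi(tau*) at every checked time tau*."""
    return all(ratio >= xi(tau) for tau, ratio in ratio_at.items())


def full_recovery_by(deadline: float) -> Callable[[float], float]:
    """Hypothetical threshold: no requirement before `deadline`, full restoration after."""
    return lambda tau: 1.0 if tau >= deadline else 0.0
```

Intermediate milestones (for example, at least 80% of pre-hazard performance by some earlier time) would only require a different `xi` function.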
Obtaining the final holistic expansion
The formulation described in the “Mathematical formulation” section can be classified as a combinatorial optimization problem, where the optimal infrastructure layout (topology) results from a finite set of possible infrastructure interventions, i.e., added edges, such as new roads. Figure 1 summarizes the workflow that is used to solve for the final holistic infrastructure expansion.
First, an augmented infrastructure layout is defined that includes the existing infrastructure layout and the full set of potential (candidate) edges for development. Several procedures can be used to obtain the augmented layout, such as (i) identifying bespoke candidate edges on a context-specific basis in consultation with relevant stakeholders and manually digitizing them in a geographic information system, (ii) using digitized geospatial data to define a regular grid of points and finding the least-cost paths among them, or (iii) using a fully automated interactive procedural modeling approach based on tensor field theory50. The subset of candidate edges to be added to the existing infrastructure is then the one that maximizes \(Z\) in Eq. (5) while satisfying the constraints outlined in the “Mathematical formulation” section.
However, this optimization problem is computationally challenging: \(Z\) in Eq. (5) is nonlinear, nonconvex, and non-differentiable, and the number of possible subsets grows exponentially with the number of candidate edges (\({2}^{m}\) subsets for \(m\) candidates). Consequently, an exhaustive search is not feasible, and a heuristic approach must be used instead to obtain a near-optimal solution. We propose a simulated annealing-based metaheuristic procedure. Other metaheuristics, such as genetic algorithms, ant colony systems, and tabu search, could be used instead, but they typically perform worse than simulated annealing for this type of optimization problem51,52,53.
The search begins by randomly selecting a subset of candidate edges that satisfies the constraints of the “Mathematical formulation” section and using the corresponding value of \(Z\) as the initial solution. The simulated annealing-based metaheuristic procedure then maximizes \(Z\) by applying small (random) changes to the decision variable \({{{\bf{g}}}}\), i.e., the set of edges selected for addition from the full candidate list. Each perturbation randomly applies one of three actions: add, remove, or replace. If add is selected, a new candidate edge from the full candidate list is randomly added to the current subset. If remove is selected, a randomly chosen edge is removed from the current subset. If replace is selected, a randomly chosen edge in the current subset is substituted with a new candidate selected from the full set of candidates. If a neighboring solution (resulting from the perturbation) improves the value of \(Z\), it becomes the current solution and the search continues in its neighborhood. Otherwise, the (worse) neighboring solution is accepted with probability \(\exp (-\Delta Z/T)\), where \(\Delta Z > 0\) is the resulting decrease in \(Z\) and \(T\) is a hyperparameter of the search algorithm, typically known as the temperature. Infeasible solutions that violate the constraints of the “Mathematical formulation” section are discouraged by adding a dynamic penalty function to the objective function54, such that Eq. (5) is rewritten as
$${Z}^{\prime }=\begin{cases}Z & {\rm{if}}\ {\bf{g}}\in {\Omega }_{f}\\ Z+P\left({\bf{g}}\right) & {\rm{otherwise}}\end{cases}$$
(24)
where \({\Omega }_{f}\) is the set of feasible solutions and the penalty function \(P\left({{{\bf{g}}}}\right)\), which must be negative for the maximization to discourage infeasible layouts, is applied when any constraint of the “Mathematical formulation” section is violated.
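The add/remove/replace search can be sketched as a standard simulated annealing loop with the Metropolis acceptance rule (always accept improvements; accept a worse neighbor with probability \(\exp (\Delta /T)\) for a negative change \(\Delta\)). The geometric cooling schedule, the default hyperparameters, and the toy objective in the test are assumptions; the caller supplies \(Z\) and a penalty that is zero for feasible layouts and negative otherwise. Function names are hypothetical.

```python
import math
import random
from typing import Callable

Layout = frozenset  # indices of candidate edges selected for addition (the decision variable g)


def perturb(g: Layout, n_candidates: int, rng: random.Random) -> Layout:
    """Apply one random add, remove, or replace move to the current subset."""
    current = set(g)
    unused = [e for e in range(n_candidates) if e not in current]
    move = rng.choice(("add", "remove", "replace"))
    if move == "add" and unused:
        current.add(rng.choice(unused))
    elif move == "remove" and current:
        current.remove(rng.choice(sorted(current)))
    elif move == "replace" and current and unused:
        current.remove(rng.choice(sorted(current)))
        current.add(rng.choice(unused))
    return frozenset(current)


def anneal(
    z: Callable[[Layout], float],
    penalty: Callable[[Layout], float],  # 0 if feasible, negative otherwise (Eq. 24)
    g0: Layout,
    n_candidates: int,
    rng: random.Random,
    t0: float = 1.0,
    cooling: float = 0.995,
    t_min: float = 1e-9,
    steps: int = 2000,
) -> list[tuple[Layout, float]]:
    """Maximise Z' = Z + P(g); return every visited layout ranked by Z'."""
    visited = {g0: z(g0) + penalty(g0)}
    current, current_score = g0, visited[g0]
    temperature = t0
    for _ in range(steps):
        candidate = perturb(current, n_candidates, rng)
        if candidate not in visited:
            visited[candidate] = z(candidate) + penalty(candidate)
        delta = visited[candidate] - current_score
        # Metropolis rule: accept improvements, accept worse moves with prob exp(delta/T).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, visited[candidate]
        temperature = max(temperature * cooling, t_min)  # geometric cooling (assumed)
    return sorted(visited.items(), key=lambda item: item[1], reverse=True)
```

Caching scores in `visited` avoids re-evaluating layouts that the random walk revisits. This matters when \(Z\) requires shortest-path computations on a large network. The returned ranking plays the role of the ranked list of layouts.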
At the end of the search, the optimization results are compiled in a list of infrastructure development layouts (expansions) ranked in terms of \(Z\) (equivalent to the optimized infrastructure set shown in Fig. 1). The performance-oriented expansion is then the highest-ranked layout of the set. Changes in the infrastructure expansion lead to variations in \({\lambda }_{r,j}\) and therefore \(u({\lambda }_{r,j})\) in Eq. (2), which ultimately produce changing values of \(\varepsilon\). The final holistic infrastructure expansion is the one with the highest value of \(Z\) that also satisfies \(\varepsilon \le {\varepsilon }_{{{{\rm{T}}}}}\), where \({\varepsilon }_{{{{\rm{T}}}}}\) is a pre-determined, end-user-specific acceptable level of unintended consequences (i.e., triggered relocations).
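The final selection step can be sketched as a scan down the ranked list, assuming the list is already sorted by \(Z\) (best first) and that a callable returns the (expected) number of triggered relocations for a layout. Names are hypothetical.

```python
from collections.abc import Callable, Hashable
from typing import Optional


def holistic_expansion(
    ranked: list[tuple[Hashable, float]],
    expected_relocations: Callable[[Hashable], float],
    eps_t: float,
) -> Optional[tuple[Hashable, float]]:
    """Highest-Z layout whose (expected) triggered relocations do not exceed eps_T."""
    for layout, score in ranked:  # ranked is sorted by Z, best first
        if expected_relocations(layout) <= eps_t:
            return layout, score
    return None  # no ranked layout meets the acceptable level of relocations
```

Here the performance-oriented expansion is simply `ranked[0]`. Scanning in rank order means the relatively expensive ABM only has to be run until the first acceptable layout is found.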
Data description
Case study
We use the proposed framework for designing an expansion of the road infrastructure in the 500-ha virtual urban testbed of Tomorrowville. Tomorrowville was designed to represent a typical Global South urban setting based on Nairobi (Kenya) and Kathmandu (Nepal) data55. The testbed is a geospatial database of urban features that includes information on land use, building and infrastructure (physical) characteristics, household (social) characteristics (such as income levels), and individual (social) characteristics, as well as detailed data on each person’s daily infrastructure needs, which include access (proximity) to hospitals, schools, and workplaces. We use TV0 in this study, which represents the current urban layout of Tomorrowville (more details can be found in Menteşe et al.55). TV0 contains a total of 4810 buildings and 7809 households (all assumed to be renters), of which 4236 are low-income, 1705 are mid-income, and 1868 are high-income. The network representing the existing road infrastructure of TV0 contains 1128 edges and 999 nodes.
Data for performance-oriented infrastructure expansion
The hypothetical stakeholders are assumed to (i) consider infrastructure needs in terms of accessibility (proximity) to hospitals (\(h\)), schools (\(e\)), and workplaces (\(l\)); (ii) regard each infrastructure need as equally important regardless of time; (iii) place equal importance on day-to-day and immediate post-hazard infrastructure performance but not consider the long-term recovery process when making infrastructure expansion decisions; and (iv) hold a pro-poor vision of future urban expansion, in line with the latest thinking on disaster risk management and assessment56,57 as well as the guiding principles of the Sendai Framework for Disaster Risk Reduction58. These views are reflected in the following input parameter values: \({\gamma }_{1}={\gamma }_{2}=0.5\) and \({\gamma }_{3}=0\); \({\boldsymbol{\omega }}_{a}={\boldsymbol{\omega }}_{{a}^{\prime} }=[1/3,1/3,1/3]\); and \({\bf{w}}_{i}=[0.7,\,0.2,\,0.1]\), where the entries of the last vector refer to low-income, mid-income, and high-income households, respectively. We assume that a road is impassable if it is exposed to a flood water depth of more than 0.3 m59, which decreases \({\eta }_{i,{a}^{\prime} }^{\left(H\right)}({t}_{{0}^{+}},{{{\bf{g}}}})\) and therefore \({{\mathfrak{R}}}_{i,{a}^{\prime} }\left[Q\left({t}_{{0}^{+}},{{{\bf{g}}}}\right)\right]\) in Eq. (9). We further assume that \({C}_{{\rm{p}}}\) is computed at a unit cost of \(\pounds \,5000/{\rm{m}}\) of new road, that \({M}_{{\rm{p}}}=\pounds 70{\rm{M}}\), and that \({\varepsilon }_{{\rm{T}}}=750\). The augmented expansion of the road infrastructure is obtained through manual digitization of the candidate edges, hypothetically reflecting the outcome of a conversation with potential stakeholders. The resulting augmented network contains 1740 edges and 1483 nodes.
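A quick check of what these inputs imply, under two stated assumptions: that cost scales linearly with added road length, and that the group-level vector \({\bf{w}}_{i}=[0.7,0.2,0.1]\) is shared equally among the households in each income group (using the TV0 household counts) so that the per-household weights satisfy Eq. (20). The section does not state how the group weights are mapped to households, so the second step is illustrative only.

```python
BUDGET_GBP = 70_000_000      # M_p
UNIT_COST_GBP_PER_M = 5_000  # construction cost per metre of new road
max_new_road_m = BUDGET_GBP / UNIT_COST_GBP_PER_M  # 14000.0 m, i.e. 14 km

group_weight = {"low": 0.7, "mid": 0.2, "high": 0.1}
group_size = {"low": 4236, "mid": 1705, "high": 1868}  # TV0 households per income group

# Assumption: each group's weight is split equally among its households.
per_household_w = {g: group_weight[g] / group_size[g] for g in group_weight}
total_weight = sum(per_household_w[g] * group_size[g] for g in group_weight)  # ~1.0
```

Under this assumption, a single low-income household carries roughly 2.7 times the weight of a high-income one, even though low-income households are about 2.3 times as numerous.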
ABM data for unintended consequence quantification
We assume \({\beta }_{r,b}={\beta }_{r,s}=1\) for all agents, reflecting a risk-neutral outlook. All attributes \({\lambda }_{r,j}\) are undesirable and comprise the road distance from each household’s residential unit to hospitals (\({\lambda }_{r,1}\)), schools (\({\lambda }_{r,2}\)), and workplaces (\({\lambda }_{r,3}\)); the last applies only to households with working individuals (otherwise \(n=2\), or \(u\left({\lambda }_{r,3}\right)=0\)), in line with the infrastructure needs identified previously. Note that these distances assume normal day-to-day infrastructure performance, such that a household’s willingness to pay for a residential unit does not account for natural-hazard-induced travel disruptions. No desirable attributes are considered. We further assume that \({\alpha }_{b,1}\), \({\alpha }_{b,2}\), \({\alpha }_{b,3}\), \({\alpha }_{s,1}\), and \({\alpha }_{s,2}\) are independently distributed as \({\rm{Uniform}}\left(0,1\right)\) and that \({\alpha }_{s,3}=0\), i.e., preferences towards the various locations considered are generally heterogeneous and independent, except that owners of residential units do not value the distance from their property (which they do not live in) to work. In the absence of more relevant data, \({H}_{r,i}\) values are based on information collected from Greater Cairo, Egypt47, which is deemed acceptable in this case given that Tomorrowville is designed to represent a general Global South urban setting. These values are quantified in terms of relative purchasing capacity, measured on a continuous scale based on (i) the relative proportion of wealth/income across households in the low-, mid-, and high-income groups47,60 and (ii) the absolute accumulated wealth/income associated with each income group61,62.
This means that \({H}_{r,b}\sim {\rm{Uniform}}({a}_{b},{b}_{b})\), where \(\{{a}_{b},{b}_{b}\}=\{0,\,0.9\}\) for low-income households, \(\{0.9,\,1.96\}\) for mid-income households, and \(\{1.96,\,4.1\}\) for high-income households, and that \({H}_{r,s}\sim {\rm{Uniform}}({a}_{s},{b}_{s})\), where \(\{{a}_{s},{b}_{s}\}=\{0,\,0.93\}\) for low-income households, \(\{0.93,\,1.8\}\) for mid-income households, and \(\{1.8,\,2.5\}\) for high-income households. Here, \({a}_{i}\) and \({b}_{i}\) represent scaled ratios between the minimum and maximum income, respectively, of a given income and agent group and the maximum income of the richest corresponding group. We perform Monte Carlo sampling of the probability distributions to produce 100 sets of each uncertain input variable per household and compute the expected value of \(\varepsilon\), \(E(\varepsilon )\), which is then used in place of \(\varepsilon\) to determine the final holistic solution.
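The Monte Carlo estimate of \(E(\varepsilon )\) can be sketched with the distributions listed above. The following are all assumptions: each household is described by its income group and the benefits \(u({\lambda }_{r,j})\) of its current unit (hospital, school, workplace); buyer and seller see the same unit attributes; households without workers pass a zero workplace benefit; and \(\varepsilon\) is counted directly from the post-expansion state. Function names are hypothetical.

```python
import random
from statistics import mean

H_BUYER = {"low": (0.0, 0.9), "mid": (0.9, 1.96), "high": (1.96, 4.1)}
H_SELLER = {"low": (0.0, 0.93), "mid": (0.93, 1.8), "high": (1.8, 2.5)}
BETA = 1.0  # risk-neutral: beta_{r,b} = beta_{r,s} = 1


def hill(h: float, u: float) -> float:
    """Eqs. (3)/(4) with beta = 1."""
    return h * u * u / (BETA + u * u)


def sample_relocations(households: list[tuple[str, list[float]]], rng: random.Random) -> int:
    """One Monte Carlo draw of epsilon; u_row = [u_hospital, u_school, u_workplace]."""
    eps = 0
    for group, u_row in households:
        alpha_b = [rng.uniform(0, 1) for _ in range(3)]
        alpha_s = [rng.uniform(0, 1), rng.uniform(0, 1), 0.0]  # owners ignore workplace distance
        u_b = sum(a * u for a, u in zip(alpha_b, u_row))
        u_s = sum(a * u for a, u in zip(alpha_s, u_row))
        wtp = hill(rng.uniform(*H_BUYER[group]), u_b)
        price = hill(rng.uniform(*H_SELLER[group]), u_s)
        if price > wtp:
            eps += 1
    return eps


def expected_relocations(households: list[tuple[str, list[float]]],
                         n_samples: int = 100, seed: int = 0) -> float:
    """E(epsilon) as the mean over n_samples independent draws."""
    rng = random.Random(seed)
    return mean(sample_relocations(households, rng) for _ in range(n_samples))
```

Fixing the seed makes \(E(\varepsilon )\) comparable across candidate layouts, because every layout is then evaluated against the same sampled preferences and budgets.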
Flood hazard model
The hazard event considered is similar to the 25-year mean return period pluvial-fluvial flooding event presented in Jenkins et al.63. The flood simulations are generated using CAESAR-Lisflood, a model that combines the Lisflood-FP hydrological and surface flow model64 with the CAESAR landscape evolution model65. The discharge and rainfall time series are generated from moderate to peak daily data based on records from the Department of Hydrology and Meteorology, Nepal, such that the simulations are consistent with the Tomorrowville topography. More details on flood modeling for Tomorrowville can be found in Jenkins et al.63.

