Cadernos de Saúde Pública
ISSN 1678-4464
Vol. 36, no. 9
Rio de Janeiro, September 2020
ARTICLE
The correspondence between the structure of the terrestrial mobility network and the spreading of COVID-19 in Brazil
Vander Luis de Souza Freitas, Thais Cláudia Roma de Oliveira Konstantyner, Jeferson Feitosa Mendes, Cátia Souza do Nascimento Sepetauskas, Leonardo Bacelar Lima Santos
http://dx.doi.org/10.1590/0102-311X00184820
COVID-19; Public Health Surveillance; Epidemics
Introduction
The world is currently facing a global public health emergency due to the COVID-19 pandemic, declared on March 11th, 2020 by the World Health Organization (WHO) ^{1}. As of June 4th, 2020, more than 6.7 million cases have been confirmed worldwide, with almost 400,000 deaths. In Brazil, the first documented case was in the city of São Paulo on February 26th, 2020. Since then, about 615,000 confirmed cases and 34,000 deaths have been recorded in the national territory ^{2} (Worldometers. COVID-19: coronavirus pandemic. https://www.worldometers.info/coronavirus/, accessed on 15/May/2020; Ministério da Saúde. Painel coronavírus. https://covid.saude.gov.br/, accessed on 14/May/2020).
The intercity mobility network serves as a proxy for the transmission network, which is vital for understanding outbreaks, especially in Brazil, a continental-dimension country ^{3,4,5,6,7}. The complex network approach ^{8} emerges as a natural mechanism to handle mobility data computationally, taking areas as nodes (fixed) and movements between origins and destinations as connections (flows). Some network measures can be used to find the structurally more vulnerable areas in the context of the current study. The degree of a node is the number of cities that it is connected to, showing the number of possible destinations. The strength captures the total number of people that travel to (or come from) such places in each time frame. From a probability perspective, the cities that receive more people are more vulnerable to SARS-CoV-2. The betweenness centrality, on the other hand, considers the entire network to depict the topological importance of a city in the routes that are more likely to be used.
In this context, this work aims to investigate the correspondence between network measures and the emergence of cities with confirmed cases of COVID-19 at two scales: Brazil and the State of São Paulo. Specifically, we analyze (i) the Brazilian intercity mobility networks under different flow thresholds, which neglect the lowest-frequency travels, especially at the beginning of the outbreak, when the interiorization of the disease was not yet in progress; and (ii) the correspondence between the network statistics and the spreading of COVID-19 in Brazil.
Methods
The most common mobility data used in studies of this nature in Brazil are the pendular (commuting) travels from the 2010 national census ^{9}. In this research, we use the Brazilian Institute of Geography and Statistics (IBGE) roads data from 2016 ^{10}, which contain the flows between cities considering terrestrial vehicles for which it is possible to buy a ticket (mainly buses and vans). This information seeks to quantify the interconnection between cities, the attraction that urban centers exert for the consumption of goods and services, and the long-distance connectivity of Brazilian cities. The North region is not included in this paper, because neither the fluvial nor the air transport modes are covered, and their roles are crucial to understanding the spreading process there, especially in the Amazon region. According to an investigation of the seroprevalence of antibodies to SARS-CoV-2 ^{11}, Northern cities are among the ones with the highest values, and six of them are located along a 2,000-km stretch of the Amazon river.
The above-cited IBGE data ^{10} contain the travel frequency (flow) between pairs of Brazilian cities/districts in a general/typical week, considering only origins and destinations, without any information about possible connections between them. The frequencies are aggregated within the round trip, which means that the number of travels from city A to city B is the same as from B to A. We produce two types of undirected networks with a different number N of nodes to capture actions at two scales (country and state):
(1) N = 4,987, Brazil without the North region: nodes are cities and edges are the flow of direct travels between them.
(2) N = 620, São Paulo State: a subset of the previous network, containing only cities within the São Paulo State. For simplicity, no further analysis is performed to evaluate the dependency of the network on cities in neighboring states.
Some cities are not present in our networks, due to a simplification made by IBGE: it groups small neighboring municipalities with almost no flow into single nodes. For simplicity, and considering that such places do not contain cases in the first days of the outbreak, they are not individually accounted for in our analysis.
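The construction of the two undirected networks can be sketched as follows. This is only an illustration under stated assumptions: the record layout, the city labels with a state prefix, and the flow values are hypothetical and do not reproduce the IBGE file format.

```python
# Hypothetical records: (city_a, city_b, weekly_flow). Labels, state
# prefixes and flow values are invented for illustration only.
records = [
    ("SP-Sao Paulo", "SP-Campinas", 900.0),
    ("SP-Campinas", "SP-Sao Paulo", 900.0),  # same pair listed in reverse
    ("SP-Sao Paulo", "RJ-Rio de Janeiro", 600.0),
    ("SP-Campinas", "SP-Santos", 120.0),
]


def build_network(rows):
    """Return {frozenset({a, b}): flow}, one undirected edge per city pair."""
    edges = {}
    for a, b, flow in rows:
        if a == b or flow <= 0:
            continue  # skip self-loops and zero-flow pairs
        # Round-trip flows are symmetric, so a repeated pair keeps one value.
        edges[frozenset((a, b))] = flow
    return edges


def state_subnetwork(edges, prefix):
    """Keep only the edges whose two endpoints carry the given state prefix."""
    return {pair: flow for pair, flow in edges.items()
            if all(city.startswith(prefix) for city in pair)}


brazil = build_network(records)
sao_paulo = state_subnetwork(brazil, "SP-")
```

Storing each edge under a `frozenset` key is one simple way to make the A-to-B and B-to-A records collapse into a single undirected edge.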
We focus on two versions of each network, defined by the flow threshold η: η_{0} (η = 0), which is the original network from the IBGE data, and η_{d} (η = d). Here d corresponds to the highest flow threshold that produces the network with the largest diameter. The motivation behind η_{d} is to obtain a threshold high enough to discard the least frequent connections without disregarding the most frequent ones ^{6,12}.
We also use COVID-19 data from the state daily bulletins and the Brazilian Ministry of Health ^{2}, which are reported by place of residence and notification date. They show that, until June 4th, 2020, the number of cities with at least one confirmed patient with COVID-19 is 3,851 in the Brazil without the North region network, which corresponds to 77% of the nodes. The analogous figure for São Paulo State is 535 (86% of the nodes). With these data, we track how well each measure detects vulnerable cities as the virus spreads, following the order in which each city notified its first case.
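The selection of η_{d} can be illustrated with a small sketch. It assumes that an edge is kept when its flow is strictly above η and that the diameter is the longest finite shortest path, in hops, over all components; neither convention is spelled out in the text, and the toy flows are hypothetical.

```python
from collections import defaultdict, deque


def adjacency(edges, eta):
    """Unweighted adjacency keeping only edges with flow strictly above eta."""
    adj = defaultdict(set)
    for (a, b), flow in edges.items():
        if flow > eta:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def diameter(adj):
    """Longest finite shortest-path length (in hops), via BFS from every node."""
    longest = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        longest = max(longest, max(dist.values()))
    return longest


def eta_d(edges):
    """Highest candidate threshold whose network attains the largest diameter."""
    candidates = sorted({0.0} | set(edges.values()))
    diameters = {eta: diameter(adjacency(edges, eta)) for eta in candidates}
    best = max(diameters.values())
    return max(eta for eta, d in diameters.items() if d == best)


# Hypothetical toy flows between four cities.
toy = {("A", "B"): 10, ("B", "C"): 8, ("C", "D"): 6, ("A", "D"): 2, ("A", "C"): 1}
threshold = eta_d(toy)
```

In this toy case, removing the two weakest edges turns the network into the chain A-B-C-D, which has the largest diameter (3 hops), so the selected threshold is 2.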
Complex network measures
The topological degree (k) ^{8} of a node is the number of links it has to other nodes. As here the networks are undirected, there is no distinction between incoming and outgoing edges.
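In a connected graph, there is at least one shortest path between any pair of nodes v and w. The betweenness centrality ^{8} (b) of a node i is the rate of those shortest paths that pass through i:
$$ b_i = \sum_{v \neq i \neq w} \frac{\sigma_{vw}(i)}{\sigma_{vw}}, $$
in which σ_{vw} is the number of shortest paths between v and w, and σ_{vw}(i) is the number of those paths that pass through i.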
Although it is a pointwise measure, it considers nonlocal information related to all shortest paths on the network. It is worth highlighting that in this context this centrality index is not a transportation (physical) measure but a mobility (process) one. Moreover, neither the degree nor the betweenness accounts for the network flows here; both are computed on the binary (unweighted) networks. The diameter of a network is the distance between its farthest nodes, given by the longest of the shortest paths.
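For readers who want to see how the betweenness of an unweighted, undirected network can be computed, the sketch below uses Brandes' algorithm, a standard technique that the text does not name and that is assumed here for illustration. The toy adjacency list is hypothetical, and the values are left unnormalized.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness of every node in an unweighted, undirected graph.

    `adj` maps each node to its neighbors; every node must appear as a key.
    """
    b = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Back-propagate pair dependencies from the farthest nodes towards s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                b[w] += delta[w]
    # Each unordered pair (v, w) was visited from both endpoints.
    return {v: value / 2 for v, value in b.items()}


# Hypothetical chain of four cities: A - B - C - D.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = betweenness(chain)
```

In the chain, the two inner cities each lie on the shortest paths of two pairs, while the end cities lie on none.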
The strength (s) ^{8} of a node, on the other hand, is the accumulated flow from incident edges:
$$ s_i = \sum_{j} F_{ij}, $$
in which F_{ij} is the flow between nodes i and j.
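A minimal sketch of the degree and strength computations on an undirected weighted edge list is given below; the three cities and their flows are hypothetical.

```python
from collections import defaultdict

# Hypothetical undirected flows F_ij between three cities.
flows = {("A", "B"): 10.0, ("A", "C"): 4.0, ("B", "C"): 1.0}

degree = defaultdict(int)      # k_i: number of incident edges
strength = defaultdict(float)  # s_i: sum of the flows on incident edges

for (i, j), flow in flows.items():
    # An undirected edge contributes to both of its endpoints.
    degree[i] += 1
    degree[j] += 1
    strength[i] += flow
    strength[j] += flow
```

All three cities have the same degree here, yet their strengths differ, which is why the two measures can rank cities differently.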
Figure 1 Illustration of network measures: strength, degree, and betweenness.

We assess which of the computed measures (s, k, and b) of the mobility networks best approximates the spreading of COVID-19 in Brazil. We compare the n top-ranked cities of each measure with the n cities that contain confirmed cases. We vary n from 1 to the number of cities with confirmed cases to follow the transmission dynamics. To verify whether the rate of correspondence between the top-ranked cities from the network measures and the cities with COVID-19 cases is statistically significant, we compare it with the result of picking cities at random, instead of under the guidance of the measures, via a hypothesis test with simulated distributions ^{13}. We perform 100,000 simulations for each n, choosing n nodes by chance and monitoring the rate of positive cases.
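The comparison and its random baseline can be sketched as follows. This is one reading of the procedure, not the authors' code: the correspondence is taken as the share of the n top-ranked cities found among the first n cities with cases, and the p-value as the share of random draws that match or beat it. All names, the toy data, and the reduced number of runs are assumptions.

```python
import random


def correspondence(ranking, case_order, n):
    """Share of the n top-ranked cities found among the first n cities with cases."""
    return len(set(ranking[:n]) & set(case_order[:n])) / n


def random_rates(cities, case_order, n, runs=1000, seed=0):
    """Rates obtained when the n cities are drawn at random instead of ranked."""
    rng = random.Random(seed)
    with_cases = set(case_order[:n])
    return [len(with_cases.intersection(rng.sample(cities, n))) / n
            for _ in range(runs)]


def p_value(observed, simulated):
    """Share of random draws that match or beat the observed rate."""
    return sum(rate >= observed for rate in simulated) / len(simulated)


# Hypothetical data: 20 cities already sorted by a network measure, and the
# order in which the first five of them notified a case.
cities = [f"c{i}" for i in range(20)]
ranking = cities[:]
case_order = ["c0", "c2", "c1", "c7", "c3"]

observed = correspondence(ranking, case_order, 4)
simulated = random_rates(cities, case_order, 4)
p = p_value(observed, simulated)
```

With n = 4, three of the four top-ranked cities are among the first four with cases, so the observed rate is 0.75; a small p-value would indicate that random selection rarely does as well.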
Geographical visualization
A geographical approach for complex systems analysis is especially important for mobility phenomena. Santos et al. ^{14} proposed the (geo)graph, a graph in which the nodes have a known geographical location and the edges have spatial dependence. It provides a simple tool to manage, represent, and analyze geographical complex networks in different domains ^{6,12} and it is used in this work. The geographical manipulation is performed with the PostgreSQL Database Management System (https://www.postgresql.org/) and its spatial extension PostGIS. Lastly, the maps are produced using the Geographical Information System ArcGIS (http://www.esri.com/software/arcgis/index.html).
Results
This section presents the results of the topological analysis for the previously mentioned networks. Under η_{0} (original flows), the Brazil without the North region network has N = 4,987 nodes, E = 59,453 edges, average strength <s> = 1,169.4, average degree <k> = 23.8, and average betweenness <b> = 5,219.4. When the threshold η_{d} (the highest threshold with maximum diameter) is considered, it has E = 2,482, <s> = 414, <k> = 1, and <b> = 1,385.6. The São Paulo State network, on the other hand, has N = 620 nodes, E = 4,796 edges, average strength <s> = 1,132.4, average degree <k> = 15.5, and average betweenness <b> = 504.2 under η_{0}. The more restricted version, with η_{d}, has E = 486, <s> = 535, <k> = 1.6, and <b> = 169.9.
Two nodes are connected when there is a nonzero flow between them, which means that the number of connections E decreases for increasing threshold (η). The resulting networks are undirected and, throughout the paper, both the degree and the betweenness measures are computed on unweighted edges rather than on the flows. The diameter of the networks for varying η is computed and the highest threshold with maximum diameter is found for both networks: η_{d} = 507.55 for Brazil without the North region and η_{d} = 169.9 for São Paulo State.
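As a quick consistency check on the reported averages, the average degree of an undirected network follows from the edge and node counts, since each edge contributes to the degree of two nodes. The check below assumes that N stays fixed under η_{d}, that is, that nodes left without edges are still counted:

```latex
\langle k \rangle = \frac{2E}{N}:
\qquad
\frac{2 \times 59{,}453}{4{,}987} \approx 23.8,
\qquad
\frac{2 \times 2{,}482}{4{,}987} \approx 1.0,
\qquad
\frac{2 \times 4{,}796}{620} \approx 15.5,
\qquad
\frac{2 \times 486}{620} \approx 1.6 .
```

Under that assumption, all four values agree with the average degrees reported for the two networks and the two thresholds.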
Following the (geo)graphs approach, it is possible to visualize nodes and edges of the Brazilian mobility network in the geographical space for η _{d } in
Figure 2 Maps of the Brazil without the North region and São Paulo State networks and the topological strength associated with each node/city for thresholds η_{0} and η_{d}.

Figure 3 Correspondence (rate) between the n top-ranked cities for different network criteria (s, k, and b) and the cities that have at least one patient with COVID-19 in Brazil without the North region and São Paulo State.

According to
Highfrequency oscillations are perceived in
Interestingly, on March 31st, the high-frequency oscillations start to diminish in São Paulo State. A few days later, after April 7th, the betweenness centrality with η_{d} starts to be a poor predictor for Brazil without the North region and then for São Paulo State.
Table 1 Cities with at least one case of COVID-19 in Brazil (Brazil without the North region) in the order they were documented ^{8}, side by side with the top-ranked cities regarding s, k, and b for η_{0} and η_{d}.

Table 2 Cities with at least one case of COVID-19 in the State of São Paulo in the order they were documented ^{8}, side by side with the top-ranked cities regarding s, k, and b for η_{0} and η_{d}.

Discussion
We present a complex network-based analysis of the Brazilian intercity mobility networks towards the identification of cities that are vulnerable to the spreading of SARS-CoV-2. The networks are built with the IBGE terrestrial mobility data from 2016, which give the flow of people between cities in a general/typical week. The cities are modeled as nodes and the flows as weighted edges. The geographical graphs, (geo)graphs, are visualized within Geographical Information Systems.
Two scales are investigated, the Brazilian cities without the North region, and the State of São Paulo. The former does not account for the North due to the high number of fluvial routes and some intrinsic local characteristics that are not represented with the terrestrial data. The State of São Paulo is crucial in the ongoing pandemic since the first documented case was in the state capital, and it is currently one of the main focuses of the virus spreading.
Three network measures are studied, namely the strength, degree, and betweenness centrality, under two flow thresholds to account for different mobility intensities, the original flow data and networks with only the edges with higher weights. Other network measures were preliminarily tested, including the weighted version of the betweenness centrality. However, the integrals of the correspondence curves of
Regarding
Due to their importance in mobility, many cities of
Both s and b with η_{0} yield good results at the beginning of the pandemic for the Brazil without the North region network, but s alone becomes the best predictor from the end of April. The most important cities, due to their high flow of travelers and their role in the most used routes, are reached first, followed by those with smaller flows, probably because of the interiorization of the virus, that is, the outbreak reaching the countryside cities. This behavior is even more pronounced in São Paulo State, in which s under η_{d} is the best option at first, neglecting lower-flow connections, especially in April, while s under η_{0} becomes the best option from May onwards.
In the ongoing pandemic, from May 1st onwards, the s index with η_{0} is the best predictor and may help to figure out which countryside cities are about to receive new cases. Moreover, it may help in the following waves of the disease. In the case of another pandemic, one could first compute the strength of the networks according to the most recent data from IBGE and identify the top-ranked cities. In Brazil, it is enough to check the strength computed from the original data, as we presented, since it produces results similar to those of the betweenness centrality and is computationally cheaper to obtain. Regarding the State of São Paulo, it is better to check the strength index with threshold η_{d} in the first weeks and only then switch to η_{0}. As our results show, the correspondence has statistical significance and, along with other information about the regions, such as where the first cases were notified, the pandemic could be closely traced.
It is worth mentioning that the COVID-19 data come from the Brazilian Health Surveillance System, which is fed with data provided by each city in the country. The information update is a complex and dynamic process and there may be delays or errors in the data transfer. Moreover, considering the size and heterogeneity of Brazil, it is important to highlight that there are differences in the capacity to detect cases in a timely manner and in the quality of the information ^{19}. On the other hand, in late January 2020, almost a month before the first Brazilian COVID-19 case, the epidemiological surveillance guidelines and the National Contingency Plan for COVID-19 were published. One of the main objectives of these documents was to provide early guidance to the Brazilian Unified National Health System (SUS) service network, to act in the identification of COVID-19 cases ^{20}. Besides the limitations in the health surveillance system, there is a lack of information about possible intermediate stops between origins and destinations in the IBGE data, as they give only the travelers' initial and final positions.
As future work, we intend to analyze fluvial and aerial mobility data as well, as they include valuable information about the transport of people and goods. The former is fundamental to the discussion of the dynamics in the Brazilian North region, especially the Amazon, and the latter captures long-range connections, which are relevant in a possible future re-emergence of the disease in the country, especially through foreign travelers. Lastly, one could check for correspondences between the network measures and data from other epidemics, and analyze control measures based on topological properties associated with the mobility network ^{21}.
Acknowledgments
We would like to thank Jussara R. Angelo (Oswaldo Cruz Foundation, Rio de Janeiro, Brazil) for the valuable discussions.
References
This is an open-access article distributed under the terms of the Creative Commons Attribution License.