Mapping the urban world: Integrating high-resolution satellite imagery and night light data

Tracking how an economy allocates resources across space and time is difficult. Reliable data sources, particularly from developing countries, are scarce. Those that do exist (e.g., population censuses) are typically measured at low frequencies, difficult to obtain, hard to compare across countries, and available only for coarse spatial units (e.g., districts or counties). High-resolution satellite imagery offers a new, and potentially groundbreaking, way to track economic activity at a global scale.

This project aims to improve current approaches to charting urban extent across the globe by integrating night light (NTL) data with 30 m resolution Landsat satellite images. By applying state-of-the-art machine-learning techniques, we hope to produce a comprehensive global map of urbanisation in close to real time. The methodology combines three sets of data: a classifier of Landsat satellite imagery is trained using filtered NTL data, and is then assessed and validated against the largest (to our knowledge) human-labelled ground-truth dataset of urban land cover. The NTL and human-labelled data are critical for improving existing remote-sensing techniques for classifying urban extent. At present, there are no reliable ground-truth data that span the entire globe and its full climatic and geographic heterogeneity. As a result, the use of high-resolution daytime satellite imagery by social scientists has been confined to specific settings, such as examining deforestation in Indonesia (Burgess et al., 2012) or mapping slums in Kenya (Marx et al., 2015). Yet the real potential of satellite data lies in planetary-scale applications (Donaldson and Storeygard, 2016).
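
To make the three-dataset design concrete, the following is a minimal sketch, not the project's implementation. It assumes a simple radiance cutoff as the NTL filter, two invented daytime features per pixel, and a nearest-centroid rule standing in for whatever machine-learning classifier the project actually uses. All pixel values, the threshold, and the function names are hypothetical.

```python
from statistics import mean

# Hypothetical toy pixels: (greenness index, built-up index, NTL radiance).
# Every value here is invented purely for illustration.
PIXELS = [
    (0.10, 0.80, 55.0),
    (0.15, 0.70, 40.0),
    (0.20, 0.65, 30.0),
    (0.70, 0.10, 2.0),
    (0.65, 0.15, 1.0),
    (0.80, 0.05, 0.5),
]

NTL_THRESHOLD = 10.0  # assumed cutoff; a real NTL filter would be more involved


def ntl_labels(pixels, threshold=NTL_THRESHOLD):
    """Derive noisy training labels: 1 (urban) if night light exceeds the cutoff."""
    return [1 if ntl > threshold else 0 for _, _, ntl in pixels]


def fit_centroids(features, labels):
    """Nearest-centroid 'training': the mean feature vector of each class."""
    centroids = {}
    for cls in set(labels):
        rows = [f for f, y in zip(features, labels) if y == cls]
        centroids[cls] = tuple(mean(col) for col in zip(*rows))
    return centroids


def predict(centroids, feature):
    """Assign the class whose centroid is nearest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(feature, centroid))
    return min(centroids, key=lambda cls: dist(centroids[cls]))


def accuracy(centroids, features, truth):
    """Share of pixels whose predicted class matches the reference label."""
    hits = sum(predict(centroids, f) == y for f, y in zip(features, truth))
    return hits / len(truth)


# Train on daytime features only, with NTL-derived labels as supervision.
train_features = [(g, b) for g, b, _ in PIXELS]
model = fit_centroids(train_features, ntl_labels(PIXELS))

# Validate against separate hand-labelled pixels (also hypothetical).
HUMAN_LABELLED = [
    ((0.12, 0.75), 1),
    ((0.75, 0.08), 0),
    ((0.25, 0.60), 1),
    ((0.60, 0.20), 0),
]
val_features = [f for f, _ in HUMAN_LABELLED]
val_truth = [y for _, y in HUMAN_LABELLED]
print(accuracy(model, val_features, val_truth))
```

The point of the sketch is the division of labour: night lights supply cheap but noisy labels for training, daytime imagery supplies the features, and the human-labelled pixels are held back purely for validation.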

We build a methodology to map annual urban land cover between 2000 and 2016 using Google Earth Engine (GEE), a cloud-based computational platform. These maps will be made available to the public through GEE and updated annually. While our methodology will produce data that map urban extent globally, we prioritise India. Our choice of India follows from our ongoing work on the country and our access to geocoded data on Indian firms. However, we believe that all IGC country offices will find value in our methodology and results.

The potential uses of accurate high-resolution maps of urban areas are immense. Consider a classic question posed by policymakers in developing countries: do place-based manufacturing policies generate localised spillovers? Investment in manufacturing is widely recognised as a driver of urbanisation (Glaeser et al., 2009). Because the creation of a large factory is a rare event, these entities are not smoothly distributed across space (Gabaix, 2011). By choosing one location over another for a major facility, a firm may affect the spatial distribution of economic activity both within and between cities. Although evidence from developed countries shows positive productivity spillovers from the arrival of new manufacturing facilities (Greenstone et al., 2010), rigorous evidence from developing countries is scant, despite the active use of place-based industrial policies in these countries (e.g., special economic zones, or SEZs). By integrating geocoded business registries in India with satellite-based maps of urban extent, we can precisely track how the spatial distribution of economic activity responds to large plant openings. Our method can be applied to any country for which geocoded firm data are available.

Outputs