
Data appendix – The fight against global poverty: 200 years of progress and still a very long way to go

An online data appendix explaining the data and methods used to estimate the historical poverty trends presented in Roser and Hasell (2021)


This appendix provides further detail on the data and methods used in our historical reconstructions of global poverty from national accounts data, as presented in Roser and Hasell (2021). For related data and research, see our topic page on Poverty.

The paper will be available online at the publisher's website.

Note that:


1) Baseline estimates

First, we present the baseline poverty estimates from the paper.

The number and share of people living at different income thresholds.

This is an interactive version of the charts included as figure 11 in the main paper.

The share living below $5 a day, by region

Here we provide interactive versions of figures 12 and 13 of the main paper.

A single long-run series of extreme poverty combining national accounts and recent survey-based estimates

This is an interactive version of figure 14 of the main paper.

It joins recent World Bank estimates of the share of people globally living below $1.90 a day, available from 1980 onwards, with our own historical national accounts estimates, which use a poverty line of $5.20. For a discussion of these two approaches to estimating poverty and how they relate to one another, see the main paper.


2) Data sources

As explained in the paper, the estimates above are based on three inputs: inequality data (Gini coefficients), GDP per capita, and population.

Here we discuss the sources used for each of these inputs.

Inequality data

Our baseline estimates are based on a combination of two datasets:

Alternative estimates (presented below) combine the historical data from van Zanden and others (2014) with two other datasets for more recent decades respectively:

GDP per capita and population data

All population data and almost all GDP per capita data are taken directly from the 2020 release of the Maddison Project Database.

For most Sub-Saharan African countries, GDP per capita estimates prior to 1950 were obtained by applying the growth rates estimated by Prados de la Escosura (2012) [2] to extend the Maddison estimates backwards (see the next section on extrapolation).


3) Imputation of missing data points

The inequality, GDP per capita and population datasets listed above do not provide complete coverage. To produce global poverty estimates for a set of benchmark years, missing values of these three variables first had to be interpolated or extrapolated for every country and benchmark year.

The process was as follows.

For GDP per capita and population data:

Where no observation was available for a given benchmark year, but both an earlier and a later observation were provided in the dataset, a data point was interpolated assuming a constant annual rate of growth between the available data points.
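The interpolation rule above can be illustrated with a minimal sketch. The function name and arguments are hypothetical and are not taken from the paper's replication code; the sketch only assumes that "constant annual rate of growth" means geometric interpolation between the two nearest observations.

```python
def interpolate_constant_growth(year0, value0, year1, value1, target_year):
    """Interpolate a value for target_year, assuming a constant annual
    growth rate between the observations (year0, value0) and (year1, value1).

    Hypothetical helper for illustration only.
    """
    if not (year0 <= target_year <= year1) or year0 == year1:
        raise ValueError("target_year must lie between two distinct observation years")
    if value0 <= 0 or value1 <= 0:
        raise ValueError("values must be positive to define a growth rate")

    # Annual growth rate implied by the two observations.
    annual_growth = (value1 / value0) ** (1.0 / (year1 - year0)) - 1.0

    # Compound that rate forward from the earlier observation.
    return value0 * (1.0 + annual_growth) ** (target_year - year0)


# Example: a series that quadruples over 20 years doubles after 10 years,
# so the interpolated value is approximately 2000.
midpoint = interpolate_constant_growth(1900, 1000.0, 1920, 4000.0, 1910)
```

Interpolating in this way is equivalent to linear interpolation of the logarithm of the variable, which keeps the implied growth path smooth between benchmark years.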

Where no observation prior to the benchmark year was available, a data point was extrapolated by applying an assumed growth rate. The growth rate applied was calculated as follows:

For the Gini coefficient, missing values were replaced with the average observed across either the bloc (former Yugoslavia or USSR countries) or the region (according to the Maddison region definitions) in the given benchmark year.

For the purposes of replication, the fully interpolated dataset is shown in the two charts below. Both charts show the same data points: GDP per capita along the horizontal axis, Gini coefficient along the vertical axis and population as bubble size. The colour indicates the source and treatment used to arrive at the GDP per capita data points and the Gini data points respectively.

Note that the objective of the interpolation was to provide a complete dataset of country-benchmark year observations that fall within plausible bounds, in order to derive global poverty estimates. To understand trends in particular countries, we refer you to the original data sources, listed above.
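The group-average rule for missing Gini coefficients described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function and variable names are hypothetical, and the sketch assumes a simple unweighted mean within each bloc or region, which the text does not specify.

```python
from statistics import mean


def impute_gini_by_group(gini_by_country, group_of_country):
    """Fill missing Gini values for one benchmark year with the mean of the
    observed values in the same group (bloc or region).

    gini_by_country: dict mapping country -> Gini coefficient, or None if missing.
    group_of_country: dict mapping country -> group label (hypothetical labels).
    Assumes an unweighted mean; the source does not state the weighting.
    """
    # Collect the observed values within each group.
    observed = {}
    for country, value in gini_by_country.items():
        if value is not None:
            observed.setdefault(group_of_country[country], []).append(value)

    group_mean = {group: mean(values) for group, values in observed.items()}

    # Keep observed values; replace missing ones with the group mean.
    # A country stays None if its group has no observation at all.
    return {
        country: value if value is not None else group_mean.get(group_of_country[country])
        for country, value in gini_by_country.items()
    }


# Illustrative, made-up inputs.
filled = impute_gini_by_group(
    {"A": 0.40, "B": 0.50, "C": None},
    {"A": "Region 1", "B": "Region 1", "C": "Region 1"},
)
```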

4) Deriving poverty estimates from a fitted parametric distribution

Poverty estimates for each country and benchmark year were derived by fitting a lognormal income distribution.

A lognormal distribution is defined by two parameters, μ and σ:

These are the expected value (or mean) and standard deviation of the variable's natural logarithm. These can be obtained from average incomes (GDP per capita) and the Gini coefficient as follows:

σ is obtained from the Gini coefficient using the following relationship (see, for instance, Jorda, Sarabia and Jäntti (2018) [3]):

$$G = 2\Phi\left(\frac{\sigma}{\sqrt{2}}\right) - 1$$

where G is the Gini coefficient, Φ is the cumulative distribution function of the standard normal distribution, and Φ⁻¹ is its inverse. Rearranging, we find:

$$\sigma = \sqrt{2}\,\Phi^{-1}\left(\frac{G + 1}{2}\right)$$

Assuming incomes, X, are distributed lognormally, the average income is given by:

$$\bar{X} = e^{\mu + \frac{1}{2}\sigma^{2}}$$

Rearranging we find:

$$\mu = \ln\bar{X} - \frac{1}{2}\sigma^{2}$$
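As a minimal sketch of this step, the two parameters can be recovered from a mean income and a Gini coefficient using only the Python standard library. The function name is hypothetical, and the code assumes the standard lognormal relationships σ = √2 Φ⁻¹((G + 1)/2) and μ = ln(mean) − σ²/2.

```python
from math import log, sqrt
from statistics import NormalDist


def lognormal_params(mean_income, gini):
    """Return (mu, sigma) of a lognormal distribution with the given
    mean income and Gini coefficient. Hypothetical helper for illustration.
    """
    if mean_income <= 0:
        raise ValueError("mean_income must be positive")
    if not 0 < gini < 1:
        raise ValueError("gini must lie strictly between 0 and 1")

    # sigma = sqrt(2) * inverse standard normal CDF evaluated at (G + 1) / 2
    sigma = sqrt(2.0) * NormalDist().inv_cdf((gini + 1.0) / 2.0)

    # mu = ln(mean) - sigma^2 / 2
    mu = log(mean_income) - 0.5 * sigma ** 2
    return mu, sigma


# Illustrative, made-up inputs: mean income of 1000 and a Gini of 0.4.
mu, sigma = lognormal_params(mean_income=1000.0, gini=0.4)
```

A quick consistency check is to plug σ back into G = 2Φ(σ/√2) − 1 and confirm that the original Gini coefficient is recovered.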

In our approach, the average income is given by GDP per capita.

Poverty rates are then calculated, for a given poverty line, p, using the cumulative lognormal distribution defined by these two parameters:

$$\text{Poverty}(p) = P(X < p) = \Phi\left(\frac{\ln p - \mu}{\sigma}\right)$$
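The poverty rate calculation can be sketched in the same way. This is an illustration rather than the replication code: the function name is hypothetical, and it assumes that the mean income and the poverty line are expressed in the same units (for example, a daily poverty line multiplied by 365 when GDP per capita is annual).

```python
from math import log, sqrt
from statistics import NormalDist


def lognormal_poverty_rate(mean_income, gini, poverty_line):
    """Share of the population with income below poverty_line, assuming
    incomes are lognormally distributed with the given mean and Gini.

    mean_income and poverty_line must be in the same units (assumption).
    """
    std_normal = NormalDist()

    # Lognormal parameters implied by the mean and the Gini coefficient.
    sigma = sqrt(2.0) * std_normal.inv_cdf((gini + 1.0) / 2.0)
    mu = log(mean_income) - 0.5 * sigma ** 2

    # Cumulative lognormal distribution evaluated at the poverty line.
    return std_normal.cdf((log(poverty_line) - mu) / sigma)


# Illustrative, made-up inputs: annual mean income of 2000 and a
# daily poverty line of 1.90 converted to an annual amount.
rate = lognormal_poverty_rate(mean_income=2000.0, gini=0.4, poverty_line=1.90 * 365)
```

Because the median of a lognormal distribution is e^μ, setting the poverty line equal to the median returns a rate of exactly one half, which is a convenient sanity check.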

This yields the poverty estimates for individual countries, shown in the chart. (You can change the country in the visualization or download the data for all countries). As discussed above, the data for many countries relies on extensive interpolation or extrapolation and should not be relied on to understand trends in particular countries without consulting the underlying data sources.

World and regional poverty rates are then calculated as the population-weighted average rates across countries.
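The aggregation step amounts to a population-weighted mean of country poverty rates. A minimal sketch, with a hypothetical function name and made-up inputs:

```python
def population_weighted_rate(rate_by_country, population_by_country):
    """Aggregate country poverty rates into a single rate, weighting each
    country by its population. Hypothetical helper for illustration.
    """
    total_population = sum(population_by_country[c] for c in rate_by_country)
    if total_population <= 0:
        raise ValueError("total population must be positive")

    # Number of people in poverty, summed across countries.
    total_poor = sum(
        rate_by_country[c] * population_by_country[c] for c in rate_by_country
    )
    return total_poor / total_population


# Two made-up countries: 50 of 100 people and 30 of 300 people in poverty,
# giving 80 of 400 people overall.
aggregate = population_weighted_rate({"A": 0.5, "B": 0.1}, {"A": 100, "B": 300})
```

Restricting the inputs to the countries of one region gives the regional rate; passing all countries gives the world rate.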

Endnotes

  1. Zanden, Jan Luiten van, Joerg Baten, Peter Foldvari, and Bas van Leeuwen. 2014. “The Changing Shape of Global Inequality 1820–2000; Exploring a New Dataset.” Review of Income and Wealth 60 (2): 279–97.

  2. Prados de la Escosura, Leandro. 2012. “Output per Head in Pre-Independence Africa: Quantitative Conjectures.” Economic History of Developing Regions 27 (2): 1–36.

    A working paper version is available online at Core Econ.

  3. Jorda, Sarabia, and Jäntti. 2018. “Estimation of Income Inequality from Grouped Data.” Available at arxiv.org.

Cite this work

Our articles and data visualizations rely on work from many different people and organizations. When citing this article, please also cite the underlying data sources. This article can be cited as:

Joe Hasell (2019) - “Data appendix – The fight against global poverty: 200 years of progress and still a very long way to go” Published online at OurWorldinData.org. Retrieved from: 'https://ourworldindata.org/history-of-poverty-data-appendix' [Online Resource]

BibTeX citation

@article{owid-history-of-poverty-data-appendix,
    author = {Joe Hasell},
    title = {Data appendix – The fight against global poverty: 200 years of progress and still a very long way to go},
    journal = {Our World in Data},
    year = {2019},
    note = {https://ourworldindata.org/history-of-poverty-data-appendix}
}

Reuse this work freely

All visualizations, data, and code produced by Our World in Data are completely open access under the Creative Commons BY license. You have the permission to use, distribute, and reproduce these in any medium, provided the source and authors are credited.

The data produced by third parties and made available by Our World in Data is subject to the license terms from the original third-party authors. We will always indicate the original source of the data in our documentation, so you should always check the license of any such third-party data before use and redistribution.

All of our charts can be embedded in any site.