WorldBankReader

Module for reading and processing World Bank economic data.

This module provides functionality to read and analyze various economic indicators from the World Bank database. It handles a wide range of economic data including GDP, tax rates, unemployment, inflation, and other key economic indicators.

Key Features
  • GDP and population statistics
  • Tax rates (VAT, export taxes)
  • Labor market indicators (unemployment, participation rates)
  • Inflation and price indices
  • Government debt data
  • Income inequality measures (Gini coefficients)
  • Financial sector health indicators (NPL ratios)
Example
from pathlib import Path
from macro_data.configuration.countries import Country

# Initialize reader (assumes WorldBankReader is already imported from this module)
reader = WorldBankReader(path=Path("path/to/world_bank_data"))

# Get GDP data for a country
gdp = reader.get_historic_gdp(country=Country.USA, year=2020)

# Get VAT rate
vat = reader.get_tau_vat(country=Country.GBR, year=2020)
Note
  • Uses standardized World Bank data files
  • Handles missing data through proxies and interpolation
  • Supports data pruning for specific date ranges
  • Includes forced values for certain tax rates where data is unavailable

WorldBankReader

Reader class for World Bank economic data.

This class provides methods to read and process various economic indicators from World Bank datasets. It handles data loading, scaling, and provides access to a wide range of economic statistics.

Parameters:

Name Type Description Default
path Path

Path to directory containing World Bank data files

required

Attributes:

Name Type Description
data dict[str, DataFrame]

Dictionary of loaded World Bank datasets

files_with_codes dict[str, str]

Mapping of data categories to file names

Key Methods
  • GDP and Growth:
    • get_historic_gdp: Get historical GDP values
    • get_current_scaled_gdp: Get scaled current GDP
  • Labor Market:
    • get_unemployment_rate: Get unemployment rates
    • get_participation_rate: Get labor force participation
  • Taxes:
    • get_tau_vat: Get VAT rates
    • get_tau_exp: Get export tax rates
  • Prices and Inflation:
    • get_log_inflation: Get log inflation rates
    • get_inflation: Get raw inflation data
  • Other Indicators:
    • get_gini_coef: Get income inequality measures
    • get_central_gov_debt: Get government debt data
    • get_npl_ratios: Get non-performing loan ratios
files_with_codes = self.get_files_with_codes() instance-attribute
data = {} instance-attribute
__init__(path: Path)

Initialize the WorldBankReader with data path.

Parameters:

Name Type Description Default
path Path

Path to directory containing World Bank data files

required
Note
  • Loads data files specified in files_with_codes
  • Special handling for certain files that don't require row skipping
  • Uses ISO-8859-1 encoding for file reading
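
As a rough illustration of the loading notes above, the sketch below reads a World Bank style CSV with pandas. The helper name `load_indicator_csv`, the `skip_preamble` flag, and the sample rows are hypothetical. The four preamble lines reflect the usual layout of World Bank `API_*` CSV exports and are an assumption here, not confirmed behavior of this class.

```python
import io

import pandas as pd

# Made-up stand-in for a World Bank "API_*" CSV export: four preamble
# lines precede the real header row. All figures are illustrative.
SAMPLE = (
    '"Data Source","World Development Indicators",\n'
    "\n"
    '"Last Updated Date","2024-01-01",\n'
    "\n"
    '"Country Name","Country Code","Indicator Name","Indicator Code","2019","2020",\n'
    '"Côte d\'Ivoire","CIV","Population, total","SP.POP.TOTL","100","110",\n'
).encode("ISO-8859-1")


def load_indicator_csv(source, skip_preamble: bool = True) -> pd.DataFrame:
    """Read one data file (hypothetical helper, not the reader's actual code)."""
    # Assumption: files that do not require row skipping start directly
    # with the header row, so nothing is skipped for them.
    return pd.read_csv(
        source,
        skiprows=4 if skip_preamble else 0,
        encoding="ISO-8859-1",
    )


df = load_indicator_csv(io.BytesIO(SAMPLE))
print(df[["Country Name", "Country Code", "2020"]])
```

Decoding as ISO-8859-1 keeps accented country names intact when the file is not UTF-8 encoded.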
get_files_with_codes() -> dict[str, str] staticmethod

Get mapping of data categories to file names.

Returns:

Type Description
dict[str, str]

dict[str, str]: Dictionary mapping data categories to their file names, including unemployment, tax rates, GDP, and other indicators

Note

File names follow World Bank API naming conventions:
  • API_* files are direct World Bank indicators
  • Other files are supplementary data sources
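
The naming convention can be checked with a simple prefix test. This is a minimal sketch: the helper `is_direct_indicator` and both entries of `example_files` are hypothetical, and the real mapping returned by `get_files_with_codes` may look different.

```python
def is_direct_indicator(file_name: str) -> bool:
    """True when a file name carries the API_ prefix of direct indicators."""
    return file_name.startswith("API_")


# Hypothetical category-to-file mapping; the real entries may differ.
example_files = {
    "population": "API_SP.POP.TOTL_DS2_en_csv_v2",
    "vat_rates": "supplementary_vat_rates",
}

# Split the categories by the kind of file that backs them.
direct = {name for name, file in example_files.items() if is_direct_indicator(file)}
supplementary = set(example_files) - direct
print(direct, supplementary)
```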

get_central_gov_debt(country: str, year: int) -> float

Get central government debt for a country and year.

Parameters:

Name Type Description Default
country str

Country code (ISO 3-letter)

required
year int

Year to get debt data for

required

Returns:

Name Type Description
float float

Central government debt value

Note
  • Returns 0.0 for Argentina and Taiwan
  • Falls back to previous year's value if data not available
  • Returns 0.0 for year 1959
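
The fallback rules in the note can be sketched as a small lookup. This is an illustration under stated assumptions, not the method's actual code: `lookup_debt`, the `series` mapping, and the use of `None` for missing observations are hypothetical, and treating every year at or before 1959 as the floor is an assumption that keeps the loop finite.

```python
from typing import Mapping, Optional

ZERO_DEBT_COUNTRIES = frozenset({"ARG", "TWN"})  # Argentina and Taiwan
EARLIEST_YEAR = 1959  # the search stops here and yields 0.0


def lookup_debt(
    series: Mapping[int, Optional[float]], country: str, year: int
) -> float:
    """Return the debt value for `year`, walking back over missing years."""
    if country in ZERO_DEBT_COUNTRIES:
        return 0.0
    while year > EARLIEST_YEAR:
        value = series.get(year)
        if value is not None:
            return value
        year -= 1  # fall back to the previous year's value
    return 0.0


# Illustrative numbers only: 2019 and 2020 are missing, so 2018 is used.
print(lookup_debt({2018: 50.0, 2019: None}, "USA", 2020))
```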
get_population(country: Country, year: int) -> float

Get total population for a country and year.

Parameters:

Name Type Description Default
country Country

Country to get population for

required
year int

Year to get population data for

required

Returns:

Name Type Description
float float

Total population count

Note

Uses World Bank's total population indicator (SP.POP.TOTL)

get_participation_rate(country: Country) -> pd.DataFrame

Retrieves the labor force participation rate for a specific country.

Parameters:

Name Type Description Default
country Country

The country code for the desired country.

required

Returns:

Type Description
DataFrame

pd.DataFrame: A DataFrame containing the participation rate for the specified country.

get_tau_vat(country: Country, year: int) -> float

Get VAT (Value Added Tax) rate for a country and year.

Parameters:

Name Type Description Default
country Country

Country to get VAT rate for

required
year int

Year to get tax rate for

required

Returns:

Name Type Description
float float

VAT rate as decimal

Note
  • Uses forced_vat values for countries with missing or unreliable data
  • Tax rate is expressed as a decimal (e.g., 0.20 for 20% VAT)
  • Returns 0.0 if data not available and country not in forced_vat
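
The lookup order described in the note might look like the sketch below. The function `vat_rate`, the placeholder country codes, and all rates are hypothetical. It also assumes that a forced value takes precedence over observed data and that observed rates are already stored as decimals.

```python
from typing import Mapping, Tuple

# Hypothetical overrides keyed by placeholder country codes; the real
# forced_vat contents are not shown in this reference.
forced_vat = {"AAA": 0.10}

# Hypothetical observed rates, assumed to be stored as decimals already.
observed_vat = {("BBB", 2020): 0.20}


def vat_rate(
    country: str,
    year: int,
    observed: Mapping[Tuple[str, int], float],
    forced: Mapping[str, float],
) -> float:
    """Resolve a VAT rate: forced value, then observed data, then 0.0."""
    # Assumption: a forced value takes precedence over observed data.
    if country in forced:
        return forced[country]
    return observed.get((country, year), 0.0)


print(vat_rate("BBB", 2020, observed_vat, forced_vat))
```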
get_lcu_exports(country: Country, year: int) -> float

Retrieves exports in local currency units (LCU) for a specific country and year.

Parameters:

Name Type Description Default
country Country

The country code for the desired country.

required
year int

The year for the data.

required

Returns:

Name Type Description
float float

Exports in local currency units (LCU) for the specified country and year.

get_gini_coef(country: Country, year: int) -> float

Retrieves the Gini coefficient for a specific country and year.

Parameters:

Name Type Description Default
country Country

The country code for the desired country.

required
year int

The year for the data.

required

Returns:

Name Type Description
float float

The Gini coefficient for the specified country and year.

get_historic_gdp(country: Country, year: int) -> float

Get historical GDP value for a country and year.

Parameters:

Name Type Description Default
country Country

Country to get GDP for

required
year int

Year to get GDP data for

required

Returns:

Name Type Description
float float

GDP value in local currency units (LCU)

Note
  • Uses World Bank's GDP indicator (NY.GDP.MKTP.CN)
  • Values are in current local currency units
  • Returns raw value without scaling
get_current_scaled_gdp(country: Country, year: int, rescale_factor: float = 4.0) -> float

Get scaled current GDP value for a country and year.

Parameters:

Name Type Description Default
country Country

Country to get GDP for

required
year int

Year to get GDP data for

required
rescale_factor float

Divisor applied to the annual GDP value. Defaults to 4.0, which converts annual values to quarterly ones.

4.0

Returns:

Name Type Description
float float

Scaled GDP value in local currency units (LCU)

Note
  • Uses historic GDP values divided by rescale_factor
  • Typically used to convert annual to quarterly values
  • Values are in current local currency units
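
The scaling itself is a single division. The sketch below uses a hypothetical helper, `scale_annual_gdp`, and a made-up GDP figure to show the arithmetic described in the note.

```python
def scale_annual_gdp(annual_gdp: float, rescale_factor: float = 4.0) -> float:
    """Divide an annual GDP figure by rescale_factor (4.0 gives a quarterly value)."""
    return annual_gdp / rescale_factor


# Illustrative figure in local currency units, not real data.
quarterly_gdp = scale_annual_gdp(2_000.0)
print(quarterly_gdp)
```

Passing `rescale_factor=1.0` leaves the annual value unchanged.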
get_log_inflation(country: Country, start_year: int = 1970, end_year: int = 2024) -> pd.DataFrame

Retrieves the log inflation data for a specific country within a given time range.

Parameters:

Name Type Description Default
country Country

The country code for the desired country.

required
start_year int

The starting year for the data (default: 1970).

1970
end_year int

The ending year for the data (default: 2024).

2024

Returns:

Type Description
DataFrame

pd.DataFrame: A DataFrame containing log inflation for the specified country and time range.
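
The exact transformation is not spelled out here. One common reading of "log inflation" is log(1 + π), with π the inflation rate as a decimal. The sketch below follows that reading; the helper `log_inflation`, the percent-valued input, the inclusive year bounds, and the sample rates are all assumptions rather than the method's confirmed behavior.

```python
import math


def log_inflation(
    percent_rates: dict[int, float], start_year: int = 1970, end_year: int = 2024
) -> dict[int, float]:
    """Map each year in [start_year, end_year] to log(1 + rate / 100)."""
    return {
        year: math.log1p(rate / 100.0)
        for year, rate in sorted(percent_rates.items())
        if start_year <= year <= end_year
    }


# Made-up annual inflation rates in percent.
rates = {1960: 1.0, 2019: 1.5, 2020: 2.0}
result = log_inflation(rates)
print(result)
```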

get_unemployment_rate(country: str) -> pd.DataFrame

Get time series of unemployment rates.

Parameters:

Name Type Description Default
country str

Country to get unemployment rates for

required

Returns:

Type Description
DataFrame

pd.DataFrame: DataFrame with dates as index and unemployment rates as values (in decimal form)

Note
  • Returns quarterly data
  • Uses World Bank's total unemployment indicator (SL.UEM.TOTL.ZS)
  • Forward fills missing values
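
The three notes above (quarterly frequency, forward filling, decimal form) can be combined as in the following pandas sketch. It assumes the source series is annual and expressed in percent; the helper `to_quarterly_decimal`, the column name, and the sample values are hypothetical.

```python
import pandas as pd

# Made-up annual unemployment rates in percent; 2019 is missing.
annual_percent = pd.Series(
    [5.0, None, 4.0],
    index=pd.to_datetime(["2018-01-01", "2019-01-01", "2020-01-01"]),
)


def to_quarterly_decimal(series: pd.Series) -> pd.DataFrame:
    """Spread annual percent values over quarters and convert to decimals."""
    quarters = pd.date_range(series.index.min(), series.index.max(), freq="QS")
    # Reindexing leaves gaps between observations; ffill carries the last
    # known value forward, which also covers the missing year.
    quarterly = series.reindex(quarters).ffill() / 100.0
    return quarterly.to_frame(name="unemployment_rate")


unemployment = to_quarterly_decimal(annual_percent)
print(unemployment)
```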
get_inflation(country: str) -> pd.DataFrame
get_tau_exp(country: str, year: int, default_value: float = 0.0) -> float
prune(prune_date: date) -> None

Prunes the data based on a given prune date.

Parameters:

Name Type Description Default
prune_date date

The cutoff date used to prune the data. Can be an integer, string, or pandas Timestamp.

required

Returns:

Type Description
None

None

get_npl_ratios(country: Country | str) -> pd.DataFrame