WorldBankReader
Module for reading and processing World Bank economic data.
This module provides functionality to read and analyze various economic indicators from the World Bank database. It handles a wide range of economic data including GDP, tax rates, unemployment, inflation, and other key economic indicators.
Key Features
- GDP and population statistics
- Tax rates (VAT, export taxes)
- Labor market indicators (unemployment, participation rates)
- Inflation and price indices
- Government debt data
- Income inequality measures (Gini coefficients)
- Financial sector health indicators (NPL ratios)
Example
from pathlib import Path
from macro_data.configuration.countries import Country
# Initialize reader
reader = WorldBankReader(path=Path("path/to/world_bank_data"))
# Get GDP data for a country
gdp = reader.get_historic_gdp(country=Country.USA, year=2020)
# Get VAT rate
vat = reader.get_tau_vat(country=Country.GBR, year=2020)
Note
- Uses standardized World Bank data files
- Handles missing data through proxies and interpolation
- Supports data pruning for specific date ranges
- Includes forced values for certain tax rates where data is unavailable
WorldBankReader
Reader class for World Bank economic data.
This class provides methods to read and process various economic indicators from World Bank datasets. It handles data loading, scaling, and provides access to a wide range of economic statistics.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path | Path to directory containing World Bank data files | required |
Attributes:
| Name | Type | Description |
|---|---|---|
| data | dict[str, DataFrame] | Dictionary of loaded World Bank datasets |
| files_with_codes | dict[str, str] | Mapping of data categories to file names |
Key Methods
- GDP and Growth:
  - get_historic_gdp: Get historical GDP values
  - get_current_scaled_gdp: Get scaled current GDP
- Labor Market:
  - get_unemployment_rate: Get unemployment rates
  - get_participation_rate: Get labor force participation
- Taxes:
  - get_tau_vat: Get VAT rates
  - get_tau_exp: Get export tax rates
- Prices and Inflation:
  - get_log_inflation: Get log inflation rates
  - get_inflation: Get raw inflation data
- Other Indicators:
  - get_gini_coef: Get income inequality measures
  - get_central_gov_debt: Get government debt data
  - get_npl_ratios: Get non-performing loan ratios
files_with_codes = self.get_files_with_codes()
instance-attribute
data = {}
instance-attribute
__init__(path: Path)
Initialize the WorldBankReader with data path.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path | Path to directory containing World Bank data files | required |
Note
- Loads data files specified in files_with_codes
- Special handling for certain files that don't require row skipping
- Uses ISO-8859-1 encoding for file reading
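As a rough illustration of the loading step described in this note, the sketch below decodes a file as ISO-8859-1 and skips leading metadata rows before parsing. It uses only the standard library so that it runs on its own, whereas the reader itself stores DataFrames. The function name `read_indicator_rows`, the `skip_rows=4` default, and the sample content are assumptions made for illustration and are not taken from the reader's source.

```python
import csv


def read_indicator_rows(raw: bytes, skip_rows: int = 4) -> list[dict[str, str]]:
    """Decode raw bytes as ISO-8859-1, drop leading metadata rows, parse the rest as CSV."""
    text = raw.decode("ISO-8859-1")
    lines = text.splitlines()[skip_rows:]
    # csv.DictReader accepts any iterable of lines and uses the first one as the header.
    return list(csv.DictReader(lines))


# Made-up sample: four metadata/blank rows, then a header row and one data row.
sample = (
    '"Data Source","Example"\n'
    "\n"
    '"Last Updated Date","(made up)"\n'
    "\n"
    '"Country Name","Country Code","2019","2020"\n'
    '"Côte d\'Ivoire","CIV","1.0","2.0"\n'
).encode("ISO-8859-1")

rows = read_indicator_rows(sample)
print(rows[0]["Country Name"], rows[0]["2020"])
```

Files that do not need row skipping would correspond to calling the helper with `skip_rows=0`.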
get_files_with_codes() -> dict[str, str]
staticmethod
Get mapping of data categories to file names.
Returns:
| Type | Description |
|---|---|
| dict[str, str] | Dictionary mapping data categories to their file names, including unemployment, tax rates, GDP, and other indicators |
Note
File names follow World Bank API naming conventions:
- API_* files are direct World Bank indicators
- Other files are supplementary data sources
get_central_gov_debt(country: str, year: int) -> float
Get central government debt for a country and year.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | str | Country code (ISO 3-letter) | required |
| year | int | Year to get debt data for | required |
Returns:
| Type | Description |
|---|---|
| float | Central government debt value |
Note
- Returns 0.0 for Argentina and Taiwan
- Falls back to previous year's value if data not available
- Returns 0.0 for year 1959
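The fallback behaviour in this note can be sketched as a walk backwards through the years. This is a minimal sketch rather than the reader's implementation: the function name `debt_with_fallback`, the dictionary keyed by `(country, year)`, the country codes `ARG` and `TWN`, and the sample figures are all assumptions for illustration.

```python
from typing import Optional

# Assumed ISO 3-letter codes for Argentina and Taiwan.
ZERO_DEBT_COUNTRIES = {"ARG", "TWN"}
# Year at which the backwards search stops and 0.0 is returned.
FLOOR_YEAR = 1959


def debt_with_fallback(
    series: dict[tuple[str, int], Optional[float]], country: str, year: int
) -> float:
    """Return the debt value for `year`, falling back to earlier years when missing."""
    if country in ZERO_DEBT_COUNTRIES:
        return 0.0
    while year > FLOOR_YEAR:
        value = series.get((country, year))
        if value is not None:
            return value
        year -= 1  # no data for this year: try the previous one
    return 0.0


# Made-up sample values: 2019 is missing, so it falls back to 2018.
sample = {("FRA", 2018): 55.0, ("FRA", 2020): 60.0}
print(debt_with_fallback(sample, "FRA", 2019))
```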
get_population(country: Country, year: int) -> float
Get total population for a country and year.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | Country | Country to get population for | required |
| year | int | Year to get population data for | required |
Returns:
| Type | Description |
|---|---|
| float | Total population count |
Note
Uses World Bank's total population indicator (SP.POP.TOTL)
get_participation_rate(country: Country) -> pd.DataFrame
Retrieves the labor force participation rate for a specific country.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | Country | The country code for the desired country. | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | A DataFrame containing the participation rate for the specified country. |
get_tau_vat(country: Country, year: int) -> float
Get VAT (Value Added Tax) rate for a country and year.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | Country | Country to get VAT rate for | required |
| year | int | Year to get tax rate for | required |
Returns:
| Type | Description |
|---|---|
| float | VAT rate as decimal |
Note
- Uses forced_vat values for countries with missing or unreliable data
- Tax rate is expressed as a decimal (e.g., 0.20 for 20% VAT)
- Returns 0.0 if data not available and country not in forced_vat
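The lookup order implied by this note can be sketched as follows. The note does not say whether forced values override observed data or only fill gaps; this sketch assumes they take precedence. The name `tau_vat_lookup`, the placeholder country codes, and the rates are hypothetical.

```python
from typing import Optional

# Hypothetical forced rates, already expressed as decimals.
FORCED_VAT = {"AAA": 0.15}


def tau_vat_lookup(
    observed: dict[tuple[str, int], Optional[float]],
    country: str,
    year: int,
    forced_vat: dict[str, float] = FORCED_VAT,
) -> float:
    """Return a VAT rate as a decimal: forced value first, then observed data, else 0.0."""
    if country in forced_vat:
        return forced_vat[country]  # assumption: forced values take precedence
    value = observed.get((country, year))
    return 0.0 if value is None else value


# Made-up observations, stored as decimals.
observed = {("BBB", 2020): 0.20, ("AAA", 2020): 0.99}
print(tau_vat_lookup(observed, "BBB", 2020))
```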
get_lcu_exports(country: Country, year: int) -> float
Retrieves exports in local currency units (LCU) for a specific country and year.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | Country | The country code for the desired country. | required |
| year | int | The year for the data. | required |
Returns:
| Type | Description |
|---|---|
| float | Exports in local currency units for the specified country and year. |
get_gini_coef(country: Country, year: int) -> float
Retrieves the Gini coefficient for a specific country and year.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | Country | The country code for the desired country. | required |
| year | int | The year for the data. | required |
Returns:
| Type | Description |
|---|---|
| float | The Gini coefficient for the specified country and year. |
get_historic_gdp(country: Country, year: int) -> float
Get historical GDP value for a country and year.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | Country | Country to get GDP for | required |
| year | int | Year to get GDP data for | required |
Returns:
| Type | Description |
|---|---|
| float | GDP value in local currency units (LCU) |
Note
- Uses World Bank's GDP indicator (NY.GDP.MKTP.CN)
- Values are in current local currency units
- Returns raw value without scaling
get_current_scaled_gdp(country: Country, year: int, rescale_factor: float = 4.0) -> float
Get scaled current GDP value for a country and year.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | Country | Country to get GDP for | required |
| year | int | Year to get GDP data for | required |
| rescale_factor | float | Factor to divide annual GDP by. Defaults to 4.0 for quarterly data. | 4.0 |
Returns:
| Type | Description |
|---|---|
| float | Scaled GDP value in local currency units (LCU) |
Note
- Uses historic GDP values divided by rescale_factor
- Typically used to convert annual to quarterly values
- Values are in current local currency units
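The scaling described in this note amounts to a single division. A minimal sketch, with the hypothetical helper name `scale_gdp` and a made-up GDP figure:

```python
def scale_gdp(annual_gdp_lcu: float, rescale_factor: float = 4.0) -> float:
    """Divide an annual GDP figure by `rescale_factor` (4.0 gives a per-quarter value)."""
    if rescale_factor == 0:
        raise ValueError("rescale_factor must be non-zero")
    return annual_gdp_lcu / rescale_factor


# Made-up annual GDP of 20,000 LCU becomes 5,000 LCU per quarter.
print(scale_gdp(20_000.0))
```

Passing `rescale_factor=1.0` leaves the annual value unchanged, which would match the unscaled value from get_historic_gdp.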
get_log_inflation(country: Country, start_year: int = 1970, end_year: int = 2024) -> pd.DataFrame
Retrieves the log inflation data for a specific country within a given time range.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | Country | The country code for the desired country. | required |
| start_year | int | The starting year for the data. | 1970 |
| end_year | int | The ending year for the data. | 2024 |
Returns:
| Type | Description |
|---|---|
| DataFrame | A DataFrame containing the log growth of inflation for the specified country and time range. |
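The source does not spell out how log inflation is computed. One common convention, assumed here, is to take ln(1 + rate) of an annual inflation rate given in percent. The sketch below applies that convention to a plain dictionary and treats the year range as inclusive at both ends. The function name, the inclusive bounds, and the sample rates are assumptions.

```python
import math


def log_inflation_sketch(
    annual_pct: dict[int, float], start_year: int = 1970, end_year: int = 2024
) -> dict[int, float]:
    """Map percent inflation rates to ln(1 + rate / 100) for years in [start_year, end_year]."""
    return {
        year: math.log1p(rate / 100.0)
        for year, rate in sorted(annual_pct.items())
        if start_year <= year <= end_year
    }


# Made-up rates; 1969 and 2025 fall outside the default range and are dropped.
result = log_inflation_sketch({1969: 5.0, 1970: 10.0, 2024: 0.0, 2025: 3.0})
print(result)
```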
get_unemployment_rate(country: str) -> pd.DataFrame
Get time series of unemployment rates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | str | Country to get unemployment rates for | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with dates as index and unemployment rates as values (in decimal form) |
Note
- Returns quarterly data
- Uses World Bank's total unemployment indicator (SL.UEM.TOTL.ZS)
- Forward fills missing values
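The three points in this note (quarterly output, decimal form, forward filling) can be sketched together. This is an illustration only: it assumes annual input values in percent and quarter-start dates as the index, and it uses a plain dictionary where the reader returns a DataFrame. The helper name and sample values are hypothetical.

```python
from datetime import date
from typing import Optional


def annual_pct_to_quarterly(annual_pct: dict[int, Optional[float]]) -> dict[date, float]:
    """Expand annual percent values to quarterly decimals, forward filling gaps."""
    quarterly: dict[date, float] = {}
    last: Optional[float] = None
    for year in range(min(annual_pct), max(annual_pct) + 1):
        value = annual_pct.get(year)
        if value is not None:
            last = value / 100.0  # percent -> decimal
        if last is None:
            continue  # nothing to carry forward yet
        for month in (1, 4, 7, 10):  # assumed quarter-start dates
            quarterly[date(year, month, 1)] = last
    return quarterly


# Made-up values: 2020 is missing and inherits the 2019 rate.
quarterly = annual_pct_to_quarterly({2019: 5.0, 2020: None, 2021: 6.0})
print(quarterly[date(2020, 7, 1)])
```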
get_inflation(country: str) -> pd.DataFrame
get_tau_exp(country: str, year: int, default_value: float = 0.0) -> float
prune(prune_date: date) -> None
Prunes the data based on a given prune date.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| prune_date | date | The date to prune the data. Can be an integer, string, or pandas Timestamp. | required |
Returns:
| Type | Description |
|---|---|
| None | None |
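The description does not say which side of the prune date is kept, nor how integers and strings are interpreted. The sketch below makes both choices explicit as assumptions: a bare integer is read as a year, a string as an ISO date, and the `keep_from` flag selects whether observations on or after the cutoff are kept. All names here are hypothetical, and the sketch works on a plain dictionary rather than the reader's DataFrames.

```python
from datetime import date, datetime
from typing import Union


def to_date(value: Union[int, str, date]) -> date:
    """Normalize an integer year, ISO date string, or date-like object to a date."""
    if isinstance(value, datetime):  # also covers pandas.Timestamp, a datetime subclass
        return value.date()
    if isinstance(value, date):
        return value
    if isinstance(value, int):
        return date(value, 1, 1)  # assumption: a bare integer is a year
    return date.fromisoformat(value)  # assumption: strings are "YYYY-MM-DD"


def prune_series(
    series: dict[date, float], prune_date: Union[int, str, date], keep_from: bool = True
) -> dict[date, float]:
    """Keep observations on or after the cutoff, or strictly before it if keep_from is False."""
    cutoff = to_date(prune_date)
    if keep_from:
        return {d: v for d, v in series.items() if d >= cutoff}
    return {d: v for d, v in series.items() if d < cutoff}


# Made-up yearly observations.
series = {date(2019, 1, 1): 1.0, date(2020, 1, 1): 2.0, date(2021, 1, 1): 3.0}
print(sorted(prune_series(series, 2020)))
```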