OECDEconData
Module for reading and processing OECD economic data.
This module provides comprehensive functionality to read and analyze various economic indicators from the OECD (Organisation for Economic Co-operation and Development). It handles a wide range of data including employment statistics, business demographics, tax rates, banking statistics, and macroeconomic indicators.
Key Features
- Employment statistics by industry
- Business demographics and firm size distributions
- Tax rates (corporate, personal, social insurance)
- Banking sector statistics
- Government debt and benefits data
- Interest rates and inflation
- Housing market indicators
- Unemployment and vacancy rates
- Consumption statistics by income quantile
Example
```python
from pathlib import Path

from macro_data.configuration.countries import Country

# OECDEconData must also be imported from this module (import path not shown here)

# Initialize reader with scaling factors for each country
scale_dict = {Country("USA"): 1000, Country("GBR"): 100}
reader = OECDEconData(
    path=Path("path/to/oecd_data"),
    scale_dict=scale_dict,
)

# Get employment by industry for a country and year
usa_employment = reader.employees_by_industry(2020, Country("USA"))
```
Note
- Uses standardized OECD data files
- Handles missing data through proxies and interpolation
- Supports data pruning for specific date ranges
- Includes forced values for certain tax rates where OECD data is unavailable
OECDEconData
Reader class for OECD economic data.
This class provides methods to read and process various economic indicators from OECD datasets. It handles data scaling, industry mapping, and provides access to a wide range of economic statistics.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path \| str | Path to directory containing OECD data files | required |
| scale_dict | dict[Country, int] | Dictionary mapping countries to their scaling factors for employment numbers | required |
Attributes:
| Name | Type | Description |
|---|---|---|
| scale_dict | dict[Country, int] | Country-specific scaling factors |
| industry_mapping | dict | Mapping between OECD and internal industry codes |
| sector_mapping | dict | Mapping for input-output table sectors |
| data | dict[str, DataFrame] | Dictionary of loaded OECD datasets |
| default_industries | list[str] | List of standard industry codes |
scale_dict = scale_dict (instance attribute)
industry_mapping = INDUSTRY_MAPPING (instance attribute)
sector_mapping = ICIO_AGGREGATE (instance attribute)
files_with_codes = self.get_files_with_codes() (instance attribute)
data = {key: pd.read_csv(path / (self.files_with_codes[key] + '.csv')) for key in self.files_with_codes} (instance attribute)
default_industries = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R_S', 'T'] (instance attribute)
__init__(path: Path | str, scale_dict: dict[Country, int])
Initialize the OECDEconData reader with data path and scaling factors.
get_files_with_codes() -> dict[str, str]
staticmethod
Get mapping of data categories to file names.
Returns:
| Type | Description |
|---|---|
| dict[str, str] | Dictionary mapping data categories to their file names, including employment, tax, banking, and other economic data |
employees_by_industry(year: int, country: Country) -> pd.Series
Get number of employees by industry for a specific country and year.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| year | int | Year to get employment data for | required |
| country | Country | Country to get employment data for | required |
Returns:
| Type | Description |
|---|---|
| pd.Series | Number of employees for each industry, scaled by the country-specific factor |
Note
- Uses default industry classification (A through T)
- Handles missing data by using median values
- Returns scaled values based on scale_dict
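The median fill and scaling described in the Note can be sketched as follows. This is an illustration, not the method's code: the input is assumed to be a Series of raw counts indexed by industry code, and the direction of scaling (dividing by the factor, as when one model agent stands for `scale` people) is an assumption.

```python
import numpy as np
import pandas as pd

DEFAULT_INDUSTRIES = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J",
                      "K", "L", "M", "N", "O", "P", "Q", "R_S", "T"]


def scaled_employees(raw: pd.Series, scale: int) -> pd.Series:
    """Align raw counts to the default industries, fill gaps with the median, then scale."""
    aligned = raw.reindex(DEFAULT_INDUSTRIES)  # industries absent from the data become NaN
    filled = aligned.fillna(aligned.median())  # median() skips NaN by default
    # ASSUMPTION: the country factor divides raw counts rather than multiplying them
    return filled / scale


raw = pd.Series({"A": 1000.0, "C": 3000.0, "G": np.nan})
print(scaled_employees(raw, scale=100).loc[["A", "B", "C"]].tolist())  # [10.0, 20.0, 30.0]
```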
read_business_demography(country: Country | Region, output: pd.Series, year: int) -> np.ndarray
Read business demography data for active employer enterprises.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | Country \| Region | Country to get data for | required |
| output | pd.Series | Total output of each country by industry | required |
| year | int | Year to get data for | required |
Returns:
| Type | Description |
|---|---|
| np.ndarray | Number of active employer enterprises by industry |
Note
- Special handling for GBR (2014-2018 only)
- Special handling for DEU (2012 onwards)
- Special handling for AUS (2010-2014)
zeta_dist(x: np.ndarray, a: float) -> np.ndarray
staticmethod
Calculate normalized Zeta distribution values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | np.ndarray | Input values (firm sizes) | required |
| a | float | Shape parameter of the Zeta distribution | required |
Returns:
| Type | Description |
|---|---|
| np.ndarray | Normalized probability values from the Zeta distribution |
Note
Uses the Riemann zeta function minus 1 (zetac) for calculations
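Assuming the standard normalization, the Zeta probability mass function is p(x) = x^(-a) / ζ(a) for integers x ≥ 1 and a > 1. Since `scipy.special.zetac(a)` returns ζ(a) − 1, the normalizing constant is `zetac(a) + 1`. The sketch below is an illustration of that formula, not the module's exact code.

```python
import numpy as np
from scipy.special import zetac


def zeta_dist(x: np.ndarray, a: float) -> np.ndarray:
    """Zeta pmf p(x) = x**(-a) / zeta(a) for integer sizes x >= 1 and a > 1."""
    x = np.asarray(x, dtype=float)
    # zetac(a) is zeta(a) - 1, so add 1 back to get the normalizing constant
    return x ** (-a) / (zetac(a) + 1.0)


p = zeta_dist(np.arange(1, 4), a=2.0)
print(p[0], 6 / np.pi**2)  # both ~0.6079, since zeta(2) = pi**2 / 6
```

Summed over all x ≥ 1 these values equal 1, which is what "normalized" means here.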
find_sector_code(code: str) -> int | None
Find internal sector code from OECD industry code.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| code | str | OECD industry code | required |
Returns:
| Type | Description |
|---|---|
| int \| None | Internal sector code, or None if the code has no match |
Note
Uses industry_mapping dictionary for conversion
read_firm_size_zetas(country: str | Region, year: int) -> dict[int, np.ndarray] | None
Calculate Zeta distribution parameters for firm sizes by sector.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | str \| Region | Country to get firm size distribution for | required |
| year | int | Year to get firm size distribution for | required |
Returns:
| Type | Description |
|---|---|
| dict[int, np.ndarray] \| None | Dictionary mapping sector indices to their Zeta distribution parameters, or None if data are not available |
Note
- Fits Zeta distributions to empirical firm size distributions
- Returns None if there is insufficient data for fitting
- Handles different size classes and industry codes
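The fitting procedure itself is not documented here. As one standard technique, the sketch below estimates the Zeta exponent by maximum likelihood from firm counts per size class, treating each class as a single representative size (a simplification). The helper name, the representative sizes, and the search bounds are all assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import zeta


def fit_zeta_exponent(sizes: np.ndarray, counts: np.ndarray) -> float:
    """Maximum-likelihood zeta exponent for firms observed at representative `sizes`."""
    sizes = np.asarray(sizes, dtype=float)
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    weighted_log_size = np.sum(counts * np.log(sizes))

    def neg_log_likelihood(a: float) -> float:
        # log p(x) = -a * log(x) - log(zeta(a)), summed over all firms
        return a * weighted_log_size + n * np.log(zeta(a))

    # Bounded search keeps a > 1, where zeta(a) is finite
    result = minimize_scalar(neg_log_likelihood, bounds=(1.01, 5.0), method="bounded")
    return float(result.x)


# Hypothetical size classes, each represented by one firm size (1, 5 and 20 employees)
sizes = np.array([1, 5, 20])
print(fit_zeta_exponent(sizes, np.array([1000, 10, 1])))    # steep: most firms at size 1
print(fit_zeta_exponent(sizes, np.array([100, 100, 100])))  # heavier tail, smaller exponent
```

A larger exponent concentrates mass on small firms; a heavier tail of large firms pulls the estimate toward 1.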
find_closest_year(df: pd.DataFrame, year: int) -> int
staticmethod
Find the closest available year in DataFrame to target year.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | pd.DataFrame | DataFrame containing time series data | required |
| year | int | Target year | required |
Returns:
| Type | Description |
|---|---|
| int | Closest available year to target year |
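A minimal sketch of the closest-year lookup, assuming the years live in a `TIME_PERIOD` column (a hypothetical column name). The tie-break toward the earlier year is this sketch's choice; the documented method does not say how ties are resolved.

```python
import pandas as pd


def find_closest_year(df: pd.DataFrame, year: int, column: str = "TIME_PERIOD") -> int:
    """Return the year in `column` nearest to `year`; ties go to the earlier year."""
    years = sorted(int(y) for y in df[column].dropna().unique())
    # min() keeps the first of equally distant candidates, i.e. the earlier year
    return min(years, key=lambda y: abs(y - year))


df = pd.DataFrame({"TIME_PERIOD": [2010, 2012, 2015, 2015], "OBS_VALUE": [1.0, 2.0, 3.0, 4.0]})
print(find_closest_year(df, 2014))  # 2015
print(find_closest_year(df, 2011))  # 2010 (tie with 2012)
```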
read_tau_sif(country: Country | str | Region, year: int) -> float
Get social insurance tax rate (firm contribution) for a country and year.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | Country \| str \| Region | Country to get tax rate for | required |
| year | int | Year to get tax rate for | required |
Returns:
| Type | Description |
|---|---|
| float | Employer social insurance contribution rate as a decimal |
Note
Uses force_tau_sif values for countries with missing or unreliable data
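The forced-value override can be sketched as a dictionary checked before the data lookup. Everything named here is hypothetical: the contents of `FORCE_TAU_SIF`, the `REF_AREA`/`TIME_PERIOD`/`OBS_VALUE` columns, and the assumption that the source table stores percentages. The same pattern would apply to the worker, corporate, and income tax readers below.

```python
import pandas as pd

# Hypothetical overrides; the real force_tau_sif values are defined in the module
FORCE_TAU_SIF: dict[str, float] = {"ZZZ": 0.15}


def tau_sif(rates: pd.DataFrame, country: str, year: int) -> float:
    """Employer social-insurance rate as a decimal, preferring a forced value if one exists."""
    if country in FORCE_TAU_SIF:
        return FORCE_TAU_SIF[country]
    row = rates[(rates["REF_AREA"] == country) & (rates["TIME_PERIOD"] == year)]
    # ASSUMPTION: the source table reports percentages, so divide by 100
    return float(row["OBS_VALUE"].iloc[0]) / 100.0


rates = pd.DataFrame({"REF_AREA": ["AAA", "AAA"], "TIME_PERIOD": [2019, 2020], "OBS_VALUE": [12.0, 10.0]})
print(tau_sif(rates, "AAA", 2020))  # 0.1
print(tau_sif(rates, "ZZZ", 2020))  # 0.15 (forced)
```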
read_tau_siw(country: Country | str | Region, year: int) -> float
Get social insurance tax rate (worker contribution) for a country and year.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | Country \| str \| Region | Country to get tax rate for | required |
| year | int | Year to get tax rate for | required |
Returns:
| Type | Description |
|---|---|
| float | Employee social insurance contribution rate as a decimal |
Note
Uses force_tau_siw values for countries with missing or unreliable data
read_tau_firm(country: Country | str | Region, year: int) -> float
Get corporate tax rate for a country and year.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | Country \| str \| Region | Country to get tax rate for | required |
| year | int | Year to get tax rate for | required |
Returns:
| Type | Description |
|---|---|
| float | Corporate tax rate as a decimal |
Note
Uses force_tau_firm values for countries with missing or unreliable data
read_tau_income(country: Country | Region, year: int) -> float
Get personal income tax rate for a country and year.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | Country \| Region | Country to get tax rate for | required |
| year | int | Year to get tax rate for | required |
Returns:
| Type | Description |
|---|---|
| float | Personal income tax rate as a decimal |
Note
Uses force_tau_income values for countries with missing or unreliable data
read_short_term_interest_rates(country: Country | Region, year: int) -> float
Get short-term interest rates for a country and year.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | Country \| Region | Country to get interest rates for | required |
| year | int | Year to get interest rates for | required |
Returns:
| Type | Description |
|---|---|
| float | Short-term interest rate as a decimal |
Note
Returns mean of monthly rates for the year
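Averaging the monthly observations of one year can be sketched like this, for both the short- and long-term readers. The `TIME_PERIOD` strings in `YYYY-MM` form, the `OBS_VALUE` column, and the percent-to-decimal conversion are assumptions about the input layout, not confirmed details of the OECD files.

```python
import pandas as pd


def annual_mean_rate(monthly: pd.DataFrame, year: int) -> float:
    """Average the monthly observations for `year` and convert percent to decimal."""
    periods = pd.to_datetime(monthly["TIME_PERIOD"], format="%Y-%m")
    in_year = monthly.loc[periods.dt.year == year, "OBS_VALUE"]
    # ASSUMPTION: OBS_VALUE is in percent, so divide by 100 for decimal form
    return float(in_year.mean()) / 100.0


monthly = pd.DataFrame({
    "TIME_PERIOD": ["2019-12", "2020-01", "2020-02", "2020-03"],
    "OBS_VALUE": [2.0, 1.0, 2.0, 3.0],
})
print(annual_mean_rate(monthly, 2020))  # 0.02
```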
read_long_term_interest_rates(country: Country | Region, year: int) -> float
Get long-term interest rates for a country and year.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | Country \| Region | Country to get interest rates for | required |
| year | int | Year to get interest rates for | required |
Returns:
| Type | Description |
|---|---|
| float | Long-term interest rate as a decimal |
Note
Returns mean of monthly rates for the year
get_bank_demographics(country: Country | Region, year: int, code: str) -> float
Get bank demographic data for a specific metric.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | Country \| Region | Country to get data for | required |
| year | int | Year to get data for | required |
| code | str | Demographic metric code | required |
Returns:
| Type | Description |
|---|---|
| float | Value for the requested demographic metric |
Note
Handles missing data by finding the closest available year
read_tierone_reserves(country: Country, year: int) -> float
Get Tier 1 capital reserves ratio for banks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | Country | Country to get reserves for | required |
| year | int | Year to get reserves for | required |
Returns:
| Type | Description |
|---|---|
| float | Tier 1 capital ratio as a decimal |
read_number_of_banks(country: Country, year: int) -> int
Get number of banks in a country for a year.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | Country | Country to get bank count for | required |
| year | int | Year to get bank count for | required |
Returns:
| Type | Description |
|---|---|
| int | Number of banks |
read_number_of_bank_branches(country: Country, year: int) -> int
Get number of bank branches in a country for a year.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | Country | Country to get branch count for | required |
| year | int | Year to get branch count for | required |
Returns:
| Type | Description |
|---|---|
| int | Number of bank branches |
read_number_of_bank_employees(country: Country, year: int) -> int
Get number of bank employees in a country for a year.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | Country | Country to get employee count for | required |
| year | int | Year to get employee count for | required |
Returns:
| Type | Description |
|---|---|
| int | Number of bank employees |
read_bank_distributed_profit(country: Country, year: int) -> float
Get distributed profit of banks in a country for a year.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | Country | Country to get profit data for | required |
| year | int | Year to get profit data for | required |
Returns:
| Type | Description |
|---|---|
| float | Distributed profit (millions of LCU) |
read_bank_retained_profit(country: Country, year: int) -> float
Get retained profit of banks in a country for a year.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | Country | Country to get profit data for | required |
| year | int | Year to get profit data for | required |
Returns:
| Type | Description |
|---|---|
| float | Retained profit (millions of LCU) |
read_bank_total_assets(country: Country, year: int) -> float
Get total assets of banks in a country for a year.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | Country | Country to get asset data for | required |
| year | int | Year to get asset data for | required |
Returns:
| Type | Description |
|---|---|
| float | Total bank assets (millions of LCU) |
unemployment_benefits_gdp_pct(country: Country | Region, year: int) -> float
Get unemployment benefits as percentage of GDP.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | Country \| Region | Country to get benefits data for | required |
| year | int | Year to get benefits data for | required |
Returns:
| Type | Description |
|---|---|
| float | Unemployment benefits as a share of GDP (decimal, e.g. 0.01 for 1%) |
Note
Returns 0.0 if data are not available
all_benefits_gdp_pct(country: str, year: int, average_oecd: float = 0.212) -> float
Get total social benefits as percentage of GDP.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | str | Country to get benefits data for | required |
| year | int | Year to get benefits data for | required |
| average_oecd | float | OECD average (decimal share of GDP) to use if data are missing | 0.212 |
Returns:
| Type | Description |
|---|---|
| float | Total social benefits as a share of GDP (decimal) |
Note
Uses the OECD average if data are not available for the country
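The fallback behaviour can be sketched as a lookup that returns `average_oecd` when the value is absent or NaN (whereas `unemployment_benefits_gdp_pct` is documented to return 0.0 instead). The `observed` mapping keyed by `(country, year)` and the `benefits_share` helper are hypothetical illustrations.

```python
import math


def benefits_share(observed: dict[tuple[str, int], float], country: str, year: int,
                   average_oecd: float = 0.212) -> float:
    """Share of GDP (decimal) for (country, year), falling back to the OECD average."""
    value = observed.get((country, year))
    if value is None or math.isnan(value):
        return average_oecd
    return value


# Hypothetical observations keyed by (country, year)
observed = {("AAA", 2020): 0.25, ("BBB", 2020): float("nan")}
print(benefits_share(observed, "AAA", 2020))  # 0.25
print(benefits_share(observed, "BBB", 2020))  # 0.212 (missing -> OECD average)
```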
general_gov_debt(country: Country, year: int) -> float
Get general government debt.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | Country | Country to get debt data for | required |
| year | int | Year to get debt data for | required |
Returns:
| Type | Description |
|---|---|
| float | Government debt (millions of LCU) |
get_unemployment_rate(country: str) -> pd.DataFrame
Get time series of unemployment rates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | str | Country to get unemployment rates for | required |
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | DataFrame with dates as index and unemployment rates as values (in decimal form) |
Note
Returns quarterly data
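Producing a quarterly, date-indexed series in decimal form can be sketched as below; the same shape applies to the vacancy-rate reader. It assumes OECD-style `YYYY-Qn` period strings in a `TIME_PERIOD` column and percentage values in `OBS_VALUE`, both of which are assumptions about the raw file. Quarters are stamped at their first day.

```python
import pandas as pd


def quarterly_rate_series(raw: pd.DataFrame) -> pd.DataFrame:
    """Turn 'YYYY-Qn' periods into a quarter-start DatetimeIndex with decimal rates."""
    index = pd.PeriodIndex(raw["TIME_PERIOD"].tolist(), freq="Q").to_timestamp()
    # ASSUMPTION: OBS_VALUE is a percentage, so divide by 100 for decimal form
    return pd.DataFrame({"rate": raw["OBS_VALUE"].to_numpy() / 100.0}, index=index)


raw = pd.DataFrame({"TIME_PERIOD": ["2020-Q1", "2020-Q2"], "OBS_VALUE": [4.0, 13.0]})
print(quarterly_rate_series(raw))
```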
get_consumption_rates_by_income(country: Country) -> pd.DataFrame
Get consumption rates by income quintile.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | Country | Country to get consumption rates for | required |
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | DataFrame with income quintiles as columns and consumption rates as values |
get_house_price_index(country: str) -> pd.DataFrame
Get time series of house price indices.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | str | Country to get house price data for | required |
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | DataFrame with dates as index and price indices as values |
Note
Returns quarterly data, normalized to base year
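Rebasing an index to a base year can be sketched as dividing by that year's average. Which base year the method uses, and whether it rebases to 1.0 or to 100, is not stated here. The sketch assumes 1.0, and the `rebase` helper is hypothetical.

```python
import pandas as pd


def rebase(index_values: pd.Series, base_year: int) -> pd.Series:
    """Divide a quarterly index by its base-year average so that year averages 1.0."""
    base = index_values[index_values.index.year == base_year].mean()
    return index_values / base


hpi = pd.Series(
    [90.0, 110.0, 120.0, 130.0],
    index=pd.to_datetime(["2015-01-01", "2015-04-01", "2016-01-01", "2016-04-01"]),
)
print(rebase(hpi, 2015).tolist())  # [0.9, 1.1, 1.2, 1.3]
```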
get_vacancy_rate(country: Country) -> pd.DataFrame
Get time series of job vacancy rates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | Country | Country to get vacancy rates for | required |
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | DataFrame with dates as index and vacancy rates as values (in decimal form) |
Note
Returns quarterly data
get_household_consumption_by_income_quantile(country: Country, year: int) -> pd.DataFrame
Get household consumption data by income quantile.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | Country | Country to get consumption data for | required |
| year | int | Year to get consumption data for | required |
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | DataFrame with income quantiles and their corresponding consumption values |
get_govt_debt_usd_ppp(country: Country, year: int) -> float
Get government debt in USD PPP terms.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | Country | Country to get debt data for | required |
| year | int | Year to get debt data for | required |
Returns:
| Type | Description |
|---|---|
| float | Government debt in USD PPP |
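PPP conversion factors are conventionally quoted as local currency units per US dollar, so converting an LCU amount to USD PPP is a division. The sketch below shows that arithmetic under this convention. Whether the method derives its figure this way or reads a USD PPP series directly is not stated, and the `lcu_to_usd_ppp` helper is hypothetical.

```python
def lcu_to_usd_ppp(amount_lcu: float, ppp_lcu_per_usd: float) -> float:
    """Convert an amount in local currency units to USD at purchasing power parity."""
    if ppp_lcu_per_usd <= 0:
        raise ValueError("PPP conversion factor must be positive")
    return amount_lcu / ppp_lcu_per_usd


# Hypothetical: 1,000 million LCU with a PPP factor of 0.5 LCU per USD
print(lcu_to_usd_ppp(1_000.0, 0.5))  # 2000.0
```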
get_inflation(country: str) -> pd.DataFrame
Get time series of inflation rates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | str | Country to get inflation data for | required |
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | DataFrame with dates as index and inflation rates as values (in decimal form) |
Note
Returns monthly data based on Producer Price Index (PPI)
get_na_growth_rates(country: str) -> pd.DataFrame
Get national accounts growth rates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| country | str | Country to get growth rates for | required |
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | DataFrame with dates as index and various national accounts growth rates as columns |
Note
Includes GDP, consumption, investment, and other key metrics
prune(prune_date: date)
Prune data to only include entries after the specified date.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| prune_date | date | Cutoff date; only entries after this date are kept | required |
Returns:
| Type | Description |
|---|---|
| OECDEconData | Self, for method chaining |
Note
- Modifies data in place
- Handles both time period columns and date-based columns
- Warns if no data remains after pruning
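The in-place pruning with a warning and method chaining can be sketched on a small stand-in class. `PrunableTables` is hypothetical, and it assumes each table has a `TIME_PERIOD` column that `pd.to_datetime` can parse (for example `YYYY-MM`). The real method also handles other date layouts.

```python
import warnings
from datetime import date

import pandas as pd


class PrunableTables:
    """Hypothetical stand-in holding a dict of DataFrames with a TIME_PERIOD column."""

    def __init__(self, data: dict[str, pd.DataFrame]):
        self.data = data

    def prune(self, prune_date: date) -> "PrunableTables":
        cutoff = pd.Timestamp(prune_date)
        for key, df in self.data.items():
            periods = pd.to_datetime(df["TIME_PERIOD"])
            kept = df[periods > cutoff]  # strictly after the cutoff
            if kept.empty:
                warnings.warn(f"No data left in '{key}' after pruning at {prune_date}")
            self.data[key] = kept  # reassigning existing keys is safe while iterating
        return self  # enables chaining, e.g. tables.prune(d).data


tables = PrunableTables({"rates": pd.DataFrame({"TIME_PERIOD": ["2019-06", "2020-06", "2021-06"],
                                                "OBS_VALUE": [1.0, 2.0, 3.0]})})
print(tables.prune(date(2020, 1, 1)).data["rates"]["TIME_PERIOD"].tolist())  # ['2020-06', '2021-06']
```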