OECDEconData¶

Module for reading and processing OECD economic data.

This module reads and processes economic indicators published by the OECD (Organisation for Economic Co-operation and Development). It covers employment statistics, business demographics, tax rates, banking statistics, and macroeconomic indicators.

Key Features

Employment statistics by industry
Business demographics and firm size distributions
Tax rates (corporate, personal, social insurance)
Banking sector statistics
Government debt and benefits data
Interest rates and inflation
Housing market indicators
Unemployment and vacancy rates
Consumption statistics by income quantile

Example

```python
from pathlib import Path
from macro_data.configuration.countries import Country

# Initialize reader with scaling factors for each country
scale_dict = {Country("USA"): 1000, Country("GBR"): 100}
reader = OECDEconData(
    path=Path("path/to/oecd_data"),
    scale_dict=scale_dict
)

# Get employment by industry for a country and year
usa_employment = reader.employees_by_industry(2020, Country("USA"))
```

Note

Uses standardized OECD data files
Handles missing data through proxies and interpolation
Supports data pruning for specific date ranges
Includes forced values for certain tax rates where OECD data is unavailable

`OECDEconData` ¶

Reader class for OECD economic data.

This class provides methods to read and process various economic indicators from OECD datasets. It handles data scaling, industry mapping, and provides access to a wide range of economic statistics.

Parameters:

Name	Type	Description	Default
`path`	`Path \| str`	Path to directory containing OECD data files	required
`scale_dict`	`dict[Country, int]`	Dictionary mapping countries to their scaling factors for employment numbers	required

Attributes:

Name	Type	Description
`scale_dict`	`dict[Country, int]`	Country-specific scaling factors
`industry_mapping`	`dict`	Mapping between OECD and internal industry codes
`sector_mapping`	`dict`	Mapping for input-output table sectors
`data`	`dict[str, DataFrame]`	Dictionary of loaded OECD datasets
`default_industries`	`list[str]`	List of standard industry codes

`scale_dict = scale_dict` `instance-attribute` ¶

`industry_mapping = INDUSTRY_MAPPING` `instance-attribute` ¶

`sector_mapping = ICIO_AGGREGATE` `instance-attribute` ¶

`files_with_codes = self.get_files_with_codes()` `instance-attribute` ¶

`data = {key: pd.read_csv(path / (self.files_with_codes[key] + '.csv')) for key in self.files_with_codes.keys()}` `instance-attribute` ¶

`default_industries = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R_S', 'T']` `instance-attribute` ¶

`__init__(path: Path | str, scale_dict: dict[Country, int])` ¶

Initialize the OECDEconData reader with data path and scaling factors.

`get_files_with_codes() -> dict[str, str]` `staticmethod` ¶

Get mapping of data categories to file names.

Returns:

Type	Description
`dict[str, str]`	Dictionary mapping data categories to their file names, including employment, tax, banking, and other economic data

`employees_by_industry(year: int, country: Country) -> pd.Series` ¶

Get number of employees by industry for a specific country and year.

Parameters:

Name	Type	Description	Default
`year`	`int`	Year to get employment data for	required
`country`	`Country`	Country to get employment data for	required

Returns:

Type	Description
`Series`	Series containing the number of employees in each industry, scaled by the country-specific factor

Note

Uses default industry classification (A through T)
Handles missing data by using median values
Returns scaled values based on scale_dict
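
The behaviour described in this note can be illustrated with a minimal sketch. It assumes that "median values" means the median across the industries that do have data, and that scaling means dividing the head count by the country's factor; neither detail is confirmed on this page. The helper `fill_and_scale` and the sample figures are hypothetical.

```python
from statistics import median

# Default industry codes, as listed in `default_industries`.
DEFAULT_INDUSTRIES = ("A", "B", "C", "D", "E", "F", "G", "H", "I", "J",
                      "K", "L", "M", "N", "O", "P", "Q", "R_S", "T")


def fill_and_scale(raw, scale, industries=DEFAULT_INDUSTRIES):
    """Fill missing industries with the median, then divide by `scale`.

    Hypothetical helper: taking the median over the observed industries
    and dividing by the scaling factor are both assumptions.
    """
    observed = [v for v in raw.values() if v is not None]
    fallback = median(observed)
    filled = {}
    for code in industries:
        value = raw.get(code)
        if value is None:
            value = fallback  # missing industry: use the median instead
        filled[code] = value / scale
    return filled


# Made-up head counts; industry "C" has no observation.
raw_counts = {"A": 2000.0, "B": 4000.0, "C": None, "D": 6000.0}
result = fill_and_scale(raw_counts, scale=1000, industries=("A", "B", "C", "D"))
print(result)
```

The real method returns a `pd.Series`; a plain dictionary is used here only to keep the sketch free of dependencies.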

`read_business_demography(country: Country | Region, output: pd.Series, year: int) -> np.ndarray` ¶

Read business demography data for active employer enterprises.

Parameters:

Name	Type	Description	Default
`country`	`Country \| Region`	Country to get data for	required
`output`	`Series`	Total output of each country by industry	required
`year`	`int`	Year to get data for	required

Returns:

Type	Description
`ndarray`	Number of active employer enterprises by industry

Note

Special handling for GBR (2014-2018 only)
Special handling for DEU (2012 onwards)
Special handling for AUS (2010-2014)

`zeta_dist(x: np.ndarray, a: float) -> np.ndarray` `staticmethod` ¶

Calculate normalized Zeta distribution values.

Parameters:

Name	Type	Description	Default
`x`	`ndarray`	Input values (firm sizes)	required
`a`	`float`	Shape parameter of the Zeta distribution	required

Returns:

Type	Description
`ndarray`	Normalized probability values from the Zeta distribution

Note

Uses Riemann zeta function minus 1 (zetac) for calculations
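
As a rough illustration, the sketch below computes a normalized Zeta probability mass function, P(X = k) = k^(-a) / ζ(a). Whether the method normalizes in exactly this way is an assumption. To stay dependency-free, ζ(a) is approximated with a truncated sum plus an Euler-Maclaurin tail correction; with SciPy available, `scipy.special.zetac(a) + 1` gives the same normalizer. The names `riemann_zeta` and `zeta_pmf` are hypothetical.

```python
import math


def riemann_zeta(a, terms=1000):
    """Approximate the Riemann zeta function for a > 1.

    Sums the first `terms - 1` terms directly and adds an
    Euler-Maclaurin correction for the remaining tail.
    """
    if a <= 1:
        raise ValueError("the zeta distribution requires a > 1")
    n = terms
    head = sum(k ** -a for k in range(1, n))
    tail = n ** (1 - a) / (a - 1) + 0.5 * n ** -a + a * n ** (-a - 1) / 12
    return head + tail


def zeta_pmf(sizes, a):
    """Return P(X = k) = k**(-a) / zeta(a) for each firm size k >= 1."""
    norm = riemann_zeta(a)
    return [k ** -a / norm for k in sizes]


probs = zeta_pmf([1, 2, 3, 4], a=2.0)
print(probs)
```

For a = 2 the normalizer is ζ(2) = π²/6, so the probability of a one-employee firm is 6/π².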

`find_sector_code(code: str) -> int | None` ¶

Find internal sector code from OECD industry code.

Parameters:

Name	Type	Description	Default
`code`	`str`	OECD industry code	required

Returns:

Name	Type	Description
`int`	`int \| None`	Internal sector code

Note

Uses industry_mapping dictionary for conversion

`read_firm_size_zetas(country: str | Region, year: int) -> dict[int, np.ndarray] | None` ¶

Calculate Zeta distribution parameters for firm sizes by sector.

Parameters:

Name	Type	Description	Default
`country`	`str \| Region`	Country to get firm size distribution for	required
`year`	`int`	Year to get firm size distribution for	required

Returns:

Type	Description
`dict[int, ndarray] \| None`	Dictionary mapping sector indices to their Zeta distribution parameters, or None if data is not available

Note

Fits Zeta distributions to empirical firm size distributions
Returns None if insufficient data for fitting
Handles different size classes and industry codes
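
The fitting procedure itself is not described on this page. As one possible reading, the sketch below fits a single Zeta exponent to the shares of firms in a set of size classes by a least-squares grid search. The size-class bounds, the grid, the error measure, and the helper names are all assumptions, and the real method returns arrays per sector rather than one number.

```python
def riemann_zeta(a, terms=1000):
    """Approximate zeta(a) for a > 1 (truncated sum plus tail correction)."""
    n = terms
    head = sum(k ** -a for k in range(1, n))
    return head + n ** (1 - a) / (a - 1) + 0.5 * n ** -a + a * n ** (-a - 1) / 12


def class_probs(a, bounds):
    """Probability mass of each size class under a Zeta distribution.

    `bounds` is a list of (low, high) firm-size ranges; high=None marks
    an open-ended top class.
    """
    norm = riemann_zeta(a)
    probs = []
    for low, high in bounds:
        if high is None:
            # Open class: all mass not assigned to sizes below `low`.
            below = sum(k ** -a for k in range(1, low)) / norm
            probs.append(1.0 - below)
        else:
            probs.append(sum(k ** -a for k in range(low, high + 1)) / norm)
    return probs


def fit_zeta_exponent(shares, bounds, start=1.05, stop=4.0, step=0.01):
    """Grid-search the exponent minimising squared error against `shares`."""
    best_a, best_err = None, float("inf")
    steps = int(round((stop - start) / step))
    for i in range(steps + 1):
        a = start + i * step
        err = sum((p - s) ** 2 for p, s in zip(class_probs(a, bounds), shares))
        if err < best_err:
            best_a, best_err = a, err
    return best_a


# Hypothetical size classes (number of employees).
bounds = [(1, 9), (10, 19), (20, 49), (50, 249), (250, None)]
# Synthetic shares generated from a known exponent, to check recovery.
synthetic_shares = class_probs(2.0, bounds)
fitted = fit_zeta_exponent(synthetic_shares, bounds)
print(round(fitted, 2))
```

Generating the shares from a known exponent and recovering it is only a sanity check of the sketch, not a statement about real OECD data.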

`find_closest_year(df: pd.DataFrame, year: int) -> int` `staticmethod` ¶

Find the closest available year in DataFrame to target year.

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	DataFrame containing time series data	required
`year`	`int`	Target year	required

Returns:

Name	Type	Description
`int`	`int`	Closest available year to target year
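
A nearest-year lookup of this kind can be sketched as follows. The real method takes a DataFrame; the hypothetical `closest_year` below works on a plain list of years, and its tie-breaking rule (prefer the earlier year) is an assumption.

```python
def closest_year(available_years, year):
    """Return the available year nearest to `year`.

    Ties go to the earlier year; this tie-breaking rule is an assumption.
    """
    if not available_years:
        raise ValueError("no years available")
    # sorted() puts earlier years first, and min() keeps the first minimum.
    return min(sorted(available_years), key=lambda y: abs(y - year))


years_in_data = [2010, 2012, 2015, 2019]
print(closest_year(years_in_data, 2016))  # 2015
print(closest_year(years_in_data, 2025))  # 2019
```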

`read_tau_sif(country: Country | str | Region, year: int) -> float` ¶

Get social insurance tax rate (firm contribution) for a country and year.

Parameters:

Name	Type	Description	Default
`country`	`Country \| str \| Region`	Country to get tax rate for	required
`year`	`int`	Year to get tax rate for	required

Returns:

Name	Type	Description
`float`	`float`	Social insurance tax rate as decimal

Note

Uses force_tau_sif values for countries with missing or unreliable data

`read_tau_siw(country: Country | str | Region, year: int) -> float` ¶

Get social insurance tax rate (worker contribution) for a country and year.

Parameters:

Name	Type	Description	Default
`country`	`Country \| str \| Region`	Country to get tax rate for	required
`year`	`int`	Year to get tax rate for	required

Returns:

Name	Type	Description
`float`	`float`	Social insurance tax rate as decimal

Note

Uses force_tau_siw values for countries with missing or unreliable data

`read_tau_firm(country: Country | str | Region, year: int) -> float` ¶

Get corporate tax rate for a country and year.

Parameters:

Name	Type	Description	Default
`country`	`Country \| str \| Region`	Country to get tax rate for	required
`year`	`int`	Year to get tax rate for	required

Returns:

Name	Type	Description
`float`	`float`	Corporate tax rate as decimal

Note

Uses force_tau_firm values for countries with missing or unreliable data

`read_tau_income(country: Country | Region, year: int) -> float` ¶

Get personal income tax rate for a country and year.

Parameters:

Name	Type	Description	Default
`country`	`Country \| Region`	Country to get tax rate for	required
`year`	`int`	Year to get tax rate for	required

Returns:

Name	Type	Description
`float`	`float`	Personal income tax rate as decimal

Note

Uses force_tau_income values for countries with missing or unreliable data

`read_short_term_interest_rates(country: Country | Region, year: int) -> float` ¶

Get short-term interest rates for a country and year.

Parameters:

Name	Type	Description	Default
`country`	`Country \| Region`	Country to get interest rates for	required
`year`	`int`	Year to get interest rates for	required

Returns:

Name	Type	Description
`float`	`float`	Short-term interest rate as decimal

Note

Returns mean of monthly rates for the year
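
The annual averaging can be sketched as below. The sketch assumes the monthly source values are quoted in percent and keyed by "YYYY-MM" strings; both are assumptions, as is the helper name `annual_rate`.

```python
from statistics import mean


def annual_rate(monthly_percent, year):
    """Average the monthly rates of `year` and convert percent to decimal.

    `monthly_percent` maps "YYYY-MM" strings to rates in percent; the key
    format and the percent unit are assumptions.
    """
    values = [v for period, v in monthly_percent.items()
              if period.startswith(f"{year}-")]
    if not values:
        raise KeyError(f"no monthly rates for {year}")
    return mean(values) / 100.0


# Made-up monthly rates; only the 2020 entries enter the 2020 average.
rates = {"2019-12": 1.5, "2020-01": 1.2, "2020-02": 0.9, "2020-03": 0.6}
print(annual_rate(rates, 2020))
```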

`read_long_term_interest_rates(country: Country | Region, year: int) -> float` ¶

Get long-term interest rates for a country and year.

Parameters:

Name	Type	Description	Default
`country`	`Country \| Region`	Country to get interest rates for	required
`year`	`int`	Year to get interest rates for	required

Returns:

Name	Type	Description
`float`	`float`	Long-term interest rate as decimal

Note

Returns mean of monthly rates for the year

`get_bank_demographics(country: Country | Region, year: int, code: str) -> float` ¶

Get bank demographic data for a specific metric.

Parameters:

Name	Type	Description	Default
`country`	`Country \| Region`	Country to get data for	required
`year`	`int`	Year to get data for	required
`code`	`str`	Demographic metric code	required

Returns:

Name	Type	Description
`float`	`float`	Value for the requested demographic metric

Note

Handles missing data by finding closest available year

`read_tierone_reserves(country: Country, year: int) -> float` ¶

Get Tier 1 capital reserves ratio for banks.

Parameters:

Name	Type	Description	Default
`country`	`Country`	Country to get reserves for	required
`year`	`int`	Year to get reserves for	required

Returns:

Name	Type	Description
`float`	`float`	Tier 1 capital ratio as decimal

`read_number_of_banks(country: Country, year: int) -> int` ¶

Get number of banks in a country for a year.

Parameters:

Name	Type	Description	Default
`country`	`Country`	Country to get bank count for	required
`year`	`int`	Year to get bank count for	required

Returns:

Name	Type	Description
`int`	`int`	Number of banks

`read_number_of_bank_branches(country: Country, year: int) -> int` ¶

Get number of bank branches in a country for a year.

Parameters:

Name	Type	Description	Default
`country`	`Country`	Country to get branch count for	required
`year`	`int`	Year to get branch count for	required

Returns:

Name	Type	Description
`int`	`int`	Number of bank branches

`read_number_of_bank_employees(country: Country, year: int) -> int` ¶

Get number of bank employees in a country for a year.

Parameters:

Name	Type	Description	Default
`country`	`Country`	Country to get employee count for	required
`year`	`int`	Year to get employee count for	required

Returns:

Name	Type	Description
`int`	`int`	Number of bank employees

`read_bank_distributed_profit(country: Country, year: int) -> float` ¶

Get distributed profit of banks in a country for a year.

Parameters:

Name	Type	Description	Default
`country`	`Country`	Country to get profit data for	required
`year`	`int`	Year to get profit data for	required

Returns:

Name	Type	Description
`float`	`float`	Distributed profit in millions of local currency units (LCU)

`read_bank_retained_profit(country: Country, year: int) -> float` ¶

Get retained profit of banks in a country for a year.

Parameters:

Name	Type	Description	Default
`country`	`Country`	Country to get profit data for	required
`year`	`int`	Year to get profit data for	required

Returns:

Name	Type	Description
`float`	`float`	Retained profit amount in LCU (millions)

`read_bank_total_assets(country: Country, year: int) -> float` ¶

Get total assets of banks in a country for a year.

Parameters:

Name	Type	Description	Default
`country`	`Country`	Country to get asset data for	required
`year`	`int`	Year to get asset data for	required

Returns:

Name	Type	Description
`float`	`float`	Total bank assets in LCU (millions)

`unemployment_benefits_gdp_pct(country: Country | Region, year: int) -> float` ¶

Get unemployment benefits as a share of GDP.

Parameters:

Name	Type	Description	Default
`country`	`Country \| Region`	Country to get benefits data for	required
`year`	`int`	Year to get benefits data for	required

Returns:

Name	Type	Description
`float`	`float`	Unemployment benefits as a decimal fraction of GDP

Note

Returns 0.0 if data not available

`all_benefits_gdp_pct(country: str, year: int, average_oecd: float = 0.212) -> float` ¶

Get total social benefits as a share of GDP.

Parameters:

Name	Type	Description	Default
`country`	`str`	Country to get benefits data for	required
`year`	`int`	Year to get benefits data for	required
`average_oecd`	`float`	Default OECD average to use if data missing	`0.212`

Returns:

Name	Type	Description
`float`	`float`	Total social benefits as a decimal fraction of GDP

Note

Uses OECD average if data not available for country
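
The fallback can be pictured with a minimal sketch: look the value up, and return `average_oecd` when the country and year are absent. The lookup table, its keys, and the function name `benefits_share` are hypothetical; only the default of 0.212 comes from the signature above.

```python
def benefits_share(table, country, year, average_oecd=0.212):
    """Return benefits as a fraction of GDP, or `average_oecd` if missing."""
    return table.get((country, year), average_oecd)


# Made-up data: only one (country, year) pair is available.
sample_table = {("AAA", 2019): 0.25}
print(benefits_share(sample_table, "AAA", 2019))  # 0.25
print(benefits_share(sample_table, "BBB", 2019))  # 0.212
```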

`general_gov_debt(country: Country, year: int) -> float` ¶

Get general government debt for a country and year.

Parameters:

Name	Type	Description	Default
`country`	`Country`	Country to get debt data for	required
`year`	`int`	Year to get debt data for	required

Returns:

Name	Type	Description
`float`	`float`	Government debt in LCU (millions)

`get_unemployment_rate(country: str) -> pd.DataFrame` ¶

Get time series of unemployment rates.

Parameters:

Name	Type	Description	Default
`country`	`str`	Country to get unemployment rates for	required

Returns:

Type	Description
`DataFrame`	DataFrame with dates as index and unemployment rates as values (in decimal form)

Note

Returns quarterly data

`get_consumption_rates_by_income(country: Country) -> pd.DataFrame` ¶

Get consumption rates by income quintile.

Parameters:

Name	Type	Description	Default
`country`	`Country`	Country to get consumption rates for	required

Returns:

Type	Description
`DataFrame`	DataFrame with income quintiles as columns and consumption rates as values

`get_house_price_index(country: str) -> pd.DataFrame` ¶

Get time series of house price indices.

Parameters:

Name	Type	Description	Default
`country`	`str`	Country to get house price data for	required

Returns:

Type	Description
`DataFrame`	DataFrame with dates as index and price indices as values

Note

Returns quarterly data, normalized to base year

`get_vacancy_rate(country: Country) -> pd.DataFrame` ¶

Get time series of job vacancy rates.

Parameters:

Name	Type	Description	Default
`country`	`Country`	Country to get vacancy rates for	required

Returns:

Type	Description
`DataFrame`	DataFrame with dates as index and vacancy rates as values (in decimal form)

Note

Returns quarterly data

`get_household_consumption_by_income_quantile(country: Country, year: int) -> pd.DataFrame` ¶

Get household consumption data by income quantile.

Parameters:

Name	Type	Description	Default
`country`	`Country`	Country to get consumption data for	required
`year`	`int`	Year to get consumption data for	required

Returns:

Type	Description
`DataFrame`	DataFrame with income quantiles and their corresponding consumption values

`get_govt_debt_usd_ppp(country: Country, year: int) -> float` ¶

Get government debt in USD PPP terms.

Parameters:

Name	Type	Description	Default
`country`	`Country`	Country to get debt data for	required
`year`	`int`	Year to get debt data for	required

Returns:

Name	Type	Description
`float`	`float`	Government debt in USD PPP

`get_inflation(country: str) -> pd.DataFrame` ¶

Get time series of inflation rates.

Parameters:

Name	Type	Description	Default
`country`	`str`	Country to get inflation data for	required

Returns:

Type	Description
`DataFrame`	DataFrame with dates as index and inflation rates as values (in decimal form)

Note

Returns monthly data based on Producer Price Index (PPI)

`get_na_growth_rates(country: str) -> pd.DataFrame` ¶

Get national accounts growth rates.

Parameters:

Name	Type	Description	Default
`country`	`str`	Country to get growth rates for	required

Returns:

Type	Description
`DataFrame`	DataFrame with dates as index and various national accounts growth rates as columns

Note

Includes GDP, consumption, investment, and other key metrics

`prune(prune_date: date)` ¶

Prune data to include only entries after the specified date.

Parameters:

Name	Type	Description	Default
`prune_date`	`date`	Date to prune data from	required

Returns:

Name	Type	Description
`OECDEconData`		Self for method chaining

Note

Modifies data in place
Handles both time period columns and date-based columns
Warns if no data remains after pruning
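
The pruning pattern described in this note (modify in place, warn when a dataset ends up empty, return `self` for chaining) can be sketched with a small stand-in class. `TinyReader` and its row layout are hypothetical, and keeping rows dated exactly on `prune_date` is an assumption.

```python
import warnings
from datetime import date


class TinyReader:
    """Hypothetical stand-in for a reader holding dated rows per dataset."""

    def __init__(self, data):
        # data: dataset name -> list of (date, value) rows
        self.data = data

    def prune(self, prune_date):
        """Drop rows dated before `prune_date`, in place, and return self.

        Keeping rows dated exactly on `prune_date` is an assumption.
        """
        for name, rows in self.data.items():
            kept = [(d, v) for d, v in rows if d >= prune_date]
            if not kept:
                warnings.warn(f"no data left in '{name}' after pruning")
            self.data[name] = kept
        return self


reader = TinyReader({
    "inflation": [(date(2018, 1, 1), 0.02), (date(2021, 1, 1), 0.03)],
    "vacancies": [(date(2015, 1, 1), 0.01)],
})
# Record warnings so the empty "vacancies" dataset can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    pruned = reader.prune(date(2020, 1, 1))
print(pruned.data["inflation"], len(caught))
```

Because `prune` returns the reader itself, a call such as `reader.prune(some_date)` can be followed directly by another method call on the result.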
