OECDEconData

Module for reading and processing OECD economic data.

This module reads and processes economic indicators published by the OECD (Organisation for Economic Co-operation and Development). It covers employment statistics, business demographics, tax rates, banking statistics, and macroeconomic indicators.

Key Features
  • Employment statistics by industry
  • Business demographics and firm size distributions
  • Tax rates (corporate, personal, social insurance)
  • Banking sector statistics
  • Government debt and benefits data
  • Interest rates and inflation
  • Housing market indicators
  • Unemployment and vacancy rates
  • Consumption statistics by income quantile
Example
from pathlib import Path
from macro_data.configuration.countries import Country

# Initialize reader with scaling factors for each country
scale_dict = {Country("USA"): 1000, Country("GBR"): 100}
reader = OECDEconData(
    path=Path("path/to/oecd_data"),
    scale_dict=scale_dict
)

# Get employment by industry for a country and year
usa_employment = reader.employees_by_industry(2020, Country("USA"))
Note
  • Uses standardized OECD data files
  • Handles missing data through proxies and interpolation
  • Supports data pruning for specific date ranges
  • Includes forced values for certain tax rates where OECD data is unavailable

OECDEconData

Reader class for OECD economic data.

This class provides methods to read and process economic indicators from OECD datasets. It applies country-specific scaling, maps OECD industry codes to internal ones, and exposes the loaded statistics through the reader methods documented below.

Parameters:

Name Type Description Default
path Path | str

Path to directory containing OECD data files

required
scale_dict dict[Country, int]

Dictionary mapping countries to their scaling factors for employment numbers

required

Attributes:

Name Type Description
scale_dict dict[Country, int]

Country-specific scaling factors

industry_mapping dict

Mapping between OECD and internal industry codes

sector_mapping dict

Mapping for input-output table sectors

data dict[str, DataFrame]

Dictionary of loaded OECD datasets

default_industries list[str]

List of standard industry codes

scale_dict = scale_dict instance-attribute
industry_mapping = INDUSTRY_MAPPING instance-attribute
sector_mapping = ICIO_AGGREGATE instance-attribute
files_with_codes = self.get_files_with_codes() instance-attribute
data = {key: pd.read_csv(path / (self.files_with_codes[key] + '.csv')) for key in self.files_with_codes.keys()} instance-attribute
default_industries = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R_S', 'T'] instance-attribute
__init__(path: Path | str, scale_dict: dict[Country, int])

Initialize the OECDEconData reader with data path and scaling factors.

get_files_with_codes() -> dict[str, str] staticmethod

Get mapping of data categories to file names.

Returns:

Type Description
dict[str, str]

dict[str, str]: Dictionary mapping data categories to their file names, including employment, tax, banking, and other economic data
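
As a rough illustration of how such a category-to-file mapping can drive loading, the sketch below reads one CSV per category. The class itself uses pd.read_csv, as shown in its data attribute; this version uses only the standard library so it stays self-contained. The category name "rates" and file stem "demo_rates" are hypothetical and are not actual entries returned by get_files_with_codes.

```python
import csv
import tempfile
from pathlib import Path


def load_tables(path, files_with_codes):
    """Read one CSV per category into a dict keyed by category name."""
    path = Path(path)
    tables = {}
    for key, stem in files_with_codes.items():
        # File name is the mapped stem plus the ".csv" extension.
        with open(path / (stem + ".csv"), newline="") as handle:
            tables[key] = list(csv.DictReader(handle))
    return tables


# Hypothetical category and file names, for illustration only.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "demo_rates.csv").write_text("year,value\n2020,0.05\n")
    tables = load_tables(tmp, {"rates": "demo_rates"})
```

Keeping the mapping in one place means a renamed data file only needs a change in the mapping, not in every reader method.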

employees_by_industry(year: int, country: Country) -> pd.Series

Get number of employees by industry for a specific country and year.

Parameters:

Name Type Description Default
year int

Year to get employment data for

required
country Country

Country to get employment data for

required

Returns:

Type Description
Series

pd.Series: Series containing number of employees for each industry, scaled by country-specific factor

Note
  • Uses default industry classification (A through T)
  • Handles missing data by using median values
  • Returns scaled values based on scale_dict
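
The notes above can be sketched as a two-step transformation: fill gaps with a median, then apply the country's scaling factor. This is a minimal sketch under two assumptions that are not confirmed here: the median is taken over the industries that do have data, and the scaling factor divides the raw counts. The helper name fill_and_scale and the numbers are hypothetical.

```python
from statistics import median


def fill_and_scale(raw_counts, industries, scale):
    """Fill missing industries with the median of observed ones, then scale."""
    observed = [
        raw_counts[code] for code in industries if raw_counts.get(code) is not None
    ]
    fallback = median(observed)
    filled = {
        code: raw_counts[code] if raw_counts.get(code) is not None else fallback
        for code in industries
    }
    # Assumption: the scaling factor divides raw head counts.
    return {code: value / scale for code, value in filled.items()}


# Made-up counts for three industries, with "B" missing.
scaled = fill_and_scale({"A": 100.0, "B": None, "C": 300.0}, ["A", "B", "C"], scale=100)
```
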
read_business_demography(country: Country | Region, output: pd.Series, year: int) -> np.ndarray

Read business demography data for active employer enterprises.

Parameters:

Name Type Description Default
country Country | Region

Country to get data for

required
output Series

Total output of the country by industry

required
year int

Year to get data for

required

Returns:

Type Description
ndarray

np.ndarray: Number of active employer enterprises by industry

Note
  • Special handling for GBR (2014-2018 only)
  • Special handling for DEU (2012 onwards)
  • Special handling for AUS (2010-2014)
zeta_dist(x: np.ndarray, a: float) -> np.ndarray staticmethod

Calculate normalized Zeta distribution values.

Parameters:

Name Type Description Default
x ndarray

Input values (firm sizes)

required
a float

Shape parameter of the Zeta distribution

required

Returns:

Type Description
ndarray

np.ndarray: Normalized probability values from Zeta distribution

Note

Uses Riemann zeta function minus 1 (zetac) for calculations
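
For reference, the standard Zeta distribution with shape parameter a > 1 assigns the probability below to each positive integer x. Assuming zetac refers to scipy.special.zetac, which returns ζ(a) − 1, the normalizing constant can be written as zetac(a) + 1. Whether the method normalizes exactly this way or renormalizes over the supplied x values is not stated here.

```latex
P(X = x) = \frac{x^{-a}}{\zeta(a)},
\qquad
\zeta(a) = \sum_{k=1}^{\infty} k^{-a} = \operatorname{zetac}(a) + 1,
\qquad
x = 1, 2, 3, \dots
```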

find_sector_code(code: str) -> int | None

Find internal sector code from OECD industry code.

Parameters:

Name Type Description Default
code str

OECD industry code

required

Returns:

Name Type Description
int int | None

Internal sector code, or None if the OECD code has no mapping

Note

Uses industry_mapping dictionary for conversion

read_firm_size_zetas(country: str | Region, year: int) -> dict[int, np.ndarray] | None

Calculate Zeta distribution parameters for firm sizes by sector.

Parameters:

Name Type Description Default
country str | Region

Country to get firm size distribution for

required
year int

Year to get firm size distribution for

required

Returns:

Type Description
dict[int, ndarray] | None

dict[int, np.ndarray] | None: Dictionary mapping sector indices to their Zeta distribution parameters, or None if data not available

Note
  • Fits Zeta distributions to empirical firm size distributions
  • Returns None if insufficient data for fitting
  • Handles different size classes and industry codes
find_closest_year(df: pd.DataFrame, year: int) -> int staticmethod

Find the closest available year in DataFrame to target year.

Parameters:

Name Type Description Default
df DataFrame

DataFrame containing time series data

required
year int

Target year

required

Returns:

Name Type Description
int int

Closest available year to target year
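
A minimal sketch of the selection logic, shown on a plain list of years rather than a DataFrame. The helper name closest_year is hypothetical, and how the actual method breaks ties between two equally distant years is not specified here; this version returns the first one listed.

```python
def closest_year(available_years, target):
    """Return the available year with the smallest absolute distance to target."""
    return min(available_years, key=lambda year: abs(year - target))


picked = closest_year([2010, 2014, 2018], 2015)
```
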

read_tau_sif(country: Country | str | Region, year: int) -> float

Get social insurance tax rate (firm contribution) for a country and year.

Parameters:

Name Type Description Default
country Country | str | Region

Country to get tax rate for

required
year int

Year to get tax rate for

required

Returns:

Name Type Description
float float

Social insurance tax rate as decimal

Note

Uses force_tau_sif values for countries with missing or unreliable data
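
The forced-value mechanism can be pictured as an override table consulted before the OECD data. The sketch below assumes the forced value takes precedence whenever a country is listed, which is not confirmed here. The function name tax_rate, the placeholder country codes, and the rates are all hypothetical.

```python
def tax_rate(country, year, table, forced):
    """Prefer a forced rate for the country; otherwise look up the data table."""
    if country in forced:
        return forced[country]
    return table[(country, year)]


# Placeholder codes and made-up numbers, not real OECD rates.
table = {("AAA", 2020): 0.20}
forced = {"BBB": 0.15}

from_table = tax_rate("AAA", 2020, table, forced)
from_override = tax_rate("BBB", 2020, table, forced)
```

The same pattern would apply to the other tax readers, each with its own override table.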

read_tau_siw(country: Country | str | Region, year: int) -> float

Get social insurance tax rate (worker contribution) for a country and year.

Parameters:

Name Type Description Default
country Country | str | Region

Country to get tax rate for

required
year int

Year to get tax rate for

required

Returns:

Name Type Description
float float

Social insurance tax rate as decimal

Note

Uses force_tau_siw values for countries with missing or unreliable data

read_tau_firm(country: Country | str | Region, year: int) -> float

Get corporate tax rate for a country and year.

Parameters:

Name Type Description Default
country Country | str | Region

Country to get tax rate for

required
year int

Year to get tax rate for

required

Returns:

Name Type Description
float float

Corporate tax rate as decimal

Note

Uses force_tau_firm values for countries with missing or unreliable data

read_tau_income(country: Country | Region, year: int) -> float

Get personal income tax rate for a country and year.

Parameters:

Name Type Description Default
country Country | Region

Country to get tax rate for

required
year int

Year to get tax rate for

required

Returns:

Name Type Description
float float

Personal income tax rate as decimal

Note

Uses force_tau_income values for countries with missing or unreliable data

read_short_term_interest_rates(country: Country | Region, year: int) -> float

Get short-term interest rates for a country and year.

Parameters:

Name Type Description Default
country Country | Region

Country to get interest rates for

required
year int

Year to get interest rates for

required

Returns:

Name Type Description
float float

Short-term interest rate as decimal

Note

Returns mean of monthly rates for the year
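
A minimal sketch of the annual aggregation, assuming the monthly source values are quoted in percent and converted to a decimal after averaging. The unit of the source data is an assumption, and the function name annual_rate and the sample values are hypothetical.

```python
from statistics import fmean


def annual_rate(monthly_percent):
    """Average monthly rates quoted in percent and convert to a decimal."""
    return fmean(monthly_percent) / 100.0


# Made-up monthly rates in percent.
rate = annual_rate([1.0, 1.5, 2.0, 1.5])
```
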

read_long_term_interest_rates(country: Country | Region, year: int) -> float

Get long-term interest rates for a country and year.

Parameters:

Name Type Description Default
country Country | Region

Country to get interest rates for

required
year int

Year to get interest rates for

required

Returns:

Name Type Description
float float

Long-term interest rate as decimal

Note

Returns mean of monthly rates for the year

get_bank_demographics(country: Country | Region, year: int, code: str) -> float

Get bank demographic data for a specific metric.

Parameters:

Name Type Description Default
country Country | Region

Country to get data for

required
year int

Year to get data for

required
code str

Demographic metric code

required

Returns:

Name Type Description
float float

Value for the requested demographic metric

Note

Handles missing data by finding closest available year

read_tierone_reserves(country: Country, year: int) -> float

Get Tier 1 capital reserves ratio for banks.

Parameters:

Name Type Description Default
country Country

Country to get reserves for

required
year int

Year to get reserves for

required

Returns:

Name Type Description
float float

Tier 1 capital ratio as decimal

read_number_of_banks(country: Country, year: int) -> int

Get number of banks in a country for a year.

Parameters:

Name Type Description Default
country Country

Country to get bank count for

required
year int

Year to get bank count for

required

Returns:

Name Type Description
int int

Number of banks

read_number_of_bank_branches(country: Country, year: int) -> int

Get number of bank branches in a country for a year.

Parameters:

Name Type Description Default
country Country

Country to get branch count for

required
year int

Year to get branch count for

required

Returns:

Name Type Description
int int

Number of bank branches

read_number_of_bank_employees(country: Country, year: int) -> int

Get number of bank employees in a country for a year.

Parameters:

Name Type Description Default
country Country

Country to get employee count for

required
year int

Year to get employee count for

required

Returns:

Name Type Description
int int

Number of bank employees

read_bank_distributed_profit(country: Country, year: int) -> float

Get distributed profit of banks in a country for a year.

Parameters:

Name Type Description Default
country Country

Country to get profit data for

required
year int

Year to get profit data for

required

Returns:

Name Type Description
float float

Distributed profit in millions of local currency units (LCU)

read_bank_retained_profit(country: Country, year: int) -> float

Get retained profit of banks in a country for a year.

Parameters:

Name Type Description Default
country Country

Country to get profit data for

required
year int

Year to get profit data for

required

Returns:

Name Type Description
float float

Retained profit in millions of local currency units (LCU)

read_bank_total_assets(country: Country, year: int) -> float

Get total assets of banks in a country for a year.

Parameters:

Name Type Description Default
country Country

Country to get asset data for

required
year int

Year to get asset data for

required

Returns:

Name Type Description
float float

Total bank assets in millions of local currency units (LCU)

unemployment_benefits_gdp_pct(country: Country | Region, year: int) -> float

Get unemployment benefits as percentage of GDP.

Parameters:

Name Type Description Default
country Country | Region

Country to get benefits data for

required
year int

Year to get benefits data for

required

Returns:

Name Type Description
float float

Unemployment benefits as a fraction of GDP, in decimal form

Note

Returns 0.0 if data not available

all_benefits_gdp_pct(country: str, year: int, average_oecd: float = 0.212) -> float

Get total social benefits as percentage of GDP.

Parameters:

Name Type Description Default
country str

Country to get benefits data for

required
year int

Year to get benefits data for

required
average_oecd float

Default OECD average to use if data missing

0.212

Returns:

Name Type Description
float float

Total social benefits as a fraction of GDP, in decimal form (e.g. 0.212 for 21.2%)

Note

Uses OECD average if data not available for country
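
The fallback described above amounts to a lookup with a default. This is a minimal sketch that keeps the documented default of 0.212. The function name benefits_share, the placeholder country codes, and the table contents are hypothetical.

```python
def benefits_share(country, year, table, average_oecd=0.212):
    """Return the country's value if present, else the OECD average."""
    return table.get((country, year), average_oecd)


# Placeholder code and made-up value, not real OECD data.
table = {("AAA", 2020): 0.18}

known = benefits_share("AAA", 2020, table)
missing = benefits_share("BBB", 2020, table)
```
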

general_gov_debt(country: Country, year: int) -> float

Get general government debt for a country and year.

Parameters:

Name Type Description Default
country Country

Country to get debt data for

required
year int

Year to get debt data for

required

Returns:

Name Type Description
float float

Government debt in millions of local currency units (LCU)

get_unemployment_rate(country: str) -> pd.DataFrame

Get time series of unemployment rates.

Parameters:

Name Type Description Default
country str

Country to get unemployment rates for

required

Returns:

Type Description
DataFrame

pd.DataFrame: DataFrame with dates as index and unemployment rates as values (in decimal form)

Note

Returns quarterly data

get_consumption_rates_by_income(country: Country) -> pd.DataFrame

Get consumption rates by income quintile.

Parameters:

Name Type Description Default
country Country

Country to get consumption rates for

required

Returns:

Type Description
DataFrame

pd.DataFrame: DataFrame with income quintiles as columns and consumption rates as values

get_house_price_index(country: str) -> pd.DataFrame

Get time series of house price indices.

Parameters:

Name Type Description Default
country str

Country to get house price data for

required

Returns:

Type Description
DataFrame

pd.DataFrame: DataFrame with dates as index and price indices as values

Note

Returns quarterly data, normalized to base year

get_vacancy_rate(country: Country) -> pd.DataFrame

Get time series of job vacancy rates.

Parameters:

Name Type Description Default
country Country

Country to get vacancy rates for

required

Returns:

Type Description
DataFrame

pd.DataFrame: DataFrame with dates as index and vacancy rates as values (in decimal form)

Note

Returns quarterly data

get_household_consumption_by_income_quantile(country: Country, year: int) -> pd.DataFrame

Get household consumption data by income quantile.

Parameters:

Name Type Description Default
country Country

Country to get consumption data for

required
year int

Year to get consumption data for

required

Returns:

Type Description
DataFrame

pd.DataFrame: DataFrame with income quantiles and their corresponding consumption values

get_govt_debt_usd_ppp(country: Country, year: int) -> float

Get government debt in USD PPP terms.

Parameters:

Name Type Description Default
country Country

Country to get debt data for

required
year int

Year to get debt data for

required

Returns:

Name Type Description
float float

Government debt in USD PPP

get_inflation(country: str) -> pd.DataFrame

Get time series of inflation rates.

Parameters:

Name Type Description Default
country str

Country to get inflation data for

required

Returns:

Type Description
DataFrame

pd.DataFrame: DataFrame with dates as index and inflation rates as values (in decimal form)

Note

Returns monthly data based on Producer Price Index (PPI)

get_na_growth_rates(country: str) -> pd.DataFrame

Get national accounts growth rates.

Parameters:

Name Type Description Default
country str

Country to get growth rates for

required

Returns:

Type Description
DataFrame

pd.DataFrame: DataFrame with dates as index and various national accounts growth rates as columns

Note

Includes GDP, consumption, investment, and other key metrics

prune(prune_date: date)

Prune data to include only entries after the specified date.

Parameters:

Name Type Description Default
prune_date date

Date to prune data from

required

Returns:

Name Type Description
OECDEconData

Self for method chaining

Note
  • Modifies data in place
  • Handles both time period columns and date-based columns
  • Warns if no data remains after pruning
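
The behaviour in the notes above (in-place modification, a warning on empty results, and returning self for chaining) can be sketched with a toy container. PrunableSeries is a hypothetical stand-in for a single dataset, not part of this module. Whether rows dated exactly on the cut-off are kept is an assumption.

```python
import warnings
from datetime import date


class PrunableSeries:
    """Toy container holding (date, value) rows, standing in for one dataset."""

    def __init__(self, rows):
        self.rows = list(rows)

    def prune(self, prune_date):
        # Assumption: rows dated on the cut-off itself are kept.
        self.rows = [row for row in self.rows if row[0] >= prune_date]
        if not self.rows:
            warnings.warn("no data remains after pruning")
        return self  # returning self enables method chaining


series = PrunableSeries([(date(2019, 1, 1), 1.0), (date(2021, 1, 1), 2.0)])
remaining = series.prune(date(2020, 1, 1)).rows
```

Because the data is modified in place, a reader that should keep the full history needs to be constructed separately before pruning.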