SyntheticPopulation

The SyntheticPopulation module provides data structures and utilities for preprocessing and organizing population data that will be used to initialize behavioral models in the simulation package.

SyntheticPopulation

The SyntheticPopulation class is an abstract base class that provides a framework for collecting and organizing population data. It does not simulate population behavior; it only handles data preprocessing.

Key Features

  • Household and individual data management
  • Income and wealth computation
  • Consumption and investment patterns
  • Labor market integration
  • Social transfer processing
  • Data validation and cleaning

Attributes

  • country_name (str): Country identifier
  • country_name_short (str): Short country code
  • scale (int): Population scaling factor
  • year (int): Reference year
  • industries (list[str]): List of industries
  • individual_data (pd.DataFrame): Individual-level data containing:
      • Demographics (age, gender, education)
      • Employment status and industry
      • Income sources
      • Household and firm associations
  • household_data (pd.DataFrame): Household-level data containing:
      • Household composition
      • Income and wealth components
      • Housing tenure and property
      • Financial assets and debt
      • Consumption patterns

Abstract Methods

compute_household_income

@abstractmethod
def compute_household_income(
    self,
    total_social_transfers: float,
    independents: Optional[list[str]] = None
) -> None

Computes household income from all sources:

  • Employee income
  • Social transfers
  • Rental income
  • Financial asset returns

compute_household_wealth

@abstractmethod
def compute_household_wealth(
    self,
    independents: Optional[list[str]] = None
) -> None

Computes household wealth components:

  • Real assets (property, vehicles, businesses)
  • Financial assets (deposits, investments)
  • Debt obligations
  • Net wealth position

set_debt_installments

@abstractmethod
def set_debt_installments(
    self,
    consumption_installments: np.ndarray,
    ce_installments: np.ndarray,
    mortgage_installments: np.ndarray
) -> None

Sets household debt payment schedules:

  • Consumption loan payments
  • Consumer electronics installments
  • Mortgage payments

set_household_saving_rates

@abstractmethod
def set_household_saving_rates(
    self,
    independents: Optional[list[str]] = None
) -> None

Computes household saving rates based on:

  • Income levels
  • Wealth position
  • Household characteristics

SyntheticHFCSPopulation

The SyntheticHFCSPopulation class is a concrete implementation that preprocesses population data using the Household Finance and Consumption Survey (HFCS) as its primary data source.

Key Features

  • HFCS data integration
  • Household sampling and scaling
  • Industry employment allocation
  • Wealth and income modeling
  • Consumption pattern estimation

Factory Methods

from_readers

@classmethod
def from_readers(
    cls,
    readers: DataReaders,
    country_name: Country,
    country_name_short: str,
    scale: int,
    year: int,
    quarter: int,
    industry_data: dict[str, pd.DataFrame],
    industries: list[str],
    total_unemployment_benefits: float,
    exogenous_data: ExogenousCountryData,
    rent_as_fraction_of_unemployment_rate: float = 0.25,
    n_quantiles: int = 5,
    population_ratio: float = 1.0,
    exch_rate: float = 1.0,
    proxied_country: str | Country = None,
    yearly_factor: float = 4.0
) -> "SyntheticHFCSPopulation"

Creates a synthetic population using HFCS data and additional sources.

Parameters:

  • readers (DataReaders): Data source readers
  • country_name (Country): Target country
  • country_name_short (str): Country code
  • scale (int): Population scaling factor
  • year (int): Reference year
  • quarter (int): Reference quarter
  • industry_data (dict): Industry-level data
  • industries (list[str]): Target industries
  • total_unemployment_benefits (float): Total benefits to distribute
  • exogenous_data (ExogenousCountryData): External economic data
  • rent_as_fraction_of_unemployment_rate (float): Rent parameter. Defaults to 0.25.
  • n_quantiles (int): Number of income quantiles used in the analysis. Defaults to 5.
  • population_ratio (float): Population scaling ratio. Defaults to 1.0.
  • exch_rate (float): Exchange rate for currency conversion. Defaults to 1.0.
  • proxied_country (str | Country): Proxy country for missing data. Defaults to None.
  • yearly_factor (float): Annual to sub-annual conversion factor. Defaults to 4.0.

Returns:

  • SyntheticHFCSPopulation: Configured population instance

Usage Example

from macro_data import DataReaders, ExogenousCountryData
from macro_data.processing.synthetic_population import SyntheticHFCSPopulation

# Initialize data readers and configuration
readers = DataReaders.from_raw_data(...)
exogenous_data = ExogenousCountryData(...)
industry_data = {...}

# Create synthetic population for France in 2023 Q1
france_population = SyntheticHFCSPopulation.from_readers(
    country_name="FRA",
    country_name_short="FR",
    scale=1000,
    year=2023,
    quarter=1,
    readers=readers,
    industry_data=industry_data,
    industries=["C10T12", "C13T15"],
    total_unemployment_benefits=1e9,
    exogenous_data=exogenous_data
)

# Compute household wealth and income
france_population.compute_household_wealth()
france_population.compute_household_income(total_social_transfers=5e8)

Module for managing synthetic population data in macroeconomic simulations.

This module provides the abstract base class for synthetic population generation and management. It defines the core data structures and interfaces for representing households and individuals in a macroeconomic simulation, with a focus on:

  1. Population Structure:
      • Household composition and relationships
      • Individual demographics and employment
      • Financial status and wealth distribution
      • Housing tenure and property ownership

  2. Economic Attributes:
      • Income sources (employment, transfers, assets)
      • Wealth composition (real assets, financial assets)
      • Debt obligations and credit relationships
      • Consumption and saving patterns

  3. Data Management:
      • Data validation and cleaning
      • Missing value imputation
      • Statistical modeling of relationships
      • Scale factor adjustments

  4. Market Relationships:
      • Employment connections with firms
      • Banking relationships
      • Housing market participation
      • Investment behavior

The module supports both EU and non-EU country data, with capabilities for:

  • Data harmonization across sources
  • Consistent initialization of economic relationships
  • Preservation of aggregate economic constraints
  • Environmental impact tracking

Note

This module focuses on preprocessing and organizing population data for initialization. The actual behavioral dynamics are implemented in the simulation package.

Example
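from typing import Optional

from macro_data.processing.synthetic_population import SyntheticPopulation


class CustomPopulation(SyntheticPopulation):
    def __init__(self, *args, **kwargs):
        # Forward all constructor arguments to the base class unchanged
        super().__init__(*args, **kwargs)

    def compute_household_income(
        self, total_social_transfers: float, independents: Optional[list[str]] = None
    ) -> None:
        # Custom income computation logic
        pass

    def compute_household_wealth(self, independents: Optional[list[str]] = None) -> None:
        # Custom wealth computation logic
        pass

    # The other abstract methods (set_debt_installments, set_household_saving_rates,
    # restrict, normalise_household_consumption) must be overridden as well before
    # the class can be instantiated.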

SyntheticPopulation

Represents a synthetic population for a specific country and year.

The household data is a pandas data frame with the following columns:

  • Type: The type of the household (1: single, 2: couple, 3: single parent, 4: couple with children).
  • Corresponding Individuals ID: The IDs of the individuals in the household.
  • Corresponding Bank ID: The ID of the bank the household is associated with.
  • Corresponding Inhabited House ID: The ID of the house the household inhabits.
  • Corresponding Renters: The IDs of the individuals in the household who rent.
  • Corresponding Property Owner: The IDs of the individuals in the household who own property.
  • Corresponding Additionally Owned Houses ID: The IDs of the houses the household owns.
  • Income: The total income of the household.
  • Employee Income: The income of the household from employment.
  • Regular Social Transfers: The income of the household from social transfers.
  • Rental Income from Real Estate: The income of the household from rental of real estate.
  • Income from Financial Assets: The income of the household from financial assets.
  • Saving Rate: The saving rate of the household.
  • Rent Paid: The rent paid by the household.
  • Rent Imputed: The imputed rent of the household.
  • Wealth: The total wealth of the household.
  • Net Wealth: The net wealth of the household.
  • Wealth in Real Assets: The wealth of the household in real assets.
  • Value of the Main Residence: The value of the main residence of the household.
  • Value of other Properties: The value of other properties of the household.
  • Wealth Other Real Assets: The wealth of the household in other real assets.
  • Wealth in Deposits: The wealth of the household in deposits.
  • Wealth in Other Financial Assets: The wealth of the household in other financial assets.
  • Wealth in Financial Assets: The wealth of the household in financial assets.
  • Outstanding Balance of HMR Mortgages: The outstanding balance of the household's HMR mortgages.
  • Outstanding Balance of Mortgages on other Properties: The outstanding balance of the household's mortgages on other properties.
  • Outstanding Balance of other Non-Mortgage Loans: The outstanding balance of the household's other non-mortgage loans.
  • Debt: The total debt of the household.
  • Debt Installments: The debt installments of the household (monthly payments of debt).
  • Tenure Status of the Main Residence: The tenure status of the main residence of the household.
  • Number of Properties other than Household Main Residence: The number of properties other than the household's main residence.

The individual data is a pandas data frame with the following columns:

  • Gender: The gender of the individual (1: male, 2: female)
  • Age: The age of the individual.
  • Education: The education level of the individual (ISCED classification).
  • Activity Status: The activity status of the individual (1: employed, 2: unemployed, 3: not economically active).
  • Employment Industry: The industry of the individual's employment.
  • Employee Income: The income of the individual from employment.
  • Income from Unemployment Benefits: The income of the individual from unemployment benefits.
  • Income: The total income of the individual.
  • Corresponding Household ID: The ID of the household the individual belongs to.
  • Corresponding Firm ID: The ID of the firm the individual works for.
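
The two frames reference each other through ID columns. As a minimal sketch, the snippet below builds toy frames with a few of the column names listed above and resolves each individual's household. The values, and the use of the frame index as the household ID, are assumptions made for illustration only.

```python
import pandas as pd

# Toy household frame; the index is assumed to act as the household ID.
household_data = pd.DataFrame(
    {
        "Type": [1, 4],  # 1: single, 4: couple with children
        "Corresponding Individuals ID": [[0], [1, 2]],
    }
)

# Toy individual frame using a subset of the documented columns.
individual_data = pd.DataFrame(
    {
        "Age": [34, 41, 39],
        "Activity Status": [1, 1, 3],
        "Employee Income": [2500.0, 3100.0, 0.0],
        "Corresponding Household ID": [0, 1, 1],
    }
)

# Look up the household type of every individual through the household ID.
individual_household_type = individual_data["Corresponding Household ID"].map(
    household_data["Type"]
)
print(individual_household_type.tolist())
```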

Attributes:

  • country_name (str): The name of the country.
  • country_name_short (str): The short name or code of the country.
  • scale (int): The scale of the synthetic population.
  • year (int): The year of the synthetic population.
  • industries (list[str]): The list of industries in the country.
  • individual_data (DataFrame): The data frame containing individual-level data.
  • household_data (DataFrame): The data frame containing household-level data.
  • social_housing_rent (float): The rent for social housing.
  • coefficient_fa_income (float): The coefficient for family allowance income.
  • consumption_weights (ndarray): The weights for household consumption.
  • consumption_weights_by_income (ndarray): The weights for household consumption based on income.
  • saving_rates_model (LinearRegression): The model for household saving rates.
  • social_transfers_model (LinearRegression): The model for social transfers.
  • wealth_distribution_model (LinearRegression): The model for wealth distribution.

Instance attributes: country_name, country_name_short, scale, year, industries, individual_data, household_data, social_housing_rent, coefficient_fa_income, consumption_weights, consumption_weights_by_income, investment, saving_rates_model, social_transfers_model, wealth_distribution_model, yearly_factor. Each is assigned from the constructor argument of the same name.

industry_consumption_before_vat property

Calculate household consumption by industry before VAT.

This property computes the pre-tax consumption allocation across industries based on household income and consumption weights.

Returns:

  • np.ndarray: Matrix of household consumption by industry before VAT

investment_weights: np.ndarray property

Calculate normalized investment weights by industry.

This property computes the share of investment allocated to each industry, ensuring the weights sum to 1.

Returns:

  • np.ndarray: Normalized investment weights by industry
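
A minimal sketch of the normalization described above, assuming the weights are obtained by dividing an investment vector by its total. The helper name and the uniform fallback for a zero total are illustrative assumptions, not the library's implementation.

```python
import numpy as np


def normalised_investment_weights(investment: np.ndarray) -> np.ndarray:
    """Return investment shares by industry that sum to 1 (illustrative helper)."""
    total = investment.sum()
    if total == 0:
        # Assumption: fall back to uniform weights when there is no investment.
        return np.full(investment.shape, 1.0 / investment.size)
    return investment / total


weights = normalised_investment_weights(np.array([20.0, 30.0, 50.0]))
print(weights)
```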

number_of_households: int property

Get the total number of households.

Returns:

  • int: Number of rows in household_data

number_employees_by_industry: np.ndarray property

Calculate the number of employed individuals by industry.

Returns:

  • np.ndarray: Array of employee counts for each industry
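
One way to picture this count, as a sketch only: select individuals whose Activity Status is 1 (employed) and tally them by Employment Industry. The snippet assumes industries are stored as integer indices and uses a hypothetical industry count; the actual property may be implemented differently.

```python
import numpy as np
import pandas as pd

individual_data = pd.DataFrame(
    {
        "Activity Status": [1, 1, 2, 1, 3],  # 1: employed, 2: unemployed, 3: inactive
        "Employment Industry": [0, 1, 1, 1, 0],  # assumed to be integer industry indices
    }
)
n_industries = 3  # hypothetical number of industries

# Keep employed individuals only, then count them per industry index.
employed = individual_data[individual_data["Activity Status"] == 1]
employees_by_industry = np.bincount(
    employed["Employment Industry"].to_numpy(), minlength=n_industries
)
print(employees_by_industry)
```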

total_emissions: float property

Calculate total household emissions.

Returns:

  • float: Sum of consumption and investment emissions

__init__(country_name: str, country_name_short: str, scale: int, year: int, industries: list[str], individual_data: pd.DataFrame, household_data: pd.DataFrame, social_housing_rent: float, coefficient_fa_income: float, consumption_weights: np.ndarray, consumption_weights_by_income: np.ndarray, investment: np.ndarray, saving_rates_model: LinearRegression, social_transfers_model: LinearRegression, wealth_distribution_model: LinearRegression, yearly_factor: float = 4.0) abstractmethod

set_individual_labour_inputs(firm_production: np.ndarray, firm_employees: pd.DataFrame, unemployment_labour_inputs_fraction: float = 0.3, override: bool = True) -> None

Set individual labor input values based on employment status and firm production.

This method assigns labor input values to individuals based on their employment status:

  1. Employed: Inputs proportional to income within their firm
  2. Unemployed: Fraction of industry mean inputs
  3. Inactive: Zero inputs

Parameters:

  • firm_production (ndarray): Production values for each firm. Required.
  • firm_employees (DataFrame): Mapping of employees to firms. Required.
  • unemployment_labour_inputs_fraction (float): Fraction of mean industry inputs for unemployed. Defaults to 0.3.
  • override (bool): Whether to override existing values with uniform inputs. Defaults to True.

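
The three rules for set_individual_labour_inputs can be sketched on a toy data set as follows. This is an illustration under stated assumptions, not the library's code: the firm ID of -1 for people without a firm, the reading of "proportional to income within their firm" as a split of firm production by income share, and all values are hypothetical.

```python
import numpy as np
import pandas as pd

individuals = pd.DataFrame(
    {
        "Activity Status": [1, 1, 1, 2, 3],  # 1 employed, 2 unemployed, 3 inactive
        "Employment Industry": [0, 0, 0, 0, 0],
        "Employee Income": [1000.0, 3000.0, 2000.0, 0.0, 0.0],
        "Corresponding Firm ID": [0, 0, 1, -1, -1],  # -1: no firm (assumed convention)
    }
)
firm_production = np.array([8.0, 4.0])  # production of firm 0 and firm 1
unemployment_fraction = 0.3

employed = individuals["Activity Status"] == 1
unemployed = individuals["Activity Status"] == 2

# Inactive individuals keep the initial value of zero.
labour_inputs = pd.Series(0.0, index=individuals.index)

# Employed: split each firm's production by income share within the firm.
firm_ids = individuals.loc[employed, "Corresponding Firm ID"]
income = individuals.loc[employed, "Employee Income"]
income_share = income / income.groupby(firm_ids).transform("sum")
labour_inputs[employed] = firm_production[firm_ids.to_numpy()] * income_share

# Unemployed: a fraction of the mean input of employed workers in the same industry.
industry_mean = labour_inputs[employed].groupby(
    individuals.loc[employed, "Employment Industry"]
).mean()
labour_inputs[unemployed] = (
    unemployment_fraction
    * individuals.loc[unemployed, "Employment Industry"].map(industry_mean)
)
print(labour_inputs.tolist())
```
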
compute_household_income(total_social_transfers: float, independents: Optional[list[str]] = None) -> None abstractmethod

Compute and update household income from all sources.

This method should:

  1. Calculate employee income from individual data
  2. Process social transfers based on household characteristics
  3. Include rental income from real estate
  4. Add income from financial assets
  5. Update the household_data DataFrame with results

Parameters:

  • total_social_transfers (float): Total social transfers to be distributed across households based on their characteristics. Required.
  • independents (Optional[list[str]]): List of independent variables to use in social transfer allocation models. Defaults to None.

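
A concrete implementation of compute_household_income might follow the listed steps roughly as in this sketch. It uses column names from the household and individual frames documented above, assumes the household index is the household ID, and takes the social transfers as already allocated per household rather than modelling their allocation.

```python
import pandas as pd

individual_data = pd.DataFrame(
    {
        "Employee Income": [2000.0, 1500.0, 3000.0],
        "Corresponding Household ID": [0, 0, 1],
    }
)
household_data = pd.DataFrame(
    {
        "Regular Social Transfers": [100.0, 0.0],  # assumed already allocated
        "Rental Income from Real Estate": [0.0, 400.0],
        "Income from Financial Assets": [50.0, 25.0],
    }
)

# Step 1: aggregate the employee income of individuals to their household.
household_data["Employee Income"] = (
    individual_data.groupby("Corresponding Household ID")["Employee Income"]
    .sum()
    .reindex(household_data.index, fill_value=0.0)
)

# Steps 2-5: add the remaining components and store the total.
income_columns = [
    "Employee Income",
    "Regular Social Transfers",
    "Rental Income from Real Estate",
    "Income from Financial Assets",
]
household_data["Income"] = household_data[income_columns].sum(axis=1)
print(household_data["Income"].tolist())
```
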
set_consumption_weights(consumption_weights: np.ndarray) -> None

Set the consumption weights for household expenditure allocation.

Parameters:

  • consumption_weights (ndarray): New consumption weights by industry. Required.

set_debt_installments(consumption_installments: np.ndarray, ce_installments: np.ndarray, mortgage_installments: np.ndarray) -> None abstractmethod

Set household debt installment payments.

This method should:

  1. Process consumption loan payments
  2. Handle consumer electronics installments
  3. Account for mortgage payments
  4. Update total debt service in household_data
  5. Ensure payment consistency with loan balances

Parameters:

  • consumption_installments (ndarray): Monthly payments for consumption loans. Required.
  • ce_installments (ndarray): Monthly payments for consumer electronics. Required.
  • mortgage_installments (ndarray): Monthly payments for mortgages. Required.

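
As a rough illustration of step 4, total debt service per household could be formed by adding the three installment arrays element-wise and storing the result in the Debt Installments column. That the total is a plain sum is an assumption; the values are made up.

```python
import numpy as np
import pandas as pd

household_data = pd.DataFrame(index=range(3))

# One entry per household, in the same order as household_data.
consumption_installments = np.array([50.0, 0.0, 20.0])
ce_installments = np.array([0.0, 10.0, 0.0])
mortgage_installments = np.array([400.0, 0.0, 650.0])

# Assumption: total debt service is the element-wise sum of the three parts.
household_data["Debt Installments"] = (
    consumption_installments + ce_installments + mortgage_installments
)
print(household_data["Debt Installments"].tolist())
```
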
set_household_saving_rates(independents: Optional[list[str]] = None) -> None abstractmethod

Compute and set household saving rates.

This method should:

  1. Process consumption share data
  2. Handle missing values through imputation
  3. Fit saving rate models using household characteristics
  4. Ensure rates are economically reasonable
  5. Update the household_data DataFrame

Parameters:

  • independents (Optional[list[str]]): List of independent variables to use in saving rate models. Defaults to None.

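
The imputation and fitting steps can be sketched with an ordinary least-squares fit. The snippet uses NumPy's lstsq as a stand-in for the LinearRegression model named in the attributes, takes income as the only independent variable, and clips to [0, 1] as one possible reading of "economically reasonable"; all three choices are assumptions.

```python
import numpy as np

income = np.array([1000.0, 2000.0, 3000.0, 4000.0])
observed_rates = np.array([0.05, 0.10, np.nan, 0.20])  # one household lacks data

# Fit rate = a + b * income on households with an observed saving rate.
known = ~np.isnan(observed_rates)
design = np.column_stack([np.ones_like(income), income])
coefficients, *_ = np.linalg.lstsq(design[known], observed_rates[known], rcond=None)

# Impute missing rates from the fitted line, then keep rates inside [0, 1].
predicted = design @ coefficients
saving_rates = np.clip(np.where(known, observed_rates, predicted), 0.0, 1.0)
print(saving_rates)
```
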
compute_household_wealth(independents: Optional[list[str]] = None) -> None abstractmethod

Compute and update household wealth components.

This method should:

  1. Calculate real asset wealth (property, vehicles, businesses)
  2. Process financial asset wealth (deposits, investments)
  3. Account for all debt types (mortgages, loans)
  4. Compute net wealth positions
  5. Update wealth distribution models

Parameters:

  • independents (Optional[list[str]]): List of independent variables to use in wealth distribution models. Defaults to None.

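
Steps 1 to 4 amount to simple accounting over the household columns documented earlier. The sketch below assumes the aggregates are plain sums of their components and that net wealth is wealth minus debt, which the column names suggest but the source does not state explicitly.

```python
import pandas as pd

hh = pd.DataFrame(
    {
        "Value of the Main Residence": [200_000.0, 0.0],
        "Value of other Properties": [0.0, 50_000.0],
        "Wealth Other Real Assets": [10_000.0, 5_000.0],
        "Wealth in Deposits": [8_000.0, 2_000.0],
        "Wealth in Other Financial Assets": [12_000.0, 0.0],
        "Outstanding Balance of HMR Mortgages": [120_000.0, 0.0],
        "Outstanding Balance of Mortgages on other Properties": [0.0, 30_000.0],
        "Outstanding Balance of other Non-Mortgage Loans": [5_000.0, 1_000.0],
    }
)

# Real and financial assets (assumed to be sums of their components).
hh["Wealth in Real Assets"] = (
    hh["Value of the Main Residence"]
    + hh["Value of other Properties"]
    + hh["Wealth Other Real Assets"]
)
hh["Wealth in Financial Assets"] = (
    hh["Wealth in Deposits"] + hh["Wealth in Other Financial Assets"]
)
hh["Wealth"] = hh["Wealth in Real Assets"] + hh["Wealth in Financial Assets"]

# Debt and the resulting net position.
hh["Debt"] = (
    hh["Outstanding Balance of HMR Mortgages"]
    + hh["Outstanding Balance of Mortgages on other Properties"]
    + hh["Outstanding Balance of other Non-Mortgage Loans"]
)
hh["Net Wealth"] = hh["Wealth"] - hh["Debt"]
print(hh["Net Wealth"].tolist())
```
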
set_income() -> None

Set total individual income by combining employment and unemployment benefits.

This method:

  1. Fills missing values with zeros
  2. Combines employee income and unemployment benefits
  3. Updates the Income column in individual_data
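
The three steps of set_income translate almost directly into pandas. The following is a sketch on toy data using the individual-level column names documented earlier, not the method's actual source.

```python
import numpy as np
import pandas as pd

individual_data = pd.DataFrame(
    {
        "Employee Income": [2500.0, np.nan, 0.0],
        "Income from Unemployment Benefits": [np.nan, 800.0, np.nan],
    }
)
components = ["Employee Income", "Income from Unemployment Benefits"]

# Step 1: treat missing income components as zero.
individual_data[components] = individual_data[components].fillna(0.0)

# Steps 2-3: total income is the sum of both components.
individual_data["Income"] = individual_data[components].sum(axis=1)
print(individual_data["Income"].tolist())
```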

restrict() -> None abstractmethod

Restrict household data to essential columns.

This method should:

  1. Filter household_data to RESTRICT_COLS
  2. Ensure data consistency after restriction
  3. Preserve key relationships
  4. Handle any missing required columns
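
A sketch of what a restrict implementation could look like. The contents of RESTRICT_COLS shown here are hypothetical, as are the helper name and the choice to raise an error for missing columns; the source only names the constant.

```python
import pandas as pd

# Hypothetical column selection; the real RESTRICT_COLS is defined by the package.
RESTRICT_COLS = ["Type", "Income", "Wealth", "Debt"]


def restrict_columns(household_data: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return a copy of household_data limited to the given columns."""
    missing = [column for column in columns if column not in household_data.columns]
    if missing:
        # One possible way to handle missing required columns: fail loudly.
        raise KeyError(f"Missing required columns: {missing}")
    return household_data.loc[:, columns].copy()


household_data = pd.DataFrame(
    {"Type": [1], "Income": [3000.0], "Wealth": [50_000.0], "Debt": [10_000.0], "Extra": [9]}
)
restricted = restrict_columns(household_data, RESTRICT_COLS)
print(list(restricted.columns))
```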

normalise_household_consumption(iot_hh_consumption: np.ndarray | pd.Series, vat: float, positive_saving_rates_only: bool = True, independents: Optional[list[str]] = None) -> None abstractmethod

Normalize household consumption to match aggregate targets.

This method should:

  1. Scale consumption to match IOT totals
  2. Account for VAT in consumption values
  3. Maintain reasonable saving rates
  4. Preserve consumption patterns by income group
  5. Update household consumption shares

Parameters:

  • iot_hh_consumption (ndarray | Series): Target household consumption from IOT. Required.
  • vat (float): Value-added tax rate. Required.
  • positive_saving_rates_only (bool): Whether to enforce positive saving rates. Defaults to True.
  • independents (Optional[list[str]]): Independent variables for consumption models. Defaults to None.

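
The scaling idea in steps 1 to 3 can be illustrated with a uniform rescaling of household spending. This sketch assumes the IOT target is expressed before VAT, that household spending equals income times one minus the saving rate, and that a single common scale factor is applied; the actual method may use a different scheme, and enforcing positive saving rates would need extra handling.

```python
import numpy as np

income = np.array([1000.0, 2000.0, 4000.0])
saving_rates = np.array([0.05, 0.10, 0.20])
iot_hh_consumption = np.array([2400.0, 1600.0])  # target by industry, assumed pre-VAT
vat = 0.2

# Current household spending, VAT included.
consumption = income * (1.0 - saving_rates)

# Aggregate spending implied by the IOT target once VAT is added.
target_total = (1.0 + vat) * iot_hh_consumption.sum()

# Rescale every household by the same factor and back out the saving rates.
scale = target_total / consumption.sum()
new_saving_rates = 1.0 - scale * consumption / income
print(new_saving_rates)
```
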
set_household_investment_rates(capital_formation_taxrate: float, default_investment_rates: np.ndarray | float = 0.2) -> None

Initialize household investment rates.

This method sets initial investment rates for households, which can be later adjusted through normalization to match aggregate targets.

Parameters:

  • capital_formation_taxrate (float): Tax rate on capital formation. Required.
  • default_investment_rates (ndarray | float): Initial investment rates. Defaults to 0.2.

normalise_household_investment(tau_cf: float, iot_hh_investment: np.ndarray | pd.Series, positive_investment_rates: bool = True) -> None

get_current_hh_investment_by_industry(tau_cf: float) -> np.ndarray

Calculate current household investment by industry.

This method computes the current investment allocation across industries based on household income, investment rates, and industry weights.

Parameters:

  • tau_cf (float): Capital formation tax rate. Required.

Returns:

  • np.ndarray: Current investment values by industry
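
As a hedged sketch of how income, investment rates, and industry weights might combine: total household investment spending is distributed across industries by weight. Dividing by (1 + tau_cf) to remove the capital formation tax is an assumption, as are all values.

```python
import numpy as np

income = np.array([1000.0, 3000.0])
investment_rates = np.array([0.2, 0.1])
investment_weights = np.array([0.5, 0.3, 0.2])  # shares by industry, sum to 1
tau_cf = 0.25

# Total investment spending across households, tax included.
total_investment_spending = (income * investment_rates).sum()

# Assumption: remove the tax, then split the remainder by industry weight.
investment_by_industry = investment_weights * total_investment_spending / (1.0 + tau_cf)
print(investment_by_industry)
```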

match_consumption_weights_by_income(weights_by_income: np.ndarray | pd.DataFrame, iot_hh_consumption: pd.Series, vat: float, consumption_variance: float = 0.1) -> None

set_wealth_distribution_function(independents: Optional[list[str]] = None) -> None

add_emissions(emission_factors_array: np.ndarray, emitting_indices: list[int] | np.ndarray, tau_cf: float) -> None

Calculate and add emissions data to household records.

This method computes emissions from:

  1. Household consumption
  2. Investment activities
  3. Specific fuel types (coal, gas, oil, refined products)

Parameters:

  • emission_factors_array (ndarray): Emission factors by source. Required.
  • emitting_indices (list[int] | ndarray): Indices of emitting sectors. Required.
  • tau_cf (float): Capital formation tax rate. Required.
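
The consumption part of this calculation can be sketched as a product of emission factors and spending in the emitting sectors. The array shapes and values below are assumptions; investment emissions would follow the same pattern with investment by industry in place of consumption.

```python
import numpy as np

# Consumption by household (rows) and industry (columns); illustrative values.
consumption_by_industry = np.array(
    [
        [10.0, 5.0, 0.0, 2.0],
        [4.0, 1.0, 3.0, 0.0],
    ]
)
emitting_indices = [1, 3]  # columns of the emitting sectors
emission_factors = np.array([0.5, 2.0])  # emissions per unit spent, one per emitting sector

# Emissions per household: spending in emitting sectors times their factors.
household_emissions = consumption_by_industry[:, emitting_indices] @ emission_factors
total_consumption_emissions = household_emissions.sum()
print(household_emissions, total_consumption_emissions)
```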