SyntheticPopulation¶

The SyntheticPopulation module provides data structures and utilities for preprocessing and organizing population data that will be used to initialize behavioral models in the simulation package.

SyntheticPopulation¶

The SyntheticPopulation class is an abstract base class that provides a framework for collecting and organizing population data. It does not simulate population behavior; it only handles data preprocessing.
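To illustrate what "abstract base class" implies in practice, the following minimal sketch uses Python's standard `abc` module. `PopulationBase` and `ToyPopulation` are hypothetical stand-ins rather than the package's own classes, and only two of the abstract methods are mirrored.

```python
from abc import ABC, abstractmethod


class PopulationBase(ABC):
    """Hypothetical stand-in for an abstract population class."""

    def __init__(self, country_name: str, year: int) -> None:
        self.country_name = country_name
        self.year = year

    @abstractmethod
    def compute_household_income(self, total_social_transfers: float) -> None:
        ...

    @abstractmethod
    def compute_household_wealth(self) -> None:
        ...


class ToyPopulation(PopulationBase):
    """Concrete subclass: every abstract method is overridden."""

    def compute_household_income(self, total_social_transfers: float) -> None:
        self.income_computed = True

    def compute_household_wealth(self) -> None:
        self.wealth_computed = True


# The abstract base cannot be instantiated directly.
try:
    PopulationBase("FRA", 2023)
    base_instantiable = True
except TypeError:
    base_instantiable = False

# A subclass that overrides all abstract methods can be.
population = ToyPopulation("FRA", 2023)
population.compute_household_income(total_social_transfers=5e8)
```

A subclass that leaves any abstract method unimplemented raises the same `TypeError` on instantiation.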

Key Features¶

Household and individual data management
Income and wealth computation
Consumption and investment patterns
Labor market integration
Social transfer processing
Data validation and cleaning

Attributes¶

country_name (str): Country identifier
country_name_short (str): Short country code
scale (int): Population scaling factor
year (int): Reference year
industries (list[str]): List of industries
individual_data (pd.DataFrame): Individual-level data containing:
Demographics (age, gender, education)
Employment status and industry
Income sources
Household and firm associations
household_data (pd.DataFrame): Household-level data containing:
Household composition
Income and wealth components
Housing tenure and property
Financial assets and debt
Consumption patterns

Abstract Methods¶

compute_household_income¶

@abstractmethod
def compute_household_income(
    self,
    total_social_transfers: float,
    independents: Optional[list[str]] = None
) -> None

Computes household income from all sources:

Employee income
Social transfers
Rental income
Financial asset returns

compute_household_wealth¶

@abstractmethod
def compute_household_wealth(
    self,
    independents: Optional[list[str]] = None
) -> None

Computes household wealth components:

Real assets (property, vehicles, businesses)
Financial assets (deposits, investments)
Debt obligations
Net wealth position

set_debt_installments¶

@abstractmethod
def set_debt_installments(
    self,
    consumption_installments: np.ndarray,
    ce_installments: np.ndarray,
    mortgage_installments: np.ndarray
) -> None

Sets household debt payment schedules:

Consumption loan payments
Consumer electronics installments
Mortgage payments

set_household_saving_rates¶

@abstractmethod
def set_household_saving_rates(
    self,
    independents: Optional[list[str]] = None
) -> None

Computes household saving rates based on:

Income levels
Wealth position
Household characteristics

SyntheticHFCSPopulation¶

The SyntheticHFCSPopulation class is a concrete implementation that preprocesses population data using the Household Finance and Consumption Survey (HFCS) as its primary data source.

Key Features¶

HFCS data integration
Household sampling and scaling
Industry employment allocation
Wealth and income modeling
Consumption pattern estimation

Factory Methods¶

from_readers¶

@classmethod
def from_readers(
    cls,
    readers: DataReaders,
    country_name: Country,
    country_name_short: str,
    scale: int,
    year: int,
    quarter: int,
    industry_data: dict[str, pd.DataFrame],
    industries: list[str],
    total_unemployment_benefits: float,
    exogenous_data: ExogenousCountryData,
    rent_as_fraction_of_unemployment_rate: float = 0.25,
    n_quantiles: int = 5,
    population_ratio: float = 1.0,
    exch_rate: float = 1.0,
    proxied_country: str | Country = None,
    yearly_factor: float = 4.0
) -> "SyntheticHFCSPopulation"

Creates a synthetic population using HFCS data and additional sources.

Parameters:

readers (DataReaders): Data source readers
country_name (Country): Target country
country_name_short (str): Country code
scale (int): Population scaling factor
year (int): Reference year
quarter (int): Reference quarter
industry_data (dict): Industry-level data
industries (list[str]): Target industries
total_unemployment_benefits (float): Total benefits to distribute
exogenous_data (ExogenousCountryData): External economic data
rent_as_fraction_of_unemployment_rate (float): Rent parameter
n_quantiles (int): Income quantiles for analysis
population_ratio (float): Population scaling ratio
exch_rate (float): Exchange rate for currency conversion
proxied_country (str|Country): Proxy country for missing data
yearly_factor (float): Annual to sub-annual conversion factor

Returns:

SyntheticHFCSPopulation: Configured population instance

Usage Example¶

from macro_data import DataReaders, ExogenousCountryData
from macro_data.processing.synthetic_population import SyntheticHFCSPopulation

# Initialize data readers and configuration
readers = DataReaders.from_raw_data(...)
exogenous_data = ExogenousCountryData(...)
industry_data = {...}

# Create synthetic population for France in 2023 Q1
france_population = SyntheticHFCSPopulation.from_readers(
    country_name="FRA",
    country_name_short="FR",
    scale=1000,
    year=2023,
    quarter=1,
    readers=readers,
    industry_data=industry_data,
    industries=["C10T12", "C13T15"],
    total_unemployment_benefits=1e9,
    exogenous_data=exogenous_data
)

# Compute household wealth and income
france_population.compute_household_wealth()
france_population.compute_household_income(total_social_transfers=5e8)

Module for managing synthetic population data in macroeconomic simulations.

This module provides the abstract base class for synthetic population generation and management. It defines the core data structures and interfaces for representing households and individuals in a macroeconomic simulation, with a focus on:

Population Structure:
Household composition and relationships
Individual demographics and employment
Financial status and wealth distribution
Housing tenure and property ownership
Economic Attributes:
Income sources (employment, transfers, assets)
Wealth composition (real assets, financial assets)
Debt obligations and credit relationships
Consumption and saving patterns
Data Management:
Data validation and cleaning
Missing value imputation
Statistical modeling of relationships
Scale factor adjustments
Market Relationships:
Employment connections with firms
Banking relationships
Housing market participation
Investment behavior

The module supports both EU and non-EU country data, with capabilities for:

Data harmonization across sources
Consistent initialization of economic relationships
Preservation of aggregate economic constraints
Environmental impact tracking

Note

This module focuses on preprocessing and organizing population data for initialization. The actual behavioral dynamics are implemented in the simulation package.

Example

from macro_data.processing.synthetic_population import SyntheticPopulation

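class CustomPopulation(SyntheticPopulation):
    def __init__(self, *args, **kwargs):
        # Forward all constructor arguments to the base class unchanged
        super().__init__(*args, **kwargs)

    def compute_household_income(self, total_social_transfers, independents=None):
        # Custom income computation logic
        pass

    def compute_household_wealth(self, independents=None):
        # Custom wealth computation logic
        pass

    # The other abstract methods (set_debt_installments, set_household_saving_rates,
    # restrict, normalise_household_consumption) must be overridden as well
    # before CustomPopulation can be instantiated.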

`SyntheticPopulation` ¶

Represents a synthetic population for a specific country and year.

The household data is a pandas data frame with the following columns:

Type: The type of the household (1: single, 2: couple, 3: single parent, 4: couple with children).
Corresponding Individuals ID: The IDs of the individuals in the household.
Corresponding Bank ID: The ID of the bank the household is associated with.
Corresponding Inhabited House ID: The ID of the house the household inhabits.
Corresponding Renters: The IDs of the individuals in the household who rent.
Corresponding Property Owner: The IDs of the individuals in the household who own property.
Corresponding Additionally Owned Houses ID: The IDs of the houses the household owns.
Income: The total income of the household.
Employee Income: The income of the household from employment.
Regular Social Transfers: The income of the household from social transfers.
Rental Income from Real Estate: The income of the household from rental of real estate.
Income from Financial Assets: The income of the household from financial assets.
Saving Rate: The saving rate of the household.
Rent Paid: The rent paid by the household.
Rent Imputed: The imputed rent of the household.
Wealth: The total wealth of the household.
Net Wealth: The net wealth of the household.
Wealth in Real Assets: The wealth of the household in real assets.
Value of the Main Residence: The value of the main residence of the household.
Value of other Properties: The value of other properties of the household.
Wealth Other Real Assets: The wealth of the household in other real assets.
Wealth in Deposits: The wealth of the household in deposits.
Wealth in Other Financial Assets: The wealth of the household in other financial assets.
Wealth in Financial Assets: The wealth of the household in financial assets.
Outstanding Balance of HMR Mortgages: The outstanding balance of the household's HMR mortgages.
Outstanding Balance of Mortgages on other Properties: The outstanding balance of the household's mortgages on other properties.
Outstanding Balance of other Non-Mortgage Loans: The outstanding balance of the household's other non-mortgage loans.
Debt: The total debt of the household.
Debt Installments: The debt installments of the household (monthly payments of debt).
Tenure Status of the Main Residence: The tenure status of the main residence of the household.
Number of Properties other than Household Main Residence: The number of properties other than the household's main residence.

The individual data is a pandas data frame with the following columns:

Gender: The gender of the individual (1: male, 2: female).
Age: The age of the individual.
Education: The education level of the individual (ISCED classification).
Activity Status: The activity status of the individual (1: employed, 2: unemployed, 3: not economically active).
Employment Industry: The industry of the individual's employment.
Employee Income: The income of the individual from employment.
Income from Unemployment Benefits: The income of the individual from unemployment benefits.
Income: The total income of the individual.
Corresponding Household ID: The ID of the household the individual belongs to.
Corresponding Firm ID: The ID of the firm the individual works for.

Attributes:

Name	Type	Description
`country_name`	`str`	The name of the country.
`country_name_short`	`str`	The short name or code of the country.
`scale`	`int`	The scale of the synthetic population.
`year`	`int`	The year of the synthetic population.
`industries`	`list[str]`	The list of industries in the country.
`individual_data`	`DataFrame`	The data frame containing individual-level data.
`household_data`	`DataFrame`	The data frame containing household-level data.
`social_housing_rent`	`float`	The rent for social housing.
`coefficient_fa_income`	`float`	The coefficient for family allowance income.
`consumption_weights`	`ndarray`	The weights for household consumption.
`consumption_weights_by_income`	`ndarray`	The weights for household consumption based on income.
`saving_rates_model`	`LinearRegression`	The model for household saving rates.
`social_transfers_model`	`LinearRegression`	The model for social transfers.
`wealth_distribution_model`	`LinearRegression`	The model for wealth distribution.

`country_name = country_name` `instance-attribute` ¶

`country_name_short = country_name_short` `instance-attribute` ¶

`scale = scale` `instance-attribute` ¶

`year = year` `instance-attribute` ¶

`industries = industries` `instance-attribute` ¶

`individual_data = individual_data` `instance-attribute` ¶

`household_data = household_data` `instance-attribute` ¶

`social_housing_rent = social_housing_rent` `instance-attribute` ¶

`coefficient_fa_income = coefficient_fa_income` `instance-attribute` ¶

`consumption_weights = consumption_weights` `instance-attribute` ¶

`consumption_weights_by_income = consumption_weights_by_income` `instance-attribute` ¶

`investment = investment` `instance-attribute` ¶

`saving_rates_model = saving_rates_model` `instance-attribute` ¶

`social_transfers_model = social_transfers_model` `instance-attribute` ¶

`wealth_distribution_model = wealth_distribution_model` `instance-attribute` ¶

`yearly_factor = yearly_factor` `instance-attribute` ¶

`industry_consumption_before_vat` `property` ¶

Calculate household consumption by industry before VAT.

This property computes the pre-tax consumption allocation across industries based on household income and consumption weights.

Returns:

Type	Description
`ndarray`	Matrix of household consumption by industry before VAT

`investment_weights: np.ndarray` `property` ¶

Calculate normalized investment weights by industry.

This property computes the share of investment allocated to each industry, ensuring the weights sum to 1.

Returns:

Type	Description
`ndarray`	Normalized investment weights by industry
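A minimal sketch of the normalization, assuming the weights are simply each industry's investment divided by the total. The helper `normalise_weights` is hypothetical, and plain lists stand in for NumPy arrays.

```python
def normalise_weights(investment):
    """Return each industry's share of total investment (shares sum to 1)."""
    total = sum(investment)
    if total == 0:
        raise ValueError("Total investment is zero; weights are undefined.")
    return [value / total for value in investment]


investment_by_industry = [30.0, 50.0, 20.0]
weights = normalise_weights(investment_by_industry)
```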

`number_of_households: int` `property` ¶

Get the total number of households.

Returns:

Type	Description
`int`	Number of rows in household_data

`number_employees_by_industry: np.ndarray` `property` ¶

Calculate the number of employed individuals by industry.

Returns:

Type	Description
`ndarray`	Array of employee counts for each industry

`total_emissions: float` `property` ¶

Calculate total household emissions.

Returns:

Type	Description
`float`	Sum of consumption and investment emissions

`__init__(country_name: str, country_name_short: str, scale: int, year: int, industries: list[str], individual_data: pd.DataFrame, household_data: pd.DataFrame, social_housing_rent: float, coefficient_fa_income: float, consumption_weights: np.ndarray, consumption_weights_by_income: np.ndarray, investment: np.ndarray, saving_rates_model: LinearRegression, social_transfers_model: LinearRegression, wealth_distribution_model: LinearRegression, yearly_factor: float = 4.0)` `abstractmethod` ¶

`set_individual_labour_inputs(firm_production: np.ndarray, firm_employees: pd.DataFrame, unemployment_labour_inputs_fraction: float = 0.3, override: bool = True) -> None` ¶

Set individual labor input values based on employment status and firm production.

This method assigns labor input values to individuals based on their employment status:

1. Employed: Inputs proportional to income within their firm
2. Unemployed: Fraction of industry mean inputs
3. Inactive: Zero inputs

Parameters:

Name	Type	Description	Default
`firm_production`	`ndarray`	Production values for each firm	required
`firm_employees`	`DataFrame`	Mapping of employees to firms	required
`unemployment_labour_inputs_fraction`	`float`	Fraction of mean industry inputs for unemployed. Defaults to 0.3.	`0.3`
`override`	`bool`	Whether to override existing values with uniform inputs. Defaults to True.	`True`
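The three rules can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: individuals are plain dictionaries, `firm_production` is a dictionary keyed by firm ID, unemployed individuals carry an industry label, and the `override` flag is ignored. The function name `labour_inputs` is hypothetical.

```python
from collections import defaultdict

# Activity Status codes as listed for the individual data
EMPLOYED, UNEMPLOYED, INACTIVE = 1, 2, 3


def labour_inputs(individuals, firm_production, unemployed_fraction=0.3):
    # Total employee income per firm, used to split each firm's production.
    firm_income = defaultdict(float)
    for person in individuals:
        if person["status"] == EMPLOYED:
            firm_income[person["firm"]] += person["income"]

    inputs = [0.0] * len(individuals)  # inactive individuals keep zero
    by_industry = defaultdict(list)
    for i, person in enumerate(individuals):
        if person["status"] == EMPLOYED:
            # Rule 1: share of firm production proportional to income.
            share = person["income"] / firm_income[person["firm"]]
            inputs[i] = share * firm_production[person["firm"]]
            by_industry[person["industry"]].append(inputs[i])

    for i, person in enumerate(individuals):
        if person["status"] == UNEMPLOYED:
            # Rule 2: a fraction of the mean input of employed peers.
            employed = by_industry[person["industry"]]
            mean = sum(employed) / len(employed) if employed else 0.0
            inputs[i] = unemployed_fraction * mean
    return inputs


individuals = [
    {"status": EMPLOYED, "firm": 0, "industry": "C10T12", "income": 3000.0},
    {"status": EMPLOYED, "firm": 0, "industry": "C10T12", "income": 1000.0},
    {"status": UNEMPLOYED, "firm": None, "industry": "C10T12", "income": 0.0},
    {"status": INACTIVE, "firm": None, "industry": None, "income": 0.0},
]
inputs = labour_inputs(individuals, firm_production={0: 100.0})
```

The sketch assumes every firm has positive total employee income; a real implementation would need to guard against division by zero.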

`compute_household_income(total_social_transfers: float, independents: Optional[list[str]] = None) -> None` `abstractmethod` ¶

Compute and update household income from all sources.

This method should:

1. Calculate employee income from individual data
2. Process social transfers based on household characteristics
3. Include rental income from real estate
4. Add income from financial assets
5. Update the household_data DataFrame with results

Parameters:

Name	Type	Description	Default
`total_social_transfers`	`float`	Total social transfers to be distributed across households based on their characteristics.	required
`independents`	`Optional[list[str]]`	List of independent variables to use in social transfer allocation models. Defaults to None.	`None`
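A minimal sketch of the income aggregation, assuming total income is the sum of the four source columns and that social transfers are split in proportion to a per-household weight. The weights here are hypothetical placeholders for whatever the social transfer model predicts, and dictionaries stand in for DataFrame rows.

```python
INCOME_SOURCES = (
    "Employee Income",
    "Regular Social Transfers",
    "Rental Income from Real Estate",
    "Income from Financial Assets",
)


def allocate_transfers(total_social_transfers, weights):
    """Split a fixed total across households in proportion to their weights."""
    weight_sum = sum(weights)
    return [total_social_transfers * w / weight_sum for w in weights]


def total_income(household):
    """Sum the four income sources of one household."""
    return sum(household[source] for source in INCOME_SOURCES)


households = [
    {"Employee Income": 2000.0, "Rental Income from Real Estate": 0.0,
     "Income from Financial Assets": 50.0},
    {"Employee Income": 0.0, "Rental Income from Real Estate": 300.0,
     "Income from Financial Assets": 0.0},
]
transfer_weights = [1.0, 3.0]  # hypothetical model output
transfers = allocate_transfers(400.0, transfer_weights)

for household, transfer in zip(households, transfers):
    household["Regular Social Transfers"] = transfer
    household["Income"] = total_income(household)
```

Proportional allocation keeps the distributed amount equal to `total_social_transfers` regardless of how the weights are scaled.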

`set_consumption_weights(consumption_weights: np.ndarray) -> None` ¶

Set the consumption weights for household expenditure allocation.

Parameters:

Name	Type	Description	Default
`consumption_weights`	`ndarray`	New consumption weights by industry	required

`set_debt_installments(consumption_installments: np.ndarray, ce_installments: np.ndarray, mortgage_installments: np.ndarray) -> None` `abstractmethod` ¶

Set household debt installment payments.

This method should:

1. Process consumption loan payments
2. Handle consumer electronics installments
3. Account for mortgage payments
4. Update total debt service in household_data
5. Ensure payment consistency with loan balances

Parameters:

Name	Type	Description	Default
`consumption_installments`	`ndarray`	Monthly payments for consumption loans	required
`ce_installments`	`ndarray`	Monthly payments for consumer electronics	required
`mortgage_installments`	`ndarray`	Monthly payments for mortgages	required
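As a rough sketch, and assuming total debt service is the element-wise sum of the three installment arrays (one entry per household), the aggregation could look like this. The helper name is hypothetical and lists stand in for NumPy arrays.

```python
def total_debt_installments(consumption, ce, mortgage):
    """Add the three installment series household by household."""
    if not (len(consumption) == len(ce) == len(mortgage)):
        raise ValueError("All installment arrays must have one entry per household.")
    return [c + e + m for c, e, m in zip(consumption, ce, mortgage)]


consumption_installments = [50.0, 0.0, 20.0]
ce_installments = [10.0, 5.0, 0.0]
mortgage_installments = [400.0, 0.0, 650.0]

debt_installments = total_debt_installments(
    consumption_installments, ce_installments, mortgage_installments
)
```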

`set_household_saving_rates(independents: Optional[list[str]] = None) -> None` `abstractmethod` ¶

Compute and set household saving rates.

This method should:

1. Process consumption share data
2. Handle missing values through imputation
3. Fit saving rate models using household characteristics
4. Ensure rates are economically reasonable
5. Update the household_data DataFrame

Parameters:

Name	Type	Description	Default
`independents`	`Optional[list[str]]`	List of independent variables to use in saving rate models. Defaults to None.	`None`
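The imputation and model-fitting steps can be illustrated with a deliberately simplified sketch. It assumes a single explanatory variable (household income), fits an ordinary least-squares line by hand, fills missing rates with the prediction, and clips results to a hypothetical admissible range of [0, 1]. The actual model, regressors, and bounds used by the package may differ.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def impute_saving_rates(incomes, rates, lower=0.0, upper=1.0):
    # Fit only on households with an observed saving rate.
    observed = [(x, r) for x, r in zip(incomes, rates) if r is not None]
    slope, intercept = fit_line([x for x, _ in observed], [r for _, r in observed])
    filled = []
    for x, r in zip(incomes, rates):
        value = r if r is not None else intercept + slope * x
        filled.append(min(max(value, lower), upper))  # keep rates in range
    return filled


incomes = [1000.0, 2000.0, 3000.0, 4000.0]
rates = [0.05, 0.10, None, 0.20]  # third household has no observed rate
saving_rates = impute_saving_rates(incomes, rates)
```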

`compute_household_wealth(independents: Optional[list[str]] = None) -> None` `abstractmethod` ¶

Compute and update household wealth components.

This method should:

1. Calculate real asset wealth (property, vehicles, businesses)
2. Process financial asset wealth (deposits, investments)
3. Account for all debt types (mortgages, loans)
4. Compute net wealth positions
5. Update wealth distribution models

Parameters:

Name	Type	Description	Default
`independents`	`Optional[list[str]]`	List of independent variables to use in wealth distribution models. Defaults to None.	`None`
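A minimal sketch of the wealth accounting for one household, using the column names listed for the household data. It assumes that total wealth is real plus financial assets and that net wealth subtracts total debt; the exact definitions in the package are not confirmed by this page.

```python
def wealth_summary(household):
    """Aggregate asset and debt components into wealth totals."""
    real_assets = (
        household["Value of the Main Residence"]
        + household["Value of other Properties"]
        + household["Wealth Other Real Assets"]
    )
    financial_assets = (
        household["Wealth in Deposits"]
        + household["Wealth in Other Financial Assets"]
    )
    debt = (
        household["Outstanding Balance of HMR Mortgages"]
        + household["Outstanding Balance of Mortgages on other Properties"]
        + household["Outstanding Balance of other Non-Mortgage Loans"]
    )
    wealth = real_assets + financial_assets
    return {
        "Wealth in Real Assets": real_assets,
        "Wealth in Financial Assets": financial_assets,
        "Wealth": wealth,
        "Debt": debt,
        "Net Wealth": wealth - debt,  # assumed definition
    }


household = {
    "Value of the Main Residence": 200_000.0,
    "Value of other Properties": 0.0,
    "Wealth Other Real Assets": 15_000.0,
    "Wealth in Deposits": 10_000.0,
    "Wealth in Other Financial Assets": 5_000.0,
    "Outstanding Balance of HMR Mortgages": 120_000.0,
    "Outstanding Balance of Mortgages on other Properties": 0.0,
    "Outstanding Balance of other Non-Mortgage Loans": 3_000.0,
}
summary = wealth_summary(household)
```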

`set_income() -> None` ¶

Set total individual income by combining employment and unemployment benefits.

This method:

1. Fills missing values with zeros
2. Combines employee income and unemployment benefits
3. Updates the Income column in individual_data
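The steps above can be sketched on plain dictionaries, which stand in here for rows of `individual_data`. The helper names are hypothetical; both `None` and NaN are treated as missing.

```python
def zero_if_missing(value):
    # None and NaN (the only float not equal to itself) both count as missing.
    return 0.0 if value is None or value != value else value


def set_income(individuals):
    for person in individuals:
        employee = zero_if_missing(person.get("Employee Income"))
        benefits = zero_if_missing(person.get("Income from Unemployment Benefits"))
        person["Employee Income"] = employee
        person["Income from Unemployment Benefits"] = benefits
        person["Income"] = employee + benefits


individuals = [
    {"Employee Income": 2500.0, "Income from Unemployment Benefits": None},
    {"Employee Income": float("nan"), "Income from Unemployment Benefits": 800.0},
    {},  # no income information at all
]
set_income(individuals)
```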

`restrict() -> None` `abstractmethod` ¶

Restrict household data to essential columns.

This method should:

1. Filter household_data to RESTRICT_COLS
2. Ensure data consistency after restriction
3. Preserve key relationships
4. Handle any missing required columns
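A minimal sketch of the column restriction, assuming the table is held as a dictionary of columns. The `RESTRICT_COLS` value shown is a hypothetical subset chosen for illustration, not the package's actual list, and raising on missing columns is only one possible way to handle them.

```python
RESTRICT_COLS = ["Type", "Income", "Net Wealth"]  # hypothetical subset


def restrict_columns(table, required=RESTRICT_COLS):
    """Keep only the required columns, failing loudly if any are absent."""
    missing = [col for col in required if col not in table]
    if missing:
        raise KeyError(f"Missing required columns: {missing}")
    return {col: table[col] for col in required}


household_table = {
    "Type": [1, 4],
    "Income": [2150.0, 600.0],
    "Net Wealth": [107000.0, -500.0],
    "Rent Paid": [0.0, 450.0],
}
restricted = restrict_columns(household_table)
```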

`normalise_household_consumption(iot_hh_consumption: np.ndarray | pd.Series, vat: float, positive_saving_rates_only: bool = True, independents: Optional[list[str]] = None) -> None` `abstractmethod` ¶

Normalize household consumption to match aggregate targets.

This method should:

1. Scale consumption to match IOT totals
2. Account for VAT in consumption values
3. Maintain reasonable saving rates
4. Preserve consumption patterns by income group
5. Update household consumption shares

Parameters:

Name	Type	Description	Default
`iot_hh_consumption`	`ndarray \| Series`	Target household consumption from IOT	required
`vat`	`float`	Value-added tax rate	required
`positive_saving_rates_only`	`bool`	Whether to enforce positive saving rates. Defaults to True.	`True`
`independents`	`Optional[list[str]]`	Independent variables for consumption models. Defaults to None.	`None`
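The first two steps (matching IOT totals and accounting for VAT) can be sketched as a per-industry rescaling. This assumes household spending is recorded including VAT, that the IOT targets are net of VAT, and that pre-tax spending equals spending divided by `1 + vat`. None of these conventions is confirmed by this page, and the saving-rate and income-group steps are not covered.

```python
def scale_to_iot(consumption_incl_vat, iot_hh_consumption, vat):
    """Rescale each industry column so pre-VAT totals match the IOT targets."""
    n_industries = len(iot_hh_consumption)
    # Current pre-VAT total spending per industry, summed over households.
    before_vat_totals = [
        sum(row[j] for row in consumption_incl_vat) / (1.0 + vat)
        for j in range(n_industries)
    ]
    factors = [
        target / total if total > 0 else 0.0
        for target, total in zip(iot_hh_consumption, before_vat_totals)
    ]
    return [[row[j] * factors[j] for j in range(n_industries)]
            for row in consumption_incl_vat]


# Rows are households, columns are industries.
consumption = [[120.0, 60.0], [120.0, 180.0]]
iot_targets = [100.0, 400.0]
scaled = scale_to_iot(consumption, iot_targets, vat=0.2)

# Pre-VAT totals after scaling, for comparison with the targets.
scaled_totals = [sum(row[j] for row in scaled) / 1.2 for j in range(2)]
```

Scaling whole columns by a common factor leaves each household's share of an industry's consumption unchanged.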

`set_household_investment_rates(capital_formation_taxrate: float, default_investment_rates: np.ndarray | float = 0.2) -> None` ¶

Initialize household investment rates.

This method sets initial investment rates for households, which can be later adjusted through normalization to match aggregate targets.

Parameters:

Name	Type	Description	Default
`capital_formation_taxrate`	`float`	Tax rate on capital formation	required
`default_investment_rates`	`ndarray \| float`	Initial investment rates. Defaults to 0.2.	`0.2`

`normalise_household_investment(tau_cf: float, iot_hh_investment: np.ndarray | pd.Series, positive_investment_rates: bool = True) -> None` ¶

`get_current_hh_investment_by_industry(tau_cf: float) -> np.ndarray` ¶

Calculate current household investment by industry.

This method computes the current investment allocation across industries based on household income, investment rates, and industry weights.

Parameters:

Name	Type	Description	Default
`tau_cf`	`float`	Capital formation tax rate	required

Returns:

Type	Description
`ndarray`	Current investment values by industry
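A rough sketch of such an allocation: each household's investment is its income times its investment rate, and the aggregate is spread over industries by the investment weights. How `tau_cf` enters is not specified on this page; the sketch assumes gross spending includes the capital formation tax and is therefore divided by `1 + tau_cf`. The function name is hypothetical.

```python
def investment_by_industry(incomes, investment_rates, weights, tau_cf):
    # Gross investment spending of each household.
    gross = [income * rate for income, rate in zip(incomes, investment_rates)]
    # ASSUMPTION: remove the capital formation tax from aggregate spending.
    net_total = sum(gross) / (1.0 + tau_cf)
    # Spread the aggregate across industries using the investment weights.
    return [net_total * weight for weight in weights]


incomes = [1000.0, 3000.0]
investment_rates = [0.2, 0.1]
weights = [0.25, 0.75]  # must sum to 1
current_investment = investment_by_industry(
    incomes, investment_rates, weights, tau_cf=0.25
)
```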

`match_consumption_weights_by_income(weights_by_income: np.ndarray | pd.DataFrame, iot_hh_consumption: pd.Series, vat: float, consumption_variance: float = 0.1) -> None` ¶

`set_wealth_distribution_function(independents: Optional[list[str]] = None) -> None` ¶

`add_emissions(emission_factors_array: np.ndarray, emitting_indices: list[int] | np.ndarray, tau_cf: float) -> None` ¶

Calculate and add emissions data to household records.

This method computes emissions from:

1. Household consumption
2. Investment activities
3. Specific fuel types (coal, gas, oil, refined products)

Parameters:

Name	Type	Description	Default
`emission_factors_array`	`ndarray`	Emission factors by source	required
`emitting_indices`	`list[int] \| ndarray`	Indices of emitting sectors	required
`tau_cf`	`float`	Capital formation tax rate	required
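A minimal sketch of the emissions calculation, assuming emissions are the emission factor times spending in each emitting industry, computed separately for consumption and investment and then summed. The industry ordering, factor values, and helper name are hypothetical, and `tau_cf` is left out for simplicity.

```python
def emissions_from_spending(spending_by_industry, emission_factors, emitting_indices):
    """Sum factor * spending over the emitting industries only."""
    return sum(
        factor * spending_by_industry[index]
        for factor, index in zip(emission_factors, emitting_indices)
    )


# Spending by industry for one household (four industries).
consumption = [100.0, 40.0, 10.0, 0.0]
investment = [0.0, 0.0, 5.0, 20.0]

emitting_indices = [1, 2]      # positions of the emitting industries
emission_factors = [0.5, 2.0]  # one factor per emitting industry

consumption_emissions = emissions_from_spending(
    consumption, emission_factors, emitting_indices
)
investment_emissions = emissions_from_spending(
    investment, emission_factors, emitting_indices
)
total_emissions = consumption_emissions + investment_emissions
```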
