SyntheticPopulation¶
The SyntheticPopulation module provides data structures and utilities for preprocessing and organizing population data that will be used to initialize behavioral models in the simulation package.
SyntheticPopulation¶
The SyntheticPopulation class is an abstract base class that provides a framework for collecting and organizing population data. It does not simulate population behavior; it only handles data preprocessing.
Key Features¶
- Household and individual data management
- Income and wealth computation
- Consumption and investment patterns
- Labor market integration
- Social transfer processing
- Data validation and cleaning
Attributes¶
- country_name (str): Country identifier
- country_name_short (str): Short country code
- scale (int): Population scaling factor
- year (int): Reference year
- industries (list[str]): List of industries
- individual_data (pd.DataFrame): Individual-level data containing:
  - Demographics (age, gender, education)
  - Employment status and industry
  - Income sources
  - Household and firm associations
- household_data (pd.DataFrame): Household-level data containing:
  - Household composition
  - Income and wealth components
  - Housing tenure and property
  - Financial assets and debt
  - Consumption patterns
Abstract Methods¶
compute_household_income¶
@abstractmethod
def compute_household_income(
    self,
    total_social_transfers: float,
    independents: Optional[list[str]] = None
) -> None
Computes household income from all sources:
- Employee income
- Social transfers
- Rental income
- Financial asset returns
compute_household_wealth¶
@abstractmethod
def compute_household_wealth(
    self,
    independents: Optional[list[str]] = None
) -> None
Computes household wealth components:
- Real assets (property, vehicles, businesses)
- Financial assets (deposits, investments)
- Debt obligations
- Net wealth position
set_debt_installments¶
@abstractmethod
def set_debt_installments(
    self,
    consumption_installments: np.ndarray,
    ce_installments: np.ndarray,
    mortgage_installments: np.ndarray
) -> None
Sets household debt payment schedules:
- Consumption loan payments
- Consumer electronics installments
- Mortgage payments
set_household_saving_rates¶
@abstractmethod
def set_household_saving_rates(
    self,
    independents: Optional[list[str]] = None
) -> None
Computes household saving rates based on:
- Income levels
- Wealth position
- Household characteristics
SyntheticHFCSPopulation¶
The SyntheticHFCSPopulation class is a concrete implementation that preprocesses population data using the Household Finance and Consumption Survey (HFCS) as its primary data source.
Key Features¶
- HFCS data integration
- Household sampling and scaling
- Industry employment allocation
- Wealth and income modeling
- Consumption pattern estimation
Factory Methods¶
from_readers¶
@classmethod
def from_readers(
    cls,
    readers: DataReaders,
    country_name: Country,
    country_name_short: str,
    scale: int,
    year: int,
    quarter: int,
    industry_data: dict[str, pd.DataFrame],
    industries: list[str],
    total_unemployment_benefits: float,
    exogenous_data: ExogenousCountryData,
    rent_as_fraction_of_unemployment_rate: float = 0.25,
    n_quantiles: int = 5,
    population_ratio: float = 1.0,
    exch_rate: float = 1.0,
    proxied_country: str | Country = None,
    yearly_factor: float = 4.0
) -> "SyntheticHFCSPopulation"
Creates a synthetic population using HFCS data and additional sources.
Parameters:
- readers (DataReaders): Data source readers
- country_name (Country): Target country
- country_name_short (str): Country code
- scale (int): Population scaling factor
- year (int): Reference year
- quarter (int): Reference quarter
- industry_data (dict): Industry-level data
- industries (list[str]): Target industries
- total_unemployment_benefits (float): Total benefits to distribute
- exogenous_data (ExogenousCountryData): External economic data
- rent_as_fraction_of_unemployment_rate (float): Rent parameter
- n_quantiles (int): Income quantiles for analysis
- population_ratio (float): Population scaling ratio
- exch_rate (float): Exchange rate for currency conversion
- proxied_country (str | Country): Proxy country for missing data
- yearly_factor (float): Annual to sub-annual conversion factor
Returns:
SyntheticHFCSPopulation: Configured population instance
Usage Example¶
from macro_data import DataReaders, ExogenousCountryData
from macro_data.processing.synthetic_population import SyntheticHFCSPopulation

# Initialize data readers and configuration
readers = DataReaders.from_raw_data(...)
exogenous_data = ExogenousCountryData(...)
industry_data = {...}

# Create synthetic population for France in 2023 Q1
france_population = SyntheticHFCSPopulation.from_readers(
    country_name="FRA",
    country_name_short="FR",
    scale=1000,
    year=2023,
    quarter=1,
    readers=readers,
    industry_data=industry_data,
    industries=["C10T12", "C13T15"],
    total_unemployment_benefits=1e9,
    exogenous_data=exogenous_data
)

# Compute household wealth and income
france_population.compute_household_wealth()
france_population.compute_household_income(total_social_transfers=5e8)
Module for managing synthetic population data in macroeconomic simulations.
This module provides the abstract base class for synthetic population generation and management. It defines the core data structures and interfaces for representing households and individuals in a macroeconomic simulation, with a focus on:
- Population Structure:
  - Household composition and relationships
  - Individual demographics and employment
  - Financial status and wealth distribution
  - Housing tenure and property ownership
- Economic Attributes:
  - Income sources (employment, transfers, assets)
  - Wealth composition (real assets, financial assets)
  - Debt obligations and credit relationships
  - Consumption and saving patterns
- Data Management:
  - Data validation and cleaning
  - Missing value imputation
  - Statistical modeling of relationships
  - Scale factor adjustments
- Market Relationships:
  - Employment connections with firms
  - Banking relationships
  - Housing market participation
  - Investment behavior
The module supports both EU and non-EU country data, with capabilities for:
- Data harmonization across sources
- Consistent initialization of economic relationships
- Preservation of aggregate economic constraints
- Environmental impact tracking
Note
This module focuses on preprocessing and organizing population data for initialization. The actual behavioral dynamics are implemented in the simulation package.
Example
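from macro_data.processing.synthetic_population import SyntheticPopulation

class CustomPopulation(SyntheticPopulation):
    def __init__(self, country_name, scale, ...):
        super().__init__(...)

    def compute_household_income(self, total_social_transfers, independents=None):
        # Custom income computation logic
        pass

    def compute_household_wealth(self, independents=None):
        # Custom wealth computation logic
        pass

    # The remaining abstract methods (for example set_debt_installments and
    # set_household_saving_rates) must also be overridden before the class
    # can be instantiated.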
SyntheticPopulation¶
Represents a synthetic population for a specific country and year.
The household data is a pandas DataFrame with the following columns:
- Type: The type of the household (1: single, 2: couple, 3: single parent, 4: couple with children).
- Corresponding Individuals ID: The IDs of the individuals in the household.
- Corresponding Bank ID: The ID of the bank the household is associated with.
- Corresponding Inhabited House ID: The ID of the house the household inhabits.
- Corresponding Renters: The IDs of the individuals in the household who rent.
- Corresponding Property Owner: The IDs of the individuals in the household who own property.
- Corresponding Additionally Owned Houses ID: The IDs of the houses the household owns.
- Income: The total income of the household.
- Employee Income: The income of the household from employment.
- Regular Social Transfers: The income of the household from social transfers.
- Rental Income from Real Estate: The income of the household from rental of real estate.
- Income from Financial Assets: The income of the household from financial assets.
- Saving Rate: The saving rate of the household.
- Rent Paid: The rent paid by the household.
- Rent Imputed: The imputed rent of the household.
- Wealth: The total wealth of the household.
- Net Wealth: The net wealth of the household.
- Wealth in Real Assets: The wealth of the household in real assets.
- Value of the Main Residence: The value of the main residence of the household.
- Value of other Properties: The value of other properties of the household.
- Wealth Other Real Assets: The wealth of the household in other real assets.
- Wealth in Deposits: The wealth of the household in deposits.
- Wealth in Other Financial Assets: The wealth of the household in other financial assets.
- Wealth in Financial Assets: The wealth of the household in financial assets.
- Outstanding Balance of HMR Mortgages: The outstanding balance of the household's HMR mortgages.
- Outstanding Balance of Mortgages on other Properties: The outstanding balance of the household's mortgages on other properties.
- Outstanding Balance of other Non-Mortgage Loans: The outstanding balance of the household's other non-mortgage loans.
- Debt: The total debt of the household.
- Debt Installments: The debt installments of the household (monthly payments of debt).
- Tenure Status of the Main Residence: The tenure status of the main residence of the household.
- Number of Properties other than Household Main Residence: The number of properties other than the household's main residence.
The individual data is a pandas DataFrame with the following columns:
- Gender: The gender of the individual (1: male, 2: female)
- Age: The age of the individual.
- Education: The education level of the individual (ISCED classification).
- Activity Status: The activity status of the individual (1: employed, 2: unemployed, 3: not economically active).
- Employment Industry: The industry of the individual's employment.
- Employee Income: The income of the individual from employment.
- Income from Unemployment Benefits: The income of the individual from unemployment benefits.
- Income: The total income of the individual.
- Corresponding Household ID: The ID of the household the individual belongs to.
- Corresponding Firm ID: The ID of the firm the individual works for.
Attributes:
| Name | Type | Description |
|---|---|---|
| `country_name` | `str` | The name of the country. |
| `country_name_short` | `str` | The short name or code of the country. |
| `scale` | `int` | The scale of the synthetic population. |
| `year` | `int` | The year of the synthetic population. |
| `industries` | `list[str]` | The list of industries in the country. |
| `individual_data` | `DataFrame` | The data frame containing individual-level data. |
| `household_data` | `DataFrame` | The data frame containing household-level data. |
| `social_housing_rent` | `float` | The rent for social housing. |
| `coefficient_fa_income` | `float` | The coefficient for family allowance income. |
| `consumption_weights` | `ndarray` | The weights for household consumption. |
| `consumption_weights_by_income` | `ndarray` | The weights for household consumption based on income. |
| `saving_rates_model` | `LinearRegression` | The model for household saving rates. |
| `social_transfers_model` | `LinearRegression` | The model for social transfers. |
| `wealth_distribution_model` | `LinearRegression` | The model for wealth distribution. |
country_name = country_name (instance-attribute)¶
country_name_short = country_name_short (instance-attribute)¶
scale = scale (instance-attribute)¶
year = year (instance-attribute)¶
industries = industries (instance-attribute)¶
individual_data = individual_data (instance-attribute)¶
household_data = household_data (instance-attribute)¶
social_housing_rent = social_housing_rent (instance-attribute)¶
coefficient_fa_income = coefficient_fa_income (instance-attribute)¶
consumption_weights = consumption_weights (instance-attribute)¶
consumption_weights_by_income = consumption_weights_by_income (instance-attribute)¶
investment = investment (instance-attribute)¶
saving_rates_model = saving_rates_model (instance-attribute)¶
social_transfers_model = social_transfers_model (instance-attribute)¶
wealth_distribution_model = wealth_distribution_model (instance-attribute)¶
yearly_factor = yearly_factor (instance-attribute)¶
industry_consumption_before_vat (property)¶
Calculate household consumption by industry before VAT.
This property computes the pre-tax consumption allocation across industries based on household income and consumption weights.
Returns:
| Type | Description |
|---|---|
| `ndarray` | Matrix of household consumption by industry before VAT |
investment_weights: np.ndarray (property)¶
Calculate normalized investment weights by industry.
This property computes the share of investment allocated to each industry, ensuring the weights sum to 1.
Returns:
| Type | Description |
|---|---|
| `ndarray` | Normalized investment weights by industry |
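The normalization described for `investment_weights` can be sketched as follows. This is a minimal illustration, assuming `investment` is a one-dimensional array with one entry per industry; the values are made up and the package's own implementation may differ.

```python
import numpy as np

# Hypothetical investment amounts by industry (illustrative values only).
investment = np.array([2.0, 1.0, 1.0])

# Divide each entry by the total so that the shares sum to 1.
investment_weights = investment / investment.sum()

print(investment_weights)
```

Dividing by the total keeps the relative sizes of the industries unchanged while making the result usable as allocation shares.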
number_of_households: int (property)¶
Get the total number of households.
Returns:
| Type | Description |
|---|---|
| `int` | Number of rows in household_data |
number_employees_by_industry: np.ndarray (property)¶
Calculate the number of employed individuals by industry.
Returns:
| Type | Description |
|---|---|
| `ndarray` | Array of employee counts for each industry |
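One way to obtain such counts is sketched below. It assumes that `Employment Industry` stores a zero-based index into `industries` and that an `Activity Status` of 1 means employed, as in the column description above. The third industry code and the toy data are hypothetical, and the real property may be implemented differently.

```python
import numpy as np
import pandas as pd

# "C16" is a hypothetical third industry added for illustration.
industries = ["C10T12", "C13T15", "C16"]

individual_data = pd.DataFrame({
    "Activity Status": [1, 1, 2, 1, 3],          # 1: employed, 2: unemployed, 3: inactive
    "Employment Industry": [0, 1, 1, 0, np.nan],  # assumed: index into `industries`
})

# Keep employed individuals only, then count occurrences of each industry index.
employed = individual_data[individual_data["Activity Status"] == 1]
counts = np.bincount(
    employed["Employment Industry"].astype(int).to_numpy(),
    minlength=len(industries),  # industries without employees get a zero
)

print(counts)
```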
total_emissions: float (property)¶
Calculate total household emissions.
Returns:
| Type | Description |
|---|---|
| `float` | Sum of consumption and investment emissions |
__init__(country_name: str, country_name_short: str, scale: int, year: int, industries: list[str], individual_data: pd.DataFrame, household_data: pd.DataFrame, social_housing_rent: float, coefficient_fa_income: float, consumption_weights: np.ndarray, consumption_weights_by_income: np.ndarray, investment: np.ndarray, saving_rates_model: LinearRegression, social_transfers_model: LinearRegression, wealth_distribution_model: LinearRegression, yearly_factor: float = 4.0) (abstractmethod)¶
set_individual_labour_inputs(firm_production: np.ndarray, firm_employees: pd.DataFrame, unemployment_labour_inputs_fraction: float = 0.3, override: bool = True) -> None¶
Set individual labor input values based on employment status and firm production.
This method assigns labor input values to individuals based on their employment status:
1. Employed: Inputs proportional to income within their firm
2. Unemployed: Fraction of industry mean inputs
3. Inactive: Zero inputs
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `firm_production` | `ndarray` | Production values for each firm | required |
| `firm_employees` | `DataFrame` | Mapping of employees to firms | required |
| `unemployment_labour_inputs_fraction` | `float` | Fraction of mean industry inputs for unemployed. Defaults to 0.3. | `0.3` |
| `override` | `bool` | Whether to override existing values with uniform inputs. Defaults to True. | `True` |
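The three employment-status rules can be sketched as below. This is only an illustration under stated assumptions: firm production is split among a firm's employees in proportion to their income, unemployed individuals keep an industry label, and the `override` flag and the `firm_employees` mapping are ignored. The toy data and the `labour` series are hypothetical, not part of the package.

```python
import numpy as np
import pandas as pd

firm_production = np.array([100.0, 60.0])  # one value per firm (toy data)
fraction = 0.3                             # unemployment_labour_inputs_fraction

ind = pd.DataFrame({
    "Activity Status": [1, 1, 1, 2, 3],    # 1: employed, 2: unemployed, 3: inactive
    "Employment Industry": [0, 0, 1, 0, np.nan],
    "Employee Income": [30.0, 10.0, 20.0, 0.0, 0.0],
    "Corresponding Firm ID": [0, 0, 1, -1, -1],  # -1: no firm (assumption)
})

employed = ind["Activity Status"] == 1
unemployed = ind["Activity Status"] == 2

# Rule 3: everyone starts at zero, which is the final value for inactive people.
labour = pd.Series(0.0, index=ind.index)

# Rule 1: employed individuals share their firm's production by income share.
emp = ind[employed]
firm_income = emp.groupby("Corresponding Firm ID")["Employee Income"].transform("sum")
income_share = emp["Employee Income"] / firm_income
firm_ids = emp["Corresponding Firm ID"].to_numpy()
labour.loc[emp.index] = (income_share * firm_production[firm_ids]).to_numpy()

# Rule 2: unemployed individuals get a fraction of the mean input in their industry.
industry_mean = labour[employed].groupby(ind.loc[employed, "Employment Industry"]).mean()
unemp = ind[unemployed]
labour.loc[unemp.index] = (fraction * unemp["Employment Industry"].map(industry_mean)).to_numpy()

print(labour.tolist())
```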
compute_household_income(total_social_transfers: float, independents: Optional[list[str]] = None) -> None (abstractmethod)¶
Compute and update household income from all sources.
This method should:
1. Calculate employee income from individual data
2. Process social transfers based on household characteristics
3. Include rental income from real estate
4. Add income from financial assets
5. Update the household_data DataFrame with results
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `total_social_transfers` | `float` | Total social transfers to be distributed across households based on their characteristics. | required |
| `independents` | `Optional[list[str]]` | List of independent variables to use in social transfer allocation models. Defaults to None. | `None` |
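Because the method is abstract, the exact logic depends on the concrete subclass. The sketch below shows one possible shape of the five steps using the column names listed earlier. The allocation of transfers by household size is purely an assumption made for illustration (the documentation describes a model based on household characteristics), and all figures are invented.

```python
import pandas as pd

individual_data = pd.DataFrame({
    "Corresponding Household ID": [0, 0, 1],
    "Employee Income": [1000.0, 500.0, 800.0],
})
household_data = pd.DataFrame(
    {
        "Rental Income from Real Estate": [0.0, 200.0],
        "Income from Financial Assets": [50.0, 0.0],
    },
    index=[0, 1],
)
total_social_transfers = 300.0

by_household = individual_data.groupby("Corresponding Household ID")

# Step 1: employee income aggregated from individuals to households.
household_data["Employee Income"] = (
    by_household["Employee Income"].sum().reindex(household_data.index, fill_value=0.0)
)

# Step 2 (ASSUMPTION): split transfers in proportion to household size.
size = by_household.size().reindex(household_data.index, fill_value=0)
household_data["Regular Social Transfers"] = total_social_transfers * size / size.sum()

# Steps 3 to 5: add rental and financial income and store the total.
components = [
    "Employee Income",
    "Regular Social Transfers",
    "Rental Income from Real Estate",
    "Income from Financial Assets",
]
household_data["Income"] = household_data[components].sum(axis=1)

print(household_data["Income"].tolist())
```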
set_consumption_weights(consumption_weights: np.ndarray) -> None¶
Set the consumption weights for household expenditure allocation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `consumption_weights` | `ndarray` | New consumption weights by industry | required |
set_debt_installments(consumption_installments: np.ndarray, ce_installments: np.ndarray, mortgage_installments: np.ndarray) -> None (abstractmethod)¶
Set household debt installment payments.
This method should:
1. Process consumption loan payments
2. Handle consumer electronics installments
3. Account for mortgage payments
4. Update total debt service in household_data
5. Ensure payment consistency with loan balances
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `consumption_installments` | `ndarray` | Monthly payments for consumption loans | required |
| `ce_installments` | `ndarray` | Monthly payments for consumer electronics | required |
| `mortgage_installments` | `ndarray` | Monthly payments for mortgages | required |
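A minimal sketch of the aggregation step, assuming total debt service is simply the element-wise sum of the three payment arrays and is stored in the `Debt Installments` column. A concrete subclass may add consistency checks against loan balances that are not shown here; the numbers are illustrative.

```python
import numpy as np
import pandas as pd

household_data = pd.DataFrame(index=range(3))

# One entry per household (toy values).
consumption_installments = np.array([10.0, 0.0, 5.0])
ce_installments = np.array([0.0, 2.0, 1.0])
mortgage_installments = np.array([100.0, 0.0, 50.0])

# ASSUMPTION: total debt service is the plain sum of the three components.
household_data["Debt Installments"] = (
    consumption_installments + ce_installments + mortgage_installments
)

print(household_data["Debt Installments"].tolist())
```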
set_household_saving_rates(independents: Optional[list[str]] = None) -> None (abstractmethod)¶
Compute and set household saving rates.
This method should:
1. Process consumption share data
2. Handle missing values through imputation
3. Fit saving rate models using household characteristics
4. Ensure rates are economically reasonable
5. Update the household_data DataFrame
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `independents` | `Optional[list[str]]` | List of independent variables to use in saving rate models. Defaults to None. | `None` |
compute_household_wealth(independents: Optional[list[str]] = None) -> None (abstractmethod)¶
Compute and update household wealth components.
This method should:
1. Calculate real asset wealth (property, vehicles, businesses)
2. Process financial asset wealth (deposits, investments)
3. Account for all debt types (mortgages, loans)
4. Compute net wealth positions
5. Update wealth distribution models
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `independents` | `Optional[list[str]]` | List of independent variables to use in wealth distribution models. Defaults to None. | `None` |
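The first four steps can be illustrated with the household columns listed earlier. The accounting identities used here (wealth as the sum of real and financial assets, net wealth as wealth minus debt) are assumptions for the sketch; the concrete subclass may compute or impute the components differently, and the model update in step 5 is not shown.

```python
import pandas as pd

hh = pd.DataFrame({
    "Value of the Main Residence": [200.0, 0.0],
    "Value of other Properties": [50.0, 0.0],
    "Wealth Other Real Assets": [10.0, 5.0],
    "Wealth in Deposits": [20.0, 3.0],
    "Wealth in Other Financial Assets": [5.0, 0.0],
    "Outstanding Balance of HMR Mortgages": [100.0, 0.0],
    "Outstanding Balance of Mortgages on other Properties": [20.0, 0.0],
    "Outstanding Balance of other Non-Mortgage Loans": [5.0, 10.0],
})

# Step 1: real assets.
hh["Wealth in Real Assets"] = (
    hh["Value of the Main Residence"]
    + hh["Value of other Properties"]
    + hh["Wealth Other Real Assets"]
)
# Step 2: financial assets.
hh["Wealth in Financial Assets"] = (
    hh["Wealth in Deposits"] + hh["Wealth in Other Financial Assets"]
)
# Step 3: all debt types.
hh["Debt"] = (
    hh["Outstanding Balance of HMR Mortgages"]
    + hh["Outstanding Balance of Mortgages on other Properties"]
    + hh["Outstanding Balance of other Non-Mortgage Loans"]
)
# Step 4: gross and net wealth (ASSUMED identities).
hh["Wealth"] = hh["Wealth in Real Assets"] + hh["Wealth in Financial Assets"]
hh["Net Wealth"] = hh["Wealth"] - hh["Debt"]

print(hh[["Wealth", "Debt", "Net Wealth"]])
```

Note that net wealth can be negative, as for the second toy household, whose debt exceeds its assets.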
set_income() -> None¶
Set total individual income by combining employment and unemployment benefits.
This method:
1. Fills missing values with zeros
2. Combines employee income and unemployment benefits
3. Updates the Income column in individual_data
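The three steps map directly onto a couple of pandas calls. The sketch below uses the individual-level column names listed earlier with toy values; it illustrates the described behavior and is not the package's source.

```python
import numpy as np
import pandas as pd

individual_data = pd.DataFrame({
    "Employee Income": [1200.0, np.nan, np.nan],
    "Income from Unemployment Benefits": [np.nan, 400.0, np.nan],
})

income_columns = ["Employee Income", "Income from Unemployment Benefits"]

# Step 1: fill missing values with zeros.
individual_data[income_columns] = individual_data[income_columns].fillna(0.0)

# Steps 2 and 3: combine both sources into the Income column.
individual_data["Income"] = individual_data[income_columns].sum(axis=1)

print(individual_data["Income"].tolist())
```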
restrict() -> None (abstractmethod)¶
Restrict household data to essential columns.
This method should:
1. Filter household_data to RESTRICT_COLS
2. Ensure data consistency after restriction
3. Preserve key relationships
4. Handle any missing required columns
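A possible shape for the filtering step is sketched below. The contents of `RESTRICT_COLS` shown here are hypothetical (the real constant is defined in the package), and raising an error on missing columns is just one way to handle step 4.

```python
import pandas as pd

# Hypothetical column list; the actual RESTRICT_COLS is defined by the package.
RESTRICT_COLS = ["Type", "Income", "Wealth"]

household_data = pd.DataFrame({
    "Type": [1, 2],
    "Income": [1750.0, 1100.0],
    "Wealth": [285.0, 8.0],
    "Scratch Column": [0, 1],  # intermediate column that should be dropped
})

# Fail early if a required column is absent (one option among several).
missing = [col for col in RESTRICT_COLS if col not in household_data.columns]
if missing:
    raise KeyError(f"Missing required household columns: {missing}")

# Keep only the essential columns; the row index (household IDs) is preserved.
household_data = household_data[RESTRICT_COLS].copy()

print(list(household_data.columns))
```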
normalise_household_consumption(iot_hh_consumption: np.ndarray | pd.Series, vat: float, positive_saving_rates_only: bool = True, independents: Optional[list[str]] = None) -> None (abstractmethod)¶
Normalize household consumption to match aggregate targets.
This method should:
1. Scale consumption to match IOT totals
2. Account for VAT in consumption values
3. Maintain reasonable saving rates
4. Preserve consumption patterns by income group
5. Update household consumption shares
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `iot_hh_consumption` | `ndarray \| Series` | Target household consumption from IOT | required |
| `vat` | `float` | Value-added tax rate | required |
| `positive_saving_rates_only` | `bool` | Whether to enforce positive saving rates. Defaults to True. | `True` |
| `independents` | `Optional[list[str]]` | Independent variables for consumption models. Defaults to None. | `None` |
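The first two steps can be illustrated with a simple proportional rescaling. This sketch assumes a households-by-industries matrix of consumption including VAT and assumes the IOT targets are expressed before VAT; both are assumptions, and the saving-rate and income-group handling of steps 3 to 5 is not covered. All names other than `vat` and `iot_hh_consumption` are hypothetical.

```python
import numpy as np

vat = 0.2

# Rows: households, columns: industries; values include VAT (toy data).
gross_consumption = np.array([
    [12.0, 6.0],
    [24.0, 18.0],
])
# ASSUMPTION: IOT targets are totals by industry before VAT.
iot_hh_consumption = np.array([60.0, 10.0])

# Step 2: remove VAT to obtain pre-tax consumption.
net_consumption = gross_consumption / (1.0 + vat)

# Step 1: rescale each industry column so its total matches the IOT target.
scaling = iot_hh_consumption / net_consumption.sum(axis=0)
normalised = net_consumption * scaling

print(normalised.sum(axis=0))
```

Scaling each column by a single factor keeps every household's share of an industry's consumption unchanged while matching the aggregate.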
set_household_investment_rates(capital_formation_taxrate: float, default_investment_rates: np.ndarray | float = 0.2) -> None¶
Initialize household investment rates.
This method sets initial investment rates for households, which can be later adjusted through normalization to match aggregate targets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `capital_formation_taxrate` | `float` | Tax rate on capital formation | required |
| `default_investment_rates` | `ndarray \| float` | Initial investment rates. Defaults to 0.2. | `0.2` |
normalise_household_investment(tau_cf: float, iot_hh_investment: np.ndarray | pd.Series, positive_investment_rates: bool = True) -> None¶
get_current_hh_investment_by_industry(tau_cf: float) -> np.ndarray¶
Calculate current household investment by industry.
This method computes the current investment allocation across industries based on household income, investment rates, and industry weights.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `tau_cf` | `float` | Capital formation tax rate | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Current investment values by industry |
match_consumption_weights_by_income(weights_by_income: np.ndarray | pd.DataFrame, iot_hh_consumption: pd.Series, vat: float, consumption_variance: float = 0.1) -> None¶
set_wealth_distribution_function(independents: Optional[list[str]] = None) -> None¶
add_emissions(emission_factors_array: np.ndarray, emitting_indices: list[int] | np.ndarray, tau_cf: float) -> None¶
Calculate and add emissions data to household records.
This method computes emissions from:
1. Household consumption
2. Investment activities
3. Specific fuel types (coal, gas, oil, refined products)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `emission_factors_array` | `ndarray` | Emission factors by source | required |
| `emitting_indices` | `list[int] \| ndarray` | Indices of emitting sectors | required |
| `tau_cf` | `float` | Capital formation tax rate | required |
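The emissions calculation can be sketched as a product of spending on emitting sectors and the matching emission factors. The example assumes one factor per emitting index and households-by-industries matrices of consumption and investment; the role of `tau_cf` and the per-fuel breakdown are left out, and all data are invented.

```python
import numpy as np

# Rows: households, columns: industries (toy data).
consumption = np.array([
    [10.0, 5.0, 2.0],
    [20.0, 0.0, 4.0],
])
investment = np.array([
    [1.0, 0.0, 0.5],
    [0.0, 0.0, 1.0],
])

emitting_indices = [0, 2]                      # columns of the emitting sectors
emission_factors_array = np.array([0.5, 2.0])  # ASSUMPTION: one factor per emitting index

# Emissions per household: spending on emitting sectors times their factors.
consumption_emissions = consumption[:, emitting_indices] @ emission_factors_array
investment_emissions = investment[:, emitting_indices] @ emission_factors_array

# Total as described for total_emissions: consumption plus investment emissions.
total_emissions = consumption_emissions.sum() + investment_emissions.sum()

print(consumption_emissions, investment_emissions, total_emissions)
```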