CompustatFirmsReader

This module reads and processes Compustat firm-level financial data. It handles both annual and quarterly data, supports multiple countries, and can convert monetary values between currencies using a caller-supplied exchange rate.

Key Features:
  • Read and merge annual and quarterly Compustat data
  • Handle multiple countries and currencies
  • Automatic missing value imputation
  • Support for proxy country data
  • Currency conversion capabilities

The module processes various financial metrics, including:
  • Employment data
  • Balance sheet items (assets, liabilities, equity)
  • Income statement items (revenue, profits)
  • Operational data (inventory)

Example
from pathlib import Path
from macro_data.readers.population_data.compustat_firms_reader import CompustatFirmsReader
from macro_data.configuration.countries import Country

# Initialize reader with raw data
reader = CompustatFirmsReader.from_raw_data(
    year=2020,
    quarter=4,
    raw_annual_path=Path("path/to/annual.csv"),
    raw_quarterly_path=Path("path/to/quarterly.csv"),
    countries=["USA", "GBR", Country.FRANCE]
)

# Get firm data for a specific country
usa_firms = reader.get_firm_data("USA")

# Get proxy data with currency conversion
proxy_firms = reader.get_proxied_firm_data(
    proxy_country="GBR",
    exchange_rate=1.25
)
Note

Missing values are imputed using scikit-learn's IterativeImputer.

CompustatFirmsReader

A class for reading and processing Compustat firm-level financial data.

This class handles the reading and processing of Compustat data, including:
  • Merging annual and quarterly data
  • Filtering by country and time period
  • Imputing missing values
  • Currency conversion for international comparisons

Parameters

data : pd.DataFrame
    Processed Compustat data with standardized columns

Attributes

data : pd.DataFrame
    Processed firm-level data indexed by country
numerical_columns : list[str]
    List of columns containing monetary values

Notes
  • Missing values are imputed using scikit-learn's IterativeImputer
  • All monetary values are in their original currencies
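As a rough illustration of the imputation step mentioned above, the sketch below runs scikit-learn's IterativeImputer on a small toy table. The column names and values are hypothetical, and the reader may configure the imputer differently (estimator, number of iterations, and so on); this only shows the general mechanism.

```python
import numpy as np
import pandas as pd

# IterativeImputer is still experimental in scikit-learn, so this
# enabling import must come before importing the class itself.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical firm-level columns; the reader's real column names may differ.
firms = pd.DataFrame(
    {
        "employees": [120.0, 80.0, np.nan, 200.0],
        "total_assets": [1500.0, np.nan, 900.0, 2600.0],
        "revenue": [700.0, 450.0, 400.0, np.nan],
    }
)

# Each column with missing entries is modelled from the other columns,
# and the missing cells are filled with the model's predictions.
imputer = IterativeImputer(random_state=0)
imputed = pd.DataFrame(
    imputer.fit_transform(firms),
    columns=firms.columns,
    index=firms.index,
)
```

Observed values pass through unchanged; only the cells that were NaN receive estimates, so the result depends on the other firms in the table.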
data = data instance-attribute
numerical_columns property

Get the list of columns containing monetary values.

Returns

list[str]
    Names of columns containing monetary values

__init__(data: pd.DataFrame)
from_raw_data(year: int, quarter: int, raw_annual_path: Path | str, raw_quarterly_path: Path | str, countries: list[str | Country]) classmethod

Create a CompustatFirmsReader instance from raw Compustat files.

This method:
  1. Reads annual and quarterly data
  2. Filters for specific time period
  3. Merges the datasets
  4. Processes and cleans the data
  5. Imputes missing values

Parameters

year : int
    Year to filter data for
quarter : int
    Quarter to filter data for (1-4)
raw_annual_path : Path | str
    Path to annual Compustat data file
raw_quarterly_path : Path | str
    Path to quarterly Compustat data file
countries : list[str | Country]
    List of countries to include in the data

Returns

CompustatFirmsReader
    Initialized reader with processed data

Notes
  • Data is filtered to match the specified year and quarter
  • Countries can be specified as strings or Country enum values
  • Missing values are imputed across all numeric columns
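The filter-and-merge steps described for from_raw_data can be sketched with plain pandas. This is not the reader's actual implementation: the column names (firm_id, country, year, quarter), the toy values, and the stand-in Country enum are all assumptions made for illustration, and the imputation step is left out.

```python
from enum import Enum

import pandas as pd


class Country(Enum):
    """Hypothetical stand-in for macro_data's Country enum."""

    FRANCE = "FRA"


def to_code(country) -> str:
    """Accept a plain string or an enum member and return a string code."""
    return country.value if isinstance(country, Enum) else str(country)


# Toy frames standing in for the raw annual and quarterly files.
annual = pd.DataFrame(
    {
        "firm_id": [1, 2, 3, 1],
        "country": ["USA", "FRA", "DEU", "USA"],
        "year": [2020, 2020, 2020, 2019],
        "employees": [120.0, 80.0, 60.0, 110.0],
    }
)
quarterly = pd.DataFrame(
    {
        "firm_id": [1, 2, 3, 1],
        "year": [2020, 2020, 2020, 2020],
        "quarter": [4, 4, 4, 3],
        "revenue": [700.0, 450.0, 300.0, 650.0],
    }
)

year, quarter = 2020, 4
codes = [to_code(c) for c in ["USA", Country.FRANCE]]

# Filter each dataset to the requested period.
annual_sel = annual[annual["year"] == year]
quarterly_sel = quarterly[
    (quarterly["year"] == year) & (quarterly["quarter"] == quarter)
]

# Merge on the firm identifier, then keep only the requested countries
# and index the result by country.
merged = annual_sel.merge(
    quarterly_sel.drop(columns="year"), on="firm_id", how="inner"
)
merged = merged[merged["country"].isin(codes)].set_index("country")
```

Normalizing the country arguments up front means the rest of the pipeline only ever compares strings, whether callers pass codes or enum members.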
get_firm_data(country: str | Country | Region) -> pd.DataFrame

Get firm-level data for a specific country.

Parameters

country : str | Country | Region
    Country or region to get data for (string, Country enum, or Region)

Returns

pd.DataFrame
    Firm-level data for the specified country

get_proxied_firm_data(proxy_country: str | Country, exchange_rate: float) -> pd.DataFrame

Get firm-level data from a proxy country with currency conversion.

This method is useful when direct data for a country is not available and data from another country needs to be used as a proxy.

Parameters

proxy_country : str | Country
    Country to use as proxy (string or Country enum)
exchange_rate : float
    Exchange rate to convert monetary values

Returns

pd.DataFrame
    Converted firm-level data from the proxy country

Notes
  • Only monetary values are converted
  • Non-monetary fields (e.g., employee counts) are unchanged
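The conversion rule in the notes above (scale monetary columns, leave everything else alone) might look like the following sketch. The helper convert_monetary and the column names are hypothetical, and multiplying by the exchange rate is an assumption; the actual method may apply the rate in the other direction.

```python
import pandas as pd

# Hypothetical column names; the reader exposes the real monetary columns
# through its numerical_columns property.
monetary_columns = ["total_assets", "revenue"]

proxy_firms = pd.DataFrame(
    {
        "employees": [120, 80],
        "total_assets": [1500.0, 900.0],
        "revenue": [700.0, 400.0],
    }
)


def convert_monetary(
    df: pd.DataFrame, columns: list[str], exchange_rate: float
) -> pd.DataFrame:
    """Return a copy with only the listed columns scaled by the exchange rate."""
    converted = df.copy()
    converted[columns] = converted[columns] * exchange_rate
    return converted


converted = convert_monetary(proxy_firms, monetary_columns, exchange_rate=1.25)
```

Working on a copy keeps the proxy country's original data intact, so the same frame can be converted again with a different rate.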