CompustatFirmsReader
This module reads and processes Compustat firm-level financial data. It handles both annual and quarterly data, supports multiple countries, and can convert monetary values to another currency at a caller-supplied exchange rate.
Key Features:
- Read and merge annual and quarterly Compustat data
- Handle multiple countries and currencies
- Automatic missing value imputation
- Support for proxy country data
- Currency conversion capabilities

The module processes various financial metrics including:
- Employment data
- Balance sheet items (assets, liabilities, equity)
- Income statement items (revenue, profits)
- Operational data (inventory)
Example
from pathlib import Path

from macro_data.readers.population_data.compustat_firms_reader import CompustatFirmsReader
from macro_data.configuration.countries import Country

# Initialize reader with raw data
reader = CompustatFirmsReader.from_raw_data(
    year=2020,
    quarter=4,
    raw_annual_path=Path("path/to/annual.csv"),
    raw_quarterly_path=Path("path/to/quarterly.csv"),
    countries=["USA", "GBR", Country.FRANCE],
)

# Get firm data for a specific country
usa_firms = reader.get_firm_data("USA")

# Get proxy data with currency conversion
proxy_firms = reader.get_proxied_firm_data(
    proxy_country="GBR",
    exchange_rate=1.25,
)
Note
Missing values are imputed using scikit-learn's IterativeImputer.
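To make the note concrete, here is a minimal sketch of how scikit-learn's IterativeImputer fills gaps in a numeric table. The column names and values are hypothetical, and the imputer settings actually used by this module are not documented here, so treat the defaults below as assumptions. Note that IterativeImputer is still flagged experimental in scikit-learn and must be enabled explicitly before import.

```python
import numpy as np
import pandas as pd
# Required side-effect import: exposes the experimental IterativeImputer.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical firm-level table; column names are illustrative only.
firms = pd.DataFrame(
    {
        "employees": [120.0, 45.0, np.nan, 300.0, 80.0],
        "assets": [1500.0, 400.0, 900.0, np.nan, 700.0],
        "revenue": [800.0, np.nan, 500.0, 2100.0, 450.0],
    }
)

# Each column with gaps is modelled from the other columns in turn.
# Default settings are an assumption, not the module's configuration.
imputer = IterativeImputer(random_state=0)
imputed = pd.DataFrame(
    imputer.fit_transform(firms),
    columns=firms.columns,
    index=firms.index,
)
```

Observed values pass through unchanged; only the missing cells receive model-based estimates.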
CompustatFirmsReader
A class for reading and processing Compustat firm-level financial data.
This class handles the reading and processing of Compustat data, including:
- Merging annual and quarterly data
- Filtering by country and time period
- Imputing missing values
- Currency conversion for international comparisons
Parameters
data : pd.DataFrame
    Processed Compustat data with standardized columns
Attributes
data : pd.DataFrame
    Processed firm-level data indexed by country
numerical_columns : list[str]
    List of columns containing monetary values
Notes
- Missing values are imputed using scikit-learn's IterativeImputer
- All monetary values are in their original currencies
data = data
instance-attribute
numerical_columns
property
Get the list of columns containing monetary values.
Returns
list[str]
    Names of columns containing monetary values
__init__(data: pd.DataFrame)
from_raw_data(year: int, quarter: int, raw_annual_path: Path | str, raw_quarterly_path: Path | str, countries: list[str | Country])
classmethod
Create a CompustatFirmsReader instance from raw Compustat files.
This method:
1. Reads annual and quarterly data
2. Filters for specific time period
3. Merges the datasets
4. Processes and cleans the data
5. Imputes missing values
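The read, filter, and merge steps above can be sketched with plain pandas. This is only an illustration under stated assumptions: the column names (`gvkey`, `fyear`, `fyearq`, `fqtr`, `loc`, `emp`, `atq`, `revtq`) follow common Compustat naming but are not confirmed by this page, `load_period` is a hypothetical helper, and the cleaning and imputation steps are omitted.

```python
import io

import pandas as pd

# Hypothetical raw extracts; real Compustat files have many more columns.
annual_csv = io.StringIO(
    "gvkey,fyear,loc,emp\n"
    "1001,2020,USA,12.5\n"
    "1002,2020,GBR,3.1\n"
    "1003,2019,USA,7.0\n"
)
quarterly_csv = io.StringIO(
    "gvkey,fyearq,fqtr,atq,revtq\n"
    "1001,2020,4,950.0,210.0\n"
    "1002,2020,4,120.0,\n"
    "1001,2020,3,940.0,200.0\n"
)


def load_period(annual_src, quarterly_src, year, quarter, countries):
    """Hypothetical helper: read both files, filter by period, and merge."""
    annual = pd.read_csv(annual_src)
    quarterly = pd.read_csv(quarterly_src)

    # Keep the requested fiscal year and countries in the annual data.
    annual = annual[(annual["fyear"] == year) & annual["loc"].isin(countries)]
    # Keep the requested fiscal year and quarter in the quarterly data.
    quarterly = quarterly[
        (quarterly["fyearq"] == year) & (quarterly["fqtr"] == quarter)
    ]

    # One row per firm: annual fields plus the matching quarterly fields.
    merged = annual.merge(quarterly, on="gvkey", how="left")
    return merged.drop(columns=["fyear", "fyearq", "fqtr"]).set_index("loc")


firms = load_period(annual_csv, quarterly_csv, 2020, 4, ["USA", "GBR"])
```

The left merge keeps firms that have annual data but no matching quarterly record, leaving gaps that an imputation step could fill afterwards.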
Parameters
year : int
    Year to filter data for
quarter : int
    Quarter to filter data for (1-4)
raw_annual_path : Path | str
    Path to annual Compustat data file
raw_quarterly_path : Path | str
    Path to quarterly Compustat data file
countries : list[str | Country]
    List of countries to include in the data
Returns
CompustatFirmsReader
    Initialized reader with processed data
Notes
- Data is filtered to match the specified year and quarter
- Countries can be specified as strings or Country enum values
- Missing values are imputed across all numeric columns
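Accepting both strings and enum members usually means normalizing the list to a single representation before filtering. The sketch below shows one way to do that. The `Country` class here is a hypothetical stand-in for the enum in `macro_data.configuration.countries`, with ISO alpha-3 values assumed, and `to_country_codes` is an illustrative helper rather than part of this module.

```python
from enum import Enum


class Country(Enum):
    """Hypothetical stand-in for macro_data's Country enum (ISO alpha-3 values assumed)."""

    FRANCE = "FRA"
    UNITED_KINGDOM = "GBR"
    UNITED_STATES = "USA"


def to_country_codes(countries):
    """Map a mixed list of strings and Country members to unique upper-case codes."""
    codes = []
    for country in countries:
        code = country.value if isinstance(country, Country) else str(country).upper()
        if code not in codes:  # drop duplicates while preserving input order
            codes.append(code)
    return codes


codes = to_country_codes(["USA", "gbr", Country.FRANCE, "USA"])
```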
get_firm_data(country: str | Country | Region) -> pd.DataFrame
get_proxied_firm_data(proxy_country: str | Country, exchange_rate: float) -> pd.DataFrame
Get firm-level data from a proxy country with currency conversion.
This method is useful when direct data for a country is not available and data from another country needs to be used as a proxy.
Parameters
proxy_country : str | Country
    Country to use as proxy (string or Country enum)
exchange_rate : float
    Exchange rate to convert monetary values
Returns
pd.DataFrame
    Converted firm-level data from the proxy country
Notes
- Only monetary values are converted
- Non-monetary fields (e.g., employee counts) are unchanged
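The conversion rule in these notes can be illustrated with a small pandas sketch. The table, the column names, and the `convert_currency` helper are hypothetical. It is also an assumption that conversion multiplies by `exchange_rate`, since this page does not state the direction of the rate.

```python
import pandas as pd

# Hypothetical proxy-country firms; column names are illustrative only.
firms = pd.DataFrame(
    {
        "country": ["GBR", "GBR"],
        "employees": [120, 45],
        "assets": [1500.0, 400.0],
        "revenue": [800.0, 250.0],
    },
    index=pd.Index([1001, 1002], name="firm_id"),
)
monetary_columns = ["assets", "revenue"]


def convert_currency(data, monetary_columns, exchange_rate):
    """Return a copy with only the monetary columns scaled by the exchange rate."""
    converted = data.copy()
    # Assumption: target value = source value * exchange_rate.
    converted[monetary_columns] = converted[monetary_columns] * exchange_rate
    return converted


converted = convert_currency(firms, monetary_columns, exchange_rate=1.25)
```

Working on a copy leaves the original table untouched, so the same proxy data can be reused with different exchange rates.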