CompustatBanksReader
This module reads and processes Compustat bank-level financial data. It handles quarterly bank financial statements for multiple countries and converts monetary values at a caller-supplied exchange rate.
Key Features:
- Read and process quarterly Compustat bank data
- Handle multiple countries and currencies
- Automatic missing value imputation
- Support for proxy data using US banks
- Currency conversion capabilities
The module processes various banking metrics including:
- Balance sheet items (assets, liabilities, equity)
- Funding sources (deposits, debt)
- Debt dynamics (issuance, reduction)
- Income data
Example
from pathlib import Path
from macro_data.readers.population_data.compustat_banks_reader import CompustatBanksReader
from macro_data.configuration.countries import Country

# Initialize reader with raw data
reader = CompustatBanksReader.from_raw_data(
    year=2020,
    quarter=4,
    raw_quarterly_path=Path("path/to/quarterly.csv"),
    countries=["GBR", Country.FRANCE],
    proxy_with_us=True,  # Include US banks for proxying
)

# Get bank data for a specific country
uk_banks = reader.get_country_data(
    country="GBR",
    exchange_rate=1.25,
)

# Get proxy data using another country's banks
proxy_banks = reader.get_proxied_country_data(
    proxy_country="USA",
    exchange_rate=1.25,
)
Note
Missing values are imputed using scikit-learn's IterativeImputer, with imputation performed separately for each country's banks.
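The per-country imputation described in this note can be sketched roughly as follows. This is a minimal illustration rather than the reader's actual implementation: the helper name `impute_by_country`, the column names (`country`, `assets`, `deposits`), and the imputer settings are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
# scikit-learn requires this explicit opt-in before IterativeImputer can be imported
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer


def impute_by_country(df, country_col, value_cols):
    """Fill missing values in value_cols, fitting one imputer per country (hypothetical helper)."""
    parts = []
    for _, group in df.groupby(country_col, sort=False):
        group = group.copy()
        imputer = IterativeImputer(random_state=0)
        # Each country's banks are imputed using only that country's rows
        group[value_cols] = imputer.fit_transform(group[value_cols])
        parts.append(group)
    # Restore the original row order
    return pd.concat(parts).loc[df.index]


# Illustrative data only; real Compustat column names differ
banks = pd.DataFrame(
    {
        "country": ["GBR", "GBR", "GBR", "FRA", "FRA", "FRA"],
        "assets": [100.0, 200.0, np.nan, 50.0, np.nan, 70.0],
        "deposits": [60.0, np.nan, 180.0, 30.0, 36.0, 42.0],
    }
)
imputed = impute_by_country(banks, "country", ["assets", "deposits"])
```

Fitting one imputer per group keeps one country's banks from influencing estimates for another's. Note that a column that is entirely missing within a country would need separate handling, since the imputer cannot estimate it from that group alone.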
CompustatBanksReader
A class for reading and processing Compustat bank-level financial data.
This class handles the reading and processing of Compustat bank data, including:
- Reading quarterly financial statements
- Filtering by country and time period
- Imputing missing values
- Currency conversion for international comparisons
Parameters
data : pd.DataFrame
    Processed Compustat bank data with standardized columns
Attributes
data : pd.DataFrame
    Processed bank-level data indexed by country
numerical_columns : list[str]
    List of columns containing monetary values
Notes
- Missing values are imputed separately for each country's banks
- All monetary values are in their original currencies unless converted
data = data
instance-attribute
numerical_columns
property
Get the list of columns containing monetary values.
Returns
list[str]
    Names of columns containing monetary values that are present in the data
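A property of this kind could look roughly like the sketch below. The class name `ReaderSketch` and the candidate column list are hypothetical and chosen only for illustration; the reader's real column names are not given here.

```python
import pandas as pd

# Hypothetical candidate list; the real reader's column names may differ
MONETARY_CANDIDATES = ["assets", "liabilities", "equity", "deposits", "debt"]


class ReaderSketch:
    """Illustrative stand-in, not the actual CompustatBanksReader."""

    def __init__(self, data: pd.DataFrame):
        self.data = data

    @property
    def numerical_columns(self) -> list[str]:
        # Keep only the candidate columns that actually exist in the data
        return [c for c in MONETARY_CANDIDATES if c in self.data.columns]


sketch = ReaderSketch(pd.DataFrame({"assets": [1.0], "deposits": [0.5], "name": ["A"]}))
present = sketch.numerical_columns
```

Filtering against the columns actually present means downstream code can loop over `numerical_columns` without checking whether each column exists.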
__init__(data: pd.DataFrame)
from_raw_data(year: int, quarter: int, raw_quarterly_path: Path | str, countries: list[str | Country], proxy_with_us: bool = True)
classmethod
Create a CompustatBanksReader instance from raw Compustat data.
This method:
1. Reads quarterly bank data
2. Filters for specific time period
3. Processes and cleans the data
4. Imputes missing values by country
Parameters
year : int
    Year to filter data for
quarter : int
    Quarter to filter data for (1-4)
raw_quarterly_path : Path | str
    Path to quarterly Compustat data file
countries : list[str | Country]
    List of countries to include in the data
proxy_with_us : bool, optional
    Whether to include US banks for proxying (default: True)
Returns
CompustatBanksReader
    Initialized reader with processed data
Notes
- Data is filtered to match the specified year and quarter
- Countries can be specified as strings or Country enum values
- Missing values are imputed separately for each country
- US banks are included if proxy_with_us is True
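The filtering behaviour described in these notes can be sketched as below. This is an illustration under stated assumptions, not the library's code: the `Country` enum here is a hypothetical stand-in (including its values), the function name `filter_raw` is invented, and the column names `country`, `year`, and `quarter` are placeholders for whatever the raw Compustat file uses.

```python
from enum import Enum

import pandas as pd


class Country(str, Enum):
    """Hypothetical stand-in for macro_data's Country enum; values are assumed."""

    FRANCE = "FRA"
    UNITED_KINGDOM = "GBR"


def filter_raw(df, year, quarter, countries, proxy_with_us=True):
    """Keep rows for the requested period and countries (illustrative helper)."""
    # Accept both plain strings and enum members
    codes = [c.value if isinstance(c, Country) else c for c in countries]
    # Optionally add US banks so they are available as a proxy
    if proxy_with_us and "USA" not in codes:
        codes.append("USA")
    mask = (
        (df["year"] == year)
        & (df["quarter"] == quarter)
        & df["country"].isin(codes)
    )
    return df.loc[mask].reset_index(drop=True)


# Illustrative data only
raw = pd.DataFrame(
    {
        "country": ["GBR", "FRA", "USA", "DEU", "GBR"],
        "year": [2020, 2020, 2020, 2020, 2019],
        "quarter": [4, 4, 4, 4, 4],
        "assets": [10.0, 20.0, 30.0, 40.0, 50.0],
    }
)
with_us = filter_raw(raw, 2020, 4, ["GBR", Country.FRANCE])
without_us = filter_raw(raw, 2020, 4, ["GBR", Country.FRANCE], proxy_with_us=False)
```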
get_country_data(country: str, exchange_rate: float) -> pd.DataFrame
Get bank-level data for a specific country.
For non-US countries, this method returns US bank data converted to the target country's currency using the provided exchange rate.
Parameters
country : str
    Country to get data for
exchange_rate : float
    Exchange rate for currency conversion (if using US proxy)
Returns
pd.DataFrame
    Bank-level data for the specified country
Notes
- Returns actual data for US banks
- Returns converted US bank data for other countries
get_proxied_country_data(proxy_country: str | Country, exchange_rate: float)
Get bank-level data from a proxy country with currency conversion.
This method is useful when direct data for a country is not available and data from another country needs to be used as a proxy.
Parameters
proxy_country : str | Country
    Country to use as proxy (string or Country enum)
exchange_rate : float
    Exchange rate to convert monetary values
Returns
pd.DataFrame
    Converted bank-level data from the proxy country
Notes
- Only monetary values are converted
- Non-monetary fields are unchanged
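A conversion that touches only monetary columns might look like the following sketch. It assumes the exchange rate is applied by multiplication (the direction of the rate is not stated here), and the function name `convert_monetary` and the column names are hypothetical.

```python
import pandas as pd


def convert_monetary(df, monetary_cols, exchange_rate):
    """Scale monetary columns by exchange_rate, leaving other fields untouched (illustrative)."""
    converted = df.copy()  # avoid mutating the caller's data
    # Assumption: conversion multiplies by the rate; the real reader may define it differently
    converted[monetary_cols] = converted[monetary_cols] * exchange_rate
    return converted


# Illustrative data only
us_banks = pd.DataFrame(
    {
        "bank_id": ["A", "B"],
        "assets": [100.0, 200.0],
        "deposits": [80.0, 150.0],
    }
)
proxy = convert_monetary(us_banks, ["assets", "deposits"], exchange_rate=1.25)
```

Working on a copy leaves the source frame in its original currency, so the same data can be converted again for a different target country.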