Python API Reference¶
All public symbols are available from the top-level package, imported throughout this page as bsp (import bank_statement_parser as bsp).
Meta¶
bsp.__app_name__¶
constant
bsp.__version__¶
constant
Namespaced report backend¶
bsp.db¶
module — bank_statement_parser.modules.reports_db
SQLite-backed report classes and export helpers.
Statement processing¶
bsp.Statement¶
class — bank_statement_parser.modules.statements
Represents a single bank statement PDF with data extraction and validation.
bsp.StatementBatch¶
class — bank_statement_parser.modules.statements
Handles batch processing of multiple bank statement PDFs.
bsp.process_pdf_statement()¶
function — bank_statement_parser.modules.statements
Process a single bank statement PDF and save results to parquet files.
bsp.copy_statements_to_project()¶
function — bank_statement_parser.modules.statements
Copy processed statement PDFs into the project statements/ directory.
bsp.delete_temp_files()¶
function — bank_statement_parser.modules.statements
Delete temporary parquet files created during batch processing.
Low-level persistence helpers¶
bsp.update_parquet()¶
function — bank_statement_parser.modules.parquet
Update parquet files with processed results from all PDFs in a batch.
bsp.update_db()¶
function — bank_statement_parser.modules.database
Insert processed batch results into the SQLite database.
Data structures¶
bsp.PdfResult¶
class — bank_statement_parser.modules.data
Top-level result returned by process_pdf_statement().
bsp.Success¶
class — bank_statement_parser.modules.data
Payload for a fully-validated PDF result.
bsp.Review¶
class — bank_statement_parser.modules.data
Payload for a PDF where extraction succeeded but CAB validation failed.
bsp.Failure¶
class — bank_statement_parser.modules.data
Payload for a PDF result where no usable statement data was produced.
bsp.StatementInfo¶
class — bank_statement_parser.modules.data
Statement-level metadata extracted from a successfully validated PDF.
bsp.ParquetFiles¶
class — bank_statement_parser.modules.data
Paths to the statement-level temporary Parquet files written on the SUCCESS path.
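The result classes above form a tagged union: a PdfResult carries exactly one of the Success, Review, or Failure payloads, and consumers dispatch on which one they got. A minimal sketch of that shape (field names here are illustrative, not the library's actual attributes):

```python
from dataclasses import dataclass
from typing import Union

# Illustrative payloads -- the real classes live in
# bank_statement_parser.modules.data and may carry different fields.
@dataclass
class Success:
    statement_info: dict   # statement-level metadata

@dataclass
class Review:
    reason: str            # extraction succeeded, CAB validation failed

@dataclass
class Failure:
    error: str             # no usable statement data produced

@dataclass
class PdfResult:
    pdf_name: str
    payload: Union[Success, Review, Failure]

def describe(result: PdfResult) -> str:
    """Dispatch on the payload type, as a consumer of PdfResult would."""
    if isinstance(result.payload, Success):
        return "validated"
    if isinstance(result.payload, Review):
        return "needs review"
    return "failed"
```
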
Debug / diagnostics¶
bsp.debug_pdf_statement()¶
function — bank_statement_parser.modules.debug
Re-process a single failing PDF and write a debug.json diagnostic file.
bsp.debug_statements()¶
function — bank_statement_parser.modules.debug
Re-process all failing statements from a completed batch and write debug files.
Errors¶
bsp.StatementError¶
class — bank_statement_parser.modules.errors
bsp.ProjectDatabaseMissing¶
class — bank_statement_parser.modules.errors
bsp.ProjectConfigMissing¶
class — bank_statement_parser.modules.errors
Config helpers¶
bsp.copy_default_import_config()¶
function — bank_statement_parser.modules.import_config
Copy all default import TOML configuration files to a destination directory.
bsp.copy_project_folders()¶
function — bank_statement_parser.modules.paths
Copy the project folder structure (directories only) to a destination.
bsp.validate_or_initialise_project()¶
function — bank_statement_parser.modules.paths
Validate an existing project or initialise a new one at project_path.
bsp.ProjectPaths¶
class — bank_statement_parser.modules.paths
All file-system paths for a bank_statement_parser project, derived from a single root directory.
Low-level PDF helpers¶
bsp.pdf_open()¶
function — bank_statement_parser.modules.pdf_functions
Open a PDF file and return the PDF object with performance logging.
bsp.page_crop()¶
function — bank_statement_parser.modules.pdf_functions
Crop a PDF page to the specified bounding box coordinates, with smart defaults.
bsp.page_text()¶
function — bank_statement_parser.modules.pdf_functions
Extract all text content from a PDF page.
bsp.region_search()¶
function — bank_statement_parser.modules.pdf_functions
Search for a regex pattern within a PDF region and return the first match text.
bsp.get_table_from_region()¶
function — bank_statement_parser.modules.pdf_functions
Extract a structured table from a PDF region using configurable extraction settings.
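Conceptually, region_search() reduces to extracting the text of a cropped region and running a regex over it, returning the first match. A library-agnostic sketch of that last step (the real helper also performs the crop and works on PDF page objects):

```python
import re
from typing import Optional

def region_search_sketch(region_text: str, pattern: str) -> Optional[str]:
    """Return the first regex match in text already extracted from a
    cropped region; the real bsp.region_search also does the cropping."""
    m = re.search(pattern, region_text)
    return m.group(0) if m else None

# e.g. pulling a statement date out of a header region's text
header = "Statement period: 01 Mar 2025 to 31 Mar 2025"
date = region_search_sketch(header, r"\d{2} \w{3} \d{4}")
```
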
Data-mart / database¶
bsp.build_datamart()¶
function — bank_statement_parser.data
Empty and rebuild all mart tables (DimTime, DimAccount, DimStatement, FactTransaction, FactBalance) from the raw source tables.
bsp.create_db()¶
function — bank_statement_parser.data
Create (or recreate) the raw SQLite database with all tables and indexes.
bsp.Housekeeping¶
class — bank_statement_parser.data
Orphan-detection and cascaded-delete helper for the raw SQLite database.
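build_datamart() empties and repopulates the mart tables from the raw source tables. The general empty-then-rebuild pattern looks like this, shown here against a toy two-table schema rather than the library's actual DDL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_transactions (account TEXT, tx_date TEXT, amount REAL);
    INSERT INTO raw_transactions VALUES
        ('ACC1', '2025-03-01', -42.50),
        ('ACC1', '2025-03-02', 100.00),
        ('ACC2', '2025-03-01', -7.25);
    CREATE TABLE DimAccount (account_id INTEGER PRIMARY KEY, account TEXT UNIQUE);
    CREATE TABLE FactTransaction (account_id INTEGER, tx_date TEXT, amount REAL);
""")
# Rebuild pattern: empty the mart tables, then repopulate from raw,
# dimensions first so facts can join to their surrogate keys.
conn.execute("DELETE FROM FactTransaction")
conn.execute("DELETE FROM DimAccount")
conn.execute("INSERT INTO DimAccount (account) "
             "SELECT DISTINCT account FROM raw_transactions")
conn.execute("""
    INSERT INTO FactTransaction (account_id, tx_date, amount)
    SELECT d.account_id, r.tx_date, r.amount
    FROM raw_transactions r JOIN DimAccount d USING (account)
""")
fact_rows = conn.execute("SELECT COUNT(*) FROM FactTransaction").fetchone()[0]
```
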
PDF anonymisation¶
bsp.anonymise_pdf()¶
function — bank_statement_parser.modules.anonymise
Anonymise a single PDF using exclusion-based full-page letter scrambling.
bsp.anonymise_folder()¶
function — bank_statement_parser.modules.anonymise
Anonymise all PDFs matching pattern in folder_path using exclusion-based scrambling.
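Exclusion-based letter scrambling replaces letters with random letters of the same case while leaving digits, punctuation, spacing, and any excluded substrings intact, so layout and account numbers survive. A toy string version of the idea (the real implementation operates on PDF content, not plain strings):

```python
import random
import string

def scramble(text: str, exclusions: tuple[str, ...] = ()) -> str:
    """Scramble letters, preserving case, digits, punctuation and layout.
    Substrings listed in `exclusions` pass through untouched."""
    rng = random.Random(0)  # seeded only to make this sketch reproducible
    out, i = [], 0
    while i < len(text):
        excluded = next((e for e in exclusions if text.startswith(e, i)), None)
        if excluded:
            out.append(excluded)
            i += len(excluded)
            continue
        ch = text[i]
        if ch.islower():
            out.append(rng.choice(string.ascii_lowercase))
        elif ch.isupper():
            out.append(rng.choice(string.ascii_uppercase))
        else:
            out.append(ch)  # digits, spaces, punctuation pass through
        i += 1
    return "".join(out)

masked = scramble("Alice Smith 12-3456", exclusions=("12-3456",))
```
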
Forex / currency conversion¶
bsp.get_exchange_rates()¶
function — bank_statement_parser.modules.forex
Fetch daily USD-based exchange rates and persist them to exchange_rates.
bsp.ForexApiConfig¶
class — bank_statement_parser.modules.data
Configuration for the forex exchange-rate fetching service.
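Because the stored rates are USD-based, converting between two non-USD currencies goes through USD as a pivot. A short sketch of that arithmetic (the rates below are made-up values, not library output):

```python
def convert(amount: float, from_ccy: str, to_ccy: str,
            usd_rates: dict[str, float]) -> float:
    """usd_rates maps currency -> units per 1 USD (so USD itself is 1.0)."""
    usd = amount / usd_rates[from_ccy]   # into USD
    return usd * usd_rates[to_ccy]       # out of USD

rates = {"USD": 1.0, "EUR": 0.9, "GBP": 0.8}   # illustrative values only
gbp = convert(90.0, "EUR", "GBP", rates)       # 90 EUR -> 100 USD -> 80 GBP
```
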
Testing harness¶
bsp.TestHarness¶
class — bank_statement_parser.testing
Programmatic test environment for integration testing by dependent projects.
bsp.TestGateFailure¶
class — bank_statement_parser.modules.errors
Raised when bsp's own pytest suite fails during TestHarness.setup().
DB Report Backend¶
bsp.db exposes report classes and export functions backed by the SQLite star-schema.
# SQLite backend
flat = bsp.db.FlatTransaction().all.collect()
bsp.db.export_csv()
bsp.db.export_excel()
bsp.db.export_json()
bsp.db.export_reporting_data()
Report classes¶
Available in bsp.db:
| Class | Description |
|---|---|
| FlatTransaction | Denormalised transaction view joining all dimensions. |
| FactBalance | Daily balance series per account (forward-filled). |
| DimTime | Calendar dimension — one row per date in the transaction range. |
| DimStatement | Statement dimension — one row per parsed PDF statement. |
| DimAccount | Account dimension — one row per unique account. |
| FactTransaction | Transaction fact table — one row per transaction. |
| GapReport | Gap detection report — flags missing statement periods. |
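The earlier snippet bsp.db.FlatTransaction().all.collect() suggests each report class exposes an .all query handle whose .collect() materialises the rows, a lazy-query style interface. A shape-only stand-in to illustrate the calling pattern (not the library's implementation):

```python
class LazyQuery:
    """Minimal stand-in for a lazy query handle with .collect()."""
    def __init__(self, rows):
        self._rows = rows

    def collect(self):
        # The real handle would only hit SQLite here, at collect time.
        return list(self._rows)

class FlatTransactionSketch:
    """Shape-only stand-in for bsp.db.FlatTransaction."""
    @property
    def all(self) -> LazyQuery:
        # The real class would build a query over the SQLite star schema.
        return LazyQuery([{"date": "2025-03-01", "amount": -42.5}])

rows = FlatTransactionSketch().all.collect()
```
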
Export helpers¶
Available in bsp.db:
export_csv()¶
export_csv(folder: Path | None = None, type: Literal['single', 'multi'] = 'single', project_path: Path | None = None, batch_id: str | None = None, filename_timestamp: bool = False) -> None
Write report data to CSV files in folder.
Each table is written as a separate .csv file named after its logical
table name (e.g. transactions.csv, or statement_dimension.csv,
account_dimension.csv, etc. for type="multi").
When filename_timestamp is True:
- type="single": the timestamp is appended to the filename, e.g. transactions_20250331143022.csv.
- type="multi": files are written into a multi_20250331143022/ sub-folder inside folder with their original names.
When filename_timestamp is False:
- type="single": files are written directly to folder with their original names, e.g. transactions.csv.
- type="multi": files are written into a multi/ sub-folder inside folder with their original names.
Args:
- folder — Directory to write CSV files into. When None, the project's export/csv/ directory (resolved via project_path) is used and created automatically if absent.
- type — Export preset: "single" (flat transactions table) or "multi" (separate star-schema tables for loading into a database). Defaults to "single".
- project_path — Optional project root used to resolve the default export folder and data sources. Falls back to the bundled default project when None.
- batch_id — Optional batch identifier to filter report data to a single batch. When None, all rows are exported.
- filename_timestamp — When True, append a human-readable timestamp (yyyymmddHHMMSS) to the filename (single) or create a timestamped sub-folder (multi). Defaults to False.
export_excel()¶
export_excel(path: Path | None = None, type: Literal['single', 'multi'] = 'single', project_path: Path | None = None, batch_id: str | None = None, filename_timestamp: bool = False) -> None
Write report data to an Excel workbook at path.
Each table is written as a separate worksheet. For type="single" a
single transactions sheet is written; for type="multi" six sheets
are written (statement_dimension, account_dimension,
calendar_dimension, transaction_measures,
daily_account_balances, missing_statement_report).
Filename conventions:
- type="single", no timestamp: transactions.xlsx
- type="single", with timestamp: transactions_20250331143022.xlsx
- type="multi", no timestamp: transactions_multi.xlsx
- type="multi", with timestamp: transactions_multi_20250331143022.xlsx
Worksheet names are never modified by the timestamp or type logic.
Args:
- path — Full file path for the output .xlsx workbook. When None, the file is written to export/excel/transactions.xlsx inside the project directory resolved via project_path.
- type — Export preset: "single" (flat transactions table) or "multi" (separate star-schema sheets for loading into a database). Defaults to "single".
- project_path — Optional project root used to resolve the default export folder and data sources. Falls back to the bundled default project when None.
- batch_id — Optional batch identifier to filter report data to a single batch. When None, all rows are exported.
- filename_timestamp — When True, append a human-readable timestamp (yyyymmddHHMMSS) to the workbook filename. Worksheet names are unaffected. Defaults to False.
export_json()¶
export_json(folder: Path | None = None, type: Literal['single', 'multi'] = 'single', project_path: Path | None = None, batch_id: str | None = None, filename_timestamp: bool = False) -> None
Write report data to JSON files in folder.
Each table is written as a separate .json file containing a JSON array
of row objects, named after its logical table name (e.g.
transactions.json, or statement_dimension.json,
account_dimension.json, etc. for type="multi").
When filename_timestamp is True:
- type="single": the timestamp is appended to the filename, e.g. transactions_20250331143022.json.
- type="multi": files are written into a multi_20250331143022/ sub-folder inside folder with their original names.
When filename_timestamp is False:
- type="single": files are written directly to folder with their original names, e.g. transactions.json.
- type="multi": files are written into a multi/ sub-folder inside folder with their original names.
Args:
- folder — Directory to write JSON files into. When None, the project's export/json/ directory (resolved via project_path) is used and created automatically if absent.
- type — Export preset: "single" (flat transactions table) or "multi" (separate star-schema tables for loading into a database). Defaults to "single".
- project_path — Optional project root used to resolve the default export folder and data sources. Falls back to the bundled default project when None.
- batch_id — Optional batch identifier to filter report data to a single batch. When None, all rows are exported.
- filename_timestamp — When True, append a human-readable timestamp (yyyymmddHHMMSS) to the filename (single) or create a timestamped sub-folder (multi). Defaults to False.
export_reporting_data()¶
Write CSV reporting feeds to the project's reporting/data/ sub-directories.
Calls export_csv() twice — once with type="single" writing to
reporting/data/single/ and once with type="multi" writing to
reporting/data/multi/ (created as a sub-folder of reporting/data/
by the multi-export logic). Both directories are created automatically if
absent.
This produces a stable set of CSV files that external reporting tools (e.g. Power BI, Tableau, Excel) can point at directly without needing to know about the full export machinery.
Args:
- project_path — Optional project root directory. Falls back to the bundled default project when None.
Example:
import bank_statement_parser as bsp
from pathlib import Path
bsp.db.export_reporting_data(project_path=Path("/my/project"))
# Writes:
# /my/project/reporting/data/single/transactions.csv
# /my/project/reporting/data/multi/statement_dimension.csv
# /my/project/reporting/data/multi/account_dimension.csv
# /my/project/reporting/data/multi/calendar_dimension.csv
# /my/project/reporting/data/multi/transaction_measures.csv
# /my/project/reporting/data/multi/daily_account_balances.csv
# /my/project/reporting/data/multi/missing_statement_report.csv