
Python API Reference

All public symbols are available from the top-level package:

import bank_statement_parser as bsp

Meta

bsp.__app_name__

constant

bsp.__version__

constant

Namespaced report backend

bsp.db

module: bank_statement_parser.modules.reports_db

SQLite-backed report classes and export helpers.

Statement processing

bsp.Statement

class: bank_statement_parser.modules.statements

Represents a single bank statement PDF with data extraction and validation.

bsp.StatementBatch

class: bank_statement_parser.modules.statements

Handles batch processing of multiple bank statement PDFs.

bsp.process_pdf_statement()

function: bank_statement_parser.modules.statements

Process a single bank statement PDF and save results to parquet files.

bsp.copy_statements_to_project()

function: bank_statement_parser.modules.statements

Copy processed statement PDFs into the project statements/ directory.

bsp.delete_temp_files()

function: bank_statement_parser.modules.statements

Delete temporary parquet files created during batch processing.

Low-level persistence helpers

bsp.update_parquet()

function: bank_statement_parser.modules.parquet

Update parquet files with processed results from all PDFs in a batch.

bsp.update_db()

function: bank_statement_parser.modules.database

Insert processed batch results into the SQLite database.

Data structures

bsp.PdfResult

class: bank_statement_parser.modules.data

Top-level result returned by process_pdf_statement().

bsp.Success

class: bank_statement_parser.modules.data

Payload for a fully validated PDF result.

bsp.Review

class: bank_statement_parser.modules.data

Payload for a PDF where extraction succeeded but CAB validation failed.

bsp.Failure

class: bank_statement_parser.modules.data

Payload for a PDF result where no usable statement data was produced.

bsp.StatementInfo

class: bank_statement_parser.modules.data

Statement-level metadata extracted from a successfully validated PDF.

bsp.ParquetFiles

class: bank_statement_parser.modules.data

Paths to the statement-level temporary Parquet files written on the SUCCESS path.

Debug / diagnostics

bsp.debug_pdf_statement()

function: bank_statement_parser.modules.debug

Re-process a single failing PDF and write a debug.json diagnostic file.

bsp.debug_statements()

function: bank_statement_parser.modules.debug

Re-process all failing statements from a completed batch and write debug files.

Errors

bsp.StatementError

class: bank_statement_parser.modules.errors

bsp.ProjectDatabaseMissing

class: bank_statement_parser.modules.errors

bsp.ProjectConfigMissing

class: bank_statement_parser.modules.errors

Config helpers

bsp.copy_default_import_config()

function: bank_statement_parser.modules.import_config

Copy all default import TOML configuration files to a destination directory.

bsp.copy_project_folders()

function: bank_statement_parser.modules.paths

Copy the project folder structure (directories only) to a destination.

bsp.validate_or_initialise_project()

function: bank_statement_parser.modules.paths

Validate an existing project or initialise a new one at project_path.

bsp.ProjectPaths

class: bank_statement_parser.modules.paths

All file-system paths for a bank_statement_parser project, derived from a single root directory.
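Deriving every location from a single root can be sketched with a frozen dataclass. This is a hypothetical stand-in, not the package's class; the directory names are guesses based on the directories mentioned elsewhere on this page (statements/, export/csv/, reporting/data/):

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class ProjectPathsSketch:
    """Illustrative stand-in for bsp.ProjectPaths: every location
    is derived from the single root directory."""
    root: Path

    @property
    def statements(self) -> Path:
        return self.root / "statements"

    @property
    def export_csv(self) -> Path:
        return self.root / "export" / "csv"

    @property
    def reporting_data(self) -> Path:
        return self.root / "reporting" / "data"

paths = ProjectPathsSketch(Path("/my/project"))
print(paths.export_csv.as_posix())  # → /my/project/export/csv
```

Keeping the derivations as properties means no path is stored twice, so relocating a project is a one-field change.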

Low-level PDF helpers

bsp.pdf_open()

function: bank_statement_parser.modules.pdf_functions

Open a PDF file and return the PDF object with performance logging.

bsp.page_crop()

function: bank_statement_parser.modules.pdf_functions

Crop a PDF page to the specified bounding box coordinates, with smart defaults.

bsp.page_text()

function: bank_statement_parser.modules.pdf_functions

Extract all text content from a PDF page.

function: bank_statement_parser.modules.pdf_functions

Search for a regex pattern within a PDF region and return the first match text.

bsp.get_table_from_region()

function: bank_statement_parser.modules.pdf_functions

Extract a structured table from a PDF region using configurable extraction settings.

Data-mart / database

bsp.build_datamart()

function: bank_statement_parser.data

Empty and rebuild all mart tables (DimTime, DimAccount, DimStatement, FactTransaction, FactBalance) from the raw source tables.

bsp.create_db()

function: bank_statement_parser.data

Create (or recreate) the raw SQLite database with all tables and indexes.

bsp.Housekeeping

class: bank_statement_parser.data

Orphan-detection and cascaded-delete helper for the raw SQLite database.
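Orphan detection in SQLite typically comes down to an anti-join between a child and a parent table. A sketch with the standard library — the two-table schema and column names here are hypothetical, not the package's raw schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical minimal schema: no FK enforcement, so orphans can appear.
con.executescript("""
    CREATE TABLE statements (id INTEGER PRIMARY KEY);
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY,
        statement_id INTEGER
    );
    INSERT INTO statements VALUES (1);
    INSERT INTO transactions VALUES (10, 1), (11, 2);
""")

# Orphan detection: child rows pointing at a missing parent.
orphans = con.execute("""
    SELECT t.id FROM transactions t
    LEFT JOIN statements s ON s.id = t.statement_id
    WHERE s.id IS NULL
""").fetchall()
print(orphans)  # → [(11,)]

# Cascaded delete of the orphaned rows.
con.execute(
    "DELETE FROM transactions "
    "WHERE statement_id NOT IN (SELECT id FROM statements)"
)
```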

PDF anonymisation

bsp.anonymise_pdf()

function: bank_statement_parser.modules.anonymise

Anonymise a single PDF using exclusion-based full-page letter scrambling.

bsp.anonymise_folder()

function: bank_statement_parser.modules.anonymise

Anonymise all PDFs matching pattern in folder_path using exclusion-based scrambling.
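Exclusion-based letter scrambling might work roughly like this: replace every letter outside the excluded spans with a random letter of the same case, leaving digits and punctuation intact. A plain-string sketch only — the real helpers operate on PDF text objects, and the function below is illustrative:

```python
import random

def scramble_letters(text: str, exclude: list[tuple[int, int]]) -> str:
    """Scramble each letter outside the excluded (start, end) spans
    to a random letter of the same case; non-letters pass through."""
    rng = random.Random(0)  # fixed seed for a reproducible demo
    protected: set[int] = set()
    for start, end in exclude:
        protected.update(range(start, end))
    out = []
    for i, ch in enumerate(text):
        if i in protected or not ch.isalpha():
            out.append(ch)
        elif ch.isupper():
            out.append(rng.choice("ABCDEFGHIJKLMNOPQRSTUVWXYZ"))
        else:
            out.append(rng.choice("abcdefghijklmnopqrstuvwxyz"))
    return "".join(out)

line = "Account 12345678 BIG BANK PLC"
# Keep the account number (chars 8-15) readable, scramble everything else.
scrambled = scramble_letters(line, exclude=[(8, 16)])
print(scrambled)
```

Because layout-preserving scrambling keeps character counts, case, and digits, the anonymised output stays usable for testing parsers against real statement layouts.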

Forex / currency conversion

bsp.get_exchange_rates()

function: bank_statement_parser.modules.forex

Fetch daily USD-based exchange rates and persist them to exchange_rates.
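With USD-based daily rates on hand, any cross-currency conversion goes through USD. A sketch of the arithmetic — the rate values below are made up for illustration:

```python
# Hypothetical USD-based daily rates: units of currency per 1 USD.
rates = {"USD": 1.0, "EUR": 0.92, "GBP": 0.79}

def convert(amount: float, src: str, dst: str) -> float:
    """Convert via USD: dividing by the source rate yields USD,
    multiplying by the destination rate yields the target currency."""
    usd = amount / rates[src]
    return usd * rates[dst]

print(round(convert(100.0, "GBP", "EUR"), 2))  # → 116.46
```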

bsp.ForexApiConfig

class: bank_statement_parser.modules.data

Configuration for the forex exchange-rate fetching service.

Testing harness

bsp.TestHarness

class: bank_statement_parser.testing

Programmatic test environment for integration testing by dependent projects.

bsp.TestGateFailure

class: bank_statement_parser.modules.errors

Raised when bsp's own pytest suite fails during TestHarness.setup().

DB Report Backend

bsp.db exposes report classes and export functions backed by the SQLite star-schema.

# SQLite backend
flat = bsp.db.FlatTransaction().all.collect()
bsp.db.export_csv()
bsp.db.export_excel()
bsp.db.export_json()
bsp.db.export_reporting_data()

Report classes

Available in bsp.db:

Class Description
FlatTransaction Denormalised transaction view joining all dimensions.
FactBalance Daily balance series per account (forward-filled).
DimTime Calendar dimension — one row per date in the transaction range.
DimStatement Statement dimension — one row per parsed PDF statement.
DimAccount Account dimension — one row per unique account.
FactTransaction Transaction fact table — one row per transaction.
GapReport Gap detection report — flags missing statement periods.
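The DimTime contract above — one row per date across the transaction range — can be sketched as a simple date-range generator. The column names here are illustrative, not necessarily the table's actual schema:

```python
from datetime import date, timedelta

def calendar_rows(start: date, end: date) -> list[dict]:
    """One row per date from start to end inclusive, mirroring the
    DimTime contract described above. Column names are illustrative."""
    rows = []
    d = start
    while d <= end:
        rows.append({
            "date": d.isoformat(),
            "year": d.year,
            "month": d.month,
            "weekday": d.strftime("%A"),
        })
        d += timedelta(days=1)
    return rows

rows = calendar_rows(date(2024, 2, 28), date(2024, 3, 1))
print(len(rows))  # → 3 (2024 is a leap year, so Feb 29 is included)
```

A dense calendar like this is what makes forward-filled balance series and gap detection possible: missing dates in the fact tables show up as rows with no matching facts.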

Export helpers

Available in bsp.db:

export_csv()

export_csv(folder: Path | None = None, type: Literal['single', 'multi'] = 'single', project_path: Path | None = None, batch_id: str | None = None, filename_timestamp: bool = False) -> None

Write report data to CSV files in folder.

Each table is written as a separate .csv file named after its logical table name (e.g. transactions.csv, or statement_dimension.csv, account_dimension.csv, etc. for type="multi").

When filename_timestamp is True:

  • type="single": the timestamp is appended to the filename, e.g. transactions_20250331143022.csv.
  • type="multi": files are written into a multi_20250331143022/ sub-folder inside folder with their original names.

When filename_timestamp is False:

  • type="single": files are written directly to folder with their original names, e.g. transactions.csv.
  • type="multi": files are written into a multi/ sub-folder inside folder with their original names.

Args:

  • folder — Directory to write CSV files into. When None the project's export/csv/ directory (resolved via project_path) is used and created automatically if absent.
  • type — Export preset — "single" (flat transactions table) or "multi" (separate star-schema tables for loading into a database). Defaults to "single".
  • project_path — Optional project root used to resolve the default export folder and data sources. Falls back to the bundled default project when None.
  • batch_id — Optional batch identifier to filter report data to a single batch. When None all rows are exported.
  • filename_timestamp — When True, append a human-readable timestamp (yyyymmddHHMMSS) to the filename (single) or create a timestamped sub-folder (multi). Defaults to False.

export_excel()

export_excel(path: Path | None = None, type: Literal['single', 'multi'] = 'single', project_path: Path | None = None, batch_id: str | None = None, filename_timestamp: bool = False) -> None

Write report data to an Excel workbook at path.

Each table is written as a separate worksheet. For type="single" a single transactions sheet is written; for type="multi" six sheets are written (statement_dimension, account_dimension, calendar_dimension, transaction_measures, daily_account_balances, missing_statement_report).

Filename conventions:

  • type="single", no timestamp: transactions.xlsx
  • type="single", with timestamp: transactions_20250331143022.xlsx
  • type="multi", no timestamp: transactions_multi.xlsx
  • type="multi", with timestamp: transactions_multi_20250331143022.xlsx

Worksheet names are never modified by the timestamp or type logic.

Args:

  • path — Full file path for the output .xlsx workbook. When None the file is written to export/excel/transactions.xlsx inside the project directory resolved via project_path.
  • type — Export preset — "single" (flat transactions table) or "multi" (separate star-schema sheets for loading into a database). Defaults to "single".
  • project_path — Optional project root used to resolve the default export folder and data sources. Falls back to the bundled default project when None.
  • batch_id — Optional batch identifier to filter report data to a single batch. When None all rows are exported.
  • filename_timestamp — When True, append a human-readable timestamp (yyyymmddHHMMSS) to the workbook filename. Worksheet names are unaffected. Defaults to False.

export_json()

export_json(folder: Path | None = None, type: Literal['single', 'multi'] = 'single', project_path: Path | None = None, batch_id: str | None = None, filename_timestamp: bool = False) -> None

Write report data to JSON files in folder.

Each table is written as a separate .json file containing a JSON array of row objects, named after its logical table name (e.g. transactions.json, or statement_dimension.json, account_dimension.json, etc. for type="multi").

When filename_timestamp is True:

  • type="single": the timestamp is appended to the filename, e.g. transactions_20250331143022.json.
  • type="multi": files are written into a multi_20250331143022/ sub-folder inside folder with their original names.

When filename_timestamp is False:

  • type="single": files are written directly to folder with their original names, e.g. transactions.json.
  • type="multi": files are written into a multi/ sub-folder inside folder with their original names.

Args:

  • folder — Directory to write JSON files into. When None the project's export/json/ directory (resolved via project_path) is used and created automatically if absent.
  • type — Export preset — "single" (flat transactions table) or "multi" (separate star-schema tables for loading into a database). Defaults to "single".
  • project_path — Optional project root used to resolve the default export folder and data sources. Falls back to the bundled default project when None.
  • batch_id — Optional batch identifier to filter report data to a single batch. When None all rows are exported.
  • filename_timestamp — When True, append a human-readable timestamp (yyyymmddHHMMSS) to the filename (single) or create a timestamped sub-folder (multi). Defaults to False.

export_reporting_data()

export_reporting_data(project_path: Path | None = None) -> None

Write CSV reporting feeds to the project's reporting/data/ sub-directories.

Calls export_csv() twice — once with type="single" writing to reporting/data/single/ and once with type="multi" writing to reporting/data/multi/ (created as a sub-folder of reporting/data/ by the multi-export logic). Both directories are created automatically if absent.

This produces a stable set of CSV files that external reporting tools (e.g. Power BI, Tableau, Excel) can point at directly without needing to know about the full export machinery.

Args:

  • project_path — Optional project root directory. Falls back to the bundled default project when None.

Example:

    import bank_statement_parser as bsp
    from pathlib import Path

    bsp.db.export_reporting_data(project_path=Path("/my/project"))
    # Writes:
    #   /my/project/reporting/data/single/transactions.csv
    #   /my/project/reporting/data/multi/statement_dimension.csv
    #   /my/project/reporting/data/multi/account_dimension.csv
    #   /my/project/reporting/data/multi/calendar_dimension.csv
    #   /my/project/reporting/data/multi/transaction_measures.csv
    #   /my/project/reporting/data/multi/daily_account_balances.csv
    #   /my/project/reporting/data/multi/missing_statement_report.csv