Export Options

bank_statement_parser supports exporting report data in CSV, Excel, and JSON formats, with two export presets and a dedicated reporting feed for external BI tools. Exports are generated automatically by bsp process or can be triggered manually via the Python API.

Export Presets

Preset Description
simple (default) A single flat transactions table joining all dimensions. Best for spreadsheet analysis. In the Python API this preset is type="single".
full Separate star-schema tables (accounts, calendar, statements, transactions, balances, gaps) for loading into an external database or BI tool. In the Python API this preset is type="multi".

CLI Usage

# Default: export the simple preset from the database (--export-type simple --export-format all)
bsp process --pdfs ~/statements

# Export full star-schema tables as CSV only
bsp process --pdfs ~/statements --export-type full --export-format csv

# Skip export entirely
bsp process --pdfs ~/statements --no-export
Option Choices Default Description
--export-type simple, full simple Export preset
--export-format excel, csv, json, all, reporting all Output file format
--no-export off Skip the export step entirely

Python API

Export functions

bsp.db provides export_csv(), export_excel(), export_json(), and export_reporting_data():

export_csv()

export_csv(folder: Path | None = None, type: Literal['single', 'multi'] = 'single', project_path: Path | None = None, batch_id: str | None = None, filename_timestamp: bool = False) -> None

Write report data to CSV files in folder.

Each table is written as a separate .csv file named after its logical table name (e.g. transactions.csv, or statement_dimension.csv, account_dimension.csv, etc. for type="multi").

When filename_timestamp is True:

  • type="single": the timestamp is appended to the filename, e.g. transactions_20250331143022.csv.
  • type="multi": files are written into a multi_20250331143022/ sub-folder inside folder with their original names.

When filename_timestamp is False:

  • type="single": files are written directly to folder with their original names, e.g. transactions.csv.
  • type="multi": files are written into a multi/ sub-folder inside folder with their original names.

Args:

  • folder — Directory to write CSV files into. When None the project's export/csv/ directory (resolved via project_path) is used and created automatically if absent.
  • type — Export preset — "single" (flat transactions table) or "multi" (separate star-schema tables for loading into a database). Defaults to "single".
  • project_path — Optional project root used to resolve the default export folder and data sources. Falls back to the bundled default project when None.
  • batch_id — Optional batch identifier to filter report data to a single batch. When None all rows are exported.
  • filename_timestamp — When True, append a human-readable timestamp (yyyymmddHHMMSS) to the filename (single) or create a timestamped sub-folder (multi). Defaults to False.
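The naming rules above reduce to one decision: in single mode the timestamp goes on the filename, and in multi mode it goes on the sub-folder. The hypothetical helper below (export_paths is illustrative only and is not part of bank_statement_parser) reproduces the documented conventions. It assumes the yyyymmddHHMMSS timestamp corresponds to strftime("%Y%m%d%H%M%S"), and it takes the multi table names from this page. export_json() is documented with the same rules, so the sketch accepts an ext argument.

```python
from datetime import datetime
from pathlib import Path
from typing import Literal, Optional

# Hypothetical helper: mirrors the documented naming rules only.
# It is NOT the library's implementation.
MULTI_TABLES = [
    "statement_dimension",
    "account_dimension",
    "calendar_dimension",
    "transaction_measures",
    "daily_account_balances",
    "missing_statement_report",
]


def export_paths(
    folder: Path,
    type: Literal["single", "multi"] = "single",
    ext: str = "csv",
    stamp: Optional[datetime] = None,
) -> list[Path]:
    """Return the file paths the documented rules would produce."""
    ts = stamp.strftime("%Y%m%d%H%M%S") if stamp else None
    if type == "single":
        # single: the timestamp (if any) is appended to the filename itself
        name = f"transactions_{ts}" if ts else "transactions"
        return [folder / f"{name}.{ext}"]
    # multi: the timestamp (if any) goes on the sub-folder; filenames are unchanged
    sub = folder / (f"multi_{ts}" if ts else "multi")
    return [sub / f"{table}.{ext}" for table in MULTI_TABLES]


if __name__ == "__main__":
    when = datetime(2025, 3, 31, 14, 30, 22)
    print(export_paths(Path("out"), "single", stamp=when)[0].as_posix())
    # -> out/transactions_20250331143022.csv
    print(export_paths(Path("out"), "multi", ext="json")[0].as_posix())
    # -> out/multi/statement_dimension.json
```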

export_excel()

export_excel(path: Path | None = None, type: Literal['single', 'multi'] = 'single', project_path: Path | None = None, batch_id: str | None = None, filename_timestamp: bool = False) -> None

Write report data to an Excel workbook at path.

Each table is written as a separate worksheet. For type="single" a single transactions sheet is written; for type="multi" six sheets are written (statement_dimension, account_dimension, calendar_dimension, transaction_measures, daily_account_balances, missing_statement_report).

Filename conventions:

  • type="single", no timestamp: transactions.xlsx
  • type="single", with timestamp: transactions_20250331143022.xlsx
  • type="multi", no timestamp: transactions_multi.xlsx
  • type="multi", with timestamp: transactions_multi_20250331143022.xlsx

Worksheet names are never modified by the timestamp or type logic.
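The four filename conventions combine two independent suffixes: _multi for the multi preset, and _yyyymmddHHMMSS when filename_timestamp is True. A hypothetical helper that reproduces the list (excel_filename is illustrative only and not part of the package; it assumes the timestamp is strftime("%Y%m%d%H%M%S")):

```python
from datetime import datetime
from typing import Literal, Optional


def excel_filename(
    type: Literal["single", "multi"] = "single",
    stamp: Optional[datetime] = None,
) -> str:
    """Hypothetical helper reproducing the documented workbook filename rules."""
    stem = "transactions" if type == "single" else "transactions_multi"
    if stamp is not None:
        # Only the workbook filename changes; worksheet names are left alone.
        stem = f"{stem}_{stamp.strftime('%Y%m%d%H%M%S')}"
    return f"{stem}.xlsx"
```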

Args:

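  • path — Full file path for the output .xlsx workbook. When None, the workbook is written to the export/excel/ directory inside the project resolved via project_path, named according to the filename conventions above (transactions.xlsx for type="single" without a timestamp).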
  • type — Export preset — "single" (flat transactions table) or "multi" (separate star-schema sheets for loading into a database). Defaults to "single".
  • project_path — Optional project root used to resolve the default export folder and data sources. Falls back to the bundled default project when None.
  • batch_id — Optional batch identifier to filter report data to a single batch. When None all rows are exported.
  • filename_timestamp — When True, append a human-readable timestamp (yyyymmddHHMMSS) to the workbook filename. Worksheet names are unaffected. Defaults to False.

export_json()

export_json(folder: Path | None = None, type: Literal['single', 'multi'] = 'single', project_path: Path | None = None, batch_id: str | None = None, filename_timestamp: bool = False) -> None

Write report data to JSON files in folder.

Each table is written as a separate .json file containing a JSON array of row objects, named after its logical table name (e.g. transactions.json, or statement_dimension.json, account_dimension.json, etc. for type="multi").

When filename_timestamp is True:

  • type="single": the timestamp is appended to the filename, e.g. transactions_20250331143022.json.
  • type="multi": files are written into a multi_20250331143022/ sub-folder inside folder with their original names.

When filename_timestamp is False:

  • type="single": files are written directly to folder with their original names, e.g. transactions.json.
  • type="multi": files are written into a multi/ sub-folder inside folder with their original names.

Args:

  • folder — Directory to write JSON files into. When None the project's export/json/ directory (resolved via project_path) is used and created automatically if absent.
  • type — Export preset — "single" (flat transactions table) or "multi" (separate star-schema tables for loading into a database). Defaults to "single".
  • project_path — Optional project root used to resolve the default export folder and data sources. Falls back to the bundled default project when None.
  • batch_id — Optional batch identifier to filter report data to a single batch. When None all rows are exported.
  • filename_timestamp — When True, append a human-readable timestamp (yyyymmddHHMMSS) to the filename (single) or create a timestamped sub-folder (multi). Defaults to False.
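Each exported JSON file is a single array in which each element is one row as a {column: value} object. A minimal loader for downstream scripts that checks that shape is sketched below. load_json_table is a hypothetical helper, and the UTF-8 encoding is an assumption.

```python
import json
from pathlib import Path
from typing import Any


def load_json_table(path: Path) -> list[dict[str, Any]]:
    """Load one exported table: a JSON array of row objects (hypothetical helper)."""
    # Assumption: exported files are UTF-8 encoded.
    data = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(data, list) or not all(isinstance(r, dict) for r in data):
        raise ValueError(f"{path} is not a JSON array of row objects")
    return data
```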

export_reporting_data()

export_reporting_data(project_path: Path | None = None) -> None

Write CSV reporting feeds to the project's reporting/data/ sub-directories.

Calls export_csv() twice: once with type="single", writing to reporting/data/single/, and once with type="multi", writing to reporting/data/multi/ (the multi/ sub-folder is created inside reporting/data/ by the multi-export logic). Both directories are created automatically if absent.

This produces a stable set of CSV files that external reporting tools (e.g. Power BI, Tableau, Excel) can point at directly without needing to know about the full export machinery.

Args:

  • project_path — Optional project root directory. Falls back to the bundled default project when None.

Example:

    import bank_statement_parser as bsp
    from pathlib import Path

    bsp.db.export_reporting_data(project_path=Path("/my/project"))
    # Writes:
    #   /my/project/reporting/data/single/transactions.csv
    #   /my/project/reporting/data/multi/statement_dimension.csv
    #   /my/project/reporting/data/multi/account_dimension.csv
    #   /my/project/reporting/data/multi/calendar_dimension.csv
    #   /my/project/reporting/data/multi/transaction_measures.csv
    #   /my/project/reporting/data/multi/daily_account_balances.csv
    #   /my/project/reporting/data/multi/missing_statement_report.csv
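Because these paths are stable, a downstream script can read a feed with only the standard library. The sketch below assumes UTF-8 CSV files with a header row. read_feed is a hypothetical helper, and the column names it returns depend on the table.

```python
import csv
from pathlib import Path


def read_feed(project: Path, preset: str, table: str) -> list[dict[str, str]]:
    """Read one reporting feed CSV, e.g. ("multi", "account_dimension").

    Hypothetical helper; assumes UTF-8 files with a header row.
    """
    if preset not in ("single", "multi"):
        raise ValueError("preset must be 'single' or 'multi'")
    path = project / "reporting" / "data" / preset / f"{table}.csv"
    # newline="" lets the csv module handle line endings itself.
    with path.open(newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))
```

For example, read_feed(Path("/my/project"), "single", "transactions") would read the flat table written by the example above.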

Usage examples

import bank_statement_parser as bsp
from pathlib import Path

# Export the single (flat) preset as CSV for the default project
bsp.db.export_csv()

# Export the multi (star-schema) preset to Excel
bsp.db.export_excel(type='multi')

# Export JSON
bsp.db.export_json()

# Write stable CSV feeds for BI tools (single + multi presets)
bsp.db.export_reporting_data()

# Export to a custom directory (Path does not expand "~" on its own)
bsp.db.export_csv(folder=Path('~/exports').expanduser())
bsp.db.export_excel(path=Path('~/exports/report.xlsx').expanduser())

Report Classes

All report classes expose a .all attribute containing a Polars LazyFrame (pl.LazyFrame). Call .collect() to materialise the data.

import bank_statement_parser as bsp

# Read from the DB backend
df = bsp.db.FlatTransaction().all.collect()

Available classes

FlatTransaction

Denormalised transaction view joining all dimensions. One row per transaction with date, account, statement, and value columns.

FactBalance

Daily balance series per account. Forward-filled from statement data to cover every calendar date.
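Forward-filling means that each calendar date takes the most recent known balance on or before that date. The sketch below shows the idea for a single account in plain Python, assuming known balances are keyed by date. It is not the library's implementation, and forward_fill_balances is a hypothetical name.

```python
from datetime import date, timedelta
from decimal import Decimal
from typing import Optional


def forward_fill_balances(
    known: dict[date, Decimal], start: date, end: date
) -> dict[date, Optional[Decimal]]:
    """Return one balance per calendar day from start to end inclusive.

    Hypothetical sketch for a single account: days with no known balance
    carry the latest earlier balance forward; days before the first known
    balance stay None.
    """
    series: dict[date, Optional[Decimal]] = {}
    last: Optional[Decimal] = None
    day = start
    while day <= end:
        if day in known:
            last = known[day]
        series[day] = last
        day += timedelta(days=1)
    return series
```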

DimTime

Calendar dimension. One row per date spanning the full transaction date range, with year, quarter, month, week, and day attributes.
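A calendar dimension of this shape can be derived from the date range alone. In the minimal sketch below, calendar_rows and its column names are hypothetical, and "week" is assumed to be the ISO 8601 week number. That assumption matters at year boundaries: 2024-12-30 has calendar year 2024 but falls in ISO week 1.

```python
from datetime import date, timedelta


def calendar_rows(start: date, end: date) -> list[dict[str, object]]:
    """Build one row per date from start to end inclusive (hypothetical sketch)."""
    rows: list[dict[str, object]] = []
    day = start
    while day <= end:
        _, iso_week, _ = day.isocalendar()  # assumption: ISO 8601 weeks
        rows.append(
            {
                "date": day,
                "year": day.year,
                "quarter": (day.month - 1) // 3 + 1,
                "month": day.month,
                "week": iso_week,
                "day": day.day,  # day of the month
            }
        )
        day += timedelta(days=1)
    return rows
```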

DimStatement

Statement dimension. One row per parsed PDF statement with statement date, filename, and batch timestamp.

DimAccount

Account dimension. One row per unique account with company, type, number, sort code, and holder.

FactTransaction

Transaction fact table. One row per transaction with foreign keys to dimension tables.
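FlatTransaction relates to FactTransaction through a lookup join: each foreign key is replaced with the columns of the dimension row it references. The minimal pure-Python sketch below shows that idea. The function name and all key and column names (account_id, statement_id, and so on) are hypothetical, and this is not how the library builds its view.

```python
from typing import Any

Row = dict[str, Any]


def denormalise(facts: list[Row], dims: dict[str, dict[Any, Row]]) -> list[Row]:
    """Swap each foreign-key column for the attributes of the row it points to.

    Hypothetical sketch: dims maps a foreign-key column name (e.g.
    "account_id") to a lookup table keyed by that id.
    """
    flat: list[Row] = []
    for fact in facts:
        out = {k: v for k, v in fact.items() if k not in dims}
        for fk, table in dims.items():
            out.update(table[fact[fk]])  # KeyError here means a dangling key
        flat.append(out)
    return flat
```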

GapReport

Gap detection report. Flags periods where the closing balance of one statement does not match the opening balance of the next. Use the .gaps attribute to get only the flagged gap rows.
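The gap check compares each statement with the next one in date order. The sketch below applies that idea to a single account. The Statement record, its field names, and find_gaps are all hypothetical. Decimal is used so that currency amounts compare exactly.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Statement:
    # Hypothetical record for one account's statement; field names are illustrative.
    statement_date: date
    opening: Decimal
    closing: Decimal


def find_gaps(statements: list[Statement]) -> list[tuple[Statement, Statement]]:
    """Return consecutive (earlier, later) pairs whose balances do not chain."""
    ordered = sorted(statements, key=lambda s: s.statement_date)
    return [
        (prev, nxt)
        for prev, nxt in zip(ordered, ordered[1:])
        if prev.closing != nxt.opening
    ]
```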

Output Files

Simple preset

Format File Contents
CSV export/csv/transactions.csv Flat transaction table
Excel export/excel/transactions.xlsx Sheet: transactions
JSON export/json/transactions.json Flat transaction table (JSON array)

Full preset

Format Files / Sheets Contents
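CSV export/csv/multi/statement_dimension.csv, account_dimension.csv, calendar_dimension.csv, transaction_measures.csv, daily_account_balances.csv, missing_statement_report.csv Separate star-schema tables
Excel export/excel/transactions_multi.xlsx with sheets: statement_dimension, account_dimension, calendar_dimension, transaction_measures, daily_account_balances, missing_statement_report Star-schema workbook
JSON export/json/multi/statement_dimension.json, account_dimension.json, calendar_dimension.json, transaction_measures.json, daily_account_balances.json, missing_statement_report.json Separate star-schema tables (JSON arrays)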

Reporting feed (--export-format reporting)

Preset Path Contents
simple reporting/data/single/transactions.csv Flat transaction table
full reporting/data/multi/statement_dimension.csv, account_dimension.csv, calendar_dimension.csv, transaction_measures.csv, daily_account_balances.csv, missing_statement_report.csv Star-schema CSV feeds