Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

Added __len__ and __repr__ functions to the Section class. (#90)
Added flag to extract_simple_table and extract_table functions to remove duplicate header rows. (#89)
You can now specify element_ordering when instantiating a PDFDocument. This defaults to the old behaviour of left to right, top to bottom. (#95)
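
As a rough illustration of the default ordering mentioned above (left to right, top to bottom), the sketch below sorts plain bounding boxes. It does not use py-pdf-parser itself: the `Box` type and the `left_to_right_top_to_bottom` helper are hypothetical names, and it assumes PDF-style coordinates in which y increases up the page.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Hypothetical stand-in for an element's position (not the library's class)."""

    text: str
    x0: float  # left edge
    y1: float  # top edge; assumed to grow upwards, as in PDF coordinates


def left_to_right_top_to_bottom(boxes: List[Box]) -> List[Box]:
    # Highest top edge first (top of the page), then smallest x (furthest left).
    return sorted(boxes, key=lambda b: (-b.y1, b.x0))


boxes = [
    Box("bottom-left", 10, 100),
    Box("top-right", 200, 700),
    Box("top-left", 10, 700),
]
ordered = [b.text for b in left_to_right_top_to_bottom(boxes)]
print(ordered)
```

A different ordering would amount to a different sort key; how the library exposes the available orderings is not covered here.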

Published to PyPI as py-pdf-parser.
Documentation is now hosted here. (#71)
Added new examples to the documentation. (#74)
Font filtering now caches the elements by font. (#73) (updated in #78)
The visualise tool now draws an outline around each section on the page. (#69) (updated in #80)

This product is now complete enough for the needs of Optimor Ltd; however, jstockwin is going to continue development as a personal project. The repository has been moved from optimor/py-pdf-parser to jstockwin/py-pdf-parser.

It is now possible to specify font_size_precision when instantiating a PDFDocument. This is the number of decimal places the font size will be rounded to. (#60)
extract_simple_table now allows extracting tables with gaps, provided there is at least one full row and one full column. This only applies if you pass allow_gaps=True; otherwise the original behaviour of raising an exception when there is a gap is kept. You can optionally pass a reference_element, which must be in both a full row and a full column. It defaults to the first (top-left) element. (#57)
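
To make the role of the reference element more concrete, here is a minimal sketch of the idea, not the library's implementation. It assumes cells are already keyed by hypothetical integer (row, column) positions. The reference cell's row determines the columns, its column determines the rows, and any missing cell becomes `None`. The `build_table` name and the `cells` data are invented for illustration.

```python
from typing import Dict, List, Optional, Tuple

Position = Tuple[int, int]  # (row, column), both counted from the top-left

# Hypothetical input: a 3x3 table with two gaps.
cells: Dict[Position, str] = {
    (0, 0): "Name", (0, 1): "Qty", (0, 2): "Price",
    (1, 0): "Apple", (1, 2): "0.50",
    (2, 0): "Pear", (2, 1): "3",
}


def build_table(
    cells: Dict[Position, str], reference: Optional[Position] = None
) -> List[List[Optional[str]]]:
    if reference is None:
        reference = min(cells)  # smallest (row, column): the top-left cell
    ref_row, ref_col = reference
    # The reference must sit in a full row and a full column, so these two
    # scans recover every column and every row of the table.
    columns = sorted(c for (r, c) in cells if r == ref_row)
    rows = sorted(r for (r, c) in cells if c == ref_col)
    # Cells missing from the input are reported as None rather than raising.
    return [[cells.get((r, c)) for c in columns] for r in rows]


table = build_table(cells)
for row in table:
    print(row)
```

If the reference cell were in a row or column that itself has gaps, this sketch would silently drop columns or rows, which is why the reference has to lie in a full row and a full column.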

Font sizes are now floats rather than ints. The new font_size_precision option (see the additions) defaults to 1, so all font sizes will now have a single decimal place. To keep the old behaviour, you can pass font_size_precision=0 when instantiating your PDFDocument.
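
The effect of the precision setting can be sketched with plain rounding. This is an illustration only: `rounded_font_size` is a hypothetical helper, and it assumes the library rounds in the same way as Python's built-in `round`.

```python
def rounded_font_size(raw_size: float, font_size_precision: int = 1) -> float:
    """Hypothetical helper: round a raw font size to a number of decimal places."""
    return round(raw_size, font_size_precision)


default_size = rounded_font_size(9.47)  # one decimal place, the new default
legacy_size = rounded_font_size(9.47, font_size_precision=0)  # whole number
print(default_size, legacy_size)
```

Any code that compares font sizes against integer literals may need updating unless font_size_precision=0 is passed.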

Initial version of the product. Note: The version is less than 1, so this product should not yet be considered stable. API changes and other breaking changes are possible, if not likely.