Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

Changed

  • Removed unused PyYAML dependency (#262)

[0.10.1] - 2021-10-12

Fixed

  • The visualise function now correctly uses the elements parameter to filter the visualised elements. (#256)

Changed

  • Various dependency updates

[0.10.0] - 2021-07-01

Changed

  • [BREAKING] Switched from pyqt5 to tkinter for the visualise tool. This means python3-dev is no longer a requirement, and it seems to solve the endless issues with pyqt5 not finding the correct qt bindings. This is a potentially breaking change, although the visualise tool is only in the development version. No code changes are needed, but you will need tkinter installed for visualise to keep working.
  • Changed python version from 3.6 to 3.8 in .readthedocs.yml.

[0.9.0] - 2021-06-09

Changed

  • Various dependency updates (matplotlib, pyqt5)
  • Removed all but the tests dockerfile for simplicity. Use Docker BuildKit. We will no longer be pushing images to DockerHub on release. (#203)

[0.8.0] - 2021-05-12

Changed

  • Various dependency updates
  • Updated CI to avoid login issue (#182)

[0.7.0] - 2021-01-15

Changed

  • Ensure we only accept LTTextBoxes at the top level (not LTTextLines) (#155)

[0.6.0] - 2020-12-11

Added

  • Enabled Dependabot, which should help keep packages up to date. (#124)

Changed

  • Various dependency updates

Fixed

  • Fixed a typo in the simple memo example in the documentation. (#121)

[0.5.0] - 2020-07-05

Added

  • New functions on ElementList, move_forwards_from and move_backwards_from, to allow moving forwards and backwards from a certain element in the list easily. (#113)
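
As a rough illustration of what moving forwards or backwards from an element means, here is a minimal sketch on a plain Python list. The free functions below are hypothetical stand-ins, not the ElementList methods themselves, and the count parameter and the IndexError at the ends of the list are assumptions made for the example.

```python
from typing import List, TypeVar

T = TypeVar("T")


def _move_from(items: List[T], item: T, offset: int) -> T:
    """Return the item `offset` positions away from `item`."""
    index = items.index(item) + offset
    # Guard explicitly: a negative index would silently wrap around in Python.
    if not 0 <= index < len(items):
        raise IndexError("moved past the start or end of the list")
    return items[index]


def move_forwards_from(items: List[T], item: T, count: int = 1) -> T:
    """Hypothetical stand-in: the item `count` places after `item`."""
    return _move_from(items, item, count)


def move_backwards_from(items: List[T], item: T, count: int = 1) -> T:
    """Hypothetical stand-in: the item `count` places before `item`."""
    return _move_from(items, item, -count)


# Made-up element texts, in document order.
elements = ["title", "date", "body", "footer"]
print(move_forwards_from(elements, "title", 2))
print(move_backwards_from(elements, "footer"))
```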

Changed

  • When the layout parameter all_texts is True, the text inside figures is now also returned as elements in the document. (#99)

Fixed

  • Passing a tolerance less than the width/height of an element no longer causes an error. The tolerance is now capped at half the width/height of the element. (#103)
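
The capping described above can be sketched as follows. This is an illustration only: the Box class and shrink function are hypothetical names, and it assumes the tolerance is applied by shrinking a bounding box equally on every side, which may differ from the library's internals.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical bounding box; not the library's own class."""

    x0: float
    x1: float
    y0: float
    y1: float


def shrink(box: Box, tolerance: float) -> Box:
    """Shrink the box by `tolerance` on every side, capped at half its size."""
    # Cap each direction separately so opposite edges can meet but never cross.
    dx = min(tolerance, (box.x1 - box.x0) / 2)
    dy = min(tolerance, (box.y1 - box.y0) / 2)
    return Box(box.x0 + dx, box.x1 - dx, box.y0 + dy, box.y1 - dy)


# A box 10 wide and 4 high with a tolerance of 3: the horizontal shrink is
# applied in full, while the vertical shrink is capped at 2 (half the height).
print(shrink(Box(0, 10, 0, 4), 3))
```

Without the cap, the vertical edges in this example would cross (y0 would become 3 and y1 would become 1), which is the kind of invalid box that capping avoids.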

[0.4.0] - 2020-06-22

Added

  • Added __len__ and __repr__ functions to the Section class. (#90)
  • Added flag to extract_simple_table and extract_table functions to remove duplicate header rows. (#89)
  • You can now specify element_ordering when instantiating a PDFDocument. This defaults to the old behaviour of left to right, top to bottom. (#95)

Changed

  • Advanced layout analysis is now disabled by default. (#88)

[0.3.0] - 2020-05-14

Added

  • Published to PyPI as py-pdf-parser.
  • Documentation is now hosted on Read the Docs. (#71)
  • Added new examples to the documentation. (#74)
  • Font filtering now caches the elements by font. (#73) (updated in #78)
  • The visualise tool now draws an outline around each section on the page. (#69) (updated in #80)

Changed

  • This product is now complete enough for the needs of Optimor Ltd. However, jstockwin is going to continue development as a personal project. The repository has been moved from optimor/py-pdf-parser to jstockwin/py-pdf-parser.

[0.2.0] - 2020-04-17

Added

  • It is now possible to specify font_size_precision when instantiating a PDFDocument. This is the number of decimal places the font size will be rounded to. (#60)
  • extract_simple_table now allows extracting tables with gaps, provided there is at least one full row and one full column. Gaps are only allowed if you pass allow_gaps=True; otherwise the original behaviour of raising an exception when there is a gap remains. You can optionally pass a reference_element, which must be in both a full row and a full column. This defaults to the first (top-left) element. (#57)
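
To show why one full row and one full column are enough to place every cell, here is a minimal sketch of the idea. It is not the library's implementation: the Cell tuple, the table_with_gaps function and the exact-coordinate matching are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for a PDF element: (text, x, y). Here x is the position
# of the element's column and y the position of its row, measured from the top.
Cell = Tuple[str, float, float]


def table_with_gaps(
    cells: List[Cell], reference: Optional[Cell] = None
) -> List[List[Optional[str]]]:
    """Arrange cells into a grid of texts, using None for each gap."""
    if reference is None:
        # Default to the first (top-left) cell: smallest y, then smallest x.
        reference = min(cells, key=lambda cell: (cell[2], cell[1]))
    _, ref_x, ref_y = reference
    # The reference row is full, so it reveals every column position ...
    columns = sorted(x for _, x, y in cells if y == ref_y)
    # ... and the reference column is full, so it reveals every row position.
    rows = sorted(y for _, x, y in cells if x == ref_x)
    grid: List[List[Optional[str]]] = [[None] * len(columns) for _ in rows]
    for text, x, y in cells:
        if x not in columns or y not in rows:
            raise ValueError(f"{text!r} is not aligned with the reference row/column")
        grid[rows.index(y)][columns.index(x)] = text
    return grid


# A 3 x 3 table with three gaps; the top row and left column are both full.
cells = [
    ("A", 0, 0), ("B", 1, 0), ("C", 2, 0),
    ("D", 0, 1),              ("F", 2, 1),
    ("G", 0, 2),
]
for row in table_with_gaps(cells):
    print(row)
```

If the default top-left cell were not in a full row and a full column, some column or row positions would be unknown, which is why a different reference can be supplied in that case.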

Changed

  • Font sizes are now floats rather than ints. The new font_size_precision parameter (see Added above) defaults to 1, so all font sizes will now have a single decimal place. To keep the old behaviour, you can pass font_size_precision=0 when instantiating your PDFDocument.
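
The effect of the precision on a font size can be sketched with Python's built-in round. This is only an illustration: rounded_font_size is a hypothetical helper, the raw size is made up, and the library's actual rounding code may differ.

```python
def rounded_font_size(size: float, font_size_precision: int = 1) -> float:
    """Round a raw font size to the given number of decimal places."""
    return round(size, font_size_precision)


raw_size = 10.4641  # made-up raw font size

print(rounded_font_size(raw_size))     # default precision of 1 -> 10.5
print(rounded_font_size(raw_size, 0))  # precision of 0 -> 10.0
print(rounded_font_size(raw_size, 2))  # precision of 2 -> 10.46
```

Any code that compares font sizes against whole numbers may need updating once sizes carry a decimal place.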

Fixed

  • Improved performance of extract_simple_table, which is now much faster. (#65)

[0.1.0] - 2020-04-08

Added

  • Initial version of the product. Note: The version is less than 1, so this product should not yet be considered stable. API changes and other breaking changes are possible, if not likely.