Loaders

class py_pdf_parser.loaders.Page

Used to pass the PDFMiner elements of a single page when instantiating PDFDocument.

Parameters:
  • width (int) – The width of the page.
  • height (int) – The height of the page.
  • elements (list) – A list of PDFMiner elements (LTTextBox) on the page.

elements

Alias for field number 2

height

Alias for field number 1

width

Alias for field number 0
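The "Alias for field number" entries above suggest that Page behaves like a named tuple with fields in the order width, height, elements. The sketch below illustrates what that means in practice, using a local stand-in class rather than the library's own Page; the stand-in's definition and the page dimensions are assumptions made for illustration only.

```python
from typing import Any, List, NamedTuple


class Page(NamedTuple):
    """Local stand-in mirroring the documented field order (not the library class)."""

    width: int
    height: int
    elements: List[Any]


# Illustrative values only; a real page would carry PDFMiner LTTextBox elements.
page = Page(width=595, height=842, elements=[])

# Each field can be read by name or by its documented position.
same_width = page.width == page[0]
same_height = page.height == page[1]
same_elements = page.elements is page[2]

# A named tuple also unpacks in field order.
width, height, elements = page
```

Because the fields are positional, Page(595, 842, []) and the keyword form above build equal values.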

py_pdf_parser.loaders.load(pdf_file: IO, pdf_file_path: Optional[str] = None, la_params: Optional[Dict[KT, VT]] = None, **kwargs) → py_pdf_parser.components.PDFDocument

Loads the pdf file into a PDFDocument.

Parameters:
  • pdf_file (io) – The PDF file, as an open file object.
  • pdf_file_path (str, optional) – Passed to PDFDocument. See the documentation for PDFDocument.
  • la_params (dict, optional) – The layout parameters passed to PDFMiner for analysis. See the PDFMiner documentation: https://pdfminersix.readthedocs.io/en/latest/reference/composable.html#laparams. Note that py_pdf_parser re-orders the elements it receives from PDFMiner, so options relating to element ordering have no effect.
  • kwargs – Passed to PDFDocument. See the documentation for PDFDocument.

Returns:

A PDFDocument with the file loaded.

Return type:

PDFDocument
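A minimal sketch of calling load on a file you open yourself. It assumes py_pdf_parser is installed and that the file should be opened in binary mode; the helper name load_pdf is hypothetical and not part of the library.

```python
def load_pdf(path, la_params=None):
    """Open the PDF at `path` and pass the file object to load (hypothetical helper)."""
    # Imported lazily so this sketch can be defined without the library installed.
    from py_pdf_parser.loaders import load

    # Assumption: binary mode, since PDF content is not plain text.
    with open(path, "rb") as pdf_file:
        # pdf_file_path is forwarded to PDFDocument, as described above.
        return load(pdf_file, pdf_file_path=path, la_params=la_params)
```

A call might look like load_pdf("example.pdf", la_params={"line_margin": 0.3}), where the path and the line_margin value are placeholders. line_margin is one of PDFMiner's layout parameters and does not concern element ordering, so it should not be affected by the re-ordering noted above.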

py_pdf_parser.loaders.load_file(path_to_file: str, la_params: Optional[Dict[KT, VT]] = None, **kwargs) → py_pdf_parser.components.PDFDocument

Loads the PDF file at the specified path into a PDFDocument.

All other arguments are passed to load; see the documentation for load.

Returns:

A PDFDocument with the specified file loaded.

Return type:

PDFDocument
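When only a path is available, load_file avoids opening the file by hand. The sketch below wraps it in a hypothetical helper, load_pdf_from_path, that checks the path first so a missing file is reported before any parsing starts. The helper and its error message are assumptions for illustration, and py_pdf_parser is assumed to be installed.

```python
from pathlib import Path


def load_pdf_from_path(path, la_params=None):
    """Check that `path` exists, then hand it to load_file (hypothetical helper)."""
    pdf_path = Path(path)
    if not pdf_path.is_file():
        raise FileNotFoundError(f"No PDF found at {pdf_path}")

    # Imported lazily so the path check works without the library installed.
    from py_pdf_parser.loaders import load_file

    # la_params is forwarded unchanged; load_file passes it on to load.
    return load_file(str(pdf_path), la_params=la_params)
```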