Loaders

class py_pdf_parser.loaders.Page

This is used to pass PDF Miner elements of a page when instantiating PDFDocument.

Parameters:
    width (int) – The width of the page.
    height (int) – The height of the page.
    elements (list) – A list of PDF Miner elements (LTTextBox) on the page.

elements: Alias for field number 2

height: Alias for field number 1

width: Alias for field number 0
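
The "Alias for field number N" entries indicate that Page behaves like a named tuple, so each field can be read by name or by position. The sketch below uses a stand-in class with the same three fields rather than the real py_pdf_parser.loaders.Page, so that it runs without the library installed; the page size and the empty element list are illustrative values only.

```python
from typing import List, NamedTuple


class Page(NamedTuple):
    """Stand-in with the same three fields as py_pdf_parser.loaders.Page."""

    width: int
    height: int
    elements: List  # in the real class: PDF Miner LTTextBox elements


# Illustrative values: an A4-sized page in points, with no elements yet.
page = Page(width=595, height=842, elements=[])

# Access by name and access by position refer to the same fields.
same_width = page.width == page[0]
same_height = page.height == page[1]
same_elements = page.elements is page[2]

# Like any tuple, a Page can also be unpacked in field order.
width, height, elements = page
```

With the real class, the elements list would hold the LTTextBox objects produced by PDF Miner for that page.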

py_pdf_parser.loaders.load(pdf_file: IO, pdf_file_path: Optional[str] = None, la_params: Optional[Dict] = None, **kwargs) → py_pdf_parser.components.PDFDocument

Loads the PDF file into a PDFDocument.

Parameters:
    pdf_file (io) – The PDF file.
    la_params (dict) – The layout parameters passed to PDF Miner for analysis. See the PDFMiner documentation here: https://pdfminersix.readthedocs.io/en/latest/api/composable.html#laparams. Note that py_pdf_parser will re-order the elements it receives from PDFMiner, so options relating to element ordering will have no effect.
    pdf_file_path (str, optional) – Passed to PDFDocument. See the documentation for PDFDocument.
    kwargs – Passed to PDFDocument. See the documentation for PDFDocument.
Returns:	A PDFDocument with the file loaded.
Return type:	PDFDocument
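
Because load takes a file object rather than a path, it can be used with a PDF that is already in memory. The following is a minimal sketch under some assumptions: load_pdf_bytes is a hypothetical helper name, the PDF is assumed to be available as bytes, and the la_params keys shown (line_margin, char_margin) are taken from PDFMiner's LAParams with illustrative values.

```python
import io
from typing import Any, Dict, Optional


def load_pdf_bytes(pdf_bytes: bytes, la_params: Optional[Dict[str, Any]] = None):
    """Hypothetical helper: parse an in-memory PDF via py_pdf_parser.loaders.load."""
    # Imported lazily so this module can be imported without py-pdf-parser installed.
    from py_pdf_parser.loaders import load

    # load() expects a file object, so wrap the raw bytes in a binary buffer.
    return load(io.BytesIO(pdf_bytes), la_params=la_params)


# Layout parameters are given as a plain dict keyed by LAParams option names.
# Options that only affect element ordering are not worth setting here,
# since py_pdf_parser re-orders the elements it receives.
LA_PARAMS = {"line_margin": 0.3, "char_margin": 2.0}

# Example call, assuming pdf_bytes holds the contents of a PDF file:
# document = load_pdf_bytes(pdf_bytes, la_params=LA_PARAMS)
```

When the PDF lives on disk, load_file is the shorter route, since it takes the path directly.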

py_pdf_parser.loaders.load_file(path_to_file: str, la_params: Optional[Dict] = None, **kwargs) → py_pdf_parser.components.PDFDocument

Loads a file according to the specified file path.

All other arguments are passed to load; see the documentation for load.

Returns:	A PDFDocument with the specified file loaded.
Return type:	PDFDocument
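
As a usage sketch, load_file can be applied to every PDF in a directory. The helper name load_directory and its loader parameter are hypothetical and not part of py_pdf_parser; the loader parameter exists only so that the function can be tried out with a substitute callable when the library is not installed.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Optional


def load_directory(
    directory: str,
    loader: Optional[Callable[..., Any]] = None,
    **load_kwargs: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: load every *.pdf file in a directory, keyed by file name."""
    if loader is None:
        # Default to the real loader; imported lazily so the helper can be
        # exercised with a substitute when py-pdf-parser is not installed.
        from py_pdf_parser.loaders import load_file as loader

    documents: Dict[str, Any] = {}
    for path in sorted(Path(directory).glob("*.pdf")):
        # load_file takes the path as a string; extra keyword arguments
        # (such as la_params) are forwarded unchanged.
        documents[path.name] = loader(str(path), **load_kwargs)
    return documents


# Example call, assuming a directory named "invoices" containing PDF files:
# documents = load_directory("invoices", la_params={"line_margin": 0.3})
```

Each value in the returned dict is whatever the loader returns, which is a PDFDocument when the default load_file is used.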