Loaders
class py_pdf_parser.loaders.Page(width, height, elements)
A named tuple used to pass the PDF Miner elements of a page when instantiating PDFDocument.
Parameters:
- width (int) – The width of the page.
- height (int) – The height of the page.
- elements (list) – A list of PDF Miner elements (LTTextBox) on the page.
Attributes:
- elements – Alias for field number 2.
- height – Alias for field number 1.
- width – Alias for field number 0.
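Because Page is a named tuple, each attribute is just a name for a position in the tuple, which is what the "Alias for field number" entries describe. The sketch below defines a stand-in class with typing.NamedTuple that mirrors the three documented fields; it is not imported from py_pdf_parser, and the placeholder strings stand in for real PDF Miner LTTextBox objects.

```python
from typing import Any, List, NamedTuple


class Page(NamedTuple):
    """Stand-in mirroring the documented fields of py_pdf_parser.loaders.Page."""

    width: int  # field number 0
    height: int  # field number 1
    elements: List[Any]  # field number 2 (LTTextBox objects in the real library)


# Placeholder strings stand in for PDF Miner LTTextBox elements.
page = Page(width=612, height=792, elements=["box_a", "box_b"])

# Each attribute is an alias for a tuple position.
assert page.width == page[0]
assert page.height == page[1]
assert page.elements is page[2]

# Named tuples can also be unpacked positionally.
width, height, elements = page
print(width, height, len(elements))  # 612 792 2
```

Since the real Page is also a tuple, it is immutable: building a modified page means creating a new instance (for example with the standard `_replace` method of named tuples) rather than assigning to a field.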
py_pdf_parser.loaders.load(pdf_file: IO, pdf_file_path: Optional[str] = None, la_params: Optional[Dict] = None, **kwargs) → py_pdf_parser.components.PDFDocument
Loads the PDF file into a PDFDocument.
Parameters:
- pdf_file (IO) – The PDF file, as a file object opened in binary mode.
- pdf_file_path (str, optional) – Passed to PDFDocument. See the documentation for PDFDocument.
- la_params (dict, optional) – The layout parameters passed to PDF Miner for analysis. See the PDFMiner documentation: https://pdfminersix.readthedocs.io/en/latest/api/composable.html#laparams. Note that py_pdf_parser re-orders the elements it receives from PDFMiner, so options relating to element ordering have no effect.
- kwargs – Passed to PDFDocument. See the documentation for PDFDocument.
Returns: A PDFDocument with the file loaded.
Return type: PDFDocument
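The note on la_params can be illustrated with a small sketch: if elements are re-sorted by position after extraction, whatever order PDFMiner produced (for example under different boxes_flow settings) is discarded. The Box type and the top-to-bottom, left-to-right sort key below are assumptions made for illustration; this section does not state the exact ordering rule py_pdf_parser uses.

```python
import random
from typing import List, NamedTuple


class Box(NamedTuple):
    """Hypothetical stand-in for a text box; PDF y coordinates grow upward."""

    text: str
    x0: float  # left edge
    y1: float  # top edge


def reorder(elements: List[Box]) -> List[Box]:
    """Sort top-to-bottom (larger y1 first), then left-to-right (smaller x0 first).

    This key is an illustrative assumption, not py_pdf_parser's documented rule.
    """
    return sorted(elements, key=lambda box: (-box.y1, box.x0))


boxes = [
    Box("title", x0=72, y1=720),
    Box("left column", x0=72, y1=650),
    Box("right column", x0=320, y1=650),
    Box("footer", x0=72, y1=40),
]

# Shuffle to mimic a different extraction order coming from PDFMiner.
shuffled = boxes[:]
random.shuffle(shuffled)

# The final order is the same regardless of the input order.
assert reorder(shuffled) == reorder(boxes)
print([box.text for box in reorder(shuffled)])
# ['title', 'left column', 'right column', 'footer']
```

Any deterministic key over element positions has the same effect: the output order depends only on where the elements sit on the page, not on the order in which they arrived.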
py_pdf_parser.loaders.load_file(path_to_file: str, la_params: Optional[Dict] = None, **kwargs) → py_pdf_parser.components.PDFDocument
Loads the PDF file at the specified path into a PDFDocument.
All other arguments are passed to load; see the documentation for load.
Returns: A PDFDocument with the specified file loaded.
Return type: PDFDocument
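The delegation from load_file to load can be sketched as: open the path in binary mode, then hand the file object, the path and any remaining arguments to load. In this sketch fake_load is a hypothetical stand-in that records its arguments instead of parsing a PDF, and extra_option is a hypothetical keyword argument, so the code runs without py_pdf_parser or a real PDF. Forwarding the path as pdf_file_path is an assumption that fits the load signature, not something this section states.

```python
import os
import tempfile
from typing import IO, Any, Dict, Optional


def fake_load(
    pdf_file: IO,
    pdf_file_path: Optional[str] = None,
    la_params: Optional[Dict] = None,
    **kwargs: Any,
) -> Dict[str, Any]:
    """Hypothetical stand-in for load: records what it receives instead of parsing."""
    return {
        "first_bytes": pdf_file.read(5),
        "pdf_file_path": pdf_file_path,
        "la_params": la_params,
        "kwargs": kwargs,
    }


def load_file_sketch(
    path_to_file: str, la_params: Optional[Dict] = None, **kwargs: Any
) -> Dict[str, Any]:
    """Open the file in binary mode and forward everything else to the loader."""
    with open(path_to_file, "rb") as in_file:
        return fake_load(in_file, pdf_file_path=path_to_file, la_params=la_params, **kwargs)


# Write a placeholder file whose first bytes look like a PDF header.
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(b"%PDF-1.7\n")
    path = tmp.name

try:
    # extra_option is a hypothetical keyword, forwarded untouched via **kwargs.
    result = load_file_sketch(path, la_params={"char_margin": 3.0}, extra_option=True)
    print(result["first_bytes"])  # b'%PDF-'
finally:
    os.remove(path)
```

With the real library the call has the same shape, for example `load_file("report.pdf", la_params={"char_margin": 3.0})`, where the file name is a placeholder and char_margin is one of PDFMiner's LAParams options, assuming the dict keys map onto LAParams keyword arguments as the linked PDFMiner documentation suggests.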