Sectioning

class py_pdf_parser.sectioning.Section(document: PDFDocument, name: str, unique_name: str, start_element: PDFElement, end_element: PDFElement)

A contiguous group of elements within a document.

A section labels a group of elements. These elements must be contiguous in the document.

Warning

Do not instantiate the Section class yourself. Instead, call create_section on the Sectioning class below.

Parameters:
  • document (PDFDocument) – A reference to the document.
  • name (str) – The name of the section.
  • unique_name (str) – A unique identifier for the section. Multiple sections can share the same name, so the Sectioning class generates a unique name for each one.
  • start_element (PDFElement) – The first element in the section.
  • end_element (PDFElement) – The last element in the section.
elements

All the elements in the section.

Returns: All the elements in the section.
Return type: ElementList
class py_pdf_parser.sectioning.Sectioning(document: PDFDocument)

A sectioning utilities class, made available on all PDFDocuments as .sectioning.

create_section(name: str, start_element: PDFElement, end_element: PDFElement, include_last_element: bool = True) → Section

Creates a new section with the specified name.

The section starts at start_element and ends at end_element, which is included by default (see include_last_element). The unique name is set to <name>_<idx>, where <idx> is the number of sections with that name that already exist, so the first section with a given name gets the index 0.

Parameters:
  • name (str) – The name of the new section.
  • start_element (PDFElement) – The first element in the section.
  • end_element (PDFElement) – The last element in the section.
  • include_last_element (bool) – Whether the end_element should be included in the section, or only the elements which are strictly before the end element. Default: True (i.e. include end_element).
Returns:

The created section.

Return type:

Section

Raises:

InvalidSectionError – If the created section would be invalid. This is usually because the end_element comes before the start_element.
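
The naming and boundary rules described for create_section can be illustrated with a small toy model. This is only a sketch, not the library's implementation: ToySectioning is a hypothetical class, elements are plain strings addressed by list index rather than PDFElement objects, and the InvalidSectionError defined here is a local stand-in for the library's exception.

```python
from collections import defaultdict


class InvalidSectionError(Exception):
    """Local stand-in for the library's exception of the same name."""


class ToySectioning:
    """Hypothetical toy model: elements are list items addressed by index."""

    def __init__(self, elements):
        self.elements = list(elements)
        self.name_counts = defaultdict(int)  # sections created so far, per name
        self.sections = {}  # unique_name -> list of elements in that section

    def create_section(self, name, start, end, include_last_element=True):
        # With include_last_element=False the section stops just before `end`.
        last = end if include_last_element else end - 1
        if last < start:
            raise InvalidSectionError("end must not come before start")
        # <idx> is the number of sections that already carry this name.
        unique_name = f"{name}_{self.name_counts[name]}"
        self.name_counts[name] += 1
        self.sections[unique_name] = self.elements[start : last + 1]
        return unique_name


toy = ToySectioning(["Title", "Intro", "Body", "Totals", "Footer"])
first = toy.create_section("part", 1, 3)
second = toy.create_section("part", 1, 3, include_last_element=False)

print(first, toy.sections[first])    # part_0 ['Intro', 'Body', 'Totals']
print(second, toy.sections[second])  # part_1 ['Intro', 'Body']
```

Both sections share the name part but receive distinct unique names, and the second one omits the end element because include_last_element is False.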

get_section(unique_name: str) → py_pdf_parser.sectioning.Section

Returns the section with the given unique name.

Raises: SectionNotFoundError – If there is no section with the given unique_name.
get_sections_with_name(name: str) → Generator[py_pdf_parser.sectioning.Section, None, None]

Returns a generator that yields all sections with the given name.
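
Because the signature of get_sections_with_name declares a Generator return type, its result can be iterated only once. The sketch below shows what that means in practice, alongside a lookup by unique name. It uses a hypothetical dictionary registry and module-level functions in place of the real Sectioning class, and a local stand-in for SectionNotFoundError, so it shows the calling pattern rather than the library's internals.

```python
class SectionNotFoundError(Exception):
    """Local stand-in for the library's exception of the same name."""


# Hypothetical registry: unique_name -> (name, elements).
# The real class stores Section objects instead of tuples.
registry = {
    "header_0": ("header", ["Page 1 header"]),
    "table_0": ("table", ["Row A", "Row B"]),
    "header_1": ("header", ["Page 2 header"]),
}


def get_section(unique_name):
    # Look up exactly one section; fail loudly if it does not exist.
    try:
        return registry[unique_name]
    except KeyError:
        raise SectionNotFoundError(unique_name) from None


def get_sections_with_name(name):
    # A generator: nothing is computed until the caller iterates.
    return (section for section in registry.values() if section[0] == name)


headers = get_sections_with_name("header")
first_pass = list(headers)   # materialize to index, count, or reuse
second_pass = list(headers)  # the generator is already exhausted

print(len(first_pass), len(second_pass))  # 2 0
```

Wrapping the result in list() is the simplest option when the sections need to be counted, indexed, or traversed more than once.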

sections

Returns the list of all created Sections.