github.com/ArtifexSoftware/pdf2docx

Method parse_block

pdf2docx/layout/Blocks.py:258–271

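Group lines into text blocks. The method sorts the lines in reading order and joins vertically adjacent lines with similar properties (such as line spacing) into text blocks. It then splits those blocks again where the text suggests a line break or a new paragraph, controlled by line_break_free_space_ratio and new_paragraph_free_space_ratio. Finally, it resets the collection to the resulting blocks.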

(self, max_line_spacing_ratio:float, line_break_free_space_ratio:float, new_paragraph_free_space_ratio:float)
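parse_block first sorts lines in reading order and then joins lines whose spacing is small into blocks, with max_line_spacing_ratio as the threshold. The following is a minimal sketch of that idea, not pdf2docx's implementation. The Line class, the function names, the single-column top-to-bottom reading order and the exact spacing criterion (top-to-top distance divided by the previous line's height) are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-in for a text line (coordinates in points,
# y grows downward); NOT pdf2docx's Line class.
@dataclass
class Line:
    text: str
    x0: float
    top: float
    bottom: float

    @property
    def height(self) -> float:
        return self.bottom - self.top


def sort_in_reading_order(lines):
    """Sort top-to-bottom, then left-to-right (a single-column simplification)."""
    return sorted(lines, key=lambda ln: (ln.top, ln.x0))


def join_lines_vertically(lines, max_line_spacing_ratio: float):
    """Group consecutive lines into blocks while their spacing stays small.

    Assumption: spacing is measured top-to-top and divided by the previous
    line's height; a ratio above max_line_spacing_ratio starts a new block.
    """
    blocks = []
    for line in sort_in_reading_order(lines):
        if blocks:
            prev = blocks[-1][-1]
            ratio = (line.top - prev.top) / max(prev.height, 1e-6)
            if ratio <= max_line_spacing_ratio:
                blocks[-1].append(line)  # close enough: same block
                continue
        blocks.append([line])  # first line, or gap too large: new block
    return blocks


# Lines arrive out of order; sorting restores Title, First, Second, Third.
lines = [
    Line("Second line", 72, 144, 156),
    Line("Title", 72, 100, 112),
    Line("Third line", 72, 158, 170),
    Line("First line", 72, 130, 142),
]
blocks = join_lines_vertically(lines, max_line_spacing_ratio=1.5)
print([[ln.text for ln in b] for b in blocks])
# [['Title'], ['First line', 'Second line', 'Third line']]
```

Here the gap after "Title" is 30 pt on a 12 pt line, which gives a ratio of 2.5. That exceeds 1.5, so the title forms its own block. The following lines are 14 pt apart, a ratio of about 1.17, so they are joined.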


```python
def parse_block(self, max_line_spacing_ratio:float, line_break_free_space_ratio:float, new_paragraph_free_space_ratio:float):
    '''Group lines into text block.'''
    # sort in normal reading order
    self.sort_in_reading_order_plus()

    # join lines with similar properties, e.g. spacing, together into text block
    blocks = self._join_lines_vertically(max_line_spacing_ratio)

    # split text block by checking text
    blocks = self._split_text_block_vertically(blocks,
                                               line_break_free_space_ratio,
                                               new_paragraph_free_space_ratio)

    self.reset(blocks)
```
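The second stage of parse_block passes line_break_free_space_ratio and new_paragraph_free_space_ratio to split the joined blocks. The following is a minimal sketch of one way such a split could work. It is not the logic of pdf2docx's _split_text_block_vertically. The Line class and the functions split_block and split_blocks are hypothetical. The sketch also assumes that "free space" is the gap between a line's right edge and the block's right edge, as a fraction of the block width. A large gap ends the paragraph, a moderate gap becomes a hard line break, and anything smaller is treated as soft wrapping.

```python
from dataclasses import dataclass


# Hypothetical, simplified line model (horizontal extent only, in points);
# NOT pdf2docx's Line class.
@dataclass
class Line:
    text: str
    x0: float
    x1: float


def split_block(block, line_break_free_space_ratio, new_paragraph_free_space_ratio):
    """Split one block into paragraphs using each line's trailing free space.

    Assumption: free space = (block right edge - line right edge) / block width.
    Each output line is tagged 'wrap' (soft wrap), 'break' (hard line break)
    or 'end' (paragraph ends after this line).
    """
    left = min(ln.x0 for ln in block)
    right = max(ln.x1 for ln in block)
    width = right - left
    paragraphs, current = [], []
    for line in block:
        free = (right - line.x1) / width if width > 0 else 0.0
        if free > new_paragraph_free_space_ratio:
            current.append((line.text, "end"))
            paragraphs.append(current)  # large gap: close the paragraph here
            current = []
        elif free > line_break_free_space_ratio:
            current.append((line.text, "break"))
        else:
            current.append((line.text, "wrap"))
    if current:
        paragraphs.append(current)  # trailing lines without a large gap
    return paragraphs


def split_blocks(blocks, line_break_free_space_ratio, new_paragraph_free_space_ratio):
    """Apply split_block to every block and flatten the result."""
    result = []
    for block in blocks:
        result.extend(split_block(block, line_break_free_space_ratio,
                                  new_paragraph_free_space_ratio))
    return result


block = [
    Line("The quick brown fox jumps", 72, 540),
    Line("over the lazy dog.", 72, 300),
    Line("Address: 1 Main St", 72, 460),
    Line("Springfield, USA", 72, 200),
]
paragraphs = split_blocks([block], 0.1, 0.5)
for p in paragraphs:
    print(p)
# [('The quick brown fox jumps', 'wrap'), ('over the lazy dog.', 'end')]
# [('Address: 1 Main St', 'break'), ('Springfield, USA', 'end')]
```

In this sketch the new-paragraph threshold must be larger than the line-break threshold. Otherwise the 'break' branch can never be reached.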

Callers (1)

_parse_paragraph · method · 0.80

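Calls (4)

sort_in_reading_order_plus · method
_join_lines_vertically · method
_split_text_block_vertically · method
reset · method · 0.80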

Tested by

No test coverage detected.