hub / github.com/ArtifexSoftware/pdf2docx / _parse_text_format

Method _parse_text_format

pdf2docx/text/TextSpan.py:272–331

Parse the text style implied by a rect shape, based on the rect's position relative to this span.

Args:
    rect (Shape): Target rect shape representing a potential text style.
    horizontal (bool, optional): Horizontal text direction. Defaults to True.

Returns:
    bool: Whether a text style was parsed successfully.

(self, rect:Shape, horizontal:bool=True)


def _parse_text_format(self, rect:Shape, horizontal:bool=True):
    """Parse text style based on the position to a rect shape.

    Args:
        rect (Shape): Target rect shape representing potential text style.
        horizontal (bool, optional): Horizontal text direction. Defaults to True.

    Returns:
        bool: Parsed text style successfully or not.
    """

    # Skip table border/shading
    if rect.equal_to_type(RectType.BORDER) or rect.equal_to_type(RectType.SHADING):
        return False

    # set hyperlink
    if rect.equal_to_type(RectType.HYPERLINK):
        self.style.append({
            'type': rect.type,
            'color': rect.color,
            'uri': rect.uri
        })
        return True

    # text direction selects the bbox axis: y (index 1) if horizontal, else x (index 0)
    idx = 1 if horizontal else 0

    # recognize text format based on the rect and the span it applies to
    # region height
    h_rect = rect.bbox[idx+2] - rect.bbox[idx]
    h_span = self.bbox[idx+2] - self.bbox[idx]

    # distance from the rect's leading edge to the span's bottom border
    d = abs(self.bbox[idx+2] - rect.bbox[idx])

    # highlight: both the rect height and overlap must be large enough
    if h_rect >= 0.5*h_span:
        # In general, highlight color isn't white
        if rect.color != rgb_value((1,1,1)) and \
                self.get_main_bbox(rect, constants.FACTOR_MAJOR):
            rect.type = RectType.HIGHLIGHT

    # near to bottom of span? yes, underline
    elif d <= 0.25*h_span:
        rect.type = RectType.UNDERLINE

    # near to center of span? yes, strike-through-line
    elif 0.35*h_span < d < 0.75*h_span:
        rect.type = RectType.STRIKE

    # check rect type again
    if not rect.is_determined:
        return False

    style = {
        'type': rect.type,
        'color': rect.color
    }
    self.style.append(style)

    # the listing is cut off here; the documented bool return implies success
    return True
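
The geometric thresholds in the method above can be tried in isolation. The following is a minimal sketch, not the library's code: `Kind`, `classify_rect`, and the `is_white` / `overlaps` flags are hypothetical stand-ins for `RectType`, the colour comparison against `rgb_value((1,1,1))`, and the `get_main_bbox` overlap check. It assumes bboxes are `(x0, y0, x1, y1)` tuples and that the rect has no type assigned beforehand.

```python
from enum import Enum
from typing import Optional, Tuple

BBox = Tuple[float, float, float, float]  # assumed layout: (x0, y0, x1, y1)


class Kind(Enum):
    """Hypothetical stand-in for the RectType members assigned above."""
    HIGHLIGHT = "highlight"
    UNDERLINE = "underline"
    STRIKE = "strike"


def classify_rect(span: BBox, rect: BBox, horizontal: bool = True,
                  is_white: bool = False, overlaps: bool = True) -> Optional[Kind]:
    """Apply the same thresholds as _parse_text_format to plain tuples.

    is_white and overlaps are assumptions replacing the colour test and
    the get_main_bbox() check from the original method.
    """
    # Horizontal text is measured along y (indices 1 and 3), vertical along x.
    idx = 1 if horizontal else 0

    h_rect = rect[idx + 2] - rect[idx]
    h_span = span[idx + 2] - span[idx]

    # Distance from the rect's leading edge to the span's bottom border.
    d = abs(span[idx + 2] - rect[idx])

    if h_rect >= 0.5 * h_span:
        # A tall rect is only ever a highlight candidate; if it fails the
        # colour or overlap test it stays undetermined.
        return Kind.HIGHLIGHT if (not is_white and overlaps) else None
    if d <= 0.25 * h_span:
        return Kind.UNDERLINE
    if 0.35 * h_span < d < 0.75 * h_span:
        return Kind.STRIKE
    return None


if __name__ == "__main__":
    span = (0.0, 100.0, 50.0, 112.0)  # a 12-unit-tall horizontal span
    print(classify_rect(span, (0.0, 100.0, 50.0, 112.0)))  # full-height rect
    print(classify_rect(span, (0.0, 111.0, 50.0, 112.0)))  # thin rect at the bottom
    print(classify_rect(span, (0.0, 105.5, 50.0, 106.5)))  # thin rect near the middle
    print(classify_rect(span, (0.0, 100.0, 50.0, 101.0)))  # thin rect at the top
```

Note the gaps in the thresholds: a thin rect whose distance `d` falls between `0.25*h_span` and `0.35*h_span`, or at `0.75*h_span` and beyond, matches no branch, so the original method returns `False` for it unless the rect already carried a determined type.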

Callers 1

split · Method · 0.80

Calls 4

rgb_value · Function · 0.85
equal_to_type · Method · 0.80
get_main_bbox · Method · 0.80
append · Method · 0.45

Tested by

no test coverage detected