Method get_word_list

utils/advanced_ocr.py:289–318

Gets a list of words from an image.

Args:
  image_path: Path to the image file.

Returns:
  List[str]: List of words.

(self, image_path: str)


```python
def get_word_list(self, image_path: str) -> List[str]:
    """
    Get a list of words from an image.

    Args:
        image_path: Path to the image file.

    Returns:
        List[str]: List of words.
    """
    # extract_text_from_image returns a primary text and a backup text
    text, backup_text = self.extract_text_from_image(image_path)
    words = []

    if text:
        processed = self.process_text(text)
        # Individual words, plus the whole cleaned string and a
        # variant without spaces as extra candidates
        words.extend(processed.words)
        if processed.cleaned:
            words.append(processed.cleaned)
        if processed.no_spaces:
            words.append(processed.no_spaces)

    # The backup text is only processed if it is non-empty and differs
    if backup_text and backup_text != text:
        processed_backup = self.process_text(backup_text)
        words.extend(processed_backup.words)
        if processed_backup.cleaned:
            words.append(processed_backup.cleaned)
        if processed_backup.no_spaces:
            words.append(processed_backup.no_spaces)

    # Deduplicate; note that set() does not preserve order
    return list(set(words))
```
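The merge step can be tried outside the class. The sketch below is a standalone approximation, not the project's code: `ProcessedText`, `toy_process` and `collect_candidates` are hypothetical names, and `toy_process` only imitates the three attributes get_word_list reads (`words`, `cleaned`, `no_spaces`); the real process_text may behave differently. One deliberate difference: it deduplicates with `dict.fromkeys`, which keeps first-seen order, whereas `list(set(words))` returns the words in an order that can vary between runs.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ProcessedText:
    # Hypothetical stand-in for the object process_text() returns;
    # only the attributes get_word_list reads are modelled.
    words: List[str] = field(default_factory=list)
    cleaned: Optional[str] = None
    no_spaces: Optional[str] = None


def toy_process(raw: str) -> ProcessedText:
    # Hypothetical processing: whitespace split, stripped text, and a
    # copy with spaces removed. The real process_text may differ.
    cleaned = raw.strip()
    return ProcessedText(
        words=cleaned.split(),
        cleaned=cleaned,
        no_spaces=cleaned.replace(" ", ""),
    )


def collect_candidates(
    text: Optional[str],
    backup_text: Optional[str],
    process: Callable[[str], ProcessedText],
) -> List[str]:
    """Merge candidates from a primary and a backup OCR string, as get_word_list does."""
    sources = [text]
    # The backup string is only used when it is non-empty and differs from the primary one.
    if backup_text and backup_text != text:
        sources.append(backup_text)

    words: List[str] = []
    for source in sources:
        if not source:  # skip a missing or empty primary result
            continue
        processed = process(source)
        words.extend(processed.words)
        if processed.cleaned:
            words.append(processed.cleaned)
        if processed.no_spaces:
            words.append(processed.no_spaces)

    # dict.fromkeys drops duplicates but keeps first-seen order.
    return list(dict.fromkeys(words))


print(collect_candidates("hello world", "hello w0rld", toy_process))
# → ['hello', 'world', 'hello world', 'helloworld', 'w0rld', 'hello w0rld', 'hellow0rld']
```

A stable order makes the result reproducible across runs; whether callers of get_word_list depend on order is not stated in the source.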

nothing calls this directly

Calls:
extract_text_from_image · Method · 0.95
process_text · Method · 0.95
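get_word_list depends on two methods, extract_text_from_image and process_text, whose contracts are not documented on this page. The sketch below writes down what get_word_list's usage implies, as `typing.Protocol` classes. The protocol names, `FakeBackend`, and the exact types are assumptions inferred from the code, not taken from utils/advanced_ocr.py. A fake of this shape could help exercise the merging logic without running OCR, given that no test coverage was detected.

```python
from types import SimpleNamespace
from typing import Iterable, Optional, Protocol, Tuple, runtime_checkable


@runtime_checkable
class ProcessedTextLike(Protocol):
    # Inferred from get_word_list: it extends with .words and
    # appends .cleaned / .no_spaces only when they are truthy.
    words: Iterable[str]
    cleaned: Optional[str]
    no_spaces: Optional[str]


@runtime_checkable
class WordListBackend(Protocol):
    # Hypothetical name for the two methods get_word_list calls on self.
    def extract_text_from_image(self, image_path: str) -> Tuple[Optional[str], Optional[str]]:
        ...

    def process_text(self, text: str) -> ProcessedTextLike:
        ...


class FakeBackend:
    """Test double that returns canned OCR text instead of reading an image."""

    def __init__(self, text: Optional[str], backup_text: Optional[str]) -> None:
        self._result = (text, backup_text)

    def extract_text_from_image(self, image_path: str) -> Tuple[Optional[str], Optional[str]]:
        return self._result  # image_path is ignored

    def process_text(self, text: str) -> ProcessedTextLike:
        cleaned = text.strip()
        return SimpleNamespace(
            words=cleaned.split(),
            cleaned=cleaned or None,
            no_spaces=cleaned.replace(" ", "") or None,
        )


backend = FakeBackend("hello world", None)
print(isinstance(backend, WordListBackend))  # → True
```

An `isinstance` check against a `runtime_checkable` protocol only confirms that the members exist; it does not verify their signatures or return types.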

no test coverage detected