hub / github.com/unclecode/crawl4ai / remove_empty_and_low_word_count_elements

Function remove_empty_and_low_word_count_elements

crawl4ai/utils.py:376–383
(node, word_count_threshold)

Source from the content-addressed store, hash-verified

374
375  # Recursively remove empty elements, their parent elements, and elements with word count below threshold
376  def remove_empty_and_low_word_count_elements(node, word_count_threshold):
377      for child in node.contents:
378          if isinstance(child, element.Tag):
379              remove_empty_and_low_word_count_elements(child, word_count_threshold)
380              word_count = len(child.get_text(strip=True).split())
381              if (len(child.contents) == 0 and not child.get_text(strip=True)) or word_count < word_count_threshold:
382                  child.decompose()
383      return node
384
385  body = remove_empty_and_low_word_count_elements(body, word_count_threshold)
386
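
The listing loops over node.contents while child.decompose() removes entries from that same list, so a sibling that immediately follows a removed element may be skipped. The sketch below assumes BeautifulSoup 4 with the built-in html.parser and compares the listed logic with a variant that iterates over a copy of the children. The names prune_as_listed and prune_snapshot and the sample HTML are illustrative and are not part of crawl4ai.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def _should_remove(child, word_count_threshold):
    # Same condition as the listing: empty tag, or fewer words than the threshold.
    text = child.get_text(strip=True)
    word_count = len(text.split())
    return (len(child.contents) == 0 and not text) or word_count < word_count_threshold


def prune_as_listed(node, word_count_threshold):
    # Mirrors the listed loop: iterates node.contents directly while removing from it.
    for child in node.contents:
        if isinstance(child, Tag):
            prune_as_listed(child, word_count_threshold)
            if _should_remove(child, word_count_threshold):
                child.decompose()
    return node


def prune_snapshot(node, word_count_threshold):
    # Hypothetical variant: list(...) copies the children so removals cannot shift the loop.
    for child in list(node.contents):
        if isinstance(child, Tag):
            prune_snapshot(child, word_count_threshold)
            if _should_remove(child, word_count_threshold):
                child.decompose()
    return node


# Illustrative input: an empty tag directly followed by a short tag, then a long one.
HTML = (
    "<body><p></p><p>too short</p>"
    "<div><p>this paragraph has enough words to stay</p></div></body>"
)


def surviving_paragraphs(prune, threshold=5):
    # Parse a fresh tree each time because pruning modifies it in place.
    body = BeautifulSoup(HTML, "html.parser").body
    return [p.get_text() for p in prune(body, threshold).find_all("p")]


kept_listed = surviving_paragraphs(prune_as_listed)
kept_snapshot = surviving_paragraphs(prune_snapshot)
print(kept_listed)
print(kept_snapshot)
```

In this input the short paragraph sits right after an element that gets removed, which is the case where the two loops can differ. Whether the difference matters for real pages depends on how often low-content elements appear next to each other.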

Callers 1

get_content_of_website · Function · 0.85

Calls

no outgoing calls

Tested by

no test coverage detected
