hub / github.com/karpathy/reader3 / clean_html_content

Function clean_html_content

reader3.py:72–86
(soup: BeautifulSoup)

Source from the content-addressed store, hash-verified

# --- Utilities ---

def clean_html_content(soup: BeautifulSoup) -> BeautifulSoup:

    # Remove dangerous/useless tags
    for tag in soup(['script', 'style', 'iframe', 'video', 'nav', 'form', 'button']):
        tag.decompose()

    # Remove HTML comments
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        comment.extract()

    # Remove input tags
    for tag in soup.find_all('input'):
        tag.decompose()

    return soup


def extract_plain_text(soup: BeautifulSoup) -> str:
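
To make the effect of the three passes concrete, here is a minimal sketch that runs the function on a small document. It assumes the `beautifulsoup4` package is installed and that `BeautifulSoup` and `Comment` come from `bs4`, which the excerpt implies but does not show. The sample markup and the names `sample_html` and `remaining_tags` are hypothetical, not taken from the repository.

```python
from bs4 import BeautifulSoup, Comment


def clean_html_content(soup: BeautifulSoup) -> BeautifulSoup:
    # Same three passes as the listing above, repeated so the sketch runs alone.
    for tag in soup(['script', 'style', 'iframe', 'video', 'nav', 'form', 'button']):
        tag.decompose()
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        comment.extract()
    for tag in soup.find_all('input'):
        tag.decompose()
    return soup


# Hypothetical sample markup, not taken from the repository.
sample_html = """
<html><body>
<nav><a href="#toc">Contents</a></nav>
<!-- tracking marker -->
<h1>Chapter 1</h1>
<p>Call me Ishmael.</p>
<script>alert("hi")</script>
<input type="text" name="q">
</body></html>
"""

# "html.parser" is the stdlib-backed parser, so no extra parser package is needed.
soup = BeautifulSoup(sample_html, "html.parser")
cleaned = clean_html_content(soup)

# Collect the names of every tag that survived the cleaning.
remaining_tags = sorted({tag.name for tag in cleaned.find_all(True)})
print(remaining_tags)  # ['body', 'h1', 'html', 'p']
```

Note that the function mutates the tree it is given and returns that same object, so a caller who needs the original markup afterwards would have to parse it a second time.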

Callers 1

process_epub · Function · 0.85

Calls

no outgoing calls

Tested by

no test coverage detected
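
Since no tests are detected, the following is one possible starting point, written as pytest-style functions with plain assertions. It is a sketch under assumptions: the function is copied inline so the block runs on its own (in the repository it would presumably be imported from `reader3`), and the `_clean` helper and all test names are hypothetical.

```python
from bs4 import BeautifulSoup, Comment


def clean_html_content(soup: BeautifulSoup) -> BeautifulSoup:
    # Inline copy of the function under test so this sketch is self-contained.
    for tag in soup(['script', 'style', 'iframe', 'video', 'nav', 'form', 'button']):
        tag.decompose()
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        comment.extract()
    for tag in soup.find_all('input'):
        tag.decompose()
    return soup


def _clean(html: str) -> BeautifulSoup:
    # Hypothetical helper: parse with the stdlib-backed parser, then clean.
    return clean_html_content(BeautifulSoup(html, "html.parser"))


def test_removes_listed_tags():
    soup = _clean("<div><style>p{}</style><p>kept</p><iframe src='x'></iframe></div>")
    assert soup.find(["style", "iframe"]) is None
    assert soup.p.get_text() == "kept"


def test_removes_comments():
    soup = _clean("<p>text<!-- hidden note --></p>")
    assert not soup.find_all(string=lambda s: isinstance(s, Comment))
    assert soup.p.get_text() == "text"


def test_removes_standalone_inputs():
    # An input outside any form is only caught by the third pass.
    soup = _clean("<p>Search: <input type='text'></p>")
    assert soup.find("input") is None


def test_returns_same_object():
    # The function cleans in place and hands back the tree it was given.
    original = BeautifulSoup("<p>hi</p>", "html.parser")
    assert clean_html_content(original) is original
```

Each test targets one pass of the function, so a failure points directly at the step that regressed.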