hub / github.com/codelucas/newspaper / div_to_para

Method div_to_para

newspaper/cleaners.py:215–232
(self, doc, dom_type)

Source from the content-addressed store, hash-verified

        self.parser.replaceTag(div, 'p')

    def div_to_para(self, doc, dom_type):
        bad_divs = 0
        else_divs = 0
        divs = self.parser.getElementsByTag(doc, tag=dom_type)
        tags = ['a', 'blockquote', 'dl', 'div', 'img', 'ol', 'p',
                'pre', 'table', 'ul']
        for div in divs:
            items = self.parser.getElementsByTags(div, tags)
            if div is not None and len(items) == 0:
                self.replace_with_para(doc, div)
                bad_divs += 1
            elif div is not None:
                replaceNodes = self.get_replacement_nodes(doc, div)
                div.clear()
                for c, n in enumerate(replaceNodes):
                    div.insert(c, n)
                else_divs += 1
        return doc
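
The first branch of div_to_para above (an element of type dom_type with no descendants from the tags list gets replaced with a paragraph) can be illustrated with a minimal standalone sketch. It uses the standard-library xml.etree.ElementTree instead of the parser object the method relies on. The helper name retag_leaf_divs and the sample markup are hypothetical and are not part of newspaper. The second branch, which depends on get_replacement_nodes, is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Same tag list as in div_to_para.
BLOCK_TAGS = ['a', 'blockquote', 'dl', 'div', 'img', 'ol', 'p',
              'pre', 'table', 'ul']


def retag_leaf_divs(root, dom_type='div'):
    """Rename each `dom_type` element without block-level descendants to <p>.

    Hypothetical helper: it mirrors only the first branch of div_to_para
    and is not newspaper's implementation.
    """
    converted = 0
    # Materialize the matches first, because tags are changed while looping.
    for el in list(root.iter(dom_type)):
        has_block = any(child.tag in BLOCK_TAGS
                        for child in el.iter() if child is not el)
        if not has_block:
            el.tag = 'p'
            converted += 1
    return converted


# Assumed sample markup, well-formed so ElementTree can parse it.
html = ('<body>'
        '<div>plain text only</div>'
        '<div><p>already has a paragraph</p></div>'
        '<span>inline <b>bold</b></span>'
        '</body>')
root = ET.fromstring(html)
count = retag_leaf_divs(root)
result = ET.tostring(root, encoding='unicode')
print(count)
print(result)
```

In this sketch only the first div is renamed. The second one already contains a p element, so in div_to_para it would go through the elif branch instead.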

Callers 1

clean · Method · 0.95

Calls 4

replace_with_para · Method · 0.95
get_replacement_nodes · Method · 0.95
getElementsByTag · Method · 0.80
getElementsByTags · Method · 0.80

Tested by

no test coverage detected