hub / github.com/codelucas/newspaper / get_replacement_nodes

Method get_replacement_nodes

newspaper/cleaners.py:180–210
(self, doc, div)

Source from the content-addressed store, hash-verified

                next_node = self.parser.nextSibling(next_node)

    def get_replacement_nodes(self, doc, div):
        replacement_text = []
        nodes_to_return = []
        nodes_to_remove = []
        kids = self.parser.childNodesWithText(div)
        for kid in kids:
            # The node is a <p> and already has some replacement text
            if self.parser.getTag(kid) == 'p' and len(replacement_text) > 0:
                new_node = self.get_flushed_buffer(
                    ''.join(replacement_text), doc)
                nodes_to_return.append(new_node)
                replacement_text = []
                nodes_to_return.append(kid)
            # The node is a text node
            elif self.parser.isTextNode(kid):
                kid_text = self.parser.getText(kid)
                self.replace_walk_left_right(kid, kid_text, replacement_text,
                                             nodes_to_remove)
            else:
                nodes_to_return.append(kid)

        # flush out anything still remaining
        if(len(replacement_text) > 0):
            new_node = self.get_flushed_buffer(''.join(replacement_text), doc)
            nodes_to_return.append(new_node)
            replacement_text = []

        for n in nodes_to_remove:
            self.parser.remove(n)

        return nodes_to_return

    def replace_with_para(self, doc, div):
        self.parser.replaceTag(div, 'p')
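
The loop in get_replacement_nodes follows a buffer-and-flush pattern: loose text accumulates in a list, and the list is flushed into a new paragraph node either when an existing <p> is reached or when the children run out. The sketch below models only that control flow on plain tuples. The `Node` type and `group_loose_text` are hypothetical names for illustration and are not part of newspaper. The sketch also skips whatever `replace_walk_left_right` and `get_flushed_buffer` do beyond appending text and building a node, and it skips the final `nodes_to_remove` cleanup.

```python
from typing import List, Tuple

# Hypothetical stand-in for a DOM node: (tag, text).
# The tag "text" marks a bare text node; this is an assumption of the sketch.
Node = Tuple[str, str]


def group_loose_text(kids: List[Node]) -> List[Node]:
    """Wrap runs of bare text in new ("p", ...) nodes, mirroring the loop above."""
    buffer: List[str] = []   # plays the role of replacement_text
    result: List[Node] = []  # plays the role of nodes_to_return

    for tag, text in kids:
        if tag == "p" and buffer:
            # An existing paragraph ends the current run of text:
            # flush the buffer into a new paragraph, then keep the <p> itself.
            result.append(("p", "".join(buffer)))
            buffer = []
            result.append((tag, text))
        elif tag == "text":
            # Bare text is collected rather than returned directly.
            buffer.append(text)
        else:
            # Any other node passes through; the buffer is NOT flushed here.
            result.append((tag, text))

    # Flush anything still remaining once the children are exhausted.
    if buffer:
        result.append(("p", "".join(buffer)))
    return result


kids = [
    ("text", "Hello "),
    ("text", "world."),
    ("p", "Existing paragraph."),
    ("img", ""),
    ("text", "Tail text."),
]
print(group_loose_text(kids))
# [('p', 'Hello world.'), ('p', 'Existing paragraph.'), ('img', ''), ('p', 'Tail text.')]
```

One consequence of the `else` branch visible in this model: buffered text is not flushed before a non-<p> element, so `[("text", "a"), ("img", ""), ("text", "b")]` comes back as `[("img", ""), ("p", "ab")]`, with the element ahead of the text that preceded it. Whether the real method behaves the same way on a given document depends on the parser helpers, which this sketch does not reproduce.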

Callers 1

div_to_para · Method · 0.95

Calls 9

get_flushed_buffer · Method · 0.95
childNodesWithText · Method · 0.80
getTag · Method · 0.80
join · Method · 0.80
append · Method · 0.80
isTextNode · Method · 0.80
getText · Method · 0.80
remove · Method · 0.80

Tested by

no test coverage detected