hub / github.com/codelucas/newspaper / parse

Method parse

newspaper/source.py:185–194

Sets the lxml root of the source, sets the lxml roots of all child links, and sets the description.

(self)

Source from the content-addressed store, hash-verified

183         self.feeds = [f for f in self.feeds if f.rss]
184
185     def parse(self):
186         """Sets the lxml root, also sets lxml roots of all
187         children links, also sets description
188         """
189         # TODO: This is a terrible idea, ill try to fix it when i'm more rested
190         self.doc = self.config.get_parser().fromstring(self.html)
191         if self.doc is None:
192             print('[Source parse ERR]', self.url)
193             return
194         self.set_description()
195
196     def parse_categories(self):
197         """Parse out the lxml root in each category
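
The control flow of `parse` in the listing (parse the HTML, return early if the parser yields `None`, otherwise set the description) can be sketched in isolation. The block below is a minimal illustration using only the standard library, not the newspaper implementation: `SketchSource`, `_MetaDescriptionParser`, and this `fromstring` helper are hypothetical stand-ins, and treating the description as the `<meta name="description">` content is an assumption made for the example.

```python
from html.parser import HTMLParser


class _MetaDescriptionParser(HTMLParser):
    """Hypothetical stand-in for the configured parser's document object."""

    def __init__(self):
        super().__init__()
        self.description = ""

    def handle_starttag(self, tag, attrs):
        # Record the content of <meta name="description" ...> if present.
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "description":
            self.description = attr.get("content") or ""


def fromstring(html):
    """Hypothetical parser entry point: returns None when there is nothing to parse."""
    if not html or not html.strip():
        return None
    doc = _MetaDescriptionParser()
    doc.feed(html)
    return doc


class SketchSource:
    """Illustrative only; mirrors the shape of the listing, not the real Source class."""

    def __init__(self, url, html):
        self.url = url
        self.html = html
        self.doc = None
        self.description = ""

    def parse(self):
        # Same three steps as the listing: parse, guard against None, set description.
        self.doc = fromstring(self.html)
        if self.doc is None:
            print('[Source parse ERR]', self.url)
            return
        self.set_description()

    def set_description(self):
        # Assumption: the description comes from the parsed meta tag.
        self.description = self.doc.description
```

In this sketch a failed parse leaves `doc` as `None` and `description` empty, so a caller that needs to distinguish failure from an empty page has to check `doc` after calling `parse()`, since the method itself only prints and returns.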

Callers 6

build · Method · 0.95
test_cache_categories · Method · 0.95
hot · Function · 0.45
parse_feeds · Method · 0.45
parse_articles · Method · 0.45
test_parse_html · Method · 0.45

Calls 3

set_description · Method · 0.95
fromstring · Method · 0.80
get_parser · Method · 0.80

Tested by 2

test_cache_categories · Method · 0.76
test_parse_html · Method · 0.36