github.com/Jack-Cherish/python-spider

Function parse_doc

baiduwenku_pro_1.py:25–41
(content)

Source from the content-addressed store, hash-verified

def parse_doc(content):
    # `re` and `fetch_url` come from elsewhere in baiduwenku_pro_1.py.
    result = ''
    # Collect the escaped URLs containing "0.json" that precede a literal \x22}.
    url_list = re.findall('(https.*?0.json.*?)\\\\x22}', content)
    # Turn each escaped slash sequence back into a plain "/".
    url_list = [addr.replace("\\\\\\/", "/") for addr in url_list]
    for url in url_list[:-5]:  # the last five matches are skipped
        content = fetch_url(url)
        y = 0
        # Each match is a ("c" text fragment, "y" value) pair.
        txtlists = re.findall('"c":"(.*?)".*?"y":(.*?),', content)
        for item in txtlists:
            # A changed "y" value starts a new line in the output.
            if y != item[1]:
                y = item[1]
                n = '\n'
            else:
                n = ''
            result += n
            result += item[0].encode('utf-8').decode('unicode_escape', 'ignore')
    return result
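
The inner loop of parse_doc can be tried in isolation. The following is a minimal sketch, not the repository's code: the helper name join_fragments and the sample payload are hypothetical, and the payload shape (objects carrying a "c" text field and a "y" value followed by a comma) is only an assumption inferred from the regular expression above.

```python
import re


def join_fragments(page_json: str) -> str:
    """Rebuild text from one payload, starting a new line whenever "y" changes."""
    result = ''
    last_y = None  # nothing seen yet, so the first fragment always opens a line
    for text, y in re.findall(r'"c":"(.*?)".*?"y":(.*?),', page_json):
        if y != last_y:
            last_y = y
            result += '\n'
        # Turn literal \uXXXX escapes in the fragment into real characters.
        result += text.encode('utf-8').decode('unicode_escape', 'ignore')
    return result


# Hypothetical payload: two fragments share y=100, a third sits at y=130.
sample = (
    '{"c":"Hello ","p":{"h":16,"y":100,"x":10}},'
    '{"c":"world","p":{"h":16,"y":100,"x":60}},'
    '{"c":"\\u4f60\\u597d","p":{"h":16,"y":130,"x":10}}'
)

print(repr(join_fragments(sample)))
```

Fragments with the same "y" value are concatenated on one line, and the output begins with a newline because the first fragment never matches the initial value. One caveat of the unicode_escape round trip: it suits ASCII text with \uXXXX escapes, but characters that are already non-ASCII in the fragment would come out garbled, since unicode_escape reads non-escape bytes as Latin-1.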

Callers 1

main · Function · 0.85

Calls 2

fetch_url · Function · 0.85
replace · Method · 0.80
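
The replace call listed here belongs to the URL clean-up step at the top of parse_doc. As a rough sketch of that step alone, the snippet below runs the same pattern and replacement on a made-up fragment. The helper name extract_page_urls, the pageLoadUrl key, and the example.com URLs are all hypothetical. The only assumption taken from the source is that URLs appear with each slash escaped by three backslashes and are followed by a literal \x22}.

```python
import re


def extract_page_urls(content: str) -> list:
    """Find escaped URLs containing "0.json" and restore their plain slashes."""
    # The raw-string pattern is equivalent to '(https.*?0.json.*?)\\\\x22}':
    # it matches up to a literal backslash followed by "x22}".
    escaped = re.findall(r'(https.*?0.json.*?)\\x22}', content)
    # Each slash is written as three backslashes plus "/" in the input.
    return [addr.replace(r'\\\/', '/') for addr in escaped]


# Hypothetical page fragment with two escaped URLs.
sample = (
    r'{\x22pageLoadUrl\x22:\x22https:\\\/\\\/example.com\\\/doc\\\/0.json?token=abc\x22},'
    r'{\x22pageLoadUrl\x22:\x22https:\\\/\\\/example.com\\\/doc\\\/1\\\/0.json?token=def\x22}'
)

for url in extract_page_urls(sample):
    print(url)
```

parse_doc additionally slices the list with [:-5] before fetching, so the last five matches are never requested. The sketch leaves that slice out because the source does not say why those entries are dropped.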

Tested by

no test coverage detected