MCPcopy Index your code
hub / github.com/Jack-Cherish/python-spider / parse_txt

Function parse_txt

baiduwenku_pro_1.py:44–56  ·  view source on GitHub ↗
(doc_id)

Source from the content-addressed store, hash-verified

42
43
44def parse_txt(doc_id):
45 content_url = 'https://wenku.baidu.com/api/doc/getdocinfo?callback=cb&doc_id=' + doc_id
46 content = fetch_url(content_url)
47 md5 = re.findall('"md5sum":"(.*?)"', content)[0]
48 pn = re.findall('"totalPageNum":"(.*?)"', content)[0]
49 rsign = re.findall('"rsign":"(.*?)"', content)[0]
50 content_url = 'https://wkretype.bdimg.com/retype/text/' + doc_id + '?rn=' + pn + '&type=txt' + md5 + '&rsign=' + rsign
51 content = json.loads(fetch_url(content_url))
52 result = ''
53 for item in content:
54 for i in item['parags']:
55 result += i['c'].replace('\\r', '\r').replace('\\n', '\n')
56 return result
57
58
59def parse_other(doc_id):

Callers 1

mainFunction · 0.85

Calls 2

fetch_urlFunction · 0.85
replaceMethod · 0.80

Tested by

no test coverage detected