hub / github.com/yhangf/PythonCrawler / get_info_page_url

Function get_info_page_url

spiderFile/get_tj_accident_info.py:21–26
(url, session)

Source from the content-addressed store, hash-verified

19      return page_url_list
20
21  async def get_info_page_url(url, session):
22      regex = re.compile("<a href='./(.*?)'\s+title=")
23      async with session.get(url) as response:
24          html = await response.text()
25          url_part_list = re.findall(regex, html)
26          return url_part_list
27
28  async def get_all_info_page_url(root_url, page_url_list):
29      tasks, all_info_page_url_list = [], []
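
To make the pattern on line 22 concrete, here is a minimal sketch of what it captures. The HTML snippet is invented for illustration, since the markup of the real page is not shown here. The pattern is written as a raw string, which matches the same text while avoiding Python's invalid-escape warning for `\s` in an ordinary string literal.

```python
import re

# Same pattern as in get_info_page_url, written as a raw string.
LINK_RE = re.compile(r"<a href='./(.*?)'\s+title=")

# Hypothetical markup, assumed for illustration only.
SAMPLE_HTML = (
    "<li><a href='./201801/t20180102_1.html' title='Report A'>Report A</a></li>\n"
    "<li><a href='./201801/t20180103_2.html'  title='Report B'>Report B</a></li>\n"
    "<li><a href=\"./ignored.html\" title='double quotes'>skipped</a></li>\n"
)

# findall returns only the captured group: the path after the leading "./".
url_parts = LINK_RE.findall(SAMPLE_HTML)
print(url_parts)
# ['201801/t20180102_1.html', '201801/t20180103_2.html']
```

The captured values are relative path fragments, so a caller has to combine them with a base URL before fetching. Links that use double quotes around `href` are skipped, because the pattern only accepts single quotes.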

Callers 1

get_all_info_page_url · Function · 0.85

Calls

no outgoing calls

Tested by

no test coverage detected
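
Since no tests are detected, the following is a rough sketch of how the function could be exercised without network access. `StubSession` and `StubResponse` are hypothetical stand-ins written for this example. They assume only what the source shows: `session.get(url)` is an async context manager and the response has an awaitable `text()` method (an interface consistent with `aiohttp.ClientSession`, although the import is not visible in this excerpt). The function body is copied here with the pattern as a raw string so that the block is self-contained.

```python
import asyncio
import re
from contextlib import asynccontextmanager


async def get_info_page_url(url, session):
    # Copy of the function under test, with the pattern as a raw string.
    regex = re.compile(r"<a href='./(.*?)'\s+title=")
    async with session.get(url) as response:
        html = await response.text()
        return re.findall(regex, html)


class StubResponse:
    """Hypothetical stand-in for the response object; only text() is provided."""

    def __init__(self, body):
        self._body = body

    async def text(self):
        return self._body


class StubSession:
    """Hypothetical stand-in for the session; records the URLs requested."""

    def __init__(self, body):
        self._body = body
        self.requested = []

    @asynccontextmanager
    async def get(self, url):
        self.requested.append(url)
        yield StubResponse(self._body)


def test_get_info_page_url():
    # Invented markup: two single-quoted links, the second split across lines.
    body = "<a href='./a/1.html' title='x'>x</a><a href='./b/2.html'\n title='y'>y</a>"
    session = StubSession(body)
    result = asyncio.run(get_info_page_url("http://example.invalid/list", session))
    assert result == ["a/1.html", "b/2.html"]
    assert session.requested == ["http://example.invalid/list"]


test_get_info_page_url()
```

The stub keeps the test deterministic and fast. It does not check behavior against the real site, whose markup may differ from the invented sample.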