hub / github.com/yhangf/PythonCrawler / get_all_info_page_url

Function get_all_info_page_url

spiderFile/get_tj_accident_info.py:28–38
(root_url, page_url_list)

Source from the content-addressed store, hash-verified

26      return url_part_list
27
28  async def get_all_info_page_url(root_url, page_url_list):
29      tasks, all_info_page_url_list = [], []
30      # Limit coroutine concurrency
31      async with asyncio.Semaphore(50) as semaphore:
32          async with aiohttp.ClientSession() as session:
33              for url in page_url_list:
34                  tasks.append(asyncio.ensure_future(get_info_page_url(url, session)))
35              done, pending = await asyncio.wait(tasks)
36      all_info_page_url_list = [root_url+url_part for r in done
37                                for url_part in r.result()]
38      return all_info_page_url_list
39
40
41  def get_data(url):
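
In the listing above, the semaphore is entered once around the whole loop and is never handed to the individual tasks. As written, it therefore does not appear to cap how many get_info_page_url coroutines run at the same time, unless get_info_page_url does its own limiting, which is not shown here. The following is a minimal sketch of one common way to bound concurrency, not the author's implementation. The names collect_info_page_urls, bounded_fetch, fake_fetch and the fetch/limit parameters are hypothetical, and fake_fetch stands in for get_info_page_url so the example runs without aiohttp or network access. The sketch uses asyncio.gather, which accepts coroutines directly and returns results in input order, whereas the "done" set returned by asyncio.wait is unordered.

```python
import asyncio


async def collect_info_page_urls(root_url, page_url_list, fetch, limit=50):
    """Gather URL parts from every page, running at most `limit` fetches at once.

    `fetch` is a hypothetical stand-in for get_info_page_url: an async
    callable that takes a page URL and returns a list of URL parts.
    """
    semaphore = asyncio.Semaphore(limit)

    async def bounded_fetch(url):
        # Each task acquires the semaphore itself, so no more than
        # `limit` fetches are in flight at any moment.
        async with semaphore:
            return await fetch(url)

    # gather() takes coroutines directly and keeps results in input order.
    results = await asyncio.gather(*(bounded_fetch(url) for url in page_url_list))
    return [root_url + url_part for parts in results for url_part in parts]


async def fake_fetch(url):
    # Hypothetical stub used only for this demonstration; no network access.
    await asyncio.sleep(0)
    return [f"/info/{url[-1]}a.html", f"/info/{url[-1]}b.html"]


if __name__ == "__main__":
    pages = ["list_1", "list_2"]
    urls = asyncio.run(
        collect_info_page_urls("http://example.com", pages, fake_fetch, limit=1)
    )
    print(urls)
```

Injecting the fetch callable is a design choice made for this sketch only: it lets the aggregation logic be exercised with a stub, which may also be a starting point for the test coverage the page reports as missing.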

Callers 1

Calls 1

get_info_page_url · Function · 0.85

Tested by

no test coverage detected