hub / github.com/blacklanternsecurity/bbot / api_page_iter

Method api_page_iter

bbot/modules/base.py:1359–1420

An asynchronous generator function for iterating through paginated API data. This function continuously makes requests to a specified API URL, incrementing the page number or applying a custom pagination function, and yields the received data one page at a time. It is well-suited for APIs that provide paginated results.

(self, url, page_size=100, _json=True, next_key=None, iter_key=None, **requests_kwargs)
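A minimal sketch of how a caller might consume a generator with this signature. `fake_page_iter` and the canned `PAGES` data are hypothetical stand-ins so that the snippet runs without a network or a BBOT module. Only the consumption pattern (break on an empty page, close the generator in `finally`) follows the method's docstring.

```python
import asyncio

# Hypothetical canned data standing in for three API pages; the last one is empty.
PAGES = [
    {"subdomains": ["a.example.com", "b.example.com"]},
    {"subdomains": ["c.example.com"]},
    {"subdomains": []},
]


async def fake_page_iter(url, page_size=100):
    """Stand-in for api_page_iter(): yields one parsed page per loop and never stops by itself."""
    page = 1
    while True:
        # Fill the {page} and {page_size} placeholders (plain str.format here, an assumption).
        request_url = url.format(page=page, page_size=page_size)
        assert "{" not in request_url
        # Past the canned data, keep yielding empty pages, like an API with nothing more to return.
        yield PAGES[page - 1] if page <= len(PAGES) else {"subdomains": []}
        page += 1


async def collect_subdomains():
    found = []
    agen = fake_page_iter("https://api.example.com/data?page={page}&page_size={page_size}")
    try:
        async for page in agen:
            subdomains = page["subdomains"]
            if not subdomains:
                # The caller decides when pagination is over.
                break
            found.extend(subdomains)
    finally:
        # Close the generator explicitly so it does not linger after the early break.
        await agen.aclose()
    return found


result = asyncio.run(collect_subdomains())
```

Inside a real module the generator would be obtained from `self.api_page_iter(...)` instead of the stand-in; the `try`/`finally` shape stays the same.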

Source from the content-addressed store, hash-verified

        return getattr(r, "is_success", False) or getattr(r, "status_code", 0) == 404

    async def api_page_iter(self, url, page_size=100, _json=True, next_key=None, iter_key=None, **requests_kwargs):
        """
        An asynchronous generator function for iterating through paginated API data.

        This function continuously makes requests to a specified API URL, incrementing the page number
        or applying a custom pagination function, and yields the received data one page at a time.
        It is well-suited for APIs that provide paginated results.

        Args:
            url (str): The initial API URL. Can contain placeholders for 'page', 'page_size', and 'offset'.
            page_size (int, optional): The number of items per page. Defaults to 100.
            _json (bool, optional): If True, attempts to deserialize the response content to a JSON object. Defaults to True.
            next_key (callable, optional): A function that takes the last page's data and returns the URL for the next page. Defaults to None.
            iter_key (callable, optional): A function that builds each new request based on the page number, page size, and offset. Defaults to a simple implementation that automatically replaces {page} and {page_size} in the url.
            **requests_kwargs: Arbitrary keyword arguments that will be forwarded to the HTTP request function.

        Yields:
            dict or httpx.Response: If '_json' is True, yields a dictionary containing the parsed JSON data. Otherwise, yields the raw HTTP response.

        Note:
            The loop will continue indefinitely unless manually stopped. Make sure to break out of the loop once the last page has been received.

        Examples:
            >>> agen = self.api_page_iter('https://api.example.com/data?page={page}&page_size={page_size}')
            >>> try:
            ...     async for page in agen:
            ...         subdomains = page["subdomains"]
            ...         self.hugesuccess(subdomains)
            ...         if not subdomains:
            ...             break
            ... finally:
            ...     await agen.aclose()
        """
        page = 1
        offset = 0
        result = None
        # fall back to the built-in request builder when no custom iter_key is given
        if iter_key is None:
            iter_key = self._prepare_api_iter_req
        while 1:
            # once a previous page exists, let next_key (if given) derive the next URL from it
            if result and callable(next_key):
                try:
                    new_url = next_key(result)
                except Exception as e:
                    self.debug(f"Failed to extract next page of results from {url}: {e}")
                    self.debug(traceback.format_exc())
            else:
                # otherwise build the request from the page number, page size, and offset
                new_url, new_kwargs = iter_key(url, page, page_size, offset, **requests_kwargs)
            result = await self.api_request(new_url, **new_kwargs)
            # no response at all ends the iteration
            if result is None:
                self.verbose(f"api_page_iter() got no response for {new_url}")
                break
            try:
                if _json:
                    result = result.json()
                yield result
            except Exception:
                self.warning(f'Error in api_page_iter() for url: "{new_url}"')
                self.trace(traceback.format_exc())
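The two pagination hooks can be sketched as plain functions. The call shapes are taken from the listing: `iter_key(url, page, page_size, offset, **requests_kwargs)` returns a `(new_url, new_kwargs)` pair, and `next_key(result)` returns the next URL. Everything else is an assumption for illustration: the function names, the `limit`/`offset` query parameters, the `links.next` response field, and the idea that the request function accepts an httpx-style `params` keyword.

```python
def offset_iter_key(url, page, page_size, offset, **requests_kwargs):
    """Hypothetical iter_key: sends paging values as query params instead of URL placeholders."""
    # Copy so the caller's kwargs are not mutated between pages.
    new_kwargs = dict(requests_kwargs)
    params = dict(new_kwargs.get("params") or {})
    # "limit" and "offset" are made-up parameter names for an imaginary API.
    params.update({"limit": page_size, "offset": offset})
    new_kwargs["params"] = params
    return url, new_kwargs


def cursor_next_key(result):
    """Hypothetical next_key: reads the next page's URL from the parsed JSON body."""
    # Assumes the API returns {"links": {"next": "<url>"}} in every page.
    return result["links"]["next"]


# Request for the third page of 50 items (offset 100), with one forwarded kwarg.
url, kwargs = offset_iter_key(
    "https://api.example.com/data", 3, 50, 100, headers={"Accept": "application/json"}
)
next_url = cursor_next_key({"links": {"next": "https://api.example.com/data?cursor=abc"}})
```

In the listing, `next_key` is only consulted once a previous result exists, so the first request always goes through `iter_key` (or the default builder), and the keyword arguments from that first request are reused for the URLs that `next_key` returns.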

Callers

nothing calls this directly

Calls 6

debug · Method · 0.95
api_request · Method · 0.95
verbose · Method · 0.95
warning · Method · 0.95
trace · Method · 0.95
json · Method · 0.45

Tested by

no test coverage detected