hub / github.com/alirezamika/autoscraper / get_result_exact

Method get_result_exact

autoscraper/auto_scraper.py:545–609

Gets exact results based on the previously learned rules.

(
    self,
    url=None,
    html=None,
    soup=None,
    request_args=None,
    grouped=False,
    group_by_alias=False,
    unique=None,
    attr_fuzz_ratio=1.0,
    keep_blank=False,
)

Source from the content-addressed store, hash-verified

def get_result_exact(
    self,
    url=None,
    html=None,
    soup=None,
    request_args=None,
    grouped=False,
    group_by_alias=False,
    unique=None,
    attr_fuzz_ratio=1.0,
    keep_blank=False,
):
    """
    Gets exact results based on the previously learned rules.

    Parameters:
    ----------
    url: str, optional
        URL of the target web page. You should either pass url or html or both.

    html: str, optional
        An HTML string can also be passed instead of URL.
        You should either pass url or html or both.

    request_args: dict, optional
        A dictionary used to specify a set of additional request parameters used by requests
        module. You can specify proxy URLs, custom headers etc.

    grouped: bool, optional, defaults to False
        If set to True, the result will be a dictionary with the rule_ids as keys
        and a list of scraped data per rule as values.

    group_by_alias: bool, optional, defaults to False
        If set to True, the result will be a dictionary with the rule alias as keys
        and a list of scraped data per alias as values.

    unique: bool, optional, defaults to True for non grouped results and
        False for grouped results.
        If set to True, will remove duplicates from returned result list.

    attr_fuzz_ratio: float in range [0, 1], optional, defaults to 1.0
        The fuzziness ratio threshold for matching html tag attributes.

    keep_blank: bool, optional, defaults to False
        If set to True, missing values will be returned as empty strings.

    Returns:
    --------
    List of exact results scraped from the web page.
    Dictionary if grouped=True or group_by_alias=True.
    """

    func = self._get_result_with_stack_index_based
    return self._get_result_by_func(
        func,
        url,
        html,
        soup,
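
The docstring describes how `grouped` and `unique` change the shape of the return value. The following is a minimal sketch of that documented behavior only, not the library's implementation: `shape_results` and the `matches` data are hypothetical names invented for illustration, and the sketch assumes the per-rule matches have already been collected into a plain dictionary.

```python
def shape_results(per_rule, grouped=False, unique=None):
    """Hypothetical helper that mimics the documented return shapes.

    per_rule maps a rule id to the list of values that rule matched.
    """
    # Documented default: unique is True for flat results, False for grouped ones.
    if unique is None:
        unique = not grouped

    def dedupe(values):
        # dict.fromkeys drops repeats while keeping first-seen order.
        return list(dict.fromkeys(values))

    if grouped:
        # Grouped: one list of scraped values per rule id.
        return {
            rule_id: dedupe(values) if unique else list(values)
            for rule_id, values in per_rule.items()
        }

    # Not grouped: a single flat list across all rules.
    flat = [value for values in per_rule.values() for value in values]
    return dedupe(flat) if unique else flat


# Made-up matches for two hypothetical rules.
matches = {"rule_a": ["Python", "Go"], "rule_b": ["Go", "Rust"]}

print(shape_results(matches))                # ['Python', 'Go', 'Rust']
print(shape_results(matches, unique=False))  # ['Python', 'Go', 'Go', 'Rust']
print(shape_results(matches, grouped=True))  # {'rule_a': ['Python', 'Go'], 'rule_b': ['Go', 'Rust']}
```

Leaving `unique` as `None` lets the default follow the requested shape, which matches the "True for non grouped results and False for grouped results" wording in the docstring.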

Callers 1

get_result · Method · 0.95

Calls 1

_get_result_by_func · Method · 0.95
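
The single outgoing call reflects a delegation pattern visible in the method body: `get_result_exact` selects a matching function and hands it to a shared driver. The sketch below illustrates that pattern in isolation. `MiniExtractor` and all of its methods are hypothetical stand-ins, and the position-based matching is an assumption chosen for simplicity rather than a description of how AutoScraper matches elements.

```python
class MiniExtractor:
    """Hypothetical stand-in used only to show the delegation pattern."""

    def __init__(self, positions):
        # Positions "learned" earlier; here they are just list indexes.
        self.positions = positions

    def _pick_by_position(self, items, position):
        # One matching strategy: take the item at the learned position, if present.
        return [items[position]] if 0 <= position < len(items) else []

    def _run_with(self, func, items):
        # Shared driver: apply whichever strategy was passed in to every position.
        results = []
        for position in self.positions:
            results.extend(func(items, position))
        return results

    def get_exact(self, items):
        # Public method: choose the strategy, then delegate to the shared driver.
        func = self._pick_by_position
        return self._run_with(func, items)


extractor = MiniExtractor(positions=[0, 2, 5])
print(extractor.get_exact(["a", "b", "c"]))  # ['a', 'c']
```

Passing the strategy as a bound method keeps the shared driver free of matching logic, so another public method could reuse it with a different function.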

Tested by

no test coverage detected