Gets exact results based on the previously learned rules.

Parameters:
----------
url: str, optional
    URL of the target web page. You should either pass url or html or both.

html: str, optional
    An HTML string can also be passed instead of URL.
(
self,
url=None,
html=None,
soup=None,
request_args=None,
grouped=False,
group_by_alias=False,
unique=None,
attr_fuzz_ratio=1.0,
keep_blank=False,
)
        )

    def get_result_exact(
        self,
        url=None,
        html=None,
        soup=None,
        request_args=None,
        grouped=False,
        group_by_alias=False,
        unique=None,
        attr_fuzz_ratio=1.0,
        keep_blank=False,
    ):
        """
        Gets exact results based on the previously learned rules.

        Parameters:
        ----------
        url: str, optional
            URL of the target web page. You should either pass url or html or both.

        html: str, optional
            An HTML string can also be passed instead of URL.
            You should either pass url or html or both.

        request_args: dict, optional
            A dictionary of additional request parameters passed to the requests
            module. You can specify proxy URLs, custom headers, etc.

        grouped: bool, optional, defaults to False
            If set to True, the result will be a dictionary with the rule_ids as keys
            and a list of scraped data per rule as values.

        group_by_alias: bool, optional, defaults to False
            If set to True, the result will be a dictionary with the rule aliases as keys
            and a list of scraped data per alias as values.

        unique: bool, optional, defaults to True for non-grouped results and
            False for grouped results.
            If set to True, duplicates are removed from the returned result list.

        attr_fuzz_ratio: float in range [0, 1], optional, defaults to 1.0
            The fuzziness ratio threshold for matching HTML tag attributes.

        keep_blank: bool, optional, defaults to False
            If set to True, missing values will be returned as empty strings.

        Returns:
        --------
        List of exact results scraped from the web page.
        Dictionary if grouped=True or group_by_alias=True.
        """

        func = self._get_result_with_stack_index_based
        return self._get_result_by_func(
            func,
            url,
            html,
            soup,
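The return shapes described in the docstring (a flat list by default, a dictionary when `grouped` or `group_by_alias` is set, and the `unique` default that depends on grouping) can be illustrated with a small standalone sketch. The helper `shape_results`, its `per_rule` and `aliases` arguments, the fallback to the rule id when a rule has no alias, and the choice to let `group_by_alias` take precedence over `grouped` are all assumptions made for illustration; this is not the library's internal code.

```python
def shape_results(per_rule, aliases=None, grouped=False,
                  group_by_alias=False, unique=None):
    """Hypothetical helper that mirrors the documented return shapes.

    per_rule: dict mapping rule_id -> list of scraped values
    aliases:  dict mapping rule_id -> alias (assumed input format)
    """
    aliases = aliases or {}
    is_grouped = grouped or group_by_alias

    # Documented default: True for non-grouped results, False for grouped ones.
    if unique is None:
        unique = not is_grouped

    def dedupe(items):
        # dict.fromkeys drops duplicates while keeping first-seen order.
        return list(dict.fromkeys(items))

    if not is_grouped:
        flat = [item for items in per_rule.values() for item in items]
        return dedupe(flat) if unique else flat

    result = {}
    for rule_id, items in per_rule.items():
        # Assumption: fall back to the rule id when no alias is known.
        key = aliases.get(rule_id, rule_id) if group_by_alias else rule_id
        result.setdefault(key, []).extend(items)

    if unique:
        result = {key: dedupe(items) for key, items in result.items()}
    return result


if __name__ == "__main__":
    data = {"rule_a": ["x", "y", "x"], "rule_b": ["y", "z"]}
    print(shape_results(data))
    print(shape_results(data, grouped=True))
    print(shape_results(data, {"rule_a": "title", "rule_b": "title"},
                        group_by_alias=True))
```

Passing `unique=True` together with `grouped=True` deduplicates each per-rule list separately in this sketch, which matches the docstring's wording about the "returned result list" only under that reading.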
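To make the `attr_fuzz_ratio` threshold more concrete, here is a minimal sketch of how a similarity ratio in the range [0, 1] could gate attribute matching. It uses `difflib.SequenceMatcher` from the standard library; the function name `attr_value_matches` and the rule that a ratio of 1.0 means strict equality are assumptions for illustration, and the library's actual matching logic may differ.

```python
from difflib import SequenceMatcher


def attr_value_matches(learned, candidate, ratio=1.0):
    """Hypothetical check: does a candidate attribute value match a learned one?

    ratio is the minimum similarity required, in the range [0, 1].
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in the range [0, 1]")

    # Assumption: a threshold of 1.0 means exact string equality.
    if ratio >= 1.0:
        return learned == candidate

    # SequenceMatcher.ratio() returns 2 * matches / total length, in [0, 1].
    similarity = SequenceMatcher(None, learned, candidate).ratio()
    return similarity >= ratio


if __name__ == "__main__":
    print(attr_value_matches("price-box", "price-box"))
    print(attr_value_matches("price-box", "price-box-2"))
    print(attr_value_matches("price-box", "price-box-2", ratio=0.8))
```

Lowering the threshold trades precision for tolerance: a class name that changes slightly between page versions can still match, but unrelated attributes become more likely to match as well.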