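A helper class that represents a string that can be attacked. Models that take multiple sentences as input separate them by ``SPLIT_TOKEN``. Attacks "see" the entire input, joined into one string, without the split token. ``AttackedText`` instances that were perturbed from other ``AttackedText`` objects contain a pointer to the previous text (``attack_attrs["previous_attacked_text"]``), so that the full chain of perturbations can be reconstructed by using this key to form a linked list.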
| 25 | |
| 26 | |
| 27 | class AttackedText: |
| 28 | """A helper class that represents a string that can be attacked. |
| 29 | |
| 30 | Models that take multiple sentences as input separate them by ``SPLIT_TOKEN``. |
| 31 | Attacks "see" the entire input, joined into one string, without the split token. |
| 32 | |
| 33 | ``AttackedText`` instances that were perturbed from other ``AttackedText`` |
| 34 | objects contain a pointer to the previous text |
| 35 | (``attack_attrs["previous_attacked_text"]``), so that the full chain of |
| 36 | perturbations can be reconstructed by using this key to form a linked |
| 37 | list. |
| 38 | |
| 39 | Args: |
| 40 | text_input (str or OrderedDict): The text that this AttackedText represents |
| 41 | attack_attrs (dict): Dictionary of various attributes stored |
| 42 | during the course of an attack. |
| 43 | """ |
| 44 | |
| 45 | SPLIT_TOKEN = "<SPLIT>" |
| 46 | |
| 47 | def __init__(self, text_input, attack_attrs=None): |
| 48 | # Read in ``text_input`` as a string or OrderedDict. |
| 49 | if isinstance(text_input, str): |
| 50 | self._text_input = OrderedDict([("text", text_input)]) |
| 51 | elif isinstance(text_input, OrderedDict): |
| 52 | self._text_input = text_input |
| 53 | else: |
| 54 | raise TypeError( |
| 55 | f"Invalid text_input type {type(text_input)} (required str or OrderedDict)" |
| 56 | ) |
| 57 | # Process input lazily. |
| 58 | self._words = None |
| 59 | self._words_per_input = None |
| 60 | self._pos_tags = None |
| 61 | self._ner_tags = None |
| 62 | # Shallow-copy the inputs so the caller's OrderedDict is not aliased. |
| 63 | self._text_input = OrderedDict(self._text_input) |
| 64 | if attack_attrs is None: |
| 65 | self.attack_attrs = dict() |
| 66 | elif isinstance(attack_attrs, dict): |
| 67 | self.attack_attrs = attack_attrs |
| 68 | else: |
| 69 | raise TypeError(f"Invalid type for attack_attrs: {type(attack_attrs)}") |
| 70 | # Indices of words from the *original* text. Allows us to map |
| 71 | # indices between original text and this text, and vice-versa. |
| 72 | self.attack_attrs.setdefault("original_index_map", np.arange(self.num_words)) |
| 73 | # A set of all indices in *this* text that have been modified. |
| 74 | self.attack_attrs.setdefault("modified_indices", set()) |
| 75 | |
| 76 | def __eq__(self, other: AttackedText) -> bool: |
| 77 | """Compares two AttackedText instances. |
| 78 | |
| 79 | Note: Does not compute true equality across attack attributes. |
| 80 | We found this caused large performance issues with caching, |
| 81 | and it's actually much faster (cache-wise) to just compare |
| 82 | by the text, and this works for lots of use cases. |
| 83 | """ |
| 84 | if not (self.text == other.text): |
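The docstring describes perturbed texts as a linked list through ``attack_attrs["previous_attacked_text"]``, and ``__eq__`` as a comparison by text only. The sketch below illustrates both ideas with a simplified stand-in class. ``MiniAttackedText``, ``perturb`` and ``perturbation_chain`` are hypothetical names that do not appear in the listing, and joining the inputs with a newline in ``text`` is an assumption, since the listing does not show how ``text`` is built.

```python
from collections import OrderedDict


class MiniAttackedText:
    """Hypothetical, simplified stand-in for ``AttackedText`` (illustration only)."""

    def __init__(self, text_input, attack_attrs=None):
        # Accept a plain string or an OrderedDict of named inputs, as in the listing.
        if isinstance(text_input, str):
            self._text_input = OrderedDict([("text", text_input)])
        elif isinstance(text_input, OrderedDict):
            self._text_input = OrderedDict(text_input)
        else:
            raise TypeError(
                f"Invalid text_input type {type(text_input)} (required str or OrderedDict)"
            )
        self.attack_attrs = {} if attack_attrs is None else attack_attrs

    @property
    def text(self):
        # Assumption: inputs are joined with a newline; the real class may differ.
        return "\n".join(self._text_input.values())

    def __eq__(self, other):
        # Compare by text only, ignoring attack_attrs.
        return isinstance(other, MiniAttackedText) and self.text == other.text

    def __hash__(self):
        # Keep hashing consistent with the text-only equality above.
        return hash(self.text)

    def perturb(self, new_text):
        """Hypothetical helper: build a new instance that points back to this one."""
        return MiniAttackedText(new_text, {"previous_attacked_text": self})


def perturbation_chain(attacked_text):
    """Follow ``previous_attacked_text`` pointers back to the original text."""
    chain = []
    current = attacked_text
    while current is not None:
        chain.append(current.text)
        current = current.attack_attrs.get("previous_attacked_text")
    return chain[::-1]  # oldest text first


original = MiniAttackedText("the movie was great")
step1 = original.perturb("the film was great")
step2 = step1.perturb("the film was fine")
history = perturbation_chain(step2)
```

Because equality and hashing here depend only on the text, two instances with different ``attack_attrs`` share one cache entry, which is the trade-off the ``__eq__`` docstring describes.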