class TextFixerConfig(NamedTuple):
    r"""
    A TextFixerConfig object stores configuration options for ftfy.

    It's implemented as a namedtuple with defaults, so you can instantiate
    it by providing the values to change from their defaults as keyword arguments.
    For example, to disable 'unescape_html' and keep the rest of the defaults::

        TextFixerConfig(unescape_html=False)

    Here are the options and their default values:

    - `unescape_html`: "auto"

      Configures whether to replace HTML entities such as &amp; with the character
      they represent. "auto" says to do this by default, but disable it when a
      literal < character appears, indicating that the input is actual HTML and
      entities should be preserved. The value can be True, to always enable this
      fixer, or False, to always disable it.

    - `remove_terminal_escapes`: True

      Removes "ANSI" terminal escapes, such as for changing the color of text in a
      terminal window.

    - `fix_encoding`: True

      Detect mojibake and attempt to fix it by decoding the text in a different
      encoding standard.

      The following four options affect how `fix_encoding` works, and do nothing if
      `fix_encoding` is False:

      - `restore_byte_a0`: True

        Allow a literal space (U+20) to be interpreted as a non-breaking space
        (U+A0) when that would make it part of a fixable mojibake string.

        Because spaces are very common characters, this could lead to false
        positives, but we try to apply it only when there's strong evidence for
        mojibake. Disabling `restore_byte_a0` is safer from false positives,
        but creates false negatives.

      - `replace_lossy_sequences`: True

        Detect mojibake that has been partially replaced by the characters
        '�' or '?'. If the mojibake could be decoded otherwise, replace the
        detected sequence with '�'.

      - `decode_inconsistent_utf8`: True

        When we see sequences that distinctly look like UTF-8 mojibake, but
        there's no consistent way to reinterpret the string in a new encoding,
        replace the mojibake with the appropriate UTF-8 characters anyway.

        This helps to decode strings that are concatenated from different
        encodings.
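The "namedtuple with defaults" pattern described in the docstring can be illustrated with a small stand-alone sketch. `MiniFixerConfig` and `should_unescape` are hypothetical names invented for this example, not part of ftfy. The helper shows one way the documented "auto" rule could be expressed, which is to unescape unless a literal `<` appears. It is an assumption about the rule, not ftfy's implementation.

```python
from typing import NamedTuple, Union


class MiniFixerConfig(NamedTuple):
    """Hypothetical stand-in for TextFixerConfig with three documented fields."""

    unescape_html: Union[str, bool] = "auto"
    remove_terminal_escapes: bool = True
    fix_encoding: bool = True


def should_unescape(text: str, config: MiniFixerConfig) -> bool:
    """Resolve the "auto" setting: skip unescaping when a literal '<' appears."""
    if config.unescape_html == "auto":
        return "<" not in text
    return bool(config.unescape_html)


# Override one field by keyword; the other fields keep their defaults.
config = MiniFixerConfig(unescape_html=False)

# Namedtuples are immutable, so _replace returns a modified copy.
auto_config = config._replace(unescape_html="auto")
```

Because a namedtuple instance is immutable, a configuration built this way can be shared between calls without one caller changing what another sees.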
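To make "mojibake" concrete, the sketch below uses only Python's built-in codecs to create and undo a UTF-8-read-as-Latin-1 mix-up. It includes the situation `restore_byte_a0` is described as handling, where byte A0 has been turned into a plain space. The helper `undo_latin1_mojibake` is hypothetical and assumes a single known wrong encoding. ftfy's actual detection is heuristic and is not reproduced here.

```python
def undo_latin1_mojibake(text: str) -> str:
    """Re-encode as Latin-1 and decode as UTF-8 (hypothetical helper)."""
    return text.encode("latin-1").decode("utf-8")


# U+00E9 is bytes C3 A9 in UTF-8; misread as Latin-1 it becomes U+00C3 U+00A9.
garbled = "\u00e9".encode("utf-8").decode("latin-1")
fixed = undo_latin1_mojibake(garbled)

# U+00E0 is bytes C3 A0; misread as Latin-1 it becomes U+00C3 followed by U+00A0.
garbled_nbsp = "\u00e0".encode("utf-8").decode("latin-1")

# If some step turned the non-breaking space into a plain space, the bytes
# C3 20 are no longer valid UTF-8. Restoring byte A0 first makes the text
# fixable again.
damaged = garbled_nbsp.replace("\u00a0", " ")
restored = undo_latin1_mojibake(damaged.replace(" ", "\u00a0"))
```

Replacing every space, as this sketch does, would be unsafe on real text. That is the false-positive risk the docstring mentions for `restore_byte_a0`.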