hub / github.com/rspeer/python-ftfy / TextFixerConfig

Class TextFixerConfig

ftfy/__init__.py:91–227

A TextFixerConfig object stores configuration options for ftfy. It's implemented as a namedtuple with defaults, so you can instantiate it by providing the values to change from their defaults as keyword arguments. For example, to disable 'unescape_html' and keep the rest of the …

Source from the content-addressed store, hash-verified

class TextFixerConfig(NamedTuple):
    r"""
    A TextFixerConfig object stores configuration options for ftfy.

    It's implemented as a namedtuple with defaults, so you can instantiate
    it by providing the values to change from their defaults as keyword arguments.
    For example, to disable 'unescape_html' and keep the rest of the defaults::

        TextFixerConfig(unescape_html=False)

    Here are the options and their default values:

    - `unescape_html`: "auto"

      Configures whether to replace HTML entities such as &amp; with the character
      they represent. "auto" says to do this by default, but disable it when a
      literal < character appears, indicating that the input is actual HTML and
      entities should be preserved. The value can be True, to always enable this
      fixer, or False, to always disable it.

    - `remove_terminal_escapes`: True

      Removes "ANSI" terminal escapes, such as for changing the color of text in a
      terminal window.

    - `fix_encoding`: True

      Detect mojibake and attempt to fix it by decoding the text in a different
      encoding standard.

      The following four options affect how `fix_encoding` works, and do nothing if
      `fix_encoding` is False:

      - `restore_byte_a0`: True

        Allow a literal space (U+20) to be interpreted as a non-breaking space
        (U+A0) when that would make it part of a fixable mojibake string.

        Because spaces are very common characters, this could lead to false
        positives, but we try to apply it only when there's strong evidence for
        mojibake. Disabling `restore_byte_a0` is safer from false positives,
        but creates false negatives.

      - `replace_lossy_sequences`: True

        Detect mojibake that has been partially replaced by the characters
        '�' or '?'. If the mojibake could be decoded otherwise, replace the
        detected sequence with '�'.

      - `decode_inconsistent_utf8`: True

        When we see sequences that distinctly look like UTF-8 mojibake, but
        there's no consistent way to reinterpret the string in a new encoding,
        replace the mojibake with the appropriate UTF-8 characters anyway.

        This helps to decode strings that are concatenated from different
        encodings.

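
The "namedtuple with defaults" pattern and the `unescape_html` "auto" rule described in the docstring can be sketched with the standard library alone. `SketchConfig` and `unescape_sketch` below are hypothetical stand-ins written for illustration: they are not ftfy's classes or functions, they list only the options named in the excerpt above, and the "auto" check is a simplified reading of the docstring rather than ftfy's actual logic.

```python
import html
from typing import NamedTuple, Union


class SketchConfig(NamedTuple):
    """Hypothetical stand-in for TextFixerConfig (only options named in the excerpt)."""

    unescape_html: Union[str, bool] = "auto"
    remove_terminal_escapes: bool = True
    fix_encoding: bool = True
    restore_byte_a0: bool = True
    replace_lossy_sequences: bool = True
    decode_inconsistent_utf8: bool = True


def unescape_sketch(text: str, config: SketchConfig = SketchConfig()) -> str:
    """Assumed reading of the "auto" rule, built on the stdlib html.unescape."""
    mode = config.unescape_html
    if mode == "auto":
        # "auto": unescape unless a literal "<" suggests the input is real HTML.
        mode = "<" not in text
    return html.unescape(text) if mode else text


# Override one option by keyword; every other field keeps its default.
no_unescape = SketchConfig(unescape_html=False)

# NamedTuples are immutable, so _replace returns a modified copy.
always_unescape = no_unescape._replace(unescape_html=True)

print(unescape_sketch("fish &amp; chips"))                          # fish & chips
print(unescape_sketch("<p>fish &amp; chips</p>"))                   # <p>fish &amp; chips</p>
print(unescape_sketch("<p>fish &amp; chips</p>", always_unescape))  # <p>fish & chips</p>
print(unescape_sketch("fish &amp; chips", no_unescape))             # fish &amp; chips
```

Because the configuration is an immutable tuple, a caller can share one instance freely and derive variants with `_replace` without affecting other users of the original.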

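The `fix_encoding` and `restore_byte_a0` descriptions in the docstring are easier to follow with a concrete case. The sketch below uses only the standard library and assumes the simplest kind of mojibake, UTF-8 bytes misread as Latin-1. The helpers `make_mojibake` and `undo_mojibake` are hypothetical names, not part of ftfy, and the space-restoring step is deliberately crude: it tries every space as U+A0, whereas the docstring says ftfy applies this only when there is strong evidence for mojibake.

```python
def make_mojibake(text: str) -> str:
    """Encode as UTF-8, then wrongly decode the bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def undo_mojibake(text: str, restore_byte_a0: bool = True) -> str:
    """Try to reverse the Latin-1 misreading; return the input unchanged on failure."""
    candidates = [text]
    if restore_byte_a0:
        # Crude assumption: treat every plain space as a possible U+A0.
        candidates.append(text.replace(" ", "\xa0"))
    for candidate in candidates:
        try:
            return candidate.encode("latin-1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue
    return text


# "é" is the UTF-8 bytes C3 A9, which Latin-1 reads as "Ã©".
print(make_mojibake("café"))    # cafÃ©
print(undo_mojibake("cafÃ©"))   # café

# "à" is the UTF-8 bytes C3 A0. Misread as Latin-1, A0 becomes a non-breaking
# space, which is often normalised to a plain space somewhere downstream.
damaged = make_mojibake("voilà").replace("\xa0", " ")
print(undo_mojibake(damaged))                                    # voilà
print(undo_mojibake(damaged, restore_byte_a0=False) == damaged)  # True
```

With the space restored to byte A0, the sequence decodes again. With the option off, the string is left as it was, which is the false negative the docstring mentions.
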
Callers 8

main · Function · 0.90
fix_text · Function · 0.85
fix_and_explain · Function · 0.85
fix_encoding_and_explain · Function · 0.85
fix_encoding · Function · 0.85
fix_text_segment · Function · 0.85
fix_file · Function · 0.85

Calls

no outgoing calls

Tested by

no test coverage detected