hub / github.com/rspeer/python-ftfy / TextFixerConfig

Class TextFixerConfig

ftfy/__init__.py:91–227

A TextFixerConfig object stores configuration options for ftfy. It's implemented as a namedtuple with defaults, so you can instantiate it by providing the values to change from their defaults as keyword arguments.
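
The namedtuple-with-defaults pattern named in the summary can be sketched with the standard library alone. `SketchConfig` below is a hypothetical stand-in, not ftfy's class: it borrows three option names and their defaults from the docstring, purely to show how keyword overrides and `_replace` behave on such a type.

```python
from typing import NamedTuple, Union


class SketchConfig(NamedTuple):
    """Hypothetical stand-in for illustration; not ftfy's TextFixerConfig."""

    # Option names and defaults copied from the documented list.
    unescape_html: Union[str, bool] = "auto"
    remove_terminal_escapes: bool = True
    fix_encoding: bool = True


# Override one option by keyword; the others keep their defaults.
config = SketchConfig(unescape_html=False)

# Namedtuples are immutable, so _replace returns a modified copy.
stricter = config._replace(fix_encoding=False)

print(config)
print(stricter)
```

Because the tuple is immutable, a configuration can be shared between calls without one caller's changes leaking into another's.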

Source from the content-addressed store, hash-verified

class TextFixerConfig(NamedTuple):
    r"""
    A TextFixerConfig object stores configuration options for ftfy.

    It's implemented as a namedtuple with defaults, so you can instantiate
    it by providing the values to change from their defaults as keyword arguments.
    For example, to disable 'unescape_html' and keep the rest of the defaults::

        TextFixerConfig(unescape_html=False)

    Here are the options and their default values:

    - `unescape_html`: "auto"

      Configures whether to replace HTML entities such as &amp; with the character
      they represent. "auto" says to do this by default, but disable it when a
      literal < character appears, indicating that the input is actual HTML and
      entities should be preserved. The value can be True, to always enable this
      fixer, or False, to always disable it.

    - `remove_terminal_escapes`: True

      Removes "ANSI" terminal escapes, such as for changing the color of text in a
      terminal window.

    - `fix_encoding`: True

      Detect mojibake and attempt to fix it by decoding the text in a different
      encoding standard.

      The following four options affect how `fix_encoding` works, and do nothing if
      `fix_encoding` is False:

      - `restore_byte_a0`: True

        Allow a literal space (U+20) to be interpreted as a non-breaking space
        (U+A0) when that would make it part of a fixable mojibake string.

        Because spaces are very common characters, this could lead to false
        positives, but we try to apply it only when there's strong evidence for
        mojibake. Disabling `restore_byte_a0` is safer from false positives,
        but creates false negatives.

      - `replace_lossy_sequences`: True

        Detect mojibake that has been partially replaced by the characters
        '�' or '?'. If the mojibake could be decoded otherwise, replace the
        detected sequence with '�'.

      - `decode_inconsistent_utf8`: True

        When we see sequences that distinctly look like UTF-8 mojibake, but
        there's no consistent way to reinterpret the string in a new encoding,
        replace the mojibake with the appropriate UTF-8 characters anyway.

        This helps to decode strings that are concatenated from different
        encodings.
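
The "auto" rule documented for `unescape_html` in the listing above can be illustrated in a few lines. This is a minimal sketch, assuming the standard library's `html.unescape` as a stand-in for the fixer; `unescape_html_sketch` is a hypothetical helper, and ftfy's own implementation may decode a different set of entities.

```python
import html
from typing import Union


def unescape_html_sketch(text: str, mode: Union[str, bool] = "auto") -> str:
    """Hypothetical helper mirroring the documented rule; not ftfy's code."""
    if mode == "auto":
        # A literal "<" suggests real HTML, so leave entities alone.
        mode = "<" not in text
    return html.unescape(text) if mode else text


plain = unescape_html_sketch("fish &amp; chips")             # entity decoded
markup = unescape_html_sketch("<b>fish &amp; chips</b>")     # left untouched
forced = unescape_html_sketch("<b>fish &amp; chips</b>", True)

print(plain)
print(markup)
print(forced)
```

Passing `True` or `False` bypasses the check for `<`, matching the documented "always enable" and "always disable" values.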

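The `fix_encoding` and `restore_byte_a0` descriptions are easier to follow with a concrete string. The sketch below assumes the simplest case, UTF-8 bytes misread as Latin-1, and hard-codes a single replacement. The helper names are hypothetical, and ftfy's real detection is heuristic and considers more encodings than this.

```python
def to_mojibake(text: str) -> str:
    """Simulate the damage: UTF-8 bytes misread as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def undo_mojibake(text: str) -> str:
    """Reverse the misreading; raises if the bytes are not valid UTF-8."""
    return text.encode("latin-1").decode("utf-8")


original = "voil\u00e0"              # "voilà"; its last character is C3 A0 in UTF-8
broken = to_mojibake(original)       # ends in "\u00c3\u00a0"
round_trip = undo_mojibake(broken)

# Suppose the non-breaking space (U+A0) was later turned into a plain space.
lossy = broken.replace("\u00a0", " ")
direct_fix_failed = False
try:
    undo_mojibake(lossy)
except UnicodeDecodeError:
    # 0xC3 followed by 0x20 is not valid UTF-8, so the direct fix fails.
    direct_fix_failed = True

# Reading the space as byte A0 again makes the string fixable.
restored = undo_mojibake(lossy.replace("\u00c3 ", "\u00c3\u00a0"))

print(round_trip == original, direct_fix_failed, restored == original)
```

The blanket replacement here would misfire on any legitimate "Ã " in real text, which is the false-positive risk the docstring describes.
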
Callers 8

main · Function · 0.90
fix_text · Function · 0.85
fix_and_explain · Function · 0.85
fix_encoding_and_explain · Function · 0.85
_fix_encoding_one_step_and_explain · Function · 0.85
fix_encoding · Function · 0.85
fix_text_segment · Function · 0.85
fix_file · Function · 0.85

Calls

no outgoing calls

Tested by

no test coverage detected