Repository: github.com/rspeer/python-ftfy

Function fix_text

Source: ftfy/__init__.py, lines 290–361


Signature: fix_text(text: str, config: TextFixerConfig | None = None, **kwargs: Any) -> str
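The central job of fix_text, per its docstring, is undoing mojibake: text whose bytes were decoded with the wrong codec. The core idea can be shown with the standard library alone. The sketch below assumes the common case of UTF-8 bytes misread as Latin-1 or Windows-1252; `naive_fix_mojibake` is a hypothetical helper, not part of ftfy, and it skips everything ftfy does to judge whether a change is actually an improvement.

```python
def naive_fix_mojibake(text: str) -> str:
    """Hypothetical helper: undo one layer of UTF-8 read as Latin-1/cp1252.

    Not ftfy's algorithm; a whole-string round trip for illustration only.
    """
    for wrong_codec in ("latin-1", "cp1252"):
        try:
            # Re-encode with the codec that was (wrongly) used to decode the
            # original bytes, recovering those bytes, then decode them as UTF-8.
            return text.encode(wrong_codec).decode("utf-8")
        except UnicodeError:
            # Either the text can't be represented in this codec, or the
            # recovered bytes aren't valid UTF-8: try the next codec.
            continue
    # No round trip produced valid UTF-8, so leave the text alone.
    return text


print(naive_fix_mojibake("âœ” No problems"))
print(naive_fix_mojibake("(ã\x83\x84)"))
```

The `ã\x83\x84` in the docstring's shrug example is what the UTF-8 bytes of ツ look like when read as Latin-1, and this helper turns `"(ã\x83\x84)"` back into `"(ツ)"`. It returns the full shrug string unchanged, though: re-encoding the correctly decoded `¯` as Latin-1 gives the byte 0xAF, which is not valid UTF-8 on its own, so the whole-string decode fails. ftfy's own fixer is considerably more careful with mixed text like this.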


def fix_text(text: str, config: TextFixerConfig | None = None, **kwargs: Any) -> str:
    r"""
    Given Unicode text as input, fix inconsistencies and glitches in it,
    such as mojibake (text that was decoded in the wrong encoding).

    Let's start with some examples:

    >>> fix_text('âœ” No problems')
    '✔ No problems'

    >>> print(fix_text("&macr;\\_(ã\x83\x84)_/&macr;"))
    ¯\_(ツ)_/¯

    >>> fix_text('Broken text&hellip; it&#x2019;s flubberific!')
    "Broken text... it's flubberific!"

    >>> fix_text('ＬＯＵＤ　ＮＯＩＳＥＳ')
    'LOUD NOISES'

    ftfy applies a number of different fixes to the text, and can accept
    configuration to select which fixes to apply.

    The configuration takes the form of a :class:`TextFixerConfig` object,
    and you can see a description of the options in that class's docstring
    or in the full documentation at ftfy.readthedocs.org.

    For convenience and backward compatibility, the configuration can also
    take the form of keyword arguments, which will set the equivalently-named
    fields of the TextFixerConfig object.

    For example, here are two ways to fix text but skip the "uncurl_quotes"
    step::

        fix_text(text, TextFixerConfig(uncurl_quotes=False))
        fix_text(text, uncurl_quotes=False)

    This function fixes text in independent segments, which are usually lines
    of text, or arbitrarily broken up every 1 million codepoints (configurable
    with `config.max_decode_length`) if there aren't enough line breaks. The
    bound on segment lengths helps to avoid unbounded slowdowns.

    ftfy can also provide an 'explanation', a list of transformations it applied
    to the text that would fix more text like it. This function doesn't provide
    explanations (because there may be different fixes for different segments
    of text).

    To get an explanation, use the :func:`fix_and_explain()` function, which
    fixes the string in one segment and explains what it fixed.
    """

    if config is None:
        config = TextFixerConfig(explain=False)
    config = _config_from_kwargs(config, kwargs)
    if isinstance(text, bytes):
        raise UnicodeError(BYTES_ERROR_TEXT)

    out = []
    pos = 0
    # ... (excerpt ends at line 347; the remaining lines of the function are not shown)
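The docstring describes segment-by-segment processing: usually one line per segment, with long runs cut every `config.max_decode_length` codepoints (1 million by default). The excerpt stops just after initializing `out` and `pos`. Here is a minimal sketch of that strategy as the docstring describes it, written from the description rather than taken from ftfy; `iter_segments` and `fix_in_segments` are hypothetical names, and `fix` stands in for whatever per-segment fixer is used.

```python
from typing import Callable, Iterator


def iter_segments(text: str, max_len: int = 1_000_000) -> Iterator[str]:
    """Yield consecutive slices of text: one line each, capped at max_len codepoints."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    pos = 0
    while pos < len(text):
        # End the segment just after the next newline, or at the end of the text.
        brk = text.find("\n", pos) + 1
        if brk == 0:
            brk = len(text)
        # A line longer than max_len is cut into max_len-sized pieces instead.
        if brk - pos > max_len:
            brk = pos + max_len
        yield text[pos:brk]
        pos = brk


def fix_in_segments(text: str, fix: Callable[[str], str], max_len: int = 1_000_000) -> str:
    """Apply fix to each segment independently and join the results."""
    return "".join(fix(segment) for segment in iter_segments(text, max_len))


print(list(iter_segments("ab\ncd\n")))
print(list(iter_segments("abcdefg", max_len=3)))
```

In this sketch, each call to `fix` sees at most `max_len` codepoints, which keeps the cost of any single call bounded, and no fix can reach across a segment boundary. That independence is also why one explanation for the whole text may not exist.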

Callers (4)

test_entities (function)
test_old_parameter_name (function)
test_json_example (function)
test_ohio_flag (function)

Calls (3)

TextFixerConfig (class)
_config_from_kwargs (function)
fix_and_explain (function)
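The visible part of fix_text resolves its configuration before touching the text. With no config, it builds `TextFixerConfig(explain=False)`, then merges keyword arguments through `_config_from_kwargs`, then rejects `bytes` with `UnicodeError`. The sketch below mimics that flow with a hypothetical three-field `DemoConfig` built on `typing.NamedTuple`. It assumes keyword arguments simply override same-named fields, which matches the docstring's description but is not a copy of `_config_from_kwargs`.

```python
from typing import Any, NamedTuple, Optional, Union


class DemoConfig(NamedTuple):
    """Hypothetical stand-in for TextFixerConfig with only three fields."""
    unescape_html: bool = True
    uncurl_quotes: bool = True
    max_decode_length: int = 1_000_000


def resolve_config(
    text: Union[str, bytes], config: Optional[DemoConfig] = None, **kwargs: Any
) -> DemoConfig:
    # No config object given: start from the defaults.
    if config is None:
        config = DemoConfig()
    # Keyword arguments override same-named fields; NamedTuple._replace
    # raises ValueError for names that are not fields.
    config = config._replace(**kwargs)
    # Like fix_text, refuse bytes: the caller has to decode to str first.
    if isinstance(text, bytes):
        raise UnicodeError("expected str, got bytes; decode the bytes first")
    return config


print(resolve_config("some text", uncurl_quotes=False))
```

The two calls in the docstring's `uncurl_quotes` example correspond to `resolve_config(text, DemoConfig(uncurl_quotes=False))` and `resolve_config(text, uncurl_quotes=False)` here, and both yield the same configuration.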

Tested by (4)

test_entities (function)
test_old_parameter_name (function)
test_json_example (function)
test_ohio_flag (function)
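The docstring explains that fix_text returns no explanation because different segments may need different fixes. A toy version makes that concrete: it runs a few named steps over a segment and records which ones changed it. `fix_and_log`, the step list, the local `uncurl_quotes` helper, and the list-of-names output are all hypothetical. Only `html.unescape` is a real (standard library) function, and ftfy's `fix_and_explain` uses its own explanation format.

```python
import html
from typing import Callable, List, Tuple


def uncurl_quotes(text: str) -> str:
    """Hypothetical helper: replace curly single/double quotes with ASCII ones."""
    return (
        text.replace("\u2018", "'").replace("\u2019", "'")
        .replace("\u201c", '"').replace("\u201d", '"')
    )


# Named steps applied in order; the names only label the log entries.
STEPS: List[Tuple[str, Callable[[str], str]]] = [
    ("unescape_html", html.unescape),
    ("uncurl_quotes", uncurl_quotes),
]


def fix_and_log(segment: str) -> Tuple[str, List[str]]:
    """Apply each step to one segment; return the result and the steps that changed it."""
    applied: List[str] = []
    for name, step in STEPS:
        fixed = step(segment)
        if fixed != segment:  # record only steps that actually changed something
            applied.append(name)
            segment = fixed
    return segment, applied


for line in ["it&#x2019;s\n", "\u201cplain\u201d quotes\n"]:
    print(fix_and_log(line))
```

On the line `it&#x2019;s`, the log lists both steps. A line that only has curly quotes lists just `uncurl_quotes`. So a text containing both lines has no single list of steps that describes it, which is the situation the docstring describes.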