hub / github.com/Turing-Project/WriteGPT / convert_to_unicode

Function convert_to_unicode

LanguageNetwork/GPT2/scripts/tokenization.py:80–97

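Converts `text` to a Unicode string, assuming UTF-8 input. Text that is already Unicode is returned unchanged; byte strings are decoded as UTF-8, and bytes that cannot be decoded are silently dropped. Any other type raises `ValueError`.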

convert_to_unicode(text)

Source from the content-addressed store, hash-verified

def convert_to_unicode(text):
    """Converts `text` to Unicode (if it's not already), assuming utf-8 input."""
    if six.PY3:
        if isinstance(text, str):
            return text
        elif isinstance(text, bytes):
            return text.decode("utf-8", "ignore")
        else:
            raise ValueError("Unsupported string type: %s" % (type(text)))
    elif six.PY2:
        if isinstance(text, str):
            return text.decode("utf-8", "ignore")
        elif isinstance(text, unicode):
            return text
        else:
            raise ValueError("Unsupported string type: %s" % (type(text)))
    else:
        raise ValueError("Not running on Python2 or Python 3?")
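The behavior of the Python 3 branch can be illustrated with a minimal sketch. The name `to_text` is hypothetical, and the sketch leaves out `six` and the Python 2 branch so that it runs on a plain Python 3 interpreter; it is not the repository's code, only an approximation of what that branch appears to do.

```python
def to_text(value):
    """Python 3-only sketch of the PY3 branch; `to_text` is a hypothetical name."""
    if isinstance(value, str):
        # Already Unicode text: returned unchanged.
        return value
    if isinstance(value, bytes):
        # errors="ignore" silently drops bytes that are not valid UTF-8.
        return value.decode("utf-8", "ignore")
    raise ValueError("Unsupported string type: %s" % type(value))


already_text = to_text("héllo")
decoded = to_text("héllo".encode("utf-8"))

# The byte 0xFF never occurs in valid UTF-8, so it is dropped instead of raising.
lossy = to_text(b"ab\xffcd")

# Anything that is neither str nor bytes is rejected.
try:
    to_text(42)
except ValueError as exc:
    error_message = str(exc)
```

Because the `"ignore"` error handler discards undecodable bytes, malformed input shortens the result without any warning. A caller that needs to detect corruption would likely decode with the default `"strict"` handler instead.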

Callers 4

load_vocab · Function · 0.70
tokenize · Method · 0.70
tokenize · Method · 0.70
demo.py · File · 0.70

Calls 1

decode · Method · 0.45

Tested by

no test coverage detected