hub / github.com/TensorSpeech/TensorFlowTTS / tokenize

Function tokenize

tensorflow_tts/utils/korean.py:349–359
(text, as_id=False)

def tokenize(text, as_id=False):
    # Use hangul_to_jamo from the jamo package to split a Hangul string into
    # initial/medial/final jamo (choseong/jungseong/jongseong).
    text = normalize(text)
    tokens = list(
        hangul_to_jamo(text)
    )  # '존경하는' --> ['ᄌ', 'ᅩ', 'ᆫ', 'ᄀ', 'ᅧ', 'ᆼ', 'ᄒ', 'ᅡ', 'ᄂ', 'ᅳ', 'ᆫ', '~']

    if as_id:
        return [_symbol_to_id[token] for token in tokens]
    else:
        return [token for token in tokens]
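
To make the jamo split concrete, here is a minimal, self-contained sketch of the same flow. It is not the repository's code: `decompose` is a stand-in for `hangul_to_jamo` based on the standard Unicode arithmetic for precomposed Hangul syllables, `SYMBOLS` and `SYMBOL_TO_ID` are a hypothetical symbol table standing in for `_symbol_to_id`, and the `normalize` step is skipped entirely.

```python
# Sketch only: stand-ins for hangul_to_jamo() and _symbol_to_id; normalize() is omitted.
HANGUL_BASE, HANGUL_LAST = 0xAC00, 0xD7A3          # precomposed syllable range
LEAD_BASE, VOWEL_BASE, TAIL_BASE = 0x1100, 0x1161, 0x11A7
VOWEL_COUNT, TAIL_COUNT = 21, 28


def decompose(text):
    """Split precomposed Hangul syllables into conjoining jamo."""
    out = []
    for ch in text:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            index = code - HANGUL_BASE
            lead, rest = divmod(index, VOWEL_COUNT * TAIL_COUNT)
            vowel, tail = divmod(rest, TAIL_COUNT)
            out.append(chr(LEAD_BASE + lead))
            out.append(chr(VOWEL_BASE + vowel))
            if tail:  # tail index 0 means the syllable has no final consonant
                out.append(chr(TAIL_BASE + tail))
        else:
            out.append(ch)  # non-Hangul characters pass through unchanged
    return out


# Hypothetical symbol table: 19 leads, 21 vowels, 27 tails, then some punctuation.
SYMBOLS = (
    [chr(LEAD_BASE + i) for i in range(19)]
    + [chr(VOWEL_BASE + i) for i in range(21)]
    + [chr(TAIL_BASE + i) for i in range(1, 28)]
    + list(" .,?!~")
)
SYMBOL_TO_ID = {symbol: i for i, symbol in enumerate(SYMBOLS)}


def tokenize_sketch(text, as_id=False):
    tokens = decompose(text)
    if as_id:
        # Plain dict indexing, as in the original: unknown symbols raise KeyError.
        return [SYMBOL_TO_ID[token] for token in tokens]
    return tokens


print(tokenize_sketch("존경하는"))
print(tokenize_sketch("하", as_id=True))
```

As in `tokenize`, the `as_id=True` path indexes the table directly, so any character missing from the table raises `KeyError` rather than being skipped. The real ids depend on how `_symbol_to_id` is built in the repository, which this sketch does not reproduce.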

Callers 1

tokenizer_fn · Function · 0.85

Calls 1

normalize · Function · 0.70

Tested by

no test coverage detected