Function tokenize

tensorflow_tts/utils/korean.py:349–359

(text, as_id=False)


def tokenize(text, as_id=False):
    # Use hangul_to_jamo from the jamo package to split a Hangul string into
    # its initial/medial/final jamo (choseong/jungseong/jongseong).
    text = normalize(text)
    tokens = list(
        hangul_to_jamo(text)
    )  # '존경하는' --> ['ᄌ', 'ᅩ', 'ᆫ', 'ᄀ', 'ᅧ', 'ᆼ', 'ᄒ', 'ᅡ', 'ᄂ', 'ᅳ', 'ᆫ', '~']

    if as_id:
        return [_symbol_to_id[token] for token in tokens]
    else:
        return [token for token in tokens]


def tokenizer_fn(iterator):
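
The decomposition step that tokenize relies on can be illustrated without the jamo package. The following is a minimal sketch, not the library's implementation: it uses the standard Unicode arithmetic for precomposed Hangul syllables, and the names split_syllable, sketch_tokenize, and symbol_to_id are hypothetical stand-ins for hangul_to_jamo, tokenize, and _symbol_to_id. It skips the normalize step entirely.

```python
# Unicode constants for precomposed Hangul syllables (U+AC00..U+D7A3).
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
S_COUNT = 19 * V_COUNT * T_COUNT  # 11172 syllables


def split_syllable(ch):
    """Split one character into conjoining jamo; non-Hangul passes through."""
    index = ord(ch) - S_BASE
    if not 0 <= index < S_COUNT:
        return [ch]
    lead = L_BASE + index // (V_COUNT * T_COUNT)
    vowel = V_BASE + (index % (V_COUNT * T_COUNT)) // T_COUNT
    tail = index % T_COUNT  # 0 means the syllable has no final consonant
    jamo = [chr(lead), chr(vowel)]
    if tail:
        jamo.append(chr(T_BASE + tail))
    return jamo


def sketch_tokenize(text, symbol_to_id=None):
    """Hypothetical analogue of tokenize: jamo tokens, or ids if a table is given."""
    tokens = [j for ch in text for j in split_syllable(ch)]
    if symbol_to_id is not None:
        return [symbol_to_id[token] for token in tokens]
    return tokens


if __name__ == "__main__":
    tokens = sketch_tokenize("존경하는")
    # Assumed symbol table built from the tokens themselves, for illustration only.
    table = {s: i for i, s in enumerate(sorted(set(tokens)))}
    print(tokens)
    print(sketch_tokenize("존경하는", symbol_to_id=table))
```

As in the listing above, a token that is missing from the symbol table raises KeyError when ids are requested, so callers would need a table that covers every symbol the normalizer can emit.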

tokenizer_fn · Function · 0.85

normalize · Function · 0.70

no test coverage detected