github.com/ddbourgin/numpy-ml

Method transform

numpy_ml/preprocessing/nlp.py:284–308

Transform the words in `text` into their byte pair encoded token IDs. Takes `text`, a str or list of `N` strings to encode, and returns `codes`, a list of `N` lists holding the byte pair token IDs for each of the `N` strings in `text`. A single str is wrapped in a list first, so the result is always a list of lists.

(self, text)

Source from the content-addressed store, hash-verified

        return v_out

    def transform(self, text):
        """
        Transform the words in `text` into their byte pair encoded token IDs.

        Parameters
        ----------
        text: str or list of `N` strings
            The list of strings to encode

        Returns
        -------
        codes : list of `N` lists
            A list of byte pair token IDs for each of the `N` strings in
            `text`.

        Examples
        --------
        >>> B = BytePairEncoder(max_merges=100).fit("./example.txt")
        >>> encoded_tokens = B.transform("Hello! How are you 😁 ?")
        >>> encoded_tokens
        [[72, 879, 474, ...]]
        """
        if isinstance(text, str):
            text = [text]
        return [self._transform(string) for string in text]

    def _transform(self, text):
        """Transform a single text string to a list of byte-pair IDs"""
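
The str-versus-list handling in the listing above can be sketched with a small stand-in class. `ToyEncoder` is hypothetical and not part of numpy-ml, and its `_transform` is an assumption made for illustration: it returns raw UTF-8 byte values rather than applying learned byte pair merges. Only the body of `transform` mirrors the source.

```python
class ToyEncoder:
    """Hypothetical stand-in for BytePairEncoder; not part of numpy-ml."""

    def _transform(self, text):
        # Assumption for illustration: emit raw UTF-8 byte values.
        # The real method applies learned byte pair merges instead.
        return list(text.encode("utf-8"))

    def transform(self, text):
        # Same wrapping logic as the source: promote a bare string to a
        # one-element list so the result is always a list of lists.
        if isinstance(text, str):
            text = [text]
        return [self._transform(string) for string in text]


enc = ToyEncoder()
single = enc.transform("Hi")         # bare string
batch = enc.transform(["Hi", "yo"])  # list of strings
print(single)  # [[72, 105]]
print(batch)   # [[72, 105], [121, 111]]
```

A single string still yields a nested list, so a caller encoding one string would index `[0]` to get the flat list of IDs.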

Callers

nothing calls this directly

Calls 1

_transform · Method · 0.95

Tested by

no test coverage detected