Repository: github.com/ddbourgin/numpy-ml

Method transform

numpy_ml/preprocessing/nlp.py, lines 284–308

Transform the words in `text` into their byte pair encoded token IDs.

Signature: transform(self, text)
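Byte pair encoding in its common byte-level form starts from the 256 single-byte IDs of a string's UTF-8 encoding and then applies learned merge rules that fuse adjacent ID pairs into new IDs. The sketch below assumes that scheme. The function name `bpe_encode`, the shape of the `merges` table, and the rule that a lower new ID means higher merge priority are all hypothetical and are not taken from numpy-ml, whose `_transform` body is not part of this excerpt. Under this assumption, a leading ID of 72 is the UTF-8 byte value of `'H'`, which is consistent with the docstring example.

```python
from typing import Dict, List, Tuple


def bpe_encode(text: str, merges: Dict[Tuple[int, int], int]) -> List[int]:
    """Encode `text` with a hypothetical byte-level BPE merge table.

    `merges` maps an adjacent (id, id) pair to the new ID it merges into.
    Assumption: a smaller new ID means the merge was learned earlier and
    therefore has higher priority.
    """
    # Base vocabulary: one ID per UTF-8 byte, so every ID starts in 0..255.
    ids = list(text.encode("utf-8"))
    while len(ids) >= 2:
        # Pick the adjacent pair whose merge has the highest priority.
        best = min(
            (pair for pair in zip(ids, ids[1:]) if pair in merges),
            key=lambda pair: merges[pair],
            default=None,
        )
        if best is None:
            break  # no learned merge applies anymore
        new_id = merges[best]
        # Replace every non-overlapping occurrence of `best`, left to right.
        merged: List[int] = []
        i = 0
        while i < len(ids):
            if i + 1 < len(ids) and (ids[i], ids[i + 1]) == best:
                merged.append(new_id)
                i += 2
            else:
                merged.append(ids[i])
                i += 1
        ids = merged
    return ids


# Toy merge table: "l"+"l" (108, 108) -> 256, then "e"+256 -> 257.
toy_merges = {(108, 108): 256, (101, 256): 257}
print(bpe_encode("Hello", toy_merges))  # [72, 257, 111]
```

With an empty merge table the function degrades to plain UTF-8 bytes, so a 4-byte emoji such as 😁 becomes four IDs; learned merges are what shorten the sequence.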


```python
def transform(self, text):
    """
    Transform the words in `text` into their byte pair encoded token IDs.

    Parameters
    ----------
    text: str or list of `N` strings
        The list of strings to encode

    Returns
    -------
    codes : list of `N` lists
        A list of byte pair token IDs for each of the `N` strings in
        `text`.

    Examples
    --------
    >>> B = BytePairEncoder(max_merges=100).fit("./example.txt")
    >>> encoded_tokens = B.transform("Hello! How are you 😁 ?")
    >>> encoded_tokens
    [[72, 879, 474, ...]]
    """
    if isinstance(text, str):
        text = [text]
    return [self._transform(string) for string in text]

def _transform(self, text):
    """Transform a single text string to a list of byte-pair IDs"""
```
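The public `transform` method only normalizes its input: a bare `str` is wrapped in a one-element list, so the return value is always a list of per-string ID lists. The stand-in class below reproduces just that wrapper. `ToyEncoder` and its `_transform` are hypothetical placeholders, not numpy-ml code; the placeholder returns raw UTF-8 byte values and applies no merges.

```python
from typing import List, Union


class ToyEncoder:
    """Hypothetical stand-in mirroring only the str/list handling of transform."""

    def _transform(self, text: str) -> List[int]:
        # Placeholder per-string encoder: raw UTF-8 byte values, no BPE merges.
        return list(text.encode("utf-8"))

    def transform(self, text: Union[str, List[str]]) -> List[List[int]]:
        # A single string is treated as a batch of one.
        if isinstance(text, str):
            text = [text]
        return [self._transform(string) for string in text]


enc = ToyEncoder()
print(enc.transform("Hi"))          # [[72, 105]]
print(enc.transform(["Hi", "ok"]))  # [[72, 105], [111, 107]]
```

Because a single string still comes back nested, a caller that wants a flat ID list for one string indexes the result, e.g. `enc.transform("Hi")[0]`.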

Callers

nothing calls this directly

Calls (1)

_transform (method) · 0.95

Tested by

no test coverage detected