`transform(self, text)`

Transform the words in `text` into their byte pair encoded token IDs.
```python
def transform(self, text):
    """
    Transform the words in `text` into their byte pair encoded token IDs.

    Parameters
    ----------
    text : str or list of `N` strings
        The string or list of strings to encode. A single string is
        treated as a list containing one string.

    Returns
    -------
    codes : list of `N` lists
        A list of byte pair token IDs for each of the `N` strings in
        `text`.

    Examples
    --------
    >>> B = BytePairEncoder(max_merges=100).fit("./example.txt")
    >>> encoded_tokens = B.transform("Hello! How are you 😁 ?")
    >>> encoded_tokens
    [[72, 879, 474, ...]]
    """
    if isinstance(text, str):
        text = [text]
    return [self._transform(string) for string in text]

def _transform(self, text):
    """Transform a single text string to a list of byte-pair IDs."""
    # (body not included in this excerpt)
```
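The body of `_transform` is not shown above, so the following is only a minimal sketch of what per-string byte pair encoding can look like, not the library's implementation. It assumes a base vocabulary of the 256 UTF-8 byte values (the docstring example's leading `72` is the byte for `"H"`, which is consistent with this, but that is an inference) and a hypothetical merge table mapping an ID pair to a new ID, applied in the order the merges were learned. `ToyBytePairEncoder` and its `merges` argument are hypothetical names; `transform` copies the str-to-list dispatch shown above.

```python
from typing import Dict, List, Tuple, Union


class ToyBytePairEncoder:
    """Hypothetical stand-in, NOT the library's BytePairEncoder.

    Assumed base vocabulary: byte values 0-255. Each learned merge
    (a, b) -> new_id is applied in the order it was learned.
    """

    def __init__(self, merges: Dict[Tuple[int, int], int]):
        # Dict insertion order is taken to be the order merges were learned.
        self.merges = merges

    def transform(self, text: Union[str, List[str]]) -> List[List[int]]:
        # Same dispatch as `transform` above: a bare string becomes a
        # one-element list, so the result is always a list of lists.
        if isinstance(text, str):
            text = [text]
        return [self._transform(string) for string in text]

    def _transform(self, text: str) -> List[int]:
        ids = list(text.encode("utf-8"))  # start from raw UTF-8 byte values
        for (a, b), new_id in self.merges.items():
            merged, i = [], 0
            # Replace every adjacent (a, b) pair, scanning left to right.
            while i < len(ids):
                if i + 1 < len(ids) and ids[i] == a and ids[i + 1] == b:
                    merged.append(new_id)
                    i += 2
                else:
                    merged.append(ids[i])
                    i += 1
            ids = merged
        return ids


# Hypothetical merges: "l" + "l" -> 256, then (256, "o") -> 257.
enc = ToyBytePairEncoder({(108, 108): 256, (256, 111): 257})
print(enc.transform("Hello"))          # [[72, 101, 257]]
print(enc.transform(["Hi", "Hello"]))  # [[72, 105], [72, 101, 257]]
```

Under these assumptions, characters outside ASCII, such as the emoji in the docstring example, start out as several byte IDs (four for `😁`) until merges combine them, which is one reason a byte-level base vocabulary never produces an unknown token.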