Repository: github.com/PaddlePaddle/PaddleFormers

Method add

paddleformers/transformers/legacy/tokenizer_utils.py:291–318

Passes over every character (UTF-8 char) in `word` and adds it, one nesting level per character, to the internal `data` trie representation. The special key `""` marks the end of a word. This function is idempotent: adding the same word twice leaves the trie unchanged.
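The summary above describes a nested-dict trie in which the empty-string key marks the end of a complete word. The sketch below is a standalone re-implementation of that insertion loop, not code from PaddleFormers. It uses `dict.setdefault`, which for this loop behaves like the source's `and`/`or` expression, and adds a hypothetical `contains` helper to show how the `""` key separates whole words from mere prefixes and why adding a word twice is a no-op.

```python
from typing import Any, Dict

# Nested-dict trie node: maps a character to its child node; the key ""
# marks that a complete word ends at this node (same layout as `Trie.data`).
Node = Dict[str, Any]


def add(data: Node, word: str) -> None:
    """Standalone sketch of the insertion loop described above."""
    if not word:
        return  # empty strings are ignored, as in the original method
    ref = data
    for char in word:
        ref = ref.setdefault(char, {})  # reuse or create the child node
    ref[""] = 1  # termination marker


def contains(data: Node, word: str) -> bool:
    """Hypothetical helper: True only if `word` was added as a whole word."""
    ref = data
    for char in word:
        if char not in ref:
            return False
        ref = ref[char]
    return "" in ref


trie: Node = {}
add(trie, "Hello 友達")
print(contains(trie, "Hello"))  # False: "Hello" is only a prefix so far
add(trie, "Hello")
print(contains(trie, "Hello"))  # True: the "" marker now sits under "o"

snapshot = repr(trie)
add(trie, "Hello")  # adding the same word again...
print(repr(trie) == snapshot)  # ...leaves the trie unchanged: True
```

Because every lookup must end on a node holding `""`, a prefix such as `"Hell"` is never reported as a stored word even though its path exists.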

(self, word: str)


````python
class Trie:
    # Abridged context: only the constructor's last line, `self.data = {}`,
    # appears in the listing (file line 289); the constructor signature is assumed.
    def __init__(self):
        self.data = {}

    def add(self, word: str):
        """
        Passes over every character (UTF-8 char) in `word` and adds it, one nesting level per character, to the internal `data` trie representation.
        The special key `""` is used to represent termination.

        This function is idempotent: adding the same word twice leaves the trie unchanged.

        Example:

        ```python
        >>> trie = Trie()
        >>> trie.add("Hello 友達")
        >>> trie.data
        {'H': {'e': {'l': {'l': {'o': {' ': {'友': {'達': {'': 1}}}}}}}}}

        >>> trie.add("Hello")
        >>> trie.data
        {'H': {'e': {'l': {'l': {'o': {' ': {'友': {'達': {'': 1}}}, '': 1}}}}}}
        ```
        """
        if not word:
            # Prevent empty string
            return
        ref = self.data
        for char in word:
            # Reuse the existing child node if there is one, otherwise create it
            ref[char] = char in ref and ref[char] or {}
            ref = ref[char]
        # Mark the end of a complete word
        ref[""] = 1
````

Callers 15

- `_create_trie` · Method · 0.95
- `_check_received_keys` · Function · 0.45
- `_get_unsavable_keys` · Method · 0.45
- `__init__` · Method · 0.45
- `split` · Method · 0.45
- `_get_pairs` · Method · 0.45
- `_from_pretrained` · Method · 0.45
- `apply_rotary` · Method · 0.45
- `apply_rotary_3d` · Method · 0.45

Calls

no outgoing calls

Tested by

no test coverage detected