github.com/jundot/omlx / encode

Method encode

tests/conftest.py:33–44

Encode text to token ids (simple simulation).

encode(self, text: str, add_special_tokens: bool = True) -> List[int]
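The behavior this signature describes can be reproduced outside the test suite. The sketch below is a minimal, self-contained stand-in, assuming a hypothetical class name `MockTokenizer` and an assumed default `vocab_size` of 32000 (the real class name and vocabulary size are not shown on this page). It applies the same word-to-id mapping as the source and shows what `add_special_tokens` changes: with the flag on, the BOS id (1) is prepended; with it off, you get one id per whitespace-separated word.

```python
from typing import List


class MockTokenizer:
    """Hypothetical stand-in for the test tokenizer in tests/conftest.py."""

    def __init__(self, vocab_size: int = 32000) -> None:
        # vocab_size default is an assumption; bos_token_id = 1 matches the source
        self.vocab_size = vocab_size
        self.bos_token_id = 1

    def encode(self, text: str, add_special_tokens: bool = True) -> List[int]:
        tokens: List[int] = []
        if add_special_tokens:
            tokens.append(self.bos_token_id)
        for word in text.split():
            # Same mapping as the source: ids land in [10, vocab_size)
            tokens.append((hash(word) % (self.vocab_size - 10)) + 10)
        return tokens


tok = MockTokenizer()
with_bos = tok.encode("hello world")  # [1, id("hello"), id("world")]
without_bos = tok.encode("hello world", add_special_tokens=False)
print(with_bos, without_bos)
```

Because the ids come from `hash()`, the concrete numbers vary between interpreter runs; only the structure (BOS first, one id per word, repeated words sharing an id within a run) is stable.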


```python
# Excerpt from tests/conftest.py, lines 33-44. The enclosing class sets
# self.bos_token_id = 1 and provides self.vocab_size; the module needs
# `from typing import List`.
def encode(self, text: str, add_special_tokens: bool = True) -> List[int]:
    """Encode text to token ids (simple simulation)."""
    # Simple simulation: each whitespace-separated word becomes one token
    tokens = []
    if add_special_tokens:
        tokens.append(self.bos_token_id)
    # Simulate tokenization by splitting on whitespace
    for word in text.split():
        # Map each word into [10, vocab_size) via hash(), reserving ids 0-9
        # (bos_token_id is 1). str hashes are salted per process, so ids are
        # only consistent within one interpreter run unless PYTHONHASHSEED is set.
        token_id = (hash(word) % (self.vocab_size - 10)) + 10
        tokens.append(token_id)
    return tokens
```
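The source derives ids from Python's built-in `hash()`, and `str` hashes are salted per process (unless `PYTHONHASHSEED` is set), so the same word can receive a different id in each test run. If reproducible ids across runs were wanted, one option is to swap in an unsalted checksum such as `zlib.crc32`. The sketch below does that; it is not the repository's code, and the class name `StableMockTokenizer` and the default `vocab_size` of 32000 are hypothetical.

```python
import zlib
from typing import List


class StableMockTokenizer:
    """Hypothetical variant: same scheme, but ids are stable across processes."""

    def __init__(self, vocab_size: int = 32000) -> None:
        self.vocab_size = vocab_size  # assumed default, not from the source
        self.bos_token_id = 1

    def _word_id(self, word: str) -> int:
        # zlib.crc32 is unsalted (unlike str.__hash__) and returns an unsigned
        # value in Python 3, so a word maps to the same id in every run.
        # Ids 0-9 stay reserved, as in the original mapping.
        return zlib.crc32(word.encode("utf-8")) % (self.vocab_size - 10) + 10

    def encode(self, text: str, add_special_tokens: bool = True) -> List[int]:
        tokens = [self.bos_token_id] if add_special_tokens else []
        tokens.extend(self._word_id(w) for w in text.split())
        return tokens


a = StableMockTokenizer().encode("the quick brown fox")
b = StableMockTokenizer().encode("the quick brown fox")
print(a == b)
```

CRC32 is not a cryptographic hash, which is fine for bucketing words into ids; with either scheme, distinct words can collide on the same id because the id range is finite.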

Callers (15; 7 listed)

__call__ · Method · 0.95
do_GET · Method · 0.45
_load_hf_calibration · Function · 0.45
_send_app_control · Function · 0.45
_get_stop_tokens · Method · 0.45
_build_state_machine · Method · 0.45
add_request · Method · 0.45

Calls (2)

append · Method · 0.80
split · Method · 0.80

Tested by

no test coverage detected