Method encode

tests/conftest.py:33–44

Encode text to token ids (simple simulation).

(self, text: str, add_special_tokens: bool = True) -> List[int]

Source from the content-addressed store, hash-verified

31	        self.bos_token_id = 1
32
33	    def encode(self, text: str, add_special_tokens: bool = True) -> List[int]:
34	        """Encode text to token ids (simple simulation)."""
35	        # Simple simulation: each word becomes a token
36	        tokens = []
37	        if add_special_tokens:
38	            tokens.append(self.bos_token_id)
39	        # Simulate tokenization by splitting on spaces
40	        for i, word in enumerate(text.split()):
41	            # Use hash to get a consistent token id for each word
42	            token_id = (hash(word) % (self.vocab_size - 10)) + 10
43	            tokens.append(token_id)
44	        return tokens
45
46	    def decode(
47	        self,
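
Note on the "consistent token id" comment: Python randomizes str hashes per process unless PYTHONHASHSEED is set, so ids derived from hash(word) are stable within one test run but can differ between runs. The sketch below shows the same word-per-token simulation with a process-stable checksum swapped in. The class name MockTokenizer, the default vocab_size of 32000, and the use of zlib.crc32 are assumptions for illustration, not part of the fixture shown above.

```python
import zlib
from typing import List


class MockTokenizer:
    """Hypothetical stand-in for the fixture class; name and vocab_size are assumed."""

    def __init__(self, vocab_size: int = 32000) -> None:
        self.vocab_size = vocab_size
        self.bos_token_id = 1

    def encode(self, text: str, add_special_tokens: bool = True) -> List[int]:
        """Map each whitespace-separated word to one simulated token id."""
        tokens: List[int] = []
        if add_special_tokens:
            tokens.append(self.bos_token_id)
        for word in text.split():
            # crc32 is deterministic across processes, unlike built-in hash() on str.
            # Ids land in [10, vocab_size), leaving 0-9 free for special tokens.
            token_id = (zlib.crc32(word.encode("utf-8")) % (self.vocab_size - 10)) + 10
            tokens.append(token_id)
        return tokens


tok = MockTokenizer()
ids = tok.encode("hello world hello")
```

Here ids starts with the BOS id, and both occurrences of "hello" map to the same id. Passing add_special_tokens=False drops the leading BOS id.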

__call__ · Method · 0.95
do_GET · Method · 0.45
_load_builtin_calibration · Function · 0.45
_load_hf_calibration · Function · 0.45
_send_app_control · Function · 0.45
_get_stop_tokens · Method · 0.45
_get_xtc_special_tokens · Method · 0.45
_build_state_machine · Method · 0.45
_encode_thinking_marker · Method · 0.45
_resolve_think_end_token_ids · Method · 0.45
_resolve_think_close_pattern · Method · 0.45
add_request · Method · 0.45
append · Method · 0.80
split · Method · 0.80

no test coverage detected