hub / github.com/InternLM/lmdeploy / encode

Method encode

lmdeploy/tokenizer.py:465–483

Tokenize a prompt.

Args:
    s: a prompt.
    add_bos: Whether to add ``bos`` token id when encoding the prompt.
    add_special_tokens: Whether or not to add special tokens when encoding the prompt.

Returns:
    list[int]: token ids.

(self, s: str, add_bos: bool = True, add_special_tokens: bool = True, **kwargs)


def encode(self, s: str, add_bos: bool = True, add_special_tokens: bool = True, **kwargs):
    """Tokenize a prompt.

    Args:
        s: a prompt.
        add_bos: Whether to add ``bos`` token id when encoding the prompt.
        add_special_tokens: Whether or not to add special tokens
            when encoding the prompt.

    Returns:
        list[int]: token ids.
    """
    encoded = self.model.encode(s, add_bos, add_special_tokens, **kwargs)
    # A prompt that already starts with bos, encoded with add_bos=True,
    # yields two leading bos ids; keep only one of them.
    if encoded[:2] == [self.bos_token_id] * 2:
        self.logger.warning(f'Detected duplicate bos token {self.bos_token_id} in prompt, '
                            'this will likely reduce response quality, one of them will be '
                            'removed')
        encoded = encoded[1:]
    return encoded
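The duplicate-bos handling in the method above can be exercised without loading a real model. The following is a minimal sketch, not lmdeploy's implementation: `FakeModel`, `SketchTokenizer`, the bos id `1`, and the toy rule mapping each word to `100 + len(word)` are all hypothetical stand-ins chosen for illustration. Only the body of `encode` mirrors the listing.

```python
import logging
from typing import List

logger = logging.getLogger("encode_sketch")

BOS_ID = 1  # hypothetical bos token id, chosen for this sketch only


class FakeModel:
    """Hypothetical stand-in for the wrapped tokenizer (`self.model`)."""

    bos_token_id = BOS_ID

    def encode(self, s: str, add_bos: bool = True,
               add_special_tokens: bool = True, **kwargs) -> List[int]:
        # Toy scheme: the literal "<s>" maps to the bos id; every other
        # whitespace-separated piece maps to 100 + its length.
        ids = [BOS_ID if piece == "<s>" else 100 + len(piece) for piece in s.split()]
        return [BOS_ID] + ids if add_bos else ids


class SketchTokenizer:
    """Hypothetical wrapper whose `encode` mirrors the listing above."""

    def __init__(self, model: FakeModel):
        self.model = model
        self.bos_token_id = model.bos_token_id
        self.logger = logger

    def encode(self, s: str, add_bos: bool = True,
               add_special_tokens: bool = True, **kwargs) -> List[int]:
        encoded = self.model.encode(s, add_bos, add_special_tokens, **kwargs)
        # Two leading bos ids mean the prompt already carried one.
        if encoded[:2] == [self.bos_token_id] * 2:
            self.logger.warning("Detected duplicate bos token %s in prompt",
                                self.bos_token_id)
            encoded = encoded[1:]
        return encoded


tokenizer = SketchTokenizer(FakeModel())
plain = tokenizer.encode("hello world")            # bos added once
duplicated = tokenizer.encode("<s> hello world")   # bos in prompt and added again
no_bos = tokenizer.encode("hello world", add_bos=False)
```

Under these assumptions, `plain` and `duplicated` come out identical, because the second leading bos id is dropped and a warning is logged. `no_bos` contains no bos id at all. Only an exact pair of leading bos ids triggers the removal; a single bos, or a bos appearing later in the sequence, is left untouched.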

Callers 15

indexes_containing_token · Method · 0.95

test_engine_generation_config · Function · 0.95

test_glm4_special_token · Function · 0.95

encode · Method · 0.45

encode_text · Function · 0.45

get_passkey_prompt · Function · 0.45

test_input_ids_mode · Method · 0.45

sample_sharegpt_requests · Function · 0.45

sample_random_requests · Function · 0.45

process_request · Method · 0.45

_normalize_row · Function · 0.45

Calls

no outgoing calls

Tested by 4

test_engine_generation_config · Function · 0.76

test_glm4_special_token · Function · 0.76

get_passkey_prompt · Function · 0.36

test_input_ids_mode · Method · 0.36