Method decode

models/tokenization_moss.py:310–352

Converts a sequence of ids into a string using the tokenizer and vocabulary, with options to remove special tokens, clean up tokenization spaces, and truncate the result before the first match of any regular expression in `truncate_before_pattern`. This is similar to calling `self.convert_tokens_to_string(self.convert_ids_to_tokens(token_ids))`.
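The summary describes decoding as two steps: map ids to tokens, then join the tokens into text, optionally dropping special tokens and cleaning up spaces. The sketch below illustrates that pipeline on a hypothetical five-entry vocabulary. `ID_TO_TOKEN`, `SPECIAL_IDS`, `toy_decode`, and the `default_clean_up` argument (a stand-in for `self.clean_up_tokenization_spaces`) are all invented for illustration, and the clean-up rules are a small subset modeled on common Transformers behavior, not the MOSS tokenizer's actual logic.

```python
from typing import Dict, List, Optional

# Hypothetical toy vocabulary; real MOSS ids and tokens are different.
ID_TO_TOKEN: Dict[int, str] = {0: "<eos>", 1: "Hello", 2: " ,", 3: " there", 4: "!"}
SPECIAL_IDS = {0}  # ids treated as special tokens in this sketch


def convert_ids_to_tokens(ids: List[int], skip_special_tokens: bool = False) -> List[str]:
    # Look up each id, dropping special tokens when requested.
    return [ID_TO_TOKEN[i] for i in ids if not (skip_special_tokens and i in SPECIAL_IDS)]


def convert_tokens_to_string(tokens: List[str]) -> str:
    # Tokens in this toy vocabulary carry their own leading spaces, so a plain join suffices.
    return "".join(tokens)


def clean_up_tokenization(text: str) -> str:
    # Remove the space before common punctuation (a subset of typical clean-up rules).
    for spaced, tight in ((" .", "."), (" ,", ","), (" !", "!"), (" ?", "?")):
        text = text.replace(spaced, tight)
    return text


def toy_decode(
    ids: List[int],
    skip_special_tokens: bool = False,
    clean_up_tokenization_spaces: Optional[bool] = None,
    default_clean_up: bool = True,  # stand-in for self.clean_up_tokenization_spaces
) -> str:
    text = convert_tokens_to_string(convert_ids_to_tokens(ids, skip_special_tokens))
    # None falls back to the instance-level default, as the docstring describes.
    if clean_up_tokenization_spaces is None:
        clean_up_tokenization_spaces = default_clean_up
    return clean_up_tokenization(text) if clean_up_tokenization_spaces else text


ids = [1, 2, 3, 4, 0]
print(toy_decode(ids))                            # Hello, there!<eos>
print(toy_decode(ids, skip_special_tokens=True))  # Hello, there!
print(toy_decode(ids, skip_special_tokens=True, clean_up_tokenization_spaces=False))  # Hello , there!
```

Passing `None` rather than `False` for `clean_up_tokenization_spaces` is what lets a tokenizer-level setting take effect when the caller expresses no preference.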

(
        self,
        token_ids: Union[int, List[int], "np.ndarray", "torch.Tensor", "tf.Tensor"],
        skip_special_tokens: bool = False,
        clean_up_tokenization_spaces: bool = None,
        truncate_before_pattern: Optional[List[str]] = None,
        **kwargs,
    )
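The signature accepts `token_ids` as a single int, a list of ints, or a NumPy, PyTorch, or TensorFlow tensor. The method forwards `token_ids` unchanged to `super()._decode`, so how each type is handled depends on the base class, which is not shown here. To illustrate what normalizing these inputs involves, here is a hypothetical helper, `normalize_token_ids`, built on duck typing. It assumes array-like objects expose `tolist()` (as NumPy arrays and PyTorch tensors do) or `numpy()` (as eager TensorFlow tensors do), and that the input is one-dimensional.

```python
from typing import Any, List


def normalize_token_ids(token_ids: Any) -> List[int]:
    """Hypothetical helper: coerce a 1-D id container (or a single id) to a list of ints."""
    if isinstance(token_ids, int):
        # A bare int is a single token id.
        return [token_ids]
    if hasattr(token_ids, "tolist"):
        # NumPy arrays and PyTorch tensors convert to Python lists this way.
        token_ids = token_ids.tolist()
    elif hasattr(token_ids, "numpy"):
        # Eager TensorFlow tensors go through NumPy first.
        token_ids = token_ids.numpy().tolist()
    if isinstance(token_ids, int):
        # A 0-d array's tolist() returns a plain scalar rather than a list.
        return [token_ids]
    return [int(i) for i in token_ids]
```

For example, `normalize_token_ids(7)` returns `[7]`, and with NumPy installed `normalize_token_ids(np.array([5, 6]))` returns `[5, 6]`.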


    def decode(
        self,
        token_ids: Union[int, List[int], "np.ndarray", "torch.Tensor", "tf.Tensor"],
        skip_special_tokens: bool = False,
        clean_up_tokenization_spaces: bool = None,
        truncate_before_pattern: Optional[List[str]] = None,
        **kwargs,
    ) -> str:
        """
        Converts a sequence of ids into a string, using the tokenizer and vocabulary with options to remove special
        tokens and clean up tokenization spaces.

        Similar to doing `self.convert_tokens_to_string(self.convert_ids_to_tokens(token_ids))`.

        Args:
            token_ids (`Union[int, List[int], np.ndarray, torch.Tensor, tf.Tensor]`):
                List of tokenized input ids. Can be obtained using the `__call__` method.
            skip_special_tokens (`bool`, optional, defaults to `False`):
                Whether or not to remove special tokens in the decoding.
            clean_up_tokenization_spaces (`bool`, optional):
                Whether or not to clean up the tokenization spaces. If `None`, will default to
                `self.clean_up_tokenization_spaces` (available in the `tokenizer_config`).
            truncate_before_pattern (`List[str]`, optional, defaults to `None`):
                A list of regular expression strings that will be used to truncate the returned string. This can be
                used to remove extra pieces of code (e.g. truncate if observing a comment symbol "#" at the beginning
                of a new line). An example pattern could be `["^#", re.escape("<|endoftext|>"), "^'''", "\n\n\n"]`.
            kwargs (additional keyword arguments, optional):
                Will be passed to the underlying model-specific decode method.

        Returns:
            `str`: The decoded sentence.
        """
        decoded_text = super()._decode(
            token_ids=token_ids,
            skip_special_tokens=skip_special_tokens,
            clean_up_tokenization_spaces=clean_up_tokenization_spaces,
            **kwargs,
        )

        if truncate_before_pattern is not None and len(truncate_before_pattern) > 0:
            decoded_text = self.truncate(decoded_text, truncate_before_pattern)

        return decoded_text
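Per the docstring, `truncate_before_pattern` lists regular expressions that mark where a code completion should end, and `decode` passes the decoded text to `self.truncate` only when that list is non-empty. The body of `truncate` is not shown on this page, so its exact rules are unknown here. The sketch below, `truncate_before`, is a hypothetical simplification: it assumes each pattern is compiled with `re.MULTILINE` and that the text is cut at the earliest match across all patterns. The real method may apply additional rules.

```python
import re
from typing import List, Optional


def truncate_before(text: str, patterns: Optional[List[str]]) -> str:
    """Hypothetical simplification: cut `text` at the earliest match of any pattern."""
    if not patterns:
        # Mirrors the guard in decode: None or an empty list means no truncation.
        return text
    positions = []
    for pattern in patterns:
        # re.MULTILINE makes "^" match at the start of every line, so "^#" catches a
        # comment line anywhere in the completion, not only at the very beginning.
        match = re.compile(pattern, re.MULTILINE).search(text)
        if match is not None:
            positions.append(match.start())
    # Keep everything before the first match; return the text unchanged if nothing matched.
    return text[: min(positions)] if positions else text


patterns = ["^#", re.escape("<|endoftext|>"), "^'''", "\n\n\n"]
print(truncate_before("x = 1<|endoftext|>garbage", patterns))  # x = 1
```

Because `^` is line-anchored, an inline comment such as `a = 1  # note` survives, while a comment that starts its own line ends the completion.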

Callers (6)

create_item · function · 0.80
generate_answer · function · 0.80
main · function · 0.80
predict · function · 0.80
main · function · 0.80
convert_tokens_to_string · method · 0.80

Calls (1)

truncate · method · 0.95

Tested by

no test coverage detected