hub / github.com/NVIDIA/TensorRT-LLM / SamplingParams

Class SamplingParams

tensorrt_llm/sampling_params.py:113–552

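Sampling parameters for text generation. Usage examples:

use_beam_search is False:
- best_of is None: (top-p/top-k) sampling of n responses, returning n generations
- best_of is not None: (top-p/top-k) sampling of best_of responses, returning n generations (best_of >= n must hold)

use_beam_search is True:
- best_of is None: beam search with a beam width of n, returning n generations
- best_of is not None: beam search with a beam width of best_of, returning n generations (best_of >= n must hold)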
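
The interaction between n, best_of, and use_beam_search described in the class docstring can be sketched in plain Python. This is only an illustration of the documented rules, not the library's implementation: `GenerationPlan` and `plan_generation` are hypothetical names introduced here, and the use of `ValueError` for a violated `best_of >= n` constraint is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GenerationPlan:
    """Hypothetical summary type; not part of tensorrt_llm."""
    strategy: str    # "sampling" or "beam_search"
    candidates: int  # sequences sampled, or the beam width
    returned: int    # generations handed back to the caller


def plan_generation(n: int = 1,
                    best_of: Optional[int] = None,
                    use_beam_search: bool = False) -> GenerationPlan:
    """Apply the documented n / best_of / use_beam_search rules."""
    # The docstring requires best_of >= n whenever best_of is given.
    # Raising ValueError here is an assumption made for this sketch.
    if best_of is not None and best_of < n:
        raise ValueError(f"best_of ({best_of}) must be >= n ({n})")
    # best_of, when set, replaces n as the number of candidates
    # (sampled responses, or beam width under beam search).
    candidates = n if best_of is None else best_of
    strategy = "beam_search" if use_beam_search else "sampling"
    return GenerationPlan(strategy, candidates, n)
```

For example, `plan_generation(n=2, best_of=4, use_beam_search=True)` describes a beam search with beam width 4 that returns 2 generations.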

Source from the content-addressed store, hash-verified

@dataclass(slots=True, kw_only=True)
class SamplingParams:
    """Sampling parameters for text generation.

    Usage Examples:

        use_beam_search is False:
            - best_of is None: (top-p/top-k) sampling n responses and return n generations
            - best_of is not None: (top-p/top-k) sampling best_of responses and return n generations (best_of >= n must hold)
        use_beam_search is True:
            - best_of is None: beam search with beam width of n, return n generations
            - best_of is not None: beam search with beam width of best_of, return n generations (best_of >= n must hold)

    Args:
        end_id (int, optional): The end token id. Defaults to None.
        pad_id (int, optional): The pad token id. Defaults to None.
        max_tokens (int): The maximum number of tokens to generate. Defaults to 32.
        bad (str, List[str], optional): A string or a list of strings that redirect the generation when they are generated, so that the bad strings are excluded from the returned output. Defaults to None.
        bad_token_ids (List[int], optional): A list of token ids that redirect the generation when they are generated, so that the bad ids are excluded from the returned output. Defaults to None.
        stop (str, List[str], optional): A string or a list of strings that stop the generation when they are generated. The returned output will not contain the stop strings unless include_stop_str_in_output is True. Defaults to None.
        stop_token_ids (List[int], optional): A list of token ids that stop the generation when they are generated. Defaults to None.
        include_stop_str_in_output (bool): Whether to include the stop strings in output text. Defaults to False.
        embedding_bias (torch.Tensor, optional): The embedding bias tensor. Expected type is kFP32 and shape is [vocab_size]. Defaults to None.
        logits_processor (tensorrt_llm.sampling_params.LogitsProcessor, List[tensorrt_llm.sampling_params.LogitsProcessor], optional): The logits postprocessor callback(s). Defaults to None.
            If a list, each processor is applied in order during generation (supported in PyTorch backend only).
        apply_batched_logits_processor (bool): Whether to apply batched logits postprocessor callback. Defaults to False.
            The BatchedLogitsProcessor class is recommended for callback creation. The callback must be provided when initializing LLM.

        n (int): Number of sequences to generate. Defaults to 1.
        best_of (int, optional): Number of sequences to consider for best output. Defaults to None.
        use_beam_search (bool): Whether to use beam search. Defaults to False.

        top_k (int, optional): Controls number of logits to sample from. Can assume non-negative values, where 0 means 'all logits'. Defaults to None.
            The value None is treated as "not specified" in the following.
            If neither temperature, top_p, nor top_k are specified, sampling is greedy.
            If temperature > 0 and/or top_p < 1 are specified, sampling will proceed accordingly and top_k will default to top_k = 0.
            Setting top_k = 1 results in greedy sampling.
        top_p (float, optional): Controls the top-P probability to sample from. Can have values between 0 and 1. Defaults to None.
            The value None is treated as "not specified" in the following.
            If neither temperature, top_p, nor top_k are specified, sampling is greedy.
            If temperature > 0 and/or top_k > 1 are specified, sampling will proceed accordingly and top_p will default to top_p = 1.
            Setting top_p = 0 should result in greedy sampling, but is currently disallowed in the backend.
        top_p_min (float, optional): Controls decay in the top-P algorithm. topPMin is lower-bound. None means using C++ runtime default 1.e-6. Defaults to None.
        top_p_reset_ids (int, optional): Controls decay in the top-P algorithm. Indicates where to reset the decay. None means using C++ runtime default 1. Defaults to None.
        top_p_decay (float, optional): Controls decay in the top-P algorithm. The decay value. None means using C++ runtime default 1.f. Defaults to None.
        seed (int, optional): Controls the random seed used by the random number generator in sampling. None means using C++ runtime default 0. Defaults to None.
        temperature (float, optional): Controls the modulation of logits when sampling new tokens. It can have values >= 0.f. Defaults to None.
            The value None is treated as "not specified" in the following.
            If neither temperature, top_p, nor top_k are specified, sampling is greedy.
            If top_p < 1 and/or top_k > 1 are specified, sampling will proceed accordingly and temperature will default to temperature = 1.
            Setting temperature = 0 results in greedy sampling.
        min_tokens (int, optional): Lower bound on the number of tokens to generate. Values < 1 have no effect. None means using C++ runtime default 1. Defaults to None.
        beam_search_diversity_rate (float, optional): Used to penalize tokens based on how often they appear in the sequence. It can have any value > 0.f. Values < 1.f encourages repetition, values > 1.f discourages it. None means using C++ runtime default 1.f. Defaults to None.
        repetition_penalty (float, optional): Used to penalize tokens based on how often they appear in the sequence. It can have any value > 0.f. Values < 1.f encourages repetition, values > 1.f discourages it. None means using C++ runtime default 1.f. Defaults to None.
        presence_penalty (float, optional): Used to penalize tokens already present in the sequence (irrespective of the number of appearances). It can have any values. Values < 0.f encourage repetition, values > 0.f discourage it. None means using C++ runtime default 0.f. Defaults to None.
        frequency_penalty (float, optional): Used to penalize tokens already present in the sequence (dependent on the number of appearances). It can have any values. Values < 0.f encourage repetition, values > 0.f discourage it. None means using C++ runtime default 0.f. Defaults to None.
        prompt_ignore_length (int, optional): Controls how many tokens to ignore from the prompt for presence and frequency penalties. Values <= 0 have no effect. Values > input (prompt) length will be clamped. None means using C++ runtime default 0. Defaults to None.
        length_penalty (float, optional): Controls how to penalize longer sequences in beam search. None means using C++ runtime default 0.f. Defaults to None.
        early_stopping (int, optional): Controls whether the generation process finishes once beamWidth sentences are generated (ends with end_token). None means using C++ runtime default 1. Defaults to None.
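
The docstring's rules for when sampling is greedy, and which defaults fill in for unspecified values of temperature, top_p, and top_k, can be restated as a small pure-Python sketch. It only mirrors the documented behavior and does not call the library: `resolve_sampling` is a hypothetical helper, raising `ValueError` for `top_p = 0` is an assumption standing in for "disallowed in the backend", and combinations the docstring does not spell out (for example, only `top_k = 0` given) are treated here as ordinary sampling.

```python
from typing import Optional


def resolve_sampling(temperature: Optional[float] = None,
                     top_p: Optional[float] = None,
                     top_k: Optional[int] = None) -> dict:
    """Restate the documented greedy/default rules (illustrative only)."""
    # top_p = 0 is documented as disallowed in the backend; how the real
    # library reports this is not shown in the source, so ValueError is
    # an assumption of this sketch.
    if top_p is not None and top_p == 0:
        raise ValueError("top_p = 0 is disallowed")

    # Nothing specified: sampling is greedy.
    nothing_specified = temperature is None and top_p is None and top_k is None
    # temperature = 0 or top_k = 1 also results in greedy sampling.
    if nothing_specified or temperature == 0 or top_k == 1:
        return {"greedy": True}

    # Otherwise unspecified values take the documented defaults:
    # temperature = 1, top_p = 1, top_k = 0 (0 means "all logits").
    return {
        "greedy": False,
        "temperature": 1.0 if temperature is None else temperature,
        "top_p": 1.0 if top_p is None else top_p,
        "top_k": 0 if top_k is None else top_k,
    }
```

Under these assumptions, `resolve_sampling(top_p=0.9)` yields sampling with temperature 1.0, top_p 0.9, and top_k 0, while `resolve_sampling()` and `resolve_sampling(top_k=1)` both report greedy decoding.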

Calls

no outgoing calls