Method prepare_inputs

tensorrt_llm/models/enc_dec/model.py:1341–2018 (github.com/NVIDIA/TensorRT-LLM)

@brief: Prepare input Tensors for the model. The given sizes determine the ranges of the dimensions when using TRT dynamic shapes.

@return: a list of values that can be fed into self.forward()

(self,
 max_batch_size,
 max_beam_width,
 max_decoder_input_len,
 max_seq_len,
 max_encoder_input_len,
 gather_context_logits: bool = False,
 lora_target_modules: List[str] = None,
 use_cache=True,
 *args,
 **kwargs)
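
The size arguments in this signature are turned into three-element lists that look like the [min, opt, max] triples TensorRT optimization profiles take for dynamic shapes. As a minimal sketch, assuming only the arithmetic visible in the source excerpt on this page, the triples can be reproduced in plain Python. The helper names `min_opt_max` and `sketch_ranges` and the example sizes are hypothetical and are not part of TensorRT-LLM.

```python
def min_opt_max(max_value, min_value=1):
    """Return [min, opt, max] with opt at the rounded-up midpoint of max."""
    return [min_value, (max_value + 1) // 2, max_value]


def sketch_ranges(max_batch_size, max_beam_width, max_decoder_input_len,
                  max_seq_len, max_encoder_input_len):
    """Hypothetical helper mirroring the range arithmetic in the excerpt."""
    max_output_len = max_decoder_input_len + max_seq_len
    return {
        "bb_range": min_opt_max(max_batch_size * max_beam_width),
        "bs_range": min_opt_max(max_batch_size),
        "beam_width_range": min_opt_max(max_beam_width),
        # Decoder input: opt is pinned to 1 (generation phase).
        "inlen_range": [1, 1, max_decoder_input_len],
        "encoder_inlen_range": min_opt_max(max_encoder_input_len),
        # Mask length is the output-length range shifted up by one.
        "mask_len_range": [1, (max_output_len + 1) // 2 + 1,
                           max_output_len + 1],
        "max_output_len_range": min_opt_max(max_output_len, min_value=0),
        # Token counts: encoder may be 0 in the generation phase.
        "encoder_num_tokens_range": min_opt_max(
            max_encoder_input_len * max_batch_size, min_value=0),
        "decoder_num_tokens_range": [
            1,
            max_batch_size * max_beam_width,
            max(max_decoder_input_len * max_batch_size,
                max_beam_width * max_batch_size),
        ],
    }


# Example sizes chosen only for illustration.
ranges = sketch_ranges(max_batch_size=8, max_beam_width=4,
                       max_decoder_input_len=1, max_seq_len=200,
                       max_encoder_input_len=1024)
for name, triple in ranges.items():
    print(name, triple)
```

With these example sizes the batch-times-beam range comes out as [1, 16, 32], and the decoder token range as [1, 32, 32], because the beam-width term dominates when the decoder input length is 1.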

Source from the content-addressed store, hash-verified

def prepare_inputs(self,
                   max_batch_size,
                   max_beam_width,
                   max_decoder_input_len,
                   max_seq_len,
                   max_encoder_input_len,
                   gather_context_logits: bool = False,
                   lora_target_modules: List[str] = None,
                   use_cache=True,
                   *args,
                   **kwargs):
    '''@brief: Prepare inputs Tensors for the model, the given sizes are used to determine the
        ranges of the dimensions of when using TRT dynamic shapes.

        @return: a list contains values which can be fed into the self.forward()
    '''

    # Prepare inputs
    max_output_len = max_decoder_input_len + max_seq_len

    head_size = self.head_size
    num_kv_heads = (self.num_kv_heads + self.mapping.tp_size -
                    1) // self.mapping.tp_size

    encoder_head_size = self.encoder_head_size
    encoder_num_kv_heads = (self.encoder_num_kv_heads + self.mapping.tp_size
                            - 1) // self.mapping.tp_size

    bb_range = [
        1, (max_batch_size * max_beam_width + 1) // 2,
        max_batch_size * max_beam_width
    ]
    bs_range = [1, (max_batch_size + 1) // 2, max_batch_size]
    beam_width_range = [1, (max_beam_width + 1) // 2, max_beam_width]
    inlen_range = [
        1, 1, max_decoder_input_len
    ]  # context phase >= 1 (if forced_input_ids), generation phase = 1
    encoder_inlen_range = [
        1, (max_encoder_input_len + 1) // 2, max_encoder_input_len
    ]
    mask_len_range = [1, (max_output_len + 1) // 2 + 1, max_output_len + 1]
    max_output_len_range = [0, (max_output_len + 1) // 2, max_output_len]

    encoder_num_tokens_range = [
        0,  # 0 for generation phase, >0 for context phase
        (max_encoder_input_len * max_batch_size + 1) // 2,
        max_encoder_input_len * max_batch_size,
    ]
    decoder_num_tokens_range = [
        1,
        max_batch_size * max_beam_width,
        max(max_decoder_input_len * max_batch_size,
            max_beam_width * max_batch_size),
    ]

    # No enable_two_optimization_profiles support yet

    encoder_input_len_range = [
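
The `num_kv_heads` and `encoder_num_kv_heads` expressions in the excerpt are integer ceiling divisions by `self.mapping.tp_size`, so each tensor-parallel rank gets at least one KV head even when there are fewer heads than ranks. A minimal sketch of that arithmetic follows; the helper name `heads_per_rank` and the example head counts are hypothetical, chosen only to illustrate the rounding.

```python
import math


def heads_per_rank(num_kv_heads, tp_size):
    """Ceiling division without floats: (a + b - 1) // b for positive ints."""
    return (num_kv_heads + tp_size - 1) // tp_size


# Evenly divisible: 12 heads over 4 ranks.
print(heads_per_rank(12, 4))
# Fewer heads than ranks (e.g. a single shared KV head): rounds up to 1.
print(heads_per_rank(1, 4))
# Not evenly divisible: 8 heads over 3 ranks rounds up.
print(heads_per_rank(8, 3))

# The integer form agrees with math.ceil on the true quotient.
for heads in range(1, 65):
    for tp in range(1, 9):
        assert heads_per_rank(heads, tp) == math.ceil(heads / tp)
```

Plain floor division would give 0 heads per rank in the second case, which is why the excerpt adds `tp_size - 1` before dividing.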

Callers

nothing calls this directly

Calls 15

AttentionMaskParams (Class) · 0.90

default_net (Function) · 0.90

Tensor (Class) · 0.90

current_all_reduce_helper (Function) · 0.90

LoraParams (Class) · 0.90

assertion (Function) · 0.90

shape (Function) · 0.90

KeyValueCacheParams (Class) · 0.90

AttentionParams (Class) · 0.90

max (Function) · 0.85

set_workspace_tensor (Method) · 0.80

pp_layers (Method) · 0.80

Tested by

no test coverage detected