Method replay

lightllm/common/basemodel/cuda_graph.py:178–183

(self, input_ids, infer_state, input_ids1=None, infer_state1=None)


176	        return graph_model_output, graph_model_output1
177
178	    def replay(self, input_ids, infer_state, input_ids1=None, infer_state1=None):
179	        if self.enable_decode_microbatch_overlap:
180	            return self._replay_overlap(input_ids, infer_state, input_ids1, infer_state1)
181	        else:
182	            assert input_ids1 is None and infer_state1 is None
183	            return self._replay(input_ids, infer_state)
184
185	    @torch.no_grad()
186	    def warmup(self, model):
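
The dispatch in replay can be exercised in isolation with a small stand-in class. The sketch below is not the lightllm class: the name GraphDispatchSketch and the tuple return values are hypothetical, and the two private methods are placeholders for the real implementations, whose bodies are not shown on this page. Only the body of replay mirrors lines 178–183.

```python
class GraphDispatchSketch:
    """Hypothetical stand-in for the class that owns replay()."""

    def __init__(self, enable_decode_microbatch_overlap):
        # Same flag name as in the listing; here it is set directly by the caller.
        self.enable_decode_microbatch_overlap = enable_decode_microbatch_overlap

    def _replay(self, input_ids, infer_state):
        # Placeholder: returns a tag so the chosen path is visible.
        return ("single", input_ids)

    def _replay_overlap(self, input_ids, infer_state, input_ids1, infer_state1):
        # Placeholder for the two-microbatch path.
        return ("overlap", input_ids, input_ids1)

    def replay(self, input_ids, infer_state, input_ids1=None, infer_state1=None):
        # Mirrors the listing: the flag selects the path, and the
        # single-batch path rejects a second microbatch.
        if self.enable_decode_microbatch_overlap:
            return self._replay_overlap(input_ids, infer_state, input_ids1, infer_state1)
        else:
            assert input_ids1 is None and infer_state1 is None
            return self._replay(input_ids, infer_state)


single = GraphDispatchSketch(enable_decode_microbatch_overlap=False)
overlap = GraphDispatchSketch(enable_decode_microbatch_overlap=True)
print(single.replay([1, 2], "state"))
print(overlap.replay([1, 2], "state", [3, 4], "state1"))
```

Because the guard on the single-batch path is a plain assert, it is skipped when Python runs with -O; in that mode a stray second microbatch would be silently ignored rather than rejected.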

_decode · Method · 0.80

microbatch_overlap_decode · Method · 0.80

_capture_decode · Method · 0.80

_capture_decode_overlap · Method · 0.80

_replay · Method · 0.80

_replay_overlap · Method · 0.80

_bench · Method · 0.80

test_kernel · Function · 0.80

test_decode_attentions · Function · 0.80

test_kernel · Function · 0.80

_replay_overlap · Method · 0.95

_replay · Method · 0.95

test_kernel · Function · 0.64

test_decode_attentions · Function · 0.64

test_kernel · Function · 0.64

test_fp8_block_gemm · Function · 0.64

test_expert_id_counter · Function · 0.64