hub / github.com/ModelTC/LightLLM / replay

Method replay

lightllm/common/basemodel/cuda_graph.py:178–183
(self, input_ids, infer_state, input_ids1=None, infer_state1=None)

Source from the content-addressed store, hash-verified

176         return graph_model_output, graph_model_output1
177
178     def replay(self, input_ids, infer_state, input_ids1=None, infer_state1=None):
179         if self.enable_decode_microbatch_overlap:
180             return self._replay_overlap(input_ids, infer_state, input_ids1, infer_state1)
181         else:
182             assert input_ids1 is None and infer_state1 is None
183             return self._replay(input_ids, infer_state)
184
185     @torch.no_grad()
186     def warmup(self, model):
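
The dispatch in replay can be tried in isolation with a minimal sketch like the one below. ReplayDispatcher and the tuple return values are hypothetical stand-ins, not LightLLM code; the real _replay and _replay_overlap replay captured CUDA graphs, which this sketch does not attempt to do. Only the branching on enable_decode_microbatch_overlap and the assertion mirror the listing above.

```python
class ReplayDispatcher:
    """Hypothetical stand-in for the class that owns replay(); not LightLLM code."""

    def __init__(self, enable_decode_microbatch_overlap):
        self.enable_decode_microbatch_overlap = enable_decode_microbatch_overlap

    def _replay(self, input_ids, infer_state):
        # Placeholder body: the real method replays a captured graph.
        return ("single", input_ids, infer_state)

    def _replay_overlap(self, input_ids, infer_state, input_ids1, infer_state1):
        # Placeholder body: the real method handles two microbatches.
        return ("overlap", input_ids, infer_state, input_ids1, infer_state1)

    def replay(self, input_ids, infer_state, input_ids1=None, infer_state1=None):
        if self.enable_decode_microbatch_overlap:
            return self._replay_overlap(input_ids, infer_state, input_ids1, infer_state1)
        # Without overlap, a second microbatch is a caller error.
        assert input_ids1 is None and infer_state1 is None
        return self._replay(input_ids, infer_state)


single = ReplayDispatcher(enable_decode_microbatch_overlap=False)
print(single.replay([1, 2], "state0")[0])

overlap = ReplayDispatcher(enable_decode_microbatch_overlap=True)
print(overlap.replay([1, 2], "state0", [3, 4], "state1")[0])
```

Because the guard is a plain assert, it is skipped when Python runs with -O; a caller that needs the check in optimized mode would have to raise an explicit exception instead.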

Callers 15

_decode · Method · 0.80
_capture_decode · Method · 0.80
_replay · Method · 0.80
_replay_overlap · Method · 0.80
_bench · Method · 0.80
test_kernel · Function · 0.80
test_decode_attentions · Function · 0.80
test_decode_attentions · Function · 0.80
test_kernel · Function · 0.80
test_kernel · Function · 0.80

Calls 2

_replay_overlap · Method · 0.95
_replay · Method · 0.95

Tested by 8

test_kernel · Function · 0.64
test_decode_attentions · Function · 0.64
test_decode_attentions · Function · 0.64
test_kernel · Function · 0.64
test_kernel · Function · 0.64
test_kernel · Function · 0.64
test_fp8_block_gemm · Function · 0.64
test_expert_id_counter · Function · 0.64