hub / github.com/NVIDIA/TensorRT-LLM / _debug_run

Method _debug_run

tensorrt_llm/runtime/session.py:300–324

Runs the engine enqueue with newly allocated output tensors. Intended for debugging: the call is synchronous and slower than run.

(self,
 inputs: Dict[str, "torch.Tensor"],
 context=None)


    def _debug_run(self,
                   inputs: Dict[str, "torch.Tensor"],
                   context=None) -> Dict[str, "torch.Tensor"]:
        '''Run the engine enqueue with allocated output tensors, for debug purpose, since it is a sync call and slower than run
        '''
        import torch

        inputs_info = [
            TensorInfo(name, torch_dtype_to_trt(tensor.dtype), tensor.shape)
            for name, tensor in inputs.items()
        ]
        outputs_info = self.infer_shapes(inputs_info)
        outputs = {
            t.name:
            torch.empty(tuple(t.shape),
                        dtype=trt_dtype_to_torch(t.dtype),
                        device='cuda')
            for t in outputs_info
        }
        with _scoped_stream() as stream:
            self.run(inputs=inputs,
                     outputs=outputs,
                     stream=stream,
                     context=context)
        return outputs
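
The method follows a describe-inputs, infer-output-shapes, allocate, run pattern inside a scoped stream. The sketch below mimics that control flow with the standard library only, so it runs without a GPU, PyTorch, or TensorRT. Every name in it (FakeTensorInfo, FakeStream, FakeSession, scoped_stream, debug_run) is a hypothetical stand-in, not part of the TensorRT-LLM API. The synchronize-on-exit behaviour of the stream is an assumption, suggested by the docstring's remark that the call is synchronous.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass
class FakeTensorInfo:
    """Hypothetical stand-in for TensorInfo: a name, a dtype label, a shape."""
    name: str
    dtype: str
    shape: Tuple[int, ...]


class FakeStream:
    """Hypothetical stream that only records whether it was synchronized."""

    def __init__(self) -> None:
        self.synchronized = False

    def synchronize(self) -> None:
        self.synchronized = True


@contextmanager
def scoped_stream(stream: FakeStream) -> Iterator[FakeStream]:
    """Yield the stream and synchronize it on exit, even if the body raises."""
    try:
        yield stream
    finally:
        stream.synchronize()


class FakeSession:
    """Hypothetical engine that doubles every element of every input."""

    def infer_shapes(self, inputs_info: List[FakeTensorInfo]) -> List[FakeTensorInfo]:
        # Assumption: each output has the same dtype and shape as its input.
        return [
            FakeTensorInfo(f"{info.name}_out", info.dtype, info.shape)
            for info in inputs_info
        ]

    def run(self, inputs: Dict[str, List[float]],
            outputs: Dict[str, List[float]], stream: FakeStream) -> bool:
        # Write results into the caller-provided output buffers.
        for name, values in inputs.items():
            out = outputs[f"{name}_out"]
            for i, value in enumerate(values):
                out[i] = 2 * value
        return True

    def debug_run(self, inputs: Dict[str, List[float]],
                  stream: FakeStream) -> Dict[str, List[float]]:
        # 1. Describe the inputs (name, dtype, shape).
        inputs_info = [
            FakeTensorInfo(name, "float32", (len(values),))
            for name, values in inputs.items()
        ]
        # 2. Ask the engine which outputs it will produce.
        outputs_info = self.infer_shapes(inputs_info)
        # 3. Allocate one buffer per output.
        outputs = {t.name: [0.0] * t.shape[0] for t in outputs_info}
        # 4. Run inside the scoped stream; leaving the block synchronizes it.
        with scoped_stream(stream) as s:
            self.run(inputs=inputs, outputs=outputs, stream=s)
        return outputs


stream = FakeStream()
result = FakeSession().debug_run({"x": [1.0, 2.0, 3.0]}, stream)
print(result)               # {'x_out': [2.0, 4.0, 6.0]}
print(stream.synchronized)  # True
```

Allocating outputs on every call and waiting for the stream before returning keeps the caller's code short, which suits debugging, but it is also why such a path is slower than calling run with preallocated buffers and a caller-managed stream.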

Callers 1

Calls 7

infer_shapes · Method · 0.95
run · Method · 0.95
TensorInfo · Class · 0.85
torch_dtype_to_trt · Function · 0.85
_scoped_stream · Function · 0.85
trt_dtype_to_torch · Function · 0.50
empty · Method · 0.45

Tested by 1