hub / github.com/NVIDIA/TensorRT-LLM / get_buffer

Method get_buffer

tensorrt_llm/_torch/memory_buffer_utils.py:52–122

Return a reusable buffer view for the requested shape/dtype. The returned tensor is backed by an underlying `torch.uint8` buffer. When no suitable buffer exists in the pool, a new tensor is created via `torch.empty`, so its contents are uninitialized. Overwrite the data before use if needed.

(self, tensor_shape: list[int], dtype: torch.dtype,
                   buffer_name: str, reserve_buffer: bool)
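
The idea of a typed view over raw bytes can be illustrated without a GPU. The sketch below is a standard-library analogy, not the library's code: `view_bytes_as` is a hypothetical helper, and it assumes (the body of `_view_as` is not shown on this page) that the view covers only the leading `prod(shape) * itemsize` bytes of a possibly larger buffer.

```python
import math
import struct


def view_bytes_as(raw: bytearray, shape: list[int], fmt: str) -> memoryview:
    """Hypothetical helper: reinterpret the leading bytes of `raw` as a
    shaped, typed view that shares storage with `raw` (no copy)."""
    required = math.prod(shape) * struct.calcsize(fmt)
    if required > len(raw):
        raise ValueError("backing buffer is too small for the requested view")
    # Slice to the exact byte count, then cast bytes -> typed N-D view.
    return memoryview(raw)[:required].cast(fmt, shape=shape)


raw = bytearray(64)                      # stand-in for a uint8 pool buffer
view = view_bytes_as(raw, [2, 3], "f")   # 2 x 3 float32 needs 24 of the 64 bytes

raw[0:4] = struct.pack("f", 1.5)         # write through the backing bytes
print(view.shape)                        # (2, 3)
print(view.tolist()[0][0])               # 1.5
```

Because the view aliases the backing bytes, whatever an earlier user of the buffer left behind is visible through it, which is why the description advises overwriting the data before use.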

Source from the content-addressed store, hash-verified

    def get_buffer(self, tensor_shape: list[int], dtype: torch.dtype,
                   buffer_name: str, reserve_buffer: bool):
        """Return a reusable buffer view for the requested shape/dtype.
        The returned tensor is backed by an underlying `torch.uint8` buffer. When
        no suitable buffer exists in the pool, a new tensor is created via
        `torch.empty`, so its contents are uninitialized. Overwrite the data before use if needed.
        """

        # all buffers are allocated with 1 byte element size
        required_memory_size = math.prod(tensor_shape) * dtype.itemsize

        candidate_blocks = self.buffers.get(buffer_name, [])

        # Find the best-fit available buffer.
        best_fit_block: Optional[BufferBlock] = None
        smallest_sufficient_size = float('inf')
        for block in candidate_blocks:
            # Skip buffers that are too small.
            if block.buffer.numel() < required_memory_size:
                continue

            # Find the smallest buffer that is still large enough (best-fit).
            if block.buffer.numel() < smallest_sufficient_size:
                # Use reserved block if find one.
                if best_fit_block is not None and best_fit_block.is_reserved and not block.is_reserved:
                    continue

                best_fit_block = block
                smallest_sufficient_size = block.buffer.numel()

        if best_fit_block is not None:
            if reserve_buffer:
                best_fit_block.is_reserved = True
            # A suitable buffer was found, so reuse it.
            return self._view_as(best_fit_block.buffer, tensor_shape, dtype)

        for block in list(candidate_blocks):
            if not block.is_reserved:
                # Need to call del BufferBlock.buffer, otherwise memory isn't
                # released and OOM may happen.
                buffer_size = block.buffer.numel()
                del block.buffer
                if buffer_size >= 1024 * 1024 * 1024:
                    torch.cuda.empty_cache()
                candidate_blocks.remove(block)

        # No suitable buffer was found, so allocate a new one.
        # The new buffer is created with uint8 to represent raw bytes.
        new_buffer_tensor = None
        try:
            with torch.cuda.memory.use_mem_pool(get_shared_pool()):
                new_buffer_tensor = torch.empty((required_memory_size, ),
                                                device='cuda',
                                                dtype=torch.uint8)
        except Exception as ex:
            # Need to check if this is an OOM exception
            logger.debug(
                f"Exception happened to create tensor from given memory pool: {str(ex)}"
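
The selection and eviction logic in the listing can be exercised on its own. The following is a minimal sketch that tracks byte sizes only and involves no tensors or CUDA. `Block`, `pick_block`, and `evict_unreserved` are hypothetical names introduced for illustration. `pick_block` follows the same comparisons as the loop above, and `evict_unreserved` is a simplification that returns a new list rather than mutating the pool in place.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    """Hypothetical stand-in for BufferBlock: tracks only a byte size."""
    size: int
    is_reserved: bool = False


def pick_block(blocks: list[Block], required: int) -> Optional[Block]:
    """Smallest sufficient block wins, except that a reserved pick is
    never displaced by a smaller unreserved block."""
    best: Optional[Block] = None
    smallest = float("inf")
    for block in blocks:
        if block.size < required:
            continue  # too small for the request
        if block.size < smallest:
            if best is not None and best.is_reserved and not block.is_reserved:
                continue  # keep the reserved pick
            best = block
            smallest = block.size
    return best


def evict_unreserved(blocks: list[Block]) -> list[Block]:
    """Fallback path, simplified: only reserved blocks survive before a
    new allocation is attempted."""
    return [b for b in blocks if b.is_reserved]


pool = [Block(4096), Block(1024), Block(2048, is_reserved=True)]
print(pick_block(pool, 512).size)                  # 1024
print(pick_block([Block(2048, True), Block(1024)], 512).size)  # 2048
print(pick_block(pool, 8192))                      # None
print([b.size for b in evict_unreserved(pool)])    # [2048]
```

The second call shows that the reserved preference depends on iteration order: a reserved block seen first is kept even though a smaller unreserved block follows, whereas in the first call the smaller unreserved block was already chosen before the reserved one was reached.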

Callers

nothing calls this directly

Calls 9

_view_as · Method · 0.95
get_shared_pool · Function · 0.85
BufferBlock · Class · 0.85
get · Method · 0.45
numel · Method · 0.45
remove · Method · 0.45
empty · Method · 0.45
debug · Method · 0.45
append · Method · 0.45

Tested by

no test coverage detected