
    def pack_strided_memory(
        ptr: int, segment_size: int, segment_stride: int, num_segments: int, dtype: torch.dtype, dev_id
    ):
        """
        Pack GPU memory into a PyTorch tensor with specified stride.

        Parameters:
            ptr: GPU memory address obtained from cudaMalloc
            segment_size: Memory size of each segment in bytes
            segment_stride: Memory stride size between segments in bytes
            num_segments: Number of segments
            dtype: PyTorch data type for the resulting tensor
            dev_id: CUDA device ID

        Returns:
            PyTorch tensor that references the provided memory

        Note:
            This function creates a new DLPack capsule each time it's called,
            even with the same pointer. Each capsule is consumed only once.
        """
        # Create a new capsule each time
        capsule_wrapper = create_dlpack_capsule(
            ptr, segment_size, segment_stride, num_segments, dtype, dev_id
        )
        torch_tensor = torch.utils.dlpack.from_dlpack(capsule_wrapper.capsule)
        # Attach the wrapper to the tensor so that it stays referenced
        # (presumably to keep the capsule's backing data alive) while the tensor exists
        torch_tensor._capsule_wrapper = capsule_wrapper
        return torch_tensor
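The byte-based parameters above have to be turned into an element-based layout at some point, because DLPack expresses strides in elements rather than bytes. The sketch below shows that arithmetic in plain Python. It assumes the resulting tensor is 2-D with one row per segment; that shape is not confirmed by the code shown here, since `create_dlpack_capsule` is not included. The helpers `strided_layout` and `byte_offset` are hypothetical names used only for illustration and are not part of the original module.

```python
from typing import Tuple


def strided_layout(
    segment_size: int, segment_stride: int, num_segments: int, itemsize: int
) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Hypothetical helper: convert byte sizes into (shape, strides) in elements.

    Assumes a 2-D view with one row per segment; the real capsule may differ.
    """
    if segment_size % itemsize or segment_stride % itemsize:
        raise ValueError("segment_size and segment_stride must be multiples of itemsize")
    if segment_stride < segment_size:
        raise ValueError("segment_stride smaller than segment_size: segments would overlap")
    shape = (num_segments, segment_size // itemsize)
    strides = (segment_stride // itemsize, 1)  # element strides, as DLPack uses
    return shape, strides


def byte_offset(strides: Tuple[int, int], itemsize: int, segment: int, index: int) -> int:
    """Byte offset from the base pointer of element `index` within row `segment`."""
    return (segment * strides[0] + index * strides[1]) * itemsize


# Three 16-byte segments placed 64 bytes apart, holding 4-byte elements (e.g. float32)
shape, strides = strided_layout(segment_size=16, segment_stride=64, num_segments=3, itemsize=4)
print(shape)    # (3, 4)
print(strides)  # (16, 1)
print(byte_offset(strides, itemsize=4, segment=2, index=1))  # 132
```

With this layout, only the first 16 bytes of every 64-byte stride are visible through the tensor; the remaining 48 bytes between segments are skipped rather than copied.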