github.com/policy-gradient/GRPO-Zero

Method init_kv_cache

qwen2_model.py:290–300
init_kv_cache(self, max_batch_size: int, max_seq_len: int, device: torch.device, dtype: torch.dtype)
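Because max_batch_size and max_seq_len are passed up front, the cache size is fixed at allocation time. The rough sketch below estimates that footprint. It assumes that each layer's attention module preallocates one K tensor and one V tensor of shape (max_batch_size, max_seq_len, n_kv_heads, head_dim). That is an assumption, since the attention code is not shown on this page. The helper name kv_cache_bytes and the numbers are illustrative only, not taken from the repository or from any specific Qwen2 config.

```python
def kv_cache_bytes(n_layers, max_batch_size, max_seq_len, n_kv_heads, head_dim, bytes_per_elem):
    """Hypothetical helper: bytes preallocated for K and V caches, ASSUMING one K
    and one V tensor of shape (max_batch_size, max_seq_len, n_kv_heads, head_dim)
    per decoder layer."""
    per_tensor = max_batch_size * max_seq_len * n_kv_heads * head_dim
    return 2 * n_layers * per_tensor * bytes_per_elem  # factor 2: one K + one V


# Illustrative numbers only; bytes_per_elem is 2 for bfloat16/float16, 4 for float32
size = kv_cache_bytes(n_layers=24, max_batch_size=8, max_seq_len=1024,
                      n_kv_heads=2, head_dim=128, bytes_per_elem=2)
print(size / 2**20, "MiB")  # prints: 192.0 MiB
```

Under these assumptions, the memory grows linearly in both max_batch_size and max_seq_len. Doubling either one doubles the preallocated cache.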

Source

def init_kv_cache(
    self,
    max_batch_size: int,
    max_seq_len: int,
    device: torch.device,
    dtype: torch.dtype,
):
    # Allocate a KV cache in every decoder layer's attention module
    for layer in self.layers:
        layer.self_attn.init_kv_cache(
            max_batch_size, max_seq_len, dtype=dtype, device=device
        )
Callers

nothing calls this directly
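Since no direct caller is detected, the following is only a hypothetical sketch of how a decoding loop could pair init_kv_cache with del_kv_cache, which sits next to it in qwen2_model.py. The kv_cache context manager and the RecordingModel stub are illustrative names, not part of the repository. The stub merely records calls so that the ordering is visible.

```python
from contextlib import contextmanager


@contextmanager
def kv_cache(model, max_batch_size, max_seq_len, device, dtype):
    """Hypothetical helper: allocate caches on entry and always free them on exit."""
    model.init_kv_cache(max_batch_size, max_seq_len, device=device, dtype=dtype)
    try:
        yield model
    finally:
        # Runs even if generation raises, so cache memory is not left allocated
        model.del_kv_cache()


class RecordingModel:
    """Stub exposing the two method names from qwen2_model.py; it only records calls."""

    def __init__(self):
        self.calls = []

    def init_kv_cache(self, max_batch_size, max_seq_len, device, dtype):
        self.calls.append(("init", max_batch_size, max_seq_len, device, dtype))

    def del_kv_cache(self):
        self.calls.append(("del",))


m = RecordingModel()
with kv_cache(m, 4, 256, device="cpu", dtype="bfloat16"):
    m.calls.append(("decode",))  # token-by-token generation would run here
print(m.calls)  # [('init', 4, 256, 'cpu', 'bfloat16'), ('decode',), ('del',)]
```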

Calls (1)

init_kv_cache (Method, invoked as layer.self_attn.init_kv_cache) · 0.45

Tested by

no test coverage detected