hub / github.com/hpcaitech/ColossalAI / GlideInput

Class GlideInput

colossalai/inference/spec/struct.py:33–56

Dataclass for Glide models (e.g. `colossalai/inference/modeling/models/glide_llama.py`). It packs the data used when glimpsing the KV caches of the main model: the block table, the blocked key and value caches, and the sequence lengths of the current batch.


@dataclass
class GlideInput:
    """Dataclass for Glide Models (e.g. `colossalai/inference/modeling/models/glide_llama.py`).
    Used for pack data that will be used during glimpsing KV Caches of the main model.

    Args:
        block_tables (torch.Tensor): [num_seqs, max_blocks_per_seq] The block table of KV Caches.
        large_k_cache (torch.Tensor): [num_blocks, num_kv_heads, block_size, head_size]
            Blocked key cache of the main model
        large_v_cache (torch.Tensor): Blocked value cache of the main model. It has the same shape as k cache.
        sequence_lengths (torch.Tensor): [num_seqs] Sequence lengths of the current batch.
    """

    block_tables: torch.Tensor = None
    large_k_cache: torch.Tensor = None
    large_v_cache: torch.Tensor = None
    sequence_lengths: torch.Tensor = None
    n_spec_tokens: int = 5

    @property
    def glimpse_ready(self):
        return all(
            attr is not None
            for attr in [self.block_tables, self.large_k_cache, self.large_v_cache, self.sequence_lengths]
        )
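The shapes in the docstring suggest a paged KV cache, where each row of `block_tables` lists the physical blocks that hold one sequence. The sketch below illustrates that lookup with plain Python lists so it runs without torch. The helper names `locate_token` and `blocks_in_use`, the `-1` padding value, and the toy numbers are all assumptions made for illustration; they are not taken from the ColossalAI source.

```python
from typing import List, Tuple

BLOCK_SIZE = 4  # assumed block_size for this toy example

# Toy batch of 2 sequences with at most 3 blocks each:
# shape [num_seqs, max_blocks_per_seq]. -1 marks an unused slot (assumed convention).
block_tables: List[List[int]] = [[7, 2, -1], [0, 5, 9]]
# Shape [num_seqs]: number of tokens currently cached per sequence.
sequence_lengths: List[int] = [6, 10]


def locate_token(block_table_row: List[int], position: int, block_size: int) -> Tuple[int, int]:
    """Map a logical token position to (physical_block_id, offset_in_block)."""
    logical_block = position // block_size
    return block_table_row[logical_block], position % block_size


def blocks_in_use(seq_idx: int) -> List[int]:
    """Return the physical blocks that actually hold tokens for one sequence."""
    n_blocks = -(-sequence_lengths[seq_idx] // BLOCK_SIZE)  # ceiling division
    return block_tables[seq_idx][:n_blocks]


# Token 5 of sequence 0 lives in the second logical block, at offset 1.
print(locate_token(block_tables[0], 5, BLOCK_SIZE))
print(blocks_in_use(0), blocks_in_use(1))
```

With a cache laid out as [num_blocks, num_kv_heads, block_size, head_size], the returned pair would index the first and third dimensions respectively.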

Callers 1

steps_spec_dec · Method · 0.90
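A caller would presumably check `glimpse_ready` before handing the pack to a Glide model. The sketch below shows that guard using a stand-in copy of the dataclass with untyped fields, so it runs without torch or ColossalAI. `GlideInputStandIn` and `choose_path` are hypothetical names, and the guard is an assumption about usage, not the body of `steps_spec_dec`.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class GlideInputStandIn:
    """Stand-in mirroring GlideInput's fields; tensors are replaced by Any."""

    block_tables: Optional[Any] = None
    large_k_cache: Optional[Any] = None
    large_v_cache: Optional[Any] = None
    sequence_lengths: Optional[Any] = None
    n_spec_tokens: int = 5

    @property
    def glimpse_ready(self) -> bool:
        # True only when all four cache-related fields are set.
        # n_spec_tokens always has a value, so it is not part of the check.
        return all(
            attr is not None
            for attr in [self.block_tables, self.large_k_cache, self.large_v_cache, self.sequence_lengths]
        )


def choose_path(pack: GlideInputStandIn) -> str:
    """Hypothetical guard: glimpse only when the pack is fully populated."""
    return "glimpse" if pack.glimpse_ready else "skip"


empty = GlideInputStandIn()
partial = GlideInputStandIn(block_tables=[[0]], sequence_lengths=[1])
full = GlideInputStandIn(
    block_tables=[[0]],
    large_k_cache=[[[[0.0]]]],
    large_v_cache=[[[[0.0]]]],
    sequence_lengths=[1],
)

print(choose_path(empty), choose_path(partial), choose_path(full))
```

Because the check is `is not None`, a field set to an empty container still counts as populated; validating shapes would be a separate step.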

Calls

no outgoing calls

Tested by

no test coverage detected
