```python
from dataclasses import dataclass
from typing import Optional

import torch


@dataclass
class GlideInput:
    """Dataclass for Glide Models (e.g. `colossalai/inference/modeling/models/glide_llama.py`).
    Packs the data needed to glimpse the KV caches of the main model.

    Args:
        block_tables (torch.Tensor): [num_seqs, max_blocks_per_seq] The block table of the KV caches.
        large_k_cache (torch.Tensor): [num_blocks, num_kv_heads, block_size, head_size]
            Blocked key cache of the main model.
        large_v_cache (torch.Tensor): Blocked value cache of the main model. It has the same shape as the key cache.
        sequence_lengths (torch.Tensor): [num_seqs] Sequence lengths of the current batch.
        n_spec_tokens (int): Number of speculative tokens. Defaults to 5.
    """

    # Every tensor field defaults to None, so an instance can be created empty and filled in later.
    block_tables: Optional[torch.Tensor] = None
    large_k_cache: Optional[torch.Tensor] = None
    large_v_cache: Optional[torch.Tensor] = None
    sequence_lengths: Optional[torch.Tensor] = None
    n_spec_tokens: int = 5

    @property
    def glimpse_ready(self) -> bool:
        # True only when all four tensor fields have been set.
        return all(
            attr is not None
            for attr in [self.block_tables, self.large_k_cache, self.large_v_cache, self.sequence_lengths]
        )
```
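The sketch below illustrates how `glimpse_ready` behaves as fields are filled in. To keep it runnable without PyTorch, it uses a hypothetical stand-in class, `GlideInputSketch`, with plain Python lists in place of tensors; the class name and the sample values are assumptions made for illustration and are not part of the library.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class GlideInputSketch:
    """Torch-free stand-in for GlideInput (hypothetical, for illustration only)."""

    block_tables: Optional[Any] = None
    large_k_cache: Optional[Any] = None
    large_v_cache: Optional[Any] = None
    sequence_lengths: Optional[Any] = None
    n_spec_tokens: int = 5

    @property
    def glimpse_ready(self) -> bool:
        # Same check as in the listing: every cache-related field must be set.
        return all(
            attr is not None
            for attr in [self.block_tables, self.large_k_cache, self.large_v_cache, self.sequence_lengths]
        )


# Nothing set yet: not ready.
empty = GlideInputSketch()

# Only two of the four fields set: still not ready.
partial = GlideInputSketch(block_tables=[[0, 1]], sequence_lengths=[7])

# All four fields set: ready. An empty list is not None, so it still counts as set.
full = GlideInputSketch(
    block_tables=[[0, 1]],
    large_k_cache=[],
    large_v_cache=[],
    sequence_lengths=[7],
)

print(empty.glimpse_ready, partial.glimpse_ready, full.glimpse_ready)
```

The property compares each field with `is not None` instead of relying on truthiness. With real tensors this matters: calling `bool()` on a `torch.Tensor` that holds more than one element raises a `RuntimeError`, so a plain `all([...])` over the tensors themselves would fail.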