Class GlideInput

colossalai/inference/spec/struct.py:33–56

Dataclass for Glide models (e.g. `colossalai/inference/modeling/models/glide_llama.py`). It packs the data used when glimpsing the KV caches of the main model: the block table, the blocked key and value caches, and the sequence lengths of the current batch.
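The docstring gives `block_tables` the shape [num_seqs, max_blocks_per_seq] and the key cache the shape [num_blocks, num_kv_heads, block_size, head_size]. The sketch below illustrates how a block table is conventionally used to find a token in a blocked cache. It is a plain-Python illustration, not ColossalAI code: the helper `locate_token`, the sample values, and the use of -1 for unused slots are all assumptions made for the example.

```python
from typing import List, Tuple


def locate_token(block_table_row: List[int], position: int, block_size: int) -> Tuple[int, int]:
    """Map a token position within one sequence to (physical_block_id, offset_in_block).

    Hypothetical helper: assumes the usual paged-cache convention in which
    logical block i of a sequence is stored at block_table_row[i].
    """
    logical_block = position // block_size  # which block of the sequence
    offset = position % block_size          # slot inside that block
    return block_table_row[logical_block], offset


# Hypothetical block table for 2 sequences with up to 3 blocks each.
# -1 marks an unused slot (an assumption for this example).
block_tables = [[7, 2, -1], [4, -1, -1]]
block_size = 4
sequence_lengths = [6, 3]

# Locate the last token of each sequence in the blocked cache.
last_positions = [
    locate_token(block_tables[i], sequence_lengths[i] - 1, block_size)
    for i in range(len(sequence_lengths))
]
```

Under these assumptions, the first index of each resulting pair would select a block along the `num_blocks` dimension of the cache and the second a slot along `block_size`.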

@dataclass
class GlideInput:
    """Dataclass for Glide Models (e.g. `colossalai/inference/modeling/models/glide_llama.py`).
    Used for pack data that will be used during glimpsing KV Caches of the main model.

    Args:
        block_tables (torch.Tensor): [num_seqs, max_blocks_per_seq] The block table of KV Caches.
        large_k_cache (torch.Tensor): [num_blocks, num_kv_heads, block_size, head_size]
            Blocked key cache of the main model
        large_v_cache (torch.Tensor): Blocked value cache of the main model. It has the same shape as k cache.
        sequence_lengths (torch.Tensor): [num_seqs] Sequence lengths of the current batch.
    """

    block_tables: torch.Tensor = None
    large_k_cache: torch.Tensor = None
    large_v_cache: torch.Tensor = None
    sequence_lengths: torch.Tensor = None
    n_spec_tokens: int = 5

    @property
    def glimpse_ready(self):
        return all(
            attr is not None
            for attr in [self.block_tables, self.large_k_cache, self.large_v_cache, self.sequence_lengths]
        )
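The `glimpse_ready` property can be exercised without ColossalAI or PyTorch installed. The sketch below uses a local stand-in, `GlideInputSketch`, that mirrors the fields and the property of `GlideInput`. The stand-in class and the placeholder values are assumptions for illustration only; in real use the four cache-related fields would be `torch.Tensor` objects.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class GlideInputSketch:
    """Local stand-in mirroring GlideInput; plain objects replace torch.Tensor."""

    block_tables: Optional[Any] = None
    large_k_cache: Optional[Any] = None
    large_v_cache: Optional[Any] = None
    sequence_lengths: Optional[Any] = None
    n_spec_tokens: int = 5

    @property
    def glimpse_ready(self) -> bool:
        # True only when every cache-related field has been set.
        return all(
            attr is not None
            for attr in [self.block_tables, self.large_k_cache, self.large_v_cache, self.sequence_lengths]
        )


# Nothing set: not ready.
empty = GlideInputSketch()

# Only two of the four fields set: still not ready.
partial = GlideInputSketch(block_tables=[[0, 1]], sequence_lengths=[5])

# All four fields set (placeholder objects stand in for the caches): ready.
full = GlideInputSketch(
    block_tables=[[0, 1]],
    large_k_cache=object(),
    large_v_cache=object(),
    sequence_lengths=[5],
)

print(empty.glimpse_ready, partial.glimpse_ready, full.glimpse_ready)
```

The property compares each field with `is not None` instead of relying on truthiness, which matters for tensors: calling `bool()` on a multi-element `torch.Tensor` raises an error. Note that `n_spec_tokens` is not part of the readiness check.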

Callers 1

steps_spec_dec · Method · 0.90

Calls

no outgoing calls

Tested by

no test coverage detected
