hub / github.com/NVIDIA/TensorRT-LLM / compute

Function compute

cpp/kernels/xqa/ref.py:156–163
(q, k, v, kvScale, headElems)

Source from the content-addressed store, hash-verified

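import numpy as np

def compute(q, k, v, kvScale, headElems):
    # Combine the 1/sqrt(headElems) attention scaling with kvScale.
    qkScale = (headElems**-0.5) * kvScale
    qk = q @ k.T * qkScale
    # Subtract the per-row maximum before exponentiating, for numerical stability.
    row_max = np.max(qk, axis=1).reshape(-1, 1)
    x = np.exp(qk - row_max)
    row_sum = np.sum(x, axis=1).reshape(-1, 1)
    # Normalize by row_sum and scale by kvScale. The result has to be assigned;
    # as a bare expression it is discarded and the raw exponentials are returned.
    x = x @ v * (kvScale / row_sum)
    return x, row_max, row_sum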
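
A minimal usage sketch follows. It assumes the first return value is meant to be the normalized attention output, and that kvScale acts as a single scalar scale applied to both k and v (for example a dequantization factor). Neither point is stated on this page. The helper naive_attention, the array shapes, and the value of kvScale are hypothetical and exist only to check the arithmetic against a textbook softmax.

```python
import numpy as np

def compute(q, k, v, kvScale, headElems):
    # Same steps as the listing, with the weighted sum assigned to `out`
    # (assumption: the attention output is the intended first return value).
    qkScale = (headElems ** -0.5) * kvScale
    qk = q @ k.T * qkScale
    row_max = np.max(qk, axis=1).reshape(-1, 1)
    x = np.exp(qk - row_max)
    row_sum = np.sum(x, axis=1).reshape(-1, 1)
    out = x @ v * (kvScale / row_sum)
    return out, row_max, row_sum

def naive_attention(q, k, v, kvScale, headElems):
    # Hypothetical helper: plain softmax attention on k and v scaled by kvScale.
    scores = q @ (k * kvScale).T / np.sqrt(headElems)
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs @ (v * kvScale)

rng = np.random.default_rng(0)
headElems = 8                             # illustrative head dimension
q = rng.standard_normal((2, headElems))   # 2 query rows
k = rng.standard_normal((5, headElems))   # 5 key rows
v = rng.standard_normal((5, headElems))   # 5 value rows
kvScale = 0.5                             # illustrative scale

out, row_max, row_sum = compute(q, k, v, kvScale, headElems)
ref = naive_attention(q, k, v, kvScale, headElems)
print(np.allclose(out, ref))
```

Under these assumptions the two results agree to floating-point tolerance. row_max and row_sum come back as column vectors with one entry per query row, so they broadcast against the rows of the output.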

Callers

nothing calls this directly

Calls 2

sumMethod · 0.80
maxMethod · 0.45

Tested by

no test coverage detected