hub / github.com/hpcaitech/ColossalAI / all_to_all_single_fp8

Function all_to_all_single_fp8

colossalai/quantization/fp8.py:258–282

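Wrapper around _all_to_all_single_fp8. If process_group_is_intranode(group) is true, the call goes straight to dist.all_to_all_single and fp8_format is not used; otherwise all arguments, including fp8_format, are forwarded to _all_to_all_single_fp8. Returns Optional[Handle].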

(
    output, input, output_split_sizes=None, input_split_sizes=None, fp8_format="e5m2", group=None, async_op=False
)
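
The two split-size parameters are passed through unchanged on both branches, so they follow the convention of torch.distributed.all_to_all_single: each rank cuts its flat input into one chunk per destination rank, and each rank's output is the chunks addressed to it, ordered by source rank. The single-process simulation below is only a sketch of that convention. The name simulate_all_to_all_single and the list-of-lists representation are hypothetical and are not part of ColossalAI or PyTorch.

```python
from typing import List, Optional, Sequence


def simulate_all_to_all_single(
    inputs: Sequence[List[float]],
    input_split_sizes: Optional[Sequence[Sequence[int]]] = None,
) -> List[List[float]]:
    """Simulate an all-to-all-single exchange over len(inputs) ranks.

    inputs[i] is rank i's flat input buffer. input_split_sizes[i][j] is the
    number of elements rank i sends to rank j; None means an even split.
    (Hypothetical helper for illustration only.)
    """
    world_size = len(inputs)
    outputs: List[List[float]] = [[] for _ in range(world_size)]
    for src, buf in enumerate(inputs):
        if input_split_sizes is None:
            if len(buf) % world_size != 0:
                raise ValueError("even split needs a length divisible by world size")
            sizes = [len(buf) // world_size] * world_size
        else:
            sizes = list(input_split_sizes[src])
            if sum(sizes) != len(buf):
                raise ValueError("split sizes must sum to the input length")
        offset = 0
        for dst, n in enumerate(sizes):
            # Chunk number `dst` of rank `src` is appended to rank `dst`'s output.
            # Sources are visited in rank order, so each output is ordered by source.
            outputs[dst].extend(buf[offset:offset + n])
            offset += n
    return outputs


even = simulate_all_to_all_single([[0, 1], [10, 11]])
uneven = simulate_all_to_all_single([[0, 1, 2], [10]], [[1, 2], [1, 0]])
```

In the uneven case rank 0 receives one element from each rank and rank 1 receives two elements from rank 0 and none from itself, so the matching output_split_sizes would be [1, 1] on rank 0 and [2, 0] on rank 1.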

Source from the content-addressed store, hash-verified

def all_to_all_single_fp8(
    output, input, output_split_sizes=None, input_split_sizes=None, fp8_format="e5m2", group=None, async_op=False
) -> Optional[Handle]:
    r"""
    This is wrapper for _all_to_all_single_fp8.
    """
    if process_group_is_intranode(group):
        return dist.all_to_all_single(
            output,
            input,
            output_split_sizes=output_split_sizes,
            input_split_sizes=input_split_sizes,
            group=group,
            async_op=async_op,
        )
    else:
        return _all_to_all_single_fp8(
            output,
            input,
            fp8_format=fp8_format,
            output_split_sizes=output_split_sizes,
            input_split_sizes=input_split_sizes,
            group=group,
            async_op=async_op,
        )
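
The practical effect of the branch in the listing above is that an intranode group exchanges the tensor unchanged, while any other group sends it through an 8-bit representation, which loses precision. The sketch below illustrates that difference on plain Python floats. It is not the library's implementation: e5m2_truncate and exchange_payload are hypothetical names, the conversion truncates instead of rounding, and no scaling is applied, whereas the body of _all_to_all_single_fp8 is not shown here and may do either. It relies only on the fact that e5m2 (1 sign, 5 exponent, 2 mantissa bits) has the same layout as the high byte of an IEEE float16.

```python
import struct
from typing import List, Sequence


def e5m2_truncate(x: float) -> float:
    """Round-trip x through an e5m2-like byte by truncating a float16.

    Hypothetical helper: keeps the sign, the 5 exponent bits and the top
    2 mantissa bits of the float16 encoding, and zeroes the remaining 8 bits.
    """
    high_byte = struct.pack(">e", x)[0]  # big-endian, so byte 0 is the high byte
    return struct.unpack(">e", bytes([high_byte, 0]))[0]


def exchange_payload(values: Sequence[float], intranode: bool) -> List[float]:
    """Return what a receiver would see under each branch (illustration only)."""
    if intranode:
        # Mirrors the dist.all_to_all_single branch: data is sent as-is.
        return list(values)
    # Stands in for the FP8 branch: every value passes through 8 bits.
    return [e5m2_truncate(v) for v in values]


same_node = exchange_payload([1.0, 1.3, 0.1], intranode=True)
cross_node = exchange_payload([1.0, 1.3, 0.1], intranode=False)
```

Here same_node is identical to the input, while cross_node is [1.0, 1.25, 0.09375]: with two mantissa bits, only values such as 1.0, 1.25, 1.5 and 1.75 (times a power of two) survive the round trip.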

Callers 5

_all_to_all · Function · 0.90
_all_to_all_single · Function · 0.90
check_4gpu · Function · 0.90
check_all2all · Function · 0.90
check_all2all_uneven · Function · 0.90

Calls 2

_all_to_all_single_fp8 · Function · 0.85

Tested by 3

check_4gpu · Function · 0.72
check_all2all · Function · 0.72
check_all2all_uneven · Function · 0.72
