hub / github.com/InternLM/lmdeploy / cache_block_seq_len

Method cache_block_seq_len

lmdeploy/cli/utils.py:553–564

Add argument cache_block_seq_len to parser.

(parser)

Source from the content-addressed store, hash-verified

    @staticmethod
    def cache_block_seq_len(parser):
        """Add argument cache_block_seq_len to parser."""

        return parser.add_argument('--cache-block-seq-len',
                                   type=int,
                                   default=64,
                                   help='The length of the token sequence in a k/v block. '
                                   'For Turbomind Engine, if the GPU compute capability '
                                   'is >= 8.0, it should be a multiple of 32, otherwise '
                                   'it should be a multiple of 64. For Pytorch Engine, '
                                   'if Lora Adapter is specified, this parameter will '
                                   'be ignored')

    @staticmethod
    def kernel_block_size(parser):
        ...  # body not shown in this excerpt
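The helper above can be exercised in isolation. The following is a minimal sketch, not lmdeploy's own code. It copies the argument registration into a free function so it runs without lmdeploy installed, and adds a hypothetical check, `is_valid_turbomind_block_len`, that mirrors the rule stated in the help text. The check, its name, and the use of a float for the GPU compute capability are assumptions made for illustration. The listing does not show whether or where lmdeploy enforces this rule.

```python
import argparse


def cache_block_seq_len(parser):
    """Stand-in for the helper above, copied so the sketch runs without lmdeploy."""
    return parser.add_argument('--cache-block-seq-len',
                               type=int,
                               default=64,
                               help='The length of the token sequence in a k/v block.')


def is_valid_turbomind_block_len(block_len, compute_capability):
    """Hypothetical check derived from the help text (not an lmdeploy API).

    Compute capability >= 8.0 requires a multiple of 32; anything lower
    requires a multiple of 64.
    """
    required_multiple = 32 if compute_capability >= 8.0 else 64
    return block_len > 0 and block_len % required_multiple == 0


parser = argparse.ArgumentParser(prog='demo')
action = cache_block_seq_len(parser)  # add_argument returns the Action it created

default_args = parser.parse_args([])
custom_args = parser.parse_args(['--cache-block-seq-len', '96'])

# argparse turns the dashes in the flag into underscores for the attribute name.
print(action.dest)                       # cache_block_seq_len
print(default_args.cache_block_seq_len)  # 64
print(custom_args.cache_block_seq_len)   # 96

# 96 is a multiple of 32 but not of 64, so it only passes on capability >= 8.0.
print(is_valid_turbomind_block_len(custom_args.cache_block_seq_len, 8.0))  # True
print(is_valid_turbomind_block_len(custom_args.cache_block_seq_len, 7.5))  # False
```

Because the helper returns the result of `parser.add_argument`, a caller can keep the returned `Action` object to inspect or adjust the argument (for example its `default`) after registration.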

Callers 4

add_parser_api_server · Method · 0.80
parse_args · Function · 0.80
parse_args · Function · 0.80
parse_args · Function · 0.80

Calls

no outgoing calls

Tested by

no test coverage detected