hub / github.com/lm-sys/FastChat / check_length

Function check_length

fastchat/serve/openai_api_server.py:158–180
(request, prompt, max_tokens, worker_addr)

Source from the content-addressed store, hash-verified

async def check_length(request, prompt, max_tokens, worker_addr):
    if (
        not isinstance(max_tokens, int) or max_tokens <= 0
    ):  # model worker not support max_tokens=None
        max_tokens = 1024 * 1024

    context_len = await fetch_remote(
        worker_addr + "/model_details", {"model": request.model}, "context_length"
    )
    token_num = await fetch_remote(
        worker_addr + "/count_token",
        {"model": request.model, "prompt": prompt},
        "count",
    )
    length = min(max_tokens, context_len - token_num)

    if length <= 0:
        return None, create_error_response(
            ErrorCode.CONTEXT_OVERFLOW,
            f"This model's maximum context length is {context_len} tokens. However, your messages resulted in {token_num} tokens. Please reduce the length of the messages.",
        )

    return length, None
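
The length calculation in check_length can be tried in isolation. The following is a minimal sketch, not code from FastChat: remaining_budget and FALLBACK_MAX_TOKENS are hypothetical names, and the two fetch_remote lookups are replaced by plain integer arguments so the arithmetic runs without a model worker.

```python
from typing import Optional

# Same fallback value the source assigns when max_tokens is unusable.
FALLBACK_MAX_TOKENS = 1024 * 1024


def remaining_budget(max_tokens, context_len: int, token_num: int) -> Optional[int]:
    """Hypothetical helper mirroring only the arithmetic of check_length.

    Returns the usable generation length, or None when the prompt
    leaves no room in the context window.
    """
    # Non-integer or non-positive values fall back to a very large cap,
    # so the remaining context becomes the effective limit.
    if not isinstance(max_tokens, int) or max_tokens <= 0:
        max_tokens = FALLBACK_MAX_TOKENS
    length = min(max_tokens, context_len - token_num)
    return length if length > 0 else None


print(remaining_budget(256, 4096, 1000))   # 256: the requested cap is the smaller value
print(remaining_budget(None, 4096, 1000))  # 3096: the remaining context is the smaller value
print(remaining_budget(256, 4096, 4096))   # None: the prompt already fills the context
```

In the real function the None case is paired with an error response built by create_error_response, and the success case is paired with None, so callers always receive a two-element tuple.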

Callers 2

create_chat_completion · Function · 0.85
create_completion · Function · 0.85
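
Because check_length returns a (length, error) pair, a caller would typically unpack both values and return early when the error is set. The sketch below illustrates that pattern under stated assumptions: check_length_stub and handle_request are hypothetical stand-ins, the context length of 2048 is an arbitrary example value, and a plain dict takes the place of the JSON error response. It is not the body of either listed caller.

```python
import asyncio
from typing import Optional, Tuple


async def check_length_stub(
    prompt_tokens: int, max_tokens: int, context_len: int = 2048
) -> Tuple[Optional[int], Optional[dict]]:
    """Stand-in with the same (length, error) return shape as check_length.

    Uses hard-coded numbers instead of asking a model worker.
    """
    length = min(max_tokens, context_len - prompt_tokens)
    if length <= 0:
        return None, {"error": "context_overflow"}
    return length, None


async def handle_request(prompt_tokens: int, max_tokens: int) -> dict:
    """Hypothetical caller: unpack the pair and return early on error."""
    max_new_tokens, error = await check_length_stub(prompt_tokens, max_tokens)
    if error is not None:
        return error
    return {"max_new_tokens": max_new_tokens}


print(asyncio.run(handle_request(100, 512)))   # {'max_new_tokens': 512}
print(asyncio.run(handle_request(3000, 512)))  # {'error': 'context_overflow'}
```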

Calls 2

fetch_remote · Function · 0.85
create_error_response · Function · 0.85

Tested by

no test coverage detected
