hub / github.com/lm-sys/FastChat / get_model_answers

Function get_model_answers

fastchat/llm_judge/gen_model_answer.py:74–190

(
    model_path,
    model_id,
    questions,
    answer_file,
    max_new_token,
    num_choices,
    num_gpus_per_model,
    max_gpu_memory,
    dtype,
    revision,
)
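
The body shown further down reads two keys from each entry of `questions`: `category` and `turns`. A minimal sketch of a record with that shape follows. The helper name `check_question` and the sample values are hypothetical, and real question files may carry additional fields.

```python
def check_question(question):
    """Verify the two keys the listing reads: a category and a list of turns."""
    if not isinstance(question.get("category"), str):
        return False
    turns = question.get("turns")
    return isinstance(turns, list) and all(isinstance(t, str) for t in turns)


# Hypothetical two-turn record, for illustration only.
sample = {"category": "writing", "turns": ["Draft a haiku.", "Now translate it."]}
print(check_question(sample))
```

A check like this could run before model loading, so that a malformed record fails early rather than partway through generation.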

Source

@torch.inference_mode()
def get_model_answers(
    model_path,
    model_id,
    questions,
    answer_file,
    max_new_token,
    num_choices,
    num_gpus_per_model,
    max_gpu_memory,
    dtype,
    revision,
):
    model, tokenizer = load_model(
        model_path,
        revision=revision,
        device="cuda",
        num_gpus=num_gpus_per_model,
        max_gpu_memory=max_gpu_memory,
        dtype=dtype,
        load_8bit=False,
        cpu_offloading=False,
        debug=False,
    )

    for question in tqdm(questions):
        if question["category"] in temperature_config:
            temperature = temperature_config[question["category"]]
        else:
            temperature = 0.7

        choices = []
        for i in range(num_choices):
            torch.manual_seed(i)
            conv = get_conversation_template(model_id)
            turns = []
            for j in range(len(question["turns"])):
                qs = question["turns"][j]
                conv.append_message(conv.roles[0], qs)
                conv.append_message(conv.roles[1], None)
                prompt = conv.get_prompt()
                input_ids = tokenizer([prompt]).input_ids

                if temperature < 1e-4:
                    do_sample = False
                else:
                    do_sample = True

                # some models may error out when generating long outputs
                try:
                    output_ids = model.generate(
                        torch.as_tensor(input_ids).cuda(),
                        do_sample=do_sample,
                        temperature=temperature,
                        max_new_tokens=max_new_token,
                    )
                    if model.config.is_encoder_decoder:
                        output_ids = output_ids[0]
                    else:
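
The per-question sampling setup in the listing can be isolated as a small helper. The sketch below is illustrative only: the helper name `sampling_params` is hypothetical, and so are the contents of `temperature_config` here, since the real table is defined elsewhere in `gen_model_answer.py` and is not shown on this page. Only the 0.7 fallback and the `1e-4` threshold are taken from the listing.

```python
# Hypothetical category-to-temperature table; the values are illustrative only.
temperature_config = {"math": 0.0, "writing": 0.7}


def sampling_params(question, config=temperature_config, default=0.7):
    """Return (temperature, do_sample) the way the listing derives them."""
    # Categories missing from the table fall back to the default temperature.
    temperature = config.get(question["category"], default)
    # A temperature below 1e-4 switches sampling off (greedy decoding).
    do_sample = temperature >= 1e-4
    return temperature, do_sample


print(sampling_params({"category": "math"}))     # (0.0, False)
print(sampling_params({"category": "unknown"}))  # (0.7, True)
```

Turning sampling off for near-zero temperatures avoids passing a zero temperature into a sampling step, which is presumably why the listing branches on the threshold instead of always sampling.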

Callers

nothing calls this directly

Calls 7

load_model · Function · 0.90
get_conversation_template · Function · 0.90
append_message · Method · 0.80
get_prompt · Method · 0.80
update_last_message · Method · 0.80
write · Method · 0.80
generate · Method · 0.45
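
The conversation calls listed above (`append_message`, `get_prompt`, `update_last_message`) suggest a multi-turn loop in which each reply is written back into the conversation before the next prompt is built. The sketch below imitates that flow with a hypothetical `StubConversation` class and a caller-supplied `generate` function. It is not FastChat's template class, its prompt format is invented for illustration, and the placement of `update_last_message` is an assumption, because that part of the function body is not shown on this page.

```python
class StubConversation:
    """Hypothetical stand-in for a conversation template (not FastChat's class)."""

    roles = ("USER", "ASSISTANT")

    def __init__(self):
        self.messages = []

    def append_message(self, role, message):
        self.messages.append([role, message])

    def update_last_message(self, message):
        self.messages[-1][1] = message

    def get_prompt(self):
        # A message of None renders as an open slot for the model to fill.
        lines = []
        for role, message in self.messages:
            lines.append(f"{role}:" if message is None else f"{role}: {message}")
        return "\n".join(lines)


def run_turns(turns, generate):
    """Feed each turn to `generate`, carrying earlier replies in the prompt."""
    conv = StubConversation()
    outputs = []
    for qs in turns:
        conv.append_message(conv.roles[0], qs)
        conv.append_message(conv.roles[1], None)  # open slot for the reply
        output = generate(conv.get_prompt())
        conv.update_last_message(output)  # assumed: fill the slot before the next turn
        outputs.append(output)
    return outputs, conv.get_prompt()


# Toy generator that reports how many user turns the prompt contains.
outputs, transcript = run_turns(["a", "b"], lambda p: f"reply {p.count('USER:')}")
print(outputs)
print(transcript)
```

Because the second prompt already contains the first reply, the toy generator sees two user turns on the second call, which is the behaviour a multi-turn benchmark relies on.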

Tested by

no test coverage detected
