hub / github.com/lm-sys/FastChat / get_model_answers

Function get_model_answers

fastchat/llm_judge/gen_model_answer.py:74–190
(
    model_path,
    model_id,
    questions,
    answer_file,
    max_new_token,
    num_choices,
    num_gpus_per_model,
    max_gpu_memory,
    dtype,
    revision,
)

Source from the content-addressed store, hash-verified

@torch.inference_mode()
def get_model_answers(
    model_path,
    model_id,
    questions,
    answer_file,
    max_new_token,
    num_choices,
    num_gpus_per_model,
    max_gpu_memory,
    dtype,
    revision,
):
    model, tokenizer = load_model(
        model_path,
        revision=revision,
        device="cuda",
        num_gpus=num_gpus_per_model,
        max_gpu_memory=max_gpu_memory,
        dtype=dtype,
        load_8bit=False,
        cpu_offloading=False,
        debug=False,
    )

    for question in tqdm(questions):
        if question["category"] in temperature_config:
            temperature = temperature_config[question["category"]]
        else:
            temperature = 0.7

        choices = []
        for i in range(num_choices):
            torch.manual_seed(i)
            conv = get_conversation_template(model_id)
            turns = []
            for j in range(len(question["turns"])):
                qs = question["turns"][j]
                conv.append_message(conv.roles[0], qs)
                conv.append_message(conv.roles[1], None)
                prompt = conv.get_prompt()
                input_ids = tokenizer([prompt]).input_ids

                if temperature < 1e-4:
                    do_sample = False
                else:
                    do_sample = True

                # some models may error out when generating long outputs
                try:
                    output_ids = model.generate(
                        torch.as_tensor(input_ids).cuda(),
                        do_sample=do_sample,
                        temperature=temperature,
                        max_new_tokens=max_new_token,
                    )
                    if model.config.is_encoder_decoder:
                        output_ids = output_ids[0]
                    else:
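
The per-question sampling setup in the listing (category lookup with a 0.7 fallback, then greedy decoding when the temperature is effectively zero) can be isolated as a small pure function. This is a minimal sketch, not part of FastChat: `pick_sampling` and `EXAMPLE_TEMPERATURE_CONFIG` are hypothetical names, and the category values are illustrative because the real `temperature_config` is defined outside the excerpt shown here.

```python
# Hypothetical per-category temperatures for illustration only; the real
# temperature_config is defined elsewhere in the module and is not shown above.
EXAMPLE_TEMPERATURE_CONFIG = {"math": 0.0, "writing": 0.7}

DEFAULT_TEMPERATURE = 0.7  # fallback used in the listing for unknown categories
GREEDY_THRESHOLD = 1e-4    # below this, the listing sets do_sample = False


def pick_sampling(category, temperature_config):
    """Return (temperature, do_sample) for a question category."""
    # dict.get mirrors the listing's "if category in config ... else 0.7" branch.
    temperature = temperature_config.get(category, DEFAULT_TEMPERATURE)
    # Equivalent to the listing's if/else on temperature < 1e-4.
    do_sample = temperature >= GREEDY_THRESHOLD
    return temperature, do_sample


if __name__ == "__main__":
    for category in ("math", "writing", "unknown"):
        print(category, pick_sampling(category, EXAMPLE_TEMPERATURE_CONFIG))
```

Under these assumed values, a category mapped to 0.0 is decoded greedily, while any category missing from the mapping falls back to sampling at 0.7.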

Callers

nothing calls this directly

Calls 7

load_model · Function · 0.90
append_message · Method · 0.80
get_prompt · Method · 0.80
update_last_message · Method · 0.80
write · Method · 0.80
generate · Method · 0.45
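
The conversation methods in this call list (`append_message`, `get_prompt`, `update_last_message`) suggest how a multi-turn prompt accumulates: each turn appends the user message plus an empty assistant slot, renders the prompt, and later fills the slot. The sketch below uses a hypothetical `ToyConversation` stand-in, not FastChat's actual conversation class, and its prompt format is invented. The write-back via `update_last_message` is an assumption, since that part of the function lies outside the excerpt shown here.

```python
class ToyConversation:
    """Hypothetical stand-in for a conversation template (not FastChat's class)."""

    def __init__(self, roles=("USER", "ASSISTANT"), sep="\n"):
        self.roles = roles
        self.sep = sep
        self.messages = []  # each entry is [role, text or None]

    def append_message(self, role, message):
        self.messages.append([role, message])

    def update_last_message(self, message):
        self.messages[-1][1] = message

    def get_prompt(self):
        parts = []
        for role, message in self.messages:
            # A None message leaves the role tag open for the model to complete.
            parts.append(f"{role}:" if message is None else f"{role}: {message}")
        return self.sep.join(parts)


def build_prompts(turns, fake_answer):
    """Collect the prompt that would be sent to the model at each turn."""
    conv = ToyConversation()
    prompts = []
    for qs in turns:
        conv.append_message(conv.roles[0], qs)
        conv.append_message(conv.roles[1], None)
        prompts.append(conv.get_prompt())
        # Assumption: the generated answer is written back into the open slot
        # so the next turn's prompt includes it.
        conv.update_last_message(fake_answer(qs))
    return prompts


if __name__ == "__main__":
    for p in build_prompts(["Hi", "More"], lambda q: q.upper()):
        print(repr(p))
```

Here `fake_answer` stands in for the `model.generate` call and decoding step, so the sketch runs without a GPU or model weights.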

Tested by

no test coverage detected
