github.com/evalplus/evalplus

Function script

evalplus/perf/select_pe_inputs.py:15–116
(solutions: str, output_profiled_solutions: str, pe_inputs: str = None)
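All three path arguments name JSON Lines files (one JSON object per line), and the assertions at the start of `script` enforce this through the `.jsonl` suffix. The sketch below mirrors how the function indexes the solutions file and the optional performance-exercising inputs file by `task_id`. Only the keys (`task_id`, `solution`, `inputs`) come from the code. The sample records, file names, and the helpers `write_jsonl` and `load_by_task_id` are hypothetical.

```python
import json
import os
import tempfile

# Hypothetical sample records; only the keys mirror what `script` reads.
solution_records = [
    {"task_id": "HumanEval/0", "solution": "def f(x):\n    return x\n"},
]
pe_input_records = [
    {"task_id": "HumanEval/0", "inputs": [[1], [2, 3]]},
]


def write_jsonl(path, records):
    """Write one JSON object per line (the JSON Lines format)."""
    with open(path, "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")


def load_by_task_id(path, value_key):
    """Index a JSONL file by task_id, keeping a single field per record."""
    assert path.endswith(".jsonl")  # same suffix check as `script`
    with open(path) as f:
        return {rec["task_id"]: rec[value_key] for rec in map(json.loads, f)}


tmp = tempfile.mkdtemp()
sol_path = os.path.join(tmp, "solutions.jsonl")
pe_path = os.path.join(tmp, "pe_inputs.jsonl")
write_jsonl(sol_path, solution_records)
write_jsonl(pe_path, pe_input_records)

solutions = load_by_task_id(sol_path, "solution")
inputs_dict = load_by_task_id(pe_path, "inputs")
print(inputs_dict)  # {'HumanEval/0': [[1], [2, 3]]}
```

The original builds the same two dictionaries inline. When `pe_inputs` is omitted, it falls back to each task's `base_input` plus `plus_input`.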

```python
import json
from statistics import median

from tqdm import tqdm

# The evalplus import paths below are assumed from the surrounding module
# (lines 1-14 of the source file are not part of this excerpt).
from evalplus.config import PERF_CURATE_TIMEOUT_SECOND
from evalplus.data import get_human_eval_plus, get_mbpp_plus
from evalplus.data.mbpp import mbpp_deserialize_inputs, mbpp_serialize_inputs
from evalplus.perf.profile import are_profiles_broken, profile


def script(solutions: str, output_profiled_solutions: str, pe_inputs: str = None):
    assert solutions.endswith(".jsonl")
    assert pe_inputs is None or pe_inputs.endswith(".jsonl")
    assert output_profiled_solutions.endswith(".jsonl")

    evalplus = get_human_eval_plus(noextreme=True)
    mbppplus = get_mbpp_plus(noextreme=True)
    tasks = {**evalplus, **mbppplus}

    # assume each line's format is: {
    #     "task_id": task's id,
    #     "inputs": a list of inputs,
    # }
    inputs_dict = None

    if pe_inputs is not None:
        print("Loading performance-exercising inputs...")
        with open(pe_inputs, "r") as f:
            inputs_dict = {
                task["task_id"]: task["inputs"] for l in f for task in [json.loads(l)]
            }

    # Notably, the solutions are already validated and cleaned.
    # `solutions` is rebound from the file path to a task_id -> solution dict.
    with open(solutions, "r") as f:
        solutions = {}
        for l in f:
            solution = json.loads(l)
            solutions[solution["task_id"]] = solution["solution"]

    for task_id, task in tqdm(tasks.items()):
        # Use the supplied performance-exercising inputs if any,
        # otherwise the task's own base and plus inputs.
        if inputs_dict:
            inputs = (
                mbpp_deserialize_inputs(task_id, inputs_dict[task_id])
                if "Mbpp/" in task_id
                else inputs_dict[task_id]
            )
        else:
            inputs = task["base_input"] + list(task["plus_input"])

        input_costs = []

        # HumanEval canonical solutions are bodies that need the prompt prepended.
        if task_id.startswith("HumanEval"):
            canonical_solution = task["prompt"] + task["canonical_solution"]
        else:
            canonical_solution = task["canonical_solution"]

        # Profile the canonical solution on one input at a time.
        for inp in inputs:
            costs = profile(
                canonical_solution,
                task["entry_point"],
                [inp],
                timeout_second_per_test=PERF_CURATE_TIMEOUT_SECOND,
            )
            if are_profiles_broken(costs):
                continue
            input_costs.append((median(costs), inp))
        # Most expensive inputs first.
        input_costs.sort(reverse=True, key=lambda x: x[0])

        for _, pe_input in input_costs:
            ...  # excerpt ends here; lines 73-116 of the source file are not shown
```
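Inside the task loop, the canonical solution is profiled on one input at a time, inputs with broken profiles are dropped, and the remaining inputs are ranked by median cost, most expensive first. Below is a minimal, self-contained sketch of that ranking step. `rank_inputs_by_cost` and `toy_profiler` are hypothetical stand-ins: evalplus's own `profile` and `are_profiles_broken` are not reproduced here, and a broken profile is modeled simply as `None` or an empty list.

```python
from statistics import median
from typing import Any, Callable, List, Optional, Sequence, Tuple

# Hypothetical profiler type: one cost per repetition, or None if the run failed.
Profiler = Callable[[Any], Optional[List[float]]]


def rank_inputs_by_cost(
    inputs: Sequence[Any], profiler: Profiler
) -> List[Tuple[float, Any]]:
    """Return (median cost, input) pairs, most expensive first.

    Inputs whose profiling failed are skipped, mirroring the
    broken-profile `continue` in `script`.
    """
    input_costs = []
    for inp in inputs:
        costs = profiler(inp)
        if not costs:  # treat a failed or empty profile as broken
            continue
        input_costs.append((median(costs), inp))
    # Sort on cost only; Python's sort stays stable with reverse=True,
    # so equal-cost inputs keep their original order.
    input_costs.sort(reverse=True, key=lambda x: x[0])
    return input_costs


def toy_profiler(n):
    """Hypothetical profiler: quadratic cost with small per-run noise."""
    if n < 0:
        return None  # simulate a broken run
    base = n * n
    return [base, base + 1, base - 1]


ranked = rank_inputs_by_cost([3, -1, 10, 5], toy_profiler)
print(ranked)  # [(100, 10), (25, 5), (9, 3)]
```

Ranking by the median rather than the mean makes the order less sensitive to a single outlying repetition. How many of the ranked inputs the original function keeps is decided in the part of the listing that this excerpt does not show.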

Callers

nothing calls this directly

Calls (6)

get_human_eval_plus (function)
get_mbpp_plus (function)
mbpp_deserialize_inputs (function)
profile (function)
are_profiles_broken (function)
mbpp_serialize_inputs (function)

Tested by

no test coverage detected