Function script

github.com/evalplus/evalplus · evalplus/sanitize.py:175–243

(
    samples: str, inplace: bool = False, debug_task: str = None, mbpp_version="default"
)
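The samples and inplace parameters together decide where the sanitized output is written. A minimal sketch of that path rule, mirroring the logic in the function body. The helper name sanitized_target is hypothetical, and is_folder is passed in explicitly (the real function probes the filesystem with os.path.isdir) so the example runs without any files present:

```python
import pathlib


def sanitized_target(samples: str, is_folder: bool, inplace: bool = False) -> str:
    """Hypothetical helper: return the output path for a given samples path."""
    target_path = pathlib.Path(samples)
    if not inplace:
        if is_folder:
            # Directory input: append the suffix to the directory name.
            new_name = target_path.name + "-sanitized"
        else:
            # File input: insert the suffix before the .jsonl extension.
            new_name = target_path.name.replace(".jsonl", "-sanitized.jsonl")
        target_path = target_path.parent / new_name
    return str(target_path)


print(sanitized_target("out/samples.jsonl", is_folder=False))
print(sanitized_target("out/samples", is_folder=True))
print(sanitized_target("out/samples.jsonl", is_folder=False, inplace=True))
```

Under this rule a file whose name does not contain ".jsonl" keeps its name, so the output path equals the input path even when inplace is False.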

Source from the content-addressed store, hash-verified

def script(
    samples: str, inplace: bool = False, debug_task: str = None, mbpp_version="default"
):
    # task_id -> entry_point
    entry_point = {}
    # merge two datasets
    dataset = {**get_human_eval_plus(), **get_mbpp_plus(version=mbpp_version)}

    for task_id, problem in dataset.items():
        entry_point[task_id] = problem["entry_point"]

    # make a new folder with "-sanitized" suffix
    is_folder = os.path.isdir(samples)
    target_path = pathlib.Path(samples)
    if not inplace:
        if is_folder:
            new_name = target_path.name + "-sanitized"
        else:
            new_name = target_path.name.replace(".jsonl", "-sanitized.jsonl")
        target_path = target_path.parent / new_name
    target_path = str(target_path)

    nsan = 0
    ntotal = 0

    new_solutions = []

    for solution in tqdm(load_solutions(samples)):
        task_id = solution["task_id"]
        if task_id not in dataset:
            print(
                f"Skiping {task_id} as it does not existing in the latest EvalPlus dataset."
            )
            continue

        function_name = entry_point[task_id] if task_id in entry_point else None
        dbg_identifier = solution["_identifier"]
        if debug_task is not None and task_id != debug_task:
            continue

        ntotal += 1
        if "solution" in solution:
            old_code = solution["solution"]
        else:
            assert "completion" in solution
            old_code = dataset[task_id]["prompt"] + "\n" + solution["completion"]

        new_code = sanitize(code=old_code, entrypoint=function_name)

        # if changed, print the message
        if new_code != old_code:
            msg = "Sanitized: " + dbg_identifier
            if is_folder:
                msg += " -> " + dbg_identifier.replace(samples, target_path)
            print(msg)
            nsan += 1

        new_solutions.append({"task_id": task_id, "solution": new_code})
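Each sample record carries either a full "solution" or a bare "completion" that continues the task prompt. A minimal sketch of how the loop picks the code to sanitize. The helper name resolve_code, the task ID "Demo/0", and the prompt text are illustrative assumptions, not part of EvalPlus:

```python
from typing import Dict


def resolve_code(solution: Dict[str, str], prompt: str) -> str:
    """Hypothetical helper: choose the code to sanitize from one sample record."""
    if "solution" in solution:
        # A full solution already includes the function signature.
        return solution["solution"]
    # Otherwise the record must hold a completion that continues the prompt.
    assert "completion" in solution
    return prompt + "\n" + solution["completion"]


# Made-up prompt and records, for illustration only.
prompt = "def add(a, b):"
full = resolve_code(
    {"task_id": "Demo/0", "solution": "def add(a, b):\n    return a + b"}, prompt
)
joined = resolve_code({"task_id": "Demo/0", "completion": "    return a + b"}, prompt)
print(full == joined)
```

Both record shapes end up as a complete function definition, which is why every entry appended to new_solutions uses the "solution" key regardless of the input shape.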

Callers

nothing calls this directly

Calls 6

get_human_eval_plus · Function · 0.90

get_mbpp_plus · Function · 0.90

load_solutions · Function · 0.90

write_directory · Function · 0.90

write_jsonl · Function · 0.90

sanitize · Function · 0.70
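The first two callees supply the datasets that script merges into a single mapping keyed by task ID, from which it builds the task_id -> entry_point lookup. A minimal sketch of that merge using dictionary unpacking. The two small dictionaries are stand-in values for illustration, on the assumption that each loader returns a mapping of task ID to problem record:

```python
# Stand-in data; in script these come from get_human_eval_plus() and get_mbpp_plus().
human_eval = {"HumanEval/0": {"entry_point": "has_close_elements"}}
mbpp = {"Mbpp/2": {"entry_point": "similar_elements"}}

# Merge both mappings; on a key collision the right-hand mapping would win.
dataset = {**human_eval, **mbpp}

# task_id -> entry_point, as built at the top of script.
entry_point = {task_id: problem["entry_point"] for task_id, problem in dataset.items()}

print(sorted(entry_point))
```

A sample whose task_id is missing from the merged mapping is skipped by the loop rather than sanitized.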

Tested by

no test coverage detected