hub / github.com/THUDM/AgentBench / main

Function main

src/analysis.py:301–444

(args)


def main(args):
    agent_names, task_names, validation_names, details = analyze_output(
        args.config, args.output, parse_timestamp(args.time)
    )
    task_names.sort(key=lambda x: TaskHandler.get_handler(x).get_order_priority())
    summary = OrderedDict()
    for agent in details:
        summary[agent] = OrderedDict()
        for task in details[agent]:
            handler = TaskHandler.get_handler(task)
            if handler is not None:
                summary[agent][task] = handler.get_main_metric(
                    details[agent][task]["overall"]
                )
            else:
                summary[agent][task] = details[agent][task]["overall"]

    for agent in details:
        for task in details[agent]:
            print(
                ColorMessage.cyan(
                    f"Agent: {agent:20} Task: {task:20} Path: {details[agent][task]['file']}"
                )
            )

    final_result = {
        "summary": summary,
        "details": details,
    }

    os.makedirs(args.save, exist_ok=True)

    # Overall Calculation

    with open(os.path.join(args.save, "result.json"), "w", encoding="utf-8") as f:
        json.dump(final_result, f, indent=4, ensure_ascii=False, sort_keys=True)
    with open(os.path.join(args.save, "result.yaml"), "w", encoding="utf-8") as f:
        yaml.dump(final_result, f, indent=4, allow_unicode=True, sort_keys=True)
    with open(os.path.join(args.save, "summary.csv"), "w", encoding="utf-8") as f:
        """
        Format:
            Agent\\Task, Task1, Task2, ...
            Agent1, MainMetric(Agent1,Task1), MainMetric(Agent1,Task2), ...
            ......
        """
        f.write("Agent\\Task," + ",".join(task_names) + "\n")
        for agent in summary:
            f.write(
                agent
                + ","
                + ",".join(
                    [
                        (str(summary[agent][task]) if task in summary[agent] else "")
                        for task in task_names
                    ]
                )
                + "\n"
            )
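
To make the summary.csv layout concrete, here is a minimal, self-contained sketch of the same string-joining logic applied to made-up data. The helper name `render_summary_csv` and the sample agent and task names are hypothetical and do not appear in AgentBench. Like the listing, the sketch does no CSV quoting, so an agent name or metric value containing a comma would shift columns.

```python
from collections import OrderedDict


def render_summary_csv(summary, task_names):
    """Build the summary.csv text in memory (hypothetical helper, not in the repo)."""
    # Header row: literal "Agent\Task" followed by one column per task.
    lines = ["Agent\\Task," + ",".join(task_names)]
    for agent in summary:
        # A task the agent has no result for becomes an empty cell.
        cells = [
            str(summary[agent][task]) if task in summary[agent] else ""
            for task in task_names
        ]
        lines.append(agent + "," + ",".join(cells))
    return "\n".join(lines) + "\n"


# Made-up sample data, for illustration only.
summary = OrderedDict()
summary["agent-a"] = OrderedDict([("task1", 0.5), ("task2", 0.25)])
summary["agent-b"] = OrderedDict([("task1", 0.75)])

csv_text = render_summary_csv(summary, ["task1", "task2"])
print(csv_text)
```

Here `agent-b` has no entry for `task2`, so its row ends with a trailing comma and an empty final cell.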

Callers 1

analysis.py · File · 0.70

Calls 7

analyze_output · Function · 0.85

parse_timestamp · Function · 0.85

get_handler · Method · 0.80

cyan · Method · 0.80

green · Method · 0.80

get_order_priority · Method · 0.45

get_main_metric · Method · 0.45
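
The handler calls listed above (`get_handler`, `get_order_priority`, `get_main_metric`) can be illustrated with a small stand-in. This is only a sketch under assumptions: `DemoHandler`, its registry, the `metric_key` field, and the sample data are all hypothetical, and the real `TaskHandler` signatures and return types are not shown on this page. It mirrors the two behaviours visible in `main`: task names are sorted by handler priority, and a task without a handler keeps its raw `"overall"` entry in the summary.

```python
from collections import OrderedDict


class DemoHandler:
    """Hypothetical stand-in for TaskHandler; not the AgentBench class."""

    _registry = {}

    def __init__(self, priority, metric_key):
        self.priority = priority
        self.metric_key = metric_key

    @classmethod
    def register(cls, task, handler):
        cls._registry[task] = handler

    @classmethod
    def get_handler(cls, task):
        # Assumption: unknown tasks yield None, matching the None check in main.
        return cls._registry.get(task)

    def get_order_priority(self):
        return self.priority

    def get_main_metric(self, overall):
        # Assumption: the main metric is a single entry of the "overall" dict.
        return overall[self.metric_key]


def summarize(details):
    """Reduce each agent/task result to its main metric, or keep it raw."""
    summary = OrderedDict()
    for agent in details:
        summary[agent] = OrderedDict()
        for task in details[agent]:
            handler = DemoHandler.get_handler(task)
            overall = details[agent][task]["overall"]
            summary[agent][task] = (
                handler.get_main_metric(overall) if handler is not None else overall
            )
    return summary


DemoHandler.register("task-b", DemoHandler(priority=2, metric_key="accuracy"))
DemoHandler.register("task-a", DemoHandler(priority=1, metric_key="success_rate"))

# Made-up results, for illustration only.
details = {
    "agent-x": {
        "task-b": {"overall": {"accuracy": 0.4, "total": 10}},
        "task-a": {"overall": {"success_rate": 0.9}},
        "task-unknown": {"overall": {"raw": 1}},
    }
}

task_names = ["task-b", "task-a"]
task_names.sort(key=lambda name: DemoHandler.get_handler(name).get_order_priority())
summary = summarize(details)
print(task_names, dict(summary["agent-x"]))
```

In this sketch the sort key would raise `AttributeError` for a task name with no registered handler, because only the summary loop checks for `None`. Whether that can happen in AgentBench depends on what `analyze_output` returns, which this page does not show.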

Tested by

no test coverage detected