github.com/THUDM/AgentBench (branch: main)

Function main

src/analysis.py:301–444
Signature: main(args)
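main receives all of its inputs through a single args object. The function body reads four attributes from it: args.config and args.output (passed to analyze_output), args.time (passed through parse_timestamp), and args.save (the directory that receives result.json, result.yaml, and summary.csv). A minimal argparse sketch that produces such an object follows. The parser is not part of this excerpt, so the flag spellings, help texts, and the None default for --time are assumptions, not the definitions in src/analysis.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flags: only the attribute names (config, output, time, save)
    # come from main(); spellings, help texts, and defaults are assumptions.
    parser = argparse.ArgumentParser(description="Summarize benchmark outputs")
    parser.add_argument("--config", required=True, help="assignment config file")
    parser.add_argument("--output", required=True, help="directory of run outputs")
    parser.add_argument("--time", default=None, help="value handed to parse_timestamp")
    parser.add_argument("--save", required=True, help="directory for result files")
    return parser


args = build_parser().parse_args(
    ["--config", "cfg.yaml", "--output", "outputs", "--save", "analysis"]
)
print(args.config, args.output, args.time, args.save)  # cfg.yaml outputs None analysis
```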


def main(args):
    agent_names, task_names, validation_names, details = analyze_output(
        args.config, args.output, parse_timestamp(args.time)
    )
    task_names.sort(key=lambda x: TaskHandler.get_handler(x).get_order_priority())
    summary = OrderedDict()
    for agent in details:
        summary[agent] = OrderedDict()
        for task in details[agent]:
            handler = TaskHandler.get_handler(task)
            if handler is not None:
                summary[agent][task] = handler.get_main_metric(
                    details[agent][task]["overall"]
                )
            else:
                summary[agent][task] = details[agent][task]["overall"]

    for agent in details:
        for task in details[agent]:
            print(
                ColorMessage.cyan(
                    f"Agent: {agent:20} Task: {task:20} Path: {details[agent][task]['file']}"
                )
            )

    final_result = {
        "summary": summary,
        "details": details,
    }

    os.makedirs(args.save, exist_ok=True)

    # Overall Calculation

    with open(os.path.join(args.save, "result.json"), "w", encoding="utf-8") as f:
        json.dump(final_result, f, indent=4, ensure_ascii=False, sort_keys=True)
    with open(os.path.join(args.save, "result.yaml"), "w", encoding="utf-8") as f:
        yaml.dump(final_result, f, indent=4, allow_unicode=True, sort_keys=True)
    with open(os.path.join(args.save, "summary.csv"), "w", encoding="utf-8") as f:
        """
        Format:
        Agent\\Task, Task1, Task2, ...
        Agent1, MainMetric(Agent1,Task1), MainMetric(Agent1,Task2), ...
        ......
        """
        f.write("Agent\\Task," + ",".join(task_names) + "\n")
        for agent in summary:
            f.write(
                agent
                + ","
                + ",".join(
                    [
                        (str(summary[agent][task]) if task in summary[agent] else "")
                        for task in task_names
                    ]
                )
                + "\n"
            )
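The core of main is a two-level mapping (agent, then task, then one value) followed by a pivot into summary.csv: one row per agent, one column per task, and an empty cell where an agent has no result for a task. The sketch below mirrors that logic under stated assumptions. HANDLERS is a hypothetical dict standing in for TaskHandler.get_handler(task).get_main_metric, and the success_rate key is invented for the example. One deliberate difference: it writes rows with Python's csv module instead of ",".join, so a cell holding a raw "overall" dict (the fallback path when no handler exists) is quoted rather than split across columns.

```python
import csv
import io
from collections import OrderedDict
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for TaskHandler.get_handler(task).get_main_metric:
# maps a task name to a function that extracts one value from "overall".
HANDLERS: Dict[str, Callable[[Dict[str, Any]], Any]] = {
    "task_a": lambda overall: overall["success_rate"],
}


def build_summary(details: Dict[str, Dict[str, Dict[str, Any]]]) -> OrderedDict:
    summary = OrderedDict()
    for agent, tasks in details.items():
        summary[agent] = OrderedDict()
        for task, record in tasks.items():
            metric = HANDLERS.get(task)
            # With a handler keep only its main metric; otherwise keep the raw "overall".
            summary[agent][task] = metric(record["overall"]) if metric else record["overall"]
    return summary


def summary_to_csv(summary: OrderedDict, task_names: List[str]) -> str:
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Agent\\Task", *task_names])
    for agent, row in summary.items():
        # Tasks the agent was not evaluated on become empty cells.
        writer.writerow([agent, *(str(row[t]) if t in row else "" for t in task_names)])
    return buf.getvalue()


details = {
    "agent1": {
        "task_a": {"overall": {"success_rate": 0.5}},
        "task_b": {"overall": {"score": 3, "n": 10}},  # no handler: raw dict kept
    },
    "agent2": {"task_a": {"overall": {"success_rate": 0.25}}},
}
summary = build_summary(details)
csv_text = summary_to_csv(summary, ["task_a", "task_b"])
print(csv_text, end="")
# Agent\Task,task_a,task_b
# agent1,0.5,"{'score': 3, 'n': 10}"
# agent2,0.25,
```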

Callers (1)

analysis.py · File · 0.70

Calls (7)

analyze_output · Function · 0.85
parse_timestamp · Function · 0.85
get_handler · Method · 0.80
cyan · Method · 0.80
green · Method · 0.80
get_order_priority · Method · 0.45
get_main_metric · Method · 0.45
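get_order_priority appears among the calls because main uses it as the sort key for task_names. The summary loop guards against TaskHandler.get_handler returning None, but the sort key does not, so an unregistered task name would presumably raise AttributeError during sorting. A None-safe key could look like the sketch below. The Handler class, REGISTRY, the integer priorities, and the choice to place unknown tasks last are all hypothetical; because list.sort is stable, tasks with equal keys keep their original relative order, as in the original lambda.

```python
from typing import Dict, Optional, Tuple


class Handler:
    # Hypothetical minimal handler exposing only the priority used for ordering.
    def __init__(self, priority: int) -> None:
        self.priority = priority

    def get_order_priority(self) -> int:
        return self.priority


# Hypothetical registry; task names missing from it have no handler.
REGISTRY: Dict[str, Handler] = {"task_b": Handler(0), "task_a": Handler(1)}


def get_handler(task: str) -> Optional[Handler]:
    return REGISTRY.get(task)


def order_key(task: str) -> Tuple[int, int]:
    handler = get_handler(task)
    # Assumption: tasks without a handler sort after every known task.
    if handler is None:
        return (1, 0)
    return (0, handler.get_order_priority())


task_names = ["task_a", "custom_task", "task_b"]
task_names.sort(key=order_key)
print(task_names)  # ['task_b', 'task_a', 'custom_task']
```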

Tested by

no test coverage detected
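No tests are detected for main. One behavior visible in the excerpt that a test could pin down: json.dump(..., sort_keys=True), used for result.json, sorts the keys of every mapping, including OrderedDict instances. result.json therefore lists agents and tasks alphabetically regardless of the order in which summary was built, whereas the columns of summary.csv follow the sorted task_names. A self-contained check, with made-up task names and values:

```python
import json
from collections import OrderedDict

# Insertion order here is deliberately non-alphabetical.
summary = OrderedDict([("agent1", OrderedDict([("task_b", 0.4), ("task_a", 0.6)]))])

# Same keyword arguments main() passes when writing result.json.
text = json.dumps({"summary": summary}, indent=4, ensure_ascii=False, sort_keys=True)
reloaded = json.loads(text)

# sort_keys=True re-sorts every mapping, so the OrderedDict order is not kept.
print(list(reloaded["summary"]["agent1"]))  # ['task_a', 'task_b']
```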