hub / github.com/google/adk-python / run_eval

Function run_eval

contributing/samples/integrations/gepa/experiment.py:596–639

Runs evaluation on the test set using the given instructions.

Args:
  output_dir: The directory to save evaluation results.
  instructions: The system instructions to evaluate.
  config: The experiment configuration.

(output_dir: str, instructions: str, config: ExperimentConfig)
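`run_eval` writes `prompt.txt`, `run_config.json`, and `results.json` directly into `output_dir` with `open(..., 'w')`, which does not create missing directories. Whether the caller already creates the directory is not visible on this page, so the following is only a minimal sketch of preparing it beforehand. The helper name `prepare_output_dir` and the `EXPECTED_FILES` constant are hypothetical and not part of the repository.

```python
import os
import tempfile

# File names that run_eval writes into output_dir, taken from its source.
EXPECTED_FILES = ('prompt.txt', 'run_config.json', 'results.json')


def prepare_output_dir(output_dir: str) -> str:
  """Hypothetical helper: creates output_dir (and parents) if missing."""
  os.makedirs(output_dir, exist_ok=True)
  return output_dir


# Usage: build a nested directory under a temporary location.
base_dir = tempfile.mkdtemp()
out_dir = prepare_output_dir(os.path.join(base_dir, 'eval', 'run_1'))
print(os.path.isdir(out_dir))
```

Calling the helper a second time on the same path is harmless because `exist_ok=True` suppresses the error for an existing directory.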

Source from the content-addressed store, hash-verified

```python
def run_eval(output_dir: str, instructions: str, config: ExperimentConfig):
  """Runs evaluation on the test set using the given instructions.

  Args:
    output_dir: The directory to save evaluation results.
    instructions: The system instructions to evaluate.
    config: The experiment configuration.
  """
  eval_dataset = _get_dataset(config.eval_dataset)
  tau_bench_run_config = RunConfig(
      env=config.tau_bench_env,
      model=config.agent_model,
      model_provider=config.agent_model_provider,
      user_model=config.user_model,
      user_model_provider=config.user_model_provider,
      agent_strategy='tool-calling',
      user_strategy='llm',
      max_concurrency=config.max_concurrency,
      num_trials=config.num_eval_trials,
      task_ids=eval_dataset,
      log_dir=output_dir,
      task_split=config.eval_dataset.split,
  )
  with open(os.path.join(output_dir, 'prompt.txt'), 'w') as f:
    f.write(instructions)

  json.dump(
      tau_bench_run_config.model_dump(),
      open(os.path.join(output_dir, 'run_config.json'), 'w'),
  )
  tau_bench_results = run_tau_bench_rollouts(
      tau_bench_run_config,
      system_instruction=instructions,
      rater=_rater(config) if config.use_rater else None,
  )
  total = len(tau_bench_results)
  numerator = sum(1 for res in tau_bench_results if res.reward == 1)
  print(
      f'average reward (total={total}): {numerator/total if total > 0 else 0}'
  )
  json.dump(
      dict(results=[r.model_dump() for r in tau_bench_results]),
      open(os.path.join(output_dir, 'results.json'), 'w'),
  )
```
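The value printed as "average reward" is computed as the share of results whose `reward` equals exactly 1. That equals the arithmetic mean only when every reward is 0 or 1. The sketch below reproduces the computation on made-up data and contrasts it with a plain mean. `FakeResult` is a hypothetical stand-in for the rollout result type, and the fractional reward is an assumption used purely for illustration; this page does not confirm that such values occur.

```python
from dataclasses import dataclass


@dataclass
class FakeResult:
  """Hypothetical stand-in for a rollout result; only `reward` is used."""
  reward: float


def pass_rate(results) -> float:
  """Fraction of results with reward exactly 1, as in run_eval."""
  total = len(results)
  passed = sum(1 for res in results if res.reward == 1)
  return passed / total if total > 0 else 0.0


def mean_reward(results) -> float:
  """Arithmetic mean of rewards, shown only for comparison."""
  total = len(results)
  return sum(res.reward for res in results) / total if total > 0 else 0.0


# Made-up rewards: two exact successes, one failure, one partial score.
sample = [FakeResult(1.0), FakeResult(0.0), FakeResult(0.5), FakeResult(1.0)]
print(pass_rate(sample))    # 0.5
print(mean_reward(sample))  # 0.625
```

Both helpers return 0.0 for an empty list, mirroring the `total > 0` guard in the source.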

Callers 1

run_eval_legacy · Method · 0.85

Calls 8

_get_dataset · Function · 0.85
RunConfig · Class · 0.85
open · Function · 0.85
run_tau_bench_rollouts · Function · 0.85
_rater · Function · 0.85
join · Method · 0.45
write · Method · 0.45
model_dump · Method · 0.45
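Of the calls listed above, `open` is used inline as an argument to `json.dump` for `run_config.json` and `results.json`, so those file objects are never closed explicitly. One possible alternative, shown here as a sketch and not as the repository's code, is to route such writes through a small helper that uses a context manager. The name `write_json` is hypothetical.

```python
import json
import os
import tempfile


def write_json(path: str, payload) -> None:
  """Hypothetical helper: dumps payload to path and always closes the file."""
  with open(path, 'w') as f:
    json.dump(payload, f)


# Usage: write a small made-up payload, then read it back to check it.
tmp_dir = tempfile.mkdtemp()
results_path = os.path.join(tmp_dir, 'results.json')
write_json(results_path, {'results': [{'reward': 1.0}]})

with open(results_path) as f:
  loaded = json.load(f)
print(loaded)
```

The `with` block guarantees the file is flushed and closed when the block exits, including when `json.dump` raises on a non-serializable value.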

Tested by

no test coverage detected