MCPcopy
hub / github.com/InternLM/lmdeploy / mllm_eval_test

Function mllm_eval_test

autotest/utils/evaluate_utils.py:349–379  ·  view source on GitHub ↗
(model_path, eval_path, case_name, port=DEFAULT_PORT, test_type='infer', extra_config={})

Source from the content-addressed store, hash-verified

347
348
349def mllm_eval_test(model_path, eval_path, case_name, port=DEFAULT_PORT, test_type='infer', extra_config={}):
350 work_dir = os.path.join(eval_path, f'wk_{case_name}')
351 timestamp = time.strftime('%Y%m%d_%H%M%S')
352 eval_log = os.path.join(eval_path, f'log_{case_name}_{timestamp}.log')
353
354 print(f'Starting VLMEvalKit evaluation for model: {model_path}')
355 print(f'Model path: {model_path}')
356 print(f'Case: {case_name}')
357 print(f'Work directory: {work_dir}')
358
359 os.makedirs(work_dir, exist_ok=True)
360
361 extra_config_str = get_cli_str(extra_config)
362
363 if test_type == 'infer':
364 cmd = f'python run.py --data MMBench_V11_MINI MMStar_MINI AI2D_MINI OCRBench_MINI --model {case_name} --base-url http://{DEFAULT_SERVER}:{port}/v1 --reuse --work-dir {work_dir} --timeout 7200 --mode infer {extra_config_str}' # noqa
365 elif test_type == 'eval':
366 cmd = f'python run.py --data MMBench_V11_MINI MMStar_MINI AI2D_MINI OCRBench_MINI --model {case_name} --base-url http://{DEFAULT_SERVER}:empty/v1 --reuse --work-dir {work_dir} --api-nproc 32 --mode eval --judge turbomind_Qwen2.5-32B-Instruct_nccl_tp2_0 --judge-base-url http://{DEFAULT_SERVER}:{port}/v1' # noqa
367
368 result, msg = execute_command_with_logging(cmd, eval_log)
369
370 allure.attach.file(eval_log, name=eval_log, attachment_type=allure.attachment_type.TEXT)
371
372 if test_type == 'eval':
373 mllm_summary(case_name,
374 result,
375 msg,
376 work_dir,
377 eval_path,
378 dataset_list=['MMBench_V11_MINI', 'MMStar_MINI', 'AI2D_MINI', 'OCRBench_MINI'])
379 return result, msg

Callers 2

run_eval_testFunction · 0.90

Calls 4

get_cli_strFunction · 0.90
mllm_summaryFunction · 0.85
joinMethod · 0.80

Tested by 2

run_eval_testFunction · 0.72