hub / github.com/Wan-Video/Wan2.2 / generate

Method generate

wan/speech2video.py:392–679

Generates video frames from an input image and a text prompt using a diffusion process.

(
        self,
        input_prompt,
        ref_image_path,
        audio_path,
        enable_tts,
        tts_prompt_audio,
        tts_prompt_text,
        tts_text,
        num_repeat=1,
        pose_video=None,
        max_area=720 * 1280,
        infer_frames=80,
        shift=5.0,
        sample_solver='unipc',
        sampling_steps=40,
        guide_scale=5.0,
        n_prompt="",
        seed=-1,
        offload_model=True,
        init_first_frame=False,
    )
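A minimal sketch of preparing keyword arguments for this signature, based only on the constraints stated in the docstring: `infer_frames` should be a multiple of 4, `shift=3.0` is recommended for 480p, and `seed=-1` means a random seed. The helper `build_generate_kwargs`, the `AREA_480P` threshold, and the meaning assumed for `enable_tts=False` are hypothetical and not part of Wan2.2; the object that exposes `generate` is not shown.

```python
import random
import sys

# Hypothetical threshold: treat anything up to 480 * 832 pixels as "480p".
AREA_480P = 480 * 832


def build_generate_kwargs(input_prompt, ref_image_path, audio_path,
                          max_area=720 * 1280, infer_frames=80, seed=-1):
    """Collect arguments for generate(), applying the docstring's constraints."""
    if infer_frames <= 0 or infer_frames % 4 != 0:
        raise ValueError("infer_frames should be a positive multiple of 4")
    # The docstring recommends shift=3.0 for 480p output; 5.0 is the default.
    shift = 3.0 if max_area <= AREA_480P else 5.0
    # seed=-1 requests a random seed; pick one here so the run can be logged.
    if seed < 0:
        seed = random.randint(0, sys.maxsize)
    return dict(
        input_prompt=input_prompt,
        ref_image_path=ref_image_path,
        audio_path=audio_path,
        enable_tts=False,       # assumption: drive the video from audio_path
        tts_prompt_audio=None,  # assumption: unused when enable_tts is False
        tts_prompt_text=None,
        tts_text=None,
        max_area=max_area,
        infer_frames=infer_frames,
        shift=shift,
        seed=seed,
    )
```

The resulting dictionary could then be unpacked into the call, for example `pipeline.generate(**kwargs)`, where `pipeline` stands for whatever object exposes this method.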

Source from the content-addressed store, hash-verified

        return (HEIGHT, WIDTH)

    def generate(
        self,
        input_prompt,
        ref_image_path,
        audio_path,
        enable_tts,
        tts_prompt_audio,
        tts_prompt_text,
        tts_text,
        num_repeat=1,
        pose_video=None,
        max_area=720 * 1280,
        infer_frames=80,
        shift=5.0,
        sample_solver='unipc',
        sampling_steps=40,
        guide_scale=5.0,
        n_prompt="",
        seed=-1,
        offload_model=True,
        init_first_frame=False,
    ):
        r"""
        Generates video frames from input image and text prompt using diffusion process.

        Args:
            input_prompt (`str`):
                Text prompt for content generation.
            ref_image_path ('str'):
                Input image path
            audio_path ('str'):
                Audio for video driven
            num_repeat ('int'):
                Number of clips to generate; will be automatically adjusted based on the audio length
            pose_video ('str'):
                If provided, uses a sequence of poses to drive the generated video
            max_area (`int`, *optional*, defaults to 720*1280):
                Maximum pixel area for latent space calculation. Controls video resolution scaling
            infer_frames (`int`, *optional*, defaults to 80):
                How many frames to generate per clips. The number should be 4n
            shift (`float`, *optional*, defaults to 5.0):
                Noise schedule shift parameter. Affects temporal dynamics
                [NOTE]: If you want to generate a 480p video, it is recommended to set the shift value to 3.0.
            sample_solver (`str`, *optional*, defaults to 'unipc'):
                Solver used to sample the video.
            sampling_steps (`int`, *optional*, defaults to 40):
                Number of diffusion sampling steps. Higher values improve quality but slow generation
            guide_scale (`float` or tuple[`float`], *optional*, defaults 5.0):
                Classifier-free guidance scale. Controls prompt adherence vs. creativity.
                If tuple, the first guide_scale will be used for low noise model and
                the second guide_scale will be used for high noise model.
            n_prompt (`str`, *optional*, defaults to ""):
                Negative prompt for content exclusion. If not given, use `config.sample_neg_prompt`
            seed (`int`, *optional*, defaults to -1):
                Random seed for noise generation. If -1, use random seed
            offload_model (`bool`, *optional*, defaults to True):
                If True, offloads models to CPU during generation to save VRAM
            init_first_frame (`bool`, *optional*, defaults to False):
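
The `guide_scale` entry refers to classifier-free guidance. As a rough illustration, the standard formulation combines an unconditional and a conditional noise prediction as `uncond + scale * (cond - uncond)`. The sketch below uses plain Python lists and the hypothetical helpers `apply_cfg` and `split_guide_scale`; it does not reproduce the tensor code in `wan/speech2video.py`, and how the method itself consumes a tuple is only assumed from the docstring.

```python
def apply_cfg(noise_cond, noise_uncond, guide_scale):
    """Standard classifier-free guidance, applied element-wise."""
    return [u + guide_scale * (c - u)
            for c, u in zip(noise_cond, noise_uncond)]


def split_guide_scale(guide_scale):
    """Normalize a float or 2-tuple into (low_noise_scale, high_noise_scale).

    The ordering follows the docstring: first value for the low noise model,
    second value for the high noise model.
    """
    if isinstance(guide_scale, (int, float)):
        return float(guide_scale), float(guide_scale)
    low, high = guide_scale
    return float(low), float(high)
```

With `guide_scale=1.0` the expression reduces to the conditional prediction alone; larger values push the result further from the unconditional prediction.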

Callers 1

generate · Function · 0.95

Calls 15

get_gen_size · Method · 0.95
tts · Method · 0.95
encode_audio · Method · 0.95
load_pose_cond · Method · 0.95
set_timesteps · Method · 0.95
step · Method · 0.95
get_sampling_sigmas · Function · 0.85
retrieve_timesteps · Function · 0.85
to · Method · 0.80
device · Method · 0.80
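
The presence of `set_timesteps` and `step` in the call list suggests a scheduler-driven loop: timesteps are set once, then `step` is called for each timestep. The toy below mimics only that calling pattern. `ToyScheduler`, `run_sampling`, and the `model` callable are hypothetical; the real schedulers, their arguments, and the model call are different and not shown here.

```python
class ToyScheduler:
    """Hypothetical stand-in exposing set_timesteps() and step()."""

    def set_timesteps(self, num_steps):
        # Evenly spaced noise levels from 1.0 down to 0.0.
        self.sigmas = [1.0 - i / num_steps for i in range(num_steps + 1)]
        self.timesteps = list(range(num_steps))

    def step(self, model_output, t, sample):
        # Euler update: move the sample along the predicted direction.
        dt = self.sigmas[t + 1] - self.sigmas[t]
        return sample + dt * model_output


def run_sampling(model, scheduler, x, sampling_steps=40):
    """Call the model once per timestep and let the scheduler update x."""
    scheduler.set_timesteps(sampling_steps)
    for t in scheduler.timesteps:
        x = scheduler.step(model(x, scheduler.sigmas[t]), t, x)
    return x
```

More `sampling_steps` means more model calls per clip, which matches the docstring's note that higher values slow generation.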

Tested by

no test coverage detected