hub / github.com/Wan-Video/Wan2.2 / generate

Method generate

wan/speech2video.py:392–679

Generates video frames from an input image and a text prompt using a diffusion process.

(
        self,
        input_prompt,
        ref_image_path,
        audio_path,
        enable_tts,
        tts_prompt_audio,
        tts_prompt_text,
        tts_text,
        num_repeat=1,
        pose_video=None,
        max_area=720 * 1280,
        infer_frames=80,
        shift=5.0,
        sample_solver='unipc',
        sampling_steps=40,
        guide_scale=5.0,
        n_prompt="",
        seed=-1,
        offload_model=True,
        init_first_frame=False,
    )
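The signature above takes seven required arguments followed by keyword defaults. A minimal sketch of collecting those arguments before a call is shown below. It assumes a hypothetical helper, `build_generate_kwargs`, that is not part of Wan2.2, and uses placeholder file paths. The multiple-of-4 check follows the `infer_frames` note in the method's docstring; leaving the TTS inputs as `None` when `enable_tts` is `False` is an assumption of this sketch.

```python
def build_generate_kwargs(
    input_prompt,
    ref_image_path,
    audio_path,
    infer_frames=80,
    seed=-1,
    **overrides,
):
    """Hypothetical helper: collect keyword arguments for `generate`.

    Parameter names mirror the signature shown above. The validation rule
    is this sketch's reading of the docstring, not repository code.
    """
    if infer_frames <= 0 or infer_frames % 4 != 0:
        raise ValueError("infer_frames should be a positive multiple of 4")
    kwargs = {
        "input_prompt": input_prompt,
        "ref_image_path": ref_image_path,
        "audio_path": audio_path,
        # TTS is disabled here, so the TTS inputs are left empty (assumption).
        "enable_tts": False,
        "tts_prompt_audio": None,
        "tts_prompt_text": None,
        "tts_text": None,
        "infer_frames": infer_frames,
        "seed": seed,  # -1 asks the method to pick a random seed
    }
    kwargs.update(overrides)  # e.g. shift, sampling_steps, guide_scale
    return kwargs


kwargs = build_generate_kwargs(
    "a person speaking to the camera",
    "ref.png",     # placeholder path
    "speech.wav",  # placeholder path
    shift=3.0,     # value the docstring recommends for 480p output
)
# With an already constructed pipeline object (hypothetical name `pipeline`):
# video = pipeline.generate(**kwargs)
```

Passing everything by keyword keeps the call readable given the long parameter list, and the early check surfaces an invalid frame count before any model work starts.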

Source from the content-addressed store, hash-verified

        return (HEIGHT, WIDTH)

    def generate(
        self,
        input_prompt,
        ref_image_path,
        audio_path,
        enable_tts,
        tts_prompt_audio,
        tts_prompt_text,
        tts_text,
        num_repeat=1,
        pose_video=None,
        max_area=720 * 1280,
        infer_frames=80,
        shift=5.0,
        sample_solver='unipc',
        sampling_steps=40,
        guide_scale=5.0,
        n_prompt="",
        seed=-1,
        offload_model=True,
        init_first_frame=False,
    ):
        r"""
        Generates video frames from an input image and a text prompt using a diffusion process.

        Args:
            input_prompt (`str`):
                Text prompt for content generation.
            ref_image_path (`str`):
                Path to the input image.
            audio_path (`str`):
                Audio used to drive the video.
            num_repeat (`int`):
                Number of clips to generate; automatically adjusted based on the audio length.
            pose_video (`str`):
                If provided, uses a sequence of poses to drive the generated video.
            max_area (`int`, optional, defaults to 720*1280):
                Maximum pixel area for latent space calculation. Controls video resolution scaling.
            infer_frames (`int`, optional, defaults to 80):
                Number of frames to generate per clip. Should be a multiple of 4 (4n).
            shift (`float`, optional, defaults to 5.0):
                Noise schedule shift parameter. Affects temporal dynamics.
                [NOTE]: To generate a 480p video, it is recommended to set the shift value to 3.0.
            sample_solver (`str`, optional, defaults to 'unipc'):
                Solver used to sample the video.
            sampling_steps (`int`, optional, defaults to 40):
                Number of diffusion sampling steps. Higher values improve quality but slow generation.
            guide_scale (`float` or tuple[`float`], optional, defaults to 5.0):
                Classifier-free guidance scale. Controls prompt adherence vs. creativity.
                If a tuple, the first guide_scale is used for the low-noise model and
                the second guide_scale is used for the high-noise model.
            n_prompt (`str`, optional, defaults to ""):
                Negative prompt for content exclusion. If not given, uses `config.sample_neg_prompt`.
            seed (`int`, optional, defaults to -1):
                Random seed for noise generation. If -1, a random seed is used.
            offload_model (`bool`, optional, defaults to True):
                If True, offloads models to CPU during generation to save VRAM.
            init_first_frame (`bool`, optional, defaults to False):
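The `shift` parameter described in the docstring reshapes the noise schedule. A minimal sketch of the time shift commonly used in flow-matching samplers, sigma' = shift * sigma / (1 + (shift - 1) * sigma), is shown below. Whether `get_sampling_sigmas` uses exactly this form is an assumption here, and `shifted_sigmas` is a hypothetical name used only for illustration.

```python
def shifted_sigmas(sampling_steps, shift):
    """Return `sampling_steps` noise levels, starting at 1.0, after a time shift.

    Assumed form: sigma' = shift * sigma / (1 + (shift - 1) * sigma).
    """
    sigmas = []
    for i in range(sampling_steps):
        sigma = 1.0 - i / sampling_steps  # linear schedule in (0, 1]
        sigmas.append(shift * sigma / (1.0 + (shift - 1.0) * sigma))
    return sigmas


low = shifted_sigmas(40, 3.0)   # value the docstring suggests for 480p
high = shifted_sigmas(40, 5.0)  # default value of `shift`
# Under this form, a larger shift keeps sigma higher for longer,
# so more of the sampling steps are spent at high noise levels.
```

Under this assumed form, both schedules start at the same noise level and differ only in how quickly they descend, which is one way to read the recommendation of a smaller shift for lower resolutions.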

Callers 1

generate · Function · 0.95

Calls 15

get_gen_size · Method · 0.95

tts · Method · 0.95

encode_audio · Method · 0.95

load_pose_cond · Method · 0.95

set_timesteps · Method · 0.95

step · Method · 0.95

FlowUniPCMultistepScheduler · Class · 0.85

FlowDPMSolverMultistepScheduler · Class · 0.85

get_sampling_sigmas · Function · 0.85

retrieve_timesteps · Function · 0.85

to · Method · 0.80

device · Method · 0.80
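The scheduler calls listed above (`set_timesteps`, `step`) together with the `guide_scale` parameter suggest a guided sampling loop. The toy sketch below shows the general shape of such a loop on plain Python lists. It is not the repository's code: a simple Euler update stands in for the UniPC and DPM-Solver schedulers, and `cfg_combine`, `euler_sample`, and the toy predictor are hypothetical names. The guidance formula itself, `uncond + guide_scale * (cond - uncond)`, is the standard classifier-free guidance combination.

```python
def cfg_combine(cond, uncond, guide_scale):
    """Classifier-free guidance: extrapolate from the unconditional prediction."""
    return [u + guide_scale * (c - u) for c, u in zip(cond, uncond)]


def euler_sample(x, sigmas, predict_cond, predict_uncond, guide_scale):
    """Toy Euler integrator standing in for the scheduler's `step` call."""
    schedule = list(sigmas) + [0.0]  # finish at zero noise
    for sigma, sigma_next in zip(schedule[:-1], schedule[1:]):
        v = cfg_combine(
            predict_cond(x, sigma), predict_uncond(x, sigma), guide_scale
        )
        # Move along the predicted velocity from sigma to sigma_next.
        x = [xi + (sigma_next - sigma) * vi for xi, vi in zip(x, v)]
    return x


# Toy predictor: the exact velocity toward a known target (illustration only).
target = [1.0, -2.0]


def toy_velocity(x, sigma):
    return [(xi - ti) / sigma for xi, ti in zip(x, target)]


sample = euler_sample(
    [0.3, 0.7], [1.0, 0.75, 0.5, 0.25], toy_velocity, toy_velocity, 5.0
)
```

In a real pipeline the two predictors would be the same network evaluated with the positive prompt and with `n_prompt`, and the update would be delegated to the selected scheduler rather than written inline.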

Tested by

no test coverage detected