hub / github.com/babysor/MockingBird / compute_partial_slices

Function compute_partial_slices

encoder/inference.py:59–108

Computes where to split an utterance waveform and its corresponding mel spectrogram to obtain partial utterances of partial_utterance_n_frames frames each. Both the waveform and the mel spectrogram slices are returned, so that each partial utterance waveform corresponds to its spectrogram.

compute_partial_slices(n_samples, partial_utterance_n_frames=partials_n_frames, min_pad_coverage=0.75, overlap=0.5)
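To make the default arguments concrete, the following sketch works through what overlap and min_pad_coverage imply for one partial utterance. The value 160 for partials_n_frames is an assumption chosen for illustration; the real value is defined in params_data.py, which is not shown on this page.

```python
import numpy as np

# Assumed value for illustration only; the real one lives in params_data.py.
partials_n_frames = 160
overlap = 0.5
min_pad_coverage = 0.75

# Distance, in mel frames, between the starts of consecutive partial utterances.
frame_step = max(int(np.round(partials_n_frames * (1 - overlap))), 1)

# Number of frames' worth of real audio the last partial needs in order to be kept.
min_real_frames = min_pad_coverage * partials_n_frames

print(frame_step)       # 80
print(min_real_frames)  # 120.0
```

Under these assumptions, consecutive partials start 80 frames apart, and a trailing partial is kept only if at least 120 of its 160 frames are backed by real samples.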

Source

def compute_partial_slices(n_samples, partial_utterance_n_frames=partials_n_frames,
                           min_pad_coverage=0.75, overlap=0.5):
    """
    Computes where to split an utterance waveform and its corresponding mel spectrogram to obtain
    partial utterances of <partial_utterance_n_frames> each. Both the waveform and the mel
    spectrogram slices are returned, so as to make each partial utterance waveform correspond to
    its spectrogram. This function assumes that the mel spectrogram parameters used are those
    defined in params_data.py.

    The returned ranges may be indexing further than the length of the waveform. It is
    recommended that you pad the waveform with zeros up to wav_slices[-1].stop.

    :param n_samples: the number of samples in the waveform
    :param partial_utterance_n_frames: the number of mel spectrogram frames in each partial
    utterance
    :param min_pad_coverage: when reaching the last partial utterance, it may or may not have
    enough frames. If at least <min_pad_coverage> of <partial_utterance_n_frames> are present,
    then the last partial utterance will be considered, as if we padded the audio. Otherwise,
    it will be discarded, as if we trimmed the audio. If there aren't enough frames for 1 partial
    utterance, this parameter is ignored so that the function always returns at least 1 slice.
    :param overlap: by how much the partial utterances should overlap. If set to 0, the partial
    utterances are entirely disjoint.
    :return: the waveform slices and mel spectrogram slices as lists of array slices. Index
    respectively the waveform and the mel spectrogram with these slices to obtain the partial
    utterances.
    """
    assert 0 <= overlap < 1
    assert 0 < min_pad_coverage <= 1

    # Number of waveform samples covered by one mel frame hop (mel_window_step is in ms)
    samples_per_frame = int((sampling_rate * mel_window_step / 1000))
    n_frames = int(np.ceil((n_samples + 1) / samples_per_frame))
    frame_step = max(int(np.round(partial_utterance_n_frames * (1 - overlap))), 1)

    # Compute the slices
    wav_slices, mel_slices = [], []
    steps = max(1, n_frames - partial_utterance_n_frames + frame_step + 1)
    for i in range(0, steps, frame_step):
        mel_range = np.array([i, i + partial_utterance_n_frames])
        wav_range = mel_range * samples_per_frame
        mel_slices.append(slice(*mel_range))
        wav_slices.append(slice(*wav_range))

    # Evaluate whether extra padding is warranted or not
    last_wav_range = wav_slices[-1]
    coverage = (n_samples - last_wav_range.start) / (last_wav_range.stop - last_wav_range.start)
    if coverage < min_pad_coverage and len(mel_slices) > 1:
        mel_slices = mel_slices[:-1]
        wav_slices = wav_slices[:-1]

    return wav_slices, mel_slices
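The docstring recommends zero-padding the waveform up to the stop index of the last waveform slice. The sketch below illustrates that on a standalone copy of the slicing logic, so it can run without the repository. The constants SAMPLING_RATE, MEL_WINDOW_STEP and PARTIALS_N_FRAMES and the name partial_slices_sketch are hypothetical stand-ins for the values the source reads from params_data.py; treat them as assumptions rather than the project's actual settings.

```python
import numpy as np

# Assumed parameters (hypothetical; the source takes them from params_data.py).
SAMPLING_RATE = 16000    # Hz
MEL_WINDOW_STEP = 10     # ms between consecutive mel frames
PARTIALS_N_FRAMES = 160  # mel frames per partial utterance


def partial_slices_sketch(n_samples, n_partial=PARTIALS_N_FRAMES,
                          min_pad_coverage=0.75, overlap=0.5):
    """Standalone copy of the slicing logic, for experimentation only."""
    samples_per_frame = int(SAMPLING_RATE * MEL_WINDOW_STEP / 1000)
    n_frames = int(np.ceil((n_samples + 1) / samples_per_frame))
    frame_step = max(int(np.round(n_partial * (1 - overlap))), 1)

    wav_slices, mel_slices = [], []
    steps = max(1, n_frames - n_partial + frame_step + 1)
    for i in range(0, steps, frame_step):
        mel_slices.append(slice(i, i + n_partial))
        wav_slices.append(slice(i * samples_per_frame,
                                (i + n_partial) * samples_per_frame))

    # Drop the last partial if too little of it is backed by real samples.
    last = wav_slices[-1]
    coverage = (n_samples - last.start) / (last.stop - last.start)
    if coverage < min_pad_coverage and len(mel_slices) > 1:
        wav_slices, mel_slices = wav_slices[:-1], mel_slices[:-1]
    return wav_slices, mel_slices


wav = np.zeros(3 * SAMPLING_RATE, dtype=np.float32)  # 3 s of silence
wav_slices, mel_slices = partial_slices_sketch(len(wav))

# Pad with zeros so that every waveform slice stays in range.
max_wave_length = wav_slices[-1].stop
if max_wave_length >= len(wav):
    wav = np.pad(wav, (0, max_wave_length - len(wav)), "constant")

partials = [wav[s] for s in wav_slices]
print(len(partials), len(wav))  # 3 51200
```

With these assumed parameters, the 48000-sample input yields three partials of 25600 samples each, and the last one reaches 3200 samples past the original end, which is why the padding step is needed. A 30000-sample input, by contrast, gives a final partial with coverage of about 0.67, below min_pad_coverage, so it is discarded and a single slice is returned.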

Callers 1

embed_utterance · Function · 0.85

Calls 1

append · Method · 0.80

Tested by

no test coverage detected