hub / github.com/babysor/MockingBird / compute_partial_slices

Function compute_partial_slices

encoder/inference.py:59–108

Computes where to split an utterance waveform and its corresponding mel spectrogram to obtain partial utterances of <partial_utterance_n_frames> each. Both the waveform and the mel spectrogram slices are returned, so as to make each partial utterance waveform correspond to its spectrogram.

(n_samples, partial_utterance_n_frames=partials_n_frames,
                           min_pad_coverage=0.75, overlap=0.5)


def compute_partial_slices(n_samples, partial_utterance_n_frames=partials_n_frames,
                           min_pad_coverage=0.75, overlap=0.5):
    """
    Computes where to split an utterance waveform and its corresponding mel spectrogram to obtain
    partial utterances of <partial_utterance_n_frames> each. Both the waveform and the mel
    spectrogram slices are returned, so as to make each partial utterance waveform correspond to
    its spectrogram. This function assumes that the mel spectrogram parameters used are those
    defined in params_data.py.

    The returned ranges may be indexing further than the length of the waveform. It is
    recommended that you pad the waveform with zeros up to wave_slices[-1].stop.

    :param n_samples: the number of samples in the waveform
    :param partial_utterance_n_frames: the number of mel spectrogram frames in each partial
    utterance
    :param min_pad_coverage: when reaching the last partial utterance, it may or may not have
    enough frames. If at least <min_pad_coverage> of <partial_utterance_n_frames> are present,
    then the last partial utterance will be considered, as if we padded the audio. Otherwise,
    it will be discarded, as if we trimmed the audio. If there aren't enough frames for 1 partial
    utterance, this parameter is ignored so that the function always returns at least 1 slice.
    :param overlap: by how much the partial utterance should overlap. If set to 0, the partial
    utterances are entirely disjoint.
    :return: the waveform slices and mel spectrogram slices as lists of array slices. Index
    respectively the waveform and the mel spectrogram with these slices to obtain the partial
    utterances.
    """
    assert 0 <= overlap < 1
    assert 0 < min_pad_coverage <= 1

    samples_per_frame = int((sampling_rate * mel_window_step / 1000))
    n_frames = int(np.ceil((n_samples + 1) / samples_per_frame))
    frame_step = max(int(np.round(partial_utterance_n_frames * (1 - overlap))), 1)

    # Compute the slices
    wav_slices, mel_slices = [], []
    steps = max(1, n_frames - partial_utterance_n_frames + frame_step + 1)
    for i in range(0, steps, frame_step):
        mel_range = np.array([i, i + partial_utterance_n_frames])
        wav_range = mel_range * samples_per_frame
        mel_slices.append(slice(*mel_range))
        wav_slices.append(slice(*wav_range))

    # Evaluate whether extra padding is warranted or not
    last_wav_range = wav_slices[-1]
    coverage = (n_samples - last_wav_range.start) / (last_wav_range.stop - last_wav_range.start)
    if coverage < min_pad_coverage and len(mel_slices) > 1:
        mel_slices = mel_slices[:-1]
        wav_slices = wav_slices[:-1]

    return wav_slices, mel_slices
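The slicing arithmetic can be traced with concrete numbers. The sketch below is a condensed re-statement of the same steps, not the function itself: `trace_partial_slices` is a hypothetical helper, and the three constants are assumed values chosen for illustration (the real ones are defined in params_data.py, which is not shown here). It returns only the waveform ranges, as plain `(start, stop)` tuples.

```python
import numpy as np

# Assumed values for illustration only; the real ones live in params_data.py.
SAMPLING_RATE = 16000     # Hz
MEL_WINDOW_STEP = 10      # ms between consecutive mel frames
PARTIALS_N_FRAMES = 160   # mel frames per partial utterance


def trace_partial_slices(n_samples, n_partial=PARTIALS_N_FRAMES,
                         min_pad_coverage=0.75, overlap=0.5):
    """Hypothetical helper: same arithmetic, returning (start, stop) wav ranges."""
    samples_per_frame = int(SAMPLING_RATE * MEL_WINDOW_STEP / 1000)   # 160
    n_frames = int(np.ceil((n_samples + 1) / samples_per_frame))
    frame_step = max(int(np.round(n_partial * (1 - overlap))), 1)     # 80

    # One window every frame_step frames, each n_partial frames long.
    steps = max(1, n_frames - n_partial + frame_step + 1)
    wav_ranges = [(i * samples_per_frame, (i + n_partial) * samples_per_frame)
                  for i in range(0, steps, frame_step)]

    # Drop the last window if too little of it is covered by real audio,
    # unless it is the only window.
    start, stop = wav_ranges[-1]
    coverage = (n_samples - start) / (stop - start)
    if coverage < min_pad_coverage and len(wav_ranges) > 1:
        wav_ranges = wav_ranges[:-1]
    return wav_ranges


print(trace_partial_slices(40000))  # [(0, 25600), (12800, 38400)]
print(trace_partial_slices(45000))  # [(0, 25600), (12800, 38400), (25600, 51200)]
print(trace_partial_slices(8000))   # [(0, 25600)]
```

Under these assumed parameters, the last window for 40000 samples covers only 56% of its span and is dropped, while for 45000 samples it covers about 76% and is kept even though it runs past the end of the waveform. The 8000-sample case shows the single-slice rule: coverage is about 31%, but the only window is never discarded.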

Callers 1

embed_utterance · Function · 0.85
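Because the returned ranges may index past the end of the waveform, a caller needs to zero-pad before slicing, as the docstring recommends. The following is a minimal sketch of that step only and is not the body of embed_utterance; `pad_to_last_slice` is a hypothetical helper, and the slices are hard-coded to what a 45000-sample waveform would produce under assumed parameters (16 kHz sampling rate, 10 ms mel step, 160-frame partials).

```python
import numpy as np


def pad_to_last_slice(wav, wav_slices):
    """Hypothetical helper: zero-pad wav so the last slice stays in range."""
    needed = wav_slices[-1].stop
    if needed > len(wav):
        wav = np.pad(wav, (0, needed - len(wav)), mode="constant")
    return wav


# Assumed example input: the last slice stops at 51200, past the 45000 samples.
wav_slices = [slice(0, 25600), slice(12800, 38400), slice(25600, 51200)]
wav = np.ones(45000, dtype=np.float32)

padded = pad_to_last_slice(wav, wav_slices)
partials = [padded[s] for s in wav_slices]

print(len(padded))                 # 51200
print([len(p) for p in partials])  # [25600, 25600, 25600]
```

Padding once up front keeps every partial the same length, which matters if the partials are later stacked into a single batch.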

Calls 1

append · Method · 0.80

Tested by

no test coverage detected