Computes where to split an utterance waveform and its corresponding mel spectrogram to obtain partial utterances of <partial_utterance_n_frames> each. Both the waveform and the mel spectrogram slices are returned, so as to make each partial utterance waveform correspond to its spectrogram.
(n_samples, partial_utterance_n_frames=partials_n_frames,
min_pad_coverage=0.75, overlap=0.5)
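The windowing arithmetic behind this function can be sketched in plain Python. This is a minimal illustration, not the repository's code: the constants `SAMPLING_RATE`, `MEL_WINDOW_STEP` and `PARTIALS_N_FRAMES` are assumed stand-ins for the values defined in params_data.py (which is not shown here), the name `partial_ranges` is hypothetical, and ranges are returned as `(start, stop)` tuples rather than `slice` objects.

```python
import math

# Hypothetical stand-ins for the constants that the listing takes from
# params_data.py (not shown here); adjust them to match your configuration.
SAMPLING_RATE = 16000    # Hz (assumed)
MEL_WINDOW_STEP = 10     # ms between consecutive mel frames (assumed)
PARTIALS_N_FRAMES = 160  # mel frames per partial utterance (assumed)


def partial_ranges(n_samples, n_partial_frames=PARTIALS_N_FRAMES,
                   min_pad_coverage=0.75, overlap=0.5):
    """Return (wav_ranges, mel_ranges) as lists of (start, stop) tuples."""
    assert 0 <= overlap < 1
    assert 0 < min_pad_coverage <= 1

    samples_per_frame = int(SAMPLING_RATE * MEL_WINDOW_STEP / 1000)
    n_frames = math.ceil((n_samples + 1) / samples_per_frame)
    # Hop between window starts, in frames; never smaller than one frame.
    frame_step = max(round(n_partial_frames * (1 - overlap)), 1)

    wav_ranges, mel_ranges = [], []
    steps = max(1, n_frames - n_partial_frames + frame_step + 1)
    for start in range(0, steps, frame_step):
        stop = start + n_partial_frames
        mel_ranges.append((start, stop))
        wav_ranges.append((start * samples_per_frame, stop * samples_per_frame))

    # Drop the last window if too little of it is covered by real audio,
    # unless it is the only window.
    last_start, last_stop = wav_ranges[-1]
    coverage = (n_samples - last_start) / (last_stop - last_start)
    if coverage < min_pad_coverage and len(wav_ranges) > 1:
        wav_ranges.pop()
        mel_ranges.pop()
    return wav_ranges, mel_ranges


if __name__ == "__main__":
    for n in (8000, 40000, 48000):
        wav_ranges, mel_ranges = partial_ranges(n)
        print(n, wav_ranges)
```

Under these assumed constants, a 40000-sample waveform yields two windows, because only about 56% of a third window would be real audio, which is below the default `min_pad_coverage`. A 48000-sample waveform keeps its third window, whose range ends at sample 51200, past the end of the audio.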
| 57 | |
| 58 | |
| 59 | def compute_partial_slices(n_samples, partial_utterance_n_frames=partials_n_frames, |
| 60 | min_pad_coverage=0.75, overlap=0.5): |
| 61 | """ |
| 62 | Computes where to split an utterance waveform and its corresponding mel spectrogram to obtain |
| 63 | partial utterances of <partial_utterance_n_frames> each. Both the waveform and the mel |
| 64 | spectrogram slices are returned, so as to make each partial utterance waveform correspond to |
| 65 | its spectrogram. This function assumes that the mel spectrogram parameters used are those |
| 66 | defined in params_data.py. |
| 67 | |
| 68 | The returned ranges may extend past the end of the waveform. It is recommended that you |
| 69 | pad the waveform with zeros up to wav_slices[-1].stop. |
| 70 | |
| 71 | :param n_samples: the number of samples in the waveform |
| 72 | :param partial_utterance_n_frames: the number of mel spectrogram frames in each partial |
| 73 | utterance |
| 74 | :param min_pad_coverage: a fraction in (0, 1]. The last partial utterance may not have |
| 75 | enough frames. If at least <min_pad_coverage> of <partial_utterance_n_frames> are present, |
| 76 | then the last partial utterance is kept, as if we padded the audio. Otherwise, it is |
| 77 | discarded, as if we trimmed the audio. If there aren't enough frames for one partial |
| 78 | utterance, this parameter is ignored so that the function always returns at least one slice. |
| 79 | :param overlap: the fraction, in [0, 1), by which consecutive partial utterances overlap. |
| 80 | If set to 0, the partial utterances are entirely disjoint. |
| 81 | :return: the waveform slices and mel spectrogram slices as lists of array slices. Index |
| 82 | respectively the waveform and the mel spectrogram with these slices to obtain the partial |
| 83 | utterances. |
| 84 | """ |
| 85 | assert 0 <= overlap < 1 |
| 86 | assert 0 < min_pad_coverage <= 1 |
| 87 | |
| 88 | samples_per_frame = int(sampling_rate * mel_window_step / 1000) |
| 89 | n_frames = int(np.ceil((n_samples + 1) / samples_per_frame)) |
| 90 | frame_step = max(int(np.round(partial_utterance_n_frames * (1 - overlap))), 1) |
| 91 | |
| 92 | # Compute the slices |
| 93 | wav_slices, mel_slices = [], [] |
| 94 | steps = max(1, n_frames - partial_utterance_n_frames + frame_step + 1) |
| 95 | for i in range(0, steps, frame_step): |
| 96 | mel_range = np.array([i, i + partial_utterance_n_frames]) |
| 97 | wav_range = mel_range * samples_per_frame |
| 98 | mel_slices.append(slice(*mel_range)) |
| 99 | wav_slices.append(slice(*wav_range)) |
| 100 | |
| 101 | # Evaluate whether extra padding is warranted or not |
| 102 | last_wav_range = wav_slices[-1] |
| 103 | coverage = (n_samples - last_wav_range.start) / (last_wav_range.stop - last_wav_range.start) |
| 104 | if coverage < min_pad_coverage and len(mel_slices) > 1: |
| 105 | mel_slices = mel_slices[:-1] |
| 106 | wav_slices = wav_slices[:-1] |
| 107 | |
| 108 | return wav_slices, mel_slices |
| 109 | |
| 110 | |
| 111 | def embed_utterance(wav, using_partials=True, return_partials=False, **kwargs): |
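The docstring of compute_partial_slices recommends zero-padding the waveform up to the stop of the last waveform slice, since that slice may extend past the end of the audio. A minimal sketch of that step follows, using a plain Python list so that it runs without dependencies. The helper name `pad_to_last_slice` and the example slices are hypothetical; with a NumPy array, `np.pad(wav, (0, n_missing), "constant")` would be the usual equivalent.

```python
def pad_to_last_slice(wav, wav_slices):
    """Zero-pad `wav` so that every slice in `wav_slices` is fully in range."""
    # Window starts increase, so the last slice has the largest stop.
    target_length = wav_slices[-1].stop
    if target_length > len(wav):
        wav = list(wav) + [0.0] * (target_length - len(wav))
    return wav


# Hypothetical example: 48000 samples and three 25600-sample windows,
# the last of which extends 3200 samples past the end of the audio.
wav = [0.1] * 48000
wav_slices = [slice(0, 25600), slice(12800, 38400), slice(25600, 51200)]

padded = pad_to_last_slice(wav, wav_slices)
partials = [padded[s] for s in wav_slices]
```

Without the padding, indexing the original list with the last slice would silently return a shorter window, so the partial utterances would no longer all have the same length.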
no test coverage detected