MCPcopy
hub / github.com/QwenLM/Qwen-Audio / log_mel_spectrogram

Function log_mel_spectrogram

audio.py:157–204  ·  view source on GitHub ↗

Compute the log-Mel spectrogram of Parameters ---------- audio: Union[str, np.ndarray, torch.Tensor], shape = (*) The path to audio or either a NumPy array or Tensor containing the audio waveform in 16 kHz n_mels: int The number of Mel-frequency filters, only 8

(
    audio: Union[str, np.ndarray, torch.Tensor],
    n_mels: int = N_MELS,
    padding: int = 0,
    device: Optional[Union[str, torch.device]] = None,
)

Source from the content-addressed store, hash-verified

155
156
157def log_mel_spectrogram(
158 audio: Union[str, np.ndarray, torch.Tensor],
159 n_mels: int = N_MELS,
160 padding: int = 0,
161 device: Optional[Union[str, torch.device]] = None,
162):
163 """
164 Compute the log-Mel spectrogram of
165
166 Parameters
167 ----------
168 audio: Union[str, np.ndarray, torch.Tensor], shape = (*)
169 The path to audio or either a NumPy array or Tensor containing the audio waveform in 16 kHz
170
171 n_mels: int
172 The number of Mel-frequency filters, only 80 is supported
173
174 padding: int
175 Number of zero samples to pad to the right
176
177 device: Optional[Union[str, torch.device]]
178 If given, the audio tensor is moved to this device before STFT
179
180 Returns
181 -------
182 torch.Tensor, shape = (80, n_frames)
183 A Tensor that contains the Mel spectrogram
184 """
185 if not torch.is_tensor(audio):
186 if isinstance(audio, str):
187 audio = load_audio(audio)
188 audio = torch.from_numpy(audio)
189
190 if device is not None:
191 audio = audio.to(device)
192 if padding > 0:
193 audio = F.pad(audio, (0, padding))
194 window = torch.hann_window(N_FFT).to(audio.device)
195 stft = torch.stft(audio, N_FFT, HOP_LENGTH, window=window, return_complex=True)
196 magnitudes = stft[..., :-1].abs() ** 2
197
198 filters = mel_filters(audio.device, n_mels)
199 mel_spec = filters @ magnitudes
200
201 log_spec = torch.clamp(mel_spec, min=1e-10).log10()
202 log_spec = torch.maximum(log_spec, log_spec.max() - 8.0)
203 log_spec = (log_spec + 4.0) / 4.0
204 return log_spec
205
206
207@dataclass

Callers 1

process_audioMethod · 0.85

Calls 2

load_audioFunction · 0.85
mel_filtersFunction · 0.85

Tested by

no test coverage detected