Calculate Mel Frequency Cepstral Coefficients (MFCCs) from an audio signal.
def mfcc(
    audio: np.ndarray,
    sample_rate: int,
    ftt_size: int = 1024,
    hop_length: int = 20,
    mel_filter_num: int = 10,
    dct_filter_num: int = 40,
) -> np.ndarray:
    """
    Calculate Mel Frequency Cepstral Coefficients (MFCCs) from an audio signal.

    Args:
        audio: The input audio signal.
        sample_rate: The sample rate of the audio signal (in Hz).
        ftt_size: The size of the FFT window (default is 1024).
        hop_length: The hop length for frame creation in milliseconds (default is 20).
        mel_filter_num: The number of Mel filters (default is 10).
        dct_filter_num: The number of DCT filters (default is 40).

    Returns:
        A matrix of MFCCs for the input audio.

    Raises:
        ValueError: If the input audio is empty.

    Example:
    >>> sample_rate = 44100  # Sample rate of 44.1 kHz
    >>> duration = 2.0  # Duration of 2 seconds
    >>> t = np.linspace(0, duration, int(sample_rate * duration), endpoint=False)
    >>> audio = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # Generate a 440 Hz sine wave
    >>> mfccs = mfcc(audio, sample_rate)
    >>> mfccs.shape
    (40, 101)
    """
    logging.info(f"Sample rate: {sample_rate}Hz")
    logging.info(f"Audio duration: {len(audio) / sample_rate}s")
    logging.info(f"Audio min: {np.min(audio)}")
    logging.info(f"Audio max: {np.max(audio)}")

    # normalize audio
    audio_normalized = normalize(audio)

    logging.info(f"Normalized audio min: {np.min(audio_normalized)}")
    logging.info(f"Normalized audio max: {np.max(audio_normalized)}")

    # split audio into frames
    audio_framed = audio_frames(
        audio_normalized, sample_rate, ftt_size=ftt_size, hop_length=hop_length
    )

    logging.info(f"Framed audio shape: {audio_framed.shape}")
    logging.info(f"First frame: {audio_framed[0]}")

    # convert to frequency domain
    # For simplicity we will choose the Hann window.
    window = get_window("hann", ftt_size, fftbins=True)
    audio_windowed = audio_framed * window
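
The listing calls `normalize` and `audio_frames` without showing their bodies, so the following is only a minimal sketch of what the first three steps (normalize, frame, window) might look like. The helper names `peak_normalize`, `frame_signal`, and `periodic_hann` are hypothetical stand-ins rather than the module's own functions. The sketch assumes peak normalization, reflect padding of half an FFT window on each side, and a hop length given in milliseconds. The Hann window is written out with NumPy so the example has no SciPy dependency.

```python
import numpy as np


def peak_normalize(audio: np.ndarray) -> np.ndarray:
    """Scale the signal so its largest absolute sample is 1 (assumed behaviour)."""
    if audio.size == 0:
        raise ValueError("audio must not be empty")
    peak = np.max(np.abs(audio))
    return audio / peak if peak > 0 else audio


def frame_signal(
    audio: np.ndarray, sample_rate: int, ftt_size: int = 1024, hop_length: int = 20
) -> np.ndarray:
    """Cut the signal into frames of ftt_size samples, one every hop_length ms."""
    # Convert the hop length from milliseconds to samples.
    hop_size = int(round(sample_rate * hop_length / 1000))
    # Assumption: reflect-pad half a window on each side so frames are centred.
    padded = np.pad(audio, ftt_size // 2, mode="reflect")
    frame_count = (len(padded) - ftt_size) // hop_size + 1
    # Build a (frame_count, ftt_size) index matrix and gather all frames at once.
    starts = np.arange(frame_count) * hop_size
    indices = starts[:, None] + np.arange(ftt_size)[None, :]
    return padded[indices]


def periodic_hann(size: int) -> np.ndarray:
    """Periodic Hann window, the variant intended for FFT-based analysis."""
    return 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(size) / size)


sample_rate = 44100
duration = 2.0
t = np.linspace(0, duration, int(sample_rate * duration), endpoint=False)
audio = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # 440 Hz sine wave

frames = frame_signal(peak_normalize(audio), sample_rate)
windowed = frames * periodic_hann(1024)  # broadcasts the window over every frame
print(frames.shape)  # (101, 1024)
```

Under these assumptions, a 2-second signal at 44.1 kHz with a 20 ms hop (882 samples) yields 101 frames, which is consistent with the second dimension of the `(40, 101)` shape in the docstring example.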