Function mfcc

machine_learning/mfcc.py:69–149 in github.com/TheAlgorithms/Python

Calculate Mel Frequency Cepstral Coefficients (MFCCs) from an audio signal. The arguments, return value, and an example are documented in the docstring below.
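For orientation, the textbook MFCC pipeline can be written per frame $t$ as follows, with $N$ = `ftt_size`, $M$ = `mel_filter_num`, $j = 0, \dots,$ `dct_filter_num` $- 1$, a window $w[n]$, and triangular mel filters $H_m[k]$. This is the common HTK-style formulation and is an assumption here; the helpers in mfcc.py may differ in constants, log base, or normalization.

$$
\begin{aligned}
\operatorname{mel}(f) &= 2595 \log_{10}\!\left(1 + \frac{f}{700}\right) \\
P_t[k] &= \left| \sum_{n=0}^{N-1} w[n]\, x_t[n]\, e^{-2\pi i k n / N} \right|^2, \quad k = 0, \dots, N/2 \\
E_t[m] &= \sum_{k=0}^{N/2} H_m[k]\, P_t[k], \quad m = 0, \dots, M-1 \\
c_t[j] &= \alpha_j \sum_{m=0}^{M-1} \log E_t[m] \cos\!\left(\frac{\pi j (2m+1)}{2M}\right)
\end{aligned}
$$

Here $\alpha_0 = \sqrt{1/M}$ and $\alpha_j = \sqrt{2/M}$ for $j \ge 1$, which makes the last step an orthonormal DCT-II over the $M$ log mel energies.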

(
    audio: np.ndarray,
    sample_rate: int,
    ftt_size: int = 1024,
    hop_length: int = 20,
    mel_filter_num: int = 10,
    dct_filter_num: int = 40,
)
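The shape in the docstring example, (40, 101), can be reproduced by hand. The sketch below assumes that the module's `audio_frames` helper (not shown on this page) converts `hop_length` from milliseconds to a hop size in samples and reflect-pads the signal by `ftt_size // 2` samples at each end; `expected_mfcc_shape` is a hypothetical helper, not part of mfcc.py.

```python
def expected_mfcc_shape(
    n_samples: int,
    sample_rate: int,
    ftt_size: int = 1024,
    hop_length: int = 20,
    dct_filter_num: int = 40,
) -> tuple[int, int]:
    """Hypothetical helper: predict mfcc()'s output shape from its parameters."""
    # hop_length is given in milliseconds; convert it to a hop size in samples
    hop_size = round(sample_rate * hop_length / 1000)
    # assumption: the signal is padded by ftt_size // 2 samples on each side
    padded_length = n_samples + 2 * (ftt_size // 2)
    # number of full ftt_size-sample frames when stepping by hop_size
    n_frames = (padded_length - ftt_size) // hop_size + 1
    # one row per DCT coefficient, one column per frame
    return (dct_filter_num, n_frames)


# 2 s at 44.1 kHz: hop = 882 samples, (89224 - 1024) // 882 + 1 = 101 frames
print(expected_mfcc_shape(88200, 44100))  # (40, 101)
```

Under these assumptions the frame count grows linearly with duration (one frame per 20 ms plus one), while the row count is fixed by `dct_filter_num` alone.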


```python
# Imports this excerpt relies on; normalize() and audio_frames() are
# helpers from the same module and are not shown here.
import logging

import numpy as np
from scipy.signal import get_window


def mfcc(
    audio: np.ndarray,
    sample_rate: int,
    ftt_size: int = 1024,
    hop_length: int = 20,
    mel_filter_num: int = 10,
    dct_filter_num: int = 40,
) -> np.ndarray:
    """
    Calculate Mel Frequency Cepstral Coefficients (MFCCs) from an audio signal.

    Args:
        audio: The input audio signal.
        sample_rate: The sample rate of the audio signal (in Hz).
        ftt_size: The size of the FFT window in samples (default is 1024).
        hop_length: The hop between successive frames, in milliseconds
            (default is 20 ms).
        mel_filter_num: The number of Mel filters (default is 10).
        dct_filter_num: The number of DCT basis functions, i.e. cepstral
            coefficients per frame (default is 40).

    Returns:
        A matrix of MFCCs for the input audio, with one row per DCT
        coefficient and one column per frame.

    Raises:
        ValueError: If the input audio is empty.

    Example:
    >>> sample_rate = 44100  # Sample rate of 44.1 kHz
    >>> duration = 2.0  # Duration of 2 seconds
    >>> t = np.linspace(0, duration, int(sample_rate * duration), endpoint=False)
    >>> audio = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # Generate a 440 Hz sine wave
    >>> mfccs = mfcc(audio, sample_rate)
    >>> mfccs.shape
    (40, 101)
    """
    logging.info(f"Sample rate: {sample_rate}Hz")
    logging.info(f"Audio duration: {len(audio) / sample_rate}s")
    logging.info(f"Audio min: {np.min(audio)}")
    logging.info(f"Audio max: {np.max(audio)}")

    # normalize audio
    audio_normalized = normalize(audio)

    logging.info(f"Normalized audio min: {np.min(audio_normalized)}")
    logging.info(f"Normalized audio max: {np.max(audio_normalized)}")

    # split audio into frames of ftt_size samples, spaced hop_length ms apart
    audio_framed = audio_frames(
        audio_normalized, sample_rate, ftt_size=ftt_size, hop_length=hop_length
    )

    logging.info(f"Framed audio shape: {audio_framed.shape}")
    logging.info(f"First frame: {audio_framed[0]}")

    # convert to frequency domain
    # For simplicity we use a Hann (Hanning) window.
    window = get_window("hann", ftt_size, fftbins=True)
    audio_windowed = audio_framed * window

    # ... (the function continues to line 149 of mfcc.py; not shown here)
```
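The excerpt stops right after the window is applied; according to the callee list, the next steps are `calculate_fft` and `calculate_signal_power`, whose bodies are not shown. The following NumPy-only sketch shows what those two steps typically do. `power_spectrum` is a hypothetical name, and mfcc.py's own helpers may use a different FFT routine or scaling. The window is the periodic Hann window, which has the same values that `get_window("hann", ftt_size, fftbins=True)` returns.

```python
import numpy as np


def power_spectrum(frames: np.ndarray, fft_size: int) -> np.ndarray:
    """Hypothetical helper: Hann-window each frame and return |FFT|^2.

    frames has shape (n_frames, fft_size); the result has shape
    (n_frames, fft_size // 2 + 1), keeping only non-negative frequencies.
    """
    n = np.arange(fft_size)
    # periodic ("DFT-even") Hann window: 0.5 - 0.5 * cos(2*pi*n / N)
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / fft_size)
    windowed = frames * window  # broadcasts the window across every frame
    spectrum = np.fft.rfft(windowed, n=fft_size, axis=1)
    return np.abs(spectrum) ** 2


# A sine that completes exactly 64 cycles in each 1024-sample frame
fft_size = 1024
sample_rate = 8192
freq = 64 * sample_rate / fft_size  # 512 Hz, centred on FFT bin 64
t = np.arange(4 * fft_size) / sample_rate
signal = np.sin(2 * np.pi * freq * t)
frames = signal.reshape(4, fft_size)  # non-overlapping frames, for simplicity
power = power_spectrum(frames, fft_size)
print(power.shape)  # (4, 513)
print(power.argmax(axis=1))  # [64 64 64 64]
```

Because the tone sits exactly on a bin, the Hann window spreads its energy only to the two neighbouring bins, each holding a quarter of the peak power. This illustrates the leakage-for-sidelobe trade-off that motivates windowing before the FFT.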

Callers (1)

example · Function · 0.85

Calls (7)

audio_frames · Function · 0.85

calculate_fft · Function · 0.85

calculate_signal_power · Function · 0.85

mel_spaced_filterbank · Function · 0.85

discrete_cosine_transform · Function · 0.85

transpose · Method · 0.80

normalize · Function · 0.70
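Per the callee names above, the rest of the pipeline builds a mel filterbank, multiplies it against the transposed power spectrum, takes a log, and applies a DCT. The following is a self-contained sketch of those steps. All names (`triangular_mel_filterbank`, `dct_ii_matrix`, `mfcc_from_power`) are hypothetical. The HTK-style mel formula, the dB log, and the orthonormal DCT-II are assumptions, not a copy of `mel_spaced_filterbank` or `discrete_cosine_transform`.

```python
import numpy as np


def hz_to_mel(freq):
    # HTK-style mel scale (assumed; other variants exist)
    return 2595.0 * np.log10(1.0 + np.asarray(freq, dtype=float) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def triangular_mel_filterbank(sample_rate: int, n_mels: int, fft_size: int) -> np.ndarray:
    """Return an (n_mels, fft_size // 2 + 1) matrix of triangular filters."""
    # n_mels + 2 edges equally spaced in mel from 0 Hz to the Nyquist frequency
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2)
    # bin k covers k * sample_rate / fft_size Hz; take the bin at or below each edge
    bins = np.floor(fft_size * mel_to_hz(mel_edges) / sample_rate).astype(int)
    filters = np.zeros((n_mels, fft_size // 2 + 1))
    for m in range(n_mels):
        left, centre, right = bins[m], bins[m + 1], bins[m + 2]
        for k in range(left, centre):  # rising edge, 0 at `left`
            filters[m, k] = (k - left) / (centre - left)
        for k in range(centre, right):  # falling edge, 1 at `centre`
            filters[m, k] = (right - k) / (right - centre)
    return filters


def dct_ii_matrix(n_out: int, n_in: int) -> np.ndarray:
    """DCT-II basis of shape (n_out, n_in), orthonormal when n_out == n_in."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    basis[0] *= np.sqrt(1.0 / n_in)
    basis[1:] *= np.sqrt(2.0 / n_in)
    return basis


def mfcc_from_power(power, sample_rate, fft_size, n_mels=10, n_dct=40):
    filters = triangular_mel_filterbank(sample_rate, n_mels, fft_size)
    # (n_mels, n_bins) @ (n_bins, n_frames) -> mel-band energy per frame
    mel_energy = filters @ power.T
    # log compression in decibels; the small floor avoids log10(0)
    log_mel = 10.0 * np.log10(np.maximum(mel_energy, 1e-10))
    # (n_dct, n_mels) @ (n_mels, n_frames) -> (n_dct, n_frames)
    return dct_ii_matrix(n_dct, n_mels) @ log_mel


power = np.ones((101, 1024 // 2 + 1))  # placeholder power spectrum, 101 frames
coeffs = mfcc_from_power(power, 44100, 1024)
print(coeffs.shape)  # (40, 101)
```

One consequence worth noting: with a DCT-II over only `n_mels` inputs, row `n_mels` is numerically zero and each later row mirrors an earlier one up to sign. In this sketch, the signature's defaults (`dct_filter_num=40` with `mel_filter_num=10`) therefore yield at most 10 informative coefficients. Whether the same holds for the module's own `discrete_cosine_transform` depends on its implementation.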

Tested by

Apart from the doctest in the docstring above, no test coverage was detected.