Spectrogram¶
Mel Spectrogram¶
melspectrogram.py: Utilities for dealing with mel spectrograms
WARNING: This module has not been thoroughly tested for compatibility with modules and tools in OpenSoundscape.
-
class
opensoundscape.melspectrogram.
MelSpectrogram
(S, sample_rate, hop_length, fmin, fmax)¶ Immutable spectrogram container
WARNING: This class has not been thoroughly tested for compatibility with modules and tools in OpenSoundscape.
-
classmethod
from_audio
(audio, n_fft=1024, n_mels=128, window='flattop', win_length=256, hop_length=32, htk=True, fmin=None, fmax=None)¶ Create a MelSpectrogram object from an Audio object
The kwargs are cherry-picked from:
- https://librosa.org/doc/latest/generated/librosa.feature.melspectrogram.html#librosa.feature.melspectrogram
- https://librosa.org/doc/latest/generated/librosa.filters.mel.html?librosa.filters.mel
Parameters: - n_fft – Length of the FFT window [default: 1024]
- n_mels – Number of mel bands to generate [default: 128]
- window – The windowing function to use [default: “flattop”]
- win_length – Each frame of audio is windowed by window. The window will be of length win_length and then padded with zeros to match n_fft [default: 256]
- hop_length – Number of samples between successive frames [default: 32]
- htk – use HTK formula instead of Slaney [default: True]
- fmin – lowest frequency (in Hz) [default: None]
- fmax – highest frequency (in Hz). If None, use fmax = sr / 2.0 [default: None]
Returns: opensoundscape.melspectrogram.MelSpectrogram object
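To make the htk option more concrete, the sketch below implements the HTK mel formula, mel = 2595 * log10(1 + f / 700), and spaces band edges evenly on that scale. The function names are illustrative and not part of OpenSoundscape. The fmax of 11025 Hz assumes a 22050 Hz sample rate, and the n_mels + 2 edge convention is an assumption about how mel filter banks are usually laid out.

```python
import math

def hz_to_mel_htk(freq_hz):
    """HTK mel formula: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz_htk(mel):
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(n_mels, fmin, fmax):
    """Return n_mels + 2 frequencies (Hz) evenly spaced on the HTK mel scale."""
    lo, hi = hz_to_mel_htk(fmin), hz_to_mel_htk(fmax)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz_htk(lo + i * step) for i in range(n_mels + 2)]

# Assumed values: fmax = sr / 2 for a 22050 Hz recording
edges = mel_band_edges(n_mels=128, fmin=0.0, fmax=11025.0)
```

Because the spacing is even in mels rather than in Hz, the bands are narrow at low frequencies and widen toward fmax.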
-
to_image
(shape=None, mode='RGB', s_range=(0, 20))¶ Generate PIL Image from MelSpectrogram
Given a range of values for S (e.g. default is minimum 0, maximum 20) generate a PIL image in 3-channel (RGB) or single channel (L) mode. A user can optionally resize the image.
Parameters: - shape – Resize to shape (h, w) [default: None]
- mode – Mode to write out “RGB” or “L” [default: “RGB”]
- s_range – The input range of S [default: (0, 20)]
Returns: PIL.Image
-
to_pcen
(gain=0.8, bias=10.0, power=0.25, time_constant=0.06)¶ Create PCEN from MelSpectrogram
Argument descriptions come from https://librosa.org/doc/latest/generated/librosa.pcen.html?highlight=pcen#librosa-pcen
Parameters: - gain – The gain factor. Typical values should be slightly less than 1 [default: 0.8]
- bias – The bias point of the nonlinear compression [default: 10.0]
- power – The compression exponent. Typical values should be between 0 and 0.5. Smaller values of power result in stronger compression. At the limit power=0, polynomial compression becomes logarithmic [default: 0.25]
- time_constant – The time constant for IIR filtering, measured in seconds [default: 0.06]
Returns: The per-channel energy normalized version of MelSpectrogram.S
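As a rough illustration of how these four parameters interact, the sketch below applies per-channel energy normalization to a single frequency channel, following the formula given in the librosa documentation linked above. It is not the OpenSoundscape or librosa implementation: pcen_1d is a hypothetical name, the sr, hop_length and eps defaults are assumptions, and the smoother's initial state is simplified.

```python
import math

def pcen_1d(energies, sr=22050, hop_length=32, gain=0.8, bias=10.0,
            power=0.25, time_constant=0.06, eps=1e-6):
    """PCEN for one frequency channel given as a list of non-negative energies."""
    # Convert the time constant from seconds to frames, then to an IIR coefficient
    t_frames = time_constant * sr / hop_length
    b = (math.sqrt(1 + 4 * t_frames ** 2) - 1) / (2 * t_frames ** 2)

    smooth = energies[0]  # simplified initial state (an assumption of this sketch)
    out = []
    for e in energies:
        # First-order low-pass filter tracks the slowly varying background level
        smooth = (1 - b) * smooth + b * e
        # Divide by the smoothed level (gain), then compress (bias, power)
        out.append((e / (eps + smooth) ** gain + bias) ** power - bias ** power)
    return out

steady = pcen_1d([4.0] * 5)
burst = pcen_1d([1.0, 1.0, 1.0, 100.0, 1.0])
```

In the burst sequence, the smoothed level barely moves during the loud frame, so that frame stands out strongly after normalization.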
Spectrogram¶
spectrogram.py: Utilities for dealing with spectrograms
-
class
opensoundscape.spectrogram.
Spectrogram
(spectrogram, frequencies, times, decibel_limits, window_samples=None, overlap_samples=None, window_type=None, audio_sample_rate=None)¶ Immutable spectrogram container
Can be initialized directly from spectrogram, frequency, and time values or created from an Audio object using the .from_audio() method.
-
frequencies
¶ (list) discrete frequency bins generated by the FFT
-
times
¶ (list) time from beginning of file to the center of each window
-
spectrogram
¶ a 2d array containing 10*log10(fft) for each time window
-
decibel_limits
¶ minimum and maximum decibel values in .spectrogram
-
window_samples
¶ number of samples per window when spec was created [default: None]
-
overlap_samples
¶ number of samples overlapped in consecutive windows when spec was created [default: None]
-
window_type
¶ window function used to make the spectrogram, e.g. ‘hann’ [default: None]
-
audio_sample_rate
¶ sample rate of audio from which spec was created [default: None]
-
amplitude
(freq_range=None)¶ create an amplitude vs time signal from spectrogram
by summing pixels in the vertical dimension
Parameters: - freq_range=None – sum the spectrogram only within this [low, high] frequency range in Hz (if None, all frequencies are summed)
Returns: a time-series array of the vertical sum of spectrogram value
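The vertical sum can be sketched in plain Python as follows. This is an illustration, not the library code: band_amplitude is a hypothetical helper, nested lists stand in for the 2d array, rows are assumed to be frequency bins and columns time windows, and treating the range bounds as inclusive is also an assumption.

```python
def band_amplitude(spectrogram, frequencies, freq_range=None):
    """Sum spectrogram values over frequency for each time window."""
    if freq_range is None:
        rows = spectrogram
    else:
        low, high = freq_range
        # Keep only the rows whose frequency bin lies inside the band
        rows = [row for f, row in zip(frequencies, spectrogram)
                if low <= f <= high]
    # zip(*rows) iterates over columns, i.e. over time windows
    return [sum(column) for column in zip(*rows)]

frequencies = [0, 1000, 2000]
spec = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
all_bins = band_amplitude(spec, frequencies)
upper_bins = band_amplitude(spec, frequencies, freq_range=[500, 2500])
```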
-
bandpass
(min_f, max_f)¶ extract a frequency band from a spectrogram
crops the 2-d array of the spectrograms to the desired frequency range
Parameters: - min_f – low frequency in Hz for bandpass
- max_f – high frequency in Hz for bandpass
Returns: bandpassed spectrogram object
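Cropping to a frequency band amounts to selecting rows. A minimal sketch, assuming rows correspond to the entries of frequencies and that the bounds are inclusive; bandpass_rows is a hypothetical helper operating on plain lists rather than a Spectrogram object:

```python
def bandpass_rows(spectrogram, frequencies, min_f, max_f):
    """Return the rows (and their frequencies) that fall within [min_f, max_f]."""
    keep = [i for i, f in enumerate(frequencies) if min_f <= f <= max_f]
    cropped = [spectrogram[i] for i in keep]
    kept_frequencies = [frequencies[i] for i in keep]
    return cropped, kept_frequencies

frequencies = [0, 1000, 2000, 3000]
spec = [[1, 1], [2, 2], [3, 3], [4, 4]]
cropped, kept = bandpass_rows(spec, frequencies, min_f=1000, max_f=2000)
```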
-
duration
()¶ calculate the amount of time represented in the spectrogram
Note: time may be shorter than the duration of the audio from which the spectrogram was created, because the windows may align in a way such that some samples from the end of the original audio were discarded
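The arithmetic behind this note can be sketched as follows, assuming windows are laid end to end with a fixed hop and no padding (the default behavior of scipy.signal.spectrogram). The helper names and the one-second, 22050 Hz example are illustrative assumptions.

```python
def window_count(n_samples, window_samples=512, overlap_samples=256):
    """Number of complete windows that fit in n_samples without padding."""
    if n_samples < window_samples:
        return 0
    hop = window_samples - overlap_samples
    return (n_samples - overlap_samples) // hop

def covered_samples(n_samples, window_samples=512, overlap_samples=256):
    """Number of samples spanned from the first window's start to the last window's end."""
    n_windows = window_count(n_samples, window_samples, overlap_samples)
    if n_windows == 0:
        return 0
    hop = window_samples - overlap_samples
    return (n_windows - 1) * hop + window_samples

n_samples = 22050  # one second of audio at an assumed 22050 Hz sample rate
n_windows = window_count(n_samples)
covered = covered_samples(n_samples)
discarded = n_samples - covered
```

With these numbers, 85 windows cover 22016 samples, so the last 34 samples of the audio are not represented in the spectrogram.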
-
classmethod
from_audio
(audio, window_type='hann', window_samples=512, overlap_samples=256, decibel_limits=(-100, -20))¶ create a Spectrogram object from an Audio object
Parameters: - window_type="hann" – see scipy.signal.spectrogram docs for description of window parameter
- window_samples=512 – number of audio samples per spectrogram window (pixel)
- overlap_samples=256 – number of samples shared by consecutive windows
- decibel_limits=(-100, -20) – limit the dB values to (min, max); lower values are set to min, higher values are set to max
Returns: opensoundscape.spectrogram.Spectrogram object
-
classmethod
from_file
()¶ create a Spectrogram object from a file
Parameters: file – path of image to load
Returns: opensoundscape.spectrogram.Spectrogram object
-
limit_db_range
(min_db=-100, max_db=-20)¶ Limit the decibel values of the spectrogram to range from min_db to max_db
Values less than min_db are set to min_db; values greater than max_db are set to max_db.
similar to Audacity’s gain and range parameters
Parameters: - min_db – values lower than this are set to this
- max_db – values higher than this are set to this
Returns: Spectrogram object with db range applied
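The clipping rule can be sketched in a few lines. clip_db is a hypothetical helper that works on nested lists; it illustrates the rule and is not the library code.

```python
def clip_db(spectrogram, min_db=-100, max_db=-20):
    """Clamp every value into the interval [min_db, max_db]."""
    return [[min(max(value, min_db), max_db) for value in row]
            for row in spectrogram]

clipped = clip_db([[-120, -60, -5]])
```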
-
linear_scale
(feature_range=(0, 1))¶ Linearly rescale spectrogram values to feature_range, using decibel_limits as the input range
Parameters: feature_range – tuple of (low,high) values for output
Returns: Spectrogram object with values rescaled to feature_range
-
min_max_scale
(feature_range=(0, 1))¶ Linearly rescale spectrogram values to feature_range, using the spectrogram's minimum and maximum values as the input range
Parameters: feature_range – tuple of (low,high) values for output
Returns: Spectrogram object with values rescaled to feature_range
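linear_scale and min_max_scale apply the same linear map and differ only in the input range they assume. The sketch below shows both with one hypothetical helper, rescale, on nested lists; the (-100, -20) limits are taken from the default decibel_limits and the sample values are made up.

```python
def rescale(spectrogram, in_range, feature_range=(0, 1)):
    """Linearly map values from in_range onto feature_range."""
    in_low, in_high = in_range
    out_low, out_high = feature_range
    scale = (out_high - out_low) / (in_high - in_low)
    return [[out_low + (value - in_low) * scale for value in row]
            for row in spectrogram]

spec = [[-100.0, -60.0], [-40.0, -30.0]]

# linear_scale-style: the input range is fixed by decibel_limits
fixed = rescale(spec, in_range=(-100, -20))

# min_max_scale-style: the input range comes from the data itself
values = [value for row in spec for value in row]
data_driven = rescale(spec, in_range=(min(values), max(values)))
```

With a fixed input range the loudest value here lands below 1, whereas the data-driven range always stretches the output to fill feature_range.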
-
net_amplitude
(signal_band, reject_bands=None)¶ create amplitude signal in signal_band and subtract amplitude from reject_bands
Rescale the signal and reject bands by dividing by their bandwidths in Hz: the amplitude of each reject band is divided by the total bandwidth of all reject_bands, and the amplitude of signal_band is divided by the bandwidth of signal_band.
Parameters: - signal_band – [low,high] frequency range in Hz (positive contribution)
- reject_bands – list of [low,high] frequency ranges in Hz (negative contribution)
Returns: time-series array of net amplitude
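The bandwidth normalization described above can be sketched as follows. This is an illustration under stated assumptions, not the library code: the helper names are hypothetical, nested lists stand in for the 2d array, rows are assumed to be frequency bins, and band bounds are treated as inclusive.

```python
def scaled_band_sum(spectrogram, frequencies, band, bandwidth):
    """Sum the rows inside band for each time window, then divide by bandwidth (Hz)."""
    low, high = band
    rows = [row for f, row in zip(frequencies, spectrogram) if low <= f <= high]
    n_times = len(spectrogram[0])
    sums = [sum(col) for col in zip(*rows)] if rows else [0.0] * n_times
    return [s / bandwidth for s in sums]

def net_amplitude(spectrogram, frequencies, signal_band, reject_bands=None):
    """Signal-band amplitude minus the amplitude of each reject band."""
    signal_bw = signal_band[1] - signal_band[0]
    net = scaled_band_sum(spectrogram, frequencies, signal_band, signal_bw)
    if reject_bands:
        # Every reject band is divided by the total bandwidth of all reject bands
        total_reject_bw = sum(high - low for low, high in reject_bands)
        for band in reject_bands:
            rejected = scaled_band_sum(spectrogram, frequencies, band, total_reject_bw)
            net = [n - r for n, r in zip(net, rejected)]
    return net

frequencies = [0, 1000, 2000, 3000]
spec = [[1, 1], [4, 8], [2, 2], [1, 1]]
net = net_amplitude(spec, frequencies, signal_band=[500, 1500],
                    reject_bands=[[1500, 2500]])
```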
-
plot
(inline=True, fname=None, show_colorbar=False)¶ Plot the spectrogram with matplotlib.pyplot
Parameters: - inline=True –
- fname=None – specify a string path to save the plot to (ending in .png/.pdf)
- show_colorbar – include image legend colorbar from pyplot
-
to_image
(shape=None, mode='RGB')¶ Create a Pillow Image from spectrogram
Linearly rescales values in the spectrogram from self.decibel_limits to [255,0]
The default self.decibel_limits on load is [-100, -20], so, e.g., -20 dB is the loudest value and maps to black, while -100 dB is the quietest and maps to white.
Parameters: - destination – a file path (string)
- shape=None – tuple of image dimensions, eg (224,224)
- mode="RGB" – RGB for 3-channel color or “L” for 1-channel grayscale
Returns: Pillow Image object
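The inverted mapping from decibels to pixel values can be sketched for a single value. db_to_pixel is a hypothetical helper; clipping out-of-range values and rounding to the nearest integer are assumptions of this sketch.

```python
def db_to_pixel(value_db, decibel_limits=(-100, -20)):
    """Map a dB value to 0..255 so that louder values are darker."""
    low, high = decibel_limits
    clipped = min(max(value_db, low), high)
    # high maps to 0 (black), low maps to 255 (white)
    return round(255 * (high - clipped) / (high - low))

loudest = db_to_pixel(-20)
quietest = db_to_pixel(-100)
```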
-
trim
(start_time, end_time)¶ extract a time segment from a spectrogram
Parameters: - start_time – in seconds
- end_time – in seconds
Returns: spectrogram object from extracted time segment
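Extracting a time segment amounts to selecting columns. A minimal sketch, assuming columns correspond to the entries of times (window centers) and that a window is kept when its center lies within [start_time, end_time]; trim_columns is a hypothetical helper on plain lists:

```python
def trim_columns(spectrogram, times, start_time, end_time):
    """Return the columns (and their times) whose centers fall in the interval."""
    keep = [j for j, t in enumerate(times) if start_time <= t <= end_time]
    trimmed = [[row[j] for j in keep] for row in spectrogram]
    kept_times = [times[j] for j in keep]
    return trimmed, kept_times

times = [0.5, 1.5, 2.5, 3.5]
spec = [[1, 2, 3, 4], [5, 6, 7, 8]]
trimmed, kept_times = trim_columns(spec, times, start_time=1.0, end_time=3.0)
```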
-
window_length
()¶ calculate length of a single fft window, in seconds
-
window_start_times
()¶ get start times of each window, rather than midpoint times
-
window_step
()¶ calculate time difference (sec) between consecutive windows’ centers
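The three timing quantities above are related by simple arithmetic, sketched below. The functions mirror the method names but are standalone illustrations, not the library code; the sketch assumes window_samples and audio_sample_rate are known, and the 16000 Hz example values are made up.

```python
def window_length(window_samples, audio_sample_rate):
    """Length of one FFT window in seconds."""
    return window_samples / audio_sample_rate

def window_step(times):
    """Time in seconds between the centers of consecutive windows."""
    return times[1] - times[0]

def window_start_times(times, window_samples, audio_sample_rate):
    """Shift each window center back by half a window length."""
    half = window_length(window_samples, audio_sample_rate) / 2
    return [t - half for t in times]

# Assumed example: 512-sample windows, 256-sample overlap, 16000 Hz audio
times = [0.016, 0.032, 0.048]  # window centers in seconds
length = window_length(512, 16000)
step = window_step(times)
starts = window_start_times(times, 512, 16000)
```

Here each window is 0.032 s long and consecutive windows are 0.016 s apart, which reflects the 50% overlap.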
-