Mel Spectrogram

melspectrogram.py: Utilities for dealing with mel spectrograms

WARNING: This module has not been thoroughly tested for compatibility with modules and tools in OpenSoundscape.

class opensoundscape.melspectrogram.MelSpectrogram(S, sample_rate, hop_length, fmin, fmax)

Immutable spectrogram container

WARNING: This class has not been thoroughly tested for compatibility with modules and tools in OpenSoundscape.

classmethod from_audio(audio, n_fft=1024, n_mels=128, window='flattop', win_length=256, hop_length=32, htk=True, fmin=None, fmax=None)

Create a MelSpectrogram object from an Audio object

The keyword arguments are cherry-picked from librosa's mel spectrogram parameters:

  • n_fft – Length of the FFT window [default: 1024]
  • n_mels – Number of mel bands to generate [default: 128]
  • window – The windowing function to use [default: “flattop”]
  • win_length – Each frame of audio is windowed by window. The window will be of length win_length and then padded with zeros to match n_fft [default: 256]
  • hop_length – Number of samples between successive frames [default: 32]
  • htk – use HTK formula instead of Slaney [default: True]
  • fmin – lowest frequency (in Hz) [default: None]
  • fmax – highest frequency (in Hz). If None, use fmax = sr / 2.0 [default: None]

Returns: opensoundscape.melspectrogram.MelSpectrogram object
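
With htk=True, frequencies are mapped to mels using the HTK formula, mel = 2595 * log10(1 + f / 700). As a rough illustration, the sketch below uses hypothetical helpers (not part of OpenSoundscape or librosa) to place n_mels + 2 band edges evenly on that scale between fmin and fmax. It treats fmin=None as 0 Hz, which is an assumption, and applies the documented fallback fmax = sr / 2.0. It does not build the filter bank itself.

```python
import math


def hz_to_mel_htk(f_hz):
    """HTK mel formula: mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz_htk(mel):
    """Inverse of the HTK mel formula."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def htk_mel_band_edges(n_mels, sample_rate, fmin=None, fmax=None):
    """Return n_mels + 2 frequencies (Hz) evenly spaced on the HTK mel scale.

    Hypothetical helper: fmin=None is treated as 0 Hz (an assumption);
    fmax=None falls back to sample_rate / 2.0 as documented above.
    """
    fmin = 0.0 if fmin is None else float(fmin)
    fmax = sample_rate / 2.0 if fmax is None else float(fmax)
    mel_lo, mel_hi = hz_to_mel_htk(fmin), hz_to_mel_htk(fmax)
    step = (mel_hi - mel_lo) / (n_mels + 1)
    # n_mels + 2 edges: consecutive triples give each band's lower, center, upper frequency
    return [mel_to_hz_htk(mel_lo + i * step) for i in range(n_mels + 2)]


edges = htk_mel_band_edges(n_mels=128, sample_rate=22050)
print(len(edges), edges[:3], edges[-1])
```

Because the spacing is even in mels rather than in Hz, the low-frequency edges sit much closer together than the high-frequency ones.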

to_image(shape=None, mode='RGB', s_range=(0, 20))

Generate PIL Image from MelSpectrogram

Given a range of values for S (by default, minimum 0 and maximum 20), generate a PIL image in 3-channel (RGB) or single-channel (L) mode. The image can optionally be resized.

  • shape – Resize to shape (h, w) [default: None]
  • mode – Mode to write out “RGB” or “L” [default: “RGB”]
  • s_range – The input range of S [default: (0, 20)]


to_pcen(gain=0.8, bias=10.0, power=0.25, time_constant=0.06)

Create PCEN from MelSpectrogram

Argument descriptions come from https://librosa.org/doc/latest/generated/librosa.pcen.html?highlight=pcen#librosa-pcen

  • gain – The gain factor. Typical values should be slightly less than 1 [default: 0.8]
  • bias – The bias point of the nonlinear compression [default: 10.0]
  • power – The compression exponent. Typical values should be between 0 and 0.5. Smaller values of power result in stronger compression. At the limit power=0, polynomial compression becomes logarithmic [default: 0.25]
  • time_constant – The time constant for IIR filtering, measured in seconds [default: 0.06]

Returns: the per-channel energy normalized version of MelSpectrogram.S
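
PCEN, as documented for librosa.pcen, smooths S over time with a first-order IIR filter M and then applies adaptive gain control and root compression: P = (S / (eps + M)^gain + bias)^power - bias^power. The numpy sketch below writes that formula out explicitly. pcen_sketch is a hypothetical name, eps=1e-6 follows librosa's default, and the conversion of time_constant into a smoothing coefficient plus the steady-state start are modeled on librosa; treat them as assumptions, since OpenSoundscape's to_pcen may differ in detail. sample_rate and hop_length are passed in explicitly here.

```python
import numpy as np


def pcen_sketch(S, sample_rate, hop_length, gain=0.8, bias=10.0, power=0.25,
                time_constant=0.06, eps=1e-6):
    """Per-channel energy normalization of a non-negative (n_bands, n_frames) array.

    M[f, t] = (1 - b) * M[f, t - 1] + b * S[f, t]
    P[f, t] = (S / (eps + M) ** gain + bias) ** power - bias ** power
    Illustrative sketch only; power > 0 is assumed.
    """
    S = np.asarray(S, dtype=float)
    # Convert the time constant from seconds to frames, then to a smoothing coefficient b
    t_frames = time_constant * sample_rate / float(hop_length)
    b = (np.sqrt(1 + 4 * t_frames ** 2) - 1) / (2 * t_frames ** 2)

    # First-order IIR smoother along time, started at steady state (M[:, 0] = S[:, 0])
    M = np.empty_like(S)
    M[:, 0] = S[:, 0]
    for t in range(1, S.shape[1]):
        M[:, t] = (1 - b) * M[:, t - 1] + b * S[:, t]

    # Adaptive gain control followed by root compression
    return (S / (eps + M) ** gain + bias) ** power - bias ** power


rng = np.random.default_rng(0)
S = rng.random((128, 50))  # stand-in for MelSpectrogram.S: (n_mels, n_frames)
P = pcen_sketch(S, sample_rate=22050, hop_length=32)
print(P.shape)
```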


spectrogram.py: Utilities for dealing with spectrograms

class opensoundscape.spectrogram.Spectrogram(spectrogram, frequencies, times, decibel_limits, window_samples=None, overlap_samples=None, window_type=None, audio_sample_rate=None)

Immutable spectrogram container

Can be initialized directly from spectrogram, frequency, and time values or created from an Audio object using the .from_audio() method.


Attributes:

  • frequencies – (list) discrete frequency bins generated by the FFT
  • times – (list) time from the beginning of the file to the center of each window
  • spectrogram – a 2-d array containing 10*log10(fft) for each time window
  • decibel_limits – minimum and maximum decibel values in .spectrogram
  • window_samples – number of samples per window when the spectrogram was created [default: None]
  • overlap_samples – number of samples overlapped in consecutive windows when the spectrogram was created [default: None]
  • window_type – window function used to make the spectrogram, e.g. ‘hann’ [default: None]
  • audio_sample_rate – sample rate of the audio from which the spectrogram was created [default: None]


create an amplitude vs time signal from the spectrogram by summing pixels in the vertical dimension

Parameters: freq_range=None – sum the spectrogram only in this range of [low, high] frequencies in Hz (if None, all frequencies are summed)
Returns: a time-series array of the vertical sum of spectrogram values

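
A minimal numpy sketch of this vertical sum, assuming the spectrogram is stored as a (frequencies × times) array and that the freq_range bounds are inclusive; amplitude_sketch is a hypothetical helper, not the method itself.

```python
import numpy as np


def amplitude_sketch(spectrogram, frequencies, freq_range=None):
    """Sum spectrogram values over frequency (rows) for each time window (column).

    Hypothetical helper; inclusive band limits are an assumption.
    """
    spectrogram = np.asarray(spectrogram, dtype=float)
    frequencies = np.asarray(frequencies, dtype=float)
    if freq_range is None:
        rows = np.ones(len(frequencies), dtype=bool)  # sum all frequencies
    else:
        low, high = freq_range
        rows = (frequencies >= low) & (frequencies <= high)
    return spectrogram[rows, :].sum(axis=0)


freqs = np.array([0.0, 1000.0, 2000.0, 3000.0])
spec = np.array([[1.0, 2.0],
                 [3.0, 4.0],
                 [5.0, 6.0],
                 [7.0, 8.0]])
print(amplitude_sketch(spec, freqs))                          # all four rows
print(amplitude_sketch(spec, freqs, freq_range=[1000, 2000])) # rows at 1000 and 2000 Hz
```
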
bandpass(min_f, max_f)

extract a frequency band from a spectrogram

crops the 2-d array of the spectrogram to the desired frequency range

  • min_f – low frequency in Hz for bandpass
  • max_f – high frequency in Hz for bandpass

Returns: bandpassed spectrogram object
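
The crop can be sketched as a boolean row mask on the frequency axis. bandpass_sketch is a hypothetical helper and the inclusive edges are an assumption; the example frequency grid matches 512-sample windows at 22050 Hz (bins every 43.07 Hz).

```python
import numpy as np


def bandpass_sketch(spectrogram, frequencies, min_f, max_f):
    """Keep only the rows (frequency bins) that lie in [min_f, max_f].

    Hypothetical helper; whether the real method includes the edges is an assumption.
    """
    frequencies = np.asarray(frequencies, dtype=float)
    rows = (frequencies >= min_f) & (frequencies <= max_f)
    return np.asarray(spectrogram)[rows, :], frequencies[rows]


freqs = np.linspace(0, 11025, 257)  # one-sided FFT bins for 512-sample windows at 22050 Hz
spec = np.zeros((257, 10))          # 257 frequency bins x 10 time windows
band, band_freqs = bandpass_sketch(spec, freqs, 2000, 4000)
print(band.shape, band_freqs[0], band_freqs[-1])
```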


calculate the amount of time represented in the spectrogram

Note: time may be shorter than the duration of the audio from which the spectrogram was created, because the windows may align in a way such that some samples from the end of the original audio were discarded
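
A worked example of that note, assuming windows start at sample 0, advance by window_samples - overlap_samples, and any trailing partial window is dropped (as scipy.signal.spectrogram does by default). One second of audio at 22050 Hz with the from_audio defaults (512-sample windows, 256 overlapping) yields 85 complete windows covering 22016 samples, so the last 34 samples, about 1.5 ms, are not represented. covered_duration is a hypothetical helper.

```python
def covered_duration(n_samples, sample_rate, window_samples=512, overlap_samples=256):
    """Return (number of complete windows, seconds of audio they span).

    Hypothetical helper. Assumes windows start at sample 0, advance by
    window_samples - overlap_samples, and a trailing partial window is dropped.
    """
    hop = window_samples - overlap_samples
    n_windows = 1 + (n_samples - window_samples) // hop
    last_sample = (n_windows - 1) * hop + window_samples  # end of the last full window
    return n_windows, last_sample / sample_rate


n_windows, seconds = covered_duration(n_samples=22050, sample_rate=22050)
print(n_windows, seconds)
```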

classmethod from_audio(audio, window_type='hann', window_samples=512, overlap_samples=256, decibel_limits=(-100, -20))

create a Spectrogram object from an Audio object

  • window_type="hann" – see scipy.signal.spectrogram docs for description of window parameter
  • window_samples=512 – number of audio samples per spectrogram window (pixel)
  • overlap_samples=256 – number of samples shared by consecutive windows
  • decibel_limits=(-100, -20) – limit the dB values to (min, max) (lower values set to min, higher values set to max)

Returns: opensoundscape.spectrogram.Spectrogram object
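
To show roughly what from_audio computes, here is a numpy-only stand-in built on the same parameters: overlapping Hann-windowed frames, a one-sided power spectrum converted to 10*log10 values, and clipping to decibel_limits. spectrogram_sketch is hypothetical. scipy.signal.spectrogram, which the window_type parameter points to, also detrends each segment and doubles the one-sided bins by default, so absolute dB values from this sketch will not match exactly.

```python
import numpy as np


def spectrogram_sketch(samples, sample_rate, window_samples=512, overlap_samples=256,
                       decibel_limits=(-100, -20)):
    """Rough dB spectrogram: rows are frequency bins, columns are time windows.

    Hypothetical helper; scaling differs from scipy.signal.spectrogram.
    """
    samples = np.asarray(samples, dtype=float)
    hop = window_samples - overlap_samples
    n_windows = 1 + (len(samples) - window_samples) // hop  # trailing partial window dropped

    # Periodic Hann window
    n = np.arange(window_samples)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / window_samples)

    frames = np.stack([samples[i * hop:i * hop + window_samples] for i in range(n_windows)])
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
    power /= sample_rate * np.sum(window ** 2)  # approximate density scaling
    db = 10 * np.log10(power.T + 1e-20)         # small offset avoids log10(0)
    db = np.clip(db, *decibel_limits)           # limit to (min, max) dB

    frequencies = np.fft.rfftfreq(window_samples, d=1.0 / sample_rate)
    times = (np.arange(n_windows) * hop + window_samples / 2) / sample_rate  # window centers
    return db, frequencies, times


sr = 22050
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 3000 * t)  # one second of a 3 kHz tone
db, freqs, times = spectrogram_sketch(tone, sr)
print(db.shape, freqs[np.argmax(db[:, 0])])
```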

classmethod from_file(file)

create a Spectrogram object from a file

Parameters: file – path of image to load
Returns: opensoundscape.spectrogram.Spectrogram object

limit_db_range(min_db=-100, max_db=-20)

Limit the decibel values of the spectrogram to range from min_db to max_db

values less than min_db are set to min_db; values greater than max_db are set to max_db

similar to Audacity’s gain and range parameters

  • min_db – values lower than this are set to this
  • max_db – values higher than this are set to this

Returns: Spectrogram object with dB range applied

linear_scale(feature_range=(0, 1))

Linearly rescale spectrogram values to a range of values, using decibel_limits as the input range

Parameters: feature_range – tuple of (low,high) values for output
Returns: Spectrogram object with values rescaled to feature_range

min_max_scale(feature_range=(0, 1))

Linearly rescale spectrogram values to a range of values, using the minimum and maximum spectrogram values as the input range

Parameters: feature_range – tuple of (low,high) values for output
Returns: Spectrogram object with values rescaled to feature_range

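
linear_scale and min_max_scale differ only in the input range used for the linear map: linear_scale uses decibel_limits, while min_max_scale uses the data's own extremes. The hypothetical rescale helper below shows both. Whether the real methods clip values outside the input range is not stated above, so the sketch does not clip.

```python
import numpy as np


def rescale(values, in_range, feature_range=(0, 1)):
    """Map in_range[0] -> feature_range[0] and in_range[1] -> feature_range[1] linearly (no clipping)."""
    values = np.asarray(values, dtype=float)
    in_low, in_high = in_range
    out_low, out_high = feature_range
    return out_low + (values - in_low) * (out_high - out_low) / (in_high - in_low)


spec = np.array([[-90.0, -60.0],
                 [-40.0, -30.0]])
decibel_limits = (-100, -20)

# linear_scale-style: the input range is decibel_limits
by_limits = rescale(spec, decibel_limits)
# min_max_scale-style: the input range is the spectrogram's own min and max
by_data = rescale(spec, (spec.min(), spec.max()))
print(by_limits)
print(by_data)
```

With decibel_limits as the input range, the quietest value here (-90 dB) maps to 0.125 rather than 0, because it sits above the -100 dB floor; min_max_scale always stretches the data to fill feature_range.
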
net_amplitude(signal_band, reject_bands=None)

create amplitude signal in signal_band and subtract amplitude from reject_bands

rescale the signal and reject bands by dividing by their bandwidths in Hz: the amplitude of each reject band is divided by the total bandwidth of all reject_bands, and the amplitude of signal_band is divided by the bandwidth of signal_band

  • signal_band – [low,high] frequency range in Hz (positive contribution)
  • reject_bands – list of [low,high] frequency ranges in Hz (negative contribution)

Returns: time-series array of net amplitude
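
A numpy sketch of the bandwidth normalization described above; net_amplitude_sketch and band_sum are hypothetical helpers, and inclusive band edges are assumed. In the example, the signal band contributes [10, 20] / 1000 Hz and the two reject bands together contribute [4, 6] / 2000 Hz.

```python
import numpy as np


def band_sum(spectrogram, frequencies, band):
    """Sum rows whose frequency lies in [low, high] (inclusive edges assumed)."""
    low, high = band
    rows = (frequencies >= low) & (frequencies <= high)
    return spectrogram[rows, :].sum(axis=0)


def net_amplitude_sketch(spectrogram, frequencies, signal_band, reject_bands=None):
    """Bandwidth-normalized signal amplitude minus bandwidth-normalized reject amplitude."""
    spectrogram = np.asarray(spectrogram, dtype=float)
    frequencies = np.asarray(frequencies, dtype=float)
    # Signal band amplitude divided by the signal band's own bandwidth
    signal = band_sum(spectrogram, frequencies, signal_band) / (signal_band[1] - signal_band[0])
    if not reject_bands:
        return signal
    # Reject band amplitudes divided by the total bandwidth of all reject bands
    total_reject_bw = sum(high - low for low, high in reject_bands)
    noise = sum(band_sum(spectrogram, frequencies, b) for b in reject_bands) / total_reject_bw
    return signal - noise


freqs = np.array([0.0, 1000.0, 2000.0, 3000.0, 4000.0])
spec = np.array([[1.0, 1.0],
                 [2.0, 2.0],
                 [10.0, 20.0],
                 [2.0, 4.0],
                 [1.0, 1.0]])
net = net_amplitude_sketch(spec, freqs, signal_band=[1500, 2500],
                           reject_bands=[[500, 1500], [2500, 3500]])
print(net)
```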

plot(inline=True, fname=None, show_colorbar=False)

Plot the spectrogram with matplotlib.pyplot

  • inline=True
  • fname=None – specify a string path to save the plot to (ending in .png/.pdf)
  • show_colorbar – include image legend colorbar from pyplot

to_image(shape=None, mode='RGB')

Create a Pillow Image from spectrogram

Linearly rescales values in the spectrogram from self.decibel_limits to [255,0]

The default self.decibel_limits on load is [-100, -20], so, e.g., -20 dB (loudest) maps to black and -100 dB (quietest) maps to white

  • shape=None – tuple of image dimensions, eg (224,224)
  • mode="RGB" – RGB for 3-channel color or “L” for 1-channel grayscale

Returns: Pillow Image object
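
The dB-to-pixel mapping can be sketched in numpy as below, assuming values outside decibel_limits are clipped and pixels are rounded to the nearest integer; db_to_pixels is a hypothetical helper. The resulting 2-d uint8 array could be wrapped with PIL.Image.fromarray to get a grayscale image.

```python
import numpy as np


def db_to_pixels(spectrogram, decibel_limits=(-100, -20)):
    """Map decibel_limits linearly onto [255, 0]: loud -> dark, quiet -> light."""
    low, high = decibel_limits
    scaled = (np.asarray(spectrogram, dtype=float) - low) / (high - low)  # 0 = quietest, 1 = loudest
    pixels = 255 * (1 - np.clip(scaled, 0, 1))  # clipping is an assumption
    return np.round(pixels).astype(np.uint8)


pixels = db_to_pixels(np.array([[-20.0, -60.0, -100.0]]))
print(pixels)
```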

trim(start_time, end_time)

extract a time segment from a spectrogram

  • start_time – in seconds
  • end_time – in seconds

Returns: spectrogram object from extracted time segment
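
Trimming can be sketched as a column mask on the window-center times. trim_sketch is hypothetical, and both the inclusive bounds and keeping the original (unshifted) times are assumptions about the real method.

```python
import numpy as np


def trim_sketch(spectrogram, times, start_time, end_time):
    """Keep columns whose window-center time lies in [start_time, end_time] (inclusive assumed)."""
    times = np.asarray(times, dtype=float)
    cols = (times >= start_time) & (times <= end_time)
    return np.asarray(spectrogram)[:, cols], times[cols]


sr = 22050
times = (np.arange(85) * 256 + 256) / sr  # centers of 512-sample windows with 256-sample hop
spec = np.zeros((257, 85))
trimmed, kept_times = trim_sketch(spec, times, 0.25, 0.5)
print(trimmed.shape, kept_times[0], kept_times[-1])
```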


calculate length of a single FFT window, in seconds


get start times of each window, rather than midpoint times


calculate time difference (sec) between consecutive windows’ centers
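
These three window helpers reduce to simple arithmetic on window_samples, overlap_samples, and audio_sample_rate. The standalone functions below are hypothetical stand-ins (the real methods presumably read these values from the Spectrogram's attributes). With 512-sample windows, 256 overlapping, at 22050 Hz, each window lasts about 23.2 ms and consecutive centers are about 11.6 ms apart.

```python
def window_length(window_samples, audio_sample_rate):
    """Length of a single FFT window, in seconds."""
    return window_samples / audio_sample_rate


def window_step(window_samples, overlap_samples, audio_sample_rate):
    """Time between consecutive window centers, in seconds."""
    return (window_samples - overlap_samples) / audio_sample_rate


def window_start_times(times, window_samples, audio_sample_rate):
    """Shift window-center times back by half a window to get start times."""
    half = window_length(window_samples, audio_sample_rate) / 2
    return [t - half for t in times]


sr = 22050
print(window_length(512, sr))     # about 0.0232 s
print(window_step(512, 256, sr))  # about 0.0116 s
centers = [(256 + i * 256) / sr for i in range(3)]
print(window_start_times(centers, 512, sr))  # approximately [0.0, 0.0116, 0.0232]
```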