Spectrogram

Mel Spectrogram

melspectrogram.py: Utilities for dealing with mel spectrograms

class opensoundscape.melspectrogram.MelSpectrogram(S, sample_rate, hop_length, fmin, fmax)

Immutable spectrogram container

classmethod from_audio(audio, n_fft=1024, n_mels=128, window='flattop', win_length=256, hop_length=32, htk=True, fmin=None, fmax=None)

Create a MelSpectrogram object from an Audio object

The kwargs are cherry-picked from:

Parameters:
  • n_fft – Length of the FFT window [default: 1024]
  • n_mels – Number of mel bands to generate [default: 128]
  • window – The windowing function to use [default: “flattop”]
  • win_length – Each frame of audio is windowed by window. The window will be of length win_length and then padded with zeros to match n_fft [default: 256]
  • hop_length – Number of samples between successive frames [default: 32]
  • htk – use HTK formula instead of Slaney [default: True]
  • fmin – lowest frequency (in Hz) [default: None]
  • fmax – highest frequency (in Hz). If None, use fmax = sr / 2.0 [default: None]
Returns:

opensoundscape.melspectrogram.MelSpectrogram object
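The htk, n_mels, fmin and fmax parameters together determine where the mel bands sit on the frequency axis. The following is a minimal sketch of the standard HTK mel formula and of band edges spaced evenly on that scale. The function names are hypothetical and are not part of opensoundscape; the library delegates the real filter bank construction elsewhere.

```python
import math

def hz_to_mel_htk(freq_hz):
    """Convert a frequency in Hz to mels with the HTK formula."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz_htk(mel):
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(n_mels, fmin, fmax):
    """Return n_mels + 2 edge frequencies (Hz), evenly spaced in mels.

    Hypothetical helper: each of the n_mels bands spans three
    consecutive edges (lower edge, centre, upper edge).
    """
    low, high = hz_to_mel_htk(fmin), hz_to_mel_htk(fmax)
    step = (high - low) / (n_mels + 1)
    return [mel_to_hz_htk(low + i * step) for i in range(n_mels + 2)]

# Example: 128 bands between 0 Hz and the Nyquist frequency of 22050 Hz audio
edges = mel_band_edges(n_mels=128, fmin=0.0, fmax=22050 / 2.0)
```

Because the spacing is even in mels rather than in Hz, the bands are narrow at low frequencies and widen towards fmax.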

to_image(shape=None, mode='RGB', s_range=(0, 20))

Generate PIL Image from MelSpectrogram

Given a range of values for S (e.g. default is minimum 0, maximum 20) generate a PIL image in 3-channel (RGB) or single channel (L) mode. A user can optionally resize the image.

Parameters:
  • shape – Resize to shape (h, w) [default: None]
  • mode – Mode to write out “RGB” or “L” [default: “RGB”]
  • s_range – The input range of S [default: (0, 20)]
Returns:

PIL.Image

to_pcen(gain=0.8, bias=10.0, power=0.25, time_constant=0.06)

Create PCEN from MelSpectrogram

Argument descriptions come from https://librosa.org/doc/latest/generated/librosa.pcen.html?highlight=pcen#librosa-pcen

Parameters:
  • gain – The gain factor. Typical values should be slightly less than 1 [default: 0.8]
  • bias – The bias point of the nonlinear compression [default: 10.0]
  • power – The compression exponent. Typical values should be between 0 and 0.5. Smaller values of power result in stronger compression. At the limit power=0, polynomial compression becomes logarithmic [default: 0.25]
  • time_constant – The time constant for IIR filtering, measured in seconds [default: 0.06]
Returns:

The per-channel energy normalized version of MelSpectrogram.S
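To show how gain, bias and power interact, here is a simplified NumPy sketch of per-channel energy normalization. It is not the librosa implementation: the smoothing coefficient b is a hypothetical stand-in for the value that librosa derives from time_constant, the sample rate and the hop length, and eps is an assumed small constant that avoids division by zero.

```python
import numpy as np

def pcen_sketch(S, gain=0.8, bias=10.0, power=0.25, b=0.1, eps=1e-6):
    """Simplified PCEN on a (frequency, time) array of non-negative values.

    b is an assumed first-order IIR smoothing coefficient in (0, 1].
    """
    S = np.asarray(S, dtype=float)
    M = np.empty_like(S)
    M[:, 0] = S[:, 0]  # initialise the smoother with the first frame
    for t in range(1, S.shape[1]):
        # low-pass filter each frequency channel along time
        M[:, t] = (1.0 - b) * M[:, t - 1] + b * S[:, t]
    # adaptive gain control followed by root compression
    return (S / (eps + M) ** gain + bias) ** power - bias ** power

rng = np.random.default_rng(0)
S = np.abs(rng.normal(size=(4, 50)))
P = pcen_sketch(S)
```

The division by the smoothed energy M is what normalizes each channel against its own recent history; the power term then compresses the result.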

Spectrogram

spectrogram.py: Utilities for dealing with spectrograms

class opensoundscape.spectrogram.Spectrogram(spectrogram, frequencies, times)

Immutable spectrogram container

amplitude(freq_range=None)

create an amplitude vs time signal from the spectrogram by summing pixels in the vertical (frequency) dimension

Parameters:
  • freq_range – sum the Spectrogram only in this range of [low, high] frequencies in Hz (if None, all frequencies are summed) [default: None]
Returns:

a time-series array of the vertical sum of spectrogram values

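The vertical sum described above can be sketched with plain NumPy. This is an illustration only, assuming rows are frequencies and columns are times, and that both edges of freq_range are inclusive; amplitude_sketch is a hypothetical name, not the library method.

```python
import numpy as np

def amplitude_sketch(spectrogram, frequencies, freq_range=None):
    """Sum spectrogram values over frequency, optionally within a band."""
    spectrogram = np.asarray(spectrogram, dtype=float)
    frequencies = np.asarray(frequencies, dtype=float)
    if freq_range is None:
        return spectrogram.sum(axis=0)
    low, high = freq_range
    rows = (frequencies >= low) & (frequencies <= high)  # rows inside the band
    return spectrogram[rows].sum(axis=0)

spec = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # 3 frequencies x 2 times
freqs = np.array([0.0, 1000.0, 2000.0])
all_freqs = amplitude_sketch(spec, freqs)               # [9., 12.]
upper_band = amplitude_sketch(spec, freqs, [500, 2500])  # [8., 10.]
```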
bandpass(min_f, max_f)

extract a frequency band from a spectrogram

crops the 2-d array of the spectrograms to the desired frequency range

Parameters:
  • min_f – low frequency in Hz for bandpass
  • max_f – high frequency in Hz for bandpass
Returns:

bandpassed spectrogram object
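Cropping to a frequency band amounts to keeping only the rows whose frequency lies between min_f and max_f. A minimal sketch, assuming a (frequency, time) array and inclusive band edges; bandpass_sketch is a hypothetical helper and returns plain arrays rather than a Spectrogram object.

```python
import numpy as np

def bandpass_sketch(spectrogram, frequencies, min_f, max_f):
    """Return the rows of the spectrogram within [min_f, max_f] Hz."""
    spectrogram = np.asarray(spectrogram)
    frequencies = np.asarray(frequencies, dtype=float)
    keep = (frequencies >= min_f) & (frequencies <= max_f)
    return spectrogram[keep, :], frequencies[keep]

spec = np.arange(12.0).reshape(4, 3)            # 4 frequencies x 3 times
freqs = np.array([0.0, 1000.0, 2000.0, 3000.0])
cropped, cropped_freqs = bandpass_sketch(spec, freqs, 1000, 2000)
```

The time axis is untouched, so the cropped array keeps the same number of columns.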

classmethod from_audio(audio, window_type='hann', window_samples=512, overlap_samples=256, decibel_limits=(-100, -20))

create a Spectrogram object from an Audio object

Parameters:
  • window_type="hann" – see scipy.signal.spectrogram docs for description of window parameter
  • window_samples=512 – number of audio samples per spectrogram window (pixel)
  • overlap_samples=256 – number of samples shared by consecutive windows
  • decibel_limits=(-100, -20) – limit the dB values to (min, max) (lower values are set to min, higher values are set to max)
Returns:

opensoundscape.spectrogram.Spectrogram object
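The window_samples and overlap_samples parameters fix the shape and time resolution of the result. The sketch below works through that arithmetic, assuming scipy.signal.spectrogram-style framing with no padding and a one-sided FFT whose length equals window_samples. Both function names are hypothetical.

```python
def spectrogram_shape(n_samples, window_samples=512, overlap_samples=256):
    """Return (n_frequencies, n_windows) for a signal of n_samples."""
    hop = window_samples - overlap_samples        # samples between window starts
    n_windows = (n_samples - overlap_samples) // hop
    n_frequencies = window_samples // 2 + 1       # one-sided FFT bins
    return n_frequencies, n_windows

def time_resolution(sample_rate, window_samples=512, overlap_samples=256):
    """Seconds between the centres of consecutive windows."""
    return (window_samples - overlap_samples) / sample_rate

shape = spectrogram_shape(n_samples=22050)   # one second at 22050 Hz
step = time_resolution(sample_rate=22050)
```

Increasing overlap_samples shortens the hop, which gives more columns per second without changing the frequency resolution.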

classmethod from_file()

create a Spectrogram object from a file

Parameters:
  • file – path of image to load
Returns:

opensoundscape.spectrogram.Spectrogram object

limit_db_range(min_db=-100, max_db=-20)

Limit the decibel values of the spectrogram to range from min_db to max_db

values less than min_db are set to min_db; values greater than max_db are set to max_db

similar to Audacity’s gain and range parameters

Parameters:
  • min_db – values lower than this are set to this
  • max_db – values higher than this are set to this
Returns:

Spectrogram object with db range applied
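The clamping behaviour is equivalent to an element-wise clip. A minimal sketch on a plain NumPy array, where limit_db_range_sketch is a hypothetical stand-in for the method:

```python
import numpy as np

def limit_db_range_sketch(spec_db, min_db=-100, max_db=-20):
    """Clamp decibel values to the closed interval [min_db, max_db]."""
    return np.clip(np.asarray(spec_db, dtype=float), min_db, max_db)

limited = limit_db_range_sketch([-120.0, -50.0, 0.0])  # [-100., -50., -20.]
```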

linear_scale(feature_range=(0, 1))

Linearly rescale spectrogram values to feature_range, using the spectrogram's decibel_limits as the input range

Parameters:
  • feature_range – tuple of (low, high) values for output [default: (0, 1)]
Returns:

Spectrogram object with values rescaled to feature_range

min_max_scale(feature_range=(0, 1))

Linearly rescale spectrogram values to feature_range, using the spectrogram's minimum and maximum values as the input range

Parameters:
  • feature_range – tuple of (low, high) values for output [default: (0, 1)]
Returns:

Spectrogram object with values rescaled to feature_range

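linear_scale and min_max_scale apply the same linear map and differ only in the input range they assume. The sketch below makes that difference concrete with a hypothetical rescale helper on a plain array, taking (-100, -20) as the decibel limits.

```python
import numpy as np

def rescale(values, in_range, feature_range=(0, 1)):
    """Linearly map values from in_range onto feature_range."""
    in_low, in_high = in_range
    out_low, out_high = feature_range
    fraction = (np.asarray(values, dtype=float) - in_low) / (in_high - in_low)
    return out_low + fraction * (out_high - out_low)

spec_db = np.array([[-80.0, -60.0], [-40.0, -30.0]])

# linear_scale-style: the input range is the fixed decibel limits
linear = rescale(spec_db, in_range=(-100, -20))

# min_max_scale-style: the input range is taken from the data itself
min_max = rescale(spec_db, in_range=(spec_db.min(), spec_db.max()))
```

With fixed limits the output only fills feature_range if the data reach both limits, whereas the min/max variant always spans the full output range.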
net_amplitude(signal_band, reject_bands=None)

create amplitude signal in signal_band and subtract amplitude from reject_bands

rescale the signal and reject bands by dividing by their bandwidths in Hz (the amplitude of each reject band is divided by the total bandwidth of all reject_bands; the amplitude of signal_band is divided by the bandwidth of signal_band)

Parameters:
  • signal_band – [low,high] frequency range in Hz (positive contribution)
  • reject_bands – list of [low, high] frequency ranges in Hz (negative contribution) [default: None]
Returns:

time-series array of net amplitude
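The bandwidth normalization is easier to follow in code. This sketch follows the description above on plain arrays, assuming rows are frequencies and band edges are inclusive. band_sum and net_amplitude_sketch are hypothetical names, not library functions.

```python
import numpy as np

def band_sum(spectrogram, frequencies, band):
    """Sum spectrogram rows whose frequency lies within band = [low, high]."""
    low, high = band
    rows = (frequencies >= low) & (frequencies <= high)
    return spectrogram[rows].sum(axis=0)

def net_amplitude_sketch(spectrogram, frequencies, signal_band, reject_bands=None):
    spectrogram = np.asarray(spectrogram, dtype=float)
    frequencies = np.asarray(frequencies, dtype=float)
    # signal amplitude, normalised by the signal bandwidth in Hz
    signal_width = signal_band[1] - signal_band[0]
    net = band_sum(spectrogram, frequencies, signal_band) / signal_width
    if reject_bands:
        # every reject band is normalised by the combined reject bandwidth
        total_reject_width = sum(high - low for low, high in reject_bands)
        for band in reject_bands:
            net = net - band_sum(spectrogram, frequencies, band) / total_reject_width
    return net

spec = np.array([[1.0, 1.0], [4.0, 8.0], [2.0, 2.0]])
freqs = np.array([0.0, 1000.0, 2000.0])
net = net_amplitude_sketch(spec, freqs, [500, 1500], [[0, 500], [1500, 2000]])
```

Dividing by bandwidth keeps a wide reject band from outweighing a narrow signal band purely because it covers more rows.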

plot(inline=True, fname=None, show_colorbar=False)

Plot the spectrogram with matplotlib.pyplot

Parameters:
  • inline=True
  • fname=None – specify a string path to save the plot to (ending in .png/.pdf)
  • show_colorbar – include image legend colorbar from pyplot
to_image(shape=None, mode='RGB', spec_range=[-100, -20])

create a Pillow Image from the spectrogram. Values are linearly rescaled from spec_range (default [-100, -20]) to [255, 0] (i.e., -20 dB is loudest -> black, -100 dB is quietest -> white)

Parameters:
  • destination – a file path (string)
  • shape=None – tuple of image dimensions, eg (224,224)
  • mode="RGB" – RGB for 3-channel color or “L” for 1-channel grayscale
  • spec_range=[-100,-20] – the lowest and highest possible values in the spectrogram
Returns:

Pillow Image object
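The inverted mapping from decibels to pixel values can be sketched without Pillow. This example covers only the value rescaling, under the assumption that out-of-range values are clipped first. It does not resize or build an image, and to_pixels_sketch is a hypothetical name.

```python
import numpy as np

def to_pixels_sketch(spec_db, spec_range=(-100, -20)):
    """Map dB values to 8-bit pixels: lowest -> 255 (white), highest -> 0 (black)."""
    low, high = spec_range
    clipped = np.clip(np.asarray(spec_db, dtype=float), low, high)
    fraction = (clipped - low) / (high - low)   # 0 at quietest, 1 at loudest
    return np.round(255 * (1 - fraction)).astype(np.uint8)

pixels = to_pixels_sketch([[-100.0, -20.0], [-120.0, 0.0]])
```

An array like this could then be passed to PIL.Image.fromarray to obtain a single-channel ("L") image.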

trim(start_time, end_time)

extract a time segment from a spectrogram

Parameters:
  • start_time – in seconds
  • end_time – in seconds
Returns:

spectrogram object from extracted time segment