Spectrogram¶

Mel Spectrogram¶

melspectrogram.py: Utilities for dealing with mel spectrograms

WARNING: This module has not been thoroughly tested for compatibility with modules and tools in OpenSoundscape.

class opensoundscape.melspectrogram.MelSpectrogram(S, sample_rate, hop_length, fmin, fmax)¶

Immutable spectrogram container

WARNING: This class has not been thoroughly tested for compatibility with modules and tools in OpenSoundscape.

classmethod from_audio(audio, n_fft=1024, n_mels=128, window='flattop', win_length=256, hop_length=32, htk=True, fmin=None, fmax=None)¶

Create a MelSpectrogram object from an Audio object

The kwargs are cherry-picked from:

Parameters:

n_fft – Length of the FFT window [default: 1024]
n_mels – Number of mel bands to generate [default: 128]
window – The windowing function to use [default: “flattop”]
win_length – Each frame of audio is windowed by window. The window will be of length win_length and then padded with zeros to match n_fft [default: 256]
hop_length – Number of samples between successive frames [default: 32]
htk – use HTK formula instead of Slaney [default: True]
fmin – lowest frequency (in Hz) [default: None]
fmax – highest frequency (in Hz). If None, use fmax = sr / 2.0 [default: None]

Returns:

opensoundscape.melspectrogram.MelSpectrogram object
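
The htk flag chooses between two mel-scale formulas. As a rough illustration only, the HTK variant can be written in closed form as below. The function names are hypothetical and are not part of OpenSoundscape, and the band-center helper assumes the usual triangular filterbank layout (n_mels + 2 evenly spaced mel points, with centers at the interior points).

```python
import math


def hz_to_mel_htk(freq_hz):
    """Convert a frequency in Hz to mels using the HTK formula."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz_htk(mel):
    """Invert the HTK formula: convert mels back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_mels, fmin, fmax):
    """Center frequencies (Hz) of n_mels bands spaced evenly on the mel scale.

    Assumption: n_mels triangular filters are built from n_mels + 2 evenly
    spaced mel points, so the centers are the n_mels interior points.
    """
    mel_lo, mel_hi = hz_to_mel_htk(fmin), hz_to_mel_htk(fmax)
    step = (mel_hi - mel_lo) / (n_mels + 1)
    return [mel_to_hz_htk(mel_lo + step * (i + 1)) for i in range(n_mels)]


centers = mel_band_centers(n_mels=4, fmin=0.0, fmax=8000.0)
```

Because the spacing is even in mels rather than in Hz, the centers crowd together at low frequencies and spread out at high frequencies.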

to_image(shape=None, mode='RGB', s_range=(0, 20))¶

Generate PIL Image from MelSpectrogram

Given a range of values for S (e.g. default is minimum 0, maximum 20) generate a PIL image in 3-channel (RGB) or single channel (L) mode. A user can optionally resize the image.

Parameters:

shape – Resize to shape (h, w) [default: None]
mode – Mode to write out, “RGB” or “L” [default: “RGB”]
s_range – The input range of S [default: (0, 20)]

Returns:

PIL.Image

to_pcen(gain=0.8, bias=10.0, power=0.25, time_constant=0.06)¶

Create PCEN from MelSpectrogram

Argument descriptions come from https://librosa.org/doc/latest/generated/librosa.pcen.html?highlight=pcen#librosa-pcen

Parameters:

gain – The gain factor. Typical values should be slightly less than 1 [default: 0.8]
bias – The bias point of the nonlinear compression [default: 10.0]
power – The compression exponent. Typical values should be between 0 and 0.5. Smaller values of power result in stronger compression. At the limit power=0, polynomial compression becomes logarithmic [default: 0.25]
time_constant – The time constant for IIR filtering, measured in seconds [default: 0.06]

Returns:

The per-channel energy normalized version of MelSpectrogram.S
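
To make the roles of gain, bias, power, and time_constant concrete, here is a simplified NumPy sketch of the PCEN formula described in the librosa documentation. It is not the code that to_pcen runs: the function name, the explicit sample_rate and hop_length arguments, the eps value, and the choice to start the smoothing filter at the first frame are all assumptions made for illustration.

```python
import numpy as np


def pcen_sketch(S, sample_rate, hop_length, gain=0.8, bias=10.0, power=0.25,
                time_constant=0.06, eps=1e-6):
    """Simplified PCEN for a non-negative array of shape (n_mels, n_frames)."""
    S = np.asarray(S, dtype=float)

    # Convert the time constant from seconds to frames, then to an
    # IIR smoothing coefficient b in (0, 1).
    t_frames = time_constant * sample_rate / hop_length
    b = (np.sqrt(1.0 + 4.0 * t_frames ** 2) - 1.0) / (2.0 * t_frames ** 2)

    # First-order low-pass filter along time: M[t] = (1 - b) * M[t-1] + b * S[t]
    M = np.empty_like(S)
    M[:, 0] = S[:, 0]  # assumption: initialize the filter with the first frame
    for t in range(1, S.shape[1]):
        M[:, t] = (1.0 - b) * M[:, t - 1] + b * S[:, t]

    # Adaptive gain control (divide by the smoothed energy), then compression.
    return (S / (eps + M) ** gain + bias) ** power - bias ** power


constant_input = np.full((3, 5), 4.0)
normalized = pcen_sketch(constant_input, sample_rate=22050, hop_length=512)
```

For a constant input the smoothed energy equals the input itself, so every output value is the same; a sudden onset would instead stand out because the smoothed energy lags behind it.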

Spectrogram¶

spectrogram.py: Utilities for dealing with spectrograms

class opensoundscape.spectrogram.Spectrogram(spectrogram, frequencies, times, decibel_limits, window_samples=None, overlap_samples=None, window_type=None, audio_sample_rate=None)¶

Immutable spectrogram container

Can be initialized directly from spectrogram, frequency, and time values or created from an Audio object using the .from_audio() method.

frequencies¶: (list) discrete frequency bins generated by fft

times¶: (list) time from beginning of file to the center of each window

spectrogram¶: a 2d array containing 10*log10(fft) for each time window

decibel_limits¶: minimum and maximum decibel values in .spectrogram

window_samples¶: number of samples per window when spec was created [default: none]

overlap_samples¶: number of samples overlapped in consecutive windows when spec was created [default: none]

window_type¶: window fn used to make spectrogram, eg ‘hann’ [default: none]

audio_sample_rate¶: sample rate of audio from which spec was created [default: none]

amplitude(freq_range=None)¶

create an amplitude vs time signal from spectrogram

by summing pixels in the vertical dimension

Parameters:	freq_range – sum the spectrogram only over this [low, high] range of frequencies in Hz; if None, all frequencies are summed [default: None]

Returns:	a time-series array of the vertical sum of spectrogram values
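
The vertical sum can be sketched with plain NumPy arrays. This is an illustration rather than the library's implementation: the function name is hypothetical, the spectrogram is assumed to be shaped (n_frequencies, n_times), and the band edges are assumed to be inclusive.

```python
import numpy as np


def amplitude_sketch(spectrogram, frequencies, freq_range=None):
    """Sum spectrogram rows (frequency bins) to get one value per time window."""
    spectrogram = np.asarray(spectrogram, dtype=float)
    frequencies = np.asarray(frequencies, dtype=float)
    if freq_range is None:
        return spectrogram.sum(axis=0)
    low, high = freq_range
    # Assumption: both band edges are inclusive.
    in_band = (frequencies >= low) & (frequencies <= high)
    return spectrogram[in_band].sum(axis=0)


spec = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # 3 frequency bins, 2 windows
freqs = np.array([100.0, 200.0, 300.0])
all_bins = amplitude_sketch(spec, freqs)
upper_bins = amplitude_sketch(spec, freqs, freq_range=[150, 300])
```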

bandpass(min_f, max_f)¶

extract a frequency band from a spectrogram

crops the 2-d array of the spectrograms to the desired frequency range

Parameters:

min_f – low frequency in Hz for bandpass
max_f – high frequency in Hz for bandpass

Returns:

bandpassed spectrogram object

duration()¶

calculate the amount of time represented in the spectrogram

Note: time may be shorter than the duration of the audio from which the spectrogram was created, because the windows may align in a way such that some samples from the end of the original audio were discarded
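
The arithmetic behind this note can be sketched as follows, assuming windows are laid end to end with a fixed step and no padding (the default behavior of scipy.signal.spectrogram). The function name is hypothetical and the library may compute duration differently.

```python
def window_coverage(n_samples, window_samples=512, overlap_samples=256):
    """Return (number of complete windows, number of samples they cover)."""
    step = window_samples - overlap_samples
    if n_samples < window_samples:
        return 0, 0
    n_windows = (n_samples - window_samples) // step + 1
    # The first window covers window_samples; each later one adds `step` more.
    covered = window_samples + (n_windows - 1) * step
    return n_windows, covered


n_windows, covered = window_coverage(n_samples=1000)
discarded = 1000 - covered  # trailing samples that do not fill a whole window
```

With 1000 samples and the default window settings, only two complete windows fit, so the last 232 samples are not represented in the spectrogram.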

classmethod from_audio(audio, window_type='hann', window_samples=512, overlap_samples=256, decibel_limits=(-100, -20))¶

create a Spectrogram object from an Audio object

Parameters:

window_type – see the scipy.signal.spectrogram docs for a description of the window parameter [default: “hann”]
window_samples – number of audio samples per spectrogram window (pixel) [default: 512]
overlap_samples – number of samples shared by consecutive windows [default: 256]
decibel_limits – limit the dB values to (min, max); lower values are set to min, higher values are set to max [default: (-100, -20)]

Returns:

opensoundscape.spectrogram.Spectrogram object

classmethod from_file()¶

create a Spectrogram object from a file

Parameters:	file – path of image to load
Returns:	opensoundscape.spectrogram.Spectrogram object

limit_db_range(min_db=-100, max_db=-20)¶

Limit the decibel values of the spectrogram to range from min_db to max_db

Values less than min_db are set to min_db; values greater than max_db are set to max_db.

similar to Audacity’s gain and range parameters

Parameters:

min_db – values lower than this are set to this
max_db – values higher than this are set to this

Returns:

Spectrogram object with db range applied
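
On a raw array of decibel values, the limiting described here amounts to a clip operation. The sketch below uses a hypothetical function name and operates on a NumPy array, not on a Spectrogram object.

```python
import numpy as np


def limit_db_range_sketch(spectrogram, min_db=-100, max_db=-20):
    """Clamp every value into the closed interval [min_db, max_db]."""
    return np.clip(np.asarray(spectrogram, dtype=float), min_db, max_db)


limited = limit_db_range_sketch([-120.0, -60.0, 0.0])
```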

linear_scale(feature_range=(0, 1))¶

Linearly rescale spectrogram values to feature_range, using decibel_limits as the input range

Parameters:	feature_range – tuple of (low,high) values for output
Returns:	Spectrogram object with values rescaled to feature_range

min_max_scale(feature_range=(0, 1))¶

Linearly rescale spectrogram values to feature_range, using the spectrogram’s minimum and maximum values as the input range

Parameters:	feature_range – tuple of (low,high) values for output
Returns:	Spectrogram object with values rescaled to feature_range
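
linear_scale and min_max_scale differ only in which input range they map from. The sketch below shows both with a single hypothetical helper on a NumPy array. It assumes the default decibel_limits of (-100, -20) and says nothing about how the library handles values outside the input range.

```python
import numpy as np


def rescale(values, in_range, feature_range=(0, 1)):
    """Linearly map values from in_range onto feature_range."""
    values = np.asarray(values, dtype=float)
    in_low, in_high = in_range
    out_low, out_high = feature_range
    fraction = (values - in_low) / (in_high - in_low)
    return out_low + fraction * (out_high - out_low)


spec = np.array([[-80.0, -60.0], [-40.0, -30.0]])

# linear_scale style: the input range is the fixed decibel limits.
linear = rescale(spec, in_range=(-100, -20))

# min_max_scale style: the input range is the data's own minimum and maximum.
min_max = rescale(spec, in_range=(spec.min(), spec.max()))
```

With fixed limits the output depends only on absolute dB values, whereas the min/max variant always fills the whole feature_range regardless of how loud the recording is.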

net_amplitude(signal_band, reject_bands=None)¶

create amplitude signal in signal_band and subtract amplitude from reject_bands

rescale the signal and reject bands by dividing by their bandwidths in Hz (the amplitude of each reject band is divided by the total bandwidth of all reject_bands; the amplitude of signal_band is divided by the bandwidth of signal_band)

Parameters:

signal_band – [low, high] frequency range in Hz (positive contribution)
reject_bands – list of [low, high] frequency ranges in Hz (negative contribution)

Returns:

time-series array of net amplitude
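
The bandwidth normalization is easier to follow with numbers. The sketch below follows the description above on plain NumPy arrays. The function names are hypothetical, band edges are assumed inclusive, and the spectrogram is assumed to be shaped (n_frequencies, n_times).

```python
import numpy as np


def band_sum(spectrogram, frequencies, band):
    """Sum the rows whose frequency lies inside band = [low, high]."""
    low, high = band
    in_band = (frequencies >= low) & (frequencies <= high)
    return spectrogram[in_band].sum(axis=0)


def net_amplitude_sketch(spectrogram, frequencies, signal_band, reject_bands=None):
    """Signal-band amplitude minus reject-band amplitudes, each per Hz."""
    spectrogram = np.asarray(spectrogram, dtype=float)
    frequencies = np.asarray(frequencies, dtype=float)

    signal_bandwidth = signal_band[1] - signal_band[0]
    net = band_sum(spectrogram, frequencies, signal_band) / signal_bandwidth

    if reject_bands:
        # Every reject band is divided by the combined bandwidth of all of them.
        total_reject_bandwidth = sum(high - low for low, high in reject_bands)
        for band in reject_bands:
            net = net - band_sum(spectrogram, frequencies, band) / total_reject_bandwidth
    return net


freqs = np.array([100.0, 200.0, 300.0, 400.0])
spec = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]])
net = net_amplitude_sketch(spec, freqs, signal_band=[100, 200], reject_bands=[[300, 400]])
```

Here the signal band contributes 3 / 100 and the reject band subtracts 7 / 100, giving a net amplitude of -0.04 in each time window.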

plot(inline=True, fname=None, show_colorbar=False)¶

Plot the spectrogram with matplotlib.pyplot

Parameters:

inline=True –
fname=None – specify a string path to save the plot to (ending in .png/.pdf)
show_colorbar – include image legend colorbar from pyplot

to_image(shape=None, mode='RGB')¶

Create a Pillow Image from spectrogram

Linearly rescales values in the spectrogram from self.decibel_limits to [255,0]

The default self.decibel_limits on load is [-100, -20], so -20 dB (the loudest value) maps to black and -100 dB (the quietest value) maps to white.

Parameters:

destination – a file path (string)
shape=None – tuple of image dimensions, eg (224,224)
mode="RGB" – RGB for 3-channel color or “L” for 1-channel grayscale

Returns:

Pillow Image object
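
The inverted mapping from decibel_limits to [255, 0] can be sketched on a NumPy array as follows. This shows only the pixel-value arithmetic, with a hypothetical function name. Clipping out-of-range values and rounding to the nearest integer are assumptions, and the resizing and RGB conversion steps are omitted.

```python
import numpy as np


def to_pixels(spectrogram, decibel_limits=(-100, -20)):
    """Map dB values to 8-bit pixels: loudest -> 0 (black), quietest -> 255 (white)."""
    low, high = decibel_limits
    clipped = np.clip(np.asarray(spectrogram, dtype=float), low, high)
    fraction = (clipped - low) / (high - low)  # 0 at the quietest, 1 at the loudest
    return np.round(255 * (1 - fraction)).astype(np.uint8)


pixels = to_pixels([[-100.0, -20.0], [-150.0, 0.0]])
```

An array like this could then be handed to PIL.Image.fromarray to produce a single-channel image.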

trim(start_time, end_time)¶

extract a time segment from a spectrogram

Parameters:

start_time – in seconds
end_time – in seconds

Returns:

spectrogram object from extracted time segment

window_length()¶: calculate length of a single fft window, in seconds

window_start_times()¶: get start times of each window, rather than midpoint times

window_step()¶: calculate time difference (sec) between consecutive windows’ centers
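
The relationship between these three quantities can be sketched from the stored attributes. The helper below is hypothetical and assumes that times holds window centers and that window_samples, overlap_samples, and audio_sample_rate are all known. The library may derive these values differently, for example from the spacing of times.

```python
def window_timing(center_times, window_samples, overlap_samples, audio_sample_rate):
    """Return (window length in s, step between windows in s, window start times)."""
    window_length = window_samples / audio_sample_rate
    window_step = (window_samples - overlap_samples) / audio_sample_rate
    # A window starts half a window length before its center.
    start_times = [t - window_length / 2 for t in center_times]
    return window_length, window_step, start_times


length, step, starts = window_timing(
    center_times=[0.008, 0.016, 0.024],
    window_samples=512,
    overlap_samples=256,
    audio_sample_rate=32000,
)
```

With 512-sample windows at 32 kHz, each window lasts 16 ms, and a 256-sample overlap places consecutive centers 8 ms apart.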