Audio

audio.py: Utilities for loading and modifying Audio objects

Note: Out-of-place operations

Functions that modify Audio (and Spectrogram) objects are “out of place”: they return a new object instead of modifying the original. This means that running the line ` audio_object.resample(22050) # WRONG! ` will not change the sample rate of audio_object! If your goal is to overwrite audio_object with the new, resampled audio, instead write ` audio_object = audio_object.resample(22050) `

class opensoundscape.audio.Audio(samples, sample_rate, resample_type='soxr_hq', metadata=None)[source]

Container for audio samples

Initialization requires a sample array. To load an audio file, use Audio.from_file()

Initializing an Audio object directly requires specifying the sample rate. Use Audio.from_file or Audio.from_bytesio with sample_rate=None to use the file’s native sampling rate.

Parameters:
  • samples (np.array) – The audio samples

  • sample_rate (integer) – The sampling rate for the audio samples

  • resample_type (str) – The resampling method to use [default: “soxr_hq”]

Returns:

An initialized Audio object
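As a minimal sketch (the file path is a placeholder), constructing an Audio object directly and loading one from a file might look like this:

```
import numpy as np
from opensoundscape.audio import Audio

# direct construction requires a sample array and an explicit sample rate
samples = np.zeros(22050, dtype=np.float32)  # one second of silence
audio = Audio(samples, sample_rate=22050)

# loading from a file; sample_rate=None keeps the file's native sampling rate
# ("recording.wav" is a placeholder path)
audio_from_file = Audio.from_file("recording.wav", sample_rate=None)
```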

apply_gain(dB, clip_range=(-1, 1))[source]

apply dB (decibels) of gain to audio signal

Specifically, multiplies samples by 10^(dB/20)

Parameters:
  • dB – decibels of gain to apply

  • clip_range – [minimum,maximum] values for samples - values outside this range will be replaced with the range boundary values. Pass None to preserve original sample values without clipping. [Default: [-1,1]]

Returns:

Audio object with gain applied to samples
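For example, applying -6 dB multiplies the samples by 10^(-6/20) ≈ 0.5. A minimal sketch, using generated noise so it runs without an audio file:

```
from opensoundscape.audio import Audio

audio = Audio.noise(duration=1, sample_rate=22050, dBFS=-10)
quieter = audio.apply_gain(dB=-6)  # returns a new Audio object (out-of-place)
```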

bandpass(low_f, high_f, order)[source]

Bandpass audio signal with a butterworth filter

Uses a phase-preserving algorithm (scipy.signal’s butter and sosfiltfilt)

Parameters:
  • low_f – low frequency cutoff (-3 dB) in Hz of bandpass filter

  • high_f – high frequency cutoff (-3 dB) in Hz of bandpass filter

  • order – butterworth filter order (integer) ~= steepness of cutoff
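A minimal sketch of bandpassing, using generated noise so it runs without an audio file; the cutoff frequencies and order are illustrative:

```
from opensoundscape.audio import Audio

audio = Audio.noise(duration=1, sample_rate=32000)
filtered = audio.bandpass(low_f=1000, high_f=8000, order=9)
```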

property dBFS

calculate the root-mean-square dB value relative to a full-scale sine wave

property duration

Calculates the Audio duration in seconds

extend_by(duration)[source]

Extend audio file by adding duration seconds of silence to the end

Parameters:

duration – the duration in seconds of silence to add to the end of the audio

Returns:

a new Audio object with silence added to the end

extend_to(duration)[source]

Extend audio file to desired duration by adding silence to the end

If duration is less than the Audio’s .duration, the Audio object is trimmed. Otherwise, silence is added to the end of the Audio object to achieve the desired duration.

Parameters:

duration – the final duration in seconds of the audio object

Returns:

a new Audio object of the desired duration

classmethod from_bytesio(bytesio, sample_rate=None, resample_type='soxr_hq')[source]

Read from bytesio object

Read an Audio object from a BytesIO object. This is primarily used for passing Audio over HTTP.

Parameters:
  • bytesio – Contents of WAV file as BytesIO

  • sample_rate – The final sampling rate of Audio object [default: None]

  • resample_type – The librosa method to do resampling [default: “soxr_hq”]

Returns:

An initialized Audio object

classmethod from_file(path, sample_rate=None, resample_type='soxr_hq', dtype=numpy.float32, load_metadata=True, offset=None, duration=None, start_timestamp=None, out_of_bounds_mode='warn')[source]

Load audio from files

Deals with the various possible input types to load an audio file. Also attempts to load metadata using tinytag.

Audio objects only support mono (one-channel) at this time. Files with multiple channels are mixed down to a single channel. To load multiple channels as separate Audio objects, use load_channels_as_audio()

Optionally, load only a piece of a file using offset and duration. This will efficiently read sections of a .wav file regardless of where the desired clip is in the audio. For mp3 files, access time grows linearly with time since the beginning of the file.

This function relies on librosa.load(), which supports wav natively but requires ffmpeg for mp3 support.

Parameters:
  • path (str, Path) – path to an audio file

  • sample_rate (int, None) – resample audio with value and resample_type, if None use source sample_rate (default: None)

  • resample_type – method used to resample_type (default: “soxr_hq”)

  • dtype – data type of samples returned [Default: np.float32]

  • load_metadata (bool) – if True, attempts to load metadata from the audio file. If an exception occurs, self.metadata will be None. Otherwise self.metadata is a dictionary. Note: will also attempt to parse AudioMoth metadata from the comment field, if the artist field includes AudioMoth. The parsing function for AudioMoth is likely to break when new firmware versions change the comment metadata field.

  • offset – load audio starting at this time (seconds) after the start of the file. Defaults to 0 seconds. - cannot specify both offset and start_timestamp

  • duration – load audio of this duration (seconds) starting at offset. If None, loads all the way to the end of the file.

  • start_timestamp – load audio starting at this localized datetime.datetime timestamp - cannot specify both offset and start_timestamp - will only work if loading metadata results in a localized datetime object for the ‘recording_start_time’ key - will raise AudioOutOfBoundsError if the requested time period is not fully contained within the audio file. Example of creating a localized timestamp: ` import pytz; from datetime import datetime; local_timestamp = datetime(2020,12,25,23,59,59); local_timezone = pytz.timezone('US/Eastern'); timestamp = local_timezone.localize(local_timestamp) `

  • out_of_bounds_mode

    • ‘warn’: generate a warning [default]

    • ’raise’: raise an AudioOutOfBoundsError

    • ’ignore’: return any available audio with no warning/error

Returns:

Audio object with attributes samples, sample_rate, resample_type, and metadata (dict or None)

Note: default sample_rate=None means use file’s sample rate, does not resample
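As a hedged sketch of loading part of a file (“field_recording.wav” is a placeholder path), the offset and duration arguments described above might be used like this:

```
from opensoundscape.audio import Audio

clip = Audio.from_file(
    "field_recording.wav",
    sample_rate=22050,  # resample to 22050 Hz; None would keep the native rate
    offset=60,          # start 60 seconds into the file
    duration=10,        # load 10 seconds of audio
)
print(clip.duration, clip.sample_rate)
```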

classmethod from_url(url, sample_rate=None, resample_type='kaiser_fast')[source]

Read audio file from URL

Download audio from a URL and create an Audio object

Note: averages channels of multi-channel object to create mono object

Parameters:
  • url – Location to download the file from

  • sample_rate – The final sampling rate of Audio object [default: None] - if None, retains original sample rate

  • resample_type – The librosa method to do resampling [default: “kaiser_fast”]

Returns:

Audio object

highpass(cutoff_f, order)[source]

High-pass audio signal with a butterworth filter

Uses a phase-preserving algorithm (scipy.signal’s butter and sosfiltfilt)

Removes low frequencies below cutoff_f and preserves high frequencies

Parameters:
  • cutoff_f – cutoff frequency (-3 dB) in Hz of high-pass filter

  • order – butterworth filter order (integer) ~= steepness of cutoff

loop(length=None, n=None)[source]

Extend audio file by looping it

Parameters:
  • length – the final length in seconds of the looped file (cannot be used with n)[default: None]

  • n – the number of occurrences of the original audio sample (cannot be used with length) [default: None] For example, n=1 returns the original sample, and n=2 returns two concatenated copies of the original sample

Returns:

a new Audio object of the desired length or repetitions

lowpass(cutoff_f, order)[source]

Low-pass audio signal with a butterworth filter

Uses a phase-preserving algorithm (scipy.signal’s butter and sosfiltfilt)

Removes high frequencies above cutoff_f and preserves low frequencies

Parameters:
  • cutoff_f – cutoff frequency (-3 dB) in Hz of lowpass filter

  • order – butterworth filter order (integer) ~= steepness of cutoff

classmethod noise(duration, sample_rate, color='white', dBFS=-10)[source]

Create an Audio object with noise of a desired ‘color’

set np.random.seed() for reproducible results

Based on an implementation by @Bob in StackOverflow question 67085963

Parameters:
  • duration – length in seconds

  • sample_rate – samples per second

  • color – any of these colors, which describe the shape of the power spectral density: - white: uniform psd (equal energy per linear frequency band) - pink: psd = 1/sqrt(f) (equal energy per octave) - brownian: psd = 1/f (aka brown noise) - brown: synonym for brownian - violet: psd = f - blue: psd = sqrt(f) [default: ‘white’]

Returns: Audio object

Note: Clips samples to [-1,1] which can result in dBFS different from that requested, especially when dBFS is near zero
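A minimal sketch of generating reproducible pink noise; the seed, duration, and level are illustrative:

```
import numpy as np
from opensoundscape.audio import Audio

np.random.seed(0)  # as noted above, set the seed for reproducible results
pink = Audio.noise(duration=2, sample_rate=22050, color="pink", dBFS=-20)
```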

normalize(peak_level=None, peak_dBFS=None)[source]

Return audio object with normalized waveform

Linearly scales waveform values so that the max absolute value matches the specified value (default: 1.0)

Parameters:
  • peak_level – maximum absolute value of resulting waveform

  • peak_dBFS – maximum resulting absolute value in decibels Full Scale - for example, -3 dBFS equals a peak level of 0.71 - Note: do not specify both peak_level and peak_dBFS

Returns:

Audio object with normalized samples

Note: if all samples are zero, returns the original object (avoids division by zero)

resample(sample_rate, resample_type=None)[source]

Resample Audio object

Parameters:
  • sample_rate (scalar) – the new sample rate

  • resample_type (str) – resampling algorithm to use [default: None (uses self.resample_type of instance)]

Returns:

a new Audio object of the desired sample rate

property rms

Calculates the root-mean-square value of the audio samples

save(path, metadata_format='opso', soundfile_subtype=None, soundfile_format=None, suppress_warnings=False)[source]

Save Audio to file

supports all file formats supported by underlying package soundfile, including WAV, MP3, and others

NOTE: saving metadata is only supported for WAV and AIFF formats

Supports writing the following metadata fields: [“title”, “copyright”, “software”, “artist”, “comment”, “date”, “album”, “license”, “tracknumber”, “genre”]

Parameters:
  • path – destination for output

  • metadata_format – strategy for saving metadata. Can be: - ‘opso’ [default]: saves the metadata dictionary in the comment field as a JSON string, using the most recent version of the opso_metadata format - ‘opso_metadata_v0.1’: specify the exact version of opso_metadata to use - ‘soundfile’: saves only the default soundfile metadata fields: [“title”, “copyright”, “software”, “artist”, “comment”, “date”, “album”, “license”, “tracknumber”, “genre”] - None: does not save metadata to file

  • soundfile_subtype – soundfile audio subtype choice, see soundfile.write or list options with soundfile.available_subtypes()

  • soundfile_format – soundfile audio format choice, see soundfile.write

  • suppress_warnings – if True, will not warn user when unable to save metadata [default: False]
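A minimal sketch of saving to a WAV file (“output.wav” is a placeholder path; a generated Audio object has no metadata, so a warning about metadata may appear unless suppress_warnings=True is passed):

```
from opensoundscape.audio import Audio

audio = Audio.silence(duration=1, sample_rate=22050)
# default metadata_format='opso' stores metadata as JSON in the comment field
audio.save("output.wav")
```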

show_widget(normalize=False, autoplay=False)[source]

create and display IPython.display.Audio widget; see that class for docs

classmethod silence(duration, sample_rate)[source]

Create an Audio object with zero-valued samples

Parameters:
  • duration – length in seconds

  • sample_rate – samples per second

Note: rounds down to integer number of samples

spectrum()[source]

Create frequency spectrum from an Audio object using fft

Parameters:

self

Returns:

fft, frequencies

split(clip_duration, clip_overlap=0, final_clip=None)[source]

Split Audio into even-lengthed clips

The Audio object is split into clips of a specified duration and overlap

Parameters:
  • clip_duration (float) – The duration in seconds of the clips

  • clip_overlap (float) – The overlap of the clips in seconds [default: 0]

  • final_clip (str) – Behavior if the final clip is less than clip_duration seconds long. By default, discards remaining audio if it is less than clip_duration seconds long [default: None]. Options: - None: Discard the remainder (do not make a clip) - “extend”: Extend the final clip with silence to reach clip_duration length - “remainder”: Use only the remainder of the Audio (the final clip will be shorter than clip_duration) - “full”: Increase overlap with the previous clip to yield a clip with clip_duration length

Returns:

audio_clips – a list of Audio objects, along with a dataframe with columns for the start_time and end_time of each clip

split_and_save(destination, prefix, clip_duration, clip_overlap=0, final_clip=None, dry_run=False)[source]

Split audio into clips and save them to a folder

Parameters:
  • destination – A folder to write clips to

  • prefix – A name to prepend to the written clips

  • clip_duration – The duration of each clip in seconds

  • clip_overlap – The overlap of each clip in seconds [default: 0]

  • final_clip (str) – Behavior if the final clip is less than clip_duration seconds long [default: None]. By default (and for any other input), the final clip is ignored entirely. Options: - “remainder”: Include the remainder of the Audio (the clip will not have clip_duration length) - “full”: Increase the overlap to yield a clip with clip_duration length - “extend”: Similar to remainder, but extend (repeat) the clip to reach clip_duration length - None: Discard the remainder

  • dry_run (bool) – If True, skip writing audio and just return clip DataFrame [default: False]

Returns:

pandas.DataFrame containing paths and start and end times for each clip
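A hedged sketch of previewing the clip layout with dry_run=True (the destination folder and prefix are placeholders); no audio is written, only the clip DataFrame is returned:

```
from opensoundscape.audio import Audio

audio = Audio.silence(duration=12, sample_rate=22050)
clip_df = audio.split_and_save(
    destination="clips/",  # placeholder output folder
    prefix="rec",          # placeholder file name prefix
    clip_duration=5,
    clip_overlap=1,
    dry_run=True,          # skip writing files; just return the DataFrame
)
```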

trim(start_time, end_time)[source]

Trim Audio object in time

If start_time is less than zero, the output starts from time 0. If end_time is beyond the end of the sample, trims to the end of the sample.

Parameters:
  • start_time – time in seconds for start of extracted clip

  • end_time – time in seconds for end of extracted clip

Returns:

a new Audio object containing samples from start_time to end_time - metadata is updated to reflect new start time and duration

see also: trim_samples() to trim using sample positions instead of times

trim_samples(start_sample, end_sample)[source]

Trim Audio object by sample indices

resulting sample array contains self.samples[start_sample:end_sample]

If start_sample is less than zero, the output starts from sample 0. If end_sample is beyond the end of the sample array, trims to the end of the array.

Parameters:
  • start_sample – sample index for start of extracted clip, inclusive

  • end_sample – sample index for end of extracted clip, exclusive

Returns:

a new Audio object containing samples from start_sample to end_sample - metadata is updated to reflect new start time and duration

see also: trim() to trim using time in seconds instead of sample positions

exception opensoundscape.audio.AudioOutOfBoundsError[source]

Custom exception indicating the user tried to load audio outside of the time period that exists in the audio object

exception opensoundscape.audio.OpsoLoadAudioInputError[source]

Custom exception indicating we can’t load input

opensoundscape.audio.bandpass_filter(signal, low_f, high_f, sample_rate, order=9)[source]

perform a butterworth bandpass filter on a discrete time signal using scipy.signal’s butter and sosfiltfilt (phase-preserving filtering)

Parameters:
  • signal – discrete time signal (audio samples, list of float)

  • low_f – low-frequency cutoff (-3 dB point) of the bandpass filter (Hz)

  • high_f – high-frequency cutoff (-3 dB point) of the bandpass filter (Hz)

  • sample_rate – samples per second (Hz)

  • order – higher values -> steeper dropoff [default: 9]

Returns:

filtered time signal

opensoundscape.audio.clipping_detector(samples, threshold=0.6)[source]

count the number of samples above a threshold value

Parameters:
  • samples – a time series of float values

  • threshold=0.6 – minimum value of sample to count as clipping

Returns:

number of samples exceeding threshold

opensoundscape.audio.concat(audio_objects, sample_rate=None)[source]

concatenate a list of Audio objects end-to-end

Parameters:
  • audio_objects – iterable of Audio objects

  • sample_rate – target sampling rate - if None, uses sampling rate of _first_ Audio object in list - default: None

Returns: a single Audio object

Notes: discards metadata and retains .resample_type of _first_ audio object

opensoundscape.audio.estimate_delay(primary_audio, reference_audio, max_delay, bandpass_range=None, bandpass_order=9, cc_filter='phat', return_cc_max=False, skip_ref_bandpass=False)[source]

Use generalized cross correlation to estimate the time delay between two audio objects containing the same signal. The audio objects must be time-synchronized. For example, if primary_audio is delayed by 1 second relative to reference_audio, then estimate_delay(primary_audio, reference_audio, max_delay) will return 1.0.

NOTE: Only the central portion of the signal (between start + max_delay and end - max_delay) is used for cross-correlation, to avoid edge effects. This means estimate_delay(primary_audio, reference_audio, max_delay) is not necessarily equal to estimate_delay(reference_audio, primary_audio, max_delay).

Parameters:
  • primary_audio – audio object containing the signal of interest

  • reference_audio – audio object containing the reference signal.

  • max_delay – maximum time delay to consider, in seconds. Must be less than the duration of the primary audio. (see opensoundscape.signal_processing.tdoa)

  • bandpass_range – if None, no bandpass filter is performed otherwise [low_f,high_f]

  • bandpass_order – order of Butterworth bandpass filter

  • cc_filter – generalized cross correlation type, see opensoundscape.signal_processing.gcc() [default: ‘phat’]

  • return_cc_max – if True, returns cross correlation max value as second argument (see opensoundscape.signal_processing.tdoa)

  • skip_ref_bandpass – [default: False] if True, skip the bandpass operation for the reference_audio object and only apply it to primary_audio

Returns:

estimated time delay (seconds) from reference_audio to primary_audio

if return_cc_max is True, also returns a second value, the max of the cross correlation of the two signals

Note: resamples reference_audio if its sample rate does not match primary_audio
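A hedged sketch of estimating delay between two time-synchronized recordings (both file paths and the parameter values are placeholders):

```
from opensoundscape.audio import Audio, estimate_delay

primary = Audio.from_file("array_mic_2.wav")    # placeholder path
reference = Audio.from_file("array_mic_1.wav")  # placeholder path
delay_s = estimate_delay(
    primary,
    reference,
    max_delay=0.5,                # consider delays up to 0.5 seconds
    bandpass_range=[1000, 5000],  # optional bandpass before cross-correlation
)
```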

opensoundscape.audio.generate_opso_metadata_str(metadata_dictionary, version='v0.1')[source]

generate json string for comment field containing metadata

Preserve Audio.metadata dictionary by dumping to a json string and including it as the ‘comment’ field when saving WAV files.

The string begins with the prefix opso_metadata. The contents of the string after this 13-character prefix should be parsable as JSON, and should have a key opso_metadata_version specifying the version of the metadata format, for instance ‘v0.1’.

See also: parse_opso_metadata, which parses the string created by this function

Parameters:
  • metadata_dictionary – dictionary of audio metadata. Should conform to opso_metadata version. v0.1 should have only strings and floats except the “recording_start_time” key, which should contain a localized (ie has timezone) datetime.datetime object. The datetime is saved as a string in ISO format using datetime.isoformat() and loaded with datetime.fromisoformat().

  • version – version number of opso_metadata format. Currently implemented: [‘v0.1’]

Returns:

string beginning with opso_metadata followed by JSON-parseable string containing the metadata.

opensoundscape.audio.highpass_filter(signal, cutoff_f, sample_rate, order=9)[source]

perform a butterworth highpass filter on a discrete time signal using scipy.signal’s butter and sosfiltfilt (phase-preserving filtering)

Parameters:
  • signal – discrete time signal (audio samples, list of float)

  • cutoff_f – -3db point for highpass filter (Hz)

  • sample_rate – samples per second (Hz)

  • order – higher values -> steeper dropoff [default: 9]

Returns:

filtered time signal

opensoundscape.audio.load_channels_as_audio(path, sample_rate=None, resample_type='soxr_hq', dtype=numpy.float32, offset=0, duration=None, metadata=True)[source]

Load each channel of an audio file to a separate Audio object

Provides a way to access individual channels, since Audio.from_file mixes down to mono by default

Parameters:

see Audio.from_file()

Returns:

list of Audio objects (one per channel)

Note: metadata is copied to each Audio object, but will contain an additional field, for example “channel”=”1 of 3” for the first of 3 channels

opensoundscape.audio.lowpass_filter(signal, cutoff_f, sample_rate, order=9)[source]

perform a butterworth lowpass filter on a discrete time signal using scipy.signal’s butter and sosfiltfilt (phase-preserving filtering)

Parameters:
  • signal – discrete time signal (audio samples, list of float)

  • cutoff_f – cutoff frequency (-3 dB point) of the lowpass filter (Hz)

  • sample_rate – samples per second (Hz)

  • order – higher values -> steeper dropoff [default: 9]

Returns:

filtered time signal

opensoundscape.audio.mix(audio_objects, duration=None, gain=-3, offsets=None, sample_rate=None, clip_range=(-1, 1))[source]

mixdown (superimpose) Audio signals into a single Audio object

Adds audio samples from multiple audio objects to create a mixdown of Audio samples. Resamples all audio to a consistent sample rate, and optionally applies individual gain and time-offsets to each Audio.

Parameters:
  • audio_objects – iterable of Audio objects

  • duration – duration in seconds of the returned Audio [default: None]. Can be: - a number: shorter Audio is extended with silence and longer Audio is truncated - None: all Audio is extended to the largest value of (Audio.duration + offset)

  • gain – number, list of numbers, or None - a number: decibels of gain to apply to all objects - a list of numbers: dB of gain to apply to each object (length must match the length of audio_objects) [default: -3 dB on each object]

  • offsets – list of time-offsets (seconds) for each Audio object For instance [0,1] starts the first Audio at 0 seconds and shifts the second Audio to start at 1.0 seconds - if None, all objects start at time 0 - otherwise, length must match length of audio_objects.

  • sample_rate – sample rate of returned Audio object - integer: resamples all audio to this sample rate - None: uses sample rate of _first_ Audio object [default: None]

  • clip_range – minimum and maximum sample values. Samples outside this range will be replaced by the range limit values Pass None to keep sample values without clipping. [default: (-1,1)]

Returns:

Audio object

Notes

Audio metadata is discarded. .resample_type of first Audio is retained. Resampling of each Audio uses respective .resample_type of objects.
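A minimal sketch of mixing two generated signals, applying per-object gain and offsetting the second by one second:

```
from opensoundscape.audio import Audio, mix

a = Audio.noise(duration=3, sample_rate=22050, color="white")
b = Audio.noise(duration=2, sample_rate=22050, color="pink")
mixed = mix([a, b], gain=[-3, -6], offsets=[0, 1])  # b starts at 1.0 s
```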

opensoundscape.audio.parse_opso_metadata(comment_string)[source]

parse metadata saved by opensoundscape as JSON in the comment field

Parses a JSON string which opensoundscape saves to the comment metadata field of WAV files to preserve metadata. The string begins with the prefix opso_metadata. The contents of the string after this 13-character prefix should be parsable as JSON, and should have a key opso_metadata_version specifying the version of the metadata format, for instance ‘v0.1’.

see also generate_opso_metadata which generates the string parsed by this function.

Parameters:

comment_string – a string beginning with opso_metadata followed by JSON parseable dictionary

Returns: dictionary of parsed metadata

Spectrogram

spectrogram.py: Utilities for dealing with spectrograms

class opensoundscape.spectrogram.MelSpectrogram(spectrogram, frequencies, times, window_samples=None, overlap_samples=None, window_type=None, audio_sample_rate=None, scaling=None)[source]

Immutable mel-spectrogram container

A mel spectrogram is a spectrogram with pseudo-logarithmically spaced frequency bins (see literature) rather than linearly spaced bins.

See the Spectrogram class and Librosa’s melspectrogram for detailed documentation.

NOTE: Here we rely on scipy’s spectrogram function (via Spectrogram) rather than on librosa’s _spectrogram or melspectrogram, because the amplitudes of librosa’s spectrograms do not match expectations. We only use the mel frequency bank from Librosa.

classmethod from_audio(audio, window_type='hann', window_samples=None, window_length_sec=None, overlap_samples=None, overlap_fraction=None, fft_size=None, dB_scale=True, scaling='spectrum', n_mels=64, norm='slaney', htk=False)[source]

Create a MelSpectrogram object from an Audio object

First creates a spectrogram and a mel-frequency filter bank, then computes the dot product of the filter bank with the spectrogram.

A mel spectrogram is a spectrogram with a quasi-logarithmic frequency axis, often used in speech and language processing and other domains.

The kwargs for the mel frequency bank are documented at: - https://librosa.org/doc/latest/generated/librosa.feature.melspectrogram.html#librosa.feature.melspectrogram - https://librosa.org/doc/latest/generated/librosa.filters.mel.html?librosa.filters.mel

Parameters:
  • audio – Audio object

  • window_type="hann" – see scipy.signal.spectrogram docs for description

  • window_samples – number of audio samples per spectrogram window (pixel) - Defaults to 512 if window_samples and window_length_sec are None - Note: cannot specify both window_samples and window_length_sec

  • window_length_sec – length of a single window in seconds - Note: cannot specify both window_samples and window_length_sec - Warning: specifying this parameter often results in less efficient spectrogram computation because window_samples will not be a power of 2

  • overlap_samples – number of samples shared by consecutive windows - Note: must not specify both overlap_samples and overlap_fraction

  • overlap_fraction – fractional temporal overlap between consecutive windows - Defaults to 0.5 if overlap_samples and overlap_fraction are None - Note: cannot specify both overlap_samples and overlap_fraction

  • fft_size – see scipy.signal.spectrogram’s nfft parameter

  • dB_scale – If True, rescales values to decibels, x=10*log10(x)

  • scaling="spectrum" – (“spectrum” or “denisty”) see scipy.signal.spectrogram docs

  • n_mels – Number of mel bands to generate [default: 64] Note: n_mels should be chosen for compatibility with the Spectrogram parameter window_samples. Choosing a value > ~ window_samples/10 will result in zero-valued rows while small values blend rows from the original spectrogram.

  • norm='slaney' – mel filter bank normalization, see Librosa docs

  • htk – use HTK mel-filter bank instead of Slaney, see Librosa docs [default: False]

Returns:

opensoundscape.spectrogram.MelSpectrogram object

plot(inline=True, fname=None, show_colorbar=False, range=(-100, -20))[source]

Plot the mel spectrogram with matplotlib.pyplot

We can’t use pcolormesh because it will smash pixels to achieve a linear y-axis, rather than preserving the mel scale.

Parameters:
  • inline=True

  • fname=None – specify a string path to save the plot to (ending in .png/.pdf)

  • show_colorbar – include image legend colorbar from pyplot

  • range – (min,max) values of .spectrogram to map to the lowest/highest pixel values Values outside this range will be clipped to the min/max values

class opensoundscape.spectrogram.Spectrogram(spectrogram, frequencies, times, window_samples=None, overlap_samples=None, window_type=None, audio_sample_rate=None, scaling=None)[source]

Immutable spectrogram container

Can be initialized directly from spectrogram, frequency, and time values or created from an Audio object using the .from_audio() method.

frequencies

(list) discrete frequency bins generated by fft

times

(list) time from beginning of file to the center of each window

spectrogram

a 2d array containing fft values for each time window

window_samples

number of samples per window when spec was created [default: none]

overlap_samples

number of samples overlapped in consecutive windows when spec was created [default: none]

window_type

window fn used to make spectrogram, eg ‘hann’ [default: none]

audio_sample_rate

sample rate of audio from which spec was created [default: none]

scaling

Selects between computing the power spectral density (‘density’), where Sxx has units of V**2/Hz, and computing the power spectrum (‘spectrum’), where Sxx has units of V**2 if the signal is measured in V and fs in Hz. [default: ‘spectrum’]

amplitude(freq_range=None)[source]

create an amplitude vs time signal from spectrogram

by summing pixels in the vertical dimension

Parameters:

freq_range – sum the Spectrogram only in this range of [low, high] frequencies in Hz (if None, all frequencies are summed) [default: None]

Returns:

a time-series array of the vertical sum of spectrogram values

bandpass(min_f, max_f, out_of_bounds_ok=True)[source]

extract a frequency band from a spectrogram

crops the 2-d array of the spectrogram to the desired frequency range by removing rows.

The lowest and highest rows kept are those with frequencies closest to min_f and max_f

Parameters:
  • min_f – low frequency in Hz for bandpass

  • max_f – high frequency in Hz for bandpass

  • out_of_bounds_ok – (bool) if False, raises ValueError if min_f or max_f are not within the range of the original spectrogram’s frequencies [default: True]

Returns:

bandpassed spectrogram object

property duration

calculate the amount of time represented in the spectrogram

Note: time may be shorter than the duration of the audio from which the spectrogram was created, because the windows may align in a way such that some samples from the end of the original audio were discarded

classmethod from_audio(audio, window_type='hann', window_samples=None, window_length_sec=None, overlap_samples=None, overlap_fraction=None, fft_size=None, dB_scale=True, scaling='spectrum')[source]

create a Spectrogram object from an Audio object

Parameters:
  • audio – Audio object

  • window_type="hann" – see scipy.signal.spectrogram docs

  • window_samples – number of audio samples per spectrogram window (pixel) - Defaults to 512 if window_samples and window_length_sec are None - Note: cannot specify both window_samples and window_length_sec

  • window_length_sec – length of a single window in seconds - Note: cannot specify both window_samples and window_length_sec - Warning: specifying this parameter often results in less efficient spectrogram computation because window_samples will not be a power of 2

  • overlap_samples – number of samples shared by consecutive windows - Note: must not specify both overlap_samples and overlap_fraction

  • overlap_fraction – fractional temporal overlap between consecutive windows - Defaults to 0.5 if overlap_samples and overlap_fraction are None - Note: cannot specify both overlap_samples and overlap_fraction

  • fft_size – see scipy.signal.spectrogram’s nfft parameter

  • dB_scale – If True, rescales values to decibels, x=10*log10(x)

  • scaling="spectrum" – (“spectrum” or “density”) see scipy.signal.spectrogram docs

Returns:

opensoundscape.spectrogram.Spectrogram object

limit_range(min=-100, max=-20)[source]

Limit (clip) the values of the spectrogram to range from min to max

Values of self.spectrogram less than min are set to min; values of self.spectrogram greater than max are set to max.

similar to Audacity’s gain and range parameters

Parameters:
  • min – values lower than this are set to this

  • max – values higher than this are set to this

Returns:

Spectrogram object with .spectrogram values clipped to (min,max)

linear_scale(feature_range=(0, 1), input_range=(-100, -20))[source]

Linearly rescale spectrogram values to a range of values

Parameters:
  • feature_range – tuple of (low,high) values for output

  • input_range – tuple of (min,max) range. values beyond this range will be clipped to (low,high) before mapping onto the feature_range

Returns:

Spectrogram object with values rescaled to feature_range

min_max_scale(feature_range=(0, 1))[source]

Linearly rescale spectrogram values to a range of values, using the spectrogram’s own minimum and maximum as the input range

Parameters:

feature_range – tuple of (low,high) values for output

Returns:

Spectrogram object with values rescaled to feature_range

net_amplitude(signal_band, reject_bands=None)[source]

create amplitude signal in signal_band and subtract amplitude from reject_bands

rescale the signal and reject bands by dividing by their bandwidths in Hz (the amplitude of each reject_band is divided by the total bandwidth of all reject_bands; the amplitude of signal_band is divided by the bandwidth of signal_band)

Parameters:
  • signal_band – [low,high] frequency range in Hz (positive contribution)

  • reject_bands – list of [low,high] frequency ranges in Hz (negative contribution)

Returns:

a time-series array of the net amplitude

plot(inline=True, fname=None, show_colorbar=False, range=(-100, -20))[source]

Plot the spectrogram with matplotlib.pyplot

Parameters:
  • inline=True

  • fname=None – specify a string path to save the plot to (ending in .png/.pdf)

  • show_colorbar – include image legend colorbar from pyplot

  • range – tuple of (min,max) values of .spectrogram to map to the lowest/highest pixel values. Values outside this range will be clipped to the min/max values

to_image(shape=None, channels=1, colormap=None, invert=False, return_type='pil', range=(-100, -20))[source]

Create an image from spectrogram (array, tensor, or PIL.Image)

Note: Linearly rescales values in the spectrogram from range (min,max) to [0,255] (PIL.Image) or [0,1] (array/tensor)

Default of range is [-100, -20], so, e.g., -20 db is loudest -> black, -100 db is quietest -> white

Parameters:
  • shape – tuple of output dimensions as (height, width) - if None, retains original shape of self.spectrogram - if first or second value are None, retains original shape in that dimension

  • channels – eg 3 for rgb, 1 for greyscale - must be 3 to use colormap

  • colormap – if None, greyscale spectrogram is generated Can be any matplotlib colormap name such as ‘jet’

  • return_type – type of returned object - ‘pil’: PIL.Image - ‘np’: numpy.ndarray - ‘torch’: torch.tensor

  • range – tuple of (min,max) values of .spectrogram to map to the lowest/highest pixel values. Values outside this range will be clipped to the min/max values

Returns:

  • PIL.Image with c channels and shape w,h given by shape, and values in [0,255]

  • np.ndarray with shape [c,h,w] and values in [0,1]

  • or torch.tensor with shape [c,h,w] and values in [0,1]

Return type:

Image/array with type depending on return_type
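A minimal sketch that ties from_audio() and to_image() together, using generated noise; the shape and return type are illustrative:

```
from opensoundscape.audio import Audio
from opensoundscape.spectrogram import Spectrogram

audio = Audio.noise(duration=2, sample_rate=22050)
spec = Spectrogram.from_audio(audio, window_samples=512, overlap_fraction=0.5)
img = spec.to_image(shape=(224, 224), return_type="torch")  # tensor with values in [0, 1]
```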

trim(start_time, end_time)[source]

extract a time segment from a spectrogram

first and last columns kept are those with times closest to start_time and end_time

Parameters:
  • start_time – in seconds

  • end_time – in seconds

Returns:

spectrogram object from extracted time segment

property window_length

calculate length of a single fft window, in seconds:

property window_start_times

get start times of each window, rather than midpoint times

property window_step

calculate time difference (sec) between consecutive windows’ centers

CNN

classes for pytorch machine learning models in opensoundscape

For tutorials, see notebooks on opensoundscape.org

class opensoundscape.ml.cnn.BaseClassifier(*args: Any, **kwargs: Any)[source]

Base class for a deep-learning classification model.

Implements .predict(), .eval() and .generate_samples() but not .train()

Sub-class this class for flexible behavior. This class is not meant to be used directly.

Child classes: CNN, TensorFlowHubModel

eval(targets, scores, logging_offset=0)[source]

compute single-target or multi-target metrics from targets and scores

By default, the overall model score is “map” (mean average precision) for multi-target models (self.single_target=False) and “f1” (average of f1 score across classes) for single-target models.

Override this function to use a different set of metrics. It should always return (1) a single score (float) used as an overall metric of model quality and (2) a dictionary of computed metrics

Parameters:
  • targets – 0/1 for each sample and each class

  • scores – continuous values in 0/1 for each sample and class

  • logging_offset – modify verbosity - for example, -1 will reduce the amount of printing/logging by 1 level

generate_samples(samples, invalid_samples_log=None, return_invalid_samples=False, **kwargs)[source]

Generate AudioSample objects. Input options same as .predict()

Parameters:
  • samples – (same as CNN.predict()) the files to generate predictions for. Can be: - a dataframe with index containing audio paths, OR - a dataframe with multi-index (file, start_time, end_time), OR - a list (or np.ndarray) of audio file paths

  • other arguments – see the .predict() documentation

  • **kwargs – any arguments to inference_dataloader_cls.__init__ (default class is SafeAudioDataloader)

Returns:

a list of AudioSample objects - if return_invalid_samples is True, returns second value: list of paths to samples that failed to preprocess

Example: ` from opensoundscape.preprocess.utils import show_tensor_grid; samples = model.generate_samples(['/path/file1.wav','/path/file2.wav']); tensors = [s.data for s in samples]; show_tensor_grid(tensors, columns=3) `

predict(samples, batch_size=1, num_workers=0, activation_layer=None, split_files_into_clips=True, overlap_fraction=0, final_clip=None, bypass_augmentations=True, invalid_samples_log=None, raise_errors=False, wandb_session=None, return_invalid_samples=False, progress_bar=True, **kwargs)[source]

Generate predictions on a set of samples

Return dataframe of model output scores for each sample. Optional activation layer for scores (softmax, sigmoid, softmax then logit, or None)

Parameters:
  • samples – the files to generate predictions for. Can be: - a dataframe with index containing audio paths, OR - a dataframe with multi-index (file, start_time, end_time), OR - a list (or np.ndarray) of audio file paths

  • batch_size – Number of files to load simultaneously [default: 1]

  • num_workers – parallelization (ie cpus or cores), use 0 for current process [default: 0]

  • activation_layer – Optionally apply an activation layer such as sigmoid or softmax to the raw outputs of the model. options: - None: no activation, return raw scores (ie logit, [-inf:inf]) - ‘softmax’: scores all classes sum to 1 - ‘sigmoid’: all scores in [0,1] but don’t sum to 1 - ‘softmax_and_logit’: applies softmax first then logit [default: None]

  • split_files_into_clips – If True, internally splits and predicts on clips from longer audio files Otherwise, assumes each row of samples corresponds to one complete sample

  • overlap_fraction – fraction of overlap between consecutive clips when predicting on clips of longer audio files. For instance, 0.5 gives 50% overlap between consecutive clips.

  • final_clip – see opensoundscape.utils.generate_clip_times_df

  • bypass_augmentations – If False, Actions with is_augmentation==True are performed. Default True.

  • invalid_samples_log – if not None, samples that failed to preprocess will be listed in this text file.

  • raise_errors – if True, raise errors when preprocessing fails; if False, just log the errors to invalid_samples_log

  • wandb_session – a wandb session to log to - pass the value returned by wandb.init() to log progress to a Weights and Biases run - if None, does not log to wandb

  • return_invalid_samples – bool, if True, returns second argument, a set containing file paths of samples that caused errors during preprocessing [default: False]

  • progress_bar – bool, if True, shows a progress bar with tqdm [default: True]

  • **kwargs – additional arguments to inference_dataloader_cls.__init__

Returns:

df of post-activation_layer scores - if return_invalid_samples is True, returns (df,invalid_samples) where invalid_samples is a set of file paths that failed to preprocess

Effects:

(1) wandb logging: If wandb_session is provided, logs progress and samples to Weights and Biases. A random set of samples is preprocessed and logged to a table. Progress over all batches is logged. After prediction, top-scoring samples are logged. Use the self.wandb_logging dictionary to change the number of samples logged or which classes have top-scoring samples logged.

(2) invalid sample logging: If invalid_samples_log is not None, saves a list of all file paths that failed to preprocess to invalid_samples_log as a text file

Note: if loading an audio file raises a PreprocessingError, the scores for that sample will be np.nan
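A hedged sketch of inference (file paths and class names are placeholders; the sample_duration argument, the clip length in seconds, is assumed from the broader CNN API rather than the parameter list shown here):

```
from opensoundscape import CNN

model = CNN(architecture="resnet18", classes=["species_a", "species_b"], sample_duration=5.0)
scores = model.predict(
    ["recording1.wav", "recording2.wav"],  # placeholder file paths
    batch_size=4,
    activation_layer="sigmoid",  # map raw logits to [0, 1]
    overlap_fraction=0.5,        # 50% overlap between consecutive clips
)
```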

class opensoundscape.ml.cnn.CNN(*args: Any, **kwargs: Any)[source]

Generic CNN Model with .train(), .predict(), and .save()

flexible architecture, optimizer, loss function, parameters

for tutorials and examples see opensoundscape.org

Parameters:
  • architecture – EITHER a pytorch model object (subclass of torch.nn.Module), for example one generated with the cnn_architectures module, OR a string matching one of the architectures listed by cnn_architectures.list_architectures(), eg ‘resnet18’. If a string is provided, uses default parameters (including pretrained weights, weights=”DEFAULT”). Note: if num channels != 3, copies weights from the original channels by averaging (<3 channels) or recycling (>3 channels)

  • classes – list of class names. Must match with training dataset classes if training.

  • single_target

    • True: model expects exactly one positive class per sample

    • False: samples can have any number of positive classes

    [default: False]

  • preprocessor_class – class of Preprocessor object

  • sample_shape – tuple of height, width, channels for created sample [default: (224,224,3)] #TODO: consider changing to (ch,h,w) to match torch conventions

classmethod from_torch_dict(path)[source]

load a model saved using CNN.save_torch_dict()

Parameters:

path – path to file saved using CNN.save_torch_dict()

Returns:

new CNN instance

Note: if you used .save() instead of .save_torch_dict(), load the model using cnn.load_model(). Note that the model object will not load properly across different versions of OpenSoundscape. To save and load models across different versions of OpenSoundscape, use .save_torch_dict(), but note that preprocessing and other customized settings will not be retained.

generate_cams(samples, method='gradcam', classes=None, target_layers=None, guided_backprop=False, progress_bar=True, **kwargs)[source]

Generate activation and/or backprop heatmaps for each sample

Parameters:
  • samples – (same as CNN.predict()) the files to generate predictions for. Can be: - a dataframe with index containing audio paths, OR - a dataframe with multi-index (file, start_time, end_time), OR - a list (or np.ndarray) of audio file paths

  • method – method to use for activation map. Can be a string (choose from below), a class of pytorch_grad_cam (any subclass of BaseCAM), or None. If None, activation maps are not created. [default: ‘gradcam’] The string can be any of the following: “gradcam”: pytorch_grad_cam.GradCAM, “hirescam”: pytorch_grad_cam.HiResCAM, “scorecam”: opensoundscape.ml.utils.ScoreCAM, “gradcam++”: pytorch_grad_cam.GradCAMPlusPlus, “ablationcam”: pytorch_grad_cam.AblationCAM, “xgradcam”: pytorch_grad_cam.XGradCAM, “eigencam”: pytorch_grad_cam.EigenCAM, “eigengradcam”: pytorch_grad_cam.EigenGradCAM, “layercam”: pytorch_grad_cam.LayerCAM, “fullgrad”: pytorch_grad_cam.FullGrad, “gradcamelementwise”: pytorch_grad_cam.GradCAMElementWise

  • classes (list) – list of classes, will create maps for each class [default: None] if None, creates an activation map for the highest scoring class on a sample-by-sample basis

  • target_layers (list) – list of target layers for GradCAM. If None [default], attempts to use the architecture’s default target_layer. Note: only architectures created with opensoundscape 0.9.0+ have a default target layer; see the pytorch_grad_cam docs for suggestions. Note: if multiple layers are provided, the activations are merged across layers (rather than returning separate activations per layer)

  • guided_backprop – bool [default: False] if True, performs guided backpropagation for each class in classes. AudioSamples will have attribute .gbp_maps, a pd.Series indexed by class name

  • **kwargs – passed to SafeAudioDataloader (incl: batch_size, num_workers, split_file_into_clips, bypass_augmentations, raise_errors, overlap_fraction, final_clip, other DataLoader args)

Returns:

a list of AudioSample objects with a .cam attribute, an instance of the CAM class (visualize with sample.cam.plot()). See the CAM class for more details.

See pytorch_grad_cam documentation for references to the source of each method.

load_weights(path, strict=True)[source]

load network weights state dict from a file

For instance, load weights saved with .save_weights(). This is an in-place operation.

Parameters:
  • path – file path with saved weights

  • strict – (bool) see torch.load()

save(path, save_train_loader=False, save_hooks=False)[source]

save model with weights using torch.save()

load from saved file with torch.load(path) or cnn.load_model(path)

Note: saving and loading model objects across OpenSoundscape versions will not work properly. Instead, use .save_torch_dict and .load_torch_dict (but note that customizations to preprocessing, training params, etc will not be retained using those functions).

For maximum flexibility in further use, save the model with both .save() and .save_torch_dict()

Parameters:
  • path – file path for saved model object

  • save_train_loader – retain .train_loader in saved object [default: False]

  • save_hooks – retain forward and backward hooks on modules [default: False] Note: True can cause issues when using wandb.watch()

save_torch_dict(path)[source]

save model to file for use in other opso versions

WARNING: this does not save any preprocessing or augmentation settings or parameters, or other attributes such as the training parameters or loss function. It only saves architecture, weights, classes, sample shape, sample duration, and single_target.

To save the entire pickled model object (recover all parameters and settings), use model.save() instead. Note that models saved with model.save() will not work across different versions of OpenSoundscape.

To recreate the model after saving with this function, use CNN.from_torch_dict(path)

Parameters:

path – file path for saved model object

Effects:

saves a file using torch.save() containing model weights and other information

save_weights(path)[source]

save just the weights of the network

This allows the saved weights to be used more flexibly than model.save() which will pickle the entire object. The weights are saved in a pickled dictionary using torch.save(self.network.state_dict())

Parameters:

path – location to save weights file

train(train_df, validation_df=None, epochs=1, batch_size=1, num_workers=0, save_path='.', save_interval=1, log_interval=10, validation_interval=1, invalid_samples_log='./invalid_training_samples.log', raise_errors=False, wandb_session=None, progress_bar=True)[source]

train the model on samples from train_dataset

If customized loss functions, networks, optimizers, or schedulers are desired, modify the respective attributes before calling .train().

Parameters:
  • train_df – a dataframe of files and labels for training the model - either has index file or multi-index (file,start_time,end_time)

  • validation_df – a dataframe of files and labels for evaluating the model [default: None means no validation is performed]

  • epochs – number of epochs to train for (1 epoch constitutes 1 view of each training sample)

  • batch_size – number of training files simultaneously passed through forward pass, loss function, and backpropagation

  • num_workers – number of parallel CPU tasks for preprocessing Note: use 0 for single (root) process (not 1)

  • save_path – location to save intermediate and best model objects [default=”.”, ie current location of script]

  • save_interval – interval in epochs to save model object with weights [default:1] Note: the best model is always saved to best.model in addition to other saved epochs.

  • log_interval – interval in batches to print training loss/metrics

  • validation_interval – interval in epochs to test the model on the validation set Note that the model will only update its best score and save the best.model file on epochs when it performs validation.

  • invalid_samples_log – file path: log all samples that failed in preprocessing (file written when training completes) - if None, does not write a file

  • raise_errors – if True, raise errors when preprocessing fails; if False, just log the errors to invalid_samples_log

  • wandb_session – a wandb session to log to - pass the value returned by wandb.init() to log progress to a Weights and Biases run - if None, does not log to wandb For example: ` import wandb wandb.login(key=api_key) # find your api_key at https://wandb.ai/settings session = wandb.init(entity='mygroup', project='project1', name='first_run') ... model.train(..., wandb_session=session) session.finish() `

  • progress_bar – bool, if True, shows a progress bar with tqdm [default: True]

Effects:

If wandb_session is provided, logs progress and samples to Weights and Biases. A random set of training and validation samples are preprocessed and logged to a table. Training progress, loss, and metrics are also logged. Use self.wandb_logging dictionary to change the number of samples logged.
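A hedged sketch of training (file paths are placeholders that would need to exist on disk; sample_duration is assumed from the broader CNN API). It also illustrates the expected label format: a dataframe with a (file, start_time, end_time) multi-index and one 0/1 column per class:

```
import pandas as pd
from opensoundscape import CNN

# placeholder label dataframe; real paths and labels would come from your dataset
train_df = pd.DataFrame(
    {"species_a": [1, 0], "species_b": [0, 1]},
    index=pd.MultiIndex.from_tuples(
        [("rec1.wav", 0.0, 5.0), ("rec2.wav", 0.0, 5.0)],
        names=["file", "start_time", "end_time"],
    ),
)
val_df = train_df.copy()  # placeholder validation set

model = CNN(architecture="resnet18", classes=train_df.columns.tolist(), sample_duration=5.0)
model.train(
    train_df,
    validation_df=val_df,
    epochs=10,
    batch_size=64,
    num_workers=4,
    save_path="./model_checkpoints",
)
```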

class opensoundscape.ml.cnn.InceptionV3(*args: Any, **kwargs: Any)[source]

Child of CNN class for InceptionV3 architecture

classmethod from_torch_dict(path)[source]

load a model saved using CNN.save_torch_dict()

Parameters:

path – path to file saved using CNN.save_torch_dict()

Returns:

new CNN instance

Note: if you used .save() instead of .save_torch_dict(), load the model using cnn.load_model(). Note that the model object will not load properly across different versions of OpenSoundscape. To save and load models across different versions of OpenSoundscape, use .save_torch_dict(), but note that preprocessing and other customized settings will not be retained.

opensoundscape.ml.cnn.load_model(path, device=None)[source]

load a saved model object

Note: saving and loading model objects across OpenSoundscape versions will not work properly. Instead, use .save_torch_dict and .load_torch_dict (but note that customizations to preprocessing, training params, etc will not be retained using those functions).

For maximum flexibility in further use, save the model with both .save() and .save_torch_dict()

Parameters:
  • path – file path of saved model

  • device – which device to load into, eg ‘cuda:1’ [default: None] - if None, will choose the first GPU if available, otherwise CPU

Returns:

a model object with loaded weights

opensoundscape.ml.cnn.load_outdated_model(path, architecture, sample_duration, model_class=<class 'opensoundscape.ml.cnn.CNN'>, device=None)[source]

load a CNN saved with a version of OpenSoundscape <0.6.0

This function enables you to load models saved with opso 0.4.x and 0.5.x. If your model was saved with .save() in a previous version of OpenSoundscape >=0.6.0, you must re-load the model using the original package version and save its network’s state dict, i.e., torch.save(model.network.state_dict(), path), then load the state dict to a new model object with model.load_weights(). See the Predict with pre-trained CNN tutorial for details.

For models created with the same version of OpenSoundscape as the one you are using, simply use opensoundscape.ml.cnn.load_model().

Note: for future use of the loaded model, you can simply call model.save(path) after creating it, then reload it with model = load_model(path). The saved model will be fully compatible with opensoundscape >=0.7.0.

Examples:

```
# load a binary resnet18 model from opso 0.4.x, 0.5.x, or 0.6.0
from opensoundscape.ml.cnn import load_outdated_model
model = load_outdated_model('old_model.tar', architecture='resnet18', sample_duration=5.0)  # sample_duration (clip length in s) is required; 5.0 is a placeholder

# load a resnet50 model of class CNN created with opso 0.5.0
model_050 = load_outdated_model('opso050_pytorch_model_r50.model', architecture='resnet50', sample_duration=5.0)
```

Parameters:
  • path – path to model file, ie .model or .tar file

  • architecture – see CNN docs (pass None if the class __init__ does not take architecture as an argument)

  • sample_duration – length of samples in seconds

  • model_class – class to construct. Normally CNN.

  • device – optionally specify a device to map tensors onto, eg ‘cpu’, ‘cuda:0’, ‘cuda:1’ [default: None] - if None, chooses cuda:0 if cuda is available, otherwise chooses cpu

Returns:

a cnn model object with the weights loaded from the saved model

opensoundscape.ml.cnn.separate_resnet_feat_clf(model)[source]

Separate feature/classifier training params for a ResNet model

Parameters:

model – an opso model object with a pytorch resnet architecture

Returns:

model with modified .optimizer_params and ._init_optimizer() method

Effects:

creates a new self.opt_net object that replaces the old one, and resets self.current_epoch to 0

opensoundscape.ml.cnn.use_resample_loss(model, train_df)[source]

Modify a model to use ResampleLoss for multi-target training

ResampleLoss may perform better than BCE Loss for multitarget problems in some scenarios.

Parameters:
  • model – CNN object

  • train_df – dataframe of labels, used to calculate class frequency

Annotations

functions and classes for manipulating annotations of audio

includes BoxedAnnotations class and utilities to combine or “diff” annotations, etc.

class opensoundscape.annotations.BoxedAnnotations(df, annotation_files=None, audio_files=None)[source]

container for “boxed” (frequency-time) annotations of audio (for instance, annotations created in Raven software)

includes functionality to load annotations from Pandas DataFrame or Raven Selection tables (.txt files), output one-hot labels for specific clip lengths or clip start/end times, apply corrections/conversions to annotations, and more.

Contains some analogous functions to Audio and Spectrogram, such as trim() [limit time range] and bandpass() [limit frequency range]

the .df attribute is a Pandas DataFrame containing the annotations with time and frequency bounds

the .annotation_files and .audio_files attributes are lists of annotation and audio file paths, respectively. They are retained as a record of _what audio was annotated_, rather than what annotations were placed on the audio. For instance, an audio file may have no entries in the dataframe if it contains no annotations, but is listed in audio_files because it was annotated/reviewed.

bandpass(low_f, high_f, edge_mode='trim')[source]

Bandpass a set of annotations, analogous to Spectrogram.bandpass()

Reduces the range of annotation boxes overlapping with the bandpass limits, and removes annotation boxes entirely if they lie completely outside of the bandpass limits.

Out-of-place operation: does not modify itself, returns new object

Parameters:
  • low_f – low frequency (Hz) bound

  • high_f – high frequency (Hz) bound

  • edge_mode – what to do when boxes overlap with edges of trim region - ‘trim’: trim boxes to bounds - ‘keep’: allow boxes to extend beyond bounds - ‘remove’: completely remove boxes that extend beyond bounds

Returns:

a copy of the BoxedAnnotations object on the bandpassed region

convert_labels(conversion_table)[source]

modify annotations according to a conversion table

Changes the values of ‘annotation’ column of dataframe. Any labels that do not have specified conversions are left unchanged.

Returns a new BoxedAnnotations object; does not modify itself (out-of-place operation). Usage could look like: my_annotations = my_annotations.convert_labels(table)

Parameters:

conversion_table – current values -> new values. can be either - pd.DataFrame with 2 columns [current value, new values] or - dictionary {current values: new values}

Returns:

new BoxedAnnotations object with converted annotation labels

classmethod from_raven_files(raven_files, audio_files=None, annotation_column_idx=8, annotation_column_name=None, keep_extra_columns=True, column_mapping_dict=None)[source]

load annotations from Raven .txt files

Parameters:
  • raven_files – list of raven .txt file paths (as str or pathlib.Path)

  • audio_files – (list) optionally specify audio files corresponding to each raven file (length should match raven_files) - if None (default), .one_hot_clip_labels() will not be able to check the duration of each audio file, and will raise an error unless full_duration is passed as an argument

  • annotation_column_idx – (int) position of the column containing annotations - [default: 8] will be correct if the first user-created column in Raven contains annotations. The first column is 1, the second is 2, etc. - pass None to load the Raven file without explicitly assigning a column as the annotation column. The resulting object’s .df will have an annotation column with nan values! NOTE: If annotation_column_name is passed, this argument is ignored.

  • annotation_column_name – (str) name of the column containing annotations - default: None will use annotation-column_idx to find the annotation column - if not None, this value overrides annotation_column_idx, and the column with this name will be used as the annotations.

  • keep_extra_columns – keep or discard extra Raven file columns (always keeps start_time, end_time, low_f, high_f, annotation, audio_file) [default: True] - True: keep all - False: keep none - or an iterable of specific columns to keep

  • column_mapping_dict – dictionary mapping Raven column names to desired column names in the output dataframe. The columns of the loaded Raven file are renamed according to this dictionary. The resulting dataframe must contain: [‘start_time’,’end_time’,’low_f’,’high_f’]. [default: None] If None (or for any unspecified columns), the standard column names are used:

    {“Begin Time (s)”: “start_time”, “End Time (s)”: “end_time”, “Low Freq (Hz)”: “low_f”, “High Freq (Hz)”: “high_f”}

    This dictionary will be updated with any user-specified mappings.

Returns:

BoxedAnnotations object containing annotations from the Raven files (the .df attribute is a dataframe containing each annotation)
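For example, a minimal sketch of loading Raven selection tables (the file paths are hypothetical):

```
from opensoundscape.annotations import BoxedAnnotations

raven_files = ["annotations/recording1.Table.1.selections.txt"]  # hypothetical paths
audio_files = ["audio/recording1.wav"]

annotations = BoxedAnnotations.from_raven_files(raven_files, audio_files=audio_files)
print(annotations.df.head())
```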

global_one_hot_labels(classes)[source]

get a list of one-hot labels for the entire set of annotations

Parameters:

classes – iterable of class names to give 0/1 labels

Returns:

list of 0/1 labels for each class

one_hot_clip_labels(clip_duration, clip_overlap, min_label_overlap, min_label_fraction=1, full_duration=None, class_subset=None, final_clip=None, audio_files=None)[source]

Generate one-hot labels for clips of fixed duration

Wraps utils.make_clip_df() with self.one_hot_labels_like(): clips are created in the same way as Audio.split(), and labels are applied based on overlap using self.one_hot_labels_like().

Parameters:
  • clip_duration (float) – The duration in seconds of the clips

  • clip_overlap (float) – The overlap of the clips in seconds [default: 0]

  • min_label_overlap – minimum duration (seconds) of annotation within the time interval for it to count as a label. Note that any annotation of length less than this value will be discarded. We recommend a value of 0.25 for typical bird songs, or shorter values for very short-duration events such as chip calls or nocturnal flight calls.

  • min_label_fraction – [default: None] if >= this fraction of an annotation overlaps with the time window, it counts as a label regardless of its duration. Note that if either of the two criteria (overlap and fraction) is met, the label is 1. If None (default), this criterion is not used (i.e., only min_label_overlap is used). A value of 0.5 for this parameter would ensure that all annotations result in at least one clip being labeled 1 (if there are no gaps between clips).

  • full_duration – the amount of time (seconds) to split into clips for each file; float or None. If None, attempts to get each file’s duration using librosa.get_duration(path=file), where file is the audio file for each row of self.df

  • class_subset – list of classes for one-hot labels. If None, classes will be all unique values of self.df[‘annotation’]

  • final_clip (str) – behavior if the final clip is less than clip_duration seconds long. By default, discards the remaining time if it is less than clip_duration seconds long [default: None]. Options:

    • None: discard the remainder (do not make a clip)

    • ”extend”: extend the final clip beyond full_duration to reach clip_duration length

    • ”remainder”: use only the remainder of full_duration (the final clip will be shorter than clip_duration)

    • ”full”: increase overlap with the previous clip to yield a clip of clip_duration length

  • audio_files – list of audio file paths (as str or pathlib.Path) to create clips for. If None, uses self.audio_files. [default: None]

Returns:

dataframe with index [‘file’,’start_time’,’end_time’] and columns=classes
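For example, a minimal sketch assuming annotations is an existing BoxedAnnotations object and that the listed class names appear in its annotations:

```
labels_df = annotations.one_hot_clip_labels(
    clip_duration=5,
    clip_overlap=0,
    min_label_overlap=0.25,
    class_subset=["species_a", "species_b"],  # hypothetical class names
)
# labels_df has a (file, start_time, end_time) index and one 0/1 column per class
```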

one_hot_labels_like(clip_df, min_label_overlap, min_label_fraction=None, class_subset=None, warn_no_annotations=False)[source]

create a dataframe of one-hot clip labels based on given starts/ends

Uses start and end clip times from clip_df to define a set of clips for each file. Then extracts annotations overlapping with each clip.

Required overlap to consider an annotation to overlap with a clip is defined by user: an annotation must satisfy the minimum time overlap OR minimum % overlap to be included (doesn’t require both conditions to be met, only one)

clip_df can be created using opensoundscape.utils.make_clip_df

See also: .one_hot_clip_labels(), which creates even-lengthed clips automatically and can often be used instead of this function.

Parameters:
  • clip_df – dataframe with (file, start_time, end_time) MultiIndex specifying the temporal bounds of each clip (clip_df can be created using opensoundscape.utils.make_clip_df)

  • min_label_overlap – minimum duration (seconds) of annotation within the time interval for it to count as a label. Note that any annotation of length less than this value will be discarded. We recommend a value of 0.25 for typical bird songs, or shorter values for very short-duration events such as chip calls or nocturnal flight calls.

  • min_label_fraction – [default: None] if >= this fraction of an annotation overlaps with the time window, it counts as a label regardless of its duration. Note that if either of the two criteria (overlap and fraction) is met, the label is 1. If None (default), this criterion is not used (i.e., only min_label_overlap is used). A value of 0.5 for this parameter would ensure that all annotations result in at least one clip being labeled 1 (if there are no gaps between clips).

  • class_subset – list of classes for one-hot labels. If None, classes will be all unique values of self.df[‘annotation’]

  • warn_no_annotations – bool [default:False] if True, raises warnings for any files in clip_df with no corresponding annotations in self.df

Returns:

DataFrame of one-hot labels w/ multi-index of (file, start_time, end_time), a column for each class, and values of 0=absent or 1=present

subset(classes)[source]

subset annotations to those from a list of classes

out-of-place operation (returns new filtered BoxedAnnotations object)

Parameters:
  • classes – list of classes to retain (all others are discarded). The list can include nan or None if you want to keep annotations with those (missing) labels.

Returns:

new BoxedAnnotations object containing only annotations in classes

to_raven_files(save_dir, audio_files=None)[source]

save annotations to Raven-compatible tab-separated text files

Creates one file per unique audio file in ‘file’ column of self.df

Parameters:
  • save_dir – directory for saved files - can be str or pathlib.Path

  • audio_files – list of audio file paths (as str or pathlib.Path) or None [default: None]. If None, uses self.audio_files. Note that it does not use self.df[‘audio_file’].unique()

Outcomes:

creates files containing the annotations for each audio file in a format compatible with Raven Pro/Lite. File is tab-separated and contains columns matching the Raven defaults.

Note: Raven Lite does not support additional columns beyond a single annotation column. Additional columns will not be shown in the Raven Lite interface.

trim(start_time, end_time, edge_mode='trim')[source]

Trim the annotations of each file in time

Trims annotations from outside of the time bounds. Note that the annotation start and end times of different files may not represent the same real-world times. This function only uses the numeric values of annotation start and end times in the annotations, which should be relative to the beginning of the corresponding audio file.

For zero-length annotations (start_time = end_time), start and end times are inclusive on the left and exclusive on the right, i.e. [lower, upper). For instance, start_time=0, end_time=1 includes zero-length annotations at 0 but excludes zero-length annotations at 1.

Out-of-place operation: does not modify itself, returns new object

Parameters:
  • start_time – time (seconds) since beginning for left bound

  • end_time – time (seconds) since beginning for right bound

  • edge_mode – what to do when boxes overlap with edges of trim region - ‘trim’: trim boxes to bounds - ‘keep’: allow boxes to extend beyond bounds - ‘remove’: completely remove boxes that extend beyond bounds

Returns:

a copy of the BoxedAnnotations object on the trimmed region. - note that, like Audio.trim(), there is a new reference point for 0.0 seconds (located at start_time in the original object). For example, calling .trim(5,10) will result in an annotation previously starting at 6s to start at 1s in the new object.
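For example, a minimal sketch combining trim() and bandpass() (the time and frequency bounds are hypothetical), assuming annotations is an existing BoxedAnnotations object:

```
# keep only annotations within 5-10 s and 1-8 kHz; both operations are out-of-place
subset = annotations.trim(5, 10, edge_mode="trim").bandpass(1000, 8000, edge_mode="trim")
```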

unique_labels()[source]

get list of all unique labels

ignores null/Falsy labels by performing .df.dropna()

opensoundscape.annotations.categorical_to_one_hot(labels, class_subset=None)[source]

transform multi-target categorical labels (list of lists) to one-hot array

Parameters:
  • labels – list of lists of categorical labels, eg [[‘white’,’red’],[‘green’,’white’]] or [[0,1,2],[3]]

  • class_subset – list of classes for one-hot labels [default: None]. If None, taken to be the unique set of values in labels

Returns:

one_hot: 2d array with 0 for absent and 1 for present; class_subset: list of classes corresponding to columns in the array

Return type:

one_hot
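A minimal sketch of a round trip between categorical and one-hot labels, assuming (as the Returns text above suggests) that categorical_to_one_hot returns both the array and the class list:

```
from opensoundscape.annotations import categorical_to_one_hot, one_hot_to_categorical

labels = [["white", "red"], ["green", "white"]]
one_hot, classes = categorical_to_one_hot(labels)
recovered = one_hot_to_categorical(one_hot, classes)  # back to list-of-lists form
```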

opensoundscape.annotations.diff(base_annotations, comparison_annotations)[source]

look at differences between two BoxedAnnotations objects. Not implemented.

Compare different labels of the same boxes (Assumes that a second annotator used the same boxes as the first, but applied new labels to the boxes)

opensoundscape.annotations.one_hot_labels_on_time_interval(df, class_subset, start_time, end_time, min_label_overlap, min_label_fraction=None)[source]

generate a dictionary of one-hot labels for given time-interval

Each class is labeled 1 if any annotation overlaps sufficiently with the time interval. Otherwise the class is labeled 0.

Parameters:
  • df – DataFrame with columns ‘start_time’, ‘end_time’ and ‘annotation’

  • class_subset – list of classes for one-hot labels. If None, classes will be all unique values of df[‘annotation’]

  • start_time – beginning of time interval (seconds)

  • end_time – end of time interval (seconds)

  • min_label_overlap – minimum duration (seconds) of annotation within the time interval for it to count as a label. Note that any annotation of length less than this value will be discarded. We recommend a value of 0.25 for typical bird songs, or shorter values for very short-duration events such as chip calls or nocturnal flight calls.

  • min_label_fraction – [default: None] if >= this fraction of an annotation overlaps with the time window, it counts as a label regardless of its duration. Note that if either of the two criteria (overlap and fraction) is met, the label is 1. If None (default), the criterion is not used (only min_label_overlap is used). A value of 0.5 would ensure that all annotations result in at least one clip being labeled 1 (if there are no gaps between clips).

Returns:

dictionary of {class : 0/1 label} for all classes

opensoundscape.annotations.one_hot_to_categorical(one_hot, classes)[source]

transform one_hot labels to multi-target categorical (list of lists)

Parameters:
  • one_hot – 2d array with 0 for absent and 1 for present

  • classes – list of classes corresponding to columns in the array

Returns:

list of lists of categorical labels for each sample, eg

[[‘white’,’red’],[‘green’,’white’]] or [[0,1,2],[3]]

Return type:

labels

Bioacoustics Model Zoo Wrappers

lightweight wrapper to list and get models from bioacoustics model zoo with torch.hub

opensoundscape.ml.bioacoustics_model_zoo.list_models()[source]

list the models available in the [bioacoustics model zoo](https://github.com/kitzeslab/bioacoustics-model-zoo)

Returns:

list of available models

see also: load(model)

opensoundscape.ml.bioacoustics_model_zoo.load(model)[source]

load a model from the [bioacoustics model zoo](https://github.com/kitzeslab/bioacoustics-model-zoo)

see list_models() for a list of available models

Parameters:

model – name of model to load, i.e. one listed by list_models()

Returns:

ready-to-use model object - can typically be used just like the CNN class, e.g. model.predict() - see the model zoo page for details on each model

note that some models may require additional arguments; in that case, use torch.hub.load(“kitzeslab/bioacoustics-model-zoo”, …) directly, passing additional arguments after the model name (see https://github.com/kitzeslab/bioacoustics-model-zoo landing page for detailed instructions)
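A minimal usage sketch; the model name below is an assumption, so check list_models() for the names actually available:

```
from opensoundscape.ml import bioacoustics_model_zoo as bmz

print(bmz.list_models())            # names of available models
model = bmz.load("BirdNET")         # hypothetical model name; use one from list_models()
scores = model.predict(["audio/recording1.wav"])  # hypothetical audio path
```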

Machine Learning utils

Utilities for .ml

class opensoundscape.ml.utils.ScoreCAM(*args: Any, **kwargs: Any)[source]
opensoundscape.ml.utils.apply_activation_layer(x, activation_layer=None)[source]

applies an activation layer to a set of scores

Parameters:
  • x – input values

  • activation_layer

    • None [default]: return original values

    • ’softmax’: apply softmax activation

    • ’sigmoid’: apply sigmoid activation

    • ’softmax_and_logit’: apply softmax then logit transform

Returns:

values with activation layer applied. Note: if x is None, returns None

Note: casts x to float before applying softmax, since torch’s softmax implementation doesn’t support int or Long type
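A minimal sketch with a toy score tensor:

```
import torch
from opensoundscape.ml.utils import apply_activation_layer

logits = torch.tensor([[2.0, -1.0, 0.5]])
probs = apply_activation_layer(logits, "sigmoid")       # elementwise sigmoid
softmaxed = apply_activation_layer(logits, "softmax")   # rows sum to 1
```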

opensoundscape.ml.utils.cas_dataloader(dataset, batch_size, num_workers)[source]

Return a dataloader that uses the class aware sampler

Class aware sampler tries to balance the examples per class in each batch. It selects just a few classes to be present in each batch, then samples those classes for even representation in the batch.

Parameters:
  • dataset – a pytorch dataset type object

  • batch_size – see DataLoader

  • num_workers – see DataLoader

opensoundscape.ml.utils.collate_audio_samples_to_tensors(batch)[source]

takes a list of AudioSample objects, returns batched tensors

use this collate function with DataLoader if you want to use AudioFileDataset (or AudioSplittingDataset) but want the traditional output of PyTorch DataLoaders: two tensors, the first containing the data with dim 0 as the batch dimension, and the second containing the labels with dim 0 as the batch dimension

Parameters:

batch – a list of AudioSample objects

Returns:

(Tensor of stacked AudioSample.data, Tensor of stacked AudioSample.label.values)

Example usage:

```
from torch.utils.data import DataLoader
from opensoundscape import AudioFileDataset, SpectrogramPreprocessor
from opensoundscape.ml.utils import collate_audio_samples_to_tensors

preprocessor = SpectrogramPreprocessor(sample_duration=2, height=224, width=224)
audio_dataset = AudioFileDataset(label_df, preprocessor)

train_dataloader = DataLoader(
    audio_dataset, batch_size=64, shuffle=True,
    collate_fn=collate_audio_samples_to_tensors,
)
```

opensoundscape.ml.utils.get_batch(array, batch_size, batch_number)[source]

get a single slice of a larger array

using the batch size and batch index, from zero

Parameters:
  • array – iterable to split into batches

  • batch_size – num elements per batch

  • batch_number – index of batch

Returns:

one batch (subset of array)

Note: the final elements are returned as the last batch even if there are fewer than batch_size

Example

if array=[1,2,3,4,5,6,7] then:

  • get_batch(array,3,0) returns [1,2,3]

  • get_batch(array,3,2) returns [7]

CNN Architectures

Module to initialize PyTorch CNN architectures with custom output shape

This module allows the use of several built-in CNN architectures from PyTorch. The architecture refers to the specific layers and layer input/output shapes (including convolution sizes and strides, etc) - such as the ResNet18 or Inception V3 architecture.

We provide wrappers which modify the output layer to the desired shape (to match the number of classes). The way to change the output layer shape depends on the architecture, which is why we need a wrapper for each one. This code is based on pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html

To use these wrappers, for example, if your model has 10 output classes, write

my_arch=resnet18(10)

Then you can initialize a model object from opensoundscape.ml.cnn with your architecture:

model=CNN(my_arch,classes,sample_duration)

or override an existing model’s architecture:

model.network = my_arch

Note: the InceptionV3 architecture must be used differently than other architectures - the easiest way is to simply use the InceptionV3 class in opensoundscape.ml.cnn.
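Putting the pieces above together, a minimal sketch (the class names and sample duration are hypothetical):

```
from opensoundscape.ml import cnn_architectures
from opensoundscape.ml.cnn import CNN

classes = [f"class_{i}" for i in range(10)]          # hypothetical class names
arch = cnn_architectures.resnet18(num_classes=10)    # 10 output nodes
model = CNN(arch, classes, sample_duration=3.0)      # or: model.network = arch
```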

opensoundscape.ml.cnn_architectures.alexnet(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for AlexNet architecture

input size = 224

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), the entire network will have gradients and can train; if True, the feature block is frozen and only the final layer is trained

  • weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – specify channels in input sample, eg [channels h,w] sample shape

opensoundscape.ml.cnn_architectures.change_conv2d_channels(conv2d, num_channels=3, reuse_weights=True)[source]

Modify the number of input channels for a pytorch CNN

This function changes the input shape of a torch.nn.Conv2D layer to accommodate a different number of channels. It attempts to retain weights in the following manner: - If num_channels is less than the original, it will average weights across the original channels and apply them to all new channels. - if num_channels is greater than the original, it will cycle through the original channels, copying them to the new channels

Parameters:
  • conv2d – the torch.nn.Conv2d layer of the model that should be modified

  • num_channels – desired number of input channels for the layer [default: 3]

  • reuse_weights – if True (default), averages (if num_channels < original) or cycles through (if num_channels > original) the original weights and applies them to the new Conv2d layer

opensoundscape.ml.cnn_architectures.change_fc_output_size(fc, num_classes)[source]

Modify the number of output nodes of a fully connected layer

Parameters:
  • fc – the fully connected layer of the model that should be modified

  • num_classes – number of output nodes for the new fc

opensoundscape.ml.cnn_architectures.densenet121(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for densenet121 architecture

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), the entire network will have gradients and can train; if True, the feature block is frozen and only the final layer is trained

  • weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – specify channels in input sample, eg [channels h,w] sample shape

opensoundscape.ml.cnn_architectures.efficientnet_b0(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for efficientnet_b0 architecture

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), the entire network will have gradients and can train; if True, the feature block is frozen and only the final layer is trained

  • weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – specify channels in input sample, eg [channels h,w] sample shape

opensoundscape.ml.cnn_architectures.efficientnet_b4(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for efficientnet_b4 architecture

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), the entire network will have gradients and can train; if True, the feature block is frozen and only the final layer is trained

  • weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – specify channels in input sample, eg [channels h,w] sample shape

opensoundscape.ml.cnn_architectures.efficientnet_widese_b0(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for efficientnet_widese_b0 architecture

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), the entire network will have gradients and can train; if True, the feature block is frozen and only the final layer is trained

  • weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – specify channels in input sample, eg [channels h,w] sample shape

opensoundscape.ml.cnn_architectures.efficientnet_widese_b4(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for efficientnet_widese_b4 architecture

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), the entire network will have gradients and can train; if True, the feature block is frozen and only the final layer is trained

  • weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – specify channels in input sample, eg [channels h,w] sample shape

opensoundscape.ml.cnn_architectures.freeze_params(model)[source]

remove gradients (aka freeze) all model parameters

opensoundscape.ml.cnn_architectures.inception_v3(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for Inception v3 architecture

Input: 299x299

WARNING: expects (299,299) sized images and has auxiliary output. See InceptionV3 class in opensoundscape.ml.cnn for use.

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), the entire network will have gradients and can train; if True, the feature block is frozen and only the final layer is trained

  • weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – specify channels in input sample, eg [channels h,w] sample shape

opensoundscape.ml.cnn_architectures.list_architectures()[source]

return list of available architecture keyword strings

opensoundscape.ml.cnn_architectures.register_arch(func)[source]

add architecture to ARCH_DICT

opensoundscape.ml.cnn_architectures.resnet101(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for ResNet101 architecture

input_size = 224

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), the entire network will have gradients and can train; if True, the feature block is frozen and only the final layer is trained

  • weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – specify channels in input sample, eg [channels h,w] sample shape

opensoundscape.ml.cnn_architectures.resnet152(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for ResNet152 architecture

input_size = 224

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), the entire network will have gradients and can train; if True, the feature block is frozen and only the final layer is trained

  • weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – specify channels in input sample, eg [channels h,w] sample shape

opensoundscape.ml.cnn_architectures.resnet18(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for ResNet18 architecture

input_size = 224

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), the entire network will have gradients and can train; if True, the feature block is frozen and only the final layer is trained

  • weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – specify channels in input sample, eg [channels h,w] sample shape

opensoundscape.ml.cnn_architectures.resnet34(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for ResNet34 architecture

input_size = 224

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), the entire network will have gradients and can train; if True, the feature block is frozen and only the final layer is trained

  • weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – specify channels in input sample, eg [channels h,w] sample shape

opensoundscape.ml.cnn_architectures.resnet50(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for ResNet50 architecture

input_size = 224

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), the entire network will have gradients and can train; if True, the feature block is frozen and only the final layer is trained

  • weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – specify channels in input sample, eg [channels h,w] sample shape

opensoundscape.ml.cnn_architectures.squeezenet1_0(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for squeezenet architecture

input size = 224

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), the entire network will have gradients and can train; if True, the feature block is frozen and only the final layer is trained

  • weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – specify channels in input sample, eg [channels h,w] sample shape

opensoundscape.ml.cnn_architectures.vgg11_bn(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for vgg11 architecture

input size = 224

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), the entire network will have gradients and can train; if True, the feature block is frozen and only the final layer is trained

  • weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

Logging with WandB (Weights and Biases)

helpers for integrating with WandB and exporting content

opensoundscape.logging.wandb_table(dataset, n=None, classes_to_extract=(), random_state=None, raise_exceptions=False, drop_labels=False, gradcam_model=None)[source]

Generate a wandb Table visualizing n random samples from a sample_df

Parameters:
  • dataset – object to generate samples, eg AudioFileDataset or AudioSplittingDataset

  • n – number of samples to generate (randomly selected from df) - if None, does not subsample or change order

  • bypass_augmentations – if True, augmentations in Preprocessor are skipped

  • classes_to_extract – tuple of classes - will create columns containing the scores/labels

  • random_state – default None; if integer provided, used for reproducible random sample

  • drop_labels – if True, does not include ‘label’ column in Table

  • gradcam_model – if not None, will generate GradCAMs for each sample using gradcam_model.get_cams()

Returns: a W&B Table of preprocessed samples with labels and playable audio
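A minimal sketch of logging the table to a W&B run, assuming dataset is an existing AudioFileDataset (the project name is hypothetical):

```
import wandb
from opensoundscape.logging import wandb_table

wandb.init(project="opensoundscape-demo")          # hypothetical project name
table = wandb_table(dataset, n=16, random_state=0)
wandb.log({"samples": table})
```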

Data Selection

tools for subsetting and resampling collections

opensoundscape.data_selection.resample(df, n_samples_per_class, upsample=True, downsample=True, with_replace=False, random_state=None)[source]

resample a one-hot encoded label df for a target n_samples_per_class

Parameters:
  • df – dataframe with one-hot encoded labels: columns are classes, index is sample name/path

  • n_samples_per_class – target number of samples per class

  • upsample – if True, duplicate samples for classes with <n samples to get to n samples

  • downsample – if True, randomly sample classes with >n samples to get to n samples

  • with_replace – flag to enable sampling of the same row more than once, default False

  • random_state – passed to np.random calls. If None, random state is not fixed.

Note: The algorithm assumes that the label df is single-label. If the label df is multi-label, some classes can end up over-represented.

Note 2: The resulting df will have samples ordered by class label, even if the input df had samples in a random order.
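A minimal sketch, assuming label_df is an existing one-hot label dataframe with sample paths as its index:

```
from opensoundscape.data_selection import resample

balanced_df = resample(label_df, n_samples_per_class=100, random_state=0)
```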

opensoundscape.data_selection.upsample(input_df, label_column='Labels', with_replace=False, random_state=None)[source]

Given a input DataFrame of categorical labels, upsample to maximum value

Upsampling removes the class imbalance in your dataset. Rows for each label are repeated max_count // n_rows times; then rows are randomly sampled to fill up to max_count.

The input df is NOT one-hot encoded in this case; instead it contains categorical labels in the specified label_column

Parameters:
  • input_df – A DataFrame to upsample

  • label_column – The column to draw unique labels from

  • with_replace – flag to enable sampling of the same row more than once [default: False]

  • random_state – Set the random_state during sampling

Returns:

An upsampled DataFrame

Return type:

df

Datasets

Preprocessors: pd.Series child with an action sequence & forward method

class opensoundscape.ml.datasets.AudioFileDataset(*args: Any, **kwargs: Any)[source]

Base class for audio datasets with OpenSoundscape (use in place of torch Dataset)

Custom Dataset classes should subclass this class or its children.

Datasets in OpenSoundscape contain a Preprocessor object which is responsible for the procedure of generating a sample for a given input. The DataLoader handles a dataframe of samples (and potentially labels) and uses a Preprocessor to generate samples from them.

Parameters:
  • samples – the files to generate predictions for. Can be:

    • a dataframe with index containing audio paths, OR

    • a dataframe with a multi-index of (path, start_time, end_time) per clip, OR

    • a list or np.ndarray of audio file paths

    Notes for the input dataframe:

    • the dataframe must have audio paths in the index

    • if the dataframe has labels, the class names should be the columns, and the values of each row should be 0 or 1

    • if the data does not have labels, the label dataframe will have no columns

  • preprocessor – an object of BasePreprocessor or its children which defines the operations to perform on input samples

Returns:

sample (AudioSample object)

Raises:

PreprocessingError if exception is raised during __getitem__

Effects:

self.invalid_samples will contain a set of paths that did not successfully produce a list of clips with start/end times, if split_files_into_clips=True
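A minimal sketch of creating a dataset and retrieving one sample, assuming label_df is an existing dataframe with audio paths as its index and 0/1 class columns:

```
from opensoundscape import AudioFileDataset, SpectrogramPreprocessor

preprocessor = SpectrogramPreprocessor(sample_duration=2, height=224, width=224)
dataset = AudioFileDataset(label_df, preprocessor)
sample = dataset[0]             # an AudioSample; sample.data is the preprocessed tensor
print(dataset.class_counts())   # number of samples per class
```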

class_counts()[source]

count number of each label

head(n=5)[source]

out-of-place copy of first n samples

performs df.head(n) on self.label_df

Parameters:
  • n – number of first samples to return, see pandas.DataFrame.head() [default: 5]

Returns:

a new dataset object

sample(**kwargs)[source]

out-of-place random sample

creates copy of object with n rows randomly sampled from label_df

Args: see pandas.DataFrame.sample()

Returns:

a new dataset object

class opensoundscape.ml.datasets.AudioSplittingDataset(*args: Any, **kwargs: Any)[source]

class to load clips of longer files rather than one sample per file

Internally creates even-lengthed clips split from long audio files.

If file labels are provided, applies copied labels to all clips from a file

NOTE: If you’ve already created a dataframe with clip start and end times, you can use AudioFileDataset. This class is only necessary if you wish to automatically split longer files into clips (providing only the file paths).

Parameters:

see AudioFileDataset and make_clip_df

CAM (Class Activation Maps)

Class activation maps (CAM) for OpenSoundscape models

class opensoundscape.ml.cam.CAM(base_image, activation_maps=None, gbp_maps=None)[source]

Object to hold and view Class Activation Maps, including guided backprop

Stores activation maps as .activation_maps, and guided backprop as .gbp_cams

each is a Series indexed by class

create_rgb_heatmaps(class_subset=None, mode='activation', show_base=True, alpha=0.5, color_cycle=('#067bc2', '#43a43d', '#ecc30b', '#f37748', '#d56062'), gbp_normalization_q=99)[source]

create rgb numpy array of heatmaps overlaid on the sample

Can choose a subset of classes and activation/backprop modes

Parameters:
  • class_subset – iterable of classes to visualize with activation maps - default None plots all classes - each item must be in the index of self.gbp_map / self.activation_maps - note that a class None is created by cnn.generate_cams() when classes are not specified during CNN.generate_cams()

  • mode – str selecting which maps to visualize, one of: ‘activation’ [default]: overlay activation map; ‘backprop’: overlay guided backpropagation result; ‘backprop_and_activation’: overlay the product of both maps; None: do not overlay anything on the original sample

  • show_base – if False, does not plot the image of the original sample [default: True]

  • alpha – opacity of the activation map overlay [default: 0.5]

  • color_cycle – iterable of colors for activation maps - cycles through the list using one color per class

  • gbp_normalization_q – guided backprop is normalized such that the q’th percentile of the map is 1 [default: 99]. This helps avoid gbp maps that are too dark to see. Lower values make brighter and noisier maps; higher values make darker and smoother maps.

Returns:

numpy array of shape [w, h, 3] representing the image with CAM heatmaps

plot(class_subset=None, mode='activation', show_base=True, alpha=0.5, color_cycle=('#067bc2', '#43a43d', '#ecc30b', '#f37748', '#d56062'), figsize=None, plt_show=True, save_path=None, gbp_normalization_q=99)[source]

Plot per-class activation maps, guided backpropagations, or their products

Parameters:
  • class_subset – see create_rgb_heatmaps

  • mode – see create_rgb_heatmaps

  • show_base – see create_rgb_heatmaps

  • alpha – see create_rgb_heatmaps

  • color_cycle – see create_rgb_heatmaps

  • gbp_normalization_q – see create_rgb_heatmaps

  • figsize – the figure size for the plot [default: None]

  • plt_show – if True, runs plt.show() [default: True] - ignored if return_numpy=True

  • save_path – path to save image to [default: None does not save file]

Returns:

(fig, ax) of matplotlib figure, or np.array if return_numpy=True

Note: if base_image does not have 3 channels, channels are averaged then copied across 3 RGB channels to create a greyscale image

Note 2: If return_numpy is True, fig and ax are never created; the function simply creates a numpy array representing the image with the CAMs overlaid and returns it
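A minimal sketch, assuming cam is an existing CAM object (for instance one produced by CNN.generate_cams()):

```
# overlay the activation map on the sample and save the figure
fig, ax = cam.plot(mode="activation", alpha=0.5, save_path="cam_overlay.png")

# or get an RGB numpy array combining guided backprop and activation maps
rgb = cam.create_rgb_heatmaps(mode="backprop_and_activation")
```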

Loss

loss function classes to use with opensoundscape models

class opensoundscape.ml.loss.BCEWithLogitsLoss_hot(*args: Any, **kwargs: Any)[source]

use pytorch’s nn.BCEWithLogitsLoss for one-hot labels by simply converting y from long to float

class opensoundscape.ml.loss.CrossEntropyLoss_hot(*args: Any, **kwargs: Any)[source]

use pytorch’s nn.CrossEntropyLoss for one-hot labels by converting labels from 1-hot to integer labels

throws a ValueError if labels are not one-hot

class opensoundscape.ml.loss.ResampleLoss(*args: Any, **kwargs: Any)[source]
opensoundscape.ml.loss.binary_cross_entropy(pred, label, weight=None, reduction='mean', avg_factor=None)[source]

helper function for BCE loss in ResampleLoss class

opensoundscape.ml.loss.reduce_loss(loss, reduction)[source]

Reduce loss as specified.

Parameters:
  • loss (Tensor) – Elementwise loss tensor.

  • reduction (str) – Options are “none”, “mean” and “sum”.

Returns:

Reduced loss tensor.

Return type:

Tensor

opensoundscape.ml.loss.weight_reduce_loss(loss, weight=None, reduction='mean', avg_factor=None)[source]

Apply element-wise weight and reduce loss.

Parameters:
  • loss (Tensor) – Element-wise loss.

  • weight (Tensor) – Element-wise weights.

  • reduction (str) – Same as built-in losses of PyTorch.

  • avg_factor (float) – Average factor when computing the mean of losses.

Returns:

Processed loss values.

Return type:

Tensor

Safe Dataset

Dataset wrapper to handle errors gracefully in Preprocessor classes

A SafeDataset handles errors in a potentially misleading way: If an error is raised while trying to load a sample, the SafeDataset will instead load a different sample. The indices of any samples that failed to load will be stored in ._invalid_indices.

The behavior may be desirable for training a model, but could cause silent errors when using a model for prediction (replacing a bad file with a different file), so you should always be careful to check ._invalid_indices after using a SafeDataset.

based on an implementation by @msamogh in nonechucks (github.com/msamogh/nonechucks/)

class opensoundscape.ml.safe_dataset.SafeDataset(dataset, invalid_sample_behavior)[source]

A wrapper for a Dataset that handles errors when loading samples

WARNING: When iterating, will skip the failed sample, but when using within a DataLoader, finds the next good sample and uses it for the current index (see __getitem__).

Note that this class does not subclass DataSet. Instead, it contains a .dataset attribute that is a DataSet (or AudioFileDataset / AudioSplittingDataset, which subclass DataSet).

Parameters:
  • dataset – a torch Dataset instance or child such as AudioFileDataset, AudioSplittingDataset

  • eager_eval – If True, checks if every file is able to be loaded during initialization (logs _valid_indices and _invalid_indices)

Attributes: _valid_indices and _invalid_indices can be accessed later to check which samples raised Exceptions. _invalid_samples is a set of all index values for samples that raised Exceptions.

__getitem__(index)[source]

If loading an index fails, keeps trying the next index until success

_safe_get_item()[source]

Tries to load a sample, returns None if error occurs

__iter__()[source]

generator that skips samples that raise errors when loading

report(log=None)[source]

write _invalid_samples to log file, give warning, & return _invalid_samples

Sample

Class for holding information on a single sample

class opensoundscape.sample.AudioSample(source, start_time=None, duration=None, labels=None, trace=None)[source]

A class containing information about a single audio sample

self.preprocessing_exception is initialized as None and will contain the exception raised during preprocessing if any exception occurs

property categorical_labels

list of indices with value==1 in self.labels

property end_time

calculate sample end time as start_time + duration

classmethod from_series(labels_series)[source]

initialize AudioSample from a pandas Series (optionally containing labels)

  • if the series name (dataframe index) is a tuple, extracts [‘file’,’start_time’,’end_time’] and maps these values to (source, start_time, duration=end_time-start_time)

  • otherwise, the series name is used as source; start_time and duration will be None

Extracts source (file), start_time, and end_time from a multi-index pd.Series (one row of a pd.DataFrame with multi-index [‘file’,’start_time’,’end_time’]). The argument series is saved as self.labels. Creates an AudioSample object.

Parameters:

labels_series – a pd.Series with name = file path or [‘file’,’start_time’,’end_time’] and index as classes with 0/1 values as labels. Labels can have no values (just a name) if the sample does not have labels.

class opensoundscape.sample.Sample(data=None)[source]

Class for holding information on a single sample

a Sample in OpenSoundscape contains information about a single sample, for instance its data and labels

Subclass this class to create Samples of specific types

opensoundscape.sample.collate_audio_samples_to_dict(samples)[source]

generate batched tensors of data and labels (in a dictionary)

returns collated samples: a dictionary with keys “samples” and “labels”

assumes that s.data is a Tensor and s.labels is a list/array for each sample S

Parameters:

samples – iterable of AudioSample objects (or other objects with attributes .data as Tensor and .labels as list/array)

Returns:

dictionary of {“samples”: batched tensor of samples, “labels”: batched tensor of labels}

Sampling

classes for strategically sampling within a DataLoader

class opensoundscape.ml.sampling.ClassAwareSampler(*args: Any, **kwargs: Any)[source]

In each batch of samples, pick a limited number of classes to include and give even representation to each class

class opensoundscape.ml.sampling.ImbalancedDatasetSampler(*args: Any, **kwargs: Any)[source]

Samples elements randomly from a given list of indices for an imbalanced dataset

Parameters:
  • indices (list, optional) – a list of indices

  • num_samples (int, optional) – number of samples to draw

  • callback_get_label (func) – a callback-like function which takes two arguments: dataset and index

Based on Imbalanced Dataset Sampling by davinnovation (https://github.com/ufoym/imbalanced-dataset-sampler)

Metrics

opensoundscape.metrics.multi_target_metrics(targets, scores, class_names, threshold)[source]

generate various metrics for a set of scores and labels (targets)

Parameters:
  • targets – 0/1 labels in 2d array

  • scores – continuous values in 2d array

  • class_names – list of strings

  • threshold – scores >= threshold result in prediction of 1, while scores < threshold result in prediction of 0

Returns:

dictionary of various overall and per-class metrics - precision, recall, F1 are np.nan if no 1-labels for a class - au_roc, avg_precision are np.nan if all labels are either 0 or 1

Definitions: - au_roc: area under the receiver operating characteristic curve - avg_precision: average precision (same as area under PR curve) - Jaccard: Jaccard similarity coefficient score (intersection over union) - hamming_loss: fraction of labels that are incorrectly predicted

Return type:

metrics_dict
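A minimal sketch with toy values:

```
import numpy as np
from opensoundscape.metrics import multi_target_metrics

targets = np.array([[1, 0], [0, 1], [1, 1]])              # 0/1 labels, one row per sample
scores = np.array([[0.9, 0.2], [0.1, 0.8], [0.7, 0.6]])   # continuous scores
metrics = multi_target_metrics(targets, scores, class_names=["a", "b"], threshold=0.5)
```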

opensoundscape.metrics.predict_multi_target_labels(scores, threshold)[source]

Generate boolean multi-target predicted labels from continuous scores

For each sample, each class score is compared to a threshold. Any class can be predicted 1 or 0, independent of other classes.

This function internally uses torch.Tensors to optimize performance

Note: threshold can be a single value or list of per-class thresholds

Parameters:
  • scores – 2d np.array, 2d list, 2d torch.Tensor, or pd.DataFrame containing continuous scores

  • threshold – a number or list of numbers with a threshold for each class

    • if a single number, it is used as the threshold for all classes (columns)

    • if a list, its length should match the number of columns in scores; each value in the list is used as the threshold for the respective class (column)

Returns: 1/0 values with 1 if score exceeded threshold and 0 otherwise

See also: predict_single_target_labels

opensoundscape.metrics.predict_single_target_labels(scores)[source]

Generate boolean single target predicted labels from continuous scores

For each row, the single highest scoring class will be labeled 1 and all other classes will be labeled 0.

This function internally uses torch.Tensors to optimize performance

Parameters:

scores – 2d np.array, 2d list, 2d torch.Tensor, or pd.DataFrame containing continuous scores

Returns: boolean value where each row has 1 for the highest scoring class and 0 for all other classes. Returns same datatype as input.

See also: predict_multi_target_labels

opensoundscape.metrics.single_target_metrics(targets, scores)[source]

generate various metrics for a set of scores and labels (targets)

Predicts 1 for the highest scoring class per sample and 0 for all other classes.

Parameters:
  • targets – 0/1 labels in 2d array

  • scores – continuous values in 2d array

Returns:

dictionary of various overall and per-class metrics

Return type:

metrics_dict

Image Augmentation

Transforms and augmentations for PIL.Images

opensoundscape.preprocess.img_augment.time_split(img, seed=None)[source]

Given a PIL.Image, split into left/right parts and swap

Randomly chooses the slicing location. For example, if ‘h’ is chosen as the split point:

abcdefghijklmnop becomes hijklmnop + abcdefg

Parameters:

img – A PIL.Image

Returns:

A PIL.Image

Actions

Actions for augmentation and preprocessing pipelines

This module contains Action classes which act as the elements in Preprocessor pipelines. Action classes have go(), on(), off(), and set() methods. They take a single sample of a specific type and return the transformed or augmented sample, which may or may not be the same type as the original.

See the preprocessor module and Preprocessing tutorial for details on how to use and create your own actions.

class opensoundscape.preprocess.actions.Action(fn, is_augmentation=False, **kwargs)[source]

Action class for an arbitrary function

The function must take the sample as the first argument

Note that this allows two use cases:

(A) a regular function that takes an input object as the first argument, e.g. Audio.from_file(path, **kwargs)

(B) a method of a class, which takes ‘self’ as the first argument, e.g. Spectrogram.bandpass(self, **kwargs)

Other arguments are an arbitrary list of kwargs.
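A minimal sketch of wrapping a class method as an Action (use case (B) above); the bandpass bounds and filter order are hypothetical values:

```
from opensoundscape.audio import Audio
from opensoundscape.preprocess.actions import Action

# extra kwargs are stored by the Action and passed to the function when it runs
bandpass_action = Action(Audio.bandpass, low_f=1000, high_f=8000, order=9)
```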

class opensoundscape.preprocess.actions.AudioClipLoader(**kwargs)[source]

Action to load clips from an audio file

Loads an audio file or part of a file to an Audio object. Will load entire audio file if _start_time and _end_time are None. If _start_time and _end_time are provided, loads the audio only in the specified interval.

see Audio.from_file() for documentation.

Parameters:

see Audio.from_file()

class opensoundscape.preprocess.actions.AudioTrim(**kwargs)[source]

Action to trim/extend audio to desired length

Parameters:

see actions.trim_audio

class opensoundscape.preprocess.actions.BaseAction[source]

Parent class for all Actions (used in Preprocessor pipelines)

New actions should subclass this class.

Subclasses should set self.requires_labels = True if go() expects (X,y) instead of (X). y is a row of a dataframe (a pd.Series) with index (.name) = original file path, columns=class names, values=labels (0,1). X is the sample, and can be of various types (path, Audio, Spectrogram, Tensor, etc). See Overlay for an example of an Action that uses labels.

set(**kwargs)[source]

only allow keys that exist in self.params

class opensoundscape.preprocess.actions.Overlay(is_augmentation=True, **kwargs)[source]

Action class for an augmentation that overlays samples on each other

Overlay is a flavor of “mixup” augmentation, where two samples are overlayed on top of each other. The samples are blended with a weighted average, where the weight may be chosen randomly from a range of values.

In this implementation, the overlayed samples are chosen from a dataframe of audio files and labels. The dataframe must have the audio file paths as the index, and the labels as columns. The labels are used to choose overlayed samples based on an “overlay_class” argument.

Parameters:
  • overlay_df – dataframe of audio files (index) and labels to use for overlay

  • update_labels (bool) – if True, labels of sample are updated to include labels of overlayed sample

  • criterion_fn – function that takes an AudioSample and returns True or False; if True, perform overlay, if False, do not perform overlay. Default is always_true (perform overlay on all samples).

  • See overlay() for **kwargs and default values

class opensoundscape.preprocess.actions.SpectrogramToTensor(fn=<function Spectrogram.to_image>, is_augmentation=False, **kwargs)[source]

Action to create a Tensor of desired shape from a Spectrogram

calls .to_image on sample.data, which should be type Spectrogram

**kwargs are passed to Spectrogram.to_image()

go(sample, **kwargs)[source]

converts sample.data from Spectrogram to Tensor

opensoundscape.preprocess.actions.audio_add_noise(audio, noise_dB=-30, signal_dB=0, color='white')[source]

Generates noise and adds to audio object

Parameters:
  • audio – an Audio object

  • noise_dB – number or range: dBFS of the generated noise signal - if a number, creates noise at that dBFS level - if a (min, max) tuple, chooses the noise dBFS randomly from the range with a uniform distribution

  • signal_dB – dB (decibels) gain to apply to the incoming Audio before mixing with noise [default: 0 dB] - like noise_dB, can specify a (min, max) tuple to choose randomly (uniform) from a range

Returns: Audio object with noise added
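A minimal sketch (the file path is hypothetical):

```
from opensoundscape.audio import Audio
from opensoundscape.preprocess.actions import audio_add_noise

audio = Audio.from_file("audio/recording1.wav")        # hypothetical path
noisy = audio_add_noise(audio, noise_dB=(-40, -20))    # noise level drawn uniformly from the range
```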

opensoundscape.preprocess.actions.audio_random_gain(audio, dB_range=(-30, 0), clip_range=(-1, 1))[source]

Applies a randomly selected gain level to an Audio object

Gain is selected from a uniform distribution in the range dB_range

Parameters:
  • audio – an Audio object

  • dB_range – (min,max) decibels of gain to apply - dB gain applied is chosen from a uniform random distribution in this range

Returns: Audio object with gain applied

opensoundscape.preprocess.actions.frequency_mask(tensor, max_masks=3, max_width=0.2)[source]

add random horizontal bars over Tensor

Parameters:
  • tensor – input Torch.tensor sample

  • max_masks – max number of horizontal bars [default: 3]

  • max_width – maximum size of horizontal bars as fraction of sample height

Returns:

augmented tensor
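A minimal sketch on a random tensor standing in for a preprocessed spectrogram sample:

```
import torch
from opensoundscape.preprocess.actions import frequency_mask

spec_tensor = torch.rand(3, 224, 224)   # stand-in for a [channels, h, w] sample
masked = frequency_mask(spec_tensor, max_masks=2, max_width=0.1)
```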

opensoundscape.preprocess.actions.image_to_tensor(img, greyscale=False)[source]

Convert PIL image to RGB or greyscale Tensor (PIL.Image -> Tensor)

convert PIL.Image w/range [0,255] to torch Tensor w/range [0,1]

Parameters:
  • img – PIL.Image

  • greyscale – if False, converts image to RGB (3 channels). If True, converts image to one channel.

opensoundscape.preprocess.actions.overlay(sample, overlay_df, update_labels, overlay_class=None, overlay_prob=1, max_overlay_num=1, overlay_weight=0.5, criterion_fn=<function always_true>)[source]

iteratively overlay 2d samples on top of each other

Overlays (blends) image-like samples from overlay_df on top of the sample with probability overlay_prob until stopping condition. If necessary, trims overlay audio to the length of the input audio.

Optionally provide criterion_fn which takes sample and returns True/False to determine whether to perform overlay on this sample.

Overlays can be used in a few general ways:

  1. a separate df where any file can be overlayed (overlay_class=None)

  2. the same df as training, where the overlay class is “different”, i.e. does not contain overlapping labels with the original sample

  3. the same df as training, where samples from a specific class are used for overlays

Parameters:
  • sample – AudioSample with .labels: labels of the original sample and .preprocessor: the preprocessing pipeline

  • overlay_df – a labels dataframe with audio files as the index and classes as columns

  • update_labels – if True, add overlayed sample’s labels to original sample

  • overlay_class – how to choose files from overlay_df to overlay. Options [default: “different”]:

    • None – randomly select any file from overlay_df

    • “different” – select a random file from overlay_df containing none of the classes this file contains

    • a specific class name – always choose files from this class

  • overlay_prob – the probability of applying each subsequent overlay

  • max_overlay_num – the maximum number of samples to overlay on the original. For example, if overlay_prob=0.5 and max_overlay_num=2, 1/2 of samples will receive 1 overlay and 1/4 will receive an additional second overlay

  • overlay_weight – a float > 0 and < 1, or a list of 2 floats [min, max] between which the weight will be randomly chosen. e.g. [0.1,0.7] An overlay_weight <0.5 means more emphasis on original sample.

  • criterion_fn – function that takes an AudioSample and returns True or False; if True, perform overlay, if False, do not perform overlay. Default is always_true (perform overlay on all samples).

Returns:

overlayed sample, (possibly updated) labels

Example

Check if the sample is from a Xeno-Canto file (has “XC” in its name), and only perform overlay on Xeno-Canto files:

```
from pathlib import Path

def is_xc(audio_sample):
    return "XC" in Path(audio_sample.source).stem

s = overlay(s, overlay_df, False, criterion_fn=is_xc)
```

opensoundscape.preprocess.actions.scale_tensor(tensor, input_mean=0.5, input_std=0.5)[source]

linear scaling of tensor values using torch.transforms.Normalize

(Tensor->Tensor)

WARNING: This does not perform per-image normalization. Instead, it takes fixed mean and standard deviation arguments (i.e., values for the entire dataset) and performs X = (X - input_mean) / input_std.

Parameters:
  • input_mean – mean of input sample pixels (average across the dataset)

  • input_std – standard deviation of input sample pixels (average across the dataset)

(these are NOT the target mu and sd, but the original mu and sd of the image, for which the output will have mu=0, std=1)

Returns:

modified tensor

opensoundscape.preprocess.actions.tensor_add_noise(tensor, std=1)[source]

Add gaussian noise to sample (Tensor -> Tensor)

Parameters:

std – standard deviation for Gaussian noise [default: 1]

Note: be aware that scaling before/after this action will change the effect of a fixed stdev Gaussian noise

opensoundscape.preprocess.actions.time_mask(tensor, max_masks=3, max_width=0.2)[source]

add random vertical bars over sample (Tensor -> Tensor)

Parameters:
  • tensor – input Torch.tensor sample

  • max_masks – maximum number of vertical bars [default: 3]

  • max_width – maximum size of bars as fraction of sample width

Returns:

augmented tensor

opensoundscape.preprocess.actions.torch_color_jitter(tensor, brightness=0.3, contrast=0.3, saturation=0.3, hue=0)[source]

Wraps torchvision.transforms.ColorJitter

(Tensor -> Tensor) or (PIL Img -> PIL Img)

Parameters:
  • tensor – input sample

  • brightness – [default: 0.3]

  • contrast – [default: 0.3]

  • saturation – [default: 0.3]

  • hue – [default: 0]

Returns:

modified tensor

opensoundscape.preprocess.actions.torch_random_affine(tensor, degrees=0, translate=(0.3, 0.1), fill=0)[source]

Wraps torchvision.transforms.RandomAffine

(Tensor -> Tensor) or (PIL Img -> PIL Img)

Parameters:
  • tensor – torch.Tensor input sample

  • degrees – [default: 0]

  • translate – [default: (0.3, 0.1)]

  • fill – value 0-255, duplicated across channels [default: 0]

Returns:

modified tensor

Note: If applying per-image normalization, we recommend applying RandomAffine after image normalization. In this case, an intermediate gray value is ~0. If normalization is applied after RandomAffine on a PIL image, use an intermediate fill color such as (122,122,122).

opensoundscape.preprocess.actions.trim_audio(sample, extend=True, random_trim=False, tol=1e-05)[source]

trim audio clips (Audio -> Audio)

Trims an audio file to a desired length. Allows audio to be trimmed from the start or from a random time. Optionally extends audio shorter than the desired length with silence.

Parameters:
  • sample – AudioSample with .data=Audio object, .duration as sample duration

  • extend – if True, clips shorter than sample.duration are extended with silence to required length

  • random_trim – if True, chooses a random segment of length sample.duration from the input audio. If False, the file is trimmed from 0 seconds to sample.duration seconds.

  • tol – tolerance for considering a clip to be of the correct length (sec)

Returns:

trimmed audio

Preprocessors

Preprocessor classes: tools for preparing and augmenting audio samples

class opensoundscape.preprocess.preprocessors.AudioPreprocessor(sample_duration, sample_rate)[source]

Child of BasePreprocessor that only loads audio and resamples

Parameters:
  • sample_duration – length in seconds of audio samples generated

  • sample_rate – target sample rate. [default: None] does not resample

class opensoundscape.preprocess.preprocessors.BasePreprocessor(sample_duration=None)[source]

Class for defining an ordered set of Actions and a way to run them

Custom Preprocessor classes should subclass this class or its children

Preprocessors have one job: to transform samples from some input (eg a file path) to some output (eg an AudioSample with .data as torch.Tensor) using a specific procedure defined by the .pipeline attribute. The procedure consists of Actions ordered by the Preprocessor’s .pipeline. Preprocessors have a forward() method which sequentially applies the Actions in the pipeline to produce a sample.

Parameters:
  • action_dict – dictionary of name:Action actions to perform sequentially

  • sample_duration – length of audio samples to generate (seconds)

forward(sample, break_on_type=None, break_on_key=None, bypass_augmentations=False, trace=False)[source]

perform actions in self.pipeline on a sample (until a break point)

Actions with .bypass = True are skipped. Actions with .is_augmentation = True can be skipped by passing bypass_augmentations=True.

Parameters:
  • sample – any of - (path, start time) tuple - pd.Series with (file, start_time, end_time) as .name (eg index of a pd.DataFrame from which row was taken) - AudioSample object

  • break_on_type – if not None, the pipeline will be stopped when it reaches an Action of this class. The matching action is not performed.

  • break_on_key – if not None, the pipeline will be stopped when it reaches an Action whose index equals this value. The matching action is not performed.

  • clip_times – can be either None (the file is treated as a single sample) or a dictionary {“start_time”: float, “end_time”: float} giving the start and end time of the clip in the audio

  • bypass_augmentations – if True, actions with .is_augmentation=True are skipped

  • trace (boolean - default False) – if True, saves the output of each pipeline step in the sample_info output argument - should be utilized for analysis/debugging on samples of interest

Returns:

sample (instance of AudioSample class)

insert_action(action_index, action, after_key=None, before_key=None)[source]

insert an action at a specific position

This is an in-place operation

Inserts a new action before or after a specific key. If after_key and before_key are both None, action is appended to the end of the index.

Parameters:
  • action_index – string key for new action in index

  • action – the action object, must be subclass of BaseAction

  • after_key – insert the action immediately after this key in index

  • before_key – insert the action immediately before this key in index. Note: only one of (after_key, before_key) can be specified.

remove_action(action_index)[source]

alias for self.drop(…,inplace=True), removes an action

This is an in-place operation

Parameters:

action_index – index of action to remove
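
A minimal sketch of editing a preprocessor pipeline with insert_action and remove_action; the key “add_noise” and the use of the Action wrapper around tensor_add_noise are illustrative assumptions, not a prescribed recipe:

```
from opensoundscape.preprocess.preprocessors import SpectrogramPreprocessor
from opensoundscape.preprocess.actions import Action, tensor_add_noise

pre = SpectrogramPreprocessor(sample_duration=3.0)

# append a new augmentation step to the end of the pipeline (in place)
pre.insert_action("add_noise", Action(tensor_add_noise, std=0.1))

# remove it again by its key (also in place)
pre.remove_action("add_noise")
```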

class opensoundscape.preprocess.preprocessors.SpectrogramPreprocessor(sample_duration, overlay_df=None, height=None, width=None, channels=1)[source]

Child of BasePreprocessor that creates spectrogram Tensors w/augmentation

loads audio, creates spectrogram, performs augmentations, creates tensor

By default, does not resample audio, but bandpasses to 0-11.025 kHz (to ensure all outputs have the same y-axis scale); this can be changed with .pipeline.bandpass.set(min_f=..., max_f=...) (see the sketch after the parameter list below).

Parameters:
  • sample_duration – length in seconds of audio samples generated If not None, longer clips are trimmed to this length. By default, shorter clips will be extended (modify random_trim_audio and trim_audio to change behavior).

  • overlay_df – if not None, will include an overlay action drawing samples from this df

  • height – height of output sample (frequency axis) - default None will use the original height of the spectrogram

  • width – width of output sample (time axis) - default None will use the original width of the spectrogram

  • channels – number of channels in output sample (default 1)
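
A minimal sketch of creating a SpectrogramPreprocessor and adjusting the default bandpass range, as mentioned in the note above (the frequency values are illustrative):

```
from opensoundscape.preprocess.preprocessors import SpectrogramPreprocessor

pre = SpectrogramPreprocessor(sample_duration=5.0, height=224, width=224)

# change the default 0-11.025 kHz bandpass applied to each spectrogram
pre.pipeline.bandpass.set(min_f=500, max_f=8000)
```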

preprocessors.utils

Utilities for preprocessing

exception opensoundscape.preprocess.utils.PreprocessingError[source]

Custom exception indicating that a Preprocessor pipeline failed

opensoundscape.preprocess.utils.get_args(func)[source]

get list of arguments and default values from a function

opensoundscape.preprocess.utils.get_reqd_args(func)[source]

get list of required arguments and default values from a function

opensoundscape.preprocess.utils.show_tensor(tensor, channel=None, transform_from_zero_centered=True, invert=False)[source]

helper function for displaying a sample as an image

Parameters:
  • tensor – torch.Tensor of shape [c,w,h] with values centered around zero

  • channel – specify an integer to plot only one channel, otherwise will attempt to plot all channels

  • transform_from_zero_centered – if True, transforms values from [-1,1] to [0,1]

  • invert – if true, flips value range via x=1-x

opensoundscape.preprocess.utils.show_tensor_grid(tensors, columns, channel=None, transform_from_zero_centered=True, invert=False, labels=None)[source]

create a grid image from a list of tensors

Parameters:
  • tensors – list of samples

  • columns – number of columns in grid

  • labels – title of each subplot

  • for other args, see show_tensor()

Tensor Augment

Augmentations and transforms for torch.Tensors

opensoundscape.preprocess.tensor_augment.freq_mask(spec, F=30, max_masks=3, replace_with_zero=False)[source]

draws horizontal bars over the image

Parameters:
  • spec – a torch.Tensor representing a spectrogram

  • F – maximum frequency-width of bars in pixels

  • max_masks – maximum number of bars to draw

  • replace_with_zero – if True, bars are 0s, otherwise, mean img value

Returns:

Augmented tensor

opensoundscape.preprocess.tensor_augment.time_mask(spec, T=40, max_masks=3, replace_with_zero=False)[source]

draws vertical bars over the image

Parameters:
  • spec – a torch.Tensor representing a spectrogram

  • T – maximum time-width of bars in pixels

  • max_masks – maximum number of bars to draw

  • replace_with_zero – if True, bars are 0s, otherwise, mean img value

Returns:

Augmented tensor
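
A minimal sketch of applying both masks to a spectrogram tensor; the [1, frequency, time] shape used here is an assumption about the expected input, not a documented requirement:

```
import torch
from opensoundscape.preprocess import tensor_augment

# illustrative spectrogram tensor; shape [1, frequency, time] is an assumption
spec = torch.rand(1, 128, 256)

spec = tensor_augment.freq_mask(spec, F=20, max_masks=2)  # horizontal bars
spec = tensor_augment.time_mask(spec, T=30, max_masks=2)  # vertical bars
```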

RIBBIT

Detect periodic vocalizations with RIBBIT

This module provides functionality to search audio for periodically fluctuating vocalizations.

opensoundscape.ribbit.calculate_pulse_score(amplitude, amplitude_sample_rate, pulse_rate_range, plot=False, nfft=1024)[source]

Search for amplitude pulsing in an audio signal in a range of pulse repetition rates (PRR)

scores an audio amplitude signal by highest value of power spectral density in the PRR range

Parameters:
  • amplitude – a time series of the audio signal’s amplitude (for instance a smoothed raw audio signal)

  • amplitude_sample_rate – sample rate in Hz of amplitude signal, normally ~20-200 Hz

  • pulse_rate_range – [min, max] values for amplitude modulation in Hz

  • plot=False – if True, creates a plot visualizing the power spectral density

  • nfft=1024 – controls the resolution of the power spectral density (see scipy.signal.welch)

Returns:

pulse rate score for this audio segment (float)

opensoundscape.ribbit.ribbit(spectrogram, signal_band, pulse_rate_range, clip_duration, clip_overlap=0, final_clip=None, noise_bands=None, spec_clip_range=(-100, -20), plot=False)[source]

Run RIBBIT detector to search for periodic calls in audio

Searches for periodic energy fluctuations at specific repetition rates and frequencies.

Parameters:
  • spectrogram – opensoundscape.Spectrogram object of an audio file

  • signal_band – [min, max] frequency range of the target species, in Hz

  • pulse_rate_range – [min,max] pulses per second for the target species

  • clip_duration – the length of audio (in seconds) to analyze at one time - each clip is analyzed independently and receives a ribbit score

  • clip_overlap (float) – overlap between consecutive clips (sec)

  • final_clip (str) – behavior if the final clip is less than clip_duration seconds long [default: None]. Options:

    • None: discard the remainder (do not make a clip)

    • “remainder”: use only the remainder of the Audio (final clip will be shorter than clip_duration)

    • “full”: increase overlap with the previous clip to yield a clip with clip_duration length

    Note that the “extend” option is not supported for RIBBIT.

  • noise_bands – list of frequency ranges to subtract from the signal_band For instance: [ [min1,max1] , [min2,max2] ] - if None, no noise bands are used - default: None

  • spec_clip_range – tuple of (low,high) spectrogram values. The values in spectrogram will be clipped to this range (spectrogram.limit_range()) - Default of (-100,-20) matches default decibel_limits parameter of earlier opensoundscape versions, which clipped spectrogram values to this range when the spectrogram was initialized.

  • plot=False – if True, plot the power spectral density for each clip

Returns:

DataFrame with columns [‘start_time’,’end_time’,’score’], with a row for each clip.

Notes

__PARAMETERS__ RIBBIT requires the user to select a set of parameters that describe the target vocalization. Here is some detailed advice on how to use these parameters.

Signal Band: The signal band is the frequency range where RIBBIT looks for the target species. It is best to pick a narrow signal band if possible, so that the model focuses on a specific part of the spectrogram and has less potential to include erroneous sounds.

Noise Bands: Optionally, users can specify other frequency ranges called noise bands. Sounds in the noise_bands are _subtracted_ from the signal_band. Noise bands help the model filter out erroneous sounds from the recordings, which could include confusion species, background noise, and popping/clicking of the microphone due to rain, wind, or digital errors. It’s usually good to include one noise band for very low frequencies – this specifically prevents popping and clicking from being registered as a vocalization. It’s also good to specify noise bands that target confusion species. Another approach is to specify two narrow noise_bands that are directly above and below the signal_band.

Pulse Rate Range: This parameter specifies the minimum and maximum pulse rate (the number of pulses per second, also known as pulse repetition rate) RIBBIT should look for to find the focal species. For example, choosing pulse_rate_range = [10, 20] means that RIBBIT should look for pulses no slower than 10 pulses per second and no faster than 20 pulses per second.

Clip Duration: The clip_duration parameter tells RIBBIT how many seconds of audio to analyze at one time. Generally, you should choose a clip_duration that is similar to the length of the target species vocalization, or a little bit longer. For very slowly pulsing vocalizations, choose a longer window so that at least 5 pulses can occur in one window (0.5 pulses per second -> 10 second window). Typical values are 0.3 to 10 seconds. Also, clip_overlap can be used for overlap between sequential clips. This is more computationally expensive but will be more likely to center a target sound in the clip (with zero overlap, the target sound may be split up between adjacent clips).

Plot: We can choose to show the power spectrum of pulse repetition rate for each window by setting plot=True. The default is not to show these plots (plot=False).

__ALGORITHM__ This is the procedure RIBBIT follows: divide the audio into segments of length clip_duration. For each clip:

  • calculate the time series of energy in the signal band (signal_band) and subtract the noise band energies (noise_bands)

  • calculate the power spectral density of the amplitude time series

  • score the clip based on the max value of power spectral density in the pulse rate range
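
A minimal sketch of running RIBBIT on one file; the file path and parameter values are illustrative, and Spectrogram.from_audio is assumed for creating the spectrogram:

```
from opensoundscape import Audio, Spectrogram
from opensoundscape.ribbit import ribbit

audio = Audio.from_file("recording.wav")   # hypothetical file
spec = Spectrogram.from_audio(audio)       # assumes Spectrogram.from_audio

scores_df = ribbit(
    spec,
    signal_band=[1500, 2500],    # frequency range of the target call (Hz)
    pulse_rate_range=[10, 20],   # pulses per second of the target call
    clip_duration=2.0,           # score the audio in 2-second clips
    noise_bands=[[0, 200]],      # subtract low-frequency popping/clicking
)
# scores_df has one row per clip with start_time, end_time, and score
```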

Signal Processing

Signal processing tools for feature extraction and more

opensoundscape.signal_processing.cwt_peaks(audio, center_frequency, wavelet='morl', peak_threshold=0.2, peak_separation=None, plot=False)[source]

compute a cwt, post-process, then extract peaks

Performs a continuous wavelet transform (cwt) on an audio signal at a single frequency. It then squares, smooths, and normalizes the signal. Finally, it detects peaks in the resulting signal and returns the times and magnitudes of detected peaks. It is used as a feature extractor for Ruffed Grouse drumming detection.

Parameters:
  • audio – an Audio object

  • center_frequency – the target frequency to extract peaks from

  • wavelet – (str) name of a pywt wavelet, eg ‘morl’ (see pywt docs)

  • peak_threshold – minimum height of peaks - if None, no minimum peak height - see “height” argument to scipy.signal.find_peaks

  • peak_separation – minimum time between detected peaks, in seconds - if None, no minimum distance - see “distance” argument to scipy.signal.find_peaks

Returns:

list of times (from beginning of signal) of each peak, and list of magnitudes of each detected peak

Return type:

(peak_times, peak_levels)

Note

consider downsampling audio to reduce computational cost. Audio must have sample rate of at least 2x target frequency.
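
A minimal sketch (the file path is hypothetical; the low sample rate follows the note above, since the target frequency is only 50 Hz):

```
from opensoundscape import Audio
from opensoundscape.signal_processing import cwt_peaks

audio = Audio.from_file("grouse_drumming.wav", sample_rate=400)
peak_times, peak_levels = cwt_peaks(audio, center_frequency=50, peak_threshold=0.2)
```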

opensoundscape.signal_processing.detect_peak_sequence_cwt(audio, sample_rate=400, window_len=60, center_frequency=50, wavelet='morl', peak_threshold=0.2, peak_separation=0.0375, dt_range=(0.05, 0.8), dy_range=(-0.2, 0), d2y_range=(-0.05, 0.15), max_skip=3, duration_range=(1, 15), points_range=(9, 100), plot=False)[source]

Use a continuous wavelet transform to detect accelerating sequences

This function creates a continuous wavelet transform (cwt) feature and searches for accelerating sequences of peaks in the feature. It was developed to detect Ruffed Grouse drumming events in audio signals. Default parameters are tuned for Ruffed Grouse drumming detection.

Analysis is performed on analysis windows of fixed length without overlap. Detections from each analysis window across the audio file are aggregated.

Parameters:
  • audio – Audio object

  • sample_rate=400 – resample audio to this sample rate (Hz)

  • window_len=60 – length of analysis window (sec)

  • center_frequency=50 – target audio frequency of cwt

  • wavelet='morl' – (str) pywt wavelet name (see pywavelets docs)

  • peak_threshold=0.2 – height threshold (0-1) for peaks in normalized signal

  • peak_separation=15/400 – min separation (sec) for peak finding

  • dt_range= (0.05, 0.8) – sequence detection point-to-point criterion 1 - Note: the upper limit is also used as sequence termination criterion 2

  • dy_range= (-0.2, 0) – sequence detection point-to-point criterion 2

  • d2y_range= (-0.05, 0.15) – sequence detection point-to-point criterion 3

  • max_skip=3 – sequence termination criterion 1: max sequential invalid points

  • duration_range= (1, 15) – sequence criterion 1: length (sec) of sequence

  • points_range= (9, 100) – sequence criterion 2: num points in sequence

  • plot=False – if True, plot peaks and detected sequences with pyplot

Returns:

dataframe summarizing detected sequences

Note: for Ruffed Grouse drumming, which is very low pitched, audio is resampled to 400 Hz. This greatly increases the efficiency of the cwt, but will only detect frequencies up to 400/2=200Hz. Generally, choose a resample frequency as low as possible but >=2x the target frequency

Note: the cwt signal is normalized on each analysis window, so changing the analysis window size can change the detection results.

Note: if there is an incomplete window remaining at the end of the audio file, it is discarded (not analyzed).
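
A minimal sketch using the default (Ruffed Grouse) parameters; the file path is hypothetical:

```
from opensoundscape import Audio
from opensoundscape.signal_processing import detect_peak_sequence_cwt

audio = Audio.from_file("grouse_site_01.wav")
detections = detect_peak_sequence_cwt(audio)  # DataFrame summarizing detected sequences
```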

opensoundscape.signal_processing.find_accel_sequences(t, dt_range=(0.05, 0.8), dy_range=(-0.2, 0), d2y_range=(-0.05, 0.15), max_skip=3, duration_range=(1, 15), points_range=(5, 100))[source]

detect accelerating/decelerating sequences in time series

developed for detecting Ruffed Grouse drumming events in a series of peaks extracted from a cwt signal

The algorithm computes y(t), the forward difference of t. It iterates through the [y(t), t] points searching for sequences of points that meet a set of conditions. It begins with an empty candidate sequence.

“Point-to-point criteria”: Valid ranges for dt, dy, and d2y are checked for each subsequent point and are based on previous points in the candidate sequence. If they are met, the point is added to the candidate sequence.

“Continuation criteria”: Conditions for max_skip and the upper bound of dt are used to determine when a sequence should be terminated.

  • max_skip: max number of sequential invalid points before terminating

  • dt<=dt_range[1]: if dt is long, sequence should be broken

“Sequence criteria”: When a sequence is terminated, it is evaluated on conditions for duration_range and points_range. If it meets these conditions, it is saved as a detected sequence.

  • duration_range: length of sequence in seconds from first to last point

  • points_range: number of points included in sequence

When a sequence is terminated, the search continues with the next point and an empty sequence.

Parameters:
  • t – (list or np.array) times of all detected peaks (seconds)

  • dt_range= (0.05,0.8) – valid values for t(i) - t(i-1)

  • dy_range= (-0.2,0) – valid values for change in y (grouse: difference in time between consecutive beats should decrease)

  • d2y_range= (-.05,.15) – limit change in dy: should not show large decrease (sharp curve downward on y vs t plot)

  • max_skip=3 – max invalid points between valid points for a sequence (grouse: should not have many noisy points between beats)

  • duration_range= (1,15) – total duration of sequence (sec)

  • points_range= (9,100) – total number of points in sequence

Returns:

lists of t and y for each detected sequence

Return type:

sequences_t, sequences_y

opensoundscape.signal_processing.frequency2scale(frequency, wavelet, sample_rate)[source]

determine appropriate wavelet scale for desired center frequency

Parameters:
  • frequency – desired center frequency of wavelet in Hz (1/seconds)

  • wavelet – (str) name of pywt wavelet, eg ‘morl’ for Morlet

  • sample_rate – sample rate in Hz (1/seconds)

Returns:

(float) scale parameter for pywt.cwt() to extract the desired frequency

Return type:

scale

Note: this function is not exactly an inverse of pywt.scale2frequency(), because that function returns frequency in sample units (cycles/sample) rather than frequency in Hz (cycles/second). In other words, frequency_hz = pywt.scale2frequency(w, scale) * sr.
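
A minimal sketch, including a check of the inverse relationship described in the note above:

```
import pywt
from opensoundscape.signal_processing import frequency2scale

sr = 400
scale = frequency2scale(50, "morl", sr)                   # scale for a 50 Hz center frequency
frequency_hz = pywt.scale2frequency("morl", scale) * sr   # approximately 50 Hz
```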

opensoundscape.signal_processing.gcc(x, y, cc_filter='phat', epsilon=0.001)[source]

Generalized cross correlation of two signals

Computes the generalized cross correlation in the frequency domain.

Implements the generalized cross correlation algorithm described in Knapp and Carter [1].

In the case of cc_filter=’cc’, gcc simplifies to cross correlation and is equivalent to scipy.signal.correlate and numpy.correlate.

code adapted from github.com/axeber01/ngcc

Parameters:
  • x – 1d numpy array of audio samples

  • y – 1d numpy array of audio samples

  • cc_filter – which filter to use in the gcc. Options: ‘phat’ - phase transform [default]; ‘roth’ - Roth correlation (1971); ‘scot’ - Smoothed Coherence Transform; ‘ht’ - Hannan and Thomson; ‘cc’ - normal cross correlation with no filter; ‘cc_norm’ - normal cross correlation normalized by the length and amplitude of the signal

  • epsilon – small value used to ensure the denominator is non-zero when applying a filter

Returns:

1d numpy array of gcc values

Return type:

gcc

see also: tdoa() uses this function to estimate time delay between two signals

[1] Knapp, C.H. and Carter, G.C (1976) The Generalized Correlation Method for Estimation of Time Delay. IEEE Trans. Acoust. Speech Signal Process, 24, 320-327. http://dx.doi.org/10.1109/TASSP.1976.1162830

opensoundscape.signal_processing.tdoa(signal, reference_signal, max_delay, cc_filter='phat', sample_rate=1, return_max=False)[source]

Estimate time difference of arrival between two signals

estimates time delay by finding the maximum of the generalized cross correlation (gcc) of two signals. The two signals are discrete-time series with the same sample rate.

Only the central portion of the signal, from max_delay after the start and max_delay before the end, is used for the calculation. All of the reference signal is used. This means that tdoa(sig, ref_sig, max_delay) will not necessarily be the same as -tdoa(ref_sig, sig, max_delay).

For example, if the signal arrives 2.5 seconds _after_ the reference signal, returns 2.5; if it arrives 0.5 seconds _before_ the reference signal, returns -0.5.

Parameters:
  • signal – np.array or list object containing the signal of interest

  • reference_signal – np.array or list containing the reference signal. Both audio recordings must be time-synchronized.

  • max_delay – maximum possible tdoa (seconds) between the two signals; cannot be longer than 1/2 the duration of the signal. The returned tdoa will be between -max_delay and +max_delay. For example, if max_delay=0.5, the tdoa returned will be the delay between -0.5 and +0.5 seconds that maximizes the cross-correlation. This is useful if you know the maximum possible delay between the two signals and want to ignore any tdoas outside of that range (e.g., if receivers are 100 m apart and the speed of sound is 340 m/s, the maximum possible delay is 0.294 seconds).

  • cc_filter – see gcc()

  • sample_rate – sample rate (Hz) of signals; both signals must have the same sample rate

  • return_max – if True, also returns the maximum value of the generalized cross correlation

Returns:

estimated delay from reference signal to signal, in seconds (note that the default sampling rate is 1.0 samples/second)

if return_max is True, returns a second value, the maximum value of the result of generalized cross correlation

See also: gcc() if you want the raw output of generalized cross correlation
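
A minimal sketch with a synthetic delayed signal (values are illustrative):

```
import numpy as np
from opensoundscape.signal_processing import tdoa

sr = 8000
rng = np.random.default_rng(0)
reference = rng.normal(size=sr)              # 1 second of white noise
signal = np.roll(reference, int(0.01 * sr))  # same signal arriving 10 ms later

delay = tdoa(signal, reference, max_delay=0.05, cc_filter="phat", sample_rate=sr)
# delay should be approximately +0.01 seconds
```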

opensoundscape.signal_processing.thresholded_event_durations(x, threshold, normalize=False, sample_rate=1)[source]

Detect positions and durations of events over threshold in 1D signal

This function takes a 1D numeric vector and searches for segments that are continuously greater than a threshold value. The input signal can optionally be normalized, and if a sample rate is provided the start positions will be in the units of [sr]^-1 (ie if sr is Hz, start positions will be in seconds).

Parameters:
  • x – 1d input signal, a vector of numeric values

  • threshold – minimum value of signal to be a detection

  • normalize – if True, performs x=x/max(x)

  • sample_rate – sample rate of input signal

Returns:

start times of each detected event, and durations (# samples / sr) of each detected event

Return type:

(start_times, durations)
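
A minimal sketch on a toy signal:

```
import numpy as np
from opensoundscape.signal_processing import thresholded_event_durations

x = np.array([0, 0, 1, 1, 1, 0, 2, 2, 0])
start_times, durations = thresholded_event_durations(x, threshold=0.5, sample_rate=1)
# two events: one starting at t=2 lasting 3 samples, one starting at t=6 lasting 2 samples
```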

Localization

Tools for localizing audio events from synchronized recording arrays

class opensoundscape.localization.SpatialEvent(receiver_files, receiver_locations, max_delay, min_n_receivers=3, start_time=0, start_timestamp=None, duration=None, class_name=None, bandpass_range=None, cc_threshold=None, cc_filter=None, speed_of_sound=343)[source]

Class that estimates the location of a single sound event

Uses receiver locations and time-of-arrival of sounds to estimate sound source location

estimate_location(localization_algorithm='gillette', use_stored_tdoas=True, return_self=False)[source]

Estimate the spatial location of this event. This method first estimates the time delays (TDOAs) using cross-correlation, then estimates the location from those TDOAs. Localization is performed in 2d or 3d according to the dimensions of self.receiver_locations: (x,y) or (x,y,z). Note: if self.tdoas or self.receiver_locations is None, first calls self._estimate_delays() to estimate the time delays.

If you want to change some parameters of the localization (e.g. try a different cc_threshold, or bandpass_range), you can set the appropriate attribute (e.g. self.cc_threshold = 0.01) before calling self.estimate_location().

Parameters:
  • localization_algorithm (-) – algorithm to use for estimating the location of a sound event from the locations and time delays of a set of detections. Options are ‘gillette’ or ‘soundfinder’. Default is ‘gillette’.

  • use_stored_tdoas (-) – if True, uses the tdoas stored in self.tdoas to estimate the location. If False, first calls self._estimate_delays() to estimate the tdoas. default: True

  • return_self (-) – if True, returns the SpatialEvent object itself. This is used under the hood for parallelization.

Returns:

Location estimate as cartesian coordinates (x,y) or (x,y,z) (units: meters)

Effects:

sets the value of self.location_estimate to the same value as the returned location

class opensoundscape.localization.SynchronizedRecorderArray(file_coords, start_timestamp=None, speed_of_sound=343)[source]

Class with utilities for localizing sound events from array of recorders

localize_detections()[source]

Attempt to localize a sound event for each detection of each class. First, creates candidate events with: create_candidate_events()

Create SpatialEvent objects for all simultaneous, spatially clustered detections of a class

Then, attempts to localize each candidate event via time delay of arrival information: For each candidate event:

  • calculate relative time of arrival with generalized cross correlation (event.estimate_delays())

  • if enough cross correlation values exceed a threshold, attempt to localize the event using the time delays and spatial locations of each receiver with event.estimate_location()

  • if the residual distance rms value is below a cutoff threshold, consider the event to be successfully localized

check_files_missing_coordinates(detections)[source]

Check that all files in detections have coordinates in file_coords

Returns:

a list of files that are in detections but not in file_coords

create_candidate_events(detections, min_n_receivers, max_receiver_dist, cc_threshold, bandpass_ranges, cc_filter, max_delay=None)[source]

Takes the detections dataframe and groups detections that are within max_receiver_dist of each other.

Parameters:
  • detections – a dataframe of detections, with multi-index (file, start_time, end_time) and one column per class with 0/1 values for non-detection/detection. The times in the index imply the same real-world time across all files: e.g., 0 seconds assumes that the audio files all started at the same time, not on different dates/times.

  • min_n_receivers – if fewer nearby receivers have a simultaneous detection, do not create a candidate event

  • max_receiver_dist – the maximum distance between recorders to consider a detection as a single event

  • bandpass_ranges – dictionary of form {“class name”: [low_f, high_f]} for audio bandpass filtering during cross correlation

  • max_delay – the maximum delay (in seconds) to consider between receivers for a single event if None, defaults to max_receiver_dist / SPEED_OF_SOUND

Returns:

a list of SpatialEvent objects to attempt to localize

localize_detections(detections, max_receiver_dist, localization_algorithm='gillette', max_delay=None, min_n_receivers=3, cc_threshold=0, cc_filter='phat', bandpass_ranges=None, residual_threshold=numpy.inf, return_unlocalized=False, num_workers=1)[source]

Attempt to localize locations for all detections

Algorithm

The user provides a table of class detections from each recorder with timestamps. The object’s self.file_coords dataframe contains a table listing the spatial location of the recorder for each unique audio file in the table of detections. The audio recordings must be synchronized such that timestamps from each recording correspond to the exact same real-world time.

Localization of sound events proceeds in four steps:

  1. Grouping of detections into candidate events (self.create_candidate_events()):

    Simultaneous and spatially clustered detections of a class are selected as targets for localization of a single real-world sound event.

    For each detection of a species, the grouping algorithm treats the receiver with the detection as a “reference receiver”, then selects all detections of the species at the same time and within max_receiver_dist of the reference receiver (the “surrounding detections”). This selected group of simultaneous, spatially-clustered detections of a class becomes one “candidate event” for subsequent localization.

    If the number of recorders in the candidate event is fewer than min_n_receivers, the candidate event is discarded.

    This step creates a highly redundant set of candidate events to localize, because each detection is treated separately with its recorder as the ‘reference recorder’. Thus, the localized events created by this algorithm may contain multiple instances representing the same real-world sound event.

  2. Estimate time delays with cross correlation:

    For each candidate event, the time delay between the reference receiver’s detection and the surrounding recorders’ detections is estimated through generalized cross correlation.

    If bandpass_ranges are provided, cross correlation is performed on audio that has been bandpassed to class-specific low and high frequencies.

    If the max value of the cross correlation is below cc_threshold, the corresponding time delay is discarded and not used during localization. This provides a way of filtering out undesired time delays that do not correspond to two recordings of the same sound event.

    If the number of estimated time delays in the candidate event is fewer than min_n_receivers after filtering by cross correlation threshold, the candidate event is discarded.

  3. Estimate locations

    The location of the event is estimated based on the locations and time delays of each detection.

    Location estimation from the locations and time delays at a set of receivers is performed using one of two algorithms, described in localization_algorithm below.

  4. Filter by spatial residual error

    The residual errors represent discrepancies between (a) the time of arrival of the event at a receiver and (b) the distance from that receiver to the estimated location.

    Estimated locations are discarded if the root mean squared spatial residual is greater than residual_threshold

param detections:

a dataframe of detections, with multi-index (file, start_time, end_time), and one column per class with 0/1 values for non-detection/detection. The times in the index imply the same real-world time across all files: e.g., 0 seconds assumes that the audio files all started at the same time, not on different dates/times.

param max_receiver_dist:

float (meters) Radius around a recorder in which to use other recorders for localizing an event. Simultaneous detections at receivers within this distance (meters) of a receiver with a detection will be used to attempt to localize the event.

param max_delay:

float, optional Maximum absolute value of time delay estimated during cross correlation of two signals For instance, 0.2 means that the maximal cross-correlation in the range of delays between -0.2 to 0.2 seconds will be used to estimate the time delay. if None (default), the max delay is set to max_receiver_dist / SPEED_OF_SOUND

param min_n_receivers:

int Minimum number of receivers that must detect an event for it to be localized [default: 3]

param localization_algorithm:

str, optional algorithm to use for estimating the location of a sound event from the locations and time delays of a set of detections. [Default: ‘gillette’] Options:

  • ‘gillette’: linear closed-form algorithm of Gillette and Silverman 2008 [1]

  • ‘soundfinder’: GPS location algorithm of Wilson et al. 2014 [2]

param cc_threshold:

float, optional Threshold for cross correlation: if the max value of the cross correlation is below this value, the corresponding time delay is discarded and not used during localization. Default of 0 does not discard any delays.

param cc_filter:

str, optional Filter to use for generalized cross correlation. See the signal_processing.gcc function for options. Default is “phat”.

param bandpass_ranges:

dict, optional Dictionary of form {“class name”: [low_f, high_f]} for audio bandpass filtering during cross correlation. [Default: None] does not bandpass audio. Bandpassing audio to the frequency range of the relevant sound is recommended for best cross correlation results.

param residual_threshold:

discard localized events if the root mean squared residual of the TDOAs exceeds this value (distance in meters) [default: np.inf does not filter out any events by residual]

param return_unlocalized:

bool, optional. If True, also returns the unlocalized events: events that were not successfully localized, for example because too few receivers had detections, too few receivers passed the cc_threshold, or the TDOA residuals were too high. In this case, two lists [localized_events, unlocalized_events] are returned.

param num_workers:

int, optional. Number of workers to use for parallelization. Default is 1 (no parallelization)

returns:

A list of localized events, each of which is a SpatialEvent object. If return_unlocalized is True, returns:

2 lists: list of localized events, list of un-localized events

[1] M. D. Gillette and H. F. Silverman, “A Linear Closed-Form Algorithm for Source Localization From Time-Differences of Arrival,” IEEE Signal Processing Letters

[2] Wilson, David R., Matthew Battiston, John Brzustowski, and Daniel J. Mennill. “Sound Finder: A New Software Approach for Localizing Animals Recorded with a Microphone Array.” Bioacoustics 23, no. 2 (May 4, 2014): 99–112. https://doi.org/10.1080/09524622.2013.827588.
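
A minimal sketch of this workflow; the coordinate table, detection table, column names, and parameter values are illustrative assumptions, and real use requires time-synchronized recordings:

```
import pandas as pd
from opensoundscape.localization import SynchronizedRecorderArray

# recorder locations in meters, indexed by audio file (structure is an assumption)
file_coords = pd.DataFrame(
    {"x": [0, 35, 0, 35], "y": [0, 0, 35, 35]},
    index=["aru_0.wav", "aru_1.wav", "aru_2.wav", "aru_3.wav"],
)

# detections: multi-index (file, start_time, end_time), one 0/1 column per class
detections = pd.DataFrame(
    {"species_a": [1, 1, 1, 1]},
    index=pd.MultiIndex.from_tuples(
        [(f, 0, 2) for f in file_coords.index],
        names=["file", "start_time", "end_time"],
    ),
)

array = SynchronizedRecorderArray(file_coords)
localized_events = array.localize_detections(
    detections,
    max_receiver_dist=100,                        # meters
    min_n_receivers=3,
    bandpass_ranges={"species_a": [1000, 4000]},  # Hz
)
```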

make_nearby_files_dict(r_max)[source]

create a dictionary listing nearby files for each file

Pre-generate a dictionary listing all close files for each audio file. The dictionary will have a key for each audio file, with a value listing all other receivers within r_max of that receiver.

eg {ARU_0.mp3: [ARU_1.mp3, ARU_2.mp3…], ARU_1… }

Note: could manually create this dictionary to only list _simultaneous_ nearby files if the detection dataframe contains files from different times

The returned dictionary is used in create_candidate_events as a look-up table for recordings nearby a detection in any given file

Parameters:

r_max – maximum distance from each recorder in which to include other recorders in the list of ‘nearby recorders’, in meters

Returns:

dictionary with keys for each file and values = list of nearby recordings

opensoundscape.localization.calc_speed_of_sound(temperature=20)[source]

Calculate speed of sound in air, in meters per second

Calculate speed of sound for a given temperature in Celsius (Humidity has a negligible effect on speed of sound and so this functionality is not implemented)

Parameters:

temperature – ambient air temperature in Celsius

Returns:

the speed of sound in air in meters per second

opensoundscape.localization.calculate_tdoa_residuals(receiver_locations, tdoas, location_estimate, speed_of_sound)[source]

Calculate the residual distances of the TDOA localization algorithm

The residual represents the discrepancy between (difference in distance of each receiver to the estimated location) and (observed tdoa), and has units of meters. Residuals are calculated as follows:

expected = calculated time difference of arrival between the reference and another receiver, based on the locations of the receivers and the estimated event location

observed = observed tdoas provided to the localization algorithm

residual time = expected - observed (in seconds)

residual distance = speed of sound * residual time (in meters)

Parameters:
  • receiver_locations – the list of coordinates (in m) of each receiver, as [x,y] for 2d or [x,y,z] for 3d

  • tdoas – list of time delays of arrival for the sound at each receiver, relative to the first receiver in the list (tdoas[0] should be 0)

  • location_estimate – The estimated location of the sound, as (x,y) or (x,y,z) in meters

  • speed_of_sound – The speed of sound in m/s

Returns:

np.array containing the residuals in units of meters, one per receiver

opensoundscape.localization.gillette_localize(receiver_locations, arrival_times, speed_of_sound=343)[source]

Uses the Gillette and Silverman [1] localization algorithm to localize a sound event from a set of TDOAs.

Parameters:
  • receiver_locations – a list of [x,y] or [x,y,z] locations for each receiver; locations should be in meters, e.g., the UTM coordinate system

  • arrival_times – a list of TDOA times (arrival times) for each receiver. The times should be in seconds.

  • speed_of_sound – speed of sound in m/s

Returns:

a tuple of (x,y,z) coordinates of the sound source

Return type:

coords

Algorithm from: [1] M. D. Gillette and H. F. Silverman, “A Linear Closed-Form Algorithm for Source Localization From Time-Differences of Arrival,” IEEE Signal Processing Letters

opensoundscape.localization.localize(receiver_locations, tdoas, algorithm, speed_of_sound=343)[source]

Perform TDOA localization on a sound event.

Parameters:
  • receiver_locations – a list of [x,y,z] locations for each receiver; locations should be in meters, e.g., the UTM coordinate system

  • tdoas – a list of TDOA times (onset times) for each recorder. The times should be in seconds.

  • speed_of_sound – speed of sound in m/s

  • algorithm – the algorithm to use for localization. Options: ‘soundfinder’, ‘gillette’

Returns:

The estimated source location in meters.
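
A minimal sketch with illustrative receiver locations and delays:

```
from opensoundscape.localization import localize

# five receivers in 3d (meters); delays in seconds relative to the first receiver
receiver_locations = [[0, 0, 0], [30, 0, 0], [0, 30, 0], [30, 30, 0], [0, 0, 10]]
tdoas = [0.0, 0.02, 0.03, 0.04, 0.01]  # illustrative values

location = localize(receiver_locations, tdoas, algorithm="gillette")
```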

opensoundscape.localization.lorentz_ip(u, v=None)[source]

Compute Lorentz inner product of two vectors

For vectors u and v, the Lorentz inner product for 3-dimensional case is defined as

u[0]*v[0] + u[1]*v[1] + u[2]*v[2] - u[3]*v[3]

Or, for 2-dimensional case as

u[0]*v[0] + u[1]*v[1] - u[2]*v[2]

Parameters:
  • u – vector with shape either (3,) or (4,)

  • v – vector with the same shape as u; if None (default), sets v = u

Returns:

value of Lorentz IP

Return type:

float

opensoundscape.localization.soundfinder_localize(receiver_locations, arrival_times, speed_of_sound=343, invert_alg='gps', center=True, pseudo=True)[source]

Use the soundfinder algorithm to perform TDOA localization on a sound event. Localize a sound event given relative arrival times at multiple receivers. This function implements a localization algorithm from the equations described in [1]. Localization can be performed in a global coordinate system in meters (i.e., UTM), or relative to recorder locations in meters.

This implementation follows [2] with corresponding variable names.

Parameters:
  • receiver_locations – a list of [x,y,z] locations for each receiver locations should be in meters, e.g., the UTM coordinate system.

  • arrival_times – a list of TDOA times (onset times) for each recorder The times should be in seconds.

  • speed_of_sound – speed of sound in m/s

  • invert_alg – what inversion algorithm to use (only ‘gps’ is implemented)

  • center – whether to center recorders before computing localization result. Computes localization relative to centered plot, then translates solution back to original recorder locations. (For behavior of original Sound Finder, use True)

  • pseudo – whether to use the pseudorange error (True) or sum of squares discrepancy (False) to pick the solution to return (For behavior of original Sound Finder, use False. However, in initial tests, pseudorange error appears to perform better.)

Returns:

The solution (x,y,z) in meters.

[1] Wilson, David R., Matthew Battiston, John Brzustowski, and Daniel J. Mennill. “Sound Finder: A New Software Approach for Localizing Animals Recorded with a Microphone Array.” Bioacoustics 23, no. 2 (May 4, 2014): 99–112. https://doi.org/10.1080/09524622.2013.827588.

[2] Global locationing Systems handout, 2002 http://web.archive.org/web/20110719232148/http://www.macalester.edu/~halverson/math36/GPS.pdf

opensoundscape.localization.travel_time(source, receiver, speed_of_sound)[source]

Calculate the time required for sound to travel from a source to a receiver

Parameters:
  • source – cartesian location [x,y] or [x,y,z] of sound source, in meters

  • receiver – cartesian location [x,y] or [x,y,z] of sound receiver, in meters

  • speed_of_sound – speed of sound in m/s

Returns:

time in seconds for sound to travel from source to receiver
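
A minimal sketch combining calc_speed_of_sound and travel_time:

```
from opensoundscape.localization import calc_speed_of_sound, travel_time

c = calc_speed_of_sound(temperature=15)               # roughly 340 m/s at 15 °C
t = travel_time([0, 0], [100, 0], speed_of_sound=c)   # roughly 0.29 s over 100 m
```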

utils

Utilities for opensoundscape

exception opensoundscape.utils.GetDurationError[source]

raised if librosa.get_duration(path=f) causes an error

opensoundscape.utils.binarize(x, threshold)[source]

return a list of 0, 1 by thresholding vector x

opensoundscape.utils.generate_clip_times_df(full_duration, clip_duration, clip_overlap=0, final_clip=None, rounding_precision=10)[source]

generate start and end times for equal-length clips

The behavior for incomplete final clips at the end of the full_duration depends on the final_clip parameter.

This function only creates a dataframe with start and end times, it does not perform any actual trimming of audio or other objects.

Parameters:
  • full_duration – The amount of time (seconds) to split into clips

  • clip_duration (float) – The duration in seconds of the clips

  • clip_overlap (float) – The overlap of the clips in seconds [default: 0]

  • final_clip (str) –

    Behavior if final_clip is less than clip_duration seconds long. By default, discards remaining time if less than clip_duration seconds long [default: None]. Options:

    • None: Discard the remainder (do not make a clip)

    • ”extend”: Extend the final clip beyond full_duration to reach clip_duration length

    • ”remainder”: Use only remainder of full_duration (final clip will be shorter than clip_duration)

    • ”full”: Increase overlap with previous clip to yield a clip with clip_duration length.

      Note: returns entire original audio if it is shorter than clip_duration

  • rounding_precision (int or None) – number of decimals to round start/end times to - pass None to skip rounding

Returns:

DataFrame with columns for ‘start_time’ and ‘end_time’ of each clip

Return type:

clip_df
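
For example, splitting 10 seconds into 3-second clips with 1 second of overlap, keeping the short final clip:

```
from opensoundscape.utils import generate_clip_times_df

clip_df = generate_clip_times_df(
    full_duration=10, clip_duration=3, clip_overlap=1, final_clip="remainder"
)
# start_time/end_time rows: 0-3, 2-5, 4-7, 6-9, 8-10
```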

opensoundscape.utils.generate_opacity_colormaps(colors=['#067bc2', '#43a43d', '#ecc30b', '#f37748', '#d56062'])[source]

Create a colormap for each color from transparent to opaque

opensoundscape.utils.identity(x)[source]

return the input unchanged

opensoundscape.utils.inrange(x, r)[source]

return True if x is in the range [r[0], r[1]] (inclusive)

opensoundscape.utils.isNan(x)[source]

check for nan by equating x to itself

opensoundscape.utils.jitter(x, width, distribution='gaussian')[source]

Jitter (add random noise to) each value of x

Parameters:
  • x – scalar, array, or nd-array of numeric type

  • width – multiplier for random variable (stdev for ‘gaussian’ or r for ‘uniform’)

  • distribution – ‘gaussian’ (default) or ‘uniform’ if ‘gaussian’: draw jitter from gaussian with mu = 0, std = width if ‘uniform’: draw jitter from uniform on [-width, width]

Returns:

x + random jitter

Return type:

jittered_x

opensoundscape.utils.linear_scale(array, in_range=(0, 1), out_range=(0, 255))[source]

Translate from range in_range to out_range

Inputs:

in_range: The starting range [default: (0, 1)]

out_range: The output range [default: (0, 255)]

Outputs:

new_array: A translated array

opensoundscape.utils.make_clip_df(files, clip_duration, clip_overlap=0, final_clip=None, return_invalid_samples=False, raise_exceptions=False)[source]

generate df of fixed-length clip start/end times for a set of files

Used internally to prepare a dataframe listing clips of longer audio files

This function creates a single dataframe with audio files as the index and columns: ‘start_time’, ‘end_time’. It will list clips of a fixed duration from the beginning to end of each audio file.

Note: if a label dataframe is passed as files, the labels for each file will be copied to all clips having the corresponding file. If the label dataframe contains multiple rows for a single file, the labels in the _first_ row containing the file path are used as labels for resulting clips.

Parameters:
  • files – list of audio file paths, or dataframe with file path as index - if dataframe, columns represent classes and values represent class labels. Labels for a file will be copied to all clips belonging to that file in the returned clip dataframe.

  • clip_duration (float) – see generate_clip_times_df

  • clip_overlap (float) – see generate_clip_times_df

  • final_clip (str) – see generate_clip_times_df

  • return_invalid_samples (bool) – if True, returns additional value, a list of samples that caused exceptions

  • raise_exceptions (bool) – if True, if exceptions are raised when attempting to check the duration of an audio file, the exception will be raised. If False [default], adds a row to the dataframe with np.nan for ‘start_time’ and ‘end_time’ for that file path.

Returns:

dataframe multi-index (‘file’,’start_time’,’end_time’)
  • if files is a dataframe, will contain same columns as files

  • otherwise, will have no columns

if return_invalid_samples==True, returns (clip_df, invalid_samples)

Return type:

clip_df

Note: default behavior for raise_exceptions is the following:

if an exception is raised (for instance, trying to get the duration of the file), the dataframe will have one row with np.nan for ‘start_time’ and ‘end_time’ for that file path.
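
A minimal sketch (the file names are hypothetical; each real file's duration is read to build its clip list):

```
from opensoundscape.utils import make_clip_df

clip_df = make_clip_df(["rec1.wav", "rec2.wav"], clip_duration=5.0, clip_overlap=0)
# clip_df is indexed by (file, start_time, end_time), one row per 5-second clip
```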

opensoundscape.utils.min_max_scale(array, feature_range=(0, 1))[source]

rescale values in an array linearly to feature_range

opensoundscape.utils.overlap(r1, r2)[source]

calculate the amount of overlap between two real-numbered ranges

ranges must be [low,high] where low <= high

opensoundscape.utils.overlap_fraction(r1, r2)[source]

calculate the fraction of r1 (low, high range) that overlaps with r2
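
For example:

```
from opensoundscape.utils import overlap, overlap_fraction

overlap([0, 10], [5, 20])           # 5: the ranges share 5 units
overlap_fraction([0, 10], [5, 20])  # 0.5: half of r1 overlaps r2
```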

opensoundscape.utils.rescale_features(X, rescaling_vector=None)[source]

rescale all features by dividing by the max value for each feature

optionally provide the rescaling vector (1xlen(X) np.array), so that you can rescale a new dataset consistently with an old one

returns rescaled feature set and rescaling vector

opensoundscape.utils.sigmoid(x)[source]

sigmoid function