Audio

Utilities for loading and modifying Audio objects

Note: Out-of-place operations

Functions that modify Audio (and Spectrogram) objects are “out of place”, meaning that they return a new Audio object instead of modifying the original object. This means that running the line `audio_object.resample(22050) # WRONG!` will not change the sample rate of audio_object. If your goal is to overwrite audio_object with the new, resampled audio, instead write `audio_object = audio_object.resample(22050)`.
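
The out-of-place behavior can be illustrated with a small stand-in class. This is only a sketch: `ToyAudio` and its naive nearest-neighbour `resample` are hypothetical and are not the library's Audio class or its resampling method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyAudio:
    """Hypothetical stand-in for Audio, used only to show out-of-place calls."""

    samples: tuple
    sample_rate: int

    def resample(self, sample_rate):
        # Naive nearest-neighbour resampling (an assumption for illustration;
        # the real class uses a librosa resampling method).
        n_out = round(len(self.samples) * sample_rate / self.sample_rate)
        step = self.sample_rate / sample_rate
        last = len(self.samples) - 1
        new_samples = tuple(
            self.samples[min(int(i * step), last)] for i in range(n_out)
        )
        # A new object is returned; self is never modified.
        return ToyAudio(new_samples, sample_rate)


audio_object = ToyAudio(samples=(0.0, 0.5, 1.0, 0.5), sample_rate=4)

audio_object.resample(2)  # result is discarded, audio_object is unchanged
rate_after_discarded_call = audio_object.sample_rate

audio_object = audio_object.resample(2)  # rebind the name to the new object
rate_after_rebinding = audio_object.sample_rate
```

After the first call the name still points at the original object; only the reassignment makes the resampled audio visible under `audio_object`.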

class Audio(samples, sample_rate, resample_type='kaiser_fast', max_duration=None)

Container for audio samples

Initialization requires sample array. To load audio file, use Audio.from_file()

Initializing an Audio object directly requires the specification of the sample rate. Use Audio.from_file or Audio.from_bytesio with sample_rate=None to use a native sampling rate.

  • samples (np.array) – The audio samples
  • sample_rate (integer) – The sampling rate for the audio samples
  • resample_type (str) – The resampling method to use [default: “kaiser_fast”]
  • max_duration (None or integer) – The maximum duration in seconds allowed for the audio file; longer files will raise an exception. If None, no limit is enforced [default: None]

Returns: An initialized Audio object

bandpass(low_f, high_f, order)

Bandpass audio signal with a Butterworth filter

Uses a phase-preserving algorithm (scipy.signal’s butter and sosfiltfilt)

  • low_f – low frequency cutoff (-3 dB) in Hz of bandpass filter
  • high_f – high frequency cutoff (-3 dB) in Hz of bandpass filter
  • order – Butterworth filter order (integer); higher orders give a steeper cutoff

duration()

Return duration of Audio

Returns: The duration of the Audio in seconds
Return type: duration (float)

extend(length)

Extend audio file by adding silence to the end

Parameters: length – the final length in seconds of the extended file
Returns: a new Audio object of the desired length
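
A minimal sketch of padding with silence, operating on a plain list of samples. The helper name `extend_with_silence` is hypothetical, and returning an unchanged copy when the requested length is not longer than the input is an assumption, not documented behavior.

```python
def extend_with_silence(samples, sample_rate, length):
    """Return a new list padded with zeros up to `length` seconds."""
    target = round(length * sample_rate)  # desired number of samples
    padding = max(0, target - len(samples))  # assumption: never truncate
    # Build a new list so the input is left untouched (out of place).
    return list(samples) + [0.0] * padding


extended = extend_with_silence([0.1, 0.2], sample_rate=2, length=2.0)
```
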
classmethod from_bytesio(bytesio, sample_rate=None, max_duration=None, resample_type='kaiser_fast')

Read from bytesio object

Read an Audio object from a BytesIO object. This is primarily used for passing Audio over HTTP.

  • bytesio – Contents of WAV file as BytesIO
  • sample_rate – The final sampling rate of Audio object [default: None]
  • max_duration – The maximum duration of the audio file [default: None]
  • resample_type – The librosa method to do resampling [default: “kaiser_fast”]

Returns: An initialized Audio object

classmethod from_file(path, sample_rate=None, resample_type='kaiser_fast', max_duration=None)

Load audio from files

Deal with the various possible input types to load an audio file and generate a spectrogram

  • path (str, Path) – path to an audio file
  • sample_rate (int, None) – resample audio with value and resample_type, if None use source sample_rate (default: None)
  • resample_type – method used to resample (default: kaiser_fast)
  • max_duration – the maximum length of an input file, None is no maximum (default: None)

Returns: an Audio object with attributes samples and sample_rate

loop(length=None, n=None)

Extend audio file by looping it

  • length – the final length in seconds of the looped file (cannot be used with n) [default: None]
  • n – the number of occurrences of the original audio sample (cannot be used with length) [default: None]. For example, n=1 returns the original sample, and n=2 returns two concatenated copies of the original sample

Returns: a new Audio object of the desired length or repetitions
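
The two mutually exclusive arguments can be sketched on a plain list of samples. The helper `loop_samples` is hypothetical; raising ValueError when both or neither argument is given is an assumption about how the conflict is handled.

```python
def loop_samples(samples, sample_rate, length=None, n=None):
    """Repeat `samples` either n times or until `length` seconds are filled."""
    if (length is None) == (n is None):
        # Assumption: exactly one of the two arguments must be provided.
        raise ValueError("specify exactly one of length or n")
    samples = list(samples)
    if n is not None:
        return samples * n  # n=1 gives a copy of the original
    target = round(length * sample_rate)  # desired number of samples
    repeats = -(-target // len(samples))  # ceiling division
    return (samples * repeats)[:target]  # cut the last copy if needed


twice = loop_samples([1, 2, 3], sample_rate=1, n=2)
five_seconds = loop_samples([1, 2, 3], sample_rate=1, length=5)
```
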

resample(sample_rate, resample_type=None)

Resample Audio object

  • sample_rate (scalar) – the new sample rate
  • resample_type (str) – resampling algorithm to use [default: None (uses self.resample_type of instance)]

Returns: a new Audio object of the desired sample rate


save(path)

Save Audio to file

NOTE: currently, only saving to .wav format is supported

Parameters: path – destination for output

spectrum()

Create frequency spectrum from an Audio object using fft

Returns: fft, frequencies

split(clip_duration, clip_overlap=0, final_clip=None)

Split Audio into equal-length clips

The Audio object is split into clips of a specified duration and overlap

  • clip_duration (float) – The duration in seconds of the clips
  • clip_overlap (float) – The overlap of the clips in seconds [default: 0]
  • final_clip (str) –

    Behavior if the final clip is shorter than clip_duration seconds. By default, the remaining audio is discarded [default: None]. Options:

    • “remainder”: Include the remainder of the Audio (clip will not have clip_duration length)
    • “full”: Increase the overlap to yield a clip with clip_duration length
    • “extend”: Similar to remainder but extend (repeat) the clip to reach clip_duration length
    • None: Discard the remainder

Returns: A list of dictionaries with keys [“audio”, “begin_time”, “end_time”]
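
The clip boundaries implied by clip_duration, clip_overlap and final_clip can be sketched as follows. The function `clip_times` is hypothetical and only computes (begin, end) pairs. Two details are assumptions: a final clip is produced only when some audio is not covered by the last full clip, and for “extend” the reported end time is the begin time plus clip_duration.

```python
def clip_times(total_duration, clip_duration, clip_overlap=0, final_clip=None):
    """Return a list of (begin_time, end_time) tuples in seconds."""
    increment = clip_duration - clip_overlap
    if increment <= 0:
        raise ValueError("clip_overlap must be smaller than clip_duration")

    clips = []
    i = 0
    # Full-length clips: each starts `increment` seconds after the previous.
    while i * increment + clip_duration <= total_duration:
        clips.append((i * increment, i * increment + clip_duration))
        i += 1

    begin = i * increment  # where the next (too short) clip would start
    covered_until = clips[-1][1] if clips else 0
    if final_clip is not None and covered_until < total_duration:
        if final_clip == "remainder":
            clips.append((begin, total_duration))
        elif final_clip == "full":
            # Shift the start back so the clip has the full duration.
            clips.append((max(0, total_duration - clip_duration), total_duration))
        elif final_clip == "extend":
            # Assumption: the looped clip is reported with a full-length span.
            clips.append((begin, begin + clip_duration))
    return clips


discarded = clip_times(10, 4)
remainder = clip_times(10, 4, final_clip="remainder")
full = clip_times(10, 4, final_clip="full")
extended = clip_times(10, 4, final_clip="extend")
```

For a 10 second recording split into 4 second clips, the last 2 seconds are either dropped, kept as a short clip, covered by a clip shifted back to start at 6 seconds, or looped to fill a full clip.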


time_to_sample(time)

Given a time, convert it to the corresponding sample

Parameters: time – The time to multiply with the sample_rate
Returns: The rounded sample
Return type: sample

trim(start_time, end_time)

Trim Audio object in time

If start_time is less than zero, output starts from time 0. If end_time is beyond the end of the sample, trims to end of sample.

  • start_time – time in seconds for start of extracted clip
  • end_time – time in seconds for end of extracted clip

Returns: a new Audio object containing samples from start_time to end_time
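
The clamping described above can be sketched on a plain list of samples. Both helpers are hypothetical stand-ins; using Python's built-in round for the time-to-sample conversion is an assumption about how the rounding is done.

```python
def time_to_sample_sketch(time, sample_rate):
    """Convert a time in seconds to a sample index."""
    return round(time * sample_rate)  # assumption: built-in rounding


def trim_samples(samples, sample_rate, start_time, end_time):
    """Return a new list with the samples between start_time and end_time."""
    # Negative start times are clamped to the first sample.
    start = max(0, time_to_sample_sketch(start_time, sample_rate))
    # End times past the recording are clamped to the last sample.
    end = min(len(samples), time_to_sample_sketch(end_time, sample_rate))
    return list(samples[start:end])


samples = list(range(10))  # 5 seconds at 2 samples per second
middle = trim_samples(samples, sample_rate=2, start_time=1.0, end_time=3.0)
clamped = trim_samples(samples, sample_rate=2, start_time=-1.0, end_time=100.0)
```
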


Custom exception indicating we can’t load input


Custom exception indicating length of audio is too long

opensoundscape.audio.split_and_save(audio, destination, prefix, clip_duration, clip_overlap=0, final_clip=None, dry_run=False)

Split audio into clips and save them to a folder

  • audio – The input Audio to split
  • destination – A folder to write clips to
  • prefix – A name to prepend to the written clips
  • clip_duration – The duration of each clip in seconds
  • clip_overlap – The overlap of each clip in seconds [default: 0]
  • final_clip (str) –

    Behavior if the final clip is shorter than clip_duration seconds. By default, the final clip is ignored entirely [default: None]. Possible options (any other input will also ignore the final clip entirely):

    • “remainder”: Include the remainder of the Audio (clip will not have clip_duration length)
    • “full”: Increase the overlap to yield a clip with clip_duration length
    • “extend”: Similar to remainder but extend (repeat) the clip to reach clip_duration length
    • None: Discard the remainder
  • dry_run (bool) – If True, skip writing audio and just return clip DataFrame [default: False]

Returns: pandas.DataFrame containing begin and end times for each clip from the source audio

Audio Tools

set of tools that filter or modify audio files or sample arrays (not Audio objects)

opensoundscape.audio_tools.bandpass_filter(signal, low_f, high_f, sample_rate, order=9)

perform a Butterworth bandpass filter on a discrete time signal using scipy.signal’s butter and sosfiltfilt (phase-preserving version of sosfilt)

  • signal – discrete time signal (audio samples, list of float)
  • low_f – lower cutoff frequency in Hz (nominal -3 dB point of the highpass edge)
  • high_f – upper cutoff frequency in Hz (nominal -3 dB point of the lowpass edge)
  • sample_rate – samples per second (Hz)
  • order=9 – higher values -> steeper dropoff

Returns: filtered time signal

opensoundscape.audio_tools.butter_bandpass(low_f, high_f, sample_rate, order=9)

generate coefficients for bandpass_filter()

  • low_f – low frequency of Butterworth bandpass filter
  • high_f – high frequency of Butterworth bandpass filter
  • sample_rate – audio sample rate
  • order=9 – order of Butterworth filter

Returns: set of coefficients used in sosfiltfilt()
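
The pairing of coefficient generation and zero-phase filtering can be sketched with scipy.signal's butter and sosfiltfilt. The function names ending in `_sketch` are hypothetical, and normalizing the cutoffs by the Nyquist frequency is an assumption about how the coefficients are generated; this is not the library's source.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def butter_bandpass_sketch(low_f, high_f, sample_rate, order=9):
    """Second-order-section coefficients for a Butterworth bandpass."""
    nyquist = 0.5 * sample_rate
    # Cutoffs are expressed as fractions of the Nyquist frequency.
    return butter(
        order, [low_f / nyquist, high_f / nyquist], btype="bandpass", output="sos"
    )


def bandpass_filter_sketch(signal, low_f, high_f, sample_rate, order=9):
    """Filter forward and backward so the output has no phase shift."""
    sos = butter_bandpass_sketch(low_f, high_f, sample_rate, order=order)
    return sosfiltfilt(sos, signal)


sample_rate = 8000
t = np.arange(sample_rate) / sample_rate  # one second of audio
tone_100 = np.sin(2 * np.pi * 100 * t)  # below the passband
tone_1000 = np.sin(2 * np.pi * 1000 * t)  # inside the passband

filtered_100 = bandpass_filter_sketch(tone_100, 500, 2000, sample_rate)
filtered_1000 = bandpass_filter_sketch(tone_1000, 500, 2000, sample_rate)
```

Because sosfiltfilt runs the filter in both directions, the attenuation at each cutoff is applied twice, so the effective cutoff points are closer to -6 dB than to the -3 dB of a single pass.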

opensoundscape.audio_tools.clipping_detector(samples, threshold=0.6)

count the number of samples above a threshold value

  • samples – a time series of float values
  • threshold=0.6 – minimum value of sample to count as clipping

Returns: number of samples exceeding threshold
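
A minimal sketch of the count described above. The helper `count_clipping` is hypothetical; comparing the raw sample value (not its absolute value) and using a strict comparison are assumptions taken from the wording "above a threshold value".

```python
def count_clipping(samples, threshold=0.6):
    """Count samples whose value is strictly greater than threshold."""
    # Assumption: negative peaks are not counted, since no absolute
    # value is mentioned in the description.
    return sum(1 for sample in samples if sample > threshold)


n_clipped = count_clipping([0.1, 0.7, -0.9, 0.6, 0.61])
```
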

opensoundscape.audio_tools.convolve_file(in_file, out_file, ir_file, input_gain=1.0)

apply an impulse_response to a file using ffmpeg’s afir convolution

ir_file is an audio file containing a short burst of noise recorded in a space whose acoustics are to be recreated

this makes the file sound as if it were recorded in the location where the impulse response (ir_file) was recorded

  • in_file – path to an audio file to process
  • out_file – path to save output to
  • ir_file – path to impulse response file
  • input_gain=1.0 – ratio for in_file sound’s amplitude in (0, 1]

Returns: os response of ffmpeg command

opensoundscape.audio_tools.mixdown_with_delays(files_to_mix, destination, delays=None, levels=None, duration='first', verbose=0, create_txt_file=False)

use ffmpeg to mixdown a set of audio files, each starting at a specified time (padding beginnings with zeros)

  • files_to_mix – list of audio file paths
  • destination – path to save mixdown to
  • delays=None – list of delays (how many seconds of zero-padding to add at beginning of each file)
  • levels=None – optionally provide a list of relative levels (amplitudes) for each input
  • duration='first' – ffmpeg option for duration of output file: match duration of ‘longest’, ‘shortest’, or ‘first’ input file
  • verbose=0 – if >0, prints ffmpeg command and doesn’t suppress ffmpeg output (command line output is returned from this function)
  • create_txt_file=False – if True, also creates a second output file which lists all files that were included in the mixdown

Returns: ffmpeg command line output

opensoundscape.audio_tools.silence_filter(filename, smoothing_factor=10, window_len_samples=256, overlap_len_samples=128, threshold=None)

Identify whether a file is silent (0) or not (1)

Load samples from an mp3 file and identify whether or not it is likely to be silent. Silence is determined by finding the energy in windowed regions of these samples, and normalizing the detected energy by the average energy level in the recording.

If any windowed region has energy above the threshold, returns 1; else returns 0.

  • filename (str) – file to inspect
  • smoothing_factor (int) – modifier to window_len_samples
  • window_len_samples – number of samples per window segment
  • overlap_len_samples – number of samples to overlap each window segment
  • threshold – threshold value (experimentally determined)

Returns: 0 if the file contains no significant energy over background; 1 if the file contains significant energy over background

If threshold is None: returns net_energy over background noise

opensoundscape.audio_tools.window_energy(samples, window_len_samples=256, overlap_len_samples=128)

Calculate audio energy with a sliding window

Calculate the energy in an array of audio samples

  • samples (np.ndarray) – array of audio samples loaded using librosa.load
  • window_len_samples – samples per window
  • overlap_len_samples – number of samples shared between consecutive windows

Returns: list of energy level (float) for each window
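
A sliding-window energy calculation can be sketched on a plain list of samples. The function `window_energy_sketch` is hypothetical. Defining energy as the mean squared amplitude of each window, and keeping the shorter windows at the end of the signal, are assumptions rather than documented behavior.

```python
def window_energy_sketch(samples, window_len_samples=256, overlap_len_samples=128):
    """Return the mean squared amplitude of each sliding window."""
    hop = window_len_samples - overlap_len_samples  # step between windows
    if hop <= 0:
        raise ValueError("overlap must be smaller than the window length")

    energies = []
    for start in range(0, len(samples), hop):
        window = samples[start:start + window_len_samples]
        # Assumption: energy is the mean of the squared samples.
        energies.append(sum(x * x for x in window) / len(window))
    return energies


energies = window_energy_sketch(
    [1, 1, 1, 1, 2, 2, 2, 2], window_len_samples=4, overlap_len_samples=2
)
```

With a window of 4 samples and an overlap of 2, consecutive windows start 2 samples apart, so the eight samples above yield four energy values.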