API Documentation¶

Audio¶

audio.py: Utilities for dealing with audio files

class opensoundscape.audio.Audio(samples, sample_rate, resample_type='kaiser_fast', max_duration=None)¶

Container for audio samples

Initializing an Audio object directly requires the specification of the sample rate. Use Audio.from_file or Audio.from_bytesio with sample_rate=None to use a native sampling rate.

Parameters:

samples (np.array) – The audio samples
sample_rate (integer) – The sampling rate for the audio samples
resample_type (str) – The resampling method to use [default: “kaiser_fast”]
max_duration (None or integer) – The maximum duration allowed for the audio file [default: None]

Returns:

An initialized Audio object

bandpass(low_f, high_f, order)¶

bandpass audio signal frequencies

uses a phase-preserving algorithm (scipy.signal’s butter and sosfiltfilt)

Parameters:

low_f – low frequency cutoff (-3 dB) in Hz of bandpass filter
high_f – high frequency cutoff (-3 dB) in Hz of bandpass filter
order – butterworth filter order (integer) ~= steepness of cutoff

duration()¶

Return duration of Audio

Output:: duration (float): The duration of the Audio

extend(length)¶

Extend audio file by looping it

Parameters:	length – the final length in seconds of the extended file
Returns:	a new Audio object of the desired length
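
The looping behaviour of extend() can be illustrated on a plain list of samples. This is a minimal sketch, not the library's implementation: the helper name extend_by_looping and its explicit sample_rate argument are assumptions made for the example.

```python
def extend_by_looping(samples, sample_rate, length):
    """Return a new list lasting `length` seconds, built by repeating `samples`."""
    if not samples:
        raise ValueError("cannot extend empty audio")
    target = round(length * sample_rate)   # total number of samples wanted
    repeats = -(-target // len(samples))   # ceiling division
    # Repeat the clip enough times, then cut off the excess samples
    return (list(samples) * repeats)[:target]


# Hypothetical 3-sample clip at 2 samples per second, extended to 4 seconds
looped = extend_by_looping([0.1, 0.2, 0.3], sample_rate=2, length=4)
```

The input is left unmodified and a new sequence is returned, mirroring the "new Audio object" behaviour described above.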

classmethod from_bytesio(bytesio, sample_rate=None, max_duration=None, resample_type='kaiser_fast')¶

Read from bytesio object

Read an Audio object from a BytesIO object. This is primarily used for passing Audio over HTTP.

Parameters:

bytesio – Contents of WAV file as BytesIO
sample_rate – The final sampling rate of Audio object [default: None]
max_duration – The maximum duration of the audio file [default: None]
resample_type – The librosa method to do resampling [default: “kaiser_fast”]

Returns:

An initialized Audio object

classmethod from_file(path, sample_rate=None, resample_type='kaiser_fast', max_duration=None)¶

Load audio from files

Deal with the various possible input types to load an audio file and generate a spectrogram

Parameters:

path (str, Path) – path to an audio file
sample_rate (int, None) – resample audio to this rate using resample_type; if None, use the source sample rate (default: None)
resample_type – method used to resample (default: kaiser_fast)
max_duration – the maximum length of an input file; None means no maximum (default: None)

Returns:

Audio object with attributes samples and sample_rate

save(path)¶

save Audio to file

Parameters:	path – destination for output

spectrum()¶

create frequency spectrum from an Audio object using fft

Parameters:	self –
Returns:	fft, frequencies

split(clip_duration, clip_overlap=0, final_clip=None)¶

Split Audio into clips

The Audio object is split into clips of a specified duration and overlap

Parameters:

clip_duration – The duration in seconds of the clips
clip_overlap – The overlap of the clips in seconds [default: 0]
final_clip – Possible options (any other input will ignore the final clip entirely):
- “remainder”: Include the remainder of the Audio (clip will not have clip_duration length)
- “full”: Increase the overlap to yield a clip with clip_duration
- “extend”: Similar to remainder but extend the clip to clip_duration

Results:: A list of dictionaries with keys: [“audio”, “begin_time”, “end_time”]
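
The interaction between clip_duration, clip_overlap and final_clip is easiest to see by computing only the begin and end times. The sketch below is one plausible reading of the options described above, not the library's code: the helper name clip_times and the treatment of edge cases are assumptions.

```python
def clip_times(total_duration, clip_duration, clip_overlap=0, final_clip=None):
    """Return (begin_time, end_time) pairs in seconds for each clip."""
    if not 0 <= clip_overlap < clip_duration:
        raise ValueError("clip_overlap must be in [0, clip_duration)")
    step = clip_duration - clip_overlap
    times = []
    begin = 0.0
    # Clips of full length that fit entirely inside the recording
    while begin + clip_duration <= total_duration:
        times.append((begin, begin + clip_duration))
        begin += step
    # Handle audio that no full-length clip has covered yet
    covered_until = times[-1][1] if times else 0.0
    if covered_until < total_duration:
        if final_clip == "remainder":
            # Shorter clip running to the end of the recording
            times.append((begin, float(total_duration)))
        elif final_clip == "full":
            # Shift the start earlier (more overlap) so the clip has full length
            start = max(0.0, float(total_duration - clip_duration))
            times.append((start, float(total_duration)))
        elif final_clip == "extend":
            # Same start as "remainder", but nominal end at full clip length
            times.append((begin, begin + clip_duration))
    return times
```

For a 10 second recording with clip_duration=5 and clip_overlap=1, the full-length clips start at 0 s and 4 s; the three final_clip options differ only in how the last second of audio is handled.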

time_to_sample(time)¶

Given a time, convert it to the corresponding sample

Parameters:	time – The time to multiply with the sample_rate
Returns:	The rounded sample
Return type:	sample

trim(start_time, end_time)¶

trim Audio object in time

Parameters:

start_time – time in seconds for start of extracted clip
end_time – time in seconds for end of extracted clip

Returns:

a new Audio object containing samples from start_time to end_time
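
time_to_sample() and trim() both come down to converting seconds into sample indices. A minimal sketch on a plain list follows; the helper names and the explicit sample_rate argument are assumptions for illustration, and the library may round or clamp differently.

```python
def time_to_sample(time, sample_rate):
    """Convert a time in seconds to the nearest sample index."""
    return round(time * sample_rate)


def trim_samples(samples, sample_rate, start_time, end_time):
    """Return the samples between start_time and end_time (seconds)."""
    # Clamp to the valid index range so out-of-bounds times do not wrap around
    start = max(0, time_to_sample(start_time, sample_rate))
    end = min(len(samples), time_to_sample(end_time, sample_rate))
    return samples[start:end]


# Hypothetical 5-second signal at 2 samples per second
trimmed = trim_samples(list(range(10)), sample_rate=2, start_time=1.0, end_time=3.0)
```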

exception opensoundscape.audio.OpsoLoadAudioInputError¶: Custom exception indicating we can’t load input

exception opensoundscape.audio.OpsoLoadAudioInputTooLong¶: Custom exception indicating length of audio is too long

opensoundscape.audio.split_and_save(audio, destination, prefix, clip_duration, clip_overlap=0, final_clip=None, dry_run=False)¶

Split audio into clips and save them to a folder

Parameters:

audio – The input Audio to split
destination – A folder to write clips to
prefix – A name to prepend to the written clips
clip_duration – The duration of each clip in seconds
clip_overlap – The overlap of each clip in seconds [default: 0]
final_clip – Possible options (any other input will ignore the final clip entirely) [default: None]:
- “remainder”: Include the remainder of the Audio (clip will not have clip_duration length)
- “full”: Increase the overlap to yield a clip with clip_duration
- “extend”: Similar to remainder but extend the clip to clip_duration
dry_run – If True, skip writing audio and just return clip DataFrame [default: False]

Returns:

pandas.DataFrame containing begin and end times for each clip from the source audio

Audio Tools¶

audio_tools.py: set of tools that filter or modify audio files or sample arrays (not Audio objects)

opensoundscape.audio_tools.bandpass_filter(signal, low_f, high_f, sample_rate, order=9)¶

perform a butterworth bandpass filter on a discrete time signal using scipy.signal’s butter and sosfiltfilt (phase-preserving version of sosfilt)

Parameters:

signal – discrete time signal (audio samples, list of float)
low_f – low-frequency cutoff of the bandpass filter, nominally the -3 dB point (Hz)
high_f – high-frequency cutoff of the bandpass filter, nominally the -3 dB point (Hz)
sample_rate – samples per second (Hz)
order=9 – higher values -> steeper dropoff

Returns:

filtered time signal

opensoundscape.audio_tools.butter_bandpass(low_f, high_f, sample_rate, order=9)¶

generate coefficients for bandpass_filter()

Parameters:

low_f – low frequency of butterworth bandpass filter
high_f – high frequency of butterworth bandpass filter
sample_rate – audio sample rate
order=9 – order of butterworth filter

Returns:

set of coefficients used in sosfiltfilt()
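
The two-step design described above (generate second-order-section coefficients with butter, then apply them with sosfiltfilt) can be sketched directly with scipy.signal. The function name bandpass_sketch and the test tones are assumptions for the example; the library's own functions may pass different arguments to scipy.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass_sketch(signal, low_f, high_f, sample_rate, order=9):
    """Zero-phase Butterworth bandpass built from scipy.signal primitives."""
    # Coefficients as second-order sections, which stay stable at high order
    sos = butter(order, [low_f, high_f], btype="bandpass", output="sos", fs=sample_rate)
    # Forward-backward filtering cancels the phase shift of a single pass
    return sosfiltfilt(sos, signal)


# One second of a 100 Hz tone mixed with a 2000 Hz tone
sample_rate = 22050
t = np.arange(sample_rate) / sample_rate
low_tone = np.sin(2 * np.pi * 100 * t)
mid_tone = np.sin(2 * np.pi * 2000 * t)
mixed = low_tone + mid_tone

# Keep 1000-3000 Hz: the 100 Hz tone should be strongly attenuated
filtered = bandpass_sketch(mixed, 1000, 3000, sample_rate)
```

Because the filter is applied forwards and backwards, the surviving 2000 Hz tone stays aligned in time with the original, which is what "phase-preserving" refers to.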

opensoundscape.audio_tools.clipping_detector(samples, threshold=0.6)¶

count the number of samples above a threshold value

Parameters:

samples – a time series of float values
threshold=0.6 – minimum value of sample to count as clipping

Returns:

number of samples exceeding threshold

opensoundscape.audio_tools.convolve_file(in_file, out_file, ir_file, input_gain=1.0)¶

apply an impulse_response to a file using ffmpeg’s afir convolution

ir_file is an audio file containing a short burst of noise recorded in a space whose acoustics are to be recreated

this makes the file sound as if it were recorded in the location where the impulse response (ir_file) was recorded

Parameters:

in_file – path to an audio file to process
out_file – path to save output to
ir_file – path to impulse response file
input_gain=1.0 – ratio for in_file sound’s amplitude in (0,1)

Returns:

os response of ffmpeg command

opensoundscape.audio_tools.mixdown_with_delays(files_to_mix, destination, delays=None, levels=None, duration='first', verbose=0, create_txt_file=False)¶

use ffmpeg to mixdown a set of audio files, each starting at a specified time (padding beginnings with zeros)

Parameters:

files_to_mix – list of audio file paths
destination – path to save mixdown to
delays=None – list of delays (how many seconds of zero-padding to add at beginning of each file)
levels=None – optionally provide a list of relative levels (amplitudes) for each input
duration='first' – ffmpeg option for duration of output file: match duration of ‘longest’,’shortest’,or ‘first’ input file
verbose=0 – if >0, prints ffmpeg command and doesn’t suppress ffmpeg output (command line output is returned from this function)
create_txt_file=False – if True, also creates a second output file which lists all files that were included in the mixdown

Returns:

ffmpeg command line output

opensoundscape.audio_tools.silence_filter(filename, smoothing_factor=10, window_len_samples=256, overlap_len_samples=128, threshold=None)¶

Identify whether a file is silent (0) or not (1)

Load samples from an mp3 file and identify whether or not it is likely to be silent. Silence is determined by finding the energy in windowed regions of these samples, and normalizing the detected energy by the average energy level in the recording.

If any windowed region has energy above the threshold, returns 1; else returns 0.

Parameters:

filename (str) – file to inspect
smoothing_factor (int) – modifier to window_len_samples
window_len_samples – number of samples per window segment
overlap_len_samples – number of samples to overlap each window segment
threshold – threshold value (experimentally determined)

Returns:

0 if file contains no significant energy over background
1 if file contains significant energy over background
If threshold is None: returns net_energy over background noise

opensoundscape.audio_tools.window_energy(samples, window_len_samples=256, overlap_len_samples=128)¶

Calculate audio energy with a sliding window

Calculate the energy in an array of audio samples

Parameters:

samples (np.ndarray) – array of audio samples loaded using librosa.load
window_len_samples – samples per window
overlap_len_samples – number of samples shared between consecutive windows

Returns:

list of energy level (float) for each window
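
A sliding-window energy calculation can be sketched in pure Python. Here "energy" is taken to be the mean of the squared samples in each window, which is an assumption; the library may normalize differently or drop the trailing partial window.

```python
def window_energy_sketch(samples, window_len_samples=256, overlap_len_samples=128):
    """Mean squared amplitude of each (possibly overlapping) window."""
    if not 0 <= overlap_len_samples < window_len_samples:
        raise ValueError("overlap must be smaller than the window length")
    hop = window_len_samples - overlap_len_samples  # samples between window starts
    energies = []
    for start in range(0, len(samples), hop):
        window = samples[start:start + window_len_samples]
        # The last window may be shorter than window_len_samples
        energies.append(sum(x * x for x in window) / len(window))
    return energies


# Hypothetical signal: a quiet half followed by a louder half
energies = window_energy_sketch([1, 1, 1, 1, 2, 2, 2, 2],
                                window_len_samples=4, overlap_len_samples=2)
```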

Commands¶

opensoundscape.commands.run_command(cmd)¶

Run a command returning output, error

Input:: cmd: A string containing some command
Output:: (stdout, stderr): A tuple of standard out and standard error

opensoundscape.commands.run_command_return_code(cmd)¶

Run a command returning the return code

Input:: cmd: A string containing some command
Output:: return_code: The return code of the function
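
Both helpers can be approximated with the standard library's subprocess module. This is a sketch of the described behaviour under the assumption that the command string is run through a shell; the function names ending in _sketch are hypothetical and the library's own implementation may use Popen directly.

```python
import subprocess


def run_command_sketch(cmd):
    """Run a shell command string and return (stdout, stderr) as text."""
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return result.stdout, result.stderr


def run_command_return_code_sketch(cmd):
    """Run a shell command string and return only its exit status."""
    return subprocess.run(cmd, shell=True, capture_output=True).returncode


stdout, stderr = run_command_sketch("echo hello")
```

Passing a string with shell=True is convenient but should only be done with trusted input, since the shell interprets the whole string.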

Completions¶

Config¶

opensoundscape.config.get_default_config()¶

Get the default configuration file as a dictionary

Output:: dict: A dictionary containing the default Opensoundscape configuration

opensoundscape.config.validate(config)¶

Validate a configuration string

Input:: config: A string containing an Opensoundscape configuration
Output:: dict: A dictionary of the validated Opensoundscape configuration

opensoundscape.config.validate_file(fname)¶

Validate a configuration file

Input:: fname: A filename containing an Opensoundscape configuration
Output:: dict: A dictionary of the validated Opensoundscape configuration

Console Checks¶

Utilities related to console checks on docopt args

Console¶

console.py: Entrypoint for opensoundscape

opensoundscape.console.build_docs()¶: Run sphinx-build for our project

opensoundscape.console.entrypoint()¶: The Opensoundscape entrypoint for console interaction

Data Selection¶

opensoundscape.data_selection.add_binary_numeric_labels(input_df, label, input_column='Labels', output_column='NumericLabels')¶

Add binary numeric labels to dataframe based on label

Given a dataframe and a label from input_column produce a new dataframe with an output_column and a label map

Parameters:

input_df – A dataframe
label – The label to set to 1
input_column – The column to read labels from
output_column – The column to write numeric labels to

Output:

output_df: A dataframe with an additional output_column
label_map: A dictionary, keys are f”not_{label}” and f”{label}”, values are 0 and 1

opensoundscape.data_selection.add_numeric_labels(input_df, input_column='Labels', output_column='NumericLabels')¶

Add numeric labels to dataframe

Given a dataframe with input_column produce a new dataframe with an output_column and a label map

Parameters:

input_df – A dataframe
input_column – The column to read labels from
output_column – The column to write numeric labels to

Output:

output_df: A dataframe with an additional output_column
label_map: A dictionary, keys are the unique labels and values are monotonically increasing integers starting at 0

opensoundscape.data_selection.expand_multi_labeled(input_df, column_header='Labels', label_separator='|')¶

Given a multi-labeled dataframe, generate a singly-labeled dataframe

Given a Dataframe with a “Labels” column that is multi-labeled (e.g. “hello|world”) split the row into singly labeled rows.

Parameters:

input_df – A Dataframe with a multi-labeled column
column_header – The column containing multiple labels [default: “Labels”]
label_separator – Multiple labels are separated by this [default: “|”]

Output:

output_df: A Dataframe with singly-labeled column in column_header
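
The row-splitting idea can be shown without pandas by treating each row as a dictionary. This is only an illustration of the transformation; the helper name expand_multi_labeled_rows and the “File” column are assumptions, and the library function operates on DataFrames.

```python
def expand_multi_labeled_rows(rows, column_header="Labels", label_separator="|"):
    """Split each multi-labeled row into one row per label."""
    expanded = []
    for row in rows:
        for label in row[column_header].split(label_separator):
            new_row = dict(row)            # copy so the input rows are untouched
            new_row[column_header] = label
            expanded.append(new_row)
    return expanded


rows = [
    {"File": "a.wav", "Labels": "hello|world"},
    {"File": "b.wav", "Labels": "hello"},
]
singly_labeled = expand_multi_labeled_rows(rows)
```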

opensoundscape.data_selection.train_valid_split(input_df, stratify_from_column='Labels', train_size=0.8, random_state=101)¶

Split a dataframe into train and validation dataframes

Given an input dataframe with a labels column split each unique label into a train size and 1 - train_size for training and validation sets. If stratify_from_column is None don’t stratify.

Parameters:

input_df – A dataframe
stratify_from_column – Name of the column that labels should come from; given None, will not attempt stratified sampling [default: “Labels”]
train_size – The decimal fraction to use for the training set [default: 0.8]
random_state – The random state to use for train_test_split [default: 101]

Output:

train_df: A Dataframe containing the training set
valid_df: A Dataframe containing the validation set

opensoundscape.data_selection.upsample(input_df, label_column='Labels', random_state=None)¶

Given an input DataFrame, upsample to maximum value

Upsampling removes the class imbalance in your dataset. Rows for each label are repeated up to max_count // rows. Then, we randomly sample the rows to fill up to max_count.

Input:

input_df: A DataFrame to upsample
label_column: The column to draw unique labels from
random_state: Set the random_state during sampling

Output:

df: An upsampled DataFrame
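
The repeat-then-sample procedure can be sketched on a list of dictionaries. The helper name upsample_rows is hypothetical and the standard library's random module stands in for pandas sampling, so the exact rows chosen will differ from the library's output.

```python
import random
from collections import defaultdict


def upsample_rows(rows, label_column="Labels", random_state=None):
    """Repeat and sample rows so every label occurs max_count times."""
    rng = random.Random(random_state)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_column]].append(row)
    max_count = max(len(group) for group in by_label.values())

    upsampled = []
    for group in by_label.values():
        # Whole repetitions first, then a random sample to fill the shortfall
        repeats, shortfall = divmod(max_count, len(group))
        upsampled.extend(group * repeats)
        upsampled.extend(rng.sample(group, shortfall))
    return upsampled


rows = [{"Labels": "a"}] * 5 + [{"Labels": "b"}] * 2
balanced = upsample_rows(rows, random_state=0)
```

With five rows of “a” and two rows of “b”, the “b” rows are repeated twice (5 // 2) and one more is drawn at random, giving five of each.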

Datasets¶

class opensoundscape.datasets.SingleTargetAudioDataset(df, label_dict, filename_column='Destination', from_audio=True, label_column=None, height=224, width=224, add_noise=False, save_dir=None, random_trim_length=None, extend_short_clips=False, max_overlay_num=0, overlay_prob=0.2, overlay_weight='random', overlay_class=None, audio_sample_rate=22050, debug=None)¶

Single Target Audio -> Image Dataset

Given a DataFrame with audio files in one of the columns, generate a Dataset of spectrogram images for basic machine learning tasks.

This class provides access to several types of augmentations that act on audio and images with the following arguments:
- add_noise: for adding RandomAffine and ColorJitter noise to images
- random_trim_length: for only using a short random clip extracted from the training data
- max_overlay_num / overlay_prob / overlay_weight: controlling the maximum number of additional spectrograms to overlay, the probability of overlaying an individual spectrogram, and the weight for the weighted sum of the spectrograms

Additional augmentations on tensors are available when calling train() from the module opensoundscape.torch.train.

Input:

df: A DataFrame with a column containing audio files
label_dict: A dictionary mapping numeric labels to class names, for example: {0:’American Robin’,1:’Northern Cardinal’}; pass None if you wish to retain numeric labels
filename_column: The column in the DataFrame which contains paths to data [default: Destination]
from_audio: Whether the raw dataset is audio [default: True]
label_column: The column with numeric labels if present [default: None]
height: Height for resulting Tensor [default: 224]
width: Width for resulting Tensor [default: 224]
add_noise: Apply RandomAffine and ColorJitter filters [default: False]
save_dir: Save images to a directory [default: None]
random_trim_length: Extract a clip of this many seconds of audio starting at a random time. If None, the original clip will be used [default: None]
extend_short_clips: If a file to be overlaid or trimmed from is too short, extend it to the desired length by repeating it. [default: False]
max_overlay_num: The maximum number of additional images to overlay, each with probability overlay_prob [default: 0]
overlay_prob: Probability of an image from a different class being overlaid (combined as a weighted sum) on the training image. Typical values: 0, 0.66 [default: 0.2]
overlay_weight: The weight given to the overlaid image during augmentation. When ‘random’, will randomly select a different weight between 0.2 and 0.5 for each overlay. When not ‘random’, should be a float between 0 and 1 [default: ‘random’]
overlay_class: The label of the class that overlays should be drawn from. Must be specified if max_overlay_num > 0. If ‘different’, draws overlays from any class that is not the same class as the audio. If set to a class label, draws overlays from that class. When creating a presence/absence classifier, set overlay_class equal to the absence class label [default: None]
audio_sample_rate: Resample audio to this sample rate; specify None to use original audio sample rate [default: 22050]
debug: Path to save img files; images are created from the tensor immediately before it is returned. When None, does not save images. [default: None]

Output:

Dictionary:: { “X”: (3, H, W) , “y”: (1) if label_column != None }

image_from_audio(audio, mode='RGB')¶

Create a PIL image from audio

Inputs:

audio: audio object
mode: PIL image mode, e.g. “L” or “RGB” [default: RGB]

overlay_random_image(original_image, original_length, original_class, original_path)¶

Overlay an image from another class

Select a random file from a different class. Trim if necessary to the same length as the given image. Overlay the images on top of each other with a weight

class opensoundscape.datasets.SplitterDataset(wavs, annotations=False, label_corrections=None, overlap=1, duration=5, output_directory='segments', include_last_segment=False, column_separator='t', species_separator='|')¶

A PyTorch Dataset for splitting WAV files

Inputs:

wavs: A list of WAV files to split
annotations: Should we search for corresponding annotations files? (default: False)
label_corrections: Specify a correction labels CSV file w/ column headers “raw” and “corrected” (default: None)
overlap: How much overlap should there be between samples (units: seconds, default: 1)
duration: How long should each segment be? (units: seconds, default: 5)
output_directory: Where should segments be written? (default: segments/)
include_last_segment: Do you want to include the last segment? (default: False)
column_separator: What character should we use to separate columns (default: ” “)
species_separator: What character should we use to separate species (default: “|”)

Effects:

Segments will be written to the output_directory

Outputs:

output: A list of CSV rows (separated by column_separator) containing: the source audio, segment begin time (seconds), segment end time (seconds), segment audio, and present classes separated by species_separator if annotations were requested

opensoundscape.datasets.annotations_with_overlaps_with_clip(df, begin, end)¶

Determine if any rows overlap with current segment

Inputs:

df: A dataframe containing a Raven annotation file
begin: The begin time of the current segment (unit: seconds)
end: The end time of the current segment (unit: seconds)

Output:

sub_df: A dataframe of annotations which overlap with the begin/end times

opensoundscape.datasets.get_md5_digest(input_string)¶

Generate MD5 sum for a string

Inputs:: input_string: An input string
Outputs:: output: A string containing the md5 hash of input string
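
An MD5 digest of a string can be produced with the standard library's hashlib. The sketch below assumes UTF-8 encoding of the input, which the description above does not specify; the function name md5_digest_sketch is hypothetical.

```python
import hashlib


def md5_digest_sketch(input_string):
    """Return the hexadecimal MD5 digest of a string (UTF-8 encoded)."""
    return hashlib.md5(input_string.encode("utf-8")).hexdigest()


digest = md5_digest_sketch("hello")
```

The digest is always a 32-character hexadecimal string, which makes it suitable for generating fixed-length, filesystem-safe names from arbitrary input.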

Grad Cam¶

Helpers¶

opensoundscape.helpers.binarize(x, threshold)¶: return a list of 0, 1 by thresholding vector x

opensoundscape.helpers.bound(x, bounds)¶: restrict x to a range of bounds = [min, max]

opensoundscape.helpers.file_name(path)¶: get file name without extension from a path

opensoundscape.helpers.hex_to_time(s)¶: convert a hexadecimal Unix time string to a datetime timestamp

opensoundscape.helpers.isNan(x)¶: check for nan by equating x to itself

opensoundscape.helpers.jitter(x, width, distribution='gaussian')¶

Jitter (add random noise to) each value of x

Parameters:

x – scalar, array, or nd-array of numeric type
width – multiplier for random variable (stdev for ‘gaussian’ or r for ‘uniform’)
distribution – ‘gaussian’ (default) or ‘uniform’
if ‘gaussian’: draw jitter from gaussian with mu = 0, std = width
if ‘uniform’: draw jitter from uniform on [-width, width]

Returns:

jittered_x: x + random jitter

opensoundscape.helpers.linear_scale(array, in_range=(0, 1), out_range=(0, 255))¶

Translate from range in_range to out_range

Inputs:

in_range: The starting range [default: (0, 1)]
out_range: The output range [default: (0, 255)]

Outputs:

new_array: A translated array
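
Translating between ranges is a single affine map: subtract the input minimum, divide by the input width, multiply by the output width, and add the output minimum. A minimal sketch on a plain list follows; the function name linear_scale_sketch is hypothetical and the library version operates on arrays.

```python
def linear_scale_sketch(values, in_range=(0, 1), out_range=(0, 255)):
    """Map each value linearly from in_range to out_range."""
    in_min, in_max = in_range
    out_min, out_max = out_range
    scale = (out_max - out_min) / (in_max - in_min)
    return [(v - in_min) * scale + out_min for v in values]


scaled = linear_scale_sketch([0, 0.5, 1])
```

Values outside in_range are not clipped by this map; they simply land outside out_range.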

opensoundscape.helpers.min_max_scale(array, feature_range=(0, 1))¶: rescale values in an array linearly to feature_range

opensoundscape.helpers.rescale_features(X, rescaling_vector=None)¶

rescale all features by dividing by the max value for each feature

optionally provide the rescaling vector (1xlen(X) np.array), so that you can rescale a new dataset consistently with an old one

returns rescaled feature set and rescaling vector

opensoundscape.helpers.run_command(cmd)¶: run a bash command with Popen, return response

opensoundscape.helpers.sigmoid(x)¶: sigmoid function

Localization¶

opensoundscape.localization.calc_speed_of_sound(temperature=20)¶

Calculate speed of sound in meters per second

Calculate speed of sound for a given temperature in Celsius (Humidity has a negligible effect on speed of sound and so this functionality is not implemented)

Parameters:	temperature – ambient temperature in Celsius
Returns:	the speed of sound in meters per second
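
A widely used approximation for the speed of sound in dry air is c ≈ 331.3 · sqrt(1 + T / 273.15) m/s, with T in degrees Celsius. The sketch below uses that formula; whether the library uses exactly these constants is an assumption, and the function name speed_of_sound_sketch is hypothetical.

```python
import math


def speed_of_sound_sketch(temperature=20):
    """Approximate speed of sound in dry air (m/s) at a temperature in Celsius."""
    # 331.3 m/s is the speed at 0 degrees C; it scales with the square root
    # of absolute temperature (273.15 K at 0 degrees C)
    return 331.3 * math.sqrt(1 + temperature / 273.15)


c_20 = speed_of_sound_sketch(20)
```

At the default 20 °C this gives roughly 343 m/s.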

opensoundscape.localization.localize(receiver_positions, arrival_times, temperature=20.0, invert_alg='gps', center=True, pseudo=True)¶

Perform TDOA localization on a sound event

Localize a sound event given relative arrival times at multiple receivers. This function implements a localization algorithm from the equations described in the class handout (“Global Positioning Systems”). Localization can be performed in a global coordinate system in meters (i.e., UTM), or relative to recorder positions in meters.

Parameters:

receiver_positions – a list of [x,y,z] positions for each receiver. Positions should be in meters, e.g., the UTM coordinate system.
arrival_times – a list of TDOA times (onset times) for each recorder. The times should be in seconds.
temperature – ambient temperature in Celsius
invert_alg – what inversion algorithm to use
center – whether to center recorders before computing localization result. Computes localization relative to centered plot, then translates solution back to original recorder locations. (For behavior of original Sound Finder, use True)
pseudo – whether to use the pseudorange error (True) or sum of squares discrepancy (False) to pick the solution to return (For behavior of original Sound Finder, use False. However, in initial tests, pseudorange error appears to perform better.)

Returns:

The solution (x,y,z,b) with the lower sum of squares discrepancy. b is the error in the pseudorange (distance to mics), b=c*delta_t (delta_t is time error)

opensoundscape.localization.lorentz_ip(u, v=None)¶

Compute Lorentz inner product of two vectors

For vectors u and v, the Lorentz inner product for 3-dimensional case is defined as

u[0]*v[0] + u[1]*v[1] + u[2]*v[2] - u[3]*v[3]

Or, for 2-dimensional case as

u[0]*v[0] + u[1]*v[1] - u[2]*v[2]

Args:

u: vector with shape either (3,) or (4,)
v: vector with same shape as u; if None (default), sets v = u

Returns:

float: value of Lorentz IP
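
Both formulas above follow the same pattern: an ordinary dot product over all but the last component, minus the product of the last components. A minimal pure-Python sketch, with the hypothetical name lorentz_ip_sketch:

```python
def lorentz_ip_sketch(u, v=None):
    """Lorentz inner product of two vectors of length 3 or 4."""
    if v is None:
        v = u
    if len(u) != len(v) or len(u) not in (3, 4):
        raise ValueError("u and v must both have length 3 or both have length 4")
    # Positive sign on the leading components, negative sign on the last one
    spatial = sum(a * b for a, b in zip(u[:-1], v[:-1]))
    return spatial - u[-1] * v[-1]


self_product = lorentz_ip_sketch([1, 2, 3, 4])      # 1 + 4 + 9 - 16
cross_product = lorentz_ip_sketch([1, 2, 3], [4, 5, 6])  # 4 + 10 - 18
```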

opensoundscape.localization.travel_time(source, receiver, speed_of_sound)¶

Calculate time required for sound to travel from a source to a receiver

Parameters:

source – cartesian position [x,y] or [x,y,z] of sound source
receiver – cartesian position [x,y] or [x,y,z] of sound receiver
speed_of_sound – speed of sound in m/s

Returns:

time in seconds for sound to travel from source to receiver

Metrics¶

class opensoundscape.metrics.Metrics(classes, dataset_len)¶

Basic Example

See opensoundscape.torch.train for an in-depth example

```
dataset = Dataset(...)
dataloader = DataLoader(dataset, ...)
classes = [0, 1, 2, 3, 4]  # An example list of classes
for epoch in epochs:
    metrics = Metrics(classes, len(dataset))
    for batch in dataloader:
        X, y = batch["X"], batch["y"]
        targets = y.squeeze(0)  # dim: (batch_size)
        ...
        loss = ...  # dim: (0)
        predictions = ...  # dim: (batch_size)
        metrics.accumulate_batch_metrics(
            loss.item(), targets.cpu(), predictions.cpu()
        )
    metrics_dictionary = metrics.compute_epoch_metrics()
```

accumulate_batch_metrics(loss, targets, predictions)¶

For a batch, accumulate loss and confusion matrix

For validation pass 0 for loss.

Parameters:

loss – The loss for this batch
targets – The correct y labels
predictions – The predicted labels

compute_epoch_metrics()¶

Compute metrics from learning

Computes the loss and accuracy, precision, recall, and f1 scores from the confusion matrix and returns dictionary with metric name as keys and their corresponding values

Returns:

dictionary with keys [loss, accuracy, precision, recall, f1, confusion_matrix]
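
How accuracy, precision, recall and f1 fall out of a confusion matrix can be sketched in a few lines. The example assumes rows are true classes and columns are predicted classes, and reports precision, recall and f1 for a single chosen class; the library may use a different orientation or average across classes. The function name metrics_from_confusion is hypothetical.

```python
def metrics_from_confusion(confusion_matrix, positive=1):
    """Accuracy plus precision/recall/f1 for one class of a confusion matrix."""
    n = len(confusion_matrix)
    total = sum(sum(row) for row in confusion_matrix)
    correct = sum(confusion_matrix[i][i] for i in range(n))

    tp = confusion_matrix[positive][positive]
    fp = sum(confusion_matrix[i][positive] for i in range(n)) - tp  # column minus tp
    fn = sum(confusion_matrix[positive]) - tp                       # row minus tp

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical 2-class result: rows = true class, columns = predicted class
scores = metrics_from_confusion([[5, 1], [2, 2]])
```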

Pulse Finder¶

PyTorch Prediction¶

DEPRECATED: use opensoundscape.torch.predict instead

These functions are currently used only to support localization.py. The module contains a pytorch prediction function (deprecated) and some additional functionality for using gradcam.

opensoundscape.pytorch_prediction.activation_region_limits(gcam, threshold=0.2)¶

calculate bounds of a GradCam activation region

Parameters:

gcam – a 2-d gradcam activation array generated by gradcam_region()
threshold=0.2 – minimum value of gradcam (0-1) to count as ‘activated’

Returns:

[ [min row, max_row], [min_col, max_col] ] indices of gradcam elements exceeding threshold

opensoundscape.pytorch_prediction.activation_region_to_box(activation_region, threshold=0.2)¶

draw a rectangle of the activation box as a boolean array (useful for plotting a mask over a spectrogram)

Parameters:

activation_region – a 2-d gradcam activation array
threshold=0.2 – minimum value of activation to count as ‘activated’

Returns:

mask: 2-d array of 0, 1 where 1’s form a solid box of activated region

opensoundscape.pytorch_prediction.gradcam_region(model, img_paths, img_shape, predictions=None, save_gcams=True, box_threshold=0.2)¶

Compute the GradCam activation region (the area of an image that was most important for classification in the CNN)

Parameters:

model – a pytorch model object
img_paths – list of paths to image files
predictions=None – [list of float] optionally, provide model predictions per file to avoid re-computing
save_gcams=True – bool, if False only box regions around gcams are saved

Returns:

boxes: limits of the box surrounding the gcam activation region, as indices: [ [min row, max row], [min col, max col] ]
gcams: (only returned if save_gcams == True) arrays with gcam activation values, shape = shape of image

opensoundscape.pytorch_prediction.in_box(x, y, box_lims)¶

check if an x, y position falls within a set of limits

Parameters:

x – first index
y – second index
box_lims – [[x low,x high], [y low,y high]]

Returns:

True if (x,y) is in box_lims, otherwise False

opensoundscape.pytorch_prediction.predict(model, img_paths, img_shape, batch_size=1, num_workers=12, apply_softmax=True)¶

get multi-class model predictions from a pytorch model for a set of images

Parameters:

model – a pytorch model object (not path to weights)
img_paths – a list of paths to RGB png spectrograms
batch_size=1 – pytorch parallelization parameter
num_workers=12 – pytorch parallelization parameter
apply_softmax=True – if True, performs a softmax on raw output of network

Returns:

df of predictions indexed by file

Raven¶

raven.py: Utilities for dealing with Raven files

opensoundscape.raven.annotation_check(directory)¶

Check Raven annotations files for a non-null class

Input:: directory: The path which contains Raven annotations file
Output:: None

opensoundscape.raven.generate_class_corrections(directory)¶

Generate a CSV to specify any class overrides

Input:

directory: The path which contains Raven annotations files ending in *.selections.txt.lower

Output:

csv (string): A multiline string containing a CSV file with two columns: raw and corrected

opensoundscape.raven.lowercase_annotations(directory)¶

Convert Raven annotation files to lowercase

Input:: directory: The path which contains Raven annotations file
Output:: None

opensoundscape.raven.query_annotations(directory, cls)¶

Given a directory of Raven annotations, query for a specific class

Input:

directory: The path which contains Raven annotations file
cls: The class which you would like to query for

Output:

output (string): A multiline string containing annotation file and rows matching the query cls

Species Table¶

Spectrogram¶

spectrogram.py: Utilities for dealing with spectrograms

class opensoundscape.spectrogram.Spectrogram(spectrogram, frequencies, times)¶

Immutable spectrogram container

amplitude(freq_range=None)¶

create an amplitude vs time signal from spectrogram

by summing pixels in the vertical dimension

Args: freq_range=None: sum Spectrogram only in this range of [low, high] frequencies in Hz (if None, all frequencies are summed)

Returns:	a time-series array of the vertical sum of spectrogram value

bandpass(min_f, max_f)¶

extract a frequency band from a spectrogram

crops the 2-d array of the spectrograms to the desired frequency range

Parameters:

min_f – low frequency in Hz for bandpass
max_f – high frequency in Hz for bandpass

Returns:

bandpassed spectrogram object

classmethod from_audio(audio, window_type='hann', window_samples=512, overlap_samples=256, decibel_limits=(-100, -20))¶

create a Spectrogram object from an Audio object

Parameters:

window_type="hann" – see scipy.signal.spectrogram docs for description of window parameter
window_samples=512 – number of audio samples per spectrogram window (pixel)
overlap_samples=256 – number of samples shared by consecutive windows
decibel_limits=(-100, -20) – limit the dB values to (min,max) (lower values set to min, higher values set to max)

Returns:

opensoundscape.spectrogram.Spectrogram object

classmethod from_file()¶

create a Spectrogram object from a file

Parameters:	file – path of image to load
Returns:	opensoundscape.spectrogram.Spectrogram object

limit_db_range(min_db=-100, max_db=-20)¶

Limit the decibel values of the spectrogram to range from min_db to max_db

values less than min_db are set to min_db; values greater than max_db are set to max_db

similar to Audacity’s gain and range parameters

Parameters:

min_db – values lower than this are set to this
max_db – values higher than this are set to this

Returns:

Spectrogram object with db range applied

linear_scale(feature_range=(0, 1))¶

Linearly rescale spectrogram values to a range of values using in_range as decibel_limits

Parameters:	feature_range – tuple of (low,high) values for output
Returns:	Spectrogram object with values rescaled to feature_range

min_max_scale(feature_range=(0, 1))¶

Linearly rescale spectrogram values to a range of values using in_range as minimum and maximum

Parameters:	feature_range – tuple of (low,high) values for output
Returns:	Spectrogram object with values rescaled to feature_range

net_amplitude(signal_band, reject_bands=None)¶

create amplitude signal in signal_band and subtract amplitude from reject_bands

rescale the signal and reject bands by dividing by their bandwidths in Hz (amplitude of each reject_band is divided by the total bandwidth of all reject_bands; amplitude of signal_band is divided by bandwidth of signal_band)

Parameters:

signal_band – [low,high] frequency range in Hz (positive contribution)
reject_bands – list of [low,high] frequency ranges in Hz (negative contribution)

Returns:

time-series array of net amplitude

plot(inline=True, fname=None, show_colorbar=False)¶

Plot the spectrogram with matplotlib.pyplot

Parameters:

inline=True –
fname=None – specify a string path to save the plot to (ending in .png/.pdf)
show_colorbar – include image legend colorbar from pyplot

to_image(shape=None, mode='RGB', spec_range=[-100, -20])¶

create a Pillow Image from spectrogram

linearly rescales values from spec_range (default [-100, -20]) to [255,0] (i.e., -20 dB is loudest -> black, -100 dB is quietest -> white)

Parameters:

destination – a file path (string)
shape=None – tuple of image dimensions, eg (224,224)
mode="RGB" – RGB for 3-channel color or “L” for 1-channel grayscale
spec_range=[-100,-20] – the lowest and highest possible values in the spectrogram

Returns:

Pillow Image object
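
The inverted rescaling used for images (loudest value maps to 0, quietest to 255) can be sketched on a nested list. This shows only the arithmetic: it does not build a Pillow Image, the function name spec_to_pixel_values is hypothetical, and clipping values to spec_range before scaling is an assumption.

```python
def spec_to_pixel_values(spectrogram, spec_range=(-100, -20)):
    """Map dB values to pixel intensities: high dB -> 0 (black), low dB -> 255 (white)."""
    low, high = spec_range
    pixels = []
    for row in spectrogram:
        # Clip to spec_range first so every pixel lands in [0, 255]
        clipped = [min(max(value, low), high) for value in row]
        pixels.append([255 * (high - value) / (high - low) for value in clipped])
    return pixels


# One row spanning the quietest value, the midpoint, the loudest, and an out-of-range value
pixel_rows = spec_to_pixel_values([[-100, -60, -20, 0]])
```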

trim(start_time, end_time)¶

extract a time segment from a spectrogram

Parameters:

start_time – in seconds
end_time – in seconds

Returns:

spectrogram object from extracted time segment

Taxa¶

a set of utilities for converting between scientific and common names of bird species in different naming systems (xeno canto and bird net)

opensoundscape.taxa.bn_common_to_sci(common)¶: convert bird net common name (ignoring dashes, spaces, case) to scientific name as lowercase-hyphenated

opensoundscape.taxa.common_to_sci(common)¶: convert bird net common name (ignoring dashes, spaces, case) to scientific name as lowercase-hyphenated

opensoundscape.taxa.get_species_list()¶: list of scientific-names (lowercase-hyphenated) of species in the loaded species table

opensoundscape.taxa.sci_to_bn_common(scientific)¶: convert scientific name as lowercase-hyphenated to birdnet common name as lowercasenospaces

opensoundscape.taxa.sci_to_xc_common(scientific)¶: convert scientific name as lowercase-hyphenated to xeno-canto common name as lowercasenospaces

opensoundscape.taxa.xc_common_to_sci(common)¶: convert xeno-canto common name (ignoring dashes, spaces, case) to scientific name as lowercase-hyphenated

Torch Spectrogram Augmentation¶

These functions were implemented for PyTorch in the following repository https://github.com/zcaceres/spec_augment The original paper is available on https://arxiv.org/abs/1904.08779

Torch Training¶

opensoundscape.torch.train.train(save_dir, model, train_dataset, valid_dataset, optimizer, loss_fn, epochs=25, batch_size=1, num_workers=0, log_every=5, tensor_augment=False, debug=False, print_logging=True, save_scores=False)¶

Train a model

Input:

save_dir: A directory to save intermediate results
model: A binary torch model, e.g. torchvision.models.resnet18(pretrained=True); must override classes, e.g. model.fc = torch.nn.Linear(model.fc.in_features, 2)
train_dataset: The training Dataset, e.g. created by SingleTargetAudioDataset()
valid_dataset: The validation Dataset, e.g. created by SingleTargetAudioDataset()
optimizer: A torch optimizer, e.g. torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn: A torch loss function, e.g. torch.nn.CrossEntropyLoss()
epochs: The number of epochs [default: 25]
batch_size: The size of the batches [default: 1]
num_workers: The number of cores to use for batch preparation [default: 0]
log_every: Log statistics when epoch % log_every == 0 [default: 5]
tensor_augment: Whether or not to use the tensor augment procedures [default: False]
debug: Whether or not to write intermediate images [default: False]
print_logging: Whether to print training progress to stdout [default: True]
save_scores: Whether to save the scores on the train/val set each epoch [default: False]

Side Effects:

Write a file epoch-{epoch}.tar containing (rate of log_every):
- Model state dictionary
- Optimizer state dictionary
- Labels in YAML format
- Train: loss, accuracy, precision, recall, and f1 score
- Validation: accuracy, precision, recall, and f1 score
- train_dataset.label_dict

Write a metadata file with parameter values to save_dir/metadata.txt

Output:

None

Effects:

model parameters are saved to