Machine Learning

Data Selection

opensoundscape.data_selection.add_binary_numeric_labels(input_df, label, input_column='Labels', output_column='NumericLabels')

Add binary numeric labels to a dataframe based on a label

Given a dataframe and a label from input_column, produce a new dataframe with an output_column and a label map

Parameters:
  • input_df – A dataframe
  • label – The label to set to 1
  • input_column – The column to read labels from
  • output_column – The column to write numeric labels to
Returns:

  • output_df – A dataframe with an additional output_column
  • label_map – A dictionary; keys are f"not_{label}" and f"{label}", values are 0 and 1

Return type:

(output_df, label_map)
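
For example, a minimal sketch (the toy DataFrame and label values here are hypothetical):

```
import pandas as pd
from opensoundscape.data_selection import add_binary_numeric_labels

df = pd.DataFrame({"Labels": ["robin", "cardinal", "robin", "sparrow"]})
output_df, label_map = add_binary_numeric_labels(df, label="robin")
# label_map == {"not_robin": 0, "robin": 1}; output_df gains a
# "NumericLabels" column holding 1 for "robin" rows and 0 otherwise
```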

opensoundscape.data_selection.add_numeric_labels(input_df, input_column='Labels', output_column='NumericLabels')

Add numeric labels to a dataframe

Given a dataframe with input_column, produce a new dataframe with an output_column and a label map

Parameters:
  • input_df – A dataframe
  • input_column – The column to read labels from
  • output_column – The column to write numeric labels to
Returns:

  • output_df – A dataframe with an additional output_column
  • label_map – A dictionary; keys are the unique labels, values are monotonically increasing integers starting at 0

Return type:

(output_df, label_map)
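
A minimal sketch, using the same kind of hypothetical toy DataFrame:

```
import pandas as pd
from opensoundscape.data_selection import add_numeric_labels

df = pd.DataFrame({"Labels": ["robin", "cardinal", "robin", "sparrow"]})
output_df, label_map = add_numeric_labels(df)
# label_map maps each unique label to an integer starting at 0, and
# output_df gains the corresponding "NumericLabels" column
```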

opensoundscape.data_selection.expand_multi_labeled(input_df, column_header='Labels', label_separator='|')

Given a multi-labeled dataframe, generate a singly-labeled dataframe

Given a Dataframe with a “Labels” column that is multi-labeled (e.g. “hello|world”) split the row into singly labeled rows.

Parameters:
  • input_df – A Dataframe with a multi-labeled column
  • column_header – The column containing multiple labels [default: “Labels”]
  • label_separator – Multiple labels are separated by this [default: “|”]
Returns:

A Dataframe with a singly-labeled column in column_header

Return type:

output_df
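
For example (the multi-labeled rows here are hypothetical):

```
import pandas as pd
from opensoundscape.data_selection import expand_multi_labeled

df = pd.DataFrame({"Labels": ["hello|world", "hello"]})
output_df = expand_multi_labeled(df)
# the "hello|world" row is split into two rows,
# one labeled "hello" and one labeled "world"
```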

opensoundscape.data_selection.train_valid_split(input_df, stratify_from_column='Labels', train_size=0.8, random_state=101)

Split a dataframe into train and validation dataframes

Given an input dataframe with a labels column, split the rows for each unique label into a training set (train_size) and a validation set (1 - train_size). If stratify_from_column is None, do not stratify.

Parameters:
  • input_df – A dataframe
  • stratify_from_column – Name of the column that labels should come from [default: “Labels”] - given None will not attempt stratified sampling
  • train_size – The decimal fraction to use for the training set [default: 0.8]
  • random_state – The random state to use for train_test_split [default: 101]
Returns:

  • train_df – A Dataframe containing the training set
  • valid_df – A Dataframe containing the validation set

Return type:

(train_df, valid_df)
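
A minimal sketch (the toy DataFrame is hypothetical):

```
import pandas as pd
from opensoundscape.data_selection import train_valid_split

df = pd.DataFrame({"Labels": ["robin"] * 10 + ["cardinal"] * 10})
train_df, valid_df = train_valid_split(df, train_size=0.8, random_state=101)
# each unique label is split ~80/20 between train_df and valid_df
```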

opensoundscape.data_selection.upsample(input_df, label_column='Labels', random_state=None)

Given an input DataFrame, upsample to the maximum label count

Upsampling removes the class imbalance in your dataset. Rows for each label are repeated up to max_count // rows times; then, rows are randomly sampled to fill up to max_count.

Parameters:
  • input_df – A DataFrame to upsample
  • label_column – The column to draw unique labels from
  • random_state – Set the random_state during sampling
Returns:

An upsampled DataFrame

Return type:

df
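
For example, a minimal sketch (the imbalanced toy DataFrame is hypothetical):

```
import pandas as pd
from opensoundscape.data_selection import upsample

df = pd.DataFrame({"Labels": ["robin"] * 10 + ["cardinal"] * 3})
balanced_df = upsample(df, label_column="Labels", random_state=0)
# "cardinal" rows are repeated and sampled until each label has 10 rows
```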

Datasets

class opensoundscape.datasets.SingleTargetAudioDataset(df, label_dict, filename_column='Destination', from_audio=True, label_column=None, height=224, width=224, add_noise=False, save_dir=None, random_trim_length=None, extend_short_clips=False, max_overlay_num=0, overlay_prob=0.2, overlay_weight='random', overlay_class=None, audio_sample_rate=22050, debug=None)

Single Target Audio -> Image Dataset

Given a DataFrame with audio files in one of the columns, generate a Dataset of spectrogram images for basic machine learning tasks.

This class provides access to several types of augmentations that act on audio and images with the following arguments:
  • add_noise: for adding RandomAffine and ColorJitter noise to images
  • random_trim_length: for only using a short random clip extracted from the training data
  • max_overlay_num / overlay_prob / overlay_weight: controlling the maximum number of additional spectrograms to overlay, the probability of overlaying an individual spectrogram, and the weight for the weighted sum of the spectrograms

Additional augmentations on tensors are available when calling train() from the module opensoundscape.torch.train.

Parameters:
  • df – A DataFrame with a column containing audio files
  • label_dict – a dictionary mapping numeric labels to class names, for example: {0: 'American Robin', 1: 'Northern Cardinal'}; pass None if you wish to retain numeric labels
  • filename_column – The column in the DataFrame which contains paths to data [default: Destination]
  • from_audio – Whether the raw dataset is audio [default: True]
  • label_column – The column with numeric labels if present [default: None]
  • height – Height for resulting Tensor [default: 224]
  • width – Width for resulting Tensor [default: 224]
  • add_noise – Apply RandomAffine and ColorJitter filters [default: False]
  • save_dir – Save images to a directory [default: None]
  • random_trim_length – Extract a clip of this many seconds of audio starting at a random time. If None, the original clip will be used [default: None]
  • extend_short_clips – If a file to be overlaid or trimmed from is too short, extend it to the desired length by repeating it. [default: False]
  • max_overlay_num – The maximum number of additional images to overlay, each with probability overlay_prob [default: 0]
  • overlay_prob – Probability of an image from a different class being overlaid (combined as a weighted sum) on the training image. Typical values: 0, 0.66 [default: 0.2]
  • overlay_weight – The weight given to the overlaid image during augmentation. When ‘random’, will randomly select a different weight between 0.2 and 0.5 for each overlay. When not ‘random’, should be a float between 0 and 1 [default: ‘random’]
  • overlay_class – The label of the class that overlays should be drawn from. Must be specified if max_overlay_num > 0. If ‘different’, draws overlays from any class that is not the same class as the audio. If set to a class label, draws overlays from that class. When creating a presence/absence classifier, set overlay_class equal to the absence class label [default: None]
  • audio_sample_rate – resample audio to this sample rate; specify None to use original audio sample rate [default: 22050]
  • debug – path to save img files, images are created from the tensor immediately before it is returned. When None, does not save images. [default: None]
Returns:

{"X": (3, H, W), "y": (1) if label_column is not None}

Return type:

Dictionary
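
A minimal sketch of constructing the dataset; the file paths and class names below are hypothetical:

```
import pandas as pd
from opensoundscape.datasets import SingleTargetAudioDataset

df = pd.DataFrame({
    "Destination": ["clips/a.wav", "clips/b.wav"],  # hypothetical audio paths
    "NumericLabels": [0, 1],  # e.g. produced by add_binary_numeric_labels
})
dataset = SingleTargetAudioDataset(
    df,
    label_dict={0: "not_robin", 1: "robin"},
    label_column="NumericLabels",
)
sample = dataset[0]
X, y = sample["X"], sample["y"]  # X: (3, 224, 224) tensor, y: numeric label
```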

image_from_audio(audio, mode='RGB')

Create a PIL image from audio

Parameters:
  • audio – audio object
  • mode – PIL image mode, e.g. “L” or “RGB” [default: RGB]
overlay_random_image(original_image, original_length, original_class, original_path)

Overlay an image from another class

Select a random file from a different class. Trim if necessary to the same length as the given image. Overlay the images on top of each other with a weighted sum

class opensoundscape.datasets.SplitterDataset(wavs, annotations=False, label_corrections=None, overlap=1, duration=5, output_directory='segments', include_last_segment=False, column_separator='\t', species_separator='|')

A PyTorch Dataset for splitting WAV files

Segments will be written to the output_directory

Parameters:
  • wavs – A list of WAV files to split
  • annotations – Should we search for corresponding annotations files? (default: False)
  • label_corrections – Specify a correction labels CSV file w/ column headers “raw” and “corrected” (default: None)
  • overlap – How much overlap should there be between samples (units: seconds, default: 1)
  • duration – How long should each segment be? (units: seconds, default: 5)
  • output_directory – Where should segments be written? (default: segments/)
  • include_last_segment – Do you want to include the last segment? (default: False)
  • column_separator – What character should we use to separate columns (default: "\t")
  • species_separator – What character should we use to separate species (default: “|”)
Returns:

A list of CSV rows (separated by column_separator) containing the source audio, segment begin time (seconds), segment end time (seconds), segment audio, and, if annotations were requested, the present classes separated by species_separator

Return type:

output
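
A rough sketch of driving the splitter, assuming each dataset item is the list of CSV rows for one input file (the recordings/ directory is hypothetical):

```
from glob import glob
from opensoundscape.datasets import SplitterDataset

wavs = glob("recordings/*.wav")  # hypothetical input directory
dataset = SplitterDataset(
    wavs, duration=5, overlap=1, output_directory="segments"
)
with open("segments.csv", "w") as f:
    for rows in dataset:  # segments are written as items are accessed
        for row in rows:
            f.write(row + "\n")
```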

opensoundscape.datasets.annotations_with_overlaps_with_clip(df, begin, end)

Determine if any rows overlap with current segment

Parameters:
  • df – A dataframe containing a Raven annotation file
  • begin – The begin time of the current segment (unit: seconds)
  • end – The end time of the current segment (unit: seconds)
Returns:

A dataframe of annotations which overlap with the begin/end times

Return type:

sub_df

opensoundscape.datasets.get_md5_digest(input_string)

Generate MD5 sum for a string

Parameters:
  • input_string – An input string
Returns: A string containing the MD5 hash of the input string
Return type: output
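
A quick sketch of what the digest is presumably equivalent to (the standard hexadecimal MD5 digest):

```
import hashlib
from opensoundscape.datasets import get_md5_digest

digest = get_md5_digest("hello world")
# presumably matches Python's built-in MD5 hex digest:
assert digest == hashlib.md5("hello world".encode("utf-8")).hexdigest()
```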

Grad Cam

Metrics

class opensoundscape.metrics.Metrics(classes, dataset_len)

Basic Example

See opensoundscape.torch.train for an in-depth example

```
dataset = Dataset(...)
dataloader = DataLoader(dataset, ...)
classes = [0, 1, 2, 3, 4]  # An example list of classes
for epoch in epochs:
    metrics = Metrics(classes, len(dataset))
    for batch in dataloader:
        X, y = batch["X"], batch["y"]
        targets = y.squeeze(0)  # dim: (batch_size)
        ...
        loss = ...         # dim: (0)
        predictions = ...  # dim: (batch_size)
        metrics.accumulate_batch_metrics(
            loss.item(), targets.cpu(), predictions.cpu()
        )
    metrics_dictionary = metrics.compute_epoch_metrics()
```

accumulate_batch_metrics(loss, targets, predictions)

For a batch, accumulate loss and confusion matrix

For validation, pass 0 for loss.

Parameters:
  • loss – The loss for this batch
  • targets – The correct y labels
  • predictions – The predicted labels
compute_epoch_metrics()

Compute metrics from learning

Computes the loss and the accuracy, precision, recall, and f1 scores from the confusion matrix, and returns a dictionary with metric names as keys and their corresponding values

Returns: A dictionary with keys [loss, accuracy, precision, recall, f1, confusion_matrix]
Return type: dictionary

PyTorch Prediction

opensoundscape.torch.predict.predict(model, prediction_dataset, batch_size=1, num_workers=1, apply_softmax=True, label_dict=None)

Generate predictions on a dataset from a binary pytorch model object

Parameters:
  • model – A binary torch model, e.g. torchvision.models.resnet18(pretrained=True); the final layer must be overridden to output the number of classes, e.g. model.fc = torch.nn.Linear(model.fc.in_features, 2)
  • prediction_dataset – a pytorch dataset object that returns tensors, such as datasets.SingleTargetAudioDataset()
  • batch_size – The size of the batches (# files) [default: 1]
  • num_workers – The number of cores to use for batch preparation [default: 1]; if you want to use all the cores on your machine, set it to 0 (this could freeze your computer)
  • apply_softmax – Apply a softmax activation layer to the raw outputs of the model
  • label_dict – List of names of each class, with indices corresponding to NumericLabels [default: None]; if None, the returned dataframe will have numeric column names; if a list of class names, the returned dataframe will have class names as column names
Returns:

A dataframe with the CNN prediction results for each class and each file

Notes

If label_dict is not None, the returned dataframe's columns will be class names instead of numeric labels
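
A minimal sketch; df (a DataFrame of audio file paths in a "Destination" column) is assumed to exist, and the class names are hypothetical:

```
import torch
import torchvision
from opensoundscape.datasets import SingleTargetAudioDataset
from opensoundscape.torch.predict import predict

# a binary model with the final layer overridden, as described above;
# in practice, load trained weights with model.load_state_dict(...)
model = torchvision.models.resnet18(pretrained=True)
model.fc = torch.nn.Linear(model.fc.in_features, 2)

prediction_dataset = SingleTargetAudioDataset(df, label_dict=None)
scores_df = predict(
    model,
    prediction_dataset,
    batch_size=16,
    apply_softmax=True,
    label_dict=["absent", "present"],  # hypothetical class names
)
```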

PyTorch Spectrogram Augmentation

These functions were implemented for PyTorch in the repository https://github.com/zcaceres/spec_augment. The original paper is available at https://arxiv.org/abs/1904.08779.

PyTorch Training

opensoundscape.torch.train.train(save_dir, model, train_dataset, valid_dataset, optimizer, loss_fn, epochs=25, batch_size=1, num_workers=0, log_every=5, tensor_augment=False, debug=False, print_logging=True, save_scores=False)

Train a model

Parameters:
  • save_dir – A directory to save intermediate results
  • model – A binary torch model, e.g. torchvision.models.resnet18(pretrained=True); the final layer must be overridden to output the number of classes, e.g. model.fc = torch.nn.Linear(model.fc.in_features, 2)
  • train_dataset – The training Dataset, e.g. created by SingleTargetAudioDataset()
  • valid_dataset – The validation Dataset, e.g. created by SingleTargetAudioDataset()
  • optimizer – A torch optimizer, e.g. torch.optim.SGD(model.parameters(), lr=1e-3)
  • loss_fn – A torch loss function, e.g. torch.nn.CrossEntropyLoss()
  • epochs – The number of epochs [default: 25]
  • batch_size – The size of the batches [default: 1]
  • num_workers – The number of cores to use for batch preparation [default: 0]
  • log_every – Log statistics when epoch % log_every == 0 [default: 5]
  • tensor_augment – Whether or not to use the tensor augment procedures [default: False]
  • debug – Whether or not to write intermediate images [default: False]
  • print_logging – Whether to print training progress to stdout [default: True]
  • save_scores – Whether to save the scores on the train/val set each epoch [default: False]
Effects:
Write a file epoch-{epoch}.tar every log_every epochs, containing:
  • Model state dictionary
  • Optimizer state dictionary
  • Labels in YAML format
  • Train: loss, accuracy, precision, recall, and f1 score
  • Validation: accuracy, precision, recall, and f1 score
  • train_dataset.label_dict

Write a metadata file with parameter values to save_dir/metadata.txt

Returns: None
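
A minimal sketch wiring these pieces together; train_dataset and valid_dataset are assumed to be SingleTargetAudioDataset instances (see Datasets above), and the save directory is hypothetical:

```
import torch
import torchvision
from opensoundscape.torch.train import train

model = torchvision.models.resnet18(pretrained=True)
model.fc = torch.nn.Linear(model.fc.in_features, 2)  # override for 2 classes

train(
    save_dir="model_train_results",  # hypothetical output directory
    model=model,
    train_dataset=train_dataset,
    valid_dataset=valid_dataset,
    optimizer=torch.optim.SGD(model.parameters(), lr=1e-3),
    loss_fn=torch.nn.CrossEntropyLoss(),
    epochs=25,
    batch_size=16,
    log_every=5,
)
```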