opensoundscape.ml package

Submodules

opensoundscape.ml.cam module

Class activation maps (CAM) for OpenSoundscape models

class opensoundscape.ml.cam.CAM(base_image, activation_maps=None, gbp_maps=None)[source]

Bases: object

Object to hold and view Class Activation Maps, including guided backprop

Stores activation maps as .activation_maps, and guided backprop maps as .gbp_maps

each is a Series indexed by class

create_rgb_heatmaps(class_subset=None, mode='activation', show_base=True, alpha=0.5, color_cycle=('#067bc2', '#43a43d', '#ecc30b', '#f37748', '#d56062'), gbp_normalization_q=99)[source]

create rgb numpy array of heatmaps overlaid on the sample

Can choose a subset of classes and activation/backprop modes

Parameters:
  • class_subset – iterable of classes to visualize with activation maps - default None plots all classes - each item must be in the index of self.gbp_maps / self.activation_maps - note that a class None is created by CNN.generate_cams() when classes are not specified

  • mode – str selecting which maps to visualize, one of: ‘activation’ [default]: overlay activation map; ‘backprop’: overlay guided backpropagation result; ‘backprop_and_activation’: overlay the product of both maps; None: do not overlay anything on the original sample

  • show_base – if False, does not plot the image of the original sample [default: True]

  • alpha – opacity of the activation map overlay [default: 0.5]

  • color_cycle – iterable of colors for the activation maps - cycles through the list, using one color per class

  • gbp_normalization_q – guided backprop is normalized such that the q’th percentile of the map is 1 [default: 99]. This helps avoid gbp maps that are too dark to see. Lower values make brighter and noisier maps; higher values make darker and smoother maps.

Returns:

numpy array of shape [w, h, 3] representing the image with CAM heatmaps; if mode is None, returns the original sample; if show_base is False, returns just the heatmaps; if mode is None _and_ show_base is False, returns None

plot(class_subset=None, mode='activation', show_base=True, alpha=0.5, color_cycle=('#067bc2', '#43a43d', '#ecc30b', '#f37748', '#d56062'), figsize=None, plt_show=True, save_path=None, gbp_normalization_q=99)[source]

Plot per-class activation maps, guided backpropagation maps, or their products

Do not pass both mode=None and show_base=False.

Parameters:
  • class_subset – see create_rgb_heatmaps

  • mode – see create_rgb_heatmaps

  • show_base – see create_rgb_heatmaps

  • alpha – see create_rgb_heatmaps

  • color_cycle – see create_rgb_heatmaps

  • gbp_normalization_q – see create_rgb_heatmaps

  • figsize – the figure size for the plot [default: None]

  • plt_show – if True, runs plt.show() [default: True]

  • save_path – path to save image to [default: None does not save file]

Returns:

(fig, ax) of the matplotlib figure

Note: if base_image does not have 3 channels, channels are averaged then copied across 3 RGB channels to create a greyscale image

Note 2: to obtain a numpy array of the sample with the CAMs overlaid, instead of a matplotlib figure, use create_rgb_heatmaps()

opensoundscape.ml.cam.normalize_q(x, q=99)[source]

Normalize x such that q-th percentile value is 1.0
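For illustration, a minimal sketch of using normalize_q (the input array here is arbitrary):

```python
import numpy as np

from opensoundscape.ml.cam import normalize_q

x = np.random.rand(224, 224)
x_norm = normalize_q(x, q=99)
# the 99th-percentile value of x maps to 1.0; lower q yields brighter maps
```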

opensoundscape.ml.cnn module

classes for pytorch machine learning models in opensoundscape

For tutorials, see notebooks on opensoundscape.org

class opensoundscape.ml.cnn.BaseClassifier(*args: Any, **kwargs: Any)[source]

Bases: SpectrogramClassifier

alias for SpectrogramClassifier

improves compatibility with older code / previous opso versions, which had a BaseClassifier class as a parent to the CNN class

class opensoundscape.ml.cnn.BaseModule[source]

Bases: object

property classifier_params

return the parameters of the classifier layer of the network

override this method if the classifier parameters should be retrieved in a different way

configure_optimizers(reset_optimizer=False, restart_scheduler=False)[source]

standard Lightning method to initialize an optimizer and learning rate scheduler

Lightning uses this function at the start of training; it must return {“optimizer”: optimizer, “scheduler”: scheduler}.

Initializes the optimizer and learning rate scheduler using the parameters self.optimizer_params and self.scheduler_params, which are dictionaries with a key “class” and a key “kwargs” (containing a dictionary of keyword arguments to initialize the class with). We initialize the class with the kwargs and the appropriate first argument: optimizer=opt_cls(self.parameters(), **opt_kwargs) and scheduler=scheduler_cls(optimizer, **scheduler_kwargs)

You can also override this method and write one that returns {“optimizer”: optimizer, “scheduler”: scheduler}

Uses the attributes:

  • self.optimizer_params: dictionary with “class” key (such as torch.optim.Adam) and “kwargs”, a dict of keyword args for the class’s __init__

  • self.scheduler_params: dictionary with “class” key (such as torch.optim.lr_scheduler.StepLR) and “kwargs”, a dict of keyword args for the class’s __init__

  • self.lr_scheduler_step: int, number of times lr_scheduler.step() has been called
    • can set to -1 to restart the learning rate schedule

    • can set to another value to start the lr scheduler from an arbitrary position

Note: when used by lightning, self.optimizer and self.scheduler should not be modified directly, lightning handles these internally. Lightning will call the method without passing reset_optimizer or restart_scheduler, so default=False results in not modifying .optimizer or .scheduler

Documentation: https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.core.LightningModule.html#lightning.pytorch.core.LightningModule.configure_optimizers

Parameters:
  • reset_optimizer – if True, initializes the optimizer from scratch even if self.optimizer is not None

  • restart_scheduler – if True, initializes the scheduler from scratch even if self.scheduler is not None

Returns:

dictionary with keys “optimizer” and “scheduler” containing the optimizer and learning rate scheduler objects to use during training
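As a sketch, setting these attribute dictionaries before training (assumes model is an instance of a BaseModule subclass; the SGD/StepLR choices are illustrative, and the scheduler attribute is documented later in this class as lr_scheduler_params):

```python
import torch

model.optimizer_params = {
    "class": torch.optim.SGD,
    "kwargs": {"lr": 0.01, "momentum": 0.9},
}
model.lr_scheduler_params = {
    "class": torch.optim.lr_scheduler.StepLR,
    "kwargs": {"step_size": 10, "gamma": 0.5},
}
# configure_optimizers() then builds, in effect:
#   optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
#   scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
```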

inference_dataloader_cls

a DataLoader class to use for inference, defaults to SafeAudioDataloader

loss_fn

specify a loss function to use for training (e.g. BCEWithLogitsLoss_hot), by initializing a callable object or passing a function

lr_scheduler_params

dictionary with “class” and “kwargs” to class.__init__

for example, to use Cosine Annealing, set:

```python
model.lr_scheduler_params = {
    "class": torch.optim.lr_scheduler.CosineAnnealingLR,
    "kwargs": {
        "T_max": n_epochs,
        "eta_min": 1e-7,
        "last_epoch": self.current_epoch - 1,
    },
}
```

Type:

learning rate schedule

network

a pytorch Module such as Resnet18 or a custom object

optimizer_params

dictionary with “class” and “kwargs” to class.__init__

for example, to use the Adam optimizer, set:

```python
my_instance.optimizer_params = {
    "class": torch.optim.Adam,
    "kwargs": {
        "lr": 0.001,
        "weight_decay": 0.0005,
    },
}
```

Type:

optimizer settings

predict_dataloader(samples, collate_fn=<function collate_audio_samples>, **kwargs)[source]

generate dataloader for inference (predict/validate/test)

Args: see self.inference_dataloader_cls docstring for arguments

**kwargs: any arguments to pass to the DataLoader __init__

Note: these arguments are fixed and should not be passed in kwargs:
  • shuffle=False: retain original sample order

preprocessor

an instance of BasePreprocessor or subclass that preprocesses audio samples into tensors

The preprocessor contains .pipeline, an ordered set of Actions to run

preprocessor will have attributes .sample_duration (seconds) and .height, .width, .channels for output shape (input shape to self.network)

The pipeline can be modified by adding or removing actions, and by modifying parameters:

```python
my_obj.preprocessor.remove_action('add_noise')
my_obj.preprocessor.insert_action(
    'add_noise', Action(my_function), after_key='frequency_mask'
)
```

Or, the preprocessor can be replaced with a different or custom preprocessor, for instance:

```python
from opensoundscape.preprocess import AudioPreprocessor

# this preprocessor returns 1d arrays of the audio signal
my_obj.preprocessor = AudioPreprocessor(sample_duration=5, sample_rate=22050)
```

scheduler

torch.optim.lr_scheduler object for learning rate scheduling

score_metric

choose one of the keys in self.torch_metrics to use as the overall score metric

this metric will be used to determine the best model during training

torch_metrics

specify torchmetrics as “name”: object pairs to compute metrics during training/validation

train_dataloader(samples, bypass_augmentations=False, collate_fn=<function collate_audio_samples>, **kwargs)[source]

generate dataloader for training

train_loader samples batches of images + labels from training set

Args: see self.train_dataloader_cls docstring for arguments

**kwargs: any arguments to pass to the DataLoader __init__

Note: some arguments are fixed and should not be passed in kwargs:
  • shuffle=True: shuffle samples for training
  • bypass_augmentations=False: apply augmentations to training samples

train_dataloader_cls

a DataLoader class to use for training, defaults to SafeAudioDataloader

training_step(samples, batch_idx)[source]

a standard Lightning method used within the training loop, acting on each batch

returns loss

Effects:

logs metrics and loss to the current logger

use_amp

if True, uses automatic mixed precision for training

validation_step(samples, batch_idx, dataloader_idx=0)[source]

currently only used for lightning

not used by SpectrogramClassifier

class opensoundscape.ml.cnn.CNN(*args: Any, **kwargs: Any)[source]

Bases: SpectrogramClassifier

alias for SpectrogramClassifier

improves compatibility with older code / previous opso versions

exception opensoundscape.ml.cnn.ChannelDimCheckError[source]

Bases: Exception

class opensoundscape.ml.cnn.InceptionV3(*args: Any, **kwargs: Any)[source]

Bases: SpectrogramClassifier

Child of SpectrogramClassifier class for InceptionV3 architecture

classmethod from_torch_dict()[source]
training_step(samples, batch_idx)[source]

Training step for pytorch lightning

Parameters:
  • batch – a batch of data from the DataLoader

  • batch_idx – index of the batch

Returns:

loss value for the batch

Return type:

loss

class opensoundscape.ml.cnn.SpectrogramClassifier(*args: Any, **kwargs: Any)[source]

Bases: SpectrogramModule, Module

current_epoch

track number of trained epochs

property device
embed(samples, target_layer=None, progress_bar=True, return_preds=False, avgpool=True, return_dfs=True, audio_root=None, **dataloader_kwargs)[source]

Generate embeddings (intermediate layer outputs) for audio files/clips

Note: to capture embeddings on multiple layers, use self.__call__ with intermediate_layers argument directly. This wrapper only allows one target_layer.

Note: Output can be n-dimensional array (return_dfs=False) or pd.DataFrame with multi-index like .predict() (return_dfs=True). If avgpool=False, return_dfs is forced to False since we can’t create a DataFrame with >2 dimensions.

Parameters:
  • samples – same as CNN.predict(): list of file paths, OR pd.DataFrame with index containing audio file paths, OR a pd.DataFrame with multi-index (file, start_time, end_time)

  • target_layer – layer from self.model._modules to extract outputs from - if None, attempts to use self.model.embedding_layer as default

  • progress_bar – bool, if True, shows a progress bar with tqdm [default: True]

  • return_preds – bool, if True, returns two outputs (embeddings, logits)

  • avgpool – bool, if True, applies global average pooling to intermediate outputs i.e. averages across all dimensions except first to get a 1D vector per sample

  • return_dfs – bool, if True, returns embeddings as pd.DataFrame with multi-index like .predict(); if False, returns np.array of embeddings [default: True]. If avgpool=False, overrides to return np.array since we can’t have a df with >2 dimensions

  • audio_root – optionally pass a root directory (pathlib.Path or str) - audio_root is prepended to each file path - if None (default), samples must contain full paths to files

  • **dataloader_kwargs – passed to self.predict_dataloader()

Returns: (embeddings, preds) if return_preds=True or embeddings if return_preds=False

types are pd.DataFrame if return_dfs=True, or np.array if return_dfs=False
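A minimal usage sketch, assuming model is a trained classifier and the (hypothetical) file paths exist:

```python
# one row per clip, one column per embedding dimension
embeddings = model.embed(['audio1.wav', 'audio2.wav'])

# also return the model's logits alongside the embeddings
embeddings, preds = model.embed(['audio1.wav'], return_preds=True)
```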

eval(targets=None, scores=None, reset_metrics=True)[source]

compute single-target or multi-target metrics from targets and scores

Or, compute metrics on accumulated values in the TorchMetrics if targets is None

By default, the overall model score is “map” (mean average precision) for multi-target models (self.single_target=False) and “f1” (average of f1 score across classes) for single-target models.

update self.torch_metrics to include the desired metrics

Parameters:
  • targets – 0/1 labels for each sample and each class; if targets is None, runs metric.compute() on each of self.torch_metrics (using accumulated values)

  • scores – continuous values in [0, 1] for each sample and class; ignored if targets is None

  • reset_metrics – if True, resets the metrics after computing them [default: True]

Returns:

dictionary of metrics (name: value)

Raises:

AssertionError – if targets are outside of range [0,1]
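A sketch of computing metrics from explicit targets and scores (the values are illustrative):

```python
import numpy as np

targets = np.array([[1, 0], [0, 1]])         # 0/1 labels: rows are samples, columns are classes
scores = np.array([[0.9, 0.2], [0.1, 0.8]])  # continuous scores in [0, 1]
metrics = model.eval(targets=targets, scores=scores)  # dict of metric name: value
```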

generate_cams(samples, method='gradcam', classes=None, target_layers=None, guided_backprop=False, progress_bar=True, **kwargs)[source]

Generate activation and/or backprop heatmaps for each sample

Parameters:
  • samples – (same as CNN.predict()) the files to generate predictions for. Can be: - a dataframe with index containing audio paths, OR - a dataframe with multi-index (file, start_time, end_time), OR - a list (or np.ndarray) of audio file paths

  • method

method to use for activation map. Can be a str (choose from below), a class from pytorch_grad_cam (any subclass of BaseCAM), or None. If None, activation maps will not be created [default: ’gradcam’]

    str can be any of the following:

    ”gradcam”: pytorch_grad_cam.GradCAM
    “hirescam”: pytorch_grad_cam.HiResCAM
    “scorecam”: pytorch_grad_cam.ScoreCAM
    “gradcam++”: pytorch_grad_cam.GradCAMPlusPlus
    “ablationcam”: pytorch_grad_cam.AblationCAM
    “xgradcam”: pytorch_grad_cam.XGradCAM
    “eigencam”: pytorch_grad_cam.EigenCAM
    “eigengradcam”: pytorch_grad_cam.EigenGradCAM
    “layercam”: pytorch_grad_cam.LayerCAM
    “fullgrad”: pytorch_grad_cam.FullGrad
    “gradcamelementwise”: pytorch_grad_cam.GradCAMElementWise

  • classes (list) – list of classes, will create maps for each class [default: None] if None, creates an activation map for the highest scoring class on a sample-by-sample basis

  • target_layers (list) –

    list of target layers for GradCAM - if None [default], attempts to use the architecture’s default target_layer Note: only architectures created with opensoundscape 0.9.0+ will have a default target layer. See pytorch_grad_cam docs for suggestions. Note: if multiple layers are provided, the activations are merged across layers (rather than returning separate activations per layer)

  • guided_backprop – bool [default: False] if True, performs guided backpropagation for each class in classes. AudioSamples will have attribute .gbp_maps, a pd.Series indexed by class name

  • **kwargs – passed to SafeAudioDataloader (incl: batch_size, num_workers, split_files_into_clips, bypass_augmentations, raise_errors, overlap_fraction, final_clip, other DataLoader args)

Returns:

a list of AudioSample objects with a .cam attribute, an instance of the CAM class (visualize with sample.cam.plot()). See the CAM class for more details

See pytorch_grad_cam documentation for references to the source of each method.
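A usage sketch, assuming model is trained and the (hypothetical) file path exists:

```python
samples = model.generate_cams(
    ['audio1.wav'], method='gradcam', classes=model.classes, guided_backprop=True
)
samples[0].cam.plot(mode='backprop_and_activation')
```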

generate_samples(samples, invalid_samples_log=None, return_invalid_samples=False, audio_root=None, **dataloader_kwargs)[source]

Generate AudioSample objects. Input options same as .predict()

Parameters:
  • samples – (same as CNN.predict()) the files to generate predictions for. Can be: - a dataframe with index containing audio paths, OR - a dataframe with multi-index (file, start_time, end_time), OR - a list (or np.ndarray) of audio file paths - a single file path as str or pathlib.Path

  • see .predict() documentation for other arguments

  • **dataloader_kwargs – any arguments to inference_dataloader_cls.__init__ except samples (uses samples) and collate_fn (uses identity) (Note: default class is SafeAudioDataloader)

Returns:

a list of AudioSample objects - if return_invalid_samples is True, returns second value: list of paths to samples that failed to preprocess

Example:

```python
from opensoundscape.preprocess.utils import show_tensor_grid

samples = model.generate_samples(['/path/file1.wav', '/path/file2.wav'])
tensors = [s.data for s in samples]
show_tensor_grid(tensors, columns=3)
```

classmethod load(path, unpickle=True)[source]

load a model saved using CNN.save()

Parameters:
  • path – path to file saved using CNN.save()

  • unpickle – if True, passes weights_only=False to torch.load(). This is necessary if the model was saved with pickle=True, which saves the entire model object. If unpickle=False, this function will work if the model was saved with pickle=False, but will raise an error if the model was saved with pickle=True. [default: True]

Returns:

new CNN instance

Note: if you used pickle=True when saving, the model object might not load properly across different versions of OpenSoundscape.

load_weights(path, strict=True)[source]

load network weights state dict from a file

For instance, load weights saved with .save_weights(). This is an in-place operation.

Parameters:
  • path – file path with saved weights

  • strict – (bool) see torch.nn.Module.load_state_dict()

log_file

specify a path to save output to a text file

logging_level

amount of logging to self.log_file. 0 for nothing, 1,2,3 for increasing logged info

loss_hist

dictionary of epoch: mean batch loss during training

name = 'SpectrogramClassifier'
predict(samples, batch_size=1, num_workers=0, activation_layer=None, split_files_into_clips=True, clip_overlap=None, clip_overlap_fraction=None, clip_step=None, overlap_fraction=None, final_clip=None, bypass_augmentations=True, invalid_samples_log=None, raise_errors=False, wandb_session=None, return_invalid_samples=False, progress_bar=True, audio_root=None, **dataloader_kwargs)[source]

Generate predictions on a set of samples

Return dataframe of model output scores for each sample. Optional activation layer for scores (softmax, sigmoid, softmax then logit, or None)

Parameters:
  • samples – the files to generate predictions for. Can be: - a dataframe with index containing audio paths, OR - a dataframe with multi-index (file, start_time, end_time), OR - a list (or np.ndarray) of audio file paths - a single file path (str or pathlib.Path)

  • batch_size – Number of files to load simultaneously [default: 1]

  • num_workers – parallelization (ie cpus or cores), use 0 for current process [default: 0]

  • activation_layer – Optionally apply an activation layer such as sigmoid or softmax to the raw outputs of the model. options: - None: no activation, return raw scores (ie logit, [-inf:inf]) - ‘softmax’: scores all classes sum to 1 - ‘sigmoid’: all scores in [0,1] but don’t sum to 1 - ‘softmax_and_logit’: applies softmax first then logit [default: None]

  • split_files_into_clips – If True, internally splits and predicts on clips from longer audio files Otherwise, assumes each row of samples corresponds to one complete sample

  • clip_overlap_fraction – see opensoundscape.utils.generate_clip_times_df

  • clip_overlap – see opensoundscape.utils.generate_clip_times_df

  • clip_step – see opensoundscape.utils.generate_clip_times_df

  • final_clip – see opensoundscape.utils.generate_clip_times_df

  • overlap_fraction – deprecated alias for clip_overlap_fraction

  • bypass_augmentations – If False, Actions with is_augmentation==True are performed. Default True.

  • invalid_samples_log – if not None, samples that failed to preprocess will be listed in this text file.

  • raise_errors – if True, raise errors when preprocessing fails; if False, just log the errors to invalid_samples_log

  • wandb_session – a wandb session to log to - pass the value returned by wandb.init() to log progress to a Weights and Biases run - if None, does not log to wandb

  • return_invalid_samples – bool, if True, returns second argument, a set containing file paths of samples that caused errors during preprocessing [default: False]

  • progress_bar – bool, if True, shows a progress bar with tqdm [default: True]

  • audio_root – optionally pass a root directory (pathlib.Path or str) - audio_root is prepended to each file path - if None (default), samples must contain full paths to files

  • **dataloader_kwargs – additional arguments to self.predict_dataloader()

Returns:

df of post-activation_layer scores - if return_invalid_samples is True, returns (df,invalid_samples) where invalid_samples is a set of file paths that failed to preprocess

Effects:

(1) wandb logging: If wandb_session is provided, logs progress and samples to Weights and Biases. A random set of samples is preprocessed and logged to a table. Progress over all batches is logged. After prediction, top-scoring samples are logged. Use the self.wandb_logging dictionary to change the number of samples logged or which classes have top-scoring samples logged.

(2) invalid sample logging: If invalid_samples_log is not None, saves a list of all file paths that failed to preprocess to invalid_samples_log as a text file

Note: if loading an audio file raises a PreprocessingError, the scores for that sample will be np.nan
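A minimal usage sketch with hypothetical file paths:

```python
scores = model.predict(
    ['audio1.wav', 'audio2.wav'], activation_layer='sigmoid', batch_size=32
)
# scores is a DataFrame with a (file, start_time, end_time) multi-index
# and one column of post-activation scores per class
```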

run_validation(validation_df, progress_bar=True, **kwargs)[source]

run validation on a validation set

override this to customize the validation step; e.g., run validation on multiple datasets and save the performance of each in self.valid_metrics[current_epoch][validation_dataset_name]

Parameters:
  • validation_df – dataframe of validation samples

  • progress_bar – if True, show a progress bar with tqdm

  • **kwargs – passed to self.predict_dataloader()

Returns:

dictionary of evaluation metrics calculated with self.torch_metrics

Return type:

metrics

Effects:

updates self.valid_metrics[current_epoch] with metrics for the current epoch

save(path, save_hooks=False, pickle=False)[source]

save model with weights using torch.save()

load from saved file with cnn.load_model(path)

Parameters:
  • path – file path for saved model object

  • save_hooks – retain forward and backward hooks on modules [default: False] Note: True can cause issues when using wandb.watch()

  • pickle – if True, saves the entire model object using torch.save(). Note: with pickle=True, the entire object is pickled, which means that saving and loading model objects across OpenSoundscape versions might not work properly. pickle=True is useful for resuming training, because it retains the state of the optimizer, scheduler, loss function, etc. pickle=False is recommended for saving models for inference/deployment/sharing [default: False]
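A sketch of a save/load round trip using the options above (the file names are illustrative):

```python
from opensoundscape.ml.cnn import SpectrogramClassifier

model.save('my_model.model')                # weights + metadata; portable across versions
reloaded = SpectrogramClassifier.load('my_model.model')

model.save('resumable.model', pickle=True)  # entire object; useful for resuming training
reloaded = SpectrogramClassifier.load('resumable.model', unpickle=True)
```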

save_weights(path)[source]

save just the weights of the network

This allows the saved weights to be used more flexibly than model.save() which will pickle the entire object. The weights are saved in a pickled dictionary using torch.save(self.network.state_dict())

Parameters:

path – location to save weights file

train(train_df, validation_df=None, epochs=1, batch_size=1, num_workers=0, save_path='.', save_interval=1, log_interval=10, validation_interval=1, reset_optimizer=False, restart_scheduler=False, invalid_samples_log='./invalid_training_samples.log', raise_errors=False, wandb_session=None, progress_bar=True, audio_root=None, **dataloader_kwargs)[source]

train the model on samples from train_dataset

If customized loss functions, networks, optimizers, or schedulers are desired, modify the respective attributes before calling .train().

Parameters:
  • train_df – a dataframe of files and labels for training the model - either has index file or multi-index (file,start_time,end_time)

  • validation_df – a dataframe of files and labels for evaluating the model [default: None means no validation is performed]

  • epochs – number of epochs to train for (1 epoch constitutes 1 view of each training sample)

  • batch_size – number of training files simultaneously passed through forward pass, loss function, and backpropagation

  • num_workers – number of parallel CPU tasks for preprocessing Note: use 0 for single (root) process (not 1)

  • save_path – location to save intermediate and best model objects [default=”.”, ie current location of script]

  • save_interval – interval in epochs to save model object with weights [default:1] Note: the best model is always saved to best.model in addition to other saved epochs.

  • log_interval – interval in batches to print training loss/metrics

  • validation_interval – interval in epochs to test the model on the validation set Note that the model will only update its best score and save the best.model file on epochs when it performs validation.

  • reset_optimizer – if True, resets the optimizer rather than retaining state_dict of self.optimizer [default: False]

  • restart_scheduler – if True, resets the learning rate scheduler rather than retaining state_dict of self.scheduler [default: False]

  • invalid_samples_log – file path: log all samples that failed in preprocessing (file written when training completes) - if None, does not write a file

  • raise_errors – if True, raise errors when preprocessing fails; if False, just log the errors to invalid_samples_log

  • wandb_session – a wandb session to log to - pass the value returned by wandb.init() to log progress to a Weights and Biases run - if None, does not log to wandb For example:

```python
import wandb

wandb.login(key=api_key)  # find your api_key at https://wandb.ai/settings
session = wandb.init(entity='mygroup', project='project1', name='first_run')
...
model.train(..., wandb_session=session)
session.finish()
```

  • audio_root – optionally pass a root directory (pathlib.Path or str) - audio_root is prepended to each file path - if None (default), samples must contain full paths to files

  • progress_bar – bool, if True, shows a progress bar with tqdm [default: True]

  • **dataloader_kwargs – additional arguments passed to train_dataloader()

Effects:

If wandb_session is provided, logs progress and samples to Weights and Biases. A random set of training and validation samples are preprocessed and logged to a table. Training progress, loss, and metrics are also logged. Use self.wandb_logging dictionary to change the number of samples logged.
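A minimal training sketch; assumes a hypothetical train_labels.csv with columns file, start_time, end_time followed by one 0/1 column per class:

```python
import pandas as pd

train_df = pd.read_csv('train_labels.csv', index_col=[0, 1, 2])
model.train(
    train_df,
    epochs=5,
    batch_size=64,
    num_workers=4,
    save_path='./model_checkpoints',
)
```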

verbose

amount of logging to stdout. 0 for nothing, 1,2,3 for increasing printed output

class opensoundscape.ml.cnn.SpectrogramModule(architecture, classes, sample_duration, single_target=False, preprocessor_dict=None, preprocessor_cls=<class 'opensoundscape.preprocess.preprocessors.SpectrogramPreprocessor'>, **preprocessor_kwargs)[source]

Bases: BaseModule

Parent class for both SpectrogramClassifier (pytorch) and LightningSpectrogramModule (lightning)

implements functionality that is shared between both pure PyTorch and Lightning classes/workflows

Parameters:
  • architecture – a pytorch Module such as Resnet18 or a custom object

  • classes – list of class names

  • sample_duration – duration of audio samples in seconds

  • single_target – if True, predict only class with max score

  • channels – number of channels in input data

  • sample_height – height of input data

  • sample_width – width of input data

  • preprocessor_dict – dictionary defining preprocessor and parameters, can be generated with preprocessor.to_dict() if not None, will override other preprocessor arguments (sample_duration, sample_height, sample_width, channels)

  • preprocessor_cls – a class object that inherits from BasePreprocessor if preprocessor_dict is None, this class will be instantiated to set self.preprocessor

  • **preprocessor_kwargs – additional arguments to pass to the initialization of the preprocessor class this is ignored if preprocessor_dict is not None

change_classes(new_classes)[source]

change the classes that the model predicts

replaces the network’s final linear classifier layer with a new layer with random weights and the correct number of output features

will raise an error if self.network.classifier_layer is not the name of a torch.nn.Linear layer, since we don’t know how to replace it otherwise

Parameters:

new_classes – list of class names

property classifier

return the classifier layer of the network, based on .network.classifier_layer string

freeze_feature_extractor()[source]

freeze all layers except self.classifier

prepares the model for transfer learning where only the classifier is trained

uses the attribute self.network.classifier_layer (via the .classifier attribute) to identify the classifier layer

if this attribute is not set, raises an Exception - use freeze_layers_except() instead

freeze_layers_except(train_layers=None)[source]

Freeze all parameters of a model except the parameters in train_layers

Freezing parameters means that the optimizer will not update the weights

Modifies the model in place!

Parameters:
  • train_layers – layer or list/iterable of the layers whose parameters should not be frozen For example: pass model.classifier to train only the classifier

Example 1: ` model.freeze_layers_except(model.classifier) `

Example 2: freeze all but 2 layers ` model.freeze_layers_except([model.network.layer1, model.network.layer2]) `

lr_scheduler_step

track number of calls to lr_scheduler.step()

set to -1 to restart learning rate schedule from initial lr

this value is used to initialize the lr_scheduler’s last_epoch parameter it is tracked separately from self.current_epoch because the lr_scheduler might be stepped more or less than 1 time per epoch

Note that the initial learning rate is set via self.optimizer_params[‘kwargs’][‘lr’]

network

a pytorch Module such as Resnet18 or a custom object

for convenience, __init__ also allows user to provide string matching a key from opensoundscape.ml.cnn_architectures.ARCH_DICT.

List options: opensoundscape.ml.cnn_architectures.list_architectures()

property single_target
unfreeze()[source]

Unfreeze all layers & parameters of self.network

Enables gradient updates for all layers & parameters

Modifies the object in place

opensoundscape.ml.cnn.get_channel_dim(model)[source]
opensoundscape.ml.cnn.list_model_classes()[source]

return a list of available model classes (the entries of MODEL_CLS_DICT; see register_model_cls())

opensoundscape.ml.cnn.load_model(path, device=None, unpickle=True)[source]

load a saved model object

This function handles models saved either as pickled objects or as a dictionary including weights, preprocessing parameters, architecture name, etc.

Note that pickled objects may not load properly across different versions of OpenSoundscape, while the dictionary format does not retain the full training state for resuming model training.

Parameters:
  • path – file path of saved model

  • device – which device to load into, eg ‘cuda:1’ [default: None] will choose first gpu if available, otherwise cpu

  • unpickle – if True, passes weights_only=False to torch.load(). This is necessary if the model was saved with pickle=True, which saves the entire model object. If unpickle=False, this function will work if the model was saved with pickle=False, but will raise an error if the model was saved with pickle=True. [default: True]

Returns:

a model object with loaded weights

opensoundscape.ml.cnn.register_model_cls(model_cls)[source]

add class to MODEL_CLS_DICT

this allows us to recreate the class when loading saved model file with load_model()

opensoundscape.ml.cnn.use_resample_loss(model, train_df)[source]

Modify a model to use ResampleLoss for multi-target training

ResampleLoss may perform better than BCE Loss for multitarget problems in some scenarios.

Parameters:
  • model – CNN object

  • train_df – dataframe of labels, used to calculate class frequency

opensoundscape.ml.cnn_architectures module

Module to initialize PyTorch CNN architectures with custom output shape

This module allows the use of several built-in CNN architectures from PyTorch. The architecture refers to the specific layers and layer input/output shapes (including convolution sizes and strides, etc) - such as the ResNet18 or Inception V3 architecture.

We provide wrappers which modify the output layer to the desired shape (to match the number of classes). The way to change the output layer shape depends on the architecture, which is why we need a wrapper for each one. This code is based on pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html

To use these wrappers, for example, if your model has 10 output classes, write

my_arch = resnet18(10)

Then you can initialize a model object from opensoundscape.ml.cnn with your architecture:

model = CNN(my_arch, classes, sample_duration)

or override an existing model’s architecture:

model.network = my_arch

Note: the InceptionV3 architecture must be used differently than other architectures - the easiest way is to simply use the InceptionV3 class in opensoundscape.ml.cnn.
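Putting the pieces above together (the class list is illustrative):

```python
from opensoundscape.ml.cnn import CNN
from opensoundscape.ml.cnn_architectures import resnet18

my_arch = resnet18(num_classes=2, num_channels=1)  # 2 output classes, 1-channel input
model = CNN(my_arch, classes=['absent', 'present'], sample_duration=3.0)
```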

opensoundscape.ml.cnn_architectures.alexnet(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for AlexNet architecture

input size = 224

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), entire network will have gradients and can train if True, feature block is frozen and only final layer is trained

  • weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – specify the number of channels in the input sample, e.g. 3 for a sample shape of [channels, h, w]

opensoundscape.ml.cnn_architectures.change_conv2d_channels(conv2d, num_channels=3, reuse_weights=True)[source]

Modify the number of input channels for a pytorch CNN

This function changes the input shape of a torch.nn.Conv2D layer to accommodate a different number of channels. It attempts to retain weights in the following manner: - If num_channels is less than the original, it will average weights across the original channels and apply them to all new channels. - if num_channels is greater than the original, it will cycle through the original channels, copying them to the new channels

Parameters:
  • conv2d – the torch.nn.Conv2d layer to modify

  • num_channels – desired number of input channels for the new Conv2d layer [default: 3]

  • reuse_weights – if True (default), averages the original channels’ weights (if num_channels < original) or cycles through them (if num_channels > original) and applies them to the new Conv2d
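A sketch, assuming the function returns the modified layer:

```python
import torch.nn as nn

from opensoundscape.ml.cnn_architectures import change_conv2d_channels

conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
one_channel_conv = change_conv2d_channels(conv, num_channels=1)
# with reuse_weights=True, the 3 original input channels' weights are
# averaged into the single new input channel
```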

opensoundscape.ml.cnn_architectures.change_fc_output_size(fc, num_classes)[source]

Modify the number of output nodes of a fully connected layer

Parameters:
  • fc – the fully connected layer of the model that should be modified

  • num_classes – number of output nodes for the new fc

opensoundscape.ml.cnn_architectures.densenet121(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for densenet121 architecture

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), entire network will have gradients and can train if True, feature block is frozen and only final layer is trained

  • weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – specify the number of channels in the input sample, e.g. 3 for a sample shape of [channels, h, w]

opensoundscape.ml.cnn_architectures.efficientnet_b0(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for efficientnet_b0 architecture

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), entire network will have gradients and can train if True, feature block is frozen and only final layer is trained

  • weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – specify the number of channels in the input sample, e.g. 3 for a sample shape of [channels, h, w]

Note: in v0.10.2, changed from the NVIDIA/DeepLearningExamples:torchhub repo implementation to the native pytorch implementation

opensoundscape.ml.cnn_architectures.efficientnet_b4(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for efficientnet_b4 architecture

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), entire network will have gradients and can train if True, feature block is frozen and only final layer is trained

  • weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – specify the number of channels in the input sample, e.g. 3 for a sample shape of [channels, h, w]

Note: in v0.10.2, changed from the NVIDIA/DeepLearningExamples:torchhub repo implementation to the native pytorch implementation

opensoundscape.ml.cnn_architectures.freeze_params(model)[source]

disable gradient updates for all model parameters

opensoundscape.ml.cnn_architectures.generic_make_arch(constructor, weights, num_classes, embed_layer, cam_layer, name, input_conv2d_layer, linear_clf_layer, freeze_feature_extractor=False, num_channels=3)[source]

works when first layer is conv2d and last layer is fully-connected Linear

input_size = 224

Parameters:
  • constructor – function that creates a torch.nn.Module and takes weights argument

  • weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html Passed to constructor()

  • num_classes – number of output nodes for the final layer

  • embed_layer – specify which layer’s outputs should be accessed for “embeddings”

  • cam_layer – specify a default layer for GradCAM/etc visualizations

  • name – name of the architecture, used for the constructor_name attribute to re-load from saved version

  • input_conv2d_layer – name of first Conv2D layer that can be accessed with .get_submodule() string formatted as .-delimited list of attribute names or list indices, e.g. “features.0”

  • linear_clf_layer – name of final Linear classification fc layer that can be accessed with .get_submodule() string formatted as .-delimited list of attribute names or list indices, e.g. “classifier.0.fc”

  • freeze_feature_extractor – if False (default), entire network will have gradients and can train if True, feature block is frozen and only final layer is trained

  • num_channels – specify the number of channels in the input sample, e.g. 3 for a sample shape of [channels, h, w]

opensoundscape.ml.cnn_architectures.inception_v3(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for Inception v3 architecture

input size = 299

WARNING: expects (299,299) sized images and has auxiliary output. See InceptionV3 class in opensoundscape.ml.cnn for use.

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), entire network will have gradients and can train if True, feature block is frozen and only final layer is trained

  • weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – specify the number of channels in the input sample, e.g. 3 for a sample shape of [channels, h, w]

opensoundscape.ml.cnn_architectures.list_architectures()[source]

return list of available architecture keyword strings

opensoundscape.ml.cnn_architectures.register_arch(func)[source]

add architecture to ARCH_DICT

opensoundscape.ml.cnn_architectures.resnet101(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for ResNet101 architecture

input_size = 224

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), entire network will have gradients and can train if True, feature block is frozen and only final layer is trained

  • weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – specify the number of channels in the input sample, e.g. 3 for a sample shape of [channels, h, w]

opensoundscape.ml.cnn_architectures.resnet152(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for ResNet152 architecture

input_size = 224

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), entire network will have gradients and can train if True, feature block is frozen and only final layer is trained

  • weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – specify the number of channels in the input sample, e.g. 3 for a sample shape of [channels, h, w]

opensoundscape.ml.cnn_architectures.resnet18(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for ResNet18 architecture

input_size = 224

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), entire network will have gradients and can train if True, feature block is frozen and only final layer is trained

  • weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – specify the number of channels in the input sample, e.g. 3 for a sample shape of [channels, h, w]

opensoundscape.ml.cnn_architectures.resnet34(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for ResNet34 architecture

input_size = 224

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), entire network will have gradients and can train if True, feature block is frozen and only final layer is trained

  • weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – specify the number of channels in the input sample, e.g. 3 for a sample shape of [channels, h, w]

opensoundscape.ml.cnn_architectures.resnet50(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for ResNet50 architecture

input_size = 224

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), entire network will have gradients and can train if True, feature block is frozen and only final layer is trained

  • weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – specify the number of channels in the input sample, e.g. 3 for a sample shape of [channels, h, w]

opensoundscape.ml.cnn_architectures.set_layer_from_name(module, layer_name, new_layer)[source]

assign an attribute of an object using a string name

Parameters:
  • module – object to assign attribute to

  • layer_name – string name of the attribute to assign; formatted with . delimiter, it can contain either attribute names or list indices, e.g. “network.classifier.0.0.fc” sets network.classifier[0][0].fc. This type of string is given by torch.nn.Module.named_modules()

  • new_layer – replace layer with this torch.nn.Module instance

  • see also: torch.nn.Module.named_modules(), torch.nn.Module.get_submodule()
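A sketch with a hypothetical layer name and shape:

```python
import torch.nn as nn

from opensoundscape.ml.cnn_architectures import set_layer_from_name

# replace the fc layer addressed by the dotted-name string with a new Linear layer
set_layer_from_name(model.network, 'classifier.0.fc', nn.Linear(512, 2))
```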

opensoundscape.ml.cnn_architectures.squeezenet1_0(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for squeezenet architecture

input size = 224

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), entire network will have gradients and can train if True, feature block is frozen and only final layer is trained

  • weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – specify the number of channels in the input sample, e.g. 3 for a sample shape of [channels, h, w]

opensoundscape.ml.cnn_architectures.unfreeze_params(model)[source]

enable gradient updates for all model parameters

opensoundscape.ml.cnn_architectures.vgg11_bn(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for vgg11_bn architecture

input size = 224

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), entire network will have gradients and can train if True, feature block is frozen and only final layer is trained

  • weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – specify the number of channels in the input sample, e.g. 3 for a sample shape of [channels, h, w]

opensoundscape.ml.dataloaders module

class opensoundscape.ml.dataloaders.SafeAudioDataloader(*args: Any, **kwargs: Any)[source]

Bases: DataLoader

Create DataLoader for inference, wrapping a SafeDataset

SafeDataset contains AudioFileDataset or AudioSampleDataset depending on sample type

During inference, we allow the user to pass any of 3 things to samples: - list of file paths - Dataframe with file as index - Dataframe with file, start_time, end_time of clips as index

If file is the index, the default split_files_into_clips=True means the dataloader will automatically determine the number of clips that can be created from each file (with overlap between subsequent clips based on clip_overlap_fraction)

Parameters:
  • samples – any of the following: - list of file paths - Dataframe with file as index - Dataframe with file, start_time, end_time of clips as index - CategoricalLabels object

  • preprocessor – preprocessor object, eg AudioPreprocessor or SpectrogramPreprocessor

  • split_files_into_clips=True – use AudioSplittingDataset to automatically split audio files into clips of the appropriate length

  • clip_overlap_fraction – see opensoundscape.utils.generate_clip_times_df

  • clip_overlap – see opensoundscape.utils.generate_clip_times_df

  • clip_step – see opensoundscape.utils.generate_clip_times_df

  • final_clip – see opensoundscape.utils.generate_clip_times_df

  • overlap_fraction – deprecated alias for clip_overlap_fraction

  • bypass_augmentations – if True, don’t apply any augmentations [default: True]

  • raise_errors – if True, raise errors during preprocessing [default: False]

  • collate_fn – function to collate a list of AudioSample objects into batches [default: identity, returns a list of AudioSample objects]; use collate_fn=opensoundscape.sample.collate_audio_samples to return a tuple of (data, labels) tensors

  • audio_root – optionally pass a root directory (pathlib.Path or str) - audio_root is prepended to each file path - if None (default), samples must contain full paths to files

  • **kwargs – any arguments to torch.utils.data.DataLoader

Returns:

DataLoader that returns lists of AudioSample objects when iterated (if collate_fn is identity)

preprocessor

do not override or modify this attribute, as it will have no effect

samples

do not override or modify this attribute, as it will have no effect
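A minimal sketch, borrowing the preprocessor from an existing model (the file paths are hypothetical):

```python
from opensoundscape.ml.dataloaders import SafeAudioDataloader

loader = SafeAudioDataloader(
    ['audio1.wav', 'audio2.wav'], model.preprocessor, batch_size=8, num_workers=0
)
for batch in loader:
    # with the default identity collate_fn, each batch is a list of AudioSample objects
    pass
```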

opensoundscape.ml.datasets module

Preprocessors: pd.Series child with an action sequence & forward method

class opensoundscape.ml.datasets.AudioFileDataset(*args: Any, **kwargs: Any)[source]

Bases: Dataset

Base class for audio datasets with OpenSoundscape (use in place of torch Dataset)

Custom Dataset classes should subclass this class or its children.

Datasets in OpenSoundscape contain a Preprocessor object which is responsible for the procedure of generating a sample for a given input. The DataLoader handles a dataframe of samples (and potentially labels) and uses a Preprocessor to generate samples from them.

Parameters:
  • samples

    the files to generate predictions for. Can be: - a dataframe with index containing audio paths, OR - a dataframe with multi-index of (path,start_time,end_time) per clip, OR - a list or np.ndarray of audio file paths

    Notes for input dataframe:
    • df must have audio paths in the index.

    • If the dataframe has labels, the class names should be the columns, and the values of each row should be 0 or 1.

    • If the data does not have labels, label_df will have no columns

  • preprocessor – an object of BasePreprocessor or its children which defines the operations to perform on input samples

  • audio_root – optionally pass a root directory (pathlib.Path or str) to prepend to each file path - if None (default), samples must contain full paths to files

Returns:

sample (AudioSample object)

Raises:

PreprocessingError – if an exception is raised during __getitem__

Effects:

self.invalid_samples will contain a set of paths that did not successfully produce a list of clips with start/end times, if split_files_into_clips=True

audio_root

path to prepend to all audio file paths when loading

bypass_augmentations

if True, skips Actions with .is_augmentation=True

class_counts()[source]

count number of each label

classes

list of classes to which multi-hot labels correspond

classmethod from_categorical_df(categorical_labels, preprocessor, class_list, bypass_augmentations=False)[source]

Create AudioFileDataset from a DataFrame with a column listing categorical labels

e.g. where df['labels'] = [['a','b'], [], ['a','c']]

Parameters:
  • categorical_labels – DataFrame with index (file) or (file, start_time, end_time) and ‘label’ column containing lists of labels or integers corresponding to class names

  • preprocessor – Preprocessor object

  • bypass_augmentations – if True, skip augmentations with .is_augmentation=True

Returns:

AudioFileDataset object

head(n=5)[source]

out-of-place copy of first n samples

performs df.head(n) on self.label_df

Parameters:
  • n – number of first samples to return [default: 5]; see pandas.DataFrame.head()

Returns:

a new dataset object

invalid_samples

set of file paths that raised exceptions during preprocessing

label_df

dataframe containing file paths, clip times, and multi-hot labels (one column per class)

preprocessor

Preprocessor object containing a .pipeline of ordered preprocessing operations

sample(**kwargs)[source]

out-of-place random sample

creates copy of object with n rows randomly sampled from label_df

Args: see pandas.DataFrame.sample()

Returns:

a new dataset object
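A sketch, assuming label_df (hypothetical) is a multi-hot label dataframe as described above:

```python
from opensoundscape.ml.datasets import AudioFileDataset

dataset = AudioFileDataset(label_df, model.preprocessor)
first = dataset[0]          # an AudioSample object
subset = dataset.head(10)   # new dataset containing the first 10 samples
```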

class opensoundscape.ml.datasets.AudioSplittingDataset(*args: Any, **kwargs: Any)[source]

Bases: AudioFileDataset

class to load clips of longer files rather than one sample per file

Internally creates even-lengthed clips split from long audio files.

If file labels are provided, applies copied labels to all clips from a file

NOTE: If you’ve already created a dataframe with clip start and end times, you can use AudioFileDataset. This class is only necessary if you wish to automatically split longer files into clips (providing only the file paths).

Parameters:
  • samples, preprocessor – passed to AudioFileDataset.__init__

  • **kwargs – passed to opensoundscape.utils.make_clip_df

exception opensoundscape.ml.datasets.InvalidIndexError[source]

Bases: Exception

opensoundscape.ml.lightning module

class opensoundscape.ml.lightning.LightningSpectrogramModule(*args: Any, **kwargs: Any)[source]

Bases: SpectrogramModule, LightningModule

fit_with_trainer(train_df, validation_df=None, epochs=1, batch_size=1, num_workers=0, save_path='.', invalid_samples_log='./invalid_training_samples.log', raise_errors=False, wandb_session=None, checkpoint_path=None, **kwargs)[source]

train the model on samples from train_dataset

If customized loss functions, networks, optimizers, or schedulers are desired, modify the respective attributes before calling .fit_with_trainer().

Parameters:
  • train_df – a dataframe of files and labels for training the model - either has index file or multi-index (file,start_time,end_time)

  • validation_df – a dataframe of files and labels for evaluating the model [default: None means no validation is performed]

  • batch_size – number of training files simultaneously passed through forward pass, loss function, and backpropagation

  • num_workers – number of parallel CPU tasks for preprocessing Note: use 0 for single (root) process (not 1)

  • save_path – location to save intermediate and best model objects [default=”.”, ie current location of script]

  • save_interval – interval in epochs to save model object with weights [default:1] Note: the best model is always saved to best.model in addition to other saved epochs.

  • log_interval – interval in batches to print training loss/metrics

  • validation_interval – interval in epochs to test the model on the validation set Note that the model will only update its best score and save the best.model file on epochs when it performs validation.

  • invalid_samples_log – file path: log all samples that failed in preprocessing (file written when training completes) - if None, does not write a file

  • raise_errors – if True, raise errors when preprocessing fails if False, just log the errors to unsafe_samples_log

  • wandb_session – a wandb session to log to (Note: can also pass a logger kwarg with any Lightning logger object) - pass the value returned by wandb.init() to log progress to a Weights and Biases run - if None, does not log to wandb For example:

```python
import wandb

wandb.login(key=api_key)  # find your api_key at https://wandb.ai/settings
session = wandb.init(entity='mygroup', project='project1', name='first_run')
...
model.fit_with_trainer(..., wandb_session=session)
session.finish()
```

  • **kwargs – any arguments to pytorch_lightning.Trainer(), such as accelerator, precision, logger, accumulate_grad_batches, etc. Note: the max_epochs kwarg is overridden by the epochs argument

Returns:

a trained pytorch_lightning.Trainer object

Effects:

If wandb_session is provided, logs progress and samples to Weights and Biases. A random set of training and validation samples is preprocessed and logged to a table. Training progress, loss, and metrics are also logged. Use the self.wandb_logging dictionary to change the number of samples logged.
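For example, a hedged usage sketch (the dataframes, paths, and Trainer kwargs shown are illustrative assumptions):

```
# assuming `model` is a LightningSpectrogramModule and train_df / val_df
# are label dataframes as described above
trainer = model.fit_with_trainer(
    train_df,
    validation_df=val_df,
    epochs=10,
    batch_size=64,
    num_workers=4,
    save_path="./model_checkpoints",
    accelerator="auto",  # example kwarg forwarded to lightning.Trainer
)
```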

forward(samples)[source]

standard Lightning method defining action to take on each batch for inference

typically returns logits (raw, untransformed model outputs)

load_weights(path, strict=True)[source]

load network weights state dict from a file

For example, load weights saved with .save_weights(). This is an in-place operation.

Parameters:
  • path – file path with saved weights

  • strict – (bool) see torch.nn.Module.load_state_dict()

predict_with_trainer(samples, batch_size=1, num_workers=0, activation_layer=None, split_files_into_clips=True, clip_overlap=None, clip_overlap_fraction=None, clip_step=None, overlap_fraction=None, final_clip=None, bypass_augmentations=True, invalid_samples_log=None, raise_errors=False, return_invalid_samples=False, lightning_trainer_kwargs=None, dataloader_kwargs=None)[source]

Generate predictions on a set of samples

Return dataframe of model output scores for each sample. Optional activation layer for scores (softmax, sigmoid, softmax then logit, or None)

Parameters:
  • samples – the files to generate predictions for. Can be: - a dataframe with index containing audio paths, OR - a dataframe with multi-index (file, start_time, end_time), OR - a list (or np.ndarray) of audio file paths, OR - a single file path (str or pathlib.Path)

  • batch_size – Number of files to load simultaneously [default: 1]

  • num_workers – parallelization (ie cpus or cores), use 0 for current process [default: 0]

  • activation_layer – Optionally apply an activation layer such as sigmoid or softmax to the raw outputs of the model. options: - None: no activation, return raw scores (ie logit, [-inf:inf]) - ‘softmax’: scores all classes sum to 1 - ‘sigmoid’: all scores in [0,1] but don’t sum to 1 - ‘softmax_and_logit’: applies softmax first then logit [default: None]

  • split_files_into_clips – If True, internally splits and predicts on clips from longer audio files Otherwise, assumes each row of samples corresponds to one complete sample

  • clip_overlap_fraction – see opensoundscape.utils.generate_clip_times_df

  • clip_overlap – see opensoundscape.utils.generate_clip_times_df

  • clip_step – see opensoundscape.utils.generate_clip_times_df

  • final_clip – see opensoundscape.utils.generate_clip_times_df

  • overlap_fraction – deprecated alias for clip_overlap_fraction

  • bypass_augmentations – If False, Actions with is_augmentation==True are performed. Default True.

  • invalid_samples_log – if not None, samples that failed to preprocess will be listed in this text file.

  • raise_errors – if True, raise errors when preprocessing fails; if False, just log the errors to invalid_samples_log

  • wandb_session – a wandb session to log to - pass the value returned by wandb.init() to progress log to a Weights and Biases run - if None, does not log to wandb

  • return_invalid_samples – bool, if True, returns second argument, a set containing file paths of samples that caused errors during preprocessing [default: False]

  • lightning_trainer_kwargs – dictionary of keyword args to pass to __call__, which are then passed to lightning.Trainer.__init__; see lightning.Trainer documentation for options. [Default: None] passes no kwargs

  • dataloader_kwargs – dictionary of keyword args to self.predict_dataloader()

Returns:

df of post-activation_layer scores - if return_invalid_samples is True, returns (df,invalid_samples) where invalid_samples is a set of file paths that failed to preprocess

Effects:

(1) wandb logging If wandb_session is provided, logs progress and samples to Weights and Biases. A random set of samples is preprocessed and logged to a table. Progress over all batches is logged. After prediction, top scoring samples are logged. Use self.wandb_logging dictionary to change the number of samples logged or which classes have top-scoring samples logged.

(2) invalid sample logging If invalid_samples_log is not None, saves a list of all file paths that failed to preprocess to invalid_samples_log as a text file

Note: if loading an audio file raises a PreprocessingError, the scores for that sample will be np.nan
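For example, a hedged usage sketch (the file list and kwargs are illustrative):

```
# assuming `model` is a LightningSpectrogramModule and audio_files is a
# list of audio file paths; long files are split into clips internally
scores = model.predict_with_trainer(
    audio_files,
    batch_size=64,
    num_workers=4,
    activation_layer="sigmoid",  # per-class scores in [0, 1]
    lightning_trainer_kwargs={"accelerator": "auto"},
)
# scores: dataframe indexed by (file, start_time, end_time), one column per class
```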

save(path, save_hooks=False, weights_only=False)[source]

save model with weights using Trainer.save_checkpoint()

load from saved file with LightningSpectrogramModule.load_from_checkpoint()

Note: saving and loading model objects across OpenSoundscape versions will not work properly. Instead, use .save_weights() and .load_weights() (but note that architecture, customizations to preprocessing, training params, etc will not be retained using those functions).

For maximum flexibility in further use, save the model with both .save() and .save_torch_dict() or .save_weights().

Parameters:
  • path – file path for saved model object

  • save_hooks – retain forward and backward hooks on modules [default: False] Note: True can cause issues when using wandb.watch()

save_weights(path)[source]

save just the weights of the network

This allows the saved weights to be used more flexibly than model.save() which will pickle the entire object. The weights are saved in a pickled dictionary using torch.save(self.network.state_dict())

Parameters:

path – location to save weights file
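A sketch of the version-robust round trip recommended above (paths are illustrative):

```
# save only the network's state dict (portable across OpenSoundscape versions)
model.save_weights("./best_weights.pt")

# later: recreate a model with the same architecture, then load in place
model.load_weights("./best_weights.pt", strict=True)
```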

train(*args, **kwargs)[source]

inherits the train() method from LightningModule rather than SpectrogramModule

this method just sets True/False for training mode; it doesn't perform training

opensoundscape.ml.loss module

loss function classes to use with opensoundscape models

class opensoundscape.ml.loss.BCEWithLogitsLoss_hot(*args: Any, **kwargs: Any)[source]

Bases: BCEWithLogitsLoss

use pytorch’s nn.BCEWithLogitsLoss for one-hot labels by simply converting y from long to float

Parameters:

**kwargs – passed to nn.BCEWithLogitsLoss

forward(x, target)[source]
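A minimal runnable sketch of using this loss with multi-hot integer labels (shapes are illustrative):

```
import torch
from opensoundscape.ml.loss import BCEWithLogitsLoss_hot

loss_fn = BCEWithLogitsLoss_hot()

logits = torch.randn(4, 3)          # raw model outputs: (batch, n_classes)
labels = torch.tensor(
    [[1, 0, 1], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
)                                   # long-type multi-hot labels
loss = loss_fn(logits, labels)      # labels are cast to float internally
```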
class opensoundscape.ml.loss.CrossEntropyLoss_hot(*args: Any, **kwargs: Any)[source]

Bases: CrossEntropyLoss

use pytorch’s nn.CrossEntropyLoss for one-hot labels by converting labels from 1-hot to integer labels

throws a ValueError if labels are not one-hot

Parameters:

**kwargs – passed to nn.CrossEntropyLoss

forward(x, target)[source]
class opensoundscape.ml.loss.ResampleLoss(*args: Any, **kwargs: Any)[source]

Bases: Module

forward(cls_score, label, weight=None, reduction_override=None)[source]
logit_reg_functions(labels, logits, weight=None)[source]
rebalance_weight(gt_labels)[source]
reweight_functions(label)[source]
opensoundscape.ml.loss.binary_cross_entropy(pred, label, weight=None, reduction='mean', avg_factor=None)[source]

helper function for BCE loss in ResampleLoss class

opensoundscape.ml.loss.reduce_loss(loss, reduction)[source]

Reduce loss as specified.

Parameters:
  • loss (Tensor) – Elementwise loss tensor.

  • reduction (str) – Options are “none”, “mean” and “sum”.

Returns:

Reduced loss tensor.

Return type:

Tensor

opensoundscape.ml.loss.weight_reduce_loss(loss, weight=None, reduction='mean', avg_factor=None)[source]

Apply element-wise weight and reduce loss.

Parameters:
  • loss (Tensor) – Element-wise loss.

  • weight (Tensor) – Element-wise weights.

  • reduction (str) – Same as built-in losses of PyTorch.

  • avg_factor (float) – Average factor when computing the mean of losses.

Returns:

Processed loss values.

Return type:

Tensor
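The weighting-and-reduction logic is roughly equivalent to this sketch (a paraphrase of the documented behavior, not the exact implementation):

```
import torch

def weight_reduce_loss_sketch(loss, weight=None, reduction="mean", avg_factor=None):
    # apply element-wise weights, if given
    if weight is not None:
        loss = loss * weight
    # reduce; avg_factor replaces the denominator when averaging
    if avg_factor is None:
        if reduction == "mean":
            return loss.mean()
        if reduction == "sum":
            return loss.sum()
        return loss  # reduction == "none"
    if reduction == "mean":
        return loss.sum() / avg_factor
    raise ValueError("avg_factor is only compatible with reduction='mean'")
```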

opensoundscape.ml.safe_dataset module

Dataset wrapper to handle errors gracefully in Preprocessor classes

A SafeDataset handles errors in a potentially misleading way: If an error is raised while trying to load a sample, the SafeDataset will instead load a different sample. The indices of any samples that failed to load will be stored in ._invalid_indices.

This behavior may be desirable for training a model, but could cause silent errors when predicting with a model (replacing a bad file with a different file), so you should always check ._invalid_indices after using a SafeDataset.

based on an implementation by @msamogh in nonechucks (github.com/msamogh/nonechucks/)

class opensoundscape.ml.safe_dataset.SafeDataset(dataset, invalid_sample_behavior)[source]

Bases: object

A wrapper for a Dataset that handles errors when loading samples

WARNING: When iterating, will skip the failed sample, but when using within a DataLoader, finds the next good sample and uses it for the current index (see __getitem__).

Note that this class does not subclass Dataset. Instead, it contains a .dataset attribute that is a Dataset (or AudioFileDataset / AudioSplittingDataset, which subclass Dataset).

Parameters:
  • dataset – a torch Dataset instance or child such as AudioFileDataset, AudioSplittingDataset

  • eager_eval – If True, checks if every file is able to be loaded during initialization (logs _valid_indices and _invalid_indices)

Attributes: _valid_indices and _invalid_indices can be accessed later to check which samples raised Exceptions. _invalid_samples is a set of all index values for samples that raised Exceptions.

__getitem__(index)[source]

If loading an index fails, keeps trying the next index until success

_safe_get_item()[source]

Tries to load a sample, returns None if error occurs

__iter__()[source]

generator that skips samples that raise errors when loading

report(log=None)[source]

write _invalid_samples to log file, give warning, & return _invalid_samples
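A hedged usage sketch (the invalid_sample_behavior value shown is an assumption; check the class for its supported options):

```
from opensoundscape.ml.safe_dataset import SafeDataset

# wrap an existing dataset (e.g. an AudioFileDataset) so that samples
# that fail preprocessing don't crash training
safe_dataset = SafeDataset(audio_dataset, invalid_sample_behavior="substitute")

# after training or iterating, always check which samples failed to load
invalid = safe_dataset.report(log="./failed_samples.log")
```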

opensoundscape.ml.sampling module

classes for strategically sampling within a DataLoader

class opensoundscape.ml.sampling.ClassAwareSampler(*args: Any, **kwargs: Any)[source]

Bases: Sampler

In each batch of samples, pick a limited number of classes to include and give even representation to each class

class opensoundscape.ml.sampling.ImbalancedDatasetSampler(*args: Any, **kwargs: Any)[source]

Bases: Sampler

Samples elements randomly from a given list of indices for an imbalanced dataset

Parameters:
  • indices (list, optional) – a list of indices

  • num_samples (int, optional) – number of samples to draw

  • callback_get_label (func) – a callback-like function which takes two arguments: dataset and index

Based on Imbalanced Dataset Sampling by davinnovation (https://github.com/ufoym/imbalanced-dataset-sampler)

class opensoundscape.ml.sampling.RandomCycleIter(data, test_mode=False)[source]

Bases: object

opensoundscape.ml.sampling.class_aware_sample_generator(cls_iter, data_iter_list, n, num_samples_cls=1)[source]
opensoundscape.ml.sampling.get_sampler()[source]

opensoundscape.ml.shallow_classifier module

class opensoundscape.ml.shallow_classifier.MLPClassifier(*args: Any, **kwargs: Any)[source]

Bases: Module

initialize a fully connected NN with ReLU activations

fit(*args, **kwargs)[source]

fit the weights on features and labels, without batching

Args: see quick_fit()

forward(x)[source]
load(path, **kwargs)[source]

load object saved with self.save(); **kwargs like map_location are passed to torch.load

save(path)[source]
opensoundscape.ml.shallow_classifier.augmented_embed(embedding_model, sample_df, n_augmentation_variants, batch_size=1, num_workers=0, device=torch.device)[source]

Embed samples using augmentation during preprocessing

Parameters:
  • embedding_model – a model with an embed() method that takes a dataframe and returns embeddings (e.g. a pretrained opensoundscape model or a Bioacoustics Model Zoo model like Perch, BirdNET, or HawkEars)

  • sample_df – dataframe with samples to embed

  • n_augmentation_variants – number of augmented variants to generate for each sample

  • batch_size – batch size for embedding; default 1

  • num_workers – number of workers for embedding; default 0

  • device – torch.device to use; default is torch.device(‘cpu’)

Returns:

the embedded training samples and their labels, as torch.tensors

Return type:

x_train, y_train

opensoundscape.ml.shallow_classifier.fit_classifier_on_embeddings(embedding_model, classifier_model, train_df, validation_df, n_augmentation_variants=0, embedding_batch_size=1, embedding_num_workers=0, steps=1000, optimizer=None, criterion=None, device=torch.device)[source]

Embed samples with an embedding model, then fit a classifier on the embeddings

wraps embedding_model.embed() with quick_fit(clf,…)

Also supports generating augmented variations of the training samples

Note: if embedding takes a while and you might want to fit multiple times, consider embedding the samples first then running quick_fit(…) rather than calling this function.

Parameters:
  • embedding_model – a model with an embed() method that takes a dataframe and returns embeddings (e.g. a pretrained opensoundscape model or a Bioacoustics Model Zoo model like Perch, BirdNET, or HawkEars)

  • classifier_model – a torch.nn.Module object to train, e.g. MLPClassifier or final layer of CNN

  • train_df – dataframe with training samples and labels; see opensoundscape.ml.cnn.train() train_df argument

  • validation_df – dataframe with validation samples and labels; see opensoundscape.ml.cnn.train() validation_df argument; if None, skips validation

  • n_augmentation_variants – if 0 (default), embeds training samples without augmentation; if >0, embeds each training sample with stochastic augmentation n_augmentation_variants times

  • embedding_batch_size – batch size for embedding; default 1

  • embedding_num_workers – number of workers for embedding; default 0

  • steps – model fitting parameters, see quick_fit()

  • optimizer – model fitting parameters, see quick_fit()

  • criterion – model fitting parameters, see quick_fit()

  • device – model fitting parameters, see quick_fit()

Returns:

the embedded training and validation samples and their labels, as torch.tensor

Return type:

x_train, y_train, x_val, y_val

opensoundscape.ml.shallow_classifier.quick_fit(model, train_features, train_labels, validation_features=None, validation_labels=None, steps=1000, optimizer=None, criterion=None, device=torch.device)[source]

train a PyTorch model on features and labels without batching

Assumes all data can fit in memory, so that one step includes all data (i.e. step=epoch)

Defaults are for multi-target label problems and assume train_labels is an array of 0/1 of shape (n_samples, n_classes)

Parameters:
  • model – a torch.nn.Module object to train

  • train_features – input features for training, often embeddings; should be a valid input to model; generally shape (n_samples, n_features)

  • train_labels – labels for training, generally one-hot encoded with shape (n_samples, n_classes); should be a valid target for criterion()

  • validation_features – input features for validation; if None, does not perform validation

  • validation_labels – labels for validation; if None, does not perform validation

  • steps – number of training steps (epochs); in each step, all data is passed forward and backward and the optimizer updates the weights [Default: 1000]

  • optimizer – torch.optim optimizer to use; default None uses Adam

  • criterion – loss function to use; default None uses BCEWithLogitsLoss (appropriate for multi-label classification)

  • device – torch.device to use; default is torch.device(‘cpu’)
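A self-contained sketch with random stand-in data (the MLPClassifier constructor arguments are assumptions; consult the class for its exact signature):

```
import torch
from opensoundscape.ml.shallow_classifier import MLPClassifier, quick_fit

# random stand-ins for embeddings and multi-hot labels
x_train = torch.randn(200, 512)               # (n_samples, n_features)
y_train = (torch.rand(200, 3) > 0.8).float()  # (n_samples, n_classes)

clf = MLPClassifier(input_size=512, output_size=3)  # args are illustrative

# defaults: Adam optimizer and BCEWithLogitsLoss, on CPU
quick_fit(clf, x_train, y_train, steps=1000)
```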

opensoundscape.ml.utils module

Utilities for .ml

opensoundscape.ml.utils.apply_activation_layer(x, activation_layer=None)[source]

applies an activation layer to a set of scores

Parameters:
  • x – input values

  • activation_layer

    • None [default]: return original values

    • ’softmax’: apply softmax activation

    • ’sigmoid’: apply sigmoid activation

    • ’softmax_and_logit’: apply softmax then logit transform

Returns:

values with activation layer applied

Note: if x is None, returns None

Note: casts x to float before applying softmax, since torch’s softmax implementation doesn’t support int or Long type
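For example (runnable as written):

```
import torch
from opensoundscape.ml.utils import apply_activation_layer

logits = torch.tensor([[2.0, -1.0, 0.5]])

probs = apply_activation_layer(logits, "softmax")   # each row sums to 1
scores = apply_activation_layer(logits, "sigmoid")  # each value in [0, 1]
raw = apply_activation_layer(logits, None)          # returned unchanged
```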

opensoundscape.ml.utils.cas_dataloader(dataset, batch_size, num_workers)[source]

Return a dataloader that uses the class aware sampler

Class aware sampler tries to balance the examples per class in each batch. It selects just a few classes to be present in each batch, then samples those classes for even representation in the batch.

Parameters:
  • dataset – a pytorch dataset type object

  • batch_size – see DataLoader

  • num_workers – see DataLoader
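A minimal usage sketch (audio_dataset is assumed to be a labeled dataset such as an AudioFileDataset):

```
from opensoundscape.ml.utils import cas_dataloader

# batches draw a few classes at a time with even per-class representation
loader = cas_dataloader(audio_dataset, batch_size=64, num_workers=4)
```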

opensoundscape.ml.utils.check_labels(label_df, classes)[source]

check that classes and label_df.columns are the same, otherwise raise a helpful error
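For example (a minimal sketch):

```
import pandas as pd
from opensoundscape.ml.utils import check_labels

label_df = pd.DataFrame({"a": [0, 1], "b": [1, 0]}, index=["f1.wav", "f2.wav"])

check_labels(label_df, classes=["a", "b"])  # matches: no error
# check_labels(label_df, classes=["a"])     # would raise a helpful error
```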

opensoundscape.ml.utils.collate_audio_samples_to_tensors(batch)[source]

takes a list of AudioSample objects, returns batched tensors

use this collate function with DataLoader if you want to use AudioFileDataset (or AudioSplittingDataset) but want the traditional output of PyTorch DataLoaders: two tensors, the first containing the data with dim 0 as the batch dimension, the second containing the labels with dim 0 as the batch dimension

Parameters:

batch – a list of AudioSample objects

Returns:

(Tensor of stacked AudioSample.data, Tensor of stacked AudioSample.label.values)

Example usage:

```
from torch.utils.data import DataLoader

from opensoundscape import AudioFileDataset, SpectrogramPreprocessor
from opensoundscape.ml.utils import collate_audio_samples_to_tensors

# label_df: dataframe of multi-hot labels indexed by audio file path
preprocessor = SpectrogramPreprocessor(sample_duration=2, height=224, width=224)
audio_dataset = AudioFileDataset(label_df, preprocessor)

train_dataloader = DataLoader(
    audio_dataset,
    batch_size=64,
    shuffle=True,
    collate_fn=collate_audio_samples_to_tensors,
)
```

opensoundscape.ml.utils.get_batch(array, batch_size, batch_number)[source]

get a single slice of a larger array

using the batch size and batch index, from zero

Parameters:
  • array – iterable to split into batches

  • batch_size – num elements per batch

  • batch_number – index of batch

Returns:

one batch (subset of array)

Note: the final elements are returned as the last batch even if there are fewer than batch_size

Example

if array=[1,2,3,4,5,6,7] then:

  • get_batch(array,3,0) returns [1,2,3]

  • get_batch(array,3,2) returns [7]

Module contents