Machine Learning

PyTorch CNNs

classes for pytorch machine learning models in opensoundscape

For tutorials, see notebooks on opensoundscape.org

class opensoundscape.torch.models.cnn.CnnResampleLoss(architecture, classes, single_target=False)

Subclass of PytorchModel with ResampleLoss.

ResampleLoss may perform better than BCE Loss for multitarget problems in some scenarios.

Parameters:
  • architecture – a model architecture object, for example one generated with the torch.architectures.cnn_architectures module
  • classes – list of class names. Must match with training dataset classes.
  • single_target
    • True: model expects exactly one positive class per sample
    • False: samples can have any number of positive classes

    [default: False]

class opensoundscape.torch.models.cnn.InceptionV3(classes, freeze_feature_extractor=False, use_pretrained=True, single_target=False)
train_epoch()

perform forward pass, loss, backpropagation for one epoch

need to override parent because Inception returns different outputs from the forward pass (final and auxiliary layers)

Returns: (targets, predictions, scores) on training files

class opensoundscape.torch.models.cnn.InceptionV3ResampleLoss(classes, freeze_feature_extractor=False, use_pretrained=True, single_target=False)
class opensoundscape.torch.models.cnn.PytorchModel(architecture, classes, single_target=False)

Generic Pytorch Model with .train() and .predict()

flexible architecture, optimizer, loss function, parameters

for tutorials and examples see opensoundscape.org

methods include train(), predict(), save(), and load()

Parameters:
  • architecture – a model architecture object, for example one generated with the torch.architectures.cnn_architectures module
  • classes – list of class names. Must match with training dataset classes.
  • single_target
    • True: model expects exactly one positive class per sample
    • False: samples can have any number of positive classes

    [default: False]

load(path, load_weights=True, load_classifier_weights=True, load_optimizer_state_dict=True, verbose=False)

load model and optimizer state_dict from disk

the object should be saved with model.save() which uses torch.save with keys for ‘model_state_dict’ and ‘optimizer_state_dict’

Parameters:
  • path – where the file is saved
  • load_weights – if False, ignore network weights [default:True]
  • load_classifier_weights – if False, ignore classifier layer weights. Use False to only load feature weights, eg to re-use a trained cnn’s feature extractor for a new class [default: True]
  • load_optimizer_state_dict – if False, ignore saved parameters for optimizer’s state [default: True]
  • verbose – if True, print missing and unused keys for model weights
predict(prediction_dataset, batch_size=1, num_workers=0, activation_layer=None, binary_preds=None, threshold=0.5, error_log=None)

Generate predictions on a dataset

Choose to return any combination of scores, labels, and single-target or multi-target binary predictions. Also choose activation layer for scores (softmax, sigmoid, softmax then logit, or None).

Note: the order of returned dataframes is (scores, preds, labels)

Parameters:
  • prediction_dataset – a Preprocessor or Dataset object that returns tensors, such as AudioToSpectrogramPreprocessor (no augmentation) or CnnPreprocessor (w/augmentation) from opensoundscape.datasets
  • batch_size – Number of files to load simultaneously [default: 1]
  • num_workers – parallelization (ie cpus or cores), use 0 for current process [default: 0]
  • activation_layer – Optionally apply an activation layer such as sigmoid or softmax to the raw outputs of the model. Options:
    • None: no activation, return raw scores (ie logit, [-inf:inf])
    • ‘softmax’: scores for all classes sum to 1
    • ‘sigmoid’: all scores in [0,1] but don’t sum to 1
    • ‘softmax_and_logit’: applies softmax first then logit

    [default: None]

  • binary_preds – Optionally return binary (thresholded 0/1) predictions. Options:
    • ‘single_target’: max scoring class = 1, others = 0
    • ‘multi_target’: scores above threshold = 1, others = 0
    • None: do not create or return binary predictions

    [default: None]

  • threshold – prediction threshold for sigmoid scores. Only relevant when binary_preds == ‘multi_target’
  • error_log – if not None, saves a list of files that raised errors to the specified file location [default: None]
Returns: 3 DataFrames (or Nones), w/index matching prediction_dataset.df
  • scores: post-activation_layer scores
  • predictions: 0/1 preds for each class
  • labels: labels from dataset (if available)

Note: if loading an audio file raises a PreprocessingError, the scores and predictions for that sample will be np.nan

Note: if no return type selected for labels/scores/preds, returns None instead of a DataFrame in the returned tuple
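
The activation_layer options described above can be illustrated without PyTorch. The following is a minimal sketch in plain Python, not the library's implementation: the helper name apply_activation and the list-of-floats input are assumptions made for illustration, and the real predict() operates on tensors and returns DataFrames.

```python
import math

def softmax(logits):
    """Rescale raw scores so that they are positive and sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logits):
    """Map each raw score independently into (0, 1)."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]

def logit(probs):
    """Inverse of sigmoid: map probabilities back to (-inf, inf)."""
    return [math.log(p / (1.0 - p)) for p in probs]

def apply_activation(logits, activation_layer=None):
    """Hypothetical helper mirroring the four documented options."""
    if activation_layer is None:
        return list(logits)  # raw scores, unchanged
    if activation_layer == "softmax":
        return softmax(logits)
    if activation_layer == "sigmoid":
        return sigmoid(logits)
    if activation_layer == "softmax_and_logit":
        return logit(softmax(logits))
    raise ValueError("unknown activation_layer: %r" % (activation_layer,))

raw = [2.0, 0.0, -1.0]  # raw model outputs for one sample, three classes
for option in (None, "softmax", "sigmoid", "softmax_and_logit"):
    print(option, apply_activation(raw, option))
```

In this sketch, softmax scores compete across classes (suited to single-target problems), while sigmoid scores are independent per class (suited to multi-target problems). Note that logit is undefined for a probability of exactly 0 or 1.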

save(path=None, save_weights=True, save_optimizer=True, extras={})

save model with weights (default location is self.save_path)

Parameters:
  • path – destination for saved model. if None, uses self.save_path
  • save_weights – if False, only save metadata/metrics [default: True]
  • save_optimizer – if False, don’t save self.optim.state_dict()
  • extras – arbitrary dictionary of things to save, eg valid-preds
train(train_dataset, valid_dataset, epochs=1, batch_size=1, num_workers=0, save_path='.', save_interval=1, log_interval=10, unsafe_sample_log='./unsafe_samples.log')

train the model on samples from train_dataset

If customized loss functions, networks, optimizers, or schedulers are desired, modify the respective attributes before calling .train().

Parameters:
  • train_dataset – a Preprocessor that loads sample (audio file + label) to Tensor in batches (see docs/tutorials for details)
  • valid_dataset – a Preprocessor for evaluating performance
  • epochs – number of epochs to train for [default=1] (1 epoch constitutes 1 view of each training sample)
  • batch_size – number of training files to load/process before re-calculating the loss function and backpropagation
  • num_workers – parallelization (ie, cores or cpus). Note: use 0 for single (root) process (not 1)
  • save_path – location to save intermediate and best model objects [default=”.”, ie current location of script]
  • save_interval – interval in epochs to save model object with weights [default:1]. Note: the best model is always saved to best.model in addition to other saved epochs.
  • log_interval – interval in epochs to evaluate model with validation dataset and print metrics to the log
  • unsafe_sample_log – file path: log all samples that failed in preprocessing (file written when training completes). If None, does not write a file
train_epoch()

perform forward pass, loss, backpropagation for one epoch

Returns: (targets, predictions, scores) on training files

class opensoundscape.torch.models.cnn.Resnet18Binary(classes)

Subclass of PytorchModel with Resnet18 architecture

This subclass allows separate training parameters for the feature extractor and classifier

Parameters:
  • classes – list of class names. Must match with training dataset classes.
  • single_target
    • True: model expects exactly one positive class per sample
    • False: samples can have any number of positive classes

    [default: False]

class opensoundscape.torch.models.cnn.Resnet18Multiclass(classes, single_target=False)

Multi-class model with resnet18 architecture and ResampleLoss.

Can be single or multi-target.

Parameters:
  • classes – list of class names. Must match with training dataset classes.
  • single_target
    • True: model expects exactly one positive class per sample
    • False: samples can have any number of positive classes

    [default: False]

Notes:
  • Allows separate parameters for feature & classifier blocks via self.optimizer_params’s keys: “feature” and “classifier” (by using hand-built architecture)
  • Uses ResampleLoss which requires class counts as an input.
class opensoundscape.torch.models.utils.BaseModule

Base class for a pytorch model pipeline class.

All child classes should define load, save, etc

opensoundscape.torch.models.utils.cas_dataloader(dataset, batch_size, num_workers)

Return a dataloader that uses the class aware sampler

Class aware sampler tries to balance the examples per class in each batch. It selects just a few classes to be present in each batch, then samples those classes for even representation in the batch.

Parameters:
  • dataset – a pytorch dataset type object
  • batch_size – see DataLoader
  • num_workers – see DataLoader
opensoundscape.torch.models.utils.get_dataloader(dataset, batch_size=64, num_workers=1, shuffle=False, sampler='')

Create a DataLoader from a DataSet. Chooses between the normal pytorch DataLoader and ImbalancedDatasetSampler.

Sampler: None -> default DataLoader; ‘imbalanced’ -> ImbalancedDatasetSampler

Module to initialize PyTorch CNN architectures with custom output shape

This module allows the use of several built-in CNN architectures from PyTorch. The architecture refers to the specific layers and layer input/output shapes (including convolution sizes and strides, etc) - such as the ResNet18 or Inception V3 architecture.

We provide wrappers which modify the output layer to the desired shape (to match the number of classes). The way to change the output layer shape depends on the architecture, which is why we need a wrapper for each one. This code is based on pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html

To use these wrappers, for example, if your model has 10 output classes, write

my_arch=resnet18(10)

Then you can initialize a model object from opensoundscape.torch.models.cnn with your architecture:

model=PytorchModel(my_arch,classes)

or override an existing model’s architecture:

model.network = my_arch

Note: the InceptionV3 architecture must be used differently than other architectures - the easiest way is to simply use the InceptionV3 class in opensoundscape.torch.models.cnn.

opensoundscape.torch.architectures.cnn_architectures.alexnet(num_classes, freeze_feature_extractor=False, use_pretrained=True)

Wrapper for AlexNet architecture

input size = 224

Parameters:
  • num_classes – number of output nodes for the final layer
  • freeze_feature_extractor – if False (default), entire network will have gradients and can train; if True, feature block is frozen and only final layer is trained
  • use_pretrained – if True, uses pre-trained ImageNet features from Pytorch’s model zoo.
opensoundscape.torch.architectures.cnn_architectures.densenet121(num_classes, freeze_feature_extractor=False, use_pretrained=True)

Wrapper for densenet121 architecture

input size = 224

Parameters:
  • num_classes – number of output nodes for the final layer
  • freeze_feature_extractor – if False (default), entire network will have gradients and can train; if True, feature block is frozen and only final layer is trained
  • use_pretrained – if True, uses pre-trained ImageNet features from Pytorch’s model zoo.
opensoundscape.torch.architectures.cnn_architectures.inception_v3(num_classes, freeze_feature_extractor=False, use_pretrained=True)

Wrapper for Inception v3 architecture

Input: 299x299

WARNING: expects (299,299) sized images and has auxiliary output. See InceptionV3 class in opensoundscape.torch.models.cnn for use.

Parameters:
  • num_classes – number of output nodes for the final layer
  • freeze_feature_extractor – if False (default), entire network will have gradients and can train; if True, feature block is frozen and only final layer is trained
  • use_pretrained – if True, uses pre-trained ImageNet features from Pytorch’s model zoo.
opensoundscape.torch.architectures.cnn_architectures.resnet101(num_classes, freeze_feature_extractor=False, use_pretrained=True)

Wrapper for ResNet101 architecture

input_size = 224

Parameters:
  • num_classes – number of output nodes for the final layer
  • freeze_feature_extractor – if False (default), entire network will have gradients and can train; if True, feature block is frozen and only final layer is trained
  • use_pretrained – if True, uses pre-trained ImageNet features from Pytorch’s model zoo.
opensoundscape.torch.architectures.cnn_architectures.resnet152(num_classes, freeze_feature_extractor=False, use_pretrained=True)

Wrapper for ResNet152 architecture

input_size = 224

Parameters:
  • num_classes – number of output nodes for the final layer
  • freeze_feature_extractor – if False (default), entire network will have gradients and can train; if True, feature block is frozen and only final layer is trained
  • use_pretrained – if True, uses pre-trained ImageNet features from Pytorch’s model zoo.
opensoundscape.torch.architectures.cnn_architectures.resnet18(num_classes, freeze_feature_extractor=False, use_pretrained=True)

Wrapper for ResNet18 architecture

input_size = 224

Parameters:
  • num_classes – number of output nodes for the final layer
  • freeze_feature_extractor – if False (default), entire network will have gradients and can train; if True, feature block is frozen and only final layer is trained
  • use_pretrained – if True, uses pre-trained ImageNet features from Pytorch’s model zoo.
opensoundscape.torch.architectures.cnn_architectures.resnet34(num_classes, freeze_feature_extractor=False, use_pretrained=True)

Wrapper for ResNet34 architecture

input_size = 224

Parameters:
  • num_classes – number of output nodes for the final layer
  • freeze_feature_extractor – if False (default), entire network will have gradients and can train; if True, feature block is frozen and only final layer is trained
  • use_pretrained – if True, uses pre-trained ImageNet features from Pytorch’s model zoo.
opensoundscape.torch.architectures.cnn_architectures.resnet50(num_classes, freeze_feature_extractor=False, use_pretrained=True)

Wrapper for ResNet50 architecture

input_size = 224

Parameters:
  • num_classes – number of output nodes for the final layer
  • freeze_feature_extractor – if False (default), entire network will have gradients and can train; if True, feature block is frozen and only final layer is trained
  • use_pretrained – if True, uses pre-trained ImageNet features from Pytorch’s model zoo.
opensoundscape.torch.architectures.cnn_architectures.set_parameter_requires_grad(model, freeze_feature_extractor)

if necessary, remove gradients of all model parameters

if freeze_feature_extractor is True, we set requires_grad=False for all features in the feature extraction block. We would do this if we have a pre-trained CNN and only want to change the shape of the final layer, then train only that final classification layer without modifying the weights of the rest of the network.

opensoundscape.torch.architectures.cnn_architectures.squeezenet1_0(num_classes, freeze_feature_extractor=False, use_pretrained=True)

Wrapper for squeezenet architecture

input size = 224

Parameters:
  • num_classes – number of output nodes for the final layer
  • freeze_feature_extractor – if False (default), entire network will have gradients and can train; if True, feature block is frozen and only final layer is trained
  • use_pretrained – if True, uses pre-trained ImageNet features from Pytorch’s model zoo.
opensoundscape.torch.architectures.cnn_architectures.vgg11_bn(num_classes, freeze_feature_extractor=False, use_pretrained=True)

Wrapper for vgg11 architecture

input size = 224

Parameters:
  • num_classes – number of output nodes for the final layer
  • freeze_feature_extractor – if False (default), entire network will have gradients and can train; if True, feature block is frozen and only final layer is trained
  • use_pretrained – if True, uses pre-trained ImageNet features from Pytorch’s model zoo.

defines feature extractor and Architecture class for ResNet CNN

This implementation of the ResNet18 architecture allows for separate access to the feature extraction and classification blocks. This can be useful, for instance, to freeze the feature extractor and only train the classifier layer; or to specify different learning rates for the two blocks.

This implementation is used in the Resnet18Binary and Resnet18Multiclass classes of opensoundscape.torch.models.cnn.

class opensoundscape.torch.architectures.resnet.ResNetArchitecture(num_cls, weights_init='ImageNet', num_layers=18, init_classifier_weights=False)

ResNet architecture with 18 or 50 layers

This implementation enables separate access to feature and classification blocks.

Parameters:
  • num_cls – number of classes (int)
  • weights_init
    • “ImageNet”: load the pre-trained weights for ImageNet dataset
    • path: load weights from a path on your computer or a url
    • None: initialize with random weights
  • num_layers – 18 for Resnet18 or 50 for Resnet50
  • init_classifier_weights
    • if True, load the weights of the classification layer as well as feature extraction layers
    • if False (default), only load the weights of the feature extraction layers

load(init_path, init_classifier_weights=True, verbose=False)

load state dict (weights) of the feature+classifier; optionally load only feature weights, not classifier weights

Parameters:
  • init_path
    • url containing “http”: download weights from web
    • path: load weights from local path
  • init_classifier_weights
    • if True (default), load the weights of the classification layer as well as feature extraction layers
    • if False, only load the weights of the feature extraction layers

  • verbose – if True, print missing/unused keys [default: False]
class opensoundscape.torch.architectures.resnet.ResNetFeature(block, layers, zero_init_residual=False, groups=1, width_per_group=64, replace_stride_with_dilation=None, norm_layer=None)
class opensoundscape.torch.architectures.utils.BaseArchitecture

Base architecture for reference.

Loss Functions

loss function classes to use with opensoundscape models

class opensoundscape.torch.loss.BCEWithLogitsLoss_hot

use pytorch’s nn.BCEWithLogitsLoss for one-hot labels by simply converting y from long to float

class opensoundscape.torch.loss.CrossEntropyLoss_hot

use pytorch’s nn.CrossEntropyLoss for one-hot labels by converting labels from 1-hot to integer labels

throws a ValueError if labels are not one-hot
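
The label conversion described for CrossEntropyLoss_hot can be sketched in plain Python. This is an illustration only, assuming labels arrive as nested lists rather than tensors; the function name one_hot_to_index is hypothetical and is not part of opensoundscape.

```python
def one_hot_to_index(labels):
    """Convert rows of one-hot labels to integer class indices.

    Raises ValueError if any row is not one-hot, i.e. does not contain
    exactly one 1 with every other entry equal to 0.
    """
    indices = []
    for row in labels:
        row = list(row)
        # a one-hot row, once sorted, is all zeros followed by a single 1
        if sorted(row) != [0] * (len(row) - 1) + [1]:
            raise ValueError("labels must be one-hot, got %r" % (row,))
        indices.append(row.index(1))
    return indices

print(one_hot_to_index([[0, 1, 0], [1, 0, 0]]))  # class index per sample
```

Integer class indices are the target format that nn.CrossEntropyLoss has traditionally expected, which is why a conversion of this kind is needed when a dataset yields one-hot rows.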

class opensoundscape.torch.loss.ResampleLoss(class_freq, reduction='mean', loss_weight=1.0)
opensoundscape.torch.loss.reduce_loss(loss, reduction)

Reduce loss as specified.

Parameters:
  • loss (Tensor) – Elementwise loss tensor.
  • reduction (str) – Options are “none”, “mean” and “sum”.
Returns:

Reduced loss tensor.

Return type:

Tensor

opensoundscape.torch.loss.weight_reduce_loss(loss, weight=None, reduction='mean', avg_factor=None)

Apply element-wise weight and reduce loss.

Parameters:
  • loss (Tensor) – Element-wise loss.
  • weight (Tensor) – Element-wise weights.
  • reduction (str) – Same as built-in losses of PyTorch.
  • avg_factor (float) – Average factor when computing the mean of losses.
Returns:

Processed loss values.

Return type:

Tensor
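
The two reduction helpers above can be sketched in plain Python, using lists of floats in place of Tensors. This is a sketch under stated assumptions, not the library's code: in particular, the handling of avg_factor (dividing the summed loss by avg_factor when reduction is "mean", and rejecting it for "sum") is an assumption based on the parameter descriptions.

```python
def reduce_loss(loss, reduction):
    """Reduce an element-wise loss: 'none', 'mean' or 'sum'."""
    if reduction == "none":
        return list(loss)
    if reduction == "mean":
        return sum(loss) / len(loss)
    if reduction == "sum":
        return sum(loss)
    raise ValueError("unknown reduction: %r" % (reduction,))

def weight_reduce_loss(loss, weight=None, reduction="mean", avg_factor=None):
    """Apply element-wise weights, then reduce."""
    if weight is not None:
        loss = [l * w for l, w in zip(loss, weight)]
    if avg_factor is None:
        return reduce_loss(loss, reduction)
    # assumption: avg_factor replaces the element count in the mean
    if reduction == "mean":
        return sum(loss) / avg_factor
    if reduction == "none":
        return list(loss)
    raise ValueError("avg_factor cannot be combined with reduction='sum'")

elementwise = [1.0, 2.0, 3.0]
weights = [1.0, 0.0, 1.0]  # the zero weight masks out the second element
print(weight_reduce_loss(elementwise, weights))                # mean over 3 elements
print(weight_reduce_loss(elementwise, weights, avg_factor=2))  # mean over 2 unmasked elements
```

The usage lines show why an average factor can be useful: when weights mask out some elements, dividing by the number of unmasked elements gives a mean that is not diluted by the masked ones.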

Safe Dataloading

Dataset wrapper to handle errors gracefully in Preprocessor classes

A SafeDataset handles errors in a potentially misleading way: If an error is raised while trying to load a sample, the SafeDataset will instead load a different sample. The indices of any samples that failed to load will be stored in ._unsafe_indices.

The behavior may be desirable for training a model, but could cause silent errors when predicting with a model (replacing a bad file with a different file), and you should always be careful to check ._unsafe_indices after using a SafeDataset.

implemented by @msamogh in nonechucks (github.com/msamogh/nonechucks/)

class opensoundscape.torch.safe_dataset.SafeDataset(dataset, eager_eval=False)

A wrapper for a Dataset that handles errors when loading samples

WARNING: When iterating, will skip the failed sample, but when using within a DataLoader, finds the next good sample and uses it for the current index (see __getitem__).

Parameters:
  • dataset – a torch Dataset instance or child such as a Preprocessor
  • eager_eval – If True, checks if every file is able to be loaded during initialization (logs _safe_indices and _unsafe_indices)

Attributes: _safe_indices and _unsafe_indices can be accessed later to check which samples threw errors.

_build_index()

tries to load each sample, logs _safe_indices and _unsafe_indices

__getitem__(index)

If loading an index fails, keeps trying the next index until success

_safe_get_item()

Tries to load a sample, returns None if error occurs

is_index_built

Returns True if all indices of the original dataset have been classified into safe_samples_indices or _unsafe_samples_indices.
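
The substitution behavior of __getitem__ is easy to overlook, so here is a minimal sketch of the idea in plain Python. The classes SafeList and FlakyDataset are hypothetical stand-ins written for this illustration; they are not the SafeDataset implementation, and the real class wraps a torch Dataset.

```python
class FlakyDataset:
    """Toy dataset that raises an error for selected indices."""
    def __init__(self, items, bad):
        self.items = items
        self.bad = set(bad)

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        if index in self.bad:
            raise IOError("cannot load sample %d" % index)
        return self.items[index]

class SafeList:
    """Hypothetical wrapper that mimics the documented behavior."""
    def __init__(self, dataset):
        self.dataset = dataset
        self._safe_indices = []
        self._unsafe_indices = []

    def __len__(self):
        return len(self.dataset)

    def _safe_get_item(self, index):
        """Try to load one sample; return None if an error occurs."""
        try:
            sample = self.dataset[index]
        except Exception:
            if index not in self._unsafe_indices:
                self._unsafe_indices.append(index)
            return None
        if index not in self._safe_indices:
            self._safe_indices.append(index)
        return sample

    def __getitem__(self, index):
        """If loading an index fails, keep trying the next index."""
        for candidate in range(index, len(self.dataset)):
            sample = self._safe_get_item(candidate)
            if sample is not None:
                return sample
        raise IndexError("no loadable sample at or after index %d" % index)

safe = SafeList(FlakyDataset(["a", "b", "c", "d"], bad=[1]))
loaded = [safe[i] for i in range(len(safe))]
print(loaded)                # index 1 was silently replaced by the sample at index 2
print(safe._unsafe_indices)  # the failed index is recorded here
```

In the sketch, the sample at index 2 appears twice in the output because it stands in for the failed index 1. This is the reason to inspect _unsafe_indices after prediction: without that check, a score could be attributed to the wrong file.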

Sampling

classes for strategically sampling within a DataLoader

class opensoundscape.torch.sampling.ClassAwareSampler(labels, num_samples_cls=1)

In each batch of samples, pick a limited number of classes to include and give even representation to each class
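
One way to picture class-aware sampling is the sketch below. It is an assumption-laden illustration, not the ClassAwareSampler code: the function class_aware_indices and its n_total and seed parameters are hypothetical. It visits the classes in a shuffled order and draws num_samples_cls indices from each class in turn, so any run of consecutive indices (a batch) covers only a few classes, each equally represented.

```python
import random
from collections import Counter, defaultdict

def class_aware_indices(labels, num_samples_cls=1, n_total=None, seed=None):
    """Return sample indices with even representation per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    classes = sorted(by_class)
    n_total = len(labels) if n_total is None else n_total
    out = []
    while len(out) < n_total:
        rng.shuffle(classes)  # new class order on every pass
        for cls in classes:
            # draw with replacement so rare classes can be repeated
            out.extend(rng.choices(by_class[cls], k=num_samples_cls))
    return out[:n_total]

labels = ["a"] * 8 + ["b"] * 2  # imbalanced: 8 vs 2
picked = class_aware_indices(labels, num_samples_cls=1, n_total=100, seed=0)
print(Counter(labels[i] for i in picked))  # both classes drawn equally often
```

Because rare classes are drawn with replacement, their samples repeat more often than samples from common classes, which is the usual trade-off of balancing by sampling.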

class opensoundscape.torch.sampling.ImbalancedDatasetSampler(dataset, indices=None, num_samples=None, callback_get_label=None)

Samples elements randomly from a given list of indices for imbalanced dataset

Parameters:
  • indices (list, optional) – a list of indices
  • num_samples (int, optional) – number of samples to draw
  • callback_get_label (func) – a callback-like function which takes two arguments - dataset and index
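
A common way to sample from an imbalanced dataset is to weight each element by the inverse of its class frequency and then draw with replacement. The sketch below illustrates that technique in plain Python; it is an assumption about the general approach rather than a copy of ImbalancedDatasetSampler, and the function imbalanced_sample with its labels and seed arguments is hypothetical.

```python
import random
from collections import Counter

def imbalanced_sample(labels, indices=None, num_samples=None, seed=None):
    """Draw indices so that each class is about equally likely."""
    rng = random.Random(seed)
    indices = list(range(len(labels))) if indices is None else list(indices)
    num_samples = len(indices) if num_samples is None else num_samples
    counts = Counter(labels[i] for i in indices)
    # weight of an element = 1 / (number of elements sharing its label)
    weights = [1.0 / counts[labels[i]] for i in indices]
    return rng.choices(indices, weights=weights, k=num_samples)

labels = ["common"] * 90 + ["rare"] * 10
drawn = imbalanced_sample(labels, num_samples=10000, seed=0)
rare_fraction = sum(labels[i] == "rare" for i in drawn) / len(drawn)
print(round(rare_fraction, 2))  # close to one half despite the 90/10 split
```

With these weights the total weight of every class is 1, so each class is drawn with roughly equal probability regardless of how many elements it has.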

Data Selection

opensoundscape.data_selection.upsample(input_df, label_column='Labels', random_state=None)

Given an input DataFrame, upsample to maximum value

Upsampling removes the class imbalance in your dataset. Rows for each label are repeated up to max_count // rows. Then, we randomly sample the rows to fill up to max_count.

Parameters:
  • input_df – A DataFrame to upsample
  • label_column – The column to draw unique labels from
  • random_state – Set the random_state during sampling
Returns:

An upsampled DataFrame

Return type:

df
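
The two-step procedure described for upsample (repeat whole groups, then randomly sample the remainder) can be sketched with plain Python lists of dicts in place of a DataFrame. The function upsample_rows and the row layout are hypothetical and chosen only for illustration; the library function operates on a pandas DataFrame.

```python
import random
from collections import Counter, defaultdict

def upsample_rows(rows, label_key="Labels", random_state=None):
    """Upsample every label group to the size of the largest group."""
    rng = random.Random(random_state)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    # raises ValueError on empty input, since there is no largest group
    max_count = max(len(group) for group in by_label.values())
    out = []
    for group in by_label.values():
        repeats = max_count // len(group)  # whole copies of the group
        out.extend(group * repeats)
        remainder = max_count - repeats * len(group)
        out.extend(rng.sample(group, remainder))  # random fill up to max_count
    return out

rows = [{"file": "a%d.wav" % i, "Labels": "a"} for i in range(5)]
rows += [{"file": "b%d.wav" % i, "Labels": "b"} for i in range(2)]
upsampled = upsample_rows(rows, random_state=0)
print(Counter(row["Labels"] for row in upsampled))  # every label has max_count rows
```

In the example, label "b" has 2 rows and the maximum is 5, so the group is repeated 5 // 2 = 2 times (4 rows) and one more row is drawn at random to reach 5.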

Performance Metrics

opensoundscape.metrics.binary_metrics(targets, preds, class_names=[0, 1])

labels should be single-target

opensoundscape.metrics.multiclass_metrics(targets, preds, class_names)

provide a list or np.array of 0,1 targets and predictions

opensoundscape.metrics.predict(scores, single_target=False, threshold=0.5)

convert numeric scores to binary predictions

return 0/1 for an array of scores: samples (rows) x classes (columns)

Parameters:
  • scores – a 2-d list or np.array. row=sample, columns=classes
  • single_target – if True, predict 1 for highest scoring class per sample, 0 for other classes. If False, predict 1 for all scores > threshold [default: False]
  • threshold – Predict 1 for score > threshold. only used if single_target = False. [default: 0.5]
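
The two prediction modes can be sketched as follows. This is a plain-Python illustration of the rules stated in the parameter list, using nested lists instead of np.array; the function name binarize_scores is hypothetical, and tie-breaking in single-target mode (first highest score wins) is an assumption.

```python
def binarize_scores(scores, single_target=False, threshold=0.5):
    """Convert rows of numeric scores (samples x classes) to 0/1 rows."""
    preds = []
    for row in scores:
        if single_target:
            # exactly one positive class: the highest scoring one
            best = max(range(len(row)), key=lambda i: row[i])
            preds.append([1 if i == best else 0 for i in range(len(row))])
        else:
            # any number of positive classes: every score above threshold
            preds.append([1 if s > threshold else 0 for s in row])
    return preds

scores = [[0.9, 0.6, 0.1],
          [0.2, 0.3, 0.4]]
print(binarize_scores(scores))                      # multi-target
print(binarize_scores(scores, single_target=True))  # single-target
```

The second sample shows the practical difference: no score exceeds the threshold, so multi-target mode predicts no class at all, while single-target mode still assigns the highest scoring class.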

Grad Cam

GradCAM is a method of visualizing the activation of the network on parts of an image

Author: Kazuto Nakashima
URL: http://kazuto1011.github.io
Created: 2017-05-26