Customize CNN training

This notebook demonstrates how to use classes from opensoundscape.ml.cnn and architectures created using opensoundscape.ml.cnn_architectures to:

choose between single-target and multi-target model behavior
modify learning rates, learning rate decay schedule, and regularization
choose from various CNN architectures
train a multi-target model with a special loss function
use strategic sampling for imbalanced training data
customize preprocessing: train on spectrograms with a bandpassed frequency range

Rather than demonstrating their effects on training (model training is slow!), most examples in this notebook either don’t train the model or “train” it for 0 steps for the purpose of demonstration.

For an introductory demonstration of model training, please see the “Train a CNN” tutorial. For a demo of how to apply a trained model to a dataset, see the “Predict with pretrained CNNs” tutorial.

For a hands-on walkthrough of machine learning for bioacoustics, use the “Classifiers 101 Guide”.

Run this tutorial

This tutorial is more than a reference! It’s a Jupyter Notebook which you can run and modify on Google Colab or your own computer.

Link to tutorial	How to run tutorial
	The link opens the tutorial in Google Colab. Uncomment the “installation” line in the first cell to install OpenSoundscape.
	The link downloads the tutorial file to your computer. Follow the Jupyter installation instructions, then open the tutorial file in Jupyter.

[9]:

# if this is a Google Colab notebook, install opensoundscape in the runtime environment
if 'google.colab' in str(get_ipython()):
  %pip install "opensoundscape==0.13.0" "jupyter-client<8,>=5.3.4" "ipykernel==6.17.1"

Setup

Import needed packages

[10]:

from opensoundscape.preprocess import preprocessors
from opensoundscape.ml import cnn, cnn_architectures

import torch
import pandas as pd
from pathlib import Path
import numpy as np
import random
import subprocess

from matplotlib import pyplot as plt
plt.rcParams['figure.figsize']=[15,5] #for big visuals
%config InlineBackend.figure_format = 'retina'

Download labeled audio files

The Kitzes Lab has created a small labeled dataset of short clips of American Woodcock vocalizations. You have two options for obtaining the folder of data, called woodcock_labeled_data:

Run the following cell to download this small dataset. These commands require you to have tar installed on your computer, as they will download and unzip a compressed file in .tar.gz format.
Download a .zip version of the files by clicking here. You will have to unzip this folder and place the unzipped folder in the same folder that this notebook is in.

[11]:

subprocess.run(
    [
        "curl",
        "https://drive.google.com/uc?export=download&id=1Ly2M--dKzpx331cfUFdVuiP96QKGJz_P",
        "-L",
        "-o",
        "woodcock_labeled_data.tar.gz",
    ]
)  # Download the data
subprocess.run(
    ["tar", "-xzf", "woodcock_labeled_data.tar.gz"]
)  # Unzip the downloaded tar.gz file
subprocess.run(
    ["rm", "woodcock_labeled_data.tar.gz"]
)  # Remove the file after its contents are unzipped

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 9499k  100 9499k    0     0  3921k      0  0:00:02  0:00:02 --:--:-- 8620k

[11]:

CompletedProcess(args=['rm', 'woodcock_labeled_data.tar.gz'], returncode=0)

Prepare audio data

To create a machine learning model, we need two dataframes of labeled clips, one for training and one for validation.

The steps to create these dataframes are described in more detail in other tutorials (e.g. the “Audio annotations” tutorial).

First, we need a dataframe with file paths in the index, so we manipulate the included one_hot_labels.csv slightly.

[12]:

# Load one-hot labels dataframe
labels = pd.read_csv("./woodcock_labeled_data/one_hot_labels.csv").set_index("file")[
    ["present"]
]

# Prepend the folder location to the file paths
labels.index = pd.Series(labels.index).apply(lambda f: "./woodcock_labeled_data/" + f)

# Create class list
classes = labels.columns

# Inspect
labels.head()

[12]:

	present
file
./woodcock_labeled_data/d4c40b6066b489518f8da83af1ee4984.wav	1
./woodcock_labeled_data/e84a4b60a4f2d049d73162ee99a7ead8.wav	0
./woodcock_labeled_data/79678c979ebb880d5ed6d56f26ba69ff.wav	1
./woodcock_labeled_data/49890077267b569e142440fa39b3041c.wav	1
./woodcock_labeled_data/0c453a87185d8c7ce05c5c5ac5d525dc.wav	1

Next, randomly split these data into train and validation sets.

[13]:

from sklearn.model_selection import train_test_split

train_df, valid_df = train_test_split(labels, test_size=0.2, random_state=0)
print(f"created train_df (len {len(train_df)}) and valid_df (len {len(valid_df)})")

created train_df (len 23) and valid_df (len 6)

Model architectures

We initialize a model object by specifying the architecture, a list of classes, and the duration of individual samples in seconds.

The architecture is the particular design of the CNN. This option can either be a string matching one of the architectures available by default in OpenSoundscape, or a custom PyTorch model object.

Default architectures

The opensoundscape.ml.cnn_architectures module provides functions to create several common CNN architectures. These architectures are built into PyTorch, but the OpenSoundscape module helps us out by reshaping the final layer to match the number of classes we have.

Note that these will use default architecture parameters, including using pre-trained ImageNet weights. If you don’t want to use pre-trained weights, follow the method below of creating the architecture and passing it to the initialization of CNN.

See what architectures are available by default in OpenSoundscape:

[14]:

import opensoundscape.ml

opensoundscape.ml.cnn_architectures.list_architectures()

[14]:

['resnet18',
 'resnet34',
 'resnet50',
 'resnet101',
 'resnet152',
 'alexnet',
 'vgg11_bn',
 'squeezenet1_0',
 'densenet121',
 'efficientnet_b0',
 'efficientnet_b1',
 'efficientnet_b4']

For convenience, we can initialize a model object by providing the name of an architecture as a string, rather than the architecture object.

Create a model with a resnet34 architecture:

[15]:

model = cnn.CNN(
    architecture="resnet34", classes=classes, sample_duration=2.0, sample_rate=32000
)

For more control over the model architecture, you can instead create the architecture object yourself with the corresponding function in cnn_architectures, then pass it to CNN:

[17]:

arch = cnn_architectures.resnet50(num_classes=len(classes))

model = cnn.CNN(arch, classes, sample_duration=2.0, sample_rate=32000)

/Users/SML161/opensoundscape/opensoundscape/ml/cnn.py:239: UserWarning: Modifying .preprocessor to match architecture's expected number of channels (3) (originally 1).
  warnings.warn(

Use random weights

By default, OpenSoundscape’s models download weights pre-trained on ImageNet.

You can instead start from scratch with random weights using the parameter weights=None when creating an architecture. For instance, let’s create an AlexNet architecture with random weights:

[18]:

my_arch = cnn_architectures.alexnet(
    num_classes=len(classes), weights=None, num_channels=1
)
model = cnn.CNN(my_arch, classes, 2.0, sample_rate=32000)

Other custom architectures

We can create any PyTorch model architecture and pass it to the architecture argument when creating a model in OpenSoundscape. You can do this by:

subclassing an existing PyTorch model
writing one from scratch. The minimum requirement is that it subclasses torch.nn.Module and implements a .forward() method. You do not need to write a backward pass: PyTorch’s autograd computes gradients from the operations in .forward().
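As a rough illustration of the second option, here is a minimal sketch of a from-scratch architecture. The class name TinySpectrogramCNN and its layer sizes are made up for this example and are not part of OpenSoundscape. The sketch only shows the PyTorch side: a torch.nn.Module subclass whose forward() maps a batch of single-channel images to one score per class.

```python
import torch
from torch import nn


class TinySpectrogramCNN(nn.Module):
    """Hypothetical minimal architecture: one conv block plus a linear classifier."""

    def __init__(self, num_classes, num_channels=1):
        super().__init__()
        # "feature extractor": convolution, nonlinearity, global average pooling
        self.features = nn.Sequential(
            nn.Conv2d(num_channels, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # "classification head": one output score per class
        self.classifier = nn.Linear(8, num_classes)

    def forward(self, x):
        x = self.features(x)  # shape: (batch, 8, 1, 1)
        x = torch.flatten(x, 1)  # shape: (batch, 8)
        return self.classifier(x)  # shape: (batch, num_classes)


net = TinySpectrogramCNN(num_classes=1)
batch = torch.zeros(4, 1, 224, 224)  # 4 dummy single-channel "spectrograms"
scores = net(batch)
print(scores.shape)
```

Only forward() is defined here; gradients are handled by PyTorch’s autograd during training. Whether OpenSoundscape expects any additional attributes on a custom architecture is not covered by this sketch, so check the CNN class documentation before training with one.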

Viewing the architecture

The architecture is stored in the model object’s .network attribute. We can view the network and access its parameters by examining this attribute and its sub-modules. For instance, we can view a ResNet’s final fully connected (classification) layer using the .fc attribute:

[19]:

model = cnn.CNN("resnet18", classes, 2.0, sample_rate=32000)
model.network.fc

[19]:

Linear(in_features=512, out_features=1, bias=True)

It is also possible to replace a model’s architecture entirely by setting model.network to a new architecture, but this is not generally recommended unless you know what you’re doing. It will completely remove anything the model has “learned,” since the learned weights are a part of the architecture.

Freezing the feature extractor

Sometimes, we only wish to train the final layer or layers of a CNN, known as the “classification head” or simply “classifier”, rather than training all of the layers. This technique makes it possible to fine-tune a pre-trained network using limited training data, without ruining the generalizability of the “feature extractor” (the term for all of the layers before the “classification head”).

If you’re using one of the built-in CNN architectures in OpenSoundscape, you can easily “freeze” the feature extractor (i.e., tell PyTorch not to update any of its weights while the classification head is trained) with a one-liner, then proceed with training as normal (model.train(...)).

[20]:

model.freeze_feature_extractor()

If you are using a custom architecture not native to OpenSoundscape, you can still freeze all but one layer with a one-liner. You just need to specify which layer or layers you wish to keep “trainable” or “unfrozen”. In the case of a ResNet architecture, we can point to the .fc (for “fully connected”) layer as the classification layer we want to train while freezing all others. Note that other PyTorch architectures may not call the classification layer .fc.

[21]:

model.freeze_layers_except(model.network.fc)
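To make “freezing” more concrete, here is a generic PyTorch sketch of the idea: turn off gradient tracking for every parameter, then turn it back on for the layer we want to keep trainable. The small nn.Sequential network is a hypothetical stand-in, and this is not OpenSoundscape’s implementation of freeze_layers_except().

```python
import torch
from torch import nn

# Hypothetical stand-in network: two "feature" layers plus a final layer
network = nn.Sequential(
    nn.Linear(16, 8),
    nn.ReLU(),
    nn.Linear(8, 1),  # plays the role of the classification head
)
head = network[2]

# Freeze everything, then re-enable gradients for the head only
for param in network.parameters():
    param.requires_grad = False
for param in head.parameters():
    param.requires_grad = True

# List the parameters that an optimizer would still update
trainable = [name for name, p in network.named_parameters() if p.requires_grad]
print(trainable)
```

Parameters with requires_grad set to False receive no gradients, so their values stay fixed during training.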

Single-target models

One decision about your architecture is whether your classification problem is single-target (exactly one label per sample) or multi-target (any number of labels per sample, including 0). Single-target models have a softmax activation layer which forces the sum of all class scores to be 1.0.

This is a separate decision from the number of classes your model can potentially identify. For example, if you are creating a model to identify only one species, your model should contain only one class, but it should still be a multi-target model. This allows your model to predict that the species isn’t present (i.e. the class score can be 0).
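The difference can be seen with a few lines of plain Python. This sketch assumes that a multi-target model scores each class independently with a sigmoid, which is a common choice but is an assumption here; the raw model outputs (“logits”) are made-up numbers.

```python
import math


def softmax(logits):
    """Single-target style scores: they always sum to 1 across classes."""
    shifted = [x - max(logits) for x in logits]  # subtract max for numerical stability
    exps = [math.exp(x) for x in shifted]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits):
    """Multi-target style scores: each class is scored independently in (0, 1)."""
    return [1 / (1 + math.exp(-x)) for x in logits]


logits = [-4.0, -3.0, -5.0]  # hypothetical outputs for a clip where no class is present
single_target_scores = softmax(logits)
multi_target_scores = sigmoid(logits)

print(sum(single_target_scores))  # forced to sum to 1, so some class must "win"
print(max(multi_target_scores))  # every class can be close to 0 at once

# With only one class, softmax can never report absence
one_class_score = softmax([-4.0])
print(one_class_score)
```

With a single class, the softmax score is 1.0 no matter what the model outputs, which is why a one-species model should still be multi-target.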

In most cases in bioacoustic monitoring, models are multi-target. But if you would like to train a single-target model, just set single_target=True either when creating the model object or afterwards.

[22]:

# Change the model to be single_target
model.single_target = True

# Or specify single_target when you create the object
model = cnn.CNN("resnet18", classes, 2.0, single_target=True, sample_rate=32000)

Updating torchmetrics and loss_fn to match single_target=True

Multi-target training with ResampleLoss

Training multi-target models is challenging and can benefit from using a modified loss function. OpenSoundscape provides a loss function, ResampleLoss, designed for this purpose, and we recommend using it when training multi-target models. You can apply it to a model object with a helper function that modifies the model in place:

[23]:

from opensoundscape.ml.cnn import use_resample_loss

[24]:

model = cnn.CNN("resnet18", classes, 2.0, sample_rate=32000)
use_resample_loss(model, train_df=train_df)
print(model.loss_fn)

ResampleLoss()

Spectrogram settings

The parameters used to create spectrograms are very important for classifier performance. The main way to modify these parameters is by customizing the model’s preprocessor.

OpenSoundscape also provides an additional option that can affect performance and training speed: the ability to change the size of the input spectrogram.

Custom preprocessing

The preprocessing tutorial gives in-depth descriptions of how to customize your preprocessing pipeline, as well as best practices for using these customizations, e.g. reviewing what the samples look like before training on them.

Here, we’ll just give a quick example of tweaking the preprocessing pipeline: providing the CNN with a bandpassed spectrogram object instead of the full frequency range.

[25]:

model = cnn.CNN("resnet18", classes, 2.0, sample_rate=32000)

# change the min and max frequencies for the spectrogram bandpass action
model.preprocessor.pipeline.bandpass.set(min_f=3000, max_f=5000)
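Conceptually, bandpassing a spectrogram means keeping only the rows whose frequencies fall inside the chosen range. The following plain-Python sketch illustrates that idea on a dummy spectrogram; the helper bandpass_rows and the bin layout are hypothetical and are not how OpenSoundscape’s bandpass action is implemented.

```python
# Hypothetical spectrogram: one row per frequency bin, one column per time step
sample_rate = 32000
nyquist = sample_rate / 2  # highest frequency a spectrogram can represent
n_bins = 17
frequencies = [i * nyquist / (n_bins - 1) for i in range(n_bins)]  # 0, 1000, ..., 16000 Hz
spectrogram = [[float(i)] * 4 for i in range(n_bins)]  # dummy values, 4 time steps


def bandpass_rows(spec, freqs, min_f, max_f):
    """Keep only the rows whose frequency lies in [min_f, max_f]."""
    kept = [(f, row) for f, row in zip(freqs, spec) if min_f <= f <= max_f]
    return [f for f, _ in kept], [row for _, row in kept]


kept_freqs, kept_spec = bandpass_rows(spectrogram, frequencies, min_f=3000, max_f=5000)
print(kept_freqs)
print(len(kept_spec), "of", n_bins, "frequency rows kept")
```

A narrower band discards information outside the range of interest, so it is worth checking that the target sounds actually fall inside it before training.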

Size of spectrogram

OpenSoundscape enables you to modify the size of the spectrogram input to the classifier.

Larger spectrograms have greater resolution which can help the classifier pick up on finer details. However, potential accuracy improvements come at the cost of more resource-intensive training and prediction.

To change the image size, set the height, width, and channels arguments when creating the CNN. Most classifier architectures expect 3 channels.

[26]:

model = cnn.CNN(
    "resnet18", classes, 2.0, width=448, height=448, channels=3, sample_rate=32000
)
p = model.preprocessor
p.height, p.width, p.channels

[26]:

(448, 448, 3)

Learning parameters

In a general sense, a model’s learning rate determines how fast the model fits to the data. More specifically, it determines how much the model’s weights change every time it calculates the loss function.

Higher learning rates speed up training and can help the model escape local minima as it learns to classify, but if the learning rate is too high, the model may fail to fit the data or its training may become unstable.
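A toy example shows this trade-off. The sketch below runs plain gradient descent on the one-parameter function f(w) = w**2, which is far simpler than a CNN trained with AdamW, so treat it only as an intuition aid. The learning rates are arbitrary values chosen for illustration.

```python
def gradient_descent(lr, steps=20, w=1.0):
    """Minimize f(w) = w**2 (gradient 2*w) with plain gradient descent."""
    for _ in range(steps):
        w = w - lr * 2 * w
    return w


slow = gradient_descent(lr=0.01)  # too low: still far from the minimum at 0
good = gradient_descent(lr=0.3)  # converges close to 0
unstable = gradient_descent(lr=1.1)  # too high: each step overshoots and grows

print(f"lr=0.01 -> w={slow:.4f}")
print(f"lr=0.3  -> w={good:.2e}")
print(f"lr=1.1  -> w={unstable:.2f}")
```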

OpenSoundscape allows you to flexibly change parameters related to the model’s optimizer. This includes parameters related to the learning rate, as well as the emphasis the model’s training places on learning smaller, less complex weights, known as regularization.

First, let’s look at the model optimization (AKA “learning”) hyperparameters:

[27]:

model.optimizer_params

[27]:

{'class': torch.optim.adamw.AdamW,
 'kwargs': {'lr': 0.001, 'weight_decay': 0.0005},
 'classifier_lr': None}

Options for modifying the learning hyperparameters include:

Modify learning rate
Fine tune a model
Separate learning rates for feature and classifier blocks
Modify the learning rate schedule
Set the regularization weight decay

Modify learning rate

A basic way to modify the learning rate on an entire model is to change the lr parameter:

[28]:

model.optimizer_params["kwargs"]["lr"] = 0.01

Fine tune a model

One instance where we might want to modify a learning rate is to “fine tune” a model.

After training a model for a while at a relatively high learning rate (think 0.01), we might want to “fine tune” the model, or set a lower learning rate, then train the model at the lower rate for a few epochs.

Let’s set a low learning rate for fine tuning:

[29]:

model.optimizer_params["kwargs"]["lr"] = 0.001

Separate learning rates for feature and classifier blocks

Convolutional Neural Networks can be thought of as having two parts: a feature extractor which learns how to represent (or “see”) the input data, and a classifier which takes those representations and transforms them into predictions about the class identity of each sample.

In PyTorch, we can customize the learning rate of different layers. For convenience, OpenSoundscape provides an easy way to set the classifier layer’s learning rate separately from the rest of the network. (For more advanced use cases, see the source code in BaseModule.configure_optimizers() for how to set different layers to different learning rates.)

[30]:

model = cnn.CNN("resnet18", classes, 2.0, sample_rate=32000)
# set a high learning rate for the classifier
model.optimizer_params["classifier_lr"] = 0.02
# set a low learning rate for the rest of the network
model.optimizer_params["kwargs"]["lr"] = 0.0001

# these learning rates will be configured in the optimizer when you begin training
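For reference, per-layer learning rates in plain PyTorch are expressed with optimizer “parameter groups”. The sketch below uses a small hypothetical network and is not the code from BaseModule.configure_optimizers(); it only shows the general PyTorch mechanism with the same values as the cell above.

```python
import torch
from torch import nn

# Hypothetical stand-in network: first layer = "features", last layer = "classifier"
network = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 1))
feature_params = list(network[0].parameters())
classifier_params = list(network[2].parameters())

optimizer = torch.optim.AdamW(
    [
        {"params": feature_params},  # falls back to the default lr below
        {"params": classifier_params, "lr": 0.02},  # group-specific lr
    ],
    lr=0.0001,
    weight_decay=0.0005,
)

group_lrs = [group["lr"] for group in optimizer.param_groups]
print(group_lrs)
```

Any option not given inside a group (here, weight_decay for both groups and lr for the first) takes the value passed to the optimizer itself.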

Learning rate schedule

It’s often helpful to decrease the learning rate over the course of training. By reducing the amount that the model’s weights are updated as time goes on, this causes the learning to gradually switch from coarsely searching across possible weights to fine-tuning the weights.

The default learning rate schedule is Cosine Annealing with Warmup, which starts by linearly ramping up to the full learning rate, then gradually decays to 10%.

Let’s modify the schedule so that it ends at 1% of the maximum learning rate instead:

[31]:

model.lr_scheduler_params

[31]:

{'class': opensoundscape.ml.schedulers.CosineAnnealingWithWarmupScheduler,
 'kwargs': {'max_steps': 1000, 'warmup_fraction': 0.05, 'final_lr_ratio': 0.1}}

[32]:

model.lr_scheduler_params["kwargs"]["final_lr_ratio"] = 0.01
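To see what these parameters control, here is a plain-Python sketch of a generic “linear warmup, then cosine decay” schedule. The function warmup_cosine_lr is written for this example and reuses the parameter names from the dictionary above; it is not the code of CosineAnnealingWithWarmupScheduler, whose exact behavior (for example, the value at step 0) may differ.

```python
import math


def warmup_cosine_lr(step, max_lr, max_steps=1000, warmup_fraction=0.05, final_lr_ratio=0.1):
    """Learning rate at `step`: linear warmup, then cosine decay to max_lr * final_lr_ratio."""
    warmup_steps = round(max_steps * warmup_fraction)
    if step < warmup_steps:
        # linear ramp from 0 up to max_lr
        return max_lr * step / warmup_steps
    decay_steps = max(1, max_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)  # 0 -> 1 over the decay phase
    min_lr = max_lr * final_lr_ratio
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))


lr_start = warmup_cosine_lr(0, max_lr=0.001)
lr_peak = warmup_cosine_lr(50, max_lr=0.001)
lr_end_default = warmup_cosine_lr(1000, max_lr=0.001)
lr_end_modified = warmup_cosine_lr(1000, max_lr=0.001, final_lr_ratio=0.01)

print(lr_start, lr_peak, lr_end_default, lr_end_modified)
```

In this sketch, lowering final_lr_ratio from 0.1 to 0.01 leaves the warmup and peak unchanged and only lowers the learning rate that the decay phase ends on.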

Set the regularization weight decay

PyTorch optimizers can apply weight decay, an L2-style regularization that pushes the model toward small weights rather than large ones. The goal of this regularization is to reduce overfitting to the training data by reducing the complexity of the model.

Depending on how much emphasis you want to place on the L2 regularization, you can change the weight decay parameter. By default, it is 0.0005. The higher the value for the “weight decay” parameter, the more the model training algorithm prioritizes smaller weights.

[33]:

model.optimizer_params["kwargs"]["weight_decay"] = 0.001
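To get a feel for the size of this effect, recall that the default optimizer shown earlier is AdamW, which shrinks each weight by a factor of (1 - lr * weight_decay) at every step, separately from the gradient update. The sketch below isolates that shrinkage and ignores gradients entirely, so it illustrates only the decay term, not full training. The helper decayed_weight is written for this example.

```python
def decayed_weight(w, lr, weight_decay, steps):
    """Apply only the decoupled weight-decay part of each update step."""
    for _ in range(steps):
        w = w * (1 - lr * weight_decay)
    return w


default_decay = decayed_weight(1.0, lr=0.001, weight_decay=0.0005, steps=1000)
stronger_decay = decayed_weight(1.0, lr=0.001, weight_decay=0.001, steps=1000)

print(f"weight_decay=0.0005 -> {default_decay:.6f}")
print(f"weight_decay=0.001  -> {stronger_decay:.6f}")
```

Because the shrinkage is multiplied by the learning rate, changing the learning rate also changes how strongly weight decay acts per step.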

Other LR schedulers

Instead of the default scheduler, we can use a different scheduler class, such as PyTorch’s CosineAnnealingLR:

[34]:

# decrease lr over 1000 steps using a cosine annealing schedule
model.lr_scheduler_params = {
    "class": torch.optim.lr_scheduler.CosineAnnealingLR,
    "kwargs": {"T_max": 1000},
}

Experiment!

In this tutorial we’ve covered the more advanced options available to customize your CNN training.

While intuition can be a helpful guide, it’s not always intuitive which parameters will result in the best model. This is why it’s helpful to experiment with different parameters to see what works for you.
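One simple way to organize such experiments is to enumerate combinations of settings up front. The sketch below builds a small grid from values that appear earlier in this tutorial; the name search_space and the choice of values are only an illustration, not a recommended search strategy.

```python
from itertools import product

# Hypothetical search space built from hyperparameters discussed above
search_space = {
    "lr": [0.01, 0.001],
    "weight_decay": [0.0005, 0.001],
    "final_lr_ratio": [0.1, 0.01],
}

# One dictionary per combination of values
configs = [dict(zip(search_space, values)) for values in product(*search_space.values())]

print(len(configs), "configurations")
for config in configs[:3]:
    print(config)
```

Each configuration could then be applied to a freshly created model using the optimizer_params and lr_scheduler_params dictionaries shown earlier, training one model per configuration and comparing validation performance.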

To facilitate experimentation, OpenSoundscape includes integration with Weights & Biases. See the original “Train a CNN” tutorial for more information on how to set this up.

Clean up: Run the following code to remove the downloaded files.

[35]:

import shutil

# uncomment to remove the labeled dataset
# shutil.rmtree('./woodcock_labeled_data')
Path("./my_pre.json").unlink(missing_ok=True)

for p in Path(".").glob("*.model"):
    p.unlink()