Prediction with pre-trained CNNs

This notebook contains all the code you need to use a pre-trained OpenSoundscape convolutional neural network model (CNN) to make predictions on your own data.

Before attempting this tutorial, install OpenSoundscape by following the instructions on the OpenSoundscape website, opensoundscape.org.

More detailed tutorials about data preprocessing, training CNNs, and customizing prediction methods can be found in the other tutorial notebooks on opensoundscape.org.

Load required packages

We will import several classes and functions from OpenSoundscape. First, import the AudioToSpectrogramPreprocessor class from the preprocess.preprocessors module. Preprocessor classes are used to load, transform, and augment audio samples for use in a machine learning model.

[1]:
from opensoundscape.preprocess.preprocessors import AudioToSpectrogramPreprocessor

Second, the cnn module provides classes for training and prediction with various structures of CNNs. For this example, import the Resnet18Binary class, which is used for models built on the Resnet18 architecture that predict the presence or absence of a species (a “binary” classifier).

[2]:
# The cnn module provides classes for training/predicting with various types of CNNs
from opensoundscape.torch.models.cnn import Resnet18Binary

Third, import the run_command function from the helpers module, which allows us to run command line commands from inside OpenSoundscape scripts.

[3]:
from opensoundscape.helpers import run_command

Finally, load some additional packages and perform some setup for the Jupyter notebook.

[4]:
# Other utilities and packages
import torch
from pathlib import Path
import numpy as np
import pandas as pd
from glob import glob

[5]:
#set up plotting
from matplotlib import pyplot as plt
plt.rcParams['figure.figsize']=[15,5] #for large visuals
%config InlineBackend.figure_format = 'retina'

Prepare audio data for prediction

To run predictions on your audio data, your audio must already be split into clips of the length the model expects to receive. If your audio data are not already split, see the demonstration of the Audio.split() method in the audio_and_spectrogram notebook; a rough manual alternative is sketched below.

You can check the clip length the model expects to receive in the model’s notes when you download it. This is often, but not always, 5.0 seconds.
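
If you only need a quick, manual way to split long recordings, the following is a minimal sketch using the Audio class’s from_file(), trim(), and save() methods. The file path and the 5.0-second clip length are hypothetical, and Audio.split() (demonstrated in the audio_and_spectrogram notebook) remains the recommended approach.

[ ]:
# a minimal sketch of manual splitting; prefer Audio.split() in practice.
# "./long_recording.wav" and the 5.0 s clip length are hypothetical:
# substitute your own file and the clip length from your model's notes.
import numpy as np
from pathlib import Path
from opensoundscape.audio import Audio

clip_length = 5.0  # seconds
audio = Audio.from_file("./long_recording.wav")
duration = len(audio.samples) / audio.sample_rate  # total length in seconds

Path("./clips").mkdir(exist_ok=True)
for i, start in enumerate(np.arange(0, duration, clip_length)):
    # trim() returns a new Audio object; the final clip may be shorter
    clip = audio.trim(start, start + clip_length)
    clip.save(f"./clips/long_recording_{i:03d}.wav")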

Download audio files

The Kitzes Lab has created a small labeled dataset of short clips of American Woodcock vocalizations. You have two options for obtaining the folder of data, called woodcock_labeled_data:

  1. Run the following cell to download this small dataset. These commands require you to have curl and tar installed on your computer, as they will download and unzip a compressed file in .tar.gz format.
  2. OR download a .zip version of the files by clicking here. You will have to unzip this folder and place the unzipped folder in the same folder that this notebook is in.

Note: Once you have the data, you do not need to run this cell again.

[6]:
commands = [
    "curl -L https://pitt.box.com/shared/static/79fi7d715dulcldsy6uogz02rsn5uesd.gz -o ./woodcock_labeled_data.tar.gz",
    "tar -xzf woodcock_labeled_data.tar.gz", # Unzip the downloaded tar.gz file
    "rm woodcock_labeled_data.tar.gz" # Remove the file after its contents are unzipped
]
for command in commands:
    run_command(command)

Generate a Preprocessor object

In addition to having audio clips of the correct length, you will need to create a Preprocessor object that loads audio samples for the CNN.

First, generate a Pandas DataFrame with the index containing the paths to each file, as shown below.

[7]:
# collect a list of audio files
file_list = glob('./woodcock_labeled_data/*.wav')

# create a DataFrame with the audio files as the index
audio_file_df = pd.DataFrame(index=file_list)

Next, use that DataFrame to create a Preprocessor object suitable for your application. Use the argument return_labels=False, as our audio to predict on does not have labels.

If the model was trained with any special preprocessor settings, you should apply those settings here. For pre-trained models created by the Kitzes Lab, see the model’s notes from its download page for the exact code to use here.

[8]:
# create a Preprocessor object
# we use the option "return_labels=False" because our audio to predict on does not have labels
from opensoundscape.preprocess.preprocessors import AudioToSpectrogramPreprocessor
prediction_dataset = AudioToSpectrogramPreprocessor(audio_file_df, return_labels=False)
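
If the model’s notes call for a different clip length, you might pass it when constructing the preprocessor. The cell below is a hypothetical sketch: it assumes your OpenSoundscape version exposes an audio_length parameter, and the 3.0-second value is made up; always copy the exact code from the model’s notes.

[ ]:
# hypothetical: set a clip length from the model's notes, assuming this
# version of AudioToSpectrogramPreprocessor accepts an audio_length argument
custom_dataset = AudioToSpectrogramPreprocessor(
    audio_file_df,
    audio_length=3.0,  # seconds; made-up value, use your model's notes
    return_labels=False,
)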

Models trained with OpenSoundscape v0.5.x

Check the model notes page for the appropriate model class to use and import the correct class from the cnn module.

[9]:
from opensoundscape.torch.models.cnn import Resnet18Binary

For the purpose of demonstration, let’s generate a new Resnet18 model for binary prediction and save it to our local folder.

If you download a pre-trained model, you can skip this cell.

[10]:
model = Resnet18Binary(classes=['absent','present'])
model.save('./demo.model')
created PytorchModel model object with 2 classes
Saving to demo.model

Next, provide the model class’s from_checkpoint() method with the path to your downloaded model.

[11]:
# load the model into the appropriate model class
model = Resnet18Binary.from_checkpoint('./demo.model')
created PytorchModel model object with 2 classes
loading weights from saved object

Generate predictions as follows. The predict method returns three values: scores, thresholded predictions, and labels. For unthresholded prediction on unlabeled data, only the first is relevant, so the other two returns are discarded using scores, _, _.

[12]:
# call model.predict() with the Preprocessor to generate predictions
scores, _, _ = model.predict(prediction_dataset)
(29, 2)

Look at the scores of the first 5 samples.

[13]:
#look at the scores of the first 5 samples
scores.head()
[13]:
                                                               absent   present
./woodcock_labeled_data/d4c40b6066b489518f8da83af1ee4984.wav  0.331394  0.815523
./woodcock_labeled_data/e84a4b60a4f2d049d73162ee99a7ead8.wav  0.440169  0.262892
./woodcock_labeled_data/79678c979ebb880d5ed6d56f26ba69ff.wav  0.512454  0.545183
./woodcock_labeled_data/49890077267b569e142440fa39b3041c.wav  0.282791  0.827620
./woodcock_labeled_data/0c453a87185d8c7ce05c5c5ac5d525dc.wav  0.194547  0.753871
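
To quickly see which files the model scores highest for the positive class, you can sort the DataFrame. This is ordinary pandas, not an OpenSoundscape-specific call:

[ ]:
# rank files by their score for the "present" class (plain pandas)
scores.sort_values(by='present', ascending=False).head()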

Options for prediction

The code above returns the raw predictions of the model without any post-processing (such as a softmax layer or a sigmoid layer).

For details on how to use the predict() function for post-processing of predictions and to generate binary 0/1 predictions of class presence, see the “Basic training and prediction with CNNs” tutorial notebook. But, as a quick example, let’s generate scores using the following settings:

  * a softmax layer, to make the prediction scores for both classes sum to 1
  * a logit layer, to map the prediction scores from [0, 1] to (-inf, +inf)

We can also use the binary_preds argument to generate 0/1 predictions for each sample and class. For presence/absence models, use the option binary_preds='single_target'. For multi-class models, consider whether each clip should be labeled with only one class (binary_preds='single_target') or could contain multiple classes (binary_preds='multi_target').

[14]:
scores, binary_predictions, _ = model.predict(
    prediction_dataset,
    activation_layer='softmax_and_logit',
    binary_preds='single_target'
)
(29, 2)

As before, the scores are continuous variables, but they have now been passed through the softmax and logit transformations:

[15]:
scores.head(2)
[15]:
                                                                absent   present
./woodcock_labeled_data/d4c40b6066b489518f8da83af1ee4984.wav  -0.484129  0.484129
./woodcock_labeled_data/e84a4b60a4f2d049d73162ee99a7ead8.wav   0.177276 -0.177276
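
Notice that the two columns are exact negatives of each other. This is a property of the two-class case: applying logit to the softmax of raw scores reduces to their difference, so the “present” score equals x_present - x_absent and the “absent” score is its negative. A small standalone check with scipy (the raw scores here are made up):

[ ]:
# verify that logit(softmax(x)) reduces to score differences for 2 classes
import numpy as np
from scipy.special import softmax, logit

raw = np.array([0.3, 1.1])  # hypothetical raw scores: [absent, present]
print(logit(softmax(raw)))  # -> [-0.8  0.8], i.e. [x0 - x1, x1 - x0]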

We also have an additional output, the binary 0/1 (“absent” vs “present”) predictions generated by the model:

[16]:
binary_predictions.head(2)
[16]:
                                                              absent  present
./woodcock_labeled_data/d4c40b6066b489518f8da83af1ee4984.wav     0.0      1.0
./woodcock_labeled_data/e84a4b60a4f2d049d73162ee99a7ead8.wav     1.0      0.0
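
These binary predictions make it easy to summarize the results, for example by counting the clips in which the model detected the species (plain pandas):

[ ]:
# count the clips predicted to contain the species
n_present = int(binary_predictions['present'].sum())
print(f"{n_present} of {len(binary_predictions)} clips predicted present")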

It is often helpful to look at a histogram of the scores for the positive class. We typically apply softmax and logit to predictions before plotting them as histograms.

[17]:
_ = plt.hist(scores['present'],bins=20)
_ = plt.xlabel('logit of softmax score for positive class')
[Image: histogram of the logit of softmax scores for the positive class]

Models trained with OpenSoundscape 0.4.x

One set of our publicly available binary models for 500 species was created with an older version of OpenSoundscape. These models require a little bit of manipulation to load into OpenSoundscape 0.5.x and onward.

First, let’s download one of these models (it’s stored in .tar format) and save it to the same directory as this notebook, in a file called opso_04_model_acanthis-flammea.tar.

[18]:
%%bash
curl -L https://pitt.box.com/shared/static/lglpty35omjhmq6cdz8cfudm43nn2t9f.tar -o ./opso_04_model_acanthis-flammea.tar
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100     8    0     8    0     0      7      0 --:--:--  0:00:01 --:--:--     0
100 42.9M  100 42.9M    0     0  4580k      0  0:00:09  0:00:09 --:--:-- 6184k

Next, load the weights from that model into an OpenSoundscape model object with the following code:

[19]:
from opensoundscape.torch.models.cnn import PytorchModel
from opensoundscape.torch.architectures.cnn_architectures import resnet18
import torch

# load the tar file into a dictionary
# (you could change this to the location of any .tar file on your computer)
opso_04_model_tar_path = "./opso_04_model_acanthis-flammea.tar"
opso_04_model_dict = torch.load(opso_04_model_tar_path)

# create a resnet18 binary model
# (all models created with OpenSoundscape 0.4.x are 2-class resnet18 architectures)
architecture = resnet18(num_classes=2,use_pretrained=False)
model = PytorchModel(classes=['negative','positive'],architecture=architecture)

# load the model weights into our model object
# now, our model is equivalent to the trained model we downloaded
model.network.load_state_dict(opso_04_model_dict['model_state_dict'])
created PytorchModel model object with 2 classes
[19]:
<All keys matched successfully>

Now, we can use the model as normal to create predictions on audio. We’ll use the same prediction_dataset from above.

Remember to choose the activation_layer you desire. In this example, we’ll assume we just want to generate scores, not binary predictions. We’ll apply a softmax layer, then the logit transform, to the scores using the activation_layer="softmax_and_logit" option. This will generate the type of scores that are useful for plotting score histograms, among other things.

[20]:
# generate predictions on our dataset
prediction_scores_df, _, _ = model.predict(prediction_dataset, activation_layer='softmax_and_logit')

prediction_scores_df.head(3)
(29, 2)
[20]:
                                                              negative  positive
./woodcock_labeled_data/d4c40b6066b489518f8da83af1ee4984.wav  3.953401 -3.953400
./woodcock_labeled_data/e84a4b60a4f2d049d73162ee99a7ead8.wav  3.650462 -3.650461
./woodcock_labeled_data/79678c979ebb880d5ed6d56f26ba69ff.wav  2.165398 -2.165398
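
To keep these scores for later analysis, you can write them to a CSV file with ordinary pandas; the filename here is just an example:

[ ]:
# save the scores to disk for later analysis (example filename)
prediction_scores_df.to_csv("./acanthis-flammea_scores.csv")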

Remove the downloaded files to clean up.

[22]:
# delete the downloaded audio, saved models, and model tar files
folder = Path('./woodcock_labeled_data')
for p in folder.glob("*"):
    p.unlink()
folder.rmdir()
for p in Path('.').glob('*.model'):
    p.unlink()
for p in Path('.').glob('*.tar'):
    p.unlink()