Prediction with pre-trained CNNs

This notebook contains all the code you need to use a pre-trained OpenSoundscape convolutional neural network model (CNN) to make predictions on your own data. Before attempting this tutorial, install OpenSoundscape by following the instructions on the OpenSoundscape website, More detailed tutorials about data preprocessing, training CNNs, and customizing prediction methods can also be found on this site.

Load required packages

The cnn module provides a function load_model to load saved opensoundscape models

from opensoundscape.torch.models.cnn import load_model
import opensoundscape

load some additional packages and perform some setup for the Jupyter notebook.

# Other utilities and packages
import torch
from pathlib import Path
import numpy as np
import pandas as pd
from glob import glob
import subprocess
#set up plotting
from matplotlib import pyplot as plt
plt.rcParams['figure.figsize']=[15,5] #for large visuals
%config InlineBackend.figure_format = 'retina'

For this example, let’s create an untrained model and save it. This 2-class model is not actually good at recognizing any particular species, but it’s useful for illustrating how prediction works.

from opensoundscape.torch.models.cnn import CNN

Load a saved model

load the model object using the load_model function imported above

(if the model was created with an older version of opensoundscape, see instructions below)

model = load_model('./temp.model')

Choose audio files for prediction

Create a list of audio files to predict on. They can be of any length. Consider using glob to find many files at once.

For this example, let’s download a 1-minute audio clip from the Kitzes Lab box to use as an example.

                '-L', '-o', '1min_audio.wav'])
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100     7    0     7    0     0      5      0 --:--:--  0:00:01 --:--:--     0
100 3750k  100 3750k    0     0   993k      0  0:00:03  0:00:03 --:--:-- 2265k
CompletedProcess(args=['curl', '', '-L', '-o', '1min_audio.wav'], returncode=0)

use glob to create a list of all files matching a pattern in a folder:

from glob import glob
audio_files = glob('./*.wav') #match all .wav files in the current directory

generate predictions with the model

The model returns a dataframe with a MultiIndex of file, start_time, and end_time. There is one column for each class.

scores, _, _ = model.predict(audio_files)
classA classB
file start_time end_time
./1min_audio.wav 0.0 5.0 -0.885565 -0.716252
5.0 10.0 -0.927523 -0.182728
10.0 15.0 -0.825782 -0.504427
15.0 20.0 -1.131475 -0.679929
20.0 25.0 -0.782559 -0.597284

Overlapping prediction clips

scores, _, _ = model.predict(audio_files, overlap_fraction=0.5)
classA classB
file start_time end_time
./1min_audio.wav 0.0 5.0 -0.885565 -0.716252
2.5 7.5 -1.184270 -0.451556
5.0 10.0 -0.927523 -0.182728
7.5 12.5 -0.955600 -0.627236
10.0 15.0 -0.825782 -0.504427

Inspect samples generated during prediction

from opensoundscape.preprocess.utils import show_tensor_grid
from opensoundscape.torch.datasets import AudioSplittingDataset

#generate a dataset with the samples we wish to generate and the model's preprocessor
inspection_dataset = AudioSplittingDataset(audio_files, model.preprocessor)
inspection_dataset.bypass_augmentations = True

samples = [sample['X'] for sample in inspection_dataset]
_ = show_tensor_grid(samples,4)

Options for prediction

The code above returns the raw predictions of the model without any post-processing (such as a softmax layer or a sigmoid layer).

For details on how to use the predict() function for post-processing of predictions and to generate binary 0/1 predictions of class presence, see the “Basic training and prediction with CNNs” tutorial notebook. But, as a quick example here, let’s add a softmax layer to make the prediction scores for both classes sum to 1. We can also use the binary_preds argument to generate 0/1 predictions for each sample and class. For presence/absence models, use the option binary_preds='single_target'. For multi-class models, think about whether each clip should be labeled with only one class (single target) or whether each clip could contain multiple classes (binary_preds='multi_target')

scores, binary_predictions, _ = model.predict(

As before, the scores are continuous variables, but now have been softmaxed:

classA classB
file start_time end_time
./1min_audio.wav 0.0 5.0 0.457773 0.542227
5.0 10.0 0.321957 0.678043
10.0 15.0 0.420345 0.579655
15.0 20.0 0.388993 0.611007
20.0 25.0 0.453813 0.546187

We also have an additional output, the binary 0/1 (“absent” vs “present”) predictions generated by the model:

classA classB
file start_time end_time
./1min_audio.wav 0.0 5.0 0.0 1.0
5.0 10.0 0.0 1.0
10.0 15.0 0.0 1.0
15.0 20.0 0.0 1.0
20.0 25.0 0.0 1.0

It is sometimes helpful to look at a histogram of the scores:

_ = plt.hist(scores['classA'],bins=20)
_ = plt.xlabel('softmax score for classA')

Using models from older OpenSoundscape versions

Models from OpenSoundscape 0.4.x and 0.5.x

Models trained and saved with OpenSoundscape versions 0.4.x and 0.5.x need to be loaded in a different way, and require that you know the architecture of the saved model.

For example, one set of our publicly availably binary models for 500 species was created with an older version of OpenSoundscape. These models require a little bit of manipulation to load into OpenSoundscape 0.5.x and onward.

First, let’s download one of these models (it’s stored in a .tar format) and save it to the same directory as this notebook in a file called opso_04_model_acanthis-flammea.tar

                '-L', '-o', 'opso_04_model_acanthis-flammea.tar'])
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100     8    0     8    0     0      6      0 --:--:--  0:00:01 --:--:--     0
100 42.9M  100 42.9M    0     0  6247k      0  0:00:07  0:00:07 --:--:-- 10.4M
CompletedProcess(args=['curl', '', '-L', '-o', 'opso_04_model_acanthis-flammea.tar'], returncode=0)

From the model notes page, we know that this is a single-target model with a resnet18 architecture trained on 5 second files. Let’s load the model with load_outdated_model. We also need to make sure we use the same preprocessing settings as the original model. In this case, the original model used the same preprocessing settings as the default CNN.preprocessor.

from opensoundscape.torch.models.cnn import load_outdated_model
model = load_outdated_model('./opso_04_model_acanthis-flammea.tar','resnet18',5.0)
/Users/SML161/opt/miniconda3/envs/opso_dev/lib/python3.8/site-packages/pandas/core/ DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
  other = Series(other)
mismatched keys:
<All keys matched successfully>
/Users/SML161/opensoundscape/opensoundscape/torch/models/ UserWarning: After loading a model, you still need to ensure that your preprocessing (model.preprocessor) matches the settings used to createthe original model.

Again, you may need to modify model.preprocessor to match the settings used to train the model.

The model is now fully compatible with OpenSoundscape, and can be used as above. For example:

scores, _, _ = model.predict(audio_files)
acanthis-flammea-absent acanthis-flammea-present
file start_time end_time
./1min_audio.wav 0.0 5.0 2.994287 -3.261499
5.0 10.0 3.646061 -3.892005
10.0 15.0 3.004456 -2.484303
15.0 20.0 3.158881 -3.852344
20.0 25.0 2.755805 -2.693598

if we save the model using, we can re-load the full model object later using load_model() rather than repeating the procedure above.

Loading models from OpenSoundscape 0.6.x

If you saved a model with OpenSoundscape 0.6.x and want to use it in 0.7.0 or above, you will need to re-load the model using the original OpenSoundscape version that it was created with and save the model’s weights explicitly:

#OpenSoundscape version 0.6.x
model = load_model('/path/to/saved.model')

dict_to_save = {
    'classes': model.classes,
}, '/path/to/')

Then, you will be able to create a new model object in OpenSoundscape 0.7.0 and load the weights from the state dict as demonstrated above. Make sure to specify the correct architecture and sample duration when you create the CNN object.

#newer OpenSoundscape version
model_dict = torch.load('/path/to/')
classes = model_dict["classes"]

architecture = 'resnet18' #match this with the original model!

sample_duration = 5.0 #match this with the original model!

model = CNN('resnet18',classes,sample_duration)['network_state_dict'])

#save the model object so that we can simply reload it with load_model() in the future:'/path/to/saved_full_object.model')

# Next time, we can just load the full model object directly:
from opensoundscape.torch.models.cnn import load_model
model = load_model('/path/to/saved_full_object.model')

OpenSoundscape model objects include helper functions .save_weights() and .load_weights() which allow you to save and load platform/class independent dictionaries for increased flexibility. The weights saved and loaded by these functions are simply a dictionary of keys and numeric values, so they don’t depend on the existence of particular classes in the code base. We recommend saving both the full model object (.save()) and the raw weights (.save_weights()) for models you plan to use in the future.

Clean up: delete model objects

from pathlib import Path
for p in Path('.').glob('*.model'):
for p in Path('.').glob('*.tar'):
sys:1: ResourceWarning: unclosed socket <zmq.Socket(zmq.PUSH) at 0x7facfcfcc760>
ResourceWarning: Enable tracemalloc to get the object allocation traceback