Prediction with pre-trained CNNs

This notebook contains all the code you need to use a pre-trained OpenSoundscape convolutional neural network model (CNN) to make predictions on your own data. Before attempting this tutorial, install OpenSoundscape by following the instructions on the OpenSoundscape website, opensoundscape.org. More detailed tutorials about data preprocessing, training CNNs, and customizing prediction methods can also be found on this site.

Load required packages

The cnn module provides a load_model function to load saved OpenSoundscape models.

[1]:
from opensoundscape.torch.models.cnn import load_model
import opensoundscape

Load some additional packages and perform some setup for the Jupyter notebook.

[2]:
# Other utilities and packages
import torch
from pathlib import Path
import numpy as np
import pandas as pd
from glob import glob
import subprocess
[3]:
#set up plotting
from matplotlib import pyplot as plt
plt.rcParams['figure.figsize']=[15,5] #for large visuals
%config InlineBackend.figure_format = 'retina'

For this example, let’s create an untrained model and save it. This 2-class model is not actually good at recognizing any particular species, but it’s useful for illustrating how prediction works.

[4]:
from opensoundscape.torch.models.cnn import CNN
#create a 2-class model with the resnet18 architecture and a 5.0-second sample duration
CNN('resnet18',['classA','classB'],5.0).save('./temp.model')
/Users/SML161/miniconda3/envs/opso_dev/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/Users/SML161/miniconda3/envs/opso_dev/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet18_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet18_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)

Load a saved model

Load the model object using the load_model function imported above.

(If the model was created with an older version of OpenSoundscape, see the instructions below.)

[5]:
model = load_model('./temp.model')

Choose audio files for prediction

Create a list of audio files to predict on. They can be of any length. Consider using glob to find many files at once.

For this example, let’s download a 1-minute audio clip from the Kitzes Lab Box to use as an example.

[6]:
subprocess.run(['curl',
                'https://pitt.box.com/shared/static/z73eked7quh1t2pp93axzrrpq6wwydx0.wav',
                '-L', '-o', '1min_audio.wav'])
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100     7    0     7    0     0      4      0 --:--:--  0:00:01 --:--:--  7000
100 3750k  100 3750k    0     0  1289k      0  0:00:02  0:00:02 --:--:-- 5677k
[6]:
CompletedProcess(args=['curl', 'https://pitt.box.com/shared/static/z73eked7quh1t2pp93axzrrpq6wwydx0.wav', '-L', '-o', '1min_audio.wav'], returncode=0)

Use glob to create a list of all files matching a pattern in a folder:

[7]:
from glob import glob
audio_files = glob('./*.wav') #match all .wav files in the current directory
audio_files
[7]:
['./1min_audio.wav']

Generate predictions with the model

The model returns a dataframe with a MultiIndex of file, start_time, and end_time. There is one column for each class.

[8]:
scores = model.predict(audio_files)
scores.head()
[8]:
classA classB
file start_time end_time
./1min_audio.wav 0.0 5.0 -0.290774 -0.155345
5.0 10.0 -0.154260 -0.143534
10.0 15.0 -0.043310 -0.486556
15.0 20.0 -0.162963 -0.302960
20.0 25.0 -0.265351 -0.279445
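
The MultiIndex is convenient for slicing by file and time, but a flat table is often easier to save or join with other data. A minimal sketch using standard pandas (the output filename is arbitrary):

#convert the (file, start_time, end_time) MultiIndex into ordinary columns
flat_scores = scores.reset_index()

#for example, save the flattened scores to a csv file
flat_scores.to_csv('prediction_scores.csv', index=False)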

Overlapping prediction clips

By default, consecutive clips do not overlap. To predict on overlapping clips, pass an overlap_fraction: for instance, 0.5 makes each 5-second clip overlap the previous one by 2.5 seconds.

[9]:
scores = model.predict(audio_files, overlap_fraction=0.5)
scores.head()
[9]:
classA classB
file start_time end_time
./1min_audio.wav 0.0 5.0 -0.290774 -0.155345
2.5 7.5 -0.075400 -0.448670
5.0 10.0 -0.154260 -0.143534
7.5 12.5 -0.216965 -0.149215
10.0 15.0 -0.043310 -0.486556

Inspect samples generated during prediction

[10]:
from opensoundscape.preprocess.utils import show_tensor_grid
from opensoundscape.torch.datasets import AudioSplittingDataset

#generate a dataset from the audio samples we wish to inspect, using the model's preprocessor
inspection_dataset = AudioSplittingDataset(audio_files, model.preprocessor)
inspection_dataset.bypass_augmentations = True #view samples without training augmentations

samples = [sample['X'] for sample in inspection_dataset]
_ = show_tensor_grid(samples,4)
/Users/SML161/miniconda3/envs/opso_dev/lib/python3.9/site-packages/matplotlib_inline/config.py:68: DeprecationWarning: InlineBackend._figure_format_changed is deprecated in traitlets 4.1: use @observe and @unobserve instead.
  def _figure_format_changed(self, name, old, new):
[Figure: grid of preprocessed sample spectrograms generated during prediction]

Options for prediction

The code above returns the raw predictions of the model without any post-processing (such as a softmax layer or a sigmoid layer).

For details on how to post-process prediction scores and generate binary 0/1 predictions of class presence, see the “Basic training and prediction with CNNs” tutorial notebook. As a quick example here, let’s add a softmax layer so that the prediction scores for the two classes sum to 1.

We can also convert our continuous scores into True/False (or 1/0) predictions for the presence of each class in each sample. Consider whether each clip should be labeled with exactly one class (use metrics.predict_single_target_labels) or whether each clip could contain zero, one, or multiple classes (use metrics.predict_multi_target_labels; an example appears below).

[11]:
scores = model.predict(
    audio_files,
    activation_layer='softmax',
)

As before, the scores are continuous variables, but now each row has been passed through a softmax:

[12]:
scores.head()
[12]:
classA classB
file start_time end_time
./1min_audio.wav 0.0 5.0 0.466194 0.533806
5.0 10.0 0.497319 0.502681
10.0 15.0 0.609032 0.390968
15.0 20.0 0.534942 0.465058
20.0 25.0 0.503524 0.496476
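
As a quick sanity check, we can confirm that the softmaxed scores in each row sum to 1:

#each row (one clip) should sum to 1.0 across classes after the softmax
print(scores.sum(axis=1))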

Now let’s use the predict_single_target_labels(scores) function to label the highest-scoring class 1 for each sample and all other classes 0.

[13]:
from opensoundscape.metrics import predict_single_target_labels
predicted_labels = predict_single_target_labels(scores)
predicted_labels.head()
[13]:
classA classB
file start_time end_time
./1min_audio.wav 0.0 5.0 0 1
5.0 10.0 0 1
10.0 15.0 1 0
15.0 20.0 1 0
20.0 25.0 1 0
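
For a multi-target task, each class is instead thresholded independently, so a clip can receive any number of 1s. A hedged sketch, assuming predict_multi_target_labels takes a score threshold as in recent OpenSoundscape versions (the 0.5 threshold here is arbitrary; choose one appropriate for your scores and application):

from opensoundscape.metrics import predict_multi_target_labels

#label each class 1 wherever its score exceeds the threshold, independently of other classes
multi_target_labels = predict_multi_target_labels(scores, threshold=0.5)
multi_target_labels.head()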

It is sometimes helpful to look at a histogram of the scores:

[14]:
_ = plt.hist(scores['classA'],bins=20)
_ = plt.xlabel('softmax score for classA')
[Figure: histogram of softmax scores for classA]

Using models from older OpenSoundscape versions

Models from OpenSoundscape 0.4.x and 0.5.x

Models trained and saved with OpenSoundscape versions 0.4.x and 0.5.x need to be loaded in a different way, and require that you know the architecture of the saved model.

For example, one set of our publicly available binary models for 500 species was created with an older version of OpenSoundscape. These models require a little bit of manipulation to load into newer versions of OpenSoundscape.

First, let’s download one of these models (it’s stored in .tar format) and save it to the same directory as this notebook, in a file called opso_04_model_acanthis-flammea.tar.

[15]:
subprocess.run(['curl',
                'https://pitt.box.com/shared/static/lglpty35omjhmq6cdz8cfudm43nn2t9f.tar',
                '-L', '-o', 'opso_04_model_acanthis-flammea.tar'])
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100     8    0     8    0     0      4      0 --:--:--  0:00:01 --:--:--   571
100 42.9M  100 42.9M    0     0  6128k      0  0:00:07  0:00:07 --:--:-- 10.6M
[15]:
CompletedProcess(args=['curl', 'https://pitt.box.com/shared/static/lglpty35omjhmq6cdz8cfudm43nn2t9f.tar', '-L', '-o', 'opso_04_model_acanthis-flammea.tar'], returncode=0)

From the model notes page, we know that this is a single-target model with a resnet18 architecture trained on 5-second audio clips. Let’s load the model with load_outdated_model. We also need to make sure we use the same preprocessing settings as the original model; in this case, the original model used the same preprocessing settings as the default CNN.preprocessor.

[16]:
from opensoundscape.torch.models.cnn import load_outdated_model
[17]:
model = load_outdated_model('./opso_04_model_acanthis-flammea.tar','resnet18',5.0)

#invert values to match the convention of OpenSoundscape 0.7.x (lowest values = quiet, highest = loud)
model.preprocessor.pipeline.to_img.set(invert=True)
/Users/SML161/miniconda3/envs/opso_dev/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/Users/SML161/miniconda3/envs/opso_dev/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet18_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet18_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
mismatched keys:
<All keys matched successfully>
/Users/SML161/opensoundscape/opensoundscape/torch/models/cnn.py:1363: UserWarning: After loading a model, you still need to ensure that your preprocessing (model.preprocessor) matches the settings used to createthe original model.
  warnings.warn(

Again, you may need to modify model.preprocessor to match the settings used to train the model.
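
For instance, if the original model was trained on spectrograms restricted to a particular frequency range, you would update the corresponding preprocessor action. A hypothetical sketch (the bandpass action and the min_f/max_f values here are illustrative; check the original model's notes for the actual settings):

#hypothetical example: limit spectrograms to 0-10 kHz to match the original training settings
model.preprocessor.pipeline.bandpass.set(min_f=0, max_f=10000)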

The model is now fully compatible with OpenSoundscape, and can be used as above. For example:

[18]:
scores = model.predict(audio_files)
scores.head()
[18]:
acanthis-flammea-absent acanthis-flammea-present
file start_time end_time
./1min_audio.wav 0.0 5.0 6.254239 -5.859637
5.0 10.0 4.935342 -4.917323
10.0 15.0 6.227312 -5.752949
15.0 20.0 5.256021 -5.732774
20.0 25.0 4.836051 -5.272484

If we save the model using model.save(path), we can re-load the full model object later using load_model() rather than repeating the procedure above.
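
A minimal sketch of that round trip (the path is a placeholder):

#save the fully converted model object...
model.save('./acanthis-flammea-converted.model')

#...and later reload it directly, with no need for load_outdated_model
model = load_model('./acanthis-flammea-converted.model')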

Loading models from OpenSoundscape 0.6.0

If you saved a model with OpenSoundscape 0.6.0 and want to use it in 0.7.0 or above, you will need to re-load the model using the original OpenSoundscape version that it was created with and save the model’s weights explicitly. Here’s an example of code you could run in an environment with opensoundscape version 0.6.0 to export a model for use in later OpenSoundscape versions:

# This code will only work in an environment with OpenSoundscape version 0.6.0
# Use it if you need to save a model created in OpenSoundscape v0.6.0 for use in later opso versions

import torch
from opensoundscape.torch.models.cnn import Resnet18Binary #choose the class used to create the model

model = Resnet18Binary(classes=['negative','positive']) #provide the list of classes for the model
model.load('/path/to/saved.model')

dict_to_save = {
    'network_state_dict':model.network.state_dict(),
    'classes': model.classes,
}
torch.save(dict_to_save, '/path/to/model_dict.pt')

Then, you will be able to create a new model object in OpenSoundscape >=0.7.0 and load the weights from the state dict as demonstrated below. Make sure to specify the correct architecture and sample duration when you create the CNN object.

#run this code in an environment with a newer OpenSoundscape version >=0.7.0

import torch
from opensoundscape.torch.models.cnn import CNN

model_dict = torch.load('/path/to/model_dict.pt')
classes = model_dict["classes"]

#remove the 'feature' prefix on weights and replace the 'classifier' prefix with 'fc'
model_dict['network_state_dict'] = {
    k.replace('feature.','').replace('classifier.','fc.'):v
    for k, v in model_dict['network_state_dict'].items()
}

architecture = 'resnet18' #match this with the original model!

sample_duration = 5.0 #match this with the original model!

model = CNN(architecture, classes, sample_duration)
model.network.load_state_dict(model_dict['network_state_dict'])

#invert values to match the convention of OpenSoundscape 0.7.x
model.preprocessor.pipeline.to_img.set(invert=True)

#save the model object so that we can simply reload it with load_model() in the future:
model.save('/path/to/saved_full_object.model')

# Next time, we can just load the full model object directly:
from opensoundscape.torch.models.cnn import load_model
model = load_model('/path/to/saved_full_object.model')

Loading models from OpenSoundscape 0.6.1 and 0.6.2

If you saved a model with OpenSoundscape 0.6.1 or 0.6.2 and want to use it in 0.7.0 or above, you will need to re-load the model using the original OpenSoundscape version that it was created with and save the model’s weights explicitly. Here’s an example of code you could run in an environment with opensoundscape version 0.6.1 or 0.6.2 to export a model for use in later OpenSoundscape versions:

# This code will only work in an environment with OpenSoundscape version 0.6.1 or 0.6.2
# Use it if you need to save a model created in OpenSoundscape v0.6.1 or 0.6.2 for use in later opso versions

import torch
from opensoundscape.torch.models.cnn import load_model
model = load_model('/path/to/saved.model')

dict_to_save = {
    'network_state_dict':model.network.state_dict(),
    'classes': model.classes,
}
torch.save(dict_to_save, '/path/to/model_dict.pt')

Then, you will be able to create a new model object in OpenSoundscape >=0.7.0 and load the weights from the state dict as demonstrated below. Make sure to specify the correct architecture and sample duration when you create the CNN object.

#run this code in an environment with a newer OpenSoundscape version >=0.7.0

import torch
from opensoundscape.torch.models.cnn import CNN

model_dict = torch.load('/path/to/model_dict.pt')
classes = model_dict["classes"]

architecture = 'resnet18' #match this with the original model!

sample_duration = 5.0 #match this with the original model!

model = CNN(architecture, classes, sample_duration)
model.network.load_state_dict(model_dict['network_state_dict'])

#invert values to match the convention of OpenSoundscape 0.7.x
model.preprocessor.pipeline.to_img.set(invert=True)

#save the model object so that we can simply reload it with load_model() in the future:
model.save('/path/to/saved_full_object.model')

# Next time, we can just load the full model object directly:
from opensoundscape.torch.models.cnn import load_model
model = load_model('/path/to/saved_full_object.model')

OpenSoundscape model objects include helper functions .save_weights() and .load_weights() which allow you to save and load platform/class independent dictionaries for increased flexibility. The weights saved and loaded by these functions are simply a dictionary of keys and numeric values, so they don’t depend on the existence of particular classes in the code base. We recommend saving both the full model object (.save()) and the raw weights (.save_weights()) for models you plan to use in the future.
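
A minimal sketch of that workflow, assuming save_weights() and load_weights() take a file path as in the 0.7.x API (the paths, classes, and settings here are placeholders):

#save both representations of a trained model
model.save('./my_model.model')              #full model object: easiest to reload
model.save_weights('./my_model_weights.pt') #raw weights dict: robust to code changes

#later, reload the raw weights into a freshly constructed CNN
#(architecture, classes, and sample duration must match the original model)
from opensoundscape.torch.models.cnn import CNN
model = CNN('resnet18', ['classA', 'classB'], 5.0)
model.load_weights('./my_model_weights.pt')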

Clean up: delete downloaded audio and saved model files

[19]:
from pathlib import Path
for p in Path('.').glob('*.model'):
    p.unlink()
for p in Path('.').glob('*.tar'):
    p.unlink()
Path('1min_audio.wav').unlink()