Prediction with pre-trained CNNs¶
This notebook contains all the code you need to use a pre-trained OpenSoundscape convolutional neural network model (CNN) to make predictions on your own data. Before attempting this tutorial, install OpenSoundscape by following the instructions on the OpenSoundscape website, opensoundscape.org. More detailed tutorials about data preprocessing, training CNNs, and customizing prediction methods can also be found on this site.
Load required packages¶
We will load several imports from OpenSoundscape. First, load the
AudioToSpectrogramPreprocessor class from the
preprocess.preprocessors module. Preprocessor classes are used to load, transform, and augment audio samples for use in a machine learning model.
from opensoundscape.preprocess.preprocessors import AudioToSpectrogramPreprocessor
The cnn module provides classes for training and prediction with various CNN architectures. For this example, load the
Resnet18Binary class, used for models made with the Resnet18 architecture for predicting the presence or absence of a species (a “binary” classifier).
# The cnn module provides classes for training/predicting with various types of CNNs
from opensoundscape.torch.models.cnn import Resnet18Binary
Finally, load some additional packages and perform some setup for the Jupyter notebook.
# Other utilities and packages
import torch
from pathlib import Path
import numpy as np
import pandas as pd
from glob import glob
import subprocess

# Set up plotting
from matplotlib import pyplot as plt
plt.rcParams['figure.figsize'] = [15, 5]  # for large visuals
%config InlineBackend.figure_format = 'retina'
Prepare audio data for prediction¶
To run predictions on your audio data, you will need to have your audio already split up into the clip lengths that the model expects to receive. If your audio data are not already split, see the demonstration of the
Audio.split() method in the
You can check the clip length that the model expects in the model’s notes when you download it. This is often, but not always, 5.0 seconds.
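Before predicting, it can help to confirm that your clips really have the expected length. The following is a minimal sketch using only Python's standard-library `wave` module. It assumes the clips are uncompressed PCM `.wav` files; the function names, the 5.0 second default, and the tolerance are illustrative choices, not part of OpenSoundscape.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def find_wrong_length_clips(folder, expected_seconds=5.0, tolerance=0.05):
    """List (path, duration) pairs whose duration differs from the expected length.

    expected_seconds and tolerance are assumptions for this sketch;
    use the clip length given in your model's notes.
    """
    mismatched = []
    for path in sorted(Path(folder).glob("*.wav")):
        duration = wav_duration_seconds(path)
        if abs(duration - expected_seconds) > tolerance:
            mismatched.append((path, duration))
    return mismatched
```

An empty list returned by `find_wrong_length_clips("./my_clips")` (a hypothetical folder name) would suggest that every clip in the folder matches the expected length.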
Download audio files¶
The Kitzes Lab has created a small labeled dataset of short clips of American Woodcock vocalizations. You have two options for obtaining the folder of data, called
- Run the following cell to download this small dataset. These commands require you to have
tar installed on your computer, as they will download and unzip a compressed file in
- OR download a
.zip version of the files by clicking here. You will have to unzip this folder and place the unzipped folder in the same folder that this notebook is in.
Note: Once you have the data, you do not need to run this cell again.
# Download the data
subprocess.run(['curl', 'https://pitt.box.com/shared/static/79fi7d715dulcldsy6uogz02rsn5uesd.gz', '-L', '-o', 'woodcock_labeled_data.tar.gz'])
# Unzip the downloaded tar.gz file
subprocess.run(["tar", "-xzf", "woodcock_labeled_data.tar.gz"])
# Remove the file after its contents are unzipped
subprocess.run(["rm", "woodcock_labeled_data.tar.gz"])
CompletedProcess(args=['rm', 'woodcock_labeled_data.tar.gz'], returncode=0)
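If the `tar` command is not available on your system, the extraction step can also be done from Python. The sketch below uses the standard-library `tarfile` module; the helper name `extract_tar_gz` is hypothetical, and it assumes you have already downloaded the `.tar.gz` archive by some other means. Only extract archives from sources you trust.

```python
import tarfile
from pathlib import Path


def extract_tar_gz(archive_path, destination="."):
    """Extract a .tar.gz archive into `destination` and return its member names."""
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(path=destination)
    return names
```

For example, `extract_tar_gz("woodcock_labeled_data.tar.gz")` would unpack the archive into the current folder.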
Generate a Preprocessor object¶
In addition to having audio clips of the correct length, you will need to create a Preprocessor object that loads audio samples for the CNN.
First, generate a Pandas DataFrame with the index containing the paths to each file, as shown below.
# collect a list of audio files
file_list = glob('./woodcock_labeled_data/*.wav')
# create a DataFrame with the audio files as the index
audio_file_df = pd.DataFrame(index=file_list)
Next, use that DataFrame to create a Preprocessor object suitable for your application. Use the argument
return_labels=False, as our audio to predict on does not have labels.
If the model was trained with any special preprocessor settings, you should apply those settings here. For pretrained models created by the Kitzes Lab, see the model’s notes from its download page for the exact code to use here.
# create a Preprocessor object
# we use the option "return_labels=False" because our audio to predict on does not have labels
from opensoundscape.preprocess.preprocessors import AudioToSpectrogramPreprocessor
prediction_dataset = AudioToSpectrogramPreprocessor(audio_file_df, return_labels=False)
Models trained with OpenSoundscape v0.5.x¶
Check the model notes page for the appropriate model class to use and import the correct class from the
from opensoundscape.torch.models.cnn import Resnet18Binary
For the purpose of demonstration, let’s generate a new Resnet18 model for binary prediction and save it to our local folder. This is a dummy model that will not be trained using any data and will thus not make meaningful predictions.
If you download a pre-trained model, you can skip this cell.
model = Resnet18Binary(classes=['absent','present'])
model.save('./demo.model')
created PytorchModel model object with 2 classes
Saving to demo.model
from opensoundscape.torch.models.cnn import PytorchModel
Next, provide the model class’s
from_checkpoint() method with the path to your downloaded model.
# load the model into the appropriate model class
model = Resnet18Binary.from_checkpoint('./demo.model')
created PytorchModel model object with 2 classes
loading weights from saved object
Generate predictions as follows. The
predict method returns three values: scores, thresholded predictions, and labels. For unthresholded prediction on unlabeled data, only the first one is relevant, so we can discard the other two using
scores, _, _.
# call model.predict() with the Preprocessor to generate predictions
scores, _, _ = model.predict(prediction_dataset)
Look at the scores of the first 5 samples. These raw scores are unbounded: they can take any real value, negative or positive.
# look at the scores of the first 5 samples
scores.head()
Options for prediction¶
The code above returns the raw predictions of the model without any post-processing (such as a softmax layer or a sigmoid layer).
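To make the idea of post-processing concrete, here is a minimal, library-independent sketch of what a softmax layer and a sigmoid layer do to raw scores. It uses only the standard library and is an illustration of the math, not OpenSoundscape's own implementation; the example scores are made up.

```python
import math


def softmax(raw_scores):
    """Map raw scores to positive values that sum to 1 across classes."""
    largest = max(raw_scores)
    # Subtracting the largest score avoids overflow without changing the result
    exps = [math.exp(score - largest) for score in raw_scores]
    total = sum(exps)
    return [value / total for value in exps]


def sigmoid(raw_score):
    """Map a single raw score to the interval (0, 1), independently of other classes."""
    if raw_score >= 0:
        return 1.0 / (1.0 + math.exp(-raw_score))
    exp_score = math.exp(raw_score)
    return exp_score / (1.0 + exp_score)


# Hypothetical raw scores for the classes ['absent', 'present'] of one clip
raw = [1.2, -0.7]
softmaxed = softmax(raw)            # the two values sum to 1
sigmoided = [sigmoid(score) for score in raw]  # each value lies in (0, 1) on its own
```

Softmax suits the case where exactly one class applies to each clip, because the scores compete with each other. Sigmoid treats each class separately, so several classes can score highly at once.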
For details on how to use the
predict() function for post-processing of predictions and to generate binary 0/1 predictions of class presence, see the “Basic training and prediction with CNNs” tutorial notebook. But, as a quick example, let’s add a softmax layer to make the prediction scores for both classes sum to 1. We can also use the
binary_preds argument to generate 0/1 predictions for each sample and class. For presence/absence models, use the option
binary_preds='single_target'. For multi-class models, think about whether each clip should be labeled with only one class (single target) or whether each clip could contain multiple classes (
scores, binary_predictions, _ = model.predict(
    prediction_dataset,
    activation_layer='softmax',
    binary_preds='single_target'
)
As before, the
scores are continuous variables, but now have been softmaxed:
We also have an additional output, the binary 0/1 (“absent” vs “present”) predictions generated by the model:
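The difference between single-target and multi-target labeling can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the library's code: the function names and example scores are hypothetical, and OpenSoundscape's exact tie-breaking and threshold comparison may differ.

```python
def single_target_labels(scores):
    """Give a 1 to the highest-scoring class and a 0 to every other class."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return [1 if i == best else 0 for i in range(len(scores))]


def multi_target_labels(scores, threshold=0.5):
    """Give a 1 to every class whose score exceeds the threshold."""
    return [1 if score > threshold else 0 for score in scores]


# Hypothetical softmax scores for one clip and three classes
clip_scores = [0.30, 0.45, 0.25]
single = single_target_labels(clip_scores)
multi = multi_target_labels(clip_scores, threshold=0.2)
```

With these scores, the single-target rule marks only the second class as present, while the multi-target rule with a threshold of 0.2 marks all three classes as present.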
It is often helpful to look at a histogram of the scores for the positive class. Because this dummy model had random weights, we would expect this histogram to center somewhere around 0.5.
_ = plt.hist(scores['present'], bins=20)
_ = plt.xlabel('softmax score for positive class')
Prediction on long (un-split) audio files¶
It’s also possible to run predictions on long audio files. In this case, OpenSoundscape will internally split the audio into short segments during prediction. The input and output of prediction are slightly different in this case:
- Input is similar to before: a dataframe with the index containing the paths to audio files.
- Output is still a dataframe, but it will have three “index” columns. The first matches the index of the input, and contains the audio file paths. The second and third index columns contain the “begin” and “end” time of clips relative to the start of the audio file. The remaining columns, as usual, contain the names of each class and the scores or predictions for each class for that row’s audio clip.
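To illustrate how the “begin” and “end” times relate to clip length and overlap, here is a small sketch of the windowing arithmetic. It is not OpenSoundscape's implementation: the function name is hypothetical, and the choice to drop a trailing clip shorter than the full clip length is an assumption made for this example.

```python
def clip_windows(total_seconds, clip_length, clip_overlap=0.0):
    """Return (begin, end) times, in seconds, of consecutive full-length clips.

    Assumption for this sketch: a final clip that would run past the end
    of the file is dropped rather than padded.
    """
    step = clip_length - clip_overlap
    if clip_length <= 0 or step <= 0:
        raise ValueError("clip_length must be positive and larger than clip_overlap")
    windows = []
    index = 0
    while True:
        begin = index * step
        end = begin + clip_length
        if end > total_seconds + 1e-9:
            break
        windows.append((begin, end))
        index += 1
    return windows


# A 60 second file split into 5 second clips with no overlap
windows = clip_windows(60.0, clip_length=5.0, clip_overlap=0.0)
```

Under these assumptions, a 60 second file yields 12 clips, and each (begin, end) pair would correspond to one row of the output dataframe for that file.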
Let’s look at an example. We’ll use the 1 minute audio file contained within OpenSoundscape’s test folder as a “long” audio file. In practice, you can split files that are multiple hours long - the limiting factor is your computer’s memory (“RAM”), which must be able to hold the entire audio file.
import opensoundscape
from opensoundscape.preprocess.preprocessors import LongAudioPreprocessor

# get audio path from opensoundscape's tests folder
audio_1m_path = Path(opensoundscape.__file__).parent.parent.joinpath('tests/audio/1min.wav')
long_audio_prediction_df = pd.DataFrame(index=[audio_1m_path])
img_shape = [224, 224]

# the audio will be split during prediction. choose the clip length and overlap of sequential clips (0 for no overlap)
clip_length = 5.0
clip_overlap = 0.0
long_audio_prediction_ds = LongAudioPreprocessor(
    long_audio_prediction_df,
    audio_length=clip_length,
    clip_overlap=clip_overlap,
    out_shape=img_shape,
)
In addition to the scores (and, potentially, predictions), the function returns a list of “unsafe” samples that caused errors during preprocessing.
score_df, pred_df, unsafe_samples = model.split_and_predict(
    long_audio_prediction_ds,
    file_batch_size=1,
    num_workers=0,
    activation_layer=None,
    binary_preds='single_target',
    threshold=0.5,
    clip_batch_size=4,
    error_log=None,
)
score_df.head()
Models trained with OpenSoundscape 0.4.x¶
One set of our publicly available binary models for 500 species was created with an older version of OpenSoundscape. These models require a little bit of manipulation to load into OpenSoundscape 0.5.x and onward.
First, let’s download one of these models (it’s stored in a .tar format) and save it to the same directory as this notebook in a file called
subprocess.run(['curl', 'https://pitt.box.com/shared/static/lglpty35omjhmq6cdz8cfudm43nn2t9f.tar', '-L', '-o', 'opso_04_model_acanthis-flammea.tar'])
CompletedProcess(args=['curl', 'https://pitt.box.com/shared/static/lglpty35omjhmq6cdz8cfudm43nn2t9f.tar', '-L', '-o', 'opso_04_model_acanthis-flammea.tar'], returncode=0)
Next, load the weights from that model into an OpenSoundscape model object with the following code:
from opensoundscape.torch.models.cnn import PytorchModel
from opensoundscape.torch.architectures.cnn_architectures import resnet18
import torch

# load the tar file into a dictionary
# (you could change this to the location of any .tar file on your computer)
opso_04_model_tar_path = "./opso_04_model_acanthis-flammea.tar"
opso_04_model_dict = torch.load(opso_04_model_tar_path)

# create a resnet18 binary model
# (all models created with Opensoundscape 0.4.x are 2-class resnet18 architectures)
architecture = resnet18(num_classes=2, use_pretrained=False)
model = PytorchModel(classes=['negative','positive'], architecture=architecture)

# load the model weights into our model object
# now, our model is equivalent to the trained model we downloaded
model.network.load_state_dict(opso_04_model_dict['model_state_dict'])
created PytorchModel model object with 2 classes
<All keys matched successfully>
Now, we can use the model as normal to create predictions on audio. We’ll use the same
prediction_dataset from above (which does not contain any Common redpoll).
Remember to choose the
activation_layer you desire. In this example, we’ll assume we just want to generate scores, not binary predictions. We’ll apply a softmax layer, then the logit transform, to the scores using the
activation_layer="softmax_and_logit" option. This will generate the type of scores that are useful for plotting score histograms, among other things.
# generate predictions on our dataset
prediction_scores_df, _, _ = model.predict(prediction_dataset, activation_layer='softmax_and_logit')
prediction_scores_df.head()
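As a rough illustration of what applying a softmax and then a logit transform does to scores, the sketch below works through the arithmetic with the standard library. It is an assumption-labeled example of the math, not OpenSoundscape's code, and the raw scores are made up.

```python
import math


def softmax(raw_scores):
    """Map raw scores to positive values that sum to 1 across classes."""
    largest = max(raw_scores)
    exps = [math.exp(score - largest) for score in raw_scores]
    total = sum(exps)
    return [value / total for value in exps]


def logit(probability):
    """Inverse of the sigmoid: map a probability in (0, 1) to the real line.

    Raises an error for probabilities of exactly 0 or 1.
    """
    return math.log(probability) - math.log(1.0 - probability)


# Hypothetical raw scores for the classes ['negative', 'positive'] of one clip
raw = [1.0, 3.0]
transformed = [logit(p) for p in softmax(raw)]
```

For a two-class model, this transform turns each class's score into the difference between its raw score and the other class's raw score, so here the result is approximately `[-2.0, 2.0]`. The scores are again unbounded, which tends to spread out values that softmax alone would squeeze toward 0 or 1.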
Remove the downloaded files to clean up.
folder = Path('./woodcock_labeled_data')
for p in folder.glob("*"):
    p.unlink()
folder.rmdir()
for p in Path('.').glob('*.model'):
    p.unlink()
for p in Path('.').glob('*.tar'):
    p.unlink()