Training a model

This quickstart will guide you through the process of creating a simple machine learning model that can identify the “peent” vocalization of an American Woodcock (Scolopax minor).

To use this notebook, follow the “developer” installation instructions in OpenSoundscape’s README.

# suppress warnings
import warnings
warnings.filterwarnings("ignore")

from opensoundscape.datasets import SingleTargetAudioDataset
from opensoundscape.torch.train import train
from opensoundscape.data_selection import binary_train_valid_split
from opensoundscape.helpers import run_command
import torch
import torch.nn
import torch.optim
import torchvision.models
import yaml
import os.path
import pandas as pd
from pathlib import Path
from math import floor

Download labeled audio files

The Kitzes Lab has created some labeled ARU data of American Woodcock vocalizations. Run the following cell to download this small dataset.

These commands require you to have curl and tar installed on your computer, as they will download and extract a compressed file in .tar.gz format. If you would prefer, you can also download a .zip version of the files by clicking here. You will have to unzip this folder and place it in the same folder as this notebook.

The folder’s name is woodcock_labeled_data.

commands = [
    "curl -L -o ./woodcock_labeled_data.tar.gz",
    "tar -xzf woodcock_labeled_data.tar.gz",  # Unzip the downloaded tar.gz file
    "rm woodcock_labeled_data.tar.gz",  # Remove the file after its contents are unzipped
]
for command in commands:
    run_command(command)
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100     7    0     7    0     0      6      0 --:--:--  0:00:01 --:--:--     6
100 4031k  100 4031k    0     0  1626k      0  0:00:02  0:00:02 --:--:-- 3296k

The folder contains 2-second-long clips. It also contains a file, woodcock_labels.csv, which lists the name of each file and its corresponding label information, created using a program called Specky.

Create a pandas DataFrame of all of the labeled files, then inspect the head() of this dataframe to see what its contents look like.

labels = pd.read_csv(Path("woodcock_labeled_data/woodcock_labels.csv"))
labels.head()
filename woodcock sound_type
0 d4c40b6066b489518f8da83af1ee4984.wav present song
1 e84a4b60a4f2d049d73162ee99a7ead8.wav absent na
2 79678c979ebb880d5ed6d56f26ba69ff.wav present song
3 49890077267b569e142440fa39b3041c.wav present song
4 0c453a87185d8c7ce05c5c5ac5d525dc.wav present song

So that the machine learning algorithm can find these files, prepend the folder’s name to each filename.

labels['filename'] = 'woodcock_labeled_data' + os.path.sep + labels['filename'].astype(str)
labels.head()
filename woodcock sound_type
0 woodcock_labeled_data/d4c40b6066b489518f8da83a... present song
1 woodcock_labeled_data/e84a4b60a4f2d049d73162ee... absent na
2 woodcock_labeled_data/79678c979ebb880d5ed6d56f... present song
3 woodcock_labeled_data/49890077267b569e142440fa... present song
4 woodcock_labeled_data/0c453a87185d8c7ce05c5c5a... present song
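The same prepend can also be written with os.path.join, which inserts the correct separator for you. A sketch on a made-up frame (the filenames below are placeholders, not the real dataset):

```python
import os
import pandas as pd

# Placeholder frame standing in for woodcock_labels.csv
toy_labels = pd.DataFrame({"filename": ["clip1.wav", "clip2.wav"],
                           "woodcock": ["present", "absent"]})

# Prepend the data folder so each path resolves from the notebook's directory
toy_labels["filename"] = toy_labels["filename"].map(
    lambda f: os.path.join("woodcock_labeled_data", f))
print(toy_labels["filename"].tolist())
```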

Create training and validation datasets

To use machine learning on these files, separate them into a “training” dataset, which will be used to teach the machine learning algorithm, and a “validation” dataset, which will be used to evaluate the algorithm’s performance each epoch.

The “present” labels in the woodcock column of the dataframe will be turned into 1s. All other labels will be turned into 0s. This is required by Pytorch, which doesn’t accept string labels.

train_df, valid_df = binary_train_valid_split(input_df = labels, label_column='woodcock', label="present")
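For intuition, the split and the label encoding could be sketched by hand with plain pandas. The toy data, split fraction, and random seed below are arbitrary, and this manual version skips details (such as per-class handling) that binary_train_valid_split takes care of:

```python
import pandas as pd

# Toy stand-in for the labels frame (filenames are made up)
toy_labels = pd.DataFrame({
    "filename": [f"{i}.wav" for i in range(10)],
    "woodcock": ["present"] * 7 + ["absent"] * 3,
})

# Encode "present" as 1 and everything else as 0: PyTorch needs numeric labels
toy_labels["NumericLabels"] = (toy_labels["woodcock"] == "present").astype(int)

# Random 80/20 split into training and validation frames
train_toy = toy_labels.sample(frac=0.8, random_state=0)
valid_toy = toy_labels.drop(train_toy.index)
print(len(train_toy), len(valid_toy))  # 8 2
```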

Create a list of labels so future users of the model will be able to interpret the 0/1 output.

label_dict = {0:'absent', 1:'scolopax-minor'}

Turn these dataframes into “Datasets” using the SingleTargetAudioDataset class. We have to specify the names of the columns in the dataframes to use this class. Once they are set up in this class, they can be used by the training algorithm. Data augmentation could be applied in this step, but is not demonstrated here.

train_dataset = SingleTargetAudioDataset(
    df=train_df, label_dict=label_dict, label_column='NumericLabels', filename_column='filename')
valid_dataset = SingleTargetAudioDataset(
    df=valid_df, label_dict=label_dict, label_column='NumericLabels', filename_column='filename')
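Under the hood, a PyTorch map-style dataset only has to provide __len__ and __getitem__. A toy sketch of that contract (this class and its data are hypothetical; the real SingleTargetAudioDataset additionally loads each audio file and converts it to a spectrogram tensor):

```python
import pandas as pd

class ToyAudioDataset:
    """Minimal map-style dataset over a labels DataFrame."""

    def __init__(self, df, label_column, filename_column):
        self.df = df.reset_index(drop=True)
        self.label_column = label_column
        self.filename_column = filename_column

    def __len__(self):
        # Number of samples available
        return len(self.df)

    def __getitem__(self, idx):
        # Return one (filename, numeric label) pair
        row = self.df.iloc[idx]
        return row[self.filename_column], int(row[self.label_column])

df = pd.DataFrame({"filename": ["a.wav", "b.wav"], "NumericLabels": [1, 0]})
ds = ToyAudioDataset(df, label_column="NumericLabels", filename_column="filename")
print(len(ds), ds[0])
```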

Train the machine learning model

Next, we will set up the architecture of our model and train it. The model architecture we will use is a combination of a feature extractor and a classifier.

The feature extractor is a resnet18 convolutional neural network. We call it with pretrained=True to use a version of the model that has already been trained on another image dataset, ImageNet. Although spectrograms aren’t the same type of image as the photographs in ImageNet, starting from the pretrained weights allows the model to adapt more quickly to identifying spectrograms.

The classifier is a Linear classifier. We have to set the input and output size for this classifier. It takes in the outputs of the feature extractor, so in_features = model.fc.in_features. The model identifies one species, so it has to be able to output a “present” or “absent” classification. Thus, out_features=2. A multi-species model would use out_features=number_of_species.
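To see how a 2-logit output maps back to a species label, here is the softmax-and-argmax arithmetic in plain Python (the logit values are made up for illustration):

```python
import math

# Toy logits from a 2-output classifier: index 0 = absent, index 1 = present
logits = [0.2, 1.5]

# Softmax turns the logits into probabilities that sum to 1
exps = [math.exp(v) for v in logits]
probs = [e / sum(exps) for e in exps]

# The index of the largest probability is the predicted class
label_dict = {0: 'absent', 1: 'scolopax-minor'}
prediction = label_dict[probs.index(max(probs))]
print(prediction)  # scolopax-minor
```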

# Set up architecture for the type of model we will use
model = torchvision.models.resnet18(pretrained = True)
model.fc = torch.nn.Linear(in_features = model.fc.in_features, out_features = 2)

Next, we set up a directory in which to save results, and then run the model. We set up the following parameters:

* save_dir: the directory in which to save results (which is created if it doesn’t exist)
* model: the model set up in the previous cell
* train_dataset: the training dataset created using SingleTargetAudioDataset
* valid_dataset: the validation dataset created using SingleTargetAudioDataset
* optimizer: the optimizer to use for training the algorithm
* loss_fn: the loss function used to assess the algorithm’s performance during training
* epochs: the number of times the model will run through the training data
* log_every: how frequently to save performance data and intermediate machine learning weights (log_every=1 will save every epoch)

This function allows you to control more parameters, but they are not demonstrated here.
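For intuition about the loss_fn choice: torch.nn.CrossEntropyLoss combines a softmax with a negative log-likelihood. The per-example arithmetic, sketched in plain Python with made-up logits:

```python
import math

def cross_entropy(logits, target):
    """Softmax over the logits, then negative log-probability of the target class."""
    exps = [math.exp(v) for v in logits]
    probs = [e / sum(exps) for e in exps]
    return -math.log(probs[target])

# A confident correct prediction yields a small loss...
low = cross_entropy([0.1, 2.0], 1)
# ...while the same logits scored against the wrong class yield a large one
high = cross_entropy([0.1, 2.0], 0)
print(low < high)  # True
```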

results_path = Path('model_train_results')
if not results_path.exists(): results_path.mkdir()
train(
    save_dir = results_path,
    model = model,
    train_dataset = train_dataset,
    valid_dataset = valid_dataset,
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3),
    loss_fn = torch.nn.CrossEntropyLoss(),
    epochs = 1,
    log_every = 1,
)
Epoch 0
  Validation results:
    train_loss: 0.6503365182063796
    train_accuracy: 0.7272727272727273
    train_precision: [0.         0.72727273]
    train_recall: [0.         0.72727273]
    train_f1: [0.         0.72727273]
    valid_accuracy: 0.7142857142857143
    valid_precision: [0.         0.71428571]
    valid_recall: [0.         0.71428571]
    valid_f1: [0.         0.71428571]
  Saved results to model_train_results/epoch-0.tar.
Training complete.

These commands “clean up” by deleting all the downloaded files and results.

import shutil
# Delete downloads
shutil.rmtree('woodcock_labeled_data')
# Delete results
shutil.rmtree('model_train_results')