Training with OpenSoundscape & PyTorch Lightning
OpenSoundscape provides classes that support the use of PyTorch Lightning's Trainer class, which implements various training techniques, speedups, and utilities. To use Lightning, simply use the opensoundscape.ml.lightning.LightningSpectrogramModule class rather than the opensoundscape.ml.cnn.SpectrogramClassifier class (or the CNN class, which is now an alias for SpectrogramClassifier). For the most part, the API and functionality are similar to the pure-PyTorch classes, with a few major differences:
- To train, call the .fit_with_trainer() method (the train() method is reserved for other purposes when using Lightning). Pass any kwargs to lightning.Trainer() to customize the Lightning Trainer.
- To predict, call .predict_with_trainer(), passing any kwargs for the lightning.Trainer init with lightning_trainer_kwargs=dict(...).
- Note that with the Lightning Trainer you can use various logging platforms, while only Weights and Biases is currently supported in the pure-PyTorch classes.
Check out the lightning.Trainer docs for the full set of implemented features.
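For orientation, here is a minimal sketch of that API (a hedged example, not an executed cell of this tutorial); it assumes a LightningSpectrogramModule named model and clip-label DataFrames train_df and val_df, all of which are created later in this notebook.
# sketch only: `model`, `train_df`, and `val_df` are created later in this tutorial
# training: extra keyword arguments are passed through to lightning.Trainer()
model.fit_with_trainer(train_df, val_df, epochs=4, batch_size=32)
# prediction: lightning.Trainer kwargs go in lightning_trainer_kwargs
scores = model.predict_with_trainer(
    val_df, batch_size=32, lightning_trainer_kwargs=dict(accelerator="auto")
)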
[1]:
# if this is a Google Colab notebook, install opensoundscape in the runtime environment
if 'google.colab' in str(get_ipython()):
    %pip install git+https://github.com/kitzeslab/opensoundscape@develop ipykernel==5.5.6 ipython==7.34.0 pillow==9.4.0
    num_workers = 0
else:
    num_workers = 4
Setup
Import needed packages
[2]:
# the cnn module provides classes for training/predicting with various types of CNNs
from opensoundscape import CNN
#other utilities and packages
import torch
import pandas as pd
from pathlib import Path
import numpy as np
import random
import subprocess
from glob import glob
import sklearn
#set up plotting
from matplotlib import pyplot as plt
plt.rcParams['figure.figsize']=[15,5] #for large visuals
%config InlineBackend.figure_format = 'retina'
Set random seeds
Set manual seeds for Pytorch and Python. These essentially “fix” the results of any stochastic steps in model training, ensuring that training results are reproducible. You probably don’t want to do this when you actually train your model, but it’s useful for debugging.
[3]:
torch.manual_seed(0)
random.seed(0)
np.random.seed(0)
Download files
Training a machine learning model requires some pre-labeled data. These data, in the form of audio recordings or spectrograms, are labeled with whether or not they contain the sound of the species of interest.
These data can be obtained from online databases such as Xeno-Canto.org, or by labeling one’s own ARU data using a program like Cornell’s Raven sound analysis software. In this example we are using a set of annotated avian soundscape recordings that were annotated using the software Raven Pro 1.6.4 (Bioacoustics Research Program 2022):
An annotated set of audio recordings of Eastern North American birds containing frequency, time, and species information. Lauren M. Chronister, Tessa A. Rhinehart, Aidan Place, Justin Kitzes. https://doi.org/10.1002/ecy.3329
These are the same data that are used by the annotation and preprocessing tutorials, so you can skip this step if you’ve already downloaded them there.
Download example files
Download a set of example audio files and Raven annotations:
Option 1: run the cell below.
If you get a 403 error, DataDryad suspects you are a bot. Use Option 2.
Option 2:
- Download and unzip both annotation_Files.zip and mp3_Files.zip from https://datadryad.org/stash/dataset/doi:10.5061/dryad.d2547d81z
- Move the unzipped contents into a subfolder of the current folder called ./annotated_data/
[4]:
# # Note: the "!" preceding each line below allows us to run bash commands in a Jupyter notebook
# # If you are not running this code in a notebook, input these commands into your terminal instead
# !wget -O annotation_Files.zip https://datadryad.org/stash/downloads/file_stream/641805;
# !wget -O mp3_Files.zip https://datadryad.org/stash/downloads/file_stream/641807;
# !mkdir annotated_data;
# !unzip annotation_Files.zip -d ./annotated_data/annotation_Files;
# !unzip mp3_Files.zip -d ./annotated_data/mp3_Files;
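If you prefer to stay in Python rather than using shell commands, the following hedged sketch performs the same download and extraction with the standard library, using the same DataDryad URLs as the wget commands above (it may hit the same 403 bot detection, in which case use Option 2):
import urllib.request, zipfile
from pathlib import Path
# same file_stream URLs as the wget commands above
urls = {
    "annotation_Files.zip": "https://datadryad.org/stash/downloads/file_stream/641805",
    "mp3_Files.zip": "https://datadryad.org/stash/downloads/file_stream/641807",
}
for name, url in urls.items():
    urllib.request.urlretrieve(url, name)  # download the zip archive
    # extract into ./annotated_data/annotation_Files or ./annotated_data/mp3_Files
    with zipfile.ZipFile(name) as z:
        z.extractall(f"./annotated_data/{Path(name).stem}")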
Prepare training and validation data
To prepare audio data for machine learning, we need to convert our annotated data into clip-level labels.
These steps are covered in depth in other tutorials, so we’ll just set our clip labels up quickly for this example.
First, get exactly matched lists of audio files and their corresponding selection files:
[6]:
# Set the current directory to where the dataset is downloaded
dataset_path = Path("./annotated_data/")
# Make a list of all of the selection table files
selection_files = glob(f"{dataset_path}/annotation_Files/*/*.txt")
# Create a list of audio files, one corresponding to each Raven file
# (Audio files have the same names as selection files with a different extension)
audio_files = [f.replace('annotation_Files','mp3_Files').replace('.Table.1.selections.txt','.mp3') for f in selection_files]
# Next, convert the selection files and audio files to a `BoxedAnnotations` object, which contains
# the time, frequency, and label information for all annotations for every recording in the dataset.
from opensoundscape.annotations import BoxedAnnotations
# Create a dataframe of annotations
annotations = BoxedAnnotations.from_raven_files(
raven_files=selection_files,
audio_files=audio_files,
annotation_column="Species"
)
# Parameters to use for label creation
clip_duration = 3
clip_overlap = 0
min_label_overlap = 0.25
species_of_interest = ["NOCA", "EATO", "SCTA", "BAWW", "BCCH", "AMCR", "NOFL"]
# Create dataframe of one-hot labels
clip_labels = annotations.clip_labels(
clip_duration = clip_duration,
clip_overlap = clip_overlap,
min_label_overlap = min_label_overlap,
class_subset = species_of_interest # You can comment this line out if you want to include all species.
)
from sklearn.model_selection import train_test_split
train_df, val_df = train_test_split(clip_labels, test_size=0.2)
/Users/SML161/opensoundscape/opensoundscape/annotations.py:300: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
all_annotations_df = pd.concat(all_file_dfs).reset_index(drop=True)
Create Lightning-compatible model
Now, create a LightningSpectrogramModule object, which integrates OpenSoundscape with PyTorch Lightning's powerful Trainer class.
[7]:
# Create a CNN object designed to recognize 3-second samples
from opensoundscape.ml.lightning import LightningSpectrogramModule
# initializing it looks the same as for the CNN class.
# Let's use resnet34 architecture and 3s clip duration
model = LightningSpectrogramModule(
architecture = 'resnet34',
classes = clip_labels.columns.tolist(),
sample_duration = 3
)
Train with Lightning
Lightning will take a bit of time to get things set up. After that, it can be substantially faster than training in pure PyTorch.
[8]:
# again, the API is very similar to CNN
# but now, we can pass any kwargs to lightning.Trainer() as well. For example,
# let's use the `accumulate_grad_batches` argument to accumulate gradients over 2 batches
# before running the optimizer, effectively doubling the batch size.
model.fit_with_trainer(train_df, val_df, epochs=4, batch_size=32, num_workers=num_workers, accumulate_grad_batches=2)
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/Users/SML161/miniconda3/envs/opso_dev/lib/python3.9/site-packages/lightning/pytorch/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `lightning.pytorch` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
/Users/SML161/miniconda3/envs/opso_dev/lib/python3.9/site-packages/lightning/pytorch/callbacks/model_checkpoint.py:654: Checkpoint directory /Users/SML161/opensoundscape/docs/tutorials exists and is not empty.
/Users/SML161/miniconda3/envs/opso_dev/lib/python3.9/site-packages/lightning/pytorch/core/optimizer.py:377: Found unsupported keys in the optimizer configuration: {'scheduler'}
| Name | Type | Params | Mode
----------------------------------------------------------
0 | network | ResNet | 21.3 M | train
1 | loss_fn | BCEWithLogitsLoss_hot | 0 | train
----------------------------------------------------------
21.3 M Trainable params
0 Non-trainable params
21.3 M Total params
85.128 Total estimated model params size (MB)
117 Modules in train mode
0 Modules in eval mode
/Users/SML161/miniconda3/envs/opso_dev/lib/python3.9/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:419: Consider setting `persistent_workers=True` in 'val_dataloader' to speed up the dataloader worker initialization.
/Users/SML161/miniconda3/envs/opso_dev/lib/python3.9/site-packages/torchmetrics/functional/classification/precision_recall_curve.py:798: UserWarning: MPS: nonzero op is supported natively starting from macOS 13.0. Falling back on CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/Indexing.mm:334.)
unique_mapping = unique_mapping[unique_mapping >= 0]
/Users/SML161/miniconda3/envs/opso_dev/lib/python3.9/site-packages/torchmetrics/functional/classification/average_precision.py:308: UserWarning: MPS: no support for int64 for sum_out_mps, downcasting to a smaller data type (int32/float32). Native support for int64 has been added in macOS 13.3. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/ReduceOps.mm:157.)
weights=(state[1] == 1).sum(dim=0).float() if thresholds is None else state[0][:, 1, :].sum(-1),
/Users/SML161/miniconda3/envs/opso_dev/lib/python3.9/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:419: Consider setting `persistent_workers=True` in 'train_dataloader' to speed up the dataloader worker initialization.
`Trainer.fit` stopped: `max_epochs=4` reached.
Training complete
Best model with score 0.181 is saved to /Users/SML161/opensoundscape/docs/tutorials/epoch=3-step=388.ckpt
0 of 6160 total training samples failed to preprocess
[8]:
<lightning.pytorch.trainer.trainer.Trainer at 0x2bf6e5490>
Run inference
[9]:
model.predict_with_trainer(val_df, batch_size=32, num_workers=num_workers)
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/Users/SML161/miniconda3/envs/opso_dev/lib/python3.9/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:419: Consider setting `persistent_workers=True` in 'predict_dataloader' to speed up the dataloader worker initialization.
[9]:
| file | start_time | end_time | NOCA | EATO | SCTA | BAWW | BCCH | AMCR | NOFL |
|---|---|---|---|---|---|---|---|---|---|
| annotated_data/mp3_Files/Recording_1/Recording_1_Segment_26.mp3 | 123.0 | 126.0 | -1.150377 | 0.549135 | -2.138953 | -2.694704 | -0.467310 | 4.545997 | -4.263886 |
| annotated_data/mp3_Files/Recording_2/Recording_2_Segment_11.mp3 | 132.0 | 135.0 | -1.627193 | -2.231425 | -0.292491 | -3.995329 | -2.961541 | 4.214307 | -4.755819 |
| annotated_data/mp3_Files/Recording_4/Recording_4_Segment_23.mp3 | 138.0 | 141.0 | 3.111156 | 1.366953 | -3.114839 | -4.208912 | -2.098067 | -3.920546 | -4.175481 |
| annotated_data/mp3_Files/Recording_4/Recording_4_Segment_06.mp3 | 18.0 | 21.0 | 3.195024 | 6.508092 | -6.060542 | -5.502892 | -3.900456 | -3.240862 | -5.377626 |
| annotated_data/mp3_Files/Recording_1/Recording_1_Segment_02.mp3 | 36.0 | 39.0 | -5.511728 | -3.187492 | -6.659735 | -6.191207 | -6.031331 | -5.297669 | -6.563343 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| annotated_data/mp3_Files/Recording_4/Recording_4_Segment_05.mp3 | 267.0 | 270.0 | 5.829832 | 2.170767 | -3.095157 | -4.235296 | -3.209671 | -1.694019 | -3.902274 |
| annotated_data/mp3_Files/Recording_4/Recording_4_Segment_21.mp3 | 141.0 | 144.0 | 6.042611 | 3.894333 | -3.956448 | -4.595656 | -2.281978 | -2.939838 | -4.280013 |
| annotated_data/mp3_Files/Recording_1/Recording_1_Segment_23.mp3 | 183.0 | 186.0 | -3.594249 | -1.777073 | -5.218874 | -4.250111 | -2.707364 | -4.353728 | -5.437796 |
| annotated_data/mp3_Files/Recording_1/Recording_1_Segment_25.mp3 | 144.0 | 147.0 | -3.061894 | -0.626145 | -4.435275 | -2.705727 | 0.509345 | 7.063011 | -5.337906 |
| annotated_data/mp3_Files/Recording_2/Recording_2_Segment_14.mp3 | 159.0 | 162.0 | -2.488947 | -0.996913 | -0.150189 | -3.829111 | -3.008179 | 4.803464 | -3.589157 |
1540 rows × 7 columns
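The values in this table are raw network outputs (logits; note they are unbounded rather than between 0 and 1). If you want probability-like scores or binary detections, one minimal sketch, assuming you store the returned DataFrame in a variable called scores, is to apply a sigmoid and a threshold:
import numpy as np
# assumption: `scores` holds the DataFrame returned by predict_with_trainer above
scores = model.predict_with_trainer(val_df, batch_size=32, num_workers=num_workers)
probabilities = 1 / (1 + np.exp(-scores))  # element-wise sigmoid over the DataFrame
detections = probabilities >= 0.5          # boolean presence/absence at an illustrative 0.5 threshold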
Next steps:
- Experiment with the various optimizations and features of lightning.Trainer, such as integration with several different logging platforms, multi-device distributed training, and more; a hedged example is sketched below.
- Check out the Lightning Trainer docs to learn more.
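For instance, here is a sketch (not an executed cell of this tutorial) of passing extra lightning.Trainer options through fit_with_trainer; CSVLogger and the logger, accelerator, and devices arguments are standard Lightning names, and which options help will depend on your hardware and logging setup.
from lightning.pytorch.loggers import CSVLogger  # any Lightning-supported logger could be used here
model.fit_with_trainer(
    train_df, val_df, epochs=4, batch_size=32, num_workers=num_workers,
    logger=CSVLogger("logs/"),      # lightning.Trainer logging option
    accelerator="auto", devices=1,  # lightning.Trainer device options
)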
Clean up
[10]:
import shutil
# uncomment to remove the training files
# shutil.rmtree('./annotated_data')
shutil.rmtree('./wandb', ignore_errors=True)
shutil.rmtree('./model_training_checkpoints', ignore_errors=True)
for f in glob('./*.ckpt'):
    Path(f).unlink()
try:
    Path('annotation_Files.zip').unlink()
except FileNotFoundError:
    pass
try:
    Path('mp3_Files.zip').unlink()
except FileNotFoundError:
    pass