Train classifiers on BirdNET or Perch embeddings

This notebook shows examples of how to train simple one-layer or multi-layer fully-connected neural networks (aka multi-layer perceptron networks, MLPs) on embeddings from Perch [1] and BirdNET [2], which are TensorFlow models. For a more general introduction to transfer learning tools in OpenSoundscape, see the transfer_learning.ipynb notebook, which focuses on PyTorch (rather than TensorFlow) embedding models.

Though BirdNET and Perch are TensorFlow models, we can still use them as feature extractors (to generate embeddings) and train shallow classifiers on top of them with PyTorch. We just won’t be able to train the feature extractor weights at all.

For this notebook, you’ll need a Python environment with the tensorflow and tensorflow-hub packages installed. If you want CUDA GPU acceleration on a Linux machine, check this table for the tensorflow and cudnn package versions that are compatible with your current CUDA version (you can check the CUDA version on your machine by calling nvidia-smi from the command line). Note that the cudnn package version might conflict with the version PyTorch wants, so we recommend creating separate Python environments for PyTorch and TensorFlow CUDA compatibility.

Note that in this tutorial, all classifiers are trained as multi-target (each class is predicted independently, such that any sample can have 0, 1, or >1 classes present). Most bioacoustics classification tasks are multi-target.
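For reference, here is a minimal sketch (with made-up file names and values) of what a multi-target clip label table looks like: a DataFrame with one binary (0/1) column per class, indexed by (file, start_time, end_time), where each row (clip) can have any number of classes present.

import pandas as pd

# hypothetical multi-target clip labels: each clip can have 0, 1, or several classes present
example_labels = pd.DataFrame(
    {"A": [1, 0, 1], "B": [0, 0, 1]},
    index=pd.MultiIndex.from_tuples(
        [("rec1.wav", 0.0, 3.0), ("rec1.wav", 3.0, 6.0), ("rec2.wav", 0.0, 3.0)],
        names=["file", "start_time", "end_time"],
    ),
)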

Note on the error “module not found: bioacoustics_model_zoo” when using multiprocessing (num_workers>0): if you get an error to this effect, please install bioacoustics_model_zoo as a package in your Python environment: pip install git+https://github.com/kitzeslab/bioacoustics-model-zoo@0.11.0.dev1. The torch.hub API seems to have trouble with multiprocessing for some model classes.

[1] Ghani, B., T. Denton, S. Kahl, and H. Klinck. 2023. Global birdsong embeddings enable superior transfer learning for bioacoustic classification. Scientific Reports 13:22876.

[2] Kahl, S., C. M. Wood, M. Eibl, and H. Klinck. 2021. BirdNET: A deep learning solution for avian diversity monitoring. Ecological Informatics 61:101236.

Run this tutorial

This tutorial is more than a reference! It’s a Jupyter Notebook which you can run and modify on Google Colab or your own computer.

  • Open In Colab: the link opens the tutorial in Google Colab. Uncomment the “installation” line in the first cell to install OpenSoundscape.

  • Download via DownGit: the link downloads the tutorial file to your computer. Follow the Jupyter installation instructions, then open the tutorial file in Jupyter.

[1]:
# if this is a Google Colab notebook, install opensoundscape in the runtime environment
if 'google.colab' in str(get_ipython()):
  %pip install git+https://github.com/kitzeslab/opensoundscape@develop ipykernel==5.5.6 ipython==7.34.0 pillow==9.4.0
  num_workers=0
else:
  num_workers=4

Setup

Import needed packages

[2]:
import torch
import pandas as pd
import numpy as np
import random
from glob import glob
from pathlib import Path
import sklearn

from tqdm.autonotebook import tqdm
from sklearn.metrics import average_precision_score, roc_auc_score

#set up plotting
from matplotlib import pyplot as plt
plt.rcParams['figure.figsize']=[15,5] #for large visuals
%config InlineBackend.figure_format = 'retina'

# opensoundscape transfer learning tools
from opensoundscape.ml.shallow_classifier import MLPClassifier, quick_fit, fit_classifier_on_embeddings

/var/folders/d8/265wdp1n0bn_r85dh3pp95fh0000gq/T/ipykernel_14108/4234765199.py:10: TqdmExperimentalWarning: Using `tqdm.autonotebook.tqdm` in notebook mode. Use `tqdm.tqdm` instead to force console mode (e.g. in jupyter console)
  from tqdm.autonotebook import tqdm

Set random seeds

Set manual seeds for PyTorch, Python, and NumPy. These essentially “fix” the results of any stochastic steps in model training, ensuring that training results are reproducible. You probably don’t want to do this when you actually train your model, but it’s useful for debugging.

[3]:
torch.manual_seed(0)
random.seed(0)
np.random.seed(0)

Download and prepare training data

Download example files

Download a set of aquatic soundscape recordings with annotations of Rana sierrae vocalizations. If you already have them, you can skip this step.

Option 1: run the cell below

  • if you get a 403 error, DataDryad suspects you are a bot. Use Option 2.

Option 2:

  • Download and unzip the rana_sierrae_2022.zip folder containing audio and annotations from this public Dryad dataset

  • Move the unzipped rana_sierrae_2022 folder into the current folder

[4]:
# Note: the "!" preceding each line below allows us to run bash commands in a Jupyter notebook
# If you are not running this code in a notebook, input these commands into your terminal instead
!wget -O rana_sierrae_2022.zip https://datadryad.org/stash/downloads/file_stream/2722802;
!unzip rana_sierrae_2022;
--2024-10-08 13:34:24--  https://datadryad.org/stash/downloads/file_stream/2722802
Resolving datadryad.org (datadryad.org)... 52.36.117.254, 34.211.245.249, 35.82.66.187, ...
Connecting to datadryad.org (datadryad.org)|52.36.117.254|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2024-10-08 13:34:24 ERROR 403: Forbidden.

Archive:  rana_sierrae_2022.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
Archive:  rana_sierrae_2022.ZIP
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.

Prepare audio data

See the train_cnn.ipynb tutorial for a step-by-step walkthrough of this process, or just run the cells below to prepare a training set.

[5]:
# Set the current directory to where the folder `rana_sierrae_2022` is located:
dataset_path = Path("./rana_sierrae_2022/")

# let's generate clip labels of 3s duration (to match BirdNET's input duration) using the Raven annotations
# and some utility functions from opensoundscape
from opensoundscape.annotations import BoxedAnnotations
audio_and_raven_files = pd.read_csv(f"{dataset_path}/audio_and_raven_files.csv")
# update the paths to where we have the audio and raven files stored
audio_and_raven_files['audio'] = audio_and_raven_files['audio'].apply(lambda x: f"{dataset_path}/{x}")
audio_and_raven_files['raven'] = audio_and_raven_files['raven'].apply(lambda x: f"{dataset_path}/{x}")

annotations = BoxedAnnotations.from_raven_files(raven_files=audio_and_raven_files['raven'], audio_files=audio_and_raven_files['audio'],annotation_column='annotation')
# generate labels for 3s clips, including any labels that overlap by at least 0.2 seconds
labels = annotations.clip_labels(clip_duration=3,min_label_overlap=0.2)

Inspect labels

Count number of each annotation type:

Note that the ‘X’ label is for when the annotator was uncertain about the identity of a call. Labels A-E denote distinct call types.

[6]:
labels.sum()
[6]:
A    585
E    154
D     61
B     22
C     91
X    120
dtype: int64

Split into training and validation data

We’ll just focus on class ‘A’, the call type with the most annotations. We’ll randomly split the clips into training and validation data, acknowledging that this approach does not test the model’s ability to generalize: since samples in the training and validation sets could be adjacent 3-second audio clips, good performance could simply mean the model has memorized the training samples, and the validation set contains very similar samples (see the sketch after the next cell for a file-based split that better tests generalization).

[7]:
labels_train, labels_val = sklearn.model_selection.train_test_split(labels[['A']])
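A split that better tests generalization keeps all clips from a given audio file (or recording site) in the same subset. Here is a minimal sketch using scikit-learn’s GroupShuffleSplit; it assumes the clip label index produced by clip_labels() has a 'file' level, which you should verify with labels.index.names.

from sklearn.model_selection import GroupShuffleSplit

# group clips by the audio file they came from, so clips from one file never
# appear in both the training and validation sets
groups = labels.index.get_level_values('file')
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, val_idx = next(splitter.split(labels, groups=groups))
labels_train = labels[['A']].iloc[train_idx]
labels_val = labels[['A']].iloc[val_idx]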

Train classification head on BirdNET

The BirdNET and Perch models provided in the Bioacoustics Model Zoo have a .tf_model attribute containing the TensorFlow inference model and a .network attribute containing a trainable PyTorch classification head, specifically an instance of the MLPClassifier class. To train a custom classifier on the embeddings extracted by these models, we just need to (1) embed the training and validation samples, then (2) pass the embeddings and labels to the .network.fit() method.

This is equivalent to passing the .network to the opensoundscape.ml.shallow_classifier.quick_fit() function, so you can also experiment with generating your own classification heads (e.g. various instances of MLPClassifier) and fitting each of them on the embeddings. See the transfer learning tutorial for further examples.
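As a concrete illustration, once embeddings have been generated (as in the cells below), the two calls in this sketch are equivalent ways of fitting the built-in classification head (it assumes emb_train, emb_val, labels_train, and labels_val already exist):

# option 1: use the built-in classification head's .fit() method
birdnet.network.fit(emb_train, labels_train.values, emb_val, labels_val.values)

# option 2: pass the same head (or your own MLPClassifier instance) to quick_fit
quick_fit(birdnet.network, emb_train, labels_train.values, emb_val, labels_val.values)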

[8]:
from opensoundscape.ml import bioacoustics_model_zoo as bmz
birdnet = bmz.load('BirdNET')
Downloading: "https://github.com/kitzeslab/bioacoustics-model-zoo/zipball/0.11.0.dev1" to /Users/SML161/.cache/torch/hub/0.11.0.dev1.zip
File BirdNET_GLOBAL_6K_V2.4_Labels_af.txt already exists; skipping download.
downloading model from URL...
File BirdNET_GLOBAL_6K_V2.4_Model_FP16.tflite already exists; skipping download.
/Users/SML161/miniconda3/envs/tensorflow/lib/python3.9/site-packages/opensoundscape/ml/cnn.py:621: UserWarning:
                    This architecture is not listed in opensoundscape.ml.cnn_architectures.ARCH_DICT.
                    It will not be available for loading after saving the model with .save() (unless using pickle=True).
                    To make it re-loadable, define a function that generates the architecture from arguments: (n_classes, n_channels)
                    then use opensoundscape.ml.cnn_architectures.register_architecture() to register the generating function.

                    The function can also set the returned object's .constructor_name to the registered string key in ARCH_DICT
                    to avoid this warning and ensure it is reloaded correctly by opensoundscape.ml.load_model().

                    See opensoundscape.ml.cnn_architectures module for examples of constructor functions

  warnings.warn(
/Users/SML161/miniconda3/envs/tensorflow/lib/python3.9/site-packages/opensoundscape/ml/cnn.py:645: UserWarning: Failed to detect expected # input channels of this architecture.Make sure your architecture expects the number of channels equal to `channels` argument 1). Pytorch architectures generally expect 3 channels by default.
  warnings.warn(
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.

In general, generating embeddings will take a bit of time (because it requires loading, preprocessing, and embedding samples), but training shallow classifiers will be fast.

[9]:
emb_train = birdnet.embed(labels_train, return_dfs=False, batch_size=128, num_workers=0)
emb_val = birdnet.embed(labels_val, return_dfs=False, batch_size=128, num_workers=0)
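Because the embedding step is the slow part, it can be convenient to save the embeddings (plain NumPy arrays when return_dfs=False) to disk so you can experiment with different classifiers later without re-processing the audio. A minimal sketch; the file names are arbitrary:

# save the embeddings so the slow embedding step only needs to run once
np.save('emb_train.npy', emb_train)
np.save('emb_val.npy', emb_val)

# later, reload them and train classifiers without touching the audio again
emb_train = np.load('emb_train.npy')
emb_val = np.load('emb_val.npy')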
[10]:
# We want to train the classifier on the 'A' class here, corresponding to the primary R. sierrae call type.
# Let's replace fc output layer with 1-output layer for class 'A'
classes = ['A']
birdnet.change_classes(classes)

# fit the classification head with embeddings and labels
birdnet.network.fit(emb_train, labels_train.values, emb_val, labels_val.values)
Epoch 100/1000, Loss: 0.33055105805397034, Val Loss: 0.3374572694301605
val AU ROC: 0.823
val MAP: 0.823
Epoch 200/1000, Loss: 0.2898256480693817, Val Loss: 0.32127392292022705
val AU ROC: 0.828
val MAP: 0.828
Epoch 300/1000, Loss: 0.26516634225845337, Val Loss: 0.32036998867988586
val AU ROC: 0.827
val MAP: 0.827
Epoch 400/1000, Loss: 0.24709735810756683, Val Loss: 0.324081689119339
val AU ROC: 0.821
val MAP: 0.821
Epoch 500/1000, Loss: 0.23241737484931946, Val Loss: 0.3291389048099518
val AU ROC: 0.818
val MAP: 0.818
Epoch 600/1000, Loss: 0.21972905099391937, Val Loss: 0.3344644606113434
val AU ROC: 0.813
val MAP: 0.813
Epoch 700/1000, Loss: 0.2083706259727478, Val Loss: 0.33978548645973206
val AU ROC: 0.810
val MAP: 0.810
Epoch 800/1000, Loss: 0.19800186157226562, Val Loss: 0.34509751200675964
val AU ROC: 0.807
val MAP: 0.807
Epoch 900/1000, Loss: 0.18842820823192596, Val Loss: 0.35046273469924927
val AU ROC: 0.804
val MAP: 0.804
Epoch 1000/1000, Loss: 0.1795252412557602, Val Loss: 0.35594499111175537
val AU ROC: 0.802
val MAP: 0.802
Training complete
[11]:
# make predictions by passing the embeddings through the classifier
preds = birdnet.network(torch.tensor(emb_val)).detach()
# calculate the area under the ROC score
roc_auc_score(labels_val.values,preds,average=None)
[11]:
0.904199372056515
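The AU ROC summarizes how well the classifier ranks positive samples above negative ones. Average precision (area under the precision-recall curve) is often a more informative summary when positives are rare; we can compute it from the same predictions using the already-imported average_precision_score:

# average precision on the validation set (uses the ranking of scores, so logits are fine)
average_precision_score(labels_val.values, preds.numpy())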

To visualize the performance, let’s plot histograms of the classifier logit scores for positive and negative samples.

The histograms show that precision is reasonable for scores above 2 (few negatives get high scores), but recall is only moderate (many positive samples get low scores). See the sketch after the plot for computing precision and recall at a chosen score threshold.

[12]:
preds = preds.detach().numpy()
plt.hist(preds[labels_val==True],bins=20,alpha=0.5,label='positives')
plt.hist(preds[labels_val==False],bins=20,alpha=0.5,label='negatives')
plt.legend()
/Users/SML161/miniconda3/envs/tensorflow/lib/python3.9/site-packages/matplotlib_inline/config.py:68: DeprecationWarning: InlineBackend._figure_format_changed is deprecated in traitlets 4.1: use @observe and @unobserve instead.
  def _figure_format_changed(self, name, old, new):
[12]:
<matplotlib.legend.Legend at 0x2df4ef160>
../_images/tutorials_training_birdnet_and_perch_28_2.png
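To put numbers on that visual impression, we can binarize the scores at a threshold (the value 2 discussed above) and compute precision and recall with scikit-learn. This is a sketch; the threshold should be tuned for your application:

from sklearn.metrics import precision_score, recall_score

# binarize the logit scores at a threshold of 2
y_true = labels_val['A'].values
y_pred = preds.flatten() > 2
print('precision:', precision_score(y_true, y_pred))
print('recall:', recall_score(y_true, y_pred))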

Train shallow classifier on Perch embeddings

Training classifiers on Perch works the same way!

Since Perch uses 5 second audio clips as input, ideally we would re-generate clip labels for 5 second segments of the annotated Rana sierrae dataset; for simplicity, here we reuse the 3 second clip labels and train/validation split from above.

As before, we’ll just work with the ‘A’ call type for this example.
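If you wanted clip labels that exactly match Perch’s 5 second input duration, you could repeat the label-generation and splitting steps from above with clip_duration=5, as in this sketch (not used in the cells below, which reuse the 3 second clip labels):

# re-generate clip labels for 5 second clips from the same Raven annotations
labels_5s = annotations.clip_labels(clip_duration=5, min_label_overlap=0.2)
labels_train_5s, labels_val_5s = sklearn.model_selection.train_test_split(labels_5s[['A']])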

[13]:
tag = "birdnet_train" # the branch of the model zoo with compatible models
perch = torch.hub.load(
        f"kitzeslab/bioacoustics-model-zoo:{tag}", 'Perch', trust_repo=True,
    )
Using cache found in /Users/SML161/.cache/torch/hub/kitzeslab_bioacoustics-model-zoo_birdnet_train
/Users/SML161/miniconda3/envs/tensorflow/lib/python3.9/site-packages/opensoundscape/ml/cnn.py:621: UserWarning:
                    This architecture is not listed in opensoundscape.ml.cnn_architectures.ARCH_DICT.
                    It will not be available for loading after saving the model with .save() (unless using pickle=True).
                    To make it re-loadable, define a function that generates the architecture from arguments: (n_classes, n_channels)
                    then use opensoundscape.ml.cnn_architectures.register_architecture() to register the generating function.

                    The function can also set the returned object's .constructor_name to the registered string key in ARCH_DICT
                    to avoid this warning and ensure it is reloaded correctly by opensoundscape.ml.load_model().

                    See opensoundscape.ml.cnn_architectures module for examples of constructor functions

  warnings.warn(
/Users/SML161/miniconda3/envs/tensorflow/lib/python3.9/site-packages/opensoundscape/ml/cnn.py:645: UserWarning: Failed to detect expected # input channels of this architecture.Make sure your architecture expects the number of channels equal to `channels` argument 1). Pytorch architectures generally expect 3 channels by default.
  warnings.warn(
[14]:
emb_train = perch.embed(labels_train, return_dfs=False, batch_size=128, num_workers=0)
emb_val = perch.embed(labels_val, return_dfs=False, batch_size=128, num_workers=0)
2024-10-08 13:36:03.559267: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3c3fdf300 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2024-10-08 13:36:03.559289: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2024-10-08 13:36:03.814617: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2024-10-08 13:36:03.846980: W tensorflow/compiler/tf2xla/kernels/assert_op.cc:38] Ignoring Assert operator jax2tf_infer_fn_/assert_equal_1/Assert/AssertGuard/Assert
2024-10-08 13:36:05.217674: I ./tensorflow/compiler/jit/device_compiler.h:186] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
2024-10-08 13:36:05.231171: E ./tensorflow/compiler/xla/stream_executor/stream_executor_internal.h:124] SetPriority unimplemented for this stream.
2024-10-08 13:40:35.134978: W tensorflow/compiler/tf2xla/kernels/assert_op.cc:38] Ignoring Assert operator jax2tf_infer_fn_/assert_equal_1/Assert/AssertGuard/Assert
2024-10-08 13:42:09.782231: W tensorflow/compiler/tf2xla/kernels/assert_op.cc:38] Ignoring Assert operator jax2tf_infer_fn_/assert_equal_1/Assert/AssertGuard/Assert
[15]:
perch.change_classes(['A']) # replace fc layer with 1-output layer
quick_fit(perch.network, emb_train, labels_train.values, emb_val, labels_val.values,steps=1000)
Epoch 100/1000, Loss: 0.4348709285259247, Val Loss: 0.4204502999782562
val AU ROC: 0.797
val MAP: 0.797
Epoch 200/1000, Loss: 0.38083168864250183, Val Loss: 0.3726308345794678
val AU ROC: 0.812
val MAP: 0.812
Epoch 300/1000, Loss: 0.3522585928440094, Val Loss: 0.3506649434566498
val AU ROC: 0.823
val MAP: 0.823
Epoch 400/1000, Loss: 0.3324108123779297, Val Loss: 0.33769121766090393
val AU ROC: 0.829
val MAP: 0.829
Epoch 500/1000, Loss: 0.3167117238044739, Val Loss: 0.32872632145881653
val AU ROC: 0.834
val MAP: 0.834
Epoch 600/1000, Loss: 0.3035616874694824, Val Loss: 0.321988046169281
val AU ROC: 0.837
val MAP: 0.837
Epoch 700/1000, Loss: 0.29223692417144775, Val Loss: 0.31672316789627075
val AU ROC: 0.840
val MAP: 0.840
Epoch 800/1000, Loss: 0.28230735659599304, Val Loss: 0.31254294514656067
val AU ROC: 0.842
val MAP: 0.842
Epoch 900/1000, Loss: 0.27346834540367126, Val Loss: 0.30920472741127014
val AU ROC: 0.843
val MAP: 0.843
Epoch 1000/1000, Loss: 0.2654900848865509, Val Loss: 0.30653616786003113
val AU ROC: 0.844
val MAP: 0.844
Training complete
[16]:
# make predictions by passing the embeddings through the classifier
preds = perch.network(torch.tensor(emb_val).float()).detach().numpy()

# plot histogram of scores for positive and negative clips
plt.hist(preds[labels_val==True],bins=20,alpha=0.5,label='positives')
plt.hist(preds[labels_val==False],bins=20,alpha=0.5,label='negatives')
plt.legend()

# calculate the area under the ROC score
roc_auc_score(labels_val.values,preds,average=None)
[16]:
0.9353218210361067
../_images/tutorials_training_birdnet_and_perch_37_1.png

Variations on training

The Perch and BirdNET classes from the model zoo implement a .train() function that wraps together the embedding and classifier training shown in this tutorial. So for the fewest lines of code, you can simply load the model, change classes to your target classes, and run .train():

from opensoundscape.ml import bioacoustics_model_zoo as bmz

# load a model from the model zoo
model = bmz.load('BirdNET')
# or model = bmz.load('Perch')

# define classes for your custom classifier
model.change_classes(train_df.columns)

# fit the trainable PyTorch classifier (`model.network`) on your labels
model.train(train_df,validation_df,num_augmentation_variants=4)

# run inference using your custom classifier
model.predict(validation_df)

OpenSoundscape also provides tools to generate embeddings for augmented variations of the input samples (opensoundscape.ml.shallow_classifier.augmented_embed()), which could improve the classifier’s performance and generalizability. See the transfer learning tutorial for further examples of this and other workflows.

Here are a few other tools from the shallow_classifier module to check out:

  • augmented_embed(): generate embeddings for each sample multiple times, with stochastic augmentation on the audio clips

  • fit_classifier_on_embeddings(): this function wraps together the embedding step with the classifier fitting step into a single operation, with support for generating augmented variations of training samples. It returns the embeddings and labels, in case you want to train additional classifiers on them

  • MLPClassifier: this class creates a neural network with one or more fully connected layers. This object can be trained by passing it to quick_fit() or fit_classifier_on_embeddings(), or by running the .fit() method (equivalent to quick_fit()). The input size should match the embedding size of the embedding model, and the output size should match the number of classes your model predicts on (see the sketch below).
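For example, here is a minimal sketch of creating a classification head with one hidden layer and fitting it on embeddings generated earlier in this tutorial. The hidden_layer_sizes argument name is an assumption; check the MLPClassifier documentation for the exact signature.

from opensoundscape.ml.shallow_classifier import MLPClassifier, quick_fit

# the input size must match the embedding dimension of the embedding model
# (assumes emb_train is a 2D array of embeddings, as returned by .embed(..., return_dfs=False))
clf = MLPClassifier(input_size=emb_train.shape[1], output_size=1, hidden_layer_sizes=(128,))

# fit on precomputed embeddings and single-class ('A') labels, as earlier in this tutorial
quick_fit(clf, emb_train, labels_train.values, emb_val, labels_val.values, steps=1000)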