Manipulating audio annotations

This notebook demonstrates how to use the annotations module of OpenSoundscape to:

  • load annotations from Raven files
  • create a set of one-hot labels corresponding to fixed-length audio clips
  • split a set of labeled audio files into clips and create a labels dataframe for all clips

The audio recordings used in this notebook were recorded by Andrew Spencer and are available under a Creative Commons License (CC BY-NC-ND 2.5). Annotations were performed in Raven Pro software by our team.

from opensoundscape import Audio, Spectrogram
from opensoundscape.annotations import BoxedAnnotations

import numpy as np
import pandas as pd
from glob import glob

from matplotlib import pyplot as plt
plt.rcParams['figure.figsize']=[15,5] #for big visuals
%config InlineBackend.figure_format = 'retina'

Download example files

Run the code below to download a set of example audio files and Raven annotation files for this tutorial.

import subprocess
subprocess.run(['curl','','-L', '-o','gwwa_audio_and_raven_annotations.tar.gz']) # Download the data
subprocess.run(["tar","-xzf", "gwwa_audio_and_raven_annotations.tar.gz"]) # Unzip the downloaded tar.gz file
subprocess.run(["rm", "gwwa_audio_and_raven_annotations.tar.gz"]) # Remove the file after its contents are unzipped
CompletedProcess(args=['rm', 'gwwa_audio_and_raven_annotations.tar.gz'], returncode=0)

Load a single Raven annotation table from a txt file

We can use the BoxedAnnotations class’s from_raven_files method to load a Raven .txt file into OpenSoundscape. This table contains the frequency and time limits of rectangular “boxes” representing each annotation that was created in Raven.

Note that we need to specify the name of the column containing annotations, since it can be named anything in Raven. The column will be renamed to “annotation”.

This table looks a lot like what you would see in the Raven interface.
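
A Raven selection table is a tab-delimited text file. As a rough sketch of what a loader has to read, the snippet below parses a small in-memory table with Python's csv module. The two rows reuse values from this tutorial's example file, but the header names (Begin Time (s), End Time (s), Low Freq (Hz), High Freq (Hz)) are assumed Raven defaults, the Annotation column name is hypothetical because that column can be named anything, and read_selection_table is an illustrative helper rather than part of OpenSoundscape.

```python
import csv
import io

# Illustrative selection-table text; header names are assumed Raven defaults,
# and "Annotation" is a hypothetical name for the user-defined label column.
raven_text = (
    "Selection\tView\tChannel\tBegin Time (s)\tEnd Time (s)\t"
    "Low Freq (Hz)\tHigh Freq (Hz)\tAnnotation\n"
    "1\tSpectrogram 1\t1\t0.459636\t2.298182\t4029.8\t17006.4\tGWWA_song\n"
    "2\tSpectrogram 1\t1\t6.705283\t8.246417\t4156.6\t17031.7\tGWWA_song\n"
)

def read_selection_table(text, annotation_column="Annotation"):
    """Parse tab-delimited selection-table text into a list of box dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    boxes = []
    for row in reader:
        boxes.append({
            "annotation": row[annotation_column],
            "start_time": float(row["Begin Time (s)"]),
            "end_time": float(row["End Time (s)"]),
            "low_f": float(row["Low Freq (Hz)"]),
            "high_f": float(row["High Freq (Hz)"]),
        })
    return boxes

boxes = read_selection_table(raven_text)
for box in boxes:
    print(box)
```

Each parsed box carries the same five fields that the annotation table keeps by default: annotation, start_time, end_time, low_f, and high_f.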

# specify an audio file and corresponding raven annotation file
audio_file = './gwwa_audio_and_raven_annotations/GWWA_XC/13738.wav'
annotation_file = './gwwa_audio_and_raven_annotations/GWWA_XC_AnnoTables/13738.Table.1.selections.txt'

Let’s look at a spectrogram of the audio file to see what we’re working with.


Now, let’s load the annotations from the Raven annotation file.

#create an object from Raven file
annotations = BoxedAnnotations.from_raven_files([annotation_file],audio_files=[audio_file])

#inspect the object's .df attribute, which contains the table of annotations
annotations.df.head()
audio_file raven_file annotation start_time end_time low_f high_f View Notes Selection Channel
0 ./gwwa_audio_and_raven_annotations/GWWA_XC/137... ./gwwa_audio_and_raven_annotations/GWWA_XC_Ann... GWWA_song 0.459636 2.298182 4029.8 17006.4 Spectrogram 1 NaN 1 1
1 ./gwwa_audio_and_raven_annotations/GWWA_XC/137... ./gwwa_audio_and_raven_annotations/GWWA_XC_Ann... GWWA_song 6.705283 8.246417 4156.6 17031.7 Spectrogram 1 NaN 2 1
2 ./gwwa_audio_and_raven_annotations/GWWA_XC/137... ./gwwa_audio_and_raven_annotations/GWWA_XC_Ann... ? 13.464641 15.005775 3903.1 17082.4 Spectrogram 1 NaN 3 1
3 ./gwwa_audio_and_raven_annotations/GWWA_XC/137... ./gwwa_audio_and_raven_annotations/GWWA_XC_Ann... GWWA_song 20.128208 21.601748 4055.2 16930.3 Spectrogram 1 NaN 4 1
4 ./gwwa_audio_and_raven_annotations/GWWA_XC/137... ./gwwa_audio_and_raven_annotations/GWWA_XC_Ann... GWWA_song 26.047590 27.521131 4207.2 17057.1 Spectrogram 1 NaN 5 1

Note: if you do not have an annotation column, e.g., if you are annotating the sounds of a single species, pass the argument annotation_column_idx=None. The resulting dataframe will have an empty annotation column.

We could instead choose to load only the necessary columns (start_time, end_time, low_f, high_f, and annotation) by passing keep_extra_columns=None.

In this example, we use keep_extra_columns=['Notes'] to keep only the Notes column.

annotations = BoxedAnnotations.from_raven_files([annotation_file],keep_extra_columns=['Notes'],audio_files=[audio_file])
audio_file raven_file annotation start_time end_time low_f high_f Notes
0 ./gwwa_audio_and_raven_annotations/GWWA_XC/137... NaN GWWA_song 0.459636 2.298182 4029.8 17006.4 NaN
1 ./gwwa_audio_and_raven_annotations/GWWA_XC/137... NaN GWWA_song 6.705283 8.246417 4156.6 17031.7 NaN
2 ./gwwa_audio_and_raven_annotations/GWWA_XC/137... NaN ? 13.464641 15.005775 3903.1 17082.4 NaN
3 ./gwwa_audio_and_raven_annotations/GWWA_XC/137... NaN GWWA_song 20.128208 21.601748 4055.2 16930.3 NaN
4 ./gwwa_audio_and_raven_annotations/GWWA_XC/137... NaN GWWA_song 26.047590 27.521131 4207.2 17057.1 NaN

Convert or correct annotations

We can provide a DataFrame (e.g., from a .csv file) or a dictionary to convert original values to new values.

Let’s load up a little .csv file that specifies a set of conversions we’d like to make. The .csv file should have two columns, but it doesn’t matter what they are called. If you create a table in Microsoft Excel, you can export it to a .csv file to use it as your conversion table.

conversion_table = pd.read_csv('./gwwa_audio_and_raven_annotations/conversion_table.csv')
original new
0 gwwa_song gwwa

Alternatively, we could simply write a Python dictionary for the conversion table. For instance:

conversion_table = {"GWWA_song": "GWWA", "?": np.nan}

Now, we can apply the conversions in the table to our annotations.

This will create a new BoxedAnnotations object rather than modifying the original object (an “out of place operation”).

annotations_corrected = annotations.convert_labels(conversion_table)
audio_file raven_file annotation start_time end_time low_f high_f Notes
0 ./gwwa_audio_and_raven_annotations/GWWA_XC/137... NaN GWWA 0.459636 2.298182 4029.8 17006.4 NaN
1 ./gwwa_audio_and_raven_annotations/GWWA_XC/137... NaN GWWA 6.705283 8.246417 4156.6 17031.7 NaN
2 ./gwwa_audio_and_raven_annotations/GWWA_XC/137... NaN NaN 13.464641 15.005775 3903.1 17082.4 NaN
3 ./gwwa_audio_and_raven_annotations/GWWA_XC/137... NaN GWWA 20.128208 21.601748 4055.2 16930.3 NaN
4 ./gwwa_audio_and_raven_annotations/GWWA_XC/137... NaN GWWA 26.047590 27.521131 4207.2 17057.1 NaN
5 ./gwwa_audio_and_raven_annotations/GWWA_XC/137... NaN GWWA 33.121946 34.663079 4207.2 17082.4 NaN
6 ./gwwa_audio_and_raven_annotations/GWWA_XC/137... NaN GWWA 42.967925 44.427946 4181.9 17057.1 NaN
7 ./gwwa_audio_and_raven_annotations/GWWA_XC/137... NaN GWWA 52.417508 53.891048 4232.6 16930.3 NaN
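
To make the out-of-place behavior concrete, here is a minimal plain-Python sketch of a dictionary-based conversion. It is not OpenSoundscape's implementation: the helper name convert is hypothetical, and leaving unmapped labels unchanged is an assumption made for illustration.

```python
def convert(labels, conversion_table):
    """Return a new list with each label replaced according to conversion_table.

    Labels missing from the table are kept unchanged (an assumption of this sketch).
    """
    return [conversion_table.get(label, label) for label in labels]

original = ["GWWA_song", "GWWA_song", "?", "GWWA_song"]
table = {"GWWA_song": "GWWA", "?": None}  # None stands in for a missing label

corrected = convert(original, table)
print(corrected)  # new list with converted labels
print(original)   # the input list is left untouched
```

Because the function builds and returns a new list, the original labels remain available for comparison after the conversion.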

View a subset of annotations

We can view only the annotations that belong to a specified list of classes.

For example, we can subset to only annotations marked as “?” - perhaps we’re interested in looking at these annotations in Raven again to determine what class they really were.

classes_to_keep = ['?']
annotations_only_unsure = annotations.subset(classes_to_keep)
audio_file raven_file annotation start_time end_time low_f high_f Notes
2 ./gwwa_audio_and_raven_annotations/GWWA_XC/137... NaN ? 13.464641 15.005775 3903.1 17082.4 NaN

Saving annotations to Raven-compatible file

We can always save our BoxedAnnotations object to a Raven-compatible .txt file, which can be opened in Raven along with an audio file just like the .txt files Raven creates itself. You must specify a path for the save file that ends with .txt.
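
As a hedged illustration of what a Raven-compatible file contains, the sketch below writes one box to a tab-delimited .txt file using only the standard library. The header names are assumed Raven defaults, the Annotation column name and the output file name are hypothetical, and write_selection_table is an illustrative helper, not an OpenSoundscape function.

```python
import csv
import os
import tempfile

# Assumed Raven default header names; "Annotation" is a hypothetical label column.
COLUMNS = ["Selection", "View", "Channel", "Begin Time (s)", "End Time (s)",
           "Low Freq (Hz)", "High Freq (Hz)", "Annotation"]

def write_selection_table(boxes, path):
    """Write (start, end, low_f, high_f, label) tuples as a tab-delimited table."""
    if not path.endswith(".txt"):
        raise ValueError("path must end with .txt")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(COLUMNS)
        for i, (start, end, low_f, high_f, label) in enumerate(boxes, start=1):
            writer.writerow([i, "Spectrogram 1", 1, start, end, low_f, high_f, label])

# The single "?" annotation from the example file
boxes = [(13.464641, 15.005775, 3903.1, 17082.4, "?")]
out_path = os.path.join(tempfile.mkdtemp(), "13738_unsure.selections.txt")  # hypothetical name
write_selection_table(boxes, out_path)

with open(out_path) as f:
    lines = f.read().splitlines()
print(lines)
```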


Split an audio clip and its annotations

Often, we want to train or validate models on short audio segments (e.g., 5 seconds) rather than on long files (e.g., 2 hours).

We can easily create tables of “one hot” encoded labels for a series of audio segments within each annotated file using BoxedAnnotations.one_hot_clip_labels().

*What is one-hot encoding?*

The functions below demonstrate the creation of one-hot encoded labels.

This machine learning term, “one-hot encoding,” refers to a way to format a table of labels in which:

  • Each row represents a single sample, like a single 5-second long clip
  • Each column represents a single possible class (e.g. one of multiple species)
  • A “0” in a row and column means that in that sample, the class is not present
  • A “1” is “hot,” meaning that in that sample, the class IS present

For example, let’s say we had a 15-second audio clip that we were splitting into three 5s clips. Let’s say we are training a classifier to identify coyotes and dogs, and we labeled the clip and found:

  • a coyote howled from 2.5 to 4 seconds into the clip (so, only the first clip contains it)
  • a dog barked from 4 seconds to 10 seconds into the clip (so, both the first and second clips contain it)
  • there was silence for the last 5 seconds of the clip (so, the third clip has neither coyotes nor dogs in it)

The one-hot encoded labels table for this example would look like:

pd.DataFrame({
    "start_time":[0, 5, 10],
    "end_time":[5, 10, 15],
    "COYOTE":[1, 0, 0],
    "DOG":[1, 1, 0]
})
start_time end_time COYOTE DOG
0 0 5 1 1
1 5 10 0 1
2 10 15 0 0
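
The coyote and dog example can be reproduced with a few lines of plain Python. This is only a sketch of the idea, not the library's algorithm: the function name one_hot_labels is hypothetical, and the 0.25 second minimum overlap is an assumed threshold chosen for illustration.

```python
def one_hot_labels(annotations, classes, full_duration, clip_duration, min_overlap):
    """Label consecutive, non-overlapping clips with 0/1 per class.

    annotations: list of (start, end, label) tuples in seconds.
    A clip is positive for a class if some annotation of that class
    overlaps the clip by at least min_overlap seconds.
    """
    rows = []
    start = 0
    while start + clip_duration <= full_duration:
        end = start + clip_duration
        row = {"start_time": start, "end_time": end}
        for cls in classes:
            row[cls] = 0
        for a_start, a_end, label in annotations:
            # Length of the intersection of the clip and the annotation
            overlap = min(end, a_end) - max(start, a_start)
            if label in classes and overlap >= min_overlap:
                row[label] = 1
        rows.append(row)
        start = end
    return rows

example_annotations = [(2.5, 4, "COYOTE"), (4, 10, "DOG")]
rows = one_hot_labels(example_annotations, ["COYOTE", "DOG"],
                      full_duration=15, clip_duration=5, min_overlap=0.25)
for row in rows:
    print(row)
```

The dog's bark ends exactly where the third clip begins, so its overlap with that clip is zero and the clip stays negative for both classes.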

Split annotations using splitting parameters

This function requires that we specify the minimum overlap of the label (in seconds) with the clip for the clip to be labeled positive. It also requires that we either (1) specify the list of classes for one-hot labels or (2) specify class_subset=None, which will make a column for every unique label in the annotations. In this example, that would include a “?” class.

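labels_df = annotations.one_hot_clip_labels(
    full_duration=60, # The duration of the entire audio file
    clip_duration=5, clip_overlap=0, # 5 s clips with no overlap, as in the output below
    class_subset=['GWWA_song'], # the single class column in the output below
    min_label_overlap=0.25, # ASSUMED value: minimum overlap (s) for a positive label
)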
file start_time end_time
./gwwa_audio_and_raven_annotations/GWWA_XC/13738.wav 0 5 1.0
5 10 1.0
10 15 0.0
15 20 0.0
20 25 1.0

A data munging example: pairing Raven files and audio to create a labeled dataset

In practice, we have tons of audio files with their corresponding Raven files. We need to:

  • Pair up all the audio files with their Raven annotation files
  • Create a dataframe of labels corresponding to short segments of each audio file

Let’s walk through the steps required to do this. But be warned, pairing Raven files and audio files might require more finagling than shown here.

Match up audio files and Raven annotations

The first step in the process is associating audio files with their corresponding Raven files. Perhaps not every audio file is annotated, and perhaps some audio files have been annotated multiple times. This code walks through some steps of sorting through these data to pair files.

Caveat: you’ll need to be careful using the code below, depending on how your audio and Raven files are named and organized.

In this example, we’ll assume that each audio file has the same name as its Raven annotation file (ignoring the extensions like “.Table.1.selections.txt”), which is the default naming convention when using Raven. We’ll also start by assuming that the audio filenames are unique (!) - that is, no two audio files have the same name.
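
The naming convention described above reduces to a small key-extraction step. The sketch below uses the same pathlib expressions that appear later in this tutorial; the helper names audio_key and raven_key are hypothetical and exist only for illustration.

```python
from pathlib import Path

def audio_key(path):
    """Audio file name without its extension, e.g. '13738.wav' -> '13738'."""
    return Path(path).stem

def raven_key(path):
    """Raven file name with everything from '.Table' onward removed."""
    return Path(path).stem.split(".Table")[0]

a_key = audio_key("./gwwa_audio_and_raven_annotations/GWWA_XC/13738.wav")
r_key = raven_key(
    "./gwwa_audio_and_raven_annotations/GWWA_XC_AnnoTables/13738.Table.1.selections.txt"
)
print(a_key, r_key)
```

When both keys agree, the audio file and the annotation file are treated as a pair. If your files follow a different naming scheme, this is the step to adapt.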

First, find all the Raven files and all the audio files.

# Specify folder containing Raven annotations
raven_files_dir = "./gwwa_audio_and_raven_annotations/GWWA_XC_AnnoTables/"

# Find all .txt files
# We'll naively assume all files with the suffix ".txt" are Raven files!
# A better assumption could be to search for files with the suffix ".selections.txt"
raven_files = glob(f"{raven_files_dir}/*.txt")
print(f"found {len(raven_files)} annotation files")

# Specify folder containing audio files
audio_files_dir = "./gwwa_audio_and_raven_annotations/GWWA_XC/"

# Find all audio files (we'll assume they are .wav, .WAV, or .mp3)
audio_files = glob(f"{audio_files_dir}/*.wav")+glob(f"{audio_files_dir}/*.WAV")+glob(f"{audio_files_dir}/*.mp3")
print(f"found {len(audio_files)} audio files")
found 3 annotation files
found 3 audio files

Next, starting by assuming that audio files have unique names, use the audio filenames to pair up the annotation files. Then, double-check that our assumption is correct.

# Pair up the Raven and audio files based on the audio file name
from pathlib import Path
audio_df = pd.DataFrame({'audio_file':audio_files})
audio_df.index = [Path(f).stem for f in audio_files]

# Check that there aren't duplicate audio file names
print('\n audio files with duplicate names:')
audio_df[audio_df.index.duplicated(keep=False)]

 audio files with duplicate names:

Seeing that no audio files have duplicate names, check to make sure the same is true for Raven files.

raven_df = pd.DataFrame({'raven_file':raven_files})
raven_df.index = [Path(f).stem.split('.Table')[0] for f in raven_files]

# Check that there aren't duplicate Raven file names
print('\n raven files with duplicate names:')
raven_df[raven_df.index.duplicated(keep=False)]

 raven files with duplicate names:
13738 ./gwwa_audio_and_raven_annotations/GWWA_XC_Ann...
13738 ./gwwa_audio_and_raven_annotations/GWWA_XC_Ann...
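
The same duplicate check can be sketched without pandas by counting how often each key occurs. The helper name duplicated_keys is hypothetical; the keys are the ones derived from the three Raven files in this example.

```python
from collections import Counter

def duplicated_keys(keys):
    """Return the sorted keys that occur more than once."""
    counts = Counter(keys)
    return sorted(k for k, n in counts.items() if n > 1)

# Keys derived from the three Raven file names in this example
raven_keys = ["13738", "13738", "16989"]
dupes = duplicated_keys(raven_keys)
print(dupes)
```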

Since we found some duplicate Raven files, resolve this issue by deleting the extra Raven file, which in this case was named “selections2”.

#remove the second selection table for file 13738.wav
raven_df=raven_df[raven_df.raven_file.apply(lambda x: "selections2" not in x)]

Once we’ve resolved any issues with duplicate names, we can match up Raven and audio files.

paired_df = audio_df.join(raven_df,how='outer')

Check if any audio files don’t have Raven annotation files:

print(f"audio files without raven file: {paired_df.raven_file.isna().sum()}")
paired_df[paired_df.raven_file.isna()]
audio files without raven file: 2
audio_file raven_file
135601 ./gwwa_audio_and_raven_annotations/GWWA_XC/135... NaN
13742 ./gwwa_audio_and_raven_annotations/GWWA_XC/137... NaN

Check if any Raven files don’t have audio files:

#look at unmatched raven files
print(f"raven files without audio file: {paired_df.audio_file.isna().sum()}")

paired_df[paired_df.audio_file.isna()]
raven files without audio file: 1
audio_file raven_file
16989 NaN ./gwwa_audio_and_raven_annotations/GWWA_XC_Ann...

In this example, let’s discard any unpaired Raven or audio files.

paired_df = paired_df.dropna()
audio_file raven_file
13738 ./gwwa_audio_and_raven_annotations/GWWA_XC/137... ./gwwa_audio_and_raven_annotations/GWWA_XC_Ann...
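
The outer join followed by dropna amounts to simple set bookkeeping on the file keys. As a minimal sketch, using the keys from this example and plain Python sets in place of dataframes:

```python
# Keys from this example: three audio files and, after removing the
# duplicate selection table, two Raven files
audio_keys = {"13738", "135601", "13742"}
raven_keys = {"13738", "16989"}

paired = sorted(audio_keys & raven_keys)      # present in both: kept
audio_only = sorted(audio_keys - raven_keys)  # audio without annotations
raven_only = sorted(raven_keys - audio_keys)  # annotations without audio

print("paired:", paired)
print("audio only:", audio_only)
print("raven only:", raven_only)
```

Only the intersection survives dropna, which is why a single paired file remains in this example.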

Create label dataframes

Now we have a set of paired up Raven and audio files.

Let’s create label dataframes representing 3-second segments of each audio file.

# Choose settings for audio splitting
clip_duration = 3
clip_overlap = 0
final_clip = None
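
To see what these settings imply, the sketch below generates clip start and end times for a hypothetical 10-second file. The function name clip_times is illustrative, and treating final_clip = None as "drop any incomplete clip at the end" is an assumption of this sketch rather than a statement about the library's internals.

```python
def clip_times(full_duration, clip_duration, clip_overlap=0):
    """Return (start, end) pairs for clips that fit entirely within the file.

    Any incomplete clip at the end is dropped, which is how this sketch
    interprets final_clip=None (an assumption).
    """
    step = clip_duration - clip_overlap
    if step <= 0:
        raise ValueError("clip_overlap must be smaller than clip_duration")
    times = []
    start = 0
    while start + clip_duration <= full_duration:
        times.append((start, start + clip_duration))
        start += step
    return times

no_overlap = clip_times(10, clip_duration=3, clip_overlap=0)
with_overlap = clip_times(10, clip_duration=3, clip_overlap=1)
print(no_overlap)
print(with_overlap)
```

With no overlap, the last second of the 10-second file is not covered by any clip. Adding overlap shortens the step between clip starts, so more clips are produced from the same file.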

Next, set up the settings for annotation splitting:

  • Whether to use a subset of classes
  • How many seconds a label should overlap a clip, at minimum, in order for that clip to be labeled
# Choose settings for annotation splitting
class_subset = None #Equivalent to a list of all classes: ['GWWA_song', '?']
min_label_overlap = 0.1

Load Raven annotations

boxed_annotations = BoxedAnnotations.from_raven_files(paired_df.raven_file,paired_df.audio_file)
audio_file raven_file annotation start_time end_time low_f high_f View Notes Selection Channel
0 ./gwwa_audio_and_raven_annotations/GWWA_XC/137... ./gwwa_audio_and_raven_annotations/GWWA_XC_Ann... GWWA_song 0.459636 2.298182 4029.8 17006.4 Spectrogram 1 NaN 1 1
1 ./gwwa_audio_and_raven_annotations/GWWA_XC/137... ./gwwa_audio_and_raven_annotations/GWWA_XC_Ann... GWWA_song 6.705283 8.246417 4156.6 17031.7 Spectrogram 1 NaN 2 1
2 ./gwwa_audio_and_raven_annotations/GWWA_XC/137... ./gwwa_audio_and_raven_annotations/GWWA_XC_Ann... ? 13.464641 15.005775 3903.1 17082.4 Spectrogram 1 NaN 3 1

Create label dataframes

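label_df = boxed_annotations.one_hot_clip_labels(
    clip_duration=clip_duration,
    clip_overlap=clip_overlap,
    final_clip=final_clip,
    class_subset=class_subset,
    min_label_overlap=min_label_overlap,
)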
GWWA_song ?
file start_time end_time
./gwwa_audio_and_raven_annotations/GWWA_XC/13738.wav 0.0 3.0 1.0 0.0
3.0 6.0 0.0 0.0

Sanity check: look at spectrograms of clips labeled 0 and 1

# ignore the "?" annotations for this visualization
label_df = label_df[label_df["?"]==0]

Note: replace the “GWWA_song” here with a class name from your own dataset.

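# plot spectrograms for 3 random positive clips
positives = label_df[label_df['GWWA_song']==1].sample(3,random_state=0)
print("spectrograms of 3 random positive clips (label=1)")
for clip, t0, t1 in positives.index.values:
    # load only this clip's time range from the audio file, then plot it
    Spectrogram.from_audio(Audio.from_file(clip,offset=t0,duration=t1-t0)).plot()

# plot spectrograms for 3 random negative clips
negatives = label_df[label_df['GWWA_song']==0].sample(3,random_state=0)
print("spectrogram of 3 random negative clips (label=0)")
for clip, t0, t1 in negatives.index.values:
    Spectrogram.from_audio(Audio.from_file(clip,offset=t0,duration=t1-t0)).plot()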

spectrograms of 3 random positive clips (label=1)
spectrogram of 3 random negative clips (label=0)

Clean up: remove the sounds that we downloaded for this tutorial as well as the audio files we created.

import shutil
shutil.rmtree('./gwwa_audio_and_raven_annotations')