Manipulating audio annotations
This notebook demonstrates how to use the annotations
module of OpenSoundscape to:
load annotations from Raven files
create a set of one-hot labels corresponding to fixed-length audio clips
split a set of labeled audio files into clips and create labels dataframe for all clips
The audio recordings used in thise notebook were recorded by Andrew Spencer and are available under a Creative Commons License (CC BY-NC-ND 2.5) from xeno-canto.org. Annotations were performed in Raven Pro software by our team.
[1]:
from opensoundscape import Audio, Spectrogram
from opensoundscape.annotations import BoxedAnnotations
import numpy as np
import pandas as pd
from glob import glob
from matplotlib import pyplot as plt
plt.rcParams['figure.figsize']=[15,5] #for big visuals
%config InlineBackend.figure_format = 'retina'
Download example files
Run the code below to download a set of example audio and raven annotations files for this tutorial.
[2]:
import subprocess
subprocess.run(['curl','https://drive.google.com/uc?export=download&id=1ZTlV9KzWU0lWsjZpn1rgA92LUuXO1u8a','-L', '-o','gwwa_audio_and_raven_annotations.tar.gz']) # Download the data
subprocess.run(["tar","-xzf", "gwwa_audio_and_raven_annotations.tar.gz"]) # Unzip the downloaded tar.gz file
subprocess.run(["rm", "gwwa_audio_and_raven_annotations.tar.gz"]) # Remove the file after its contents are unzipped
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:02 --:--:-- 0
100 5432k 100 5432k 0 0 1475k 0 0:00:03 0:00:03 --:--:-- 7214k
[2]:
CompletedProcess(args=['rm', 'gwwa_audio_and_raven_annotations.tar.gz'], returncode=0)
Load a single Raven annotation table from a txt file
We can use the BoxedAnnotation class’s from_raven_file
method to load a Raven .txt file into OpenSoundscape. This table contains the frequency and time limits of rectangular “boxes” representing each annotation that was created in Raven.
Note that we need to specify the name of the column containing annotations, since it can be named anything in Raven. The column will be renamed to “annotation”.
This table looks a lot like what you would see in the Raven interface.
[3]:
# specify an audio file and corresponding raven annotation file
audio_file = './gwwa_audio_and_raven_annotations/GWWA_XC/13738.wav'
annotation_file = './gwwa_audio_and_raven_annotations/GWWA_XC_AnnoTables/13738.Table.1.selections.txt'
Let’s look at a spectrogram of the audio file to see what we’re working with.
[4]:
Spectrogram.from_audio(Audio.from_file(audio_file)).plot()
/Users/SML161/miniconda3/envs/opso_dev/lib/python3.9/site-packages/matplotlib_inline/config.py:68: DeprecationWarning: InlineBackend._figure_format_changed is deprecated in traitlets 4.1: use @observe and @unobserve instead.
def _figure_format_changed(self, name, old, new):

Now, let’s load the annotations from the Raven annotation file.
[5]:
#create an object from Raven file
annotations = BoxedAnnotations.from_raven_files([annotation_file],audio_files=[audio_file])
#inspect the object's .df attribute, which contains the table of annotations
annotations.df.head()
[5]:
audio_file | raven_file | annotation | start_time | end_time | low_f | high_f | View | Notes | Selection | Channel | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | ./gwwa_audio_and_raven_annotations/GWWA_XC/137... | ./gwwa_audio_and_raven_annotations/GWWA_XC_Ann... | GWWA_song | 0.459636 | 2.298182 | 4029.8 | 17006.4 | Spectrogram 1 | NaN | 1 | 1 |
1 | ./gwwa_audio_and_raven_annotations/GWWA_XC/137... | ./gwwa_audio_and_raven_annotations/GWWA_XC_Ann... | GWWA_song | 6.705283 | 8.246417 | 4156.6 | 17031.7 | Spectrogram 1 | NaN | 2 | 1 |
2 | ./gwwa_audio_and_raven_annotations/GWWA_XC/137... | ./gwwa_audio_and_raven_annotations/GWWA_XC_Ann... | ? | 13.464641 | 15.005775 | 3903.1 | 17082.4 | Spectrogram 1 | NaN | 3 | 1 |
3 | ./gwwa_audio_and_raven_annotations/GWWA_XC/137... | ./gwwa_audio_and_raven_annotations/GWWA_XC_Ann... | GWWA_song | 20.128208 | 21.601748 | 4055.2 | 16930.3 | Spectrogram 1 | NaN | 4 | 1 |
4 | ./gwwa_audio_and_raven_annotations/GWWA_XC/137... | ./gwwa_audio_and_raven_annotations/GWWA_XC_Ann... | GWWA_song | 26.047590 | 27.521131 | 4207.2 | 17057.1 | Spectrogram 1 | NaN | 5 | 1 |
Note: if you do not have an annotation column, e.g., if you are annotating the sounds of a single species, pass the argument annotation_column_idx=None
. The resulting dataframe will have an empty annotation
column.
We could instead choose to only load the necessary columns (start_time
, end_time
, low_f
, high_f
, and annotation
) using the keep_extra_columns=None
.
In this example, we use keep_extra_columns=['Notes']
to keep only the Notes column.
[6]:
annotations = BoxedAnnotations.from_raven_files([annotation_file],keep_extra_columns=['Notes'],audio_files=[audio_file])
annotations.df.head()
[6]:
audio_file | raven_file | annotation | start_time | end_time | low_f | high_f | Notes | |
---|---|---|---|---|---|---|---|---|
0 | ./gwwa_audio_and_raven_annotations/GWWA_XC/137... | NaN | GWWA_song | 0.459636 | 2.298182 | 4029.8 | 17006.4 | NaN |
1 | ./gwwa_audio_and_raven_annotations/GWWA_XC/137... | NaN | GWWA_song | 6.705283 | 8.246417 | 4156.6 | 17031.7 | NaN |
2 | ./gwwa_audio_and_raven_annotations/GWWA_XC/137... | NaN | ? | 13.464641 | 15.005775 | 3903.1 | 17082.4 | NaN |
3 | ./gwwa_audio_and_raven_annotations/GWWA_XC/137... | NaN | GWWA_song | 20.128208 | 21.601748 | 4055.2 | 16930.3 | NaN |
4 | ./gwwa_audio_and_raven_annotations/GWWA_XC/137... | NaN | GWWA_song | 26.047590 | 27.521131 | 4207.2 | 17057.1 | NaN |
Convert or correct annotations
We can provide a DataFrame (e.g., from a .csv file) or a dictionary to convert original values to new values.
Let’s load up a little .csv file that specifies a set of conversions we’d like to make. The .csv file should have two columns, but it doesn’t matter what they are called. If you create a table in Microsoft Excel, you can export it to a .csv file to use it as your conversion table.
[7]:
conversion_table = pd.read_csv('./gwwa_audio_and_raven_annotations/conversion_table.csv')
conversion_table
[7]:
original | new | |
---|---|---|
0 | gwwa_song | gwwa |
Alternatively, we could simply write a Python dictionary for the conversion table. For instance:
[8]:
conversion_table = {
"GWWA_song":"GWWA",
"?":np.nan
}
Now, we can apply the conversions in the table to our annotations.
This will create a new BoxedAnnotations object rather than modifying the original object (an “out of place operation”).
[9]:
annotations_corrected = annotations.convert_labels(conversion_table)
annotations_corrected.df
[9]:
audio_file | raven_file | annotation | start_time | end_time | low_f | high_f | Notes | |
---|---|---|---|---|---|---|---|---|
0 | ./gwwa_audio_and_raven_annotations/GWWA_XC/137... | NaN | GWWA | 0.459636 | 2.298182 | 4029.8 | 17006.4 | NaN |
1 | ./gwwa_audio_and_raven_annotations/GWWA_XC/137... | NaN | GWWA | 6.705283 | 8.246417 | 4156.6 | 17031.7 | NaN |
2 | ./gwwa_audio_and_raven_annotations/GWWA_XC/137... | NaN | NaN | 13.464641 | 15.005775 | 3903.1 | 17082.4 | NaN |
3 | ./gwwa_audio_and_raven_annotations/GWWA_XC/137... | NaN | GWWA | 20.128208 | 21.601748 | 4055.2 | 16930.3 | NaN |
4 | ./gwwa_audio_and_raven_annotations/GWWA_XC/137... | NaN | GWWA | 26.047590 | 27.521131 | 4207.2 | 17057.1 | NaN |
5 | ./gwwa_audio_and_raven_annotations/GWWA_XC/137... | NaN | GWWA | 33.121946 | 34.663079 | 4207.2 | 17082.4 | NaN |
6 | ./gwwa_audio_and_raven_annotations/GWWA_XC/137... | NaN | GWWA | 42.967925 | 44.427946 | 4181.9 | 17057.1 | NaN |
7 | ./gwwa_audio_and_raven_annotations/GWWA_XC/137... | NaN | GWWA | 52.417508 | 53.891048 | 4232.6 | 16930.3 | NaN |
View a subset of annotations
We can specify a list of classes to view annotations of.
For example, we can subset to only annotations marked as “?” - perhaps we’re interested in looking at these annotations in Raven again to determine what class they really were.
[10]:
classes_to_keep = ['?']
annotations_only_unsure = annotations.subset(classes_to_keep)
annotations_only_unsure.df
[10]:
audio_file | raven_file | annotation | start_time | end_time | low_f | high_f | Notes | |
---|---|---|---|---|---|---|---|---|
2 | ./gwwa_audio_and_raven_annotations/GWWA_XC/137... | NaN | ? | 13.464641 | 15.005775 | 3903.1 | 17082.4 | NaN |
Saving annotations to Raven-compatible file
We can always save our BoxedAnnotations object to a Raven-compatible .txt file, which can be opened in Raven along with an audio file just like the .txt files Raven creates itself. You must specify a path for the save file that ends with .txt
.
[11]:
annotations_only_unsure.to_raven_files('./gwwa_audio_and_raven_annotations/')#13738_unsure.txt')
Split an audio clip and its annotations
Often, we want to train or validate models on short audio segments (e.g., 5 seconds) rather than on long files (e.g., 2 hours).
We can easily create tables of “one hot” encoded labels for a series of audio segments within each annotated file using BoxedAnnotations.one_hot_clip_labels()
What is one-hot encoding?
The functions below demonstrate the creation of one-hot encoded labels.
This machine learning term, “one-hot encoding,” refers to a way to format a table of labels in which: * Each row represents a single sample, like a single 5-second long clip * Each column represents a single possible class (e.g. one of multiple species) * A “0” in a row and column means that in that sample, the class is not present * A “1” is “hot,” meaning that in that sample, the class IS present.
For example, let’s say we had a 15-second audio clip that we were splitting into three 5s clips. Let’s say we are training a classifier to identify coyotes and dogs, and we labeled the clip and found: * a coyote howled from 2.5 to 4 seconds into the clip (so, only the first clip contains it) * a dog barked from 4 seconds to 10 seconds into the clip (so, both the first and second clips contain it) * and there was silence for the last 5 seconds of the clip (so, the third clip has neither coyotes nor dogs in it).
The one-hot encoded labels file for this example would look like:
[17]:
pd.DataFrame({
"start_time":[0, 5, 10],
"end_time":[5, 10, 15],
"COYOTE":[1, 0, 0],
"DOG":[1, 1, 0]
})
[17]:
start_time | end_time | COYOTE | DOG | |
---|---|---|---|---|
0 | 0 | 5 | 1 | 1 |
1 | 5 | 10 | 0 | 1 |
2 | 10 | 15 | 0 | 0 |
Split annotations using splitting parameters
This function requires that we specify the minimum overlap of the label (in seconds) with the clip for the clip to be labeled positive. It also requires that we either (1) specify the list of classes for one-hot labels or (2) specify class_subset=None
, which will make a column for every unique label in the annotations. In this example, that would include a “?” class
[18]:
labels_df = annotations.one_hot_clip_labels(
full_duration=60, # The duration of the entire audio file
clip_duration=5,
clip_overlap=0,
class_subset=['GWWA_song'],
min_label_overlap=0.25,
)
labels_df.head()
[18]:
GWWA_song | |||
---|---|---|---|
file | start_time | end_time | |
./gwwa_audio_and_raven_annotations/GWWA_XC/13738.wav | 0 | 5 | 1.0 |
5 | 10 | 1.0 | |
10 | 15 | 0.0 | |
15 | 20 | 0.0 | |
20 | 25 | 1.0 |
A data munging example: pairing Raven files and audio to create a labeled dataset
In practice, we have tons of audio files with their corresponding Raven files. We need to:
Pair up all the audio files with their Raven annotation files
Create a dataframe of labels corresponding to short segments of each audio file
Let’s walk through the steps required to do this. But be warned, pairing Raven files and audio files might require more finagling than shown here.
Match up audio files and Raven annotations
The first step in the process is associating audio files with their corresponding Raven files. Perhaps not every audio file is annotated, and perhaps some audio files have been annotated multiple times. This code walks through some steps of sorting through these data to pair files.
Caveat: you’ll need to be careful using the code below, depending on how your audio and Raven files are named and organized.
In this example, we’ll assume that each audio file has the same name as its Raven annotation file (ignoring the extensions like “.Table.1.selections.txt”), which is the default naming convention when using Raven. We’ll also start by assuming that the audio filenames are unique (!) - that is, no two audio files have the same name.
First, find all the Raven files and all the audio files.
[19]:
# Specify folder containing Raven annotations
raven_files_dir = "./gwwa_audio_and_raven_annotations/GWWA_XC_AnnoTables/"
# Find all .txt files
# We'll naively assume all files with the suffix ".txt" are Raven files!
# A better assumption could be to search for files with the suffix ".selections.txt"
raven_files = glob(f"{raven_files_dir}/*.txt")
print(f"found {len(raven_files)} annotation files")
# Specify folder containing audio files
audio_files_dir = "./gwwa_audio_and_raven_annotations/GWWA_XC/"
# Find all audio files (we'll assume they are .wav, .WAV, or .mp3)
audio_files = glob(f"{audio_files_dir}/*.wav")+glob(f"{audio_files_dir}/*.WAV")+glob(f"{audio_files_dir}/*.mp3")
print(f"found {len(audio_files)} audio files")
found 3 annotation files
found 3 audio files
Next, starting by assuming that audio files have unique names, use the audio filenames to pair up the annotation files. Then, double-check that our assumption is correct.
[20]:
# Pair up the Raven and audio files based on the audio file name
from pathlib import Path
audio_df = pd.DataFrame({'audio_file':audio_files})
audio_df.index = [Path(f).stem for f in audio_files]
# Check that there aren't duplicate audio file names
print('\n audio files with duplicate names:')
audio_df[audio_df.index.duplicated(keep=False)]
audio files with duplicate names:
[20]:
audio_file |
---|
Seeing that no audio files have duplicate names, check to make sure the same is true for Raven files.
[21]:
raven_df = pd.DataFrame({'raven_file':raven_files})
raven_df.index = [Path(f).stem.split('.Table')[0] for f in raven_files]
#check that there aren't duplicate audio file names
print('\n raven files with duplicate names:')
raven_df[raven_df.index.duplicated(keep=False)]
raven files with duplicate names:
[21]:
raven_file | |
---|---|
13738 | ./gwwa_audio_and_raven_annotations/GWWA_XC_Ann... |
13738 | ./gwwa_audio_and_raven_annotations/GWWA_XC_Ann... |
Since we found some duplicate Raven files, resolve this issue by deleting the extra Raven file, which in this case was named “selections2”.
[22]:
#remove the second selection table for file 13738.wav
raven_df=raven_df[raven_df.raven_file.apply(lambda x: "selections2" not in x)]
Once we’ve resolved any issues with duplicate names, we can match up Raven and audio files.
[23]:
paired_df = audio_df.join(raven_df,how='outer')
Check if any audio files don’t have Raven annotation files:
[24]:
print(f"audio files without raven file: {len(paired_df[paired_df.raven_file.apply(lambda x:x!=x)])}")
paired_df[paired_df.raven_file.apply(lambda x:x!=x)]
audio files without raven file: 2
[24]:
audio_file | raven_file | |
---|---|---|
135601 | ./gwwa_audio_and_raven_annotations/GWWA_XC/135... | NaN |
13742 | ./gwwa_audio_and_raven_annotations/GWWA_XC/137... | NaN |
Check if any Raven files don’t have audio files:
[25]:
#look at unmatched raven files
print(f"raven files without audio file: {len(paired_df[paired_df.audio_file.apply(lambda x:x!=x)])}")
paired_df[paired_df.audio_file.apply(lambda x:x!=x)]
raven files without audio file: 1
[25]:
audio_file | raven_file | |
---|---|---|
16989 | NaN | ./gwwa_audio_and_raven_annotations/GWWA_XC_Ann... |
In this example, let’s discard any unpaired Raven or audio files.
[26]:
paired_df = paired_df.dropna()
[27]:
paired_df
[27]:
audio_file | raven_file | |
---|---|---|
13738 | ./gwwa_audio_and_raven_annotations/GWWA_XC/137... | ./gwwa_audio_and_raven_annotations/GWWA_XC_Ann... |
Create label dataframes
Now we have a set of paired up Raven and audio files.
Let’s create label dataframes representing 3-second segments of each audio file
[28]:
# Choose settings for audio splitting
clip_duration = 3
clip_overlap = 0
final_clip = None
Next, set up the settings for annotation splitting:
Whether to use a subset of classes
How many seconds a label should overlap a clip, at minimum, in order for that clip to be labeled
[29]:
# Choose settings for annotation splitting
class_subset = None #Equivalent to a list of all classes: ['GWWA_song', '?']
min_label_overlap = 0.1
Load Raven annotations
[30]:
boxed_annotations = BoxedAnnotations.from_raven_files(paired_df.raven_file,paired_df.audio_file)
boxed_annotations.df.head(3)
[30]:
audio_file | raven_file | annotation | start_time | end_time | low_f | high_f | View | Notes | Selection | Channel | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | ./gwwa_audio_and_raven_annotations/GWWA_XC/137... | ./gwwa_audio_and_raven_annotations/GWWA_XC_Ann... | GWWA_song | 0.459636 | 2.298182 | 4029.8 | 17006.4 | Spectrogram 1 | NaN | 1 | 1 |
1 | ./gwwa_audio_and_raven_annotations/GWWA_XC/137... | ./gwwa_audio_and_raven_annotations/GWWA_XC_Ann... | GWWA_song | 6.705283 | 8.246417 | 4156.6 | 17031.7 | Spectrogram 1 | NaN | 2 | 1 |
2 | ./gwwa_audio_and_raven_annotations/GWWA_XC/137... | ./gwwa_audio_and_raven_annotations/GWWA_XC_Ann... | ? | 13.464641 | 15.005775 | 3903.1 | 17082.4 | Spectrogram 1 | NaN | 3 | 1 |
Create label dataframes
[31]:
label_df = boxed_annotations.one_hot_clip_labels(
clip_duration=clip_duration,
clip_overlap=clip_overlap,
min_label_overlap=min_label_overlap,
class_subset=class_subset,
final_clip=final_clip,
)
[32]:
label_df.head(2)
[32]:
GWWA_song | ? | |||
---|---|---|---|---|
file | start_time | end_time | ||
./gwwa_audio_and_raven_annotations/GWWA_XC/13738.wav | 0.0 | 3.0 | 1.0 | 0.0 |
3.0 | 6.0 | 0.0 | 0.0 |
Sanity check: look at spectrograms of clips labeled 0 and 1
[33]:
# ignore the "?" annotations for this visualization
label_df = label_df[label_df["?"]==0]
Note: replace the “GWWA_song” here with a class name from your own dataset.
[34]:
# plot spectrograms for 3 random positive clips
positives = label_df[label_df['GWWA_song']==1].sample(3,random_state=0)
print("spectrograms of 3 random positive clips (label=1)")
for clip, t0, t1 in positives.index.values:
Spectrogram.from_audio(Audio.from_file(clip,offset=t0,duration=t1-t0)).plot()
# plot spectrograms for 5 random negative clips
negatives = label_df[label_df['GWWA_song']==0].sample(3,random_state=0)
print("spectrogram of 3 random negative clips (label=0)")
for clip, t0, t1 in negatives.index.values:
Spectrogram.from_audio(Audio.from_file(clip,offset=t0,duration=t1-t0)).plot()
spectrograms of 3 random positive clips (label=1)



spectrogram of 3 random negative clips (label=0)



Clean up: remove the sounds that we downloaded for this tutorial as well as the audio files we created.
[35]:
import shutil
shutil.rmtree('./gwwa_audio_and_raven_annotations')