Raven annotations

Raven Sound Analysis Software enables users to inspect spectrograms, draw time and frequency boxes around sounds of interest, and label these boxes with species identities. OpenSoundscape contains functionality to prepare and use these annotations for machine learning.

Download annotated data

We published an example Raven-annotated dataset here: https://doi.org/10.1002/ecy.3329

[1]:
from opensoundscape.commands import run_command
from pathlib import Path

Download the zipped data here:

[2]:
link = "https://esajournals.onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1002%2Fecy.3329&file=ecy3329-sup-0001-DataS1.zip"
name = 'powdermill_data.zip'
# Quote the URL so the shell does not interpret the "&" in the query string
out = run_command(f"wget -O {name} '{link}'")

Unzip the files to a new directory, powdermill_data/

[3]:
out = run_command("unzip powdermill_data.zip -d powdermill_data")

Keep track of the files we have now so we can delete them later.

[4]:
files_to_delete = [Path("powdermill_data"), Path("powdermill_data.zip")]

Preprocess Raven data

The opensoundscape.raven module contains preprocessing functions for Raven data, including:

* annotation_check - for all the selections files, make sure they all contain labels
* lowercase_annotations - lowercase all of the annotations
* generate_class_corrections - create a CSV to see whether there are any weird names
  * Modify the CSV as needed. If you need to look up files you can use query_annotations
  * Can be used in SplitterDataset
* apply_class_corrections - replace incorrect labels with correct labels
* query_annotations - look for files that contain a particular species or a typo

[5]:
import pandas as pd
import opensoundscape.raven as raven
import opensoundscape.audio as audio

[6]:
raven_files_raw = Path("./powdermill_data/Annotation_Files/")

Check Raven files have labels

Check that all selections files contain labels under one column name. In this dataset the labels column is named "species".

[7]:
raven.annotation_check(directory=raven_files_raw, col='species')
All rows in powdermill_data/Annotation_Files contain labels in column `species`
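
To make the idea behind this check concrete, here is a minimal standard-library sketch of the same kind of test on a single tab-delimited selection table. It is not the implementation of raven.annotation_check; the helper name rows_missing_label and the example table are hypothetical.

```python
import csv
import io


def rows_missing_label(table_text, col):
    """Return 1-based data-row numbers whose `col` entry is absent or blank."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    missing = []
    for row_number, row in enumerate(reader, start=1):
        value = row.get(col)
        if value is None or not value.strip():
            missing.append(row_number)
    return missing


# Hypothetical selection table in which the second row has no label
example = (
    "Selection\tBegin Time (s)\tEnd Time (s)\tspecies\n"
    "1\t0.5\t1.2\tAMRO\n"
    "2\t3.0\t4.1\t\n"
    "3\t5.0\t6.4\tOVEN\n"
)
print(rows_missing_label(example, col="species"))  # [2]
```

An empty list means every row in the table carries a label in the chosen column.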

Create lowercase files

Convert all the text in the files to lowercase to standardize them. Save these to a new directory. They will be saved with the same filename but with “.lower” appended.

[8]:
raven_directory = Path('./powdermill_data/Annotation_Files_Standardized')
raven_directory.mkdir(exist_ok=True)
raven.lowercase_annotations(directory=raven_files_raw, out_dir=raven_directory)

Check that the outputs are saved as expected.

[9]:
list(raven_directory.glob("*.lower"))[:5]
[9]:
[PosixPath('powdermill_data/Annotation_Files_Standardized/Recording_1_Segment_22.Table.1.selections.txt.lower'),
 PosixPath('powdermill_data/Annotation_Files_Standardized/Recording_4_Segment_15.Table.1.selections.txt.lower'),
 PosixPath('powdermill_data/Annotation_Files_Standardized/Recording_4_Segment_24.Table.1.selections.txt.lower'),
 PosixPath('powdermill_data/Annotation_Files_Standardized/Recording_1_Segment_13.Table.1.selections.txt.lower'),
 PosixPath('powdermill_data/Annotation_Files_Standardized/Recording_1_Segment_06.Table.1.selections.txt.lower')]
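
The naming convention above can be illustrated with a small standalone sketch that lowercases one file and appends ".lower" to its name. This is only an illustration under assumed inputs, not the code of raven.lowercase_annotations; the helper lowercase_copy and the example file are hypothetical.

```python
import tempfile
from pathlib import Path


def lowercase_copy(src, out_dir):
    """Write a lowercased copy of `src` into `out_dir`, appending '.lower'."""
    src, out_dir = Path(src), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    dest = out_dir / (src.name + ".lower")
    dest.write_text(src.read_text().lower())
    return dest


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical selection table with mixed-case text
    raw = Path(tmp) / "Example.Table.1.selections.txt"
    raw.write_text("Selection\tSpecies\n1\tAMRO\n")

    out = lowercase_copy(raw, Path(tmp) / "standardized")
    out_name = out.name
    out_text = out.read_text()

print(out_name)  # Example.Table.1.selections.txt.lower
```

Note that the whole file is lowercased, column headers included, which is why the later cells refer to the labels column as 'species' in lowercase.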

Generate class corrections

This function generates a table that can be modified by hand to correct labels with typos in them. It identifies the unique labels in the provided column (here "species") in all of the lowercase files in the directory raven_directory.

For instance, the generated table could be something like the following:

raw,corrected
sparrow,sparrow
sparow,sparow
goose,goose

[10]:
print(raven.generate_class_corrections(directory=raven_directory, col='species'))
raw,corrected
amcr,amcr
amgo,amgo
amre,amre
amro,amro
baor,baor
baww,baww
bbwa,bbwa
bcch,bcch
bggn,bggn
bhco,bhco
bhvi,bhvi
blja,blja
brcr,brcr
btnw,btnw
bwwa,bwwa
cang,cang
carw,carw
cedw,cedw
cora,cora
coye,coye
cswa,cswa
dowo,dowo
eato,eato
eawp,eawp
hawo,hawo
heth,heth
howa,howa
kewa,kewa
lowa,lowa
nawa,nawa
noca,noca
nofl,nofl
oven,oven
piwo,piwo
rbgr,rbgr
rbwo,rbwo
rcki,rcki
revi,revi
rsha,rsha
rwbl,rwbl
scta,scta
swth,swth
tuti,tuti
veer,veer
wbnu,wbnu
witu,witu
woth,woth
ybcu,ybcu

The released dataset has no need for class corrections, but if it did, we could save the returned text to a CSV, edit the "corrected" column by hand, and use the CSV to apply corrections to future dataframes.
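
As a rough sketch of that workflow, the snippet below parses a hand-edited corrections table and uses it to fix a list of labels. It uses only the standard library and does not call raven.apply_class_corrections; the helper names and the edited table (where "sparow" has been changed to "sparrow") are hypothetical.

```python
import csv
import io


def parse_corrections(csv_text):
    """Turn 'raw,corrected' CSV text into a {raw: corrected} dictionary."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["raw"]: row["corrected"] for row in reader}


def correct_labels(labels, corrections):
    """Replace each label that has a correction; leave other labels as-is."""
    return [corrections.get(label, label) for label in labels]


# Hypothetical table after editing the 'corrected' entry for the typo 'sparow'
edited = "raw,corrected\nsparrow,sparrow\nsparow,sparrow\ngoose,goose\n"
corrections = parse_corrections(edited)

print(correct_labels(["sparow", "goose", "sparrow", "wren"], corrections))
# ['sparrow', 'goose', 'sparrow', 'wren']
```

Labels that do not appear in the table (here "wren") pass through unchanged, so the same table can be reused on data containing classes it has not seen.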

Query annotations

This function can be used to print all annotations of a particular class, e.g. “amro” (American Robin).

[11]:
output = raven.query_annotations(directory=raven_directory, cls='amro', col='species', print_out=True)
=================================================================================================
powdermill_data/Annotation_Files_Standardized/Recording_4_Segment_16.Table.1.selections.txt.lower
=================================================================================================

     selection           view  channel  begin time (s)  end time (s)  \
85          86  spectrogram 1        1       77.634876     82.129659
93          94  spectrogram 1        1       84.226733     86.313096
98          99  spectrogram 1        1       88.825438     91.272182
107        108  spectrogram 1        1       96.028977     97.552840
111        112  spectrogram 1        1       99.990354    100.914517
116        117  spectrogram 1        1      104.327755    108.656087
122        123  spectrogram 1        1      109.525937    112.021391
129        130  spectrogram 1        1      113.765766    117.386474
137        138  spectrogram 1        1      121.053454    121.383161
141        142  spectrogram 1        1      124.864220    129.139630
154        155  spectrogram 1        1      132.583749    135.017840
162        163  spectrogram 1        1      139.602300    142.087527
168        169  spectrogram 1        1      143.969913    146.785822
176        177  spectrogram 1        1      149.282840    151.873748
210        211  spectrogram 1        1      170.636021    174.123521
225        226  spectrogram 1        1      178.252401    181.670619
238        239  spectrogram 1        1      184.176135    188.110226
250        251  spectrogram 1        1      190.244089    192.858862
267        268  spectrogram 1        1      203.737856    204.958310
277        278  spectrogram 1        1      211.662233    216.270763

     low freq (hz)  high freq (hz) species
85          1539.7          3668.7    amro
93          1349.6          3630.6    amro
98          1539.7          4029.8    amro
107         1159.5          3573.6    amro
111         1539.7          3440.4    amro
116         1368.6          3041.4    amro
122         1577.7          3041.4    amro
129         1602.9          3831.4    amro
137         1993.9          2813.1    amro
141         1558.7          4200.9    amro
154         2186.0          3782.7    amro
162         1634.7          4200.9    amro
168         1748.8          3687.7    amro
176         1634.7          3744.7    amro
210         1444.7          4162.9    amro
225         1798.4          3831.4    amro
238         1653.7          3592.6    amro
250         1615.7          3687.7    amro
267         1563.1          4230.8    amro
277         1646.5          4189.1    amro

=================================================================================================
powdermill_data/Annotation_Files_Standardized/Recording_4_Segment_01.Table.1.selections.txt.lower
=================================================================================================

     selection           view  channel  begin time (s)  end time (s)  \
188        189  spectrogram 1        1      247.263069    249.107387
201        202  spectrogram 1        1      263.512160    264.851933

     low freq (hz)  high freq (hz) species
188         1249.2          2419.2    amro
201         1229.4          2558.0    amro

Split Raven annotations and audio files

The Raven module’s raven_audio_split_and_save function splits both the audio data and the associated annotations. It requires that annotation filenames and audio filenames are each unique, and that every annotation file shares its name with its corresponding audio file (apart from the file extensions).

[12]:
audio_directory = Path('./powdermill_data/Recordings/')
destination = Path('./powdermill_data/Split_Recordings')
out = raven.raven_audio_split_and_save(

    # Where to look for Raven files
    raven_directory = raven_directory,

    # Where to look for audio files
    audio_directory = audio_directory,

    # The destination to save clips and the labels CSV to
    destination = destination,

    # The column name of the labels
    col = 'species',

    # Desired audio sample rate
    sample_rate = 22050,

    # Desired duration of clips
    clip_duration = 5,

    # Verbose (uncomment the next line to see progress--this cell takes a while to run)
    #verbose=True,
)
Found 77 sets of matching audio files and selection tables out of 77 audio files and 77 selection tables
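
The filename requirement described above can be pictured with the sketch below, which pairs annotation and audio files by the part of the filename before the first ".". That matching rule is an assumption made for illustration; the tutorial does not state the exact rule the library uses, and match_by_stem is a hypothetical helper.

```python
from pathlib import Path


def match_by_stem(annotation_paths, audio_paths):
    """Pair annotation and audio files whose names agree before the first '.'.

    ASSUMPTION: this matching rule is illustrative only and may differ
    from the rule used inside raven_audio_split_and_save.
    """

    def key(path):
        return Path(path).name.split(".")[0]

    audio_by_key = {key(p): p for p in audio_paths}
    return {
        a: audio_by_key[key(a)]
        for a in annotation_paths
        if key(a) in audio_by_key
    }


annotations = [
    "Recording_4_Segment_16.Table.1.selections.txt.lower",
    "Recording_9_Segment_01.Table.1.selections.txt.lower",  # no audio partner
]
audio_files = ["Recordings/Recording_4/Recording_4_Segment_16.mp3"]

matches = match_by_stem(annotations, audio_files)
print(len(matches), "matching pair(s)")
```

Under this rule, two audio files with the same base name in different subfolders would collide, which is one reason unique filenames matter.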

The results of the splitting are saved in the destination folder under the name labels.csv.

[13]:
labels = pd.read_csv(destination.joinpath("labels.csv"), index_col='filename')
labels.head()
[13]:
amcr amgo amre amro baor baww bbwa bcch bggn bhco ... rsha rwbl scta swth tuti veer wbnu witu woth ybcu
filename
powdermill_data/Split_Recordings/Recording_4_Segment_13_0.0s_5.0s.wav 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
powdermill_data/Split_Recordings/Recording_4_Segment_13_5.0s_10.0s.wav 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
powdermill_data/Split_Recordings/Recording_4_Segment_13_10.0s_15.0s.wav 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
powdermill_data/Split_Recordings/Recording_4_Segment_13_15.0s_20.0s.wav 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
powdermill_data/Split_Recordings/Recording_4_Segment_13_20.0s_25.0s.wav 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

5 rows × 48 columns

The raven_audio_split_and_save function contains several options. Notable options are:

* clip_duration: the length of the clips
* clip_overlap: the overlap, in seconds, between clips
* final_clip: what to do with the final clip if it is not exactly clip_duration in length (see API docs for more details)
* labeled_clips_only: whether to only save labeled clips
* min_label_len: minimum length, in seconds, of an annotation for a clip to be considered labeled. For instance, if an annotation only overlaps 0.1s with a 5s clip, you might want to exclude it with min_label_len=0.2.
* species: a subset of species to search for labels of (by default, finds all species labels in dataset)
* dry_run: if True, produces print statements and returns dataframe of labels, but does not save files.
* verbose: if True, prints more information, e.g. clip-by-clip progress.
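
To show how clip_duration, clip_overlap, and a minimum label length interact, here is a small sketch that generates clip windows and decides whether an annotation labels a clip. It is a simplified model written for this tutorial, not the library's code: the helper names are hypothetical, a trailing partial clip is simply dropped, and the exact comparison the library uses may differ.

```python
def clip_windows(total_duration, clip_duration, clip_overlap=0.0):
    """Return (start, end) times of full-length clips.

    ASSUMPTION: a trailing clip shorter than clip_duration is dropped.
    """
    step = clip_duration - clip_overlap
    if step <= 0:
        raise ValueError("clip_overlap must be smaller than clip_duration")
    windows = []
    start = 0.0
    while start + clip_duration <= total_duration:
        windows.append((start, start + clip_duration))
        start += step
    return windows


def is_labeled(clip, annotation, min_label_len=0.0):
    """True if the annotation overlaps the clip by at least min_label_len seconds."""
    overlap = min(clip[1], annotation[1]) - max(clip[0], annotation[0])
    return overlap > 0 and overlap >= min_label_len


windows = clip_windows(total_duration=12, clip_duration=5)
annotation = (4.9, 5.6)  # hypothetical annotation spanning a clip boundary

for window in windows:
    print(window, is_labeled(window, annotation, min_label_len=0.2))
```

In this example the annotation overlaps the first clip by only about 0.1s, so that clip is not counted as labeled, while the second clip, which contains about 0.6s of the annotation, is.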

For instance, let’s extract labels for one species, American Redstart (AMRE), saving only clips that contain at least 1s of label for that species (min_label_len=1). The “verbose” flag causes the function to print progress as it finishes splitting each file.

[14]:
btnw_split_dir = Path('./powdermill_data/btnw_recordings')
out = raven.raven_audio_split_and_save(
    raven_directory = raven_directory,
    audio_directory = audio_directory,
    destination = btnw_split_dir,
    col = 'species',
    sample_rate = 22050,
    clip_duration = 5,
    clip_overlap = 0,
    verbose=True,
    species='amre',
    labeled_clips_only=True,
    min_label_len=1
)
Found 77 sets of matching audio files and selection tables out of 77 audio files and 77 selection tables
Making directory powdermill_data/btnw_recordings
1. Finished powdermill_data/Recordings/Recording_4/Recording_4_Segment_13.mp3
2. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_33.mp3
3. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_26.mp3
4. Finished powdermill_data/Recordings/Recording_4/Recording_4_Segment_19.mp3
5. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_11.mp3
6. Finished powdermill_data/Recordings/Recording_2/Recording_2_Segment_13.mp3
7. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_29.mp3
8. Finished powdermill_data/Recordings/Recording_2/Recording_2_Segment_01.mp3
9. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_15.mp3
10. Finished powdermill_data/Recordings/Recording_4/Recording_4_Segment_20.mp3
11. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_12.mp3
12. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_36.mp3
13. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_25.mp3
14. Finished powdermill_data/Recordings/Recording_4/Recording_4_Segment_26.mp3
15. Finished powdermill_data/Recordings/Recording_2/Recording_2_Segment_14.mp3
16. Finished powdermill_data/Recordings/Recording_2/Recording_2_Segment_10.mp3
17. Finished powdermill_data/Recordings/Recording_2/Recording_2_Segment_11.mp3
18. Finished powdermill_data/Recordings/Recording_3/Recording_3_Segment_01.mp3
19. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_32.mp3
20. Finished powdermill_data/Recordings/Recording_2/Recording_2_Segment_03.mp3
21. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_07.mp3
22. Finished powdermill_data/Recordings/Recording_4/Recording_4_Segment_04.mp3
23. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_16.mp3
24. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_30.mp3
25. Finished powdermill_data/Recordings/Recording_2/Recording_2_Segment_02.mp3
26. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_19.mp3
27. Finished powdermill_data/Recordings/Recording_4/Recording_4_Segment_12.mp3
28. Finished powdermill_data/Recordings/Recording_2/Recording_2_Segment_08.mp3
29. Finished powdermill_data/Recordings/Recording_4/Recording_4_Segment_10.mp3
30. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_20.mp3
31. Finished powdermill_data/Recordings/Recording_2/Recording_2_Segment_12.mp3
32. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_14.mp3
33. Finished powdermill_data/Recordings/Recording_4/Recording_4_Segment_16.mp3
34. Finished powdermill_data/Recordings/Recording_4/Recording_4_Segment_25.mp3
35. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_09.mp3
36. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_17.mp3
37. Finished powdermill_data/Recordings/Recording_2/Recording_2_Segment_07.mp3
38. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_02.mp3
39. Finished powdermill_data/Recordings/Recording_4/Recording_4_Segment_02.mp3
40. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_08.mp3
41. Finished powdermill_data/Recordings/Recording_4/Recording_4_Segment_09.mp3
42. Finished powdermill_data/Recordings/Recording_4/Recording_4_Segment_05.mp3
43. Finished powdermill_data/Recordings/Recording_4/Recording_4_Segment_08.mp3
44. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_05.mp3
45. Finished powdermill_data/Recordings/Recording_4/Recording_4_Segment_18.mp3
46. Finished powdermill_data/Recordings/Recording_4/Recording_4_Segment_14.mp3
47. Finished powdermill_data/Recordings/Recording_2/Recording_2_Segment_09.mp3
48. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_23.mp3
49. Finished powdermill_data/Recordings/Recording_2/Recording_2_Segment_06.mp3
50. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_34.mp3
51. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_10.mp3
52. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_27.mp3
53. Finished powdermill_data/Recordings/Recording_4/Recording_4_Segment_06.mp3
54. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_31.mp3
55. Finished powdermill_data/Recordings/Recording_2/Recording_2_Segment_04.mp3
56. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_24.mp3
57. Finished powdermill_data/Recordings/Recording_2/Recording_2_Segment_05.mp3
58. Finished powdermill_data/Recordings/Recording_4/Recording_4_Segment_22.mp3
59. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_18.mp3
60. Finished powdermill_data/Recordings/Recording_4/Recording_4_Segment_01.mp3
61. Finished powdermill_data/Recordings/Recording_4/Recording_4_Segment_21.mp3
62. Finished powdermill_data/Recordings/Recording_4/Recording_4_Segment_24.mp3
63. Finished powdermill_data/Recordings/Recording_4/Recording_4_Segment_03.mp3
64. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_01.mp3
65. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_04.mp3
66. Finished powdermill_data/Recordings/Recording_4/Recording_4_Segment_15.mp3
67. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_13.mp3
68. Finished powdermill_data/Recordings/Recording_4/Recording_4_Segment_07.mp3
69. Finished powdermill_data/Recordings/Recording_4/Recording_4_Segment_11.mp3
70. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_21.mp3
71. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_28.mp3
72. Finished powdermill_data/Recordings/Recording_4/Recording_4_Segment_17.mp3
73. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_03.mp3
74. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_35.mp3
75. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_06.mp3
76. Finished powdermill_data/Recordings/Recording_4/Recording_4_Segment_23.mp3
77. Finished powdermill_data/Recordings/Recording_1/Recording_1_Segment_22.mp3

The labels CSV only has a column for the species of interest:

[15]:
btnw_labels = pd.read_csv(btnw_split_dir.joinpath("labels.csv"), index_col='filename')
btnw_labels.head()
[15]:
amre
filename
powdermill_data/btnw_recordings/Recording_2_Segment_13_60.0s_65.0s.wav 1.0
powdermill_data/btnw_recordings/Recording_2_Segment_13_65.0s_70.0s.wav 1.0
powdermill_data/btnw_recordings/Recording_2_Segment_13_85.0s_90.0s.wav 1.0
powdermill_data/btnw_recordings/Recording_2_Segment_13_95.0s_100.0s.wav 1.0
powdermill_data/btnw_recordings/Recording_2_Segment_13_105.0s_110.0s.wav 1.0

The split files and associated labels CSV can now be used to train machine learning models (see additional tutorials).

The command below cleans up after the tutorial is done – only run it if you want to delete all of the files.

[16]:
from shutil import rmtree
for file in files_to_delete:
    if file.is_dir():
        rmtree(file)
    else:
        file.unlink()