RIBBIT Pulse Rate model demonstration

RIBBIT (Repeat-Interval Based Bioacoustic Identification Tool) is a tool for detecting vocalizations that have a repeating structure.

This tool is useful for detecting vocalizations of frogs, toads, and other animals that produce vocalizations with a periodic structure. In this notebook, we demonstrate how to select model parameters for the Great Plains Toad, then run the model on data to detect vocalizations.

RIBBIT was introduced in the 2020 poster “Automatic Detection of Pulsed Vocalizations”

This notebook demonstrates how to use the RIBBIT tool implemented in opensoundscape as opensoundscape.ribbit.ribbit()

For help instaling OpenSoundscape, see the documentation

import packages

[1]:
# suppress warnings
import warnings
warnings.simplefilter('ignore')

#import packages
import numpy as np
from glob import glob
import pandas as pd
from matplotlib import pyplot as plt

#local imports from opensoundscape
from opensoundscape.audio import Audio
from opensoundscape.spectrogram import Spectrogram
from opensoundscape.ribbit import ribbit

# create big visuals
plt.rcParams['figure.figsize']=[15,8]

download example audio

first, let’s download some example audio to work with.

You can run the cell below, OR visit this link to downlaod the data (whichever you find easier):

https://pitt.box.com/shared/static/0xclmulc4gy0obewtzbzyfnsczwgr9we.zip

If you download using the link above, first un-zip the folder (double-click on mac or right-click -> extract all on Windows). Then, move the great_plains_toad_dataset folder to the same location on your computer as this notebook. Then you can skip this cell:

[2]:
from opensoundscape.helpers import run_command
#download files from box.com to the current directory
_ = run_command(f"curl -L https://pitt.box.com/shared/static/9mrxib85y1jmf1ybbjvbr0tv171iekvy.gz -o ./great_plains_toad_dataset.tar.gz")# | tar -xz -f")
_ = run_command(f"tar -xz -f great_plains_toad_dataset.tar.gz")

#this will print `0` if everything went correctly. If it prints 256 or another number, something is wrong (try downloading from the link above)

now, you should have a folder in the same location as this notebook called great_plains_toad_dataset

if you had trouble accessing the data, you can try using your own audio files - just put them in a folder called great_plains_toad_dataset in the same location as this notebook, and this notebook will load whatever is in that folder

load an audio file and create a spectrogram

[3]:
audio_path = np.sort(glob('./great_plains_toad_dataset/*'))[0]

#load the audio file into an OpenSoundscape Audio object
audio = Audio.from_file(audio_path)

#trim the audio to the time from 0-3 seconds for a closer look
audio = audio.trim(0,3)

#create a Spectrogram object
spectrogram = Spectrogram.from_audio(audio)

show the Great Plains Toad spectrogram as an image

a spectrogram is a visual representation of audio with frequency on the vertical axis, time on the horizontal axis, and intensity represented by the color of the pixels

[4]:
spectrogram.plot()
_images/RIBBIT_pulse_rate_demo_9_0.png

select model parameters

RIBBIT requires the user to select a set of parameters that describe the target vocalization. Here is some detailed advice on how to use these parameters.

Signal Band: The signal band is the frequency range where RIBBIT looks for the target species. Based on the spectrogram above, we can see that the Great Plains Toad vocalization has the strongest energy around 2000-2500 Hz, so we will specify signal_band = [2000,2500]. It is best to pick a narrow signal band if possible, so that the model focuses on a specific part of the spectrogram and has less potential to include erronious sounds.

Noise Bands: Optionally, users can specify other frequency ranges called noise bands. Sounds in the noise_bands are subtracted from the signal_band. Noise bands help the model filter out erronious sounds from the recordings, which could include confusion species, background noise, and popping/clicking of the microphone due to rain, wind, or digital errors. It’s usually good to include one noise band for very low frequencies – this specifically eliminates popping and clicking from being registered as a vocalization. It’s also good to specify noise bands that target confusion species. Another approach is to specify two narrow noise_bands that are directly above and below the signal_band.

Pulse Rate Range: This parameters specifies the minimum and maximum pulse rate (the number of pulses per second, also known as pulse repetition rate) RIBBIT should look for to find the focal species. Looking at the spectrogram above, we can see that the pulse rate of this Great Plains Toad vocalization is about 15 pulses per second. By looking at other vocalizations in different environmental conditions, we notice that the pulse rate can be as slow as 10 pulses per second or as fast as 20. So, we choose pulse_rate_range = [10, 20] meaning that RIBBIT should look for pulses no slower than 10 pulses per second and no faster than 20 pulses per second.

Window Length: This parameter tells the algorithm how many seconds of audio to analyze at one time. Generally, you should choose a window_length that is similar to the length of the target species vocalization, or a little bit longer. For very slowly pulsing vocalizations, choose a longer window so that at least 5 pulses can occur in one window (0.5 pulses per second -> 10 second window). Typical values for window_length are 1 to 10 seconds. Keep in mind that The Great Plains Toad has a vocalization that continues on for many seconds (or minutes!) so we chose a 2-second window which will include plenty of pulses.

Plot: We can choose to show the power spectrum of pulse repetition rate for each window by setting plot=True. The default is not to show these plots (plot=False).

[5]:
# minimum and maximum rate of pulsing (pulses per second) to search for
pulse_rate_range = [10,20]

# look for a vocalization in the range of 1000-2000 Hz
signal_band = [2000,2500]

# subtract the amplitude signal from these frequency ranges
noise_bands = [ [0,200], [10000,10100]]

#divides the signal into segments this many seconds long, analyzes each independently
window_length = 2 #(seconds)

#if True, it will show the power spectrum plot for each audio segment
show_plots = True

search for pulsing vocalizations with ribbit()

This function takes the parameters we chose above as arguments, performs the analysis, and returns two arrays: - scores: the pulse rate score for each window - times: the start time in seconds of each window

The scores output by the function may be very low or very high. They do not represent a “confidence” or “probability” from 0 to 1. Instead, the relative values of scores on a set of files should be considered: when RIBBIT detects the target species, the scores will be significantly higher than when the species is not detected.

The file gpt0.wav has a Great Plains Toad vocalizing only at the beginning. Let’s analyze the file with RIBBIT and look at the scores versus time.

[8]:
#get the audio file path
audio_path = np.sort(glob('./great_plains_toad_dataset/*'))[0]

#make the spectrogram
spec = Spectrogram.from_audio(audio.from_file(audio_path))

#run RIBBIT
scores, times =  ribbit(
                    spec,
                    pulse_rate_range=pulse_rate_range,
                    signal_band=signal_band,
                    window_len=window_length,
                    noise_bands=noise_bands,
                    plot=False)


#show the spectrogram
print('spectrogram of 10 second file with Great Plains Toad at the beginning')
spec.plot()

# plot the score vs time of each window
plt.scatter(times,scores)
plt.xlabel('window start time (sec)')
plt.ylabel('RIBBIT score')
plt.title('RIBBIT scores for 10 second file with Great Plains Toad at the beginning')
spectrogram of 10 second file with Great Plains Toad at the beginning
_images/RIBBIT_pulse_rate_demo_14_1.png
[8]:
Text(0.5, 1.0, 'RIBBIT scores for 10 second file with Great Plains Toad at the beginning')
_images/RIBBIT_pulse_rate_demo_14_3.png

as we hoped, RIBBIT outputs a high score during the vocalization (the window from 0-2 seconds) and a low score when the frog is not vocalizing

analyzing a set of files

[12]:
# set up a dataframe for storing files' scores and labels
df = pd.DataFrame(index = glob('./great_plains_toad_dataset/*'),columns=['score','label'])

# label is 1 if the file contains a Great Plains Toad vocalization, and 0 if it does not
df['label'] = [1 if 'gpt' in f else 0 for f in df.index]

# calculate RIBBIT scores
for path in df.index:

    #make the spectrogram
    spec = Spectrogram.from_audio(audio.from_file(path))

    #run RIBBIT
    scores, times =  ribbit(
                        spec,
                        pulse_rate_range=pulse_rate_range,
                        signal_band=signal_band,
                        window_len=window_length,
                        noise_bands=noise_bands,
                        plot=False)

    # use the maximum RIBBIT score from any window as the score for this file
    # multiply the score by 10,000 to make it easier to read
    df.at[path,'score'] = max(scores) * 10000

print("Files sorted by score, from highest to lowest:")
df.sort_values(by='score',ascending=False)
Files sorted by score, from highest to lowest:
[12]:
score label
./great_plains_toad_dataset/gpt0.mp3 139.765 1
./great_plains_toad_dataset/gpt3.mp3 13.8338 1
./great_plains_toad_dataset/gpt2.mp3 8.25766 1
./great_plains_toad_dataset/gpt1.mp3 6.1136 1
./great_plains_toad_dataset/negative3.mp3 2.34044 0
./great_plains_toad_dataset/negative2.mp3 1.73015 0
./great_plains_toad_dataset/negative4.mp3 1.56953 0
./great_plains_toad_dataset/negative1.mp3 1.21802 0
./great_plains_toad_dataset/negative9.mp3 1.13301 0
./great_plains_toad_dataset/negative8.mp3 1.08165 0
./great_plains_toad_dataset/negative6.mp3 0.966176 0
./great_plains_toad_dataset/negative5.mp3 0.695368 0
./great_plains_toad_dataset/gpt4.mp3 0.634423 1
./great_plains_toad_dataset/pops2.mp3 0.51215 0
./great_plains_toad_dataset/water.mp3 0.510283 0
./great_plains_toad_dataset/pops1.mp3 0.493911 0
./great_plains_toad_dataset/negative7.mp3 0.0156971 0
./great_plains_toad_dataset/silent.mp3 0 0

So, how good is RIBBIT at finding the Great Plains Toad?

We can see that the scores for all of the files with Great Plains Toad (gpt) score above 6 except gpt4.mp3 (which contains only a very quiet and distant vocalization). All files that do not contain the Great Plains Toad score less than 2.5. So, RIBBIT is doing a good job separating Great Plains Toads vocalizations from other sounds!

Notably, noisy files like pops1.mp3 score low even though they have lots of periodic energy - our noise_bands sucessfully rejected these files. Without using noise_bands, files like these would receive very high scores. Also, some birds in “negatives” files that have periodic calls around the same pulsre rate as the Great Plains Toad received low scores. This is also a result of choosing a tight signal_band and strategic noise_bands. You can try adjusting or eliminating these bands to see their effect on the audio.

(HINT: elimintating the noise_bands will result in high scores for the “pops” files)

detail view

Now, lets look at one 10 second file and tell ribbit to plot the power spectral density for each window (plot=True). This way, we can see if peaks are emerging at the expected pulse rates. Since our window_length is 2 seconds, each of these plots represents 2 seconds of audio. The vertical lines on the power spectral density represent the lower and upper pulse_rate_range limits.

In the file gpt0.mp3, the Great Plains Toad vocalizes for a couple seconds at the beginning, then stops. We expect to see a peak in the power spectral density at 15 pulses/sec in the first 2 second window, and maybe a bit in the second, but not later in the audio.

[45]:
#create a spectrogram from the file, like above:
# 1. get audio file path
audio_path = np.sort(glob('./great_plains_toad_dataset/*'))[0]
# 2. make audio object and trim (this time 0-10 seconds)
audio = Audio.from_file(audio_path).trim(0,10)
# 3. make spectrogram
spectrogram = Spectrogram.from_audio(audio)

scores, times =  ribbit(
                        spectrogram,
                        pulse_rate_range=pulse_rate_range,
                        signal_band=signal_band,
                        window_len=window_length,
                        noise_bands=noise_bands,
                        plot=show_plots)

window: 0.0000 sec to 1.9969 sec
peak freq: 13.4583
_images/RIBBIT_pulse_rate_demo_20_1.png
window: 1.9969 sec to 3.9938 sec
peak freq: 0.6729
_images/RIBBIT_pulse_rate_demo_20_3.png
window: 3.9938 sec to 5.9907 sec
peak freq: 57.0293
_images/RIBBIT_pulse_rate_demo_20_5.png
window: 5.9907 sec to 7.9877 sec
peak freq: 23.7202
_images/RIBBIT_pulse_rate_demo_20_7.png

Time to experiment for yourself

Now that you know the basics of how to use RIBBIT, you can try using it on your own data. We recommend spending some time looking at different recordings of your focal species before choosing parameters. Experiment with the noise bands and window length, and get in touch if you have questions!

Sam’s email: sam . lapp [at] pitt.edu

this cell will delete the folder great_plains_toad_dataset. Only run it if you wish delete that folder and the example audio inside it.

[3]:
_ = run_command('rm -r ./great_plains_toad_dataset/')
_ = run_command('rm ./great_plains_toad_dataset.tar.gz')
[ ]: