# Custom preprocessing

Preprocessors in OpenSoundscape perform all of the preprocessing steps from loading a file from the disk up to providing a sample to the machine learning algorithm for training or prediction. These classes are used when (a) training a machine learning model in OpenSoundscape, or (b) making predictions with a machine learning model in OpenSoundscape.

If you are already familiar with PyTorch, you might notice that Preprocessors take the place of, and are children of, PyTorch’s Dataset classes to provide each sample to PyTorch as a Tensor.

Preprocessors are designed to be flexible and modular, so that each step of the preprocessing pipeline can be modified or removed. This notebook demonstrates:

• preparation of audio data to be used by a preprocessor
• how “Actions” are strung together into “Pipelines” to preprocess data
• modifying the parameters of actions
• turning Actions on and off
• modifying the order and contents of pipelines
• use of the AudioToSpectrogramPreprocessor class, including examples of:
    • modifying audio and spectrogram parameters
    • changing the output image shape
    • changing the output type
• use of the CnnPreprocessor class, including examples of:
    • choosing between default “augmentation on” and “augmentation off” pipelines
    • modifying augmentation parameters
    • using the “overlay” augmentation
• writing custom preprocessors and actions

First, import the needed packages.

[1]:

# Preprocessor classes are used to load, transform, and augment audio samples for use in a machine learning model
from opensoundscape.preprocess.preprocessors import BasePreprocessor, AudioToSpectrogramPreprocessor, CnnPreprocessor

#other utilities and packages
from opensoundscape.helpers import run_command
import torch
import pandas as pd
from pathlib import Path
import numpy as np
import random


Set up plotting and some helper functions.

[2]:

#set up plotting
from matplotlib import pyplot as plt
plt.rcParams['figure.figsize']=[15,5] #for large visuals
%config InlineBackend.figure_format = 'retina'

# helper function for displaying a sample as an image
def show_tensor(sample):
    plt.imshow((sample['X'][0,:,:]/2+0.5)*-1,cmap='Greys',vmin=-1,vmax=0)
    plt.show()


Set manual seeds for pytorch and python. These ensure the training results are reproducible. You probably don’t want to do this when you actually train your model, but it’s useful for debugging.

[3]:

torch.manual_seed(0)
random.seed(0)
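
To see what seeding buys you, here is a minimal standard-library sketch. The helper name `draw` is hypothetical and not part of OpenSoundscape; it only shows that resetting the seed replays the same sequence of pseudo-random numbers.

```python
import random

def draw(seed, n=3):
    """Return n pseudo-random floats drawn after seeding Python's RNG."""
    random.seed(seed)
    return [random.random() for _ in range(n)]

first = draw(0)
second = draw(0)   # same seed, so the same sequence is replayed
other = draw(1)    # a different seed gives a different sequence

print(first == second)
print(first == other)
```

The same idea applies to `torch.manual_seed`, which controls PyTorch's own random number generator.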


## Preparing audio data

The Kitzes Lab has created a small labeled dataset of short clips of American Woodcock vocalizations. You have two options for obtaining the folder of data, called woodcock_labeled_data:

1. Run the following cell to download this small dataset. These commands require you to have curl and tar installed on your computer, as they will download and unzip a compressed file in .tar.gz format.
2. Download a .zip version of the files by clicking here. You will have to unzip this folder and place the unzipped folder in the same folder that this notebook is in.

Note: Once you have the data, you do not need to run this cell again.

[4]:

commands = [
    "curl -L https://pitt.box.com/shared/static/79fi7d715dulcldsy6uogz02rsn5uesd.gz -o ./woodcock_labeled_data.tar.gz",
    "tar -xzf woodcock_labeled_data.tar.gz", # Unzip the downloaded tar.gz file
    "rm woodcock_labeled_data.tar.gz" # Remove the file after its contents are unzipped
]
for command in commands:
    run_command(command)


### Generate one-hot encoded labels

The folder contains 2-second audio clips taken from an autonomous recording unit. It also contains a file, woodcock_labels.csv, which lists the name of each file and its corresponding label information, created using a program called Specky.

We manipulate the label dataframe to give “one hot” labels - that is, a column for every class, with 1 for present or 0 for absent in each sample’s row. In this case, our classes are simply ‘negative’ for files without a woodcock and ‘positive’ for files with a woodcock. Note that these classes are mutually exclusive, so we have a “single-target” problem (as opposed to a “multi-target” problem where multiple classes can simultaneously be present).

For more details on the steps below, see the basic CNN training and prediction tutorial.

[5]:

#load Specky output: a table of labeled audio files
labels = pd.read_csv('./woodcock_labeled_data/woodcock_labels.csv')

#update the paths to the audio files
labels.filename = ['./woodcock_labeled_data/'+f for f in labels.filename]

#generate "one-hot" labels
labels['negative']=[0 if label=='present' else 1 for label in labels['woodcock']]
labels['positive']=[1 if label=='present' else 0 for label in labels['woodcock']]

#use the file path as the index, and class names as the only columns
classes = ['negative','positive']
labels = labels.set_index('filename')[classes]

[5]:

negative positive
filename
./woodcock_labeled_data/d4c40b6066b489518f8da83af1ee4984.wav 0 1
./woodcock_labeled_data/79678c979ebb880d5ed6d56f26ba69ff.wav 0 1
./woodcock_labeled_data/49890077267b569e142440fa39b3041c.wav 0 1
./woodcock_labeled_data/0c453a87185d8c7ce05c5c5ac5d525dc.wav 0 1

## Intro to Preprocessors

Preprocessors prepare samples for use by machine learning algorithms by stringing together transformations called Actions into a Pipeline. The preprocessor sequentially applies to the sample each Action in the Pipeline. You can add, remove, and rearrange Actions from the pipeline and change the parameters of each Action.
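
The idea of applying each step in order can be sketched in a few lines of plain Python. This is a toy illustration, not OpenSoundscape's implementation: `run_pipeline` and the three step functions are hypothetical stand-ins for real Actions.

```python
def run_pipeline(sample, pipeline):
    """Pass `sample` through each step of `pipeline` in order."""
    for step in pipeline:
        sample = step(sample)
    return sample

# toy steps standing in for real Actions (hypothetical, for illustration only)
def load(path):
    return [0.0, 0.5, -1.0, 2.0]          # pretend these are audio samples

def trim(samples):
    return samples[:3]                    # keep the first three samples

def normalize(samples):
    peak = max(abs(s) for s in samples)
    return [s / peak for s in samples]    # scale so the largest magnitude is 1

pipeline = [load, trim, normalize]
result = run_pipeline("clip.wav", pipeline)
print(result)
```

Because the pipeline is just an ordered list, removing or reordering steps changes the result without touching the steps themselves.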

The currently implemented Preprocessor classes and their Actions include:

• CnnPreprocessor - loads audio files, creates spectrograms, performs various augmentations, and returns a pytorch Tensor.
• AudioToSpectrogramPreprocessor - loads audio files, creates spectrograms, and returns a pytorch Tensor (no augmentation).

### Initialize preprocessor

A Preprocessor must be initialized with a very specific dataframe:

• the index of the dataframe provides paths to audio samples
• the columns are the class names
• the values are 0 (absent/False) or 1 (present/True) for each sample and each class.
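
A table in this layout can be sanity-checked with pandas before it is handed to a preprocessor. The sketch below is only an illustration: `check_label_table` is a hypothetical helper, not an OpenSoundscape function, and the file names are made up.

```python
import pandas as pd

def check_label_table(df):
    """Return True if df looks like a one-hot label table:
    unique index entries (file paths) and only 0/1 values."""
    values_ok = df.isin([0, 1]).all().all()
    index_ok = df.index.is_unique
    return bool(values_ok and index_ok)

# hypothetical file names, for illustration only
example = pd.DataFrame(
    {"negative": [1, 0], "positive": [0, 1]},
    index=["./clips/a.wav", "./clips/b.wav"],
)
print(check_label_table(example))
```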

For example, we’ve set up the labels dataframe with files as the index and classes as the columns, so we can use it to make an instance of CnnPreprocessor:

[6]:

from opensoundscape.preprocess.preprocessors import CnnPreprocessor

preprocessor = CnnPreprocessor(labels)


### Access sample from a Preprocessor

A sample is accessed in a preprocessor using indexing, like a list. Each sample is a dictionary with two keys: ‘X’, the Tensor of the sample, and ‘y’, the Tensor of labels of the sample.

[7]:

preprocessor[0]

[7]:

{'X': tensor([[[0.0000, 0.0000, 0.0000,  ..., 0.4976, 0.4393, 0.4653],
[0.0000, 0.0000, 0.0000,  ..., 0.4880, 0.4658, 0.4754],
[0.0000, 0.0000, 0.0000,  ..., 0.4894, 0.4882, 0.4231],
...,
[0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000]],

[[0.0000, 0.0000, 0.0000,  ..., 0.4694, 0.4241, 0.4743],
[0.0000, 0.0000, 0.0000,  ..., 0.4994, 0.4457, 0.4963],
[0.0000, 0.0000, 0.0000,  ..., 0.4734, 0.4847, 0.4085],
...,
[0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000]],

[[0.0000, 0.0000, 0.0000,  ..., 0.4864, 0.4233, 0.4630],
[0.0000, 0.0000, 0.0000,  ..., 0.4857, 0.4357, 0.4965],
[0.0000, 0.0000, 0.0000,  ..., 0.4995, 0.4914, 0.4010],
...,
[0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000]]]),
'y': tensor([0, 1])}
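
The list-like access shown above relies on Python's `__len__` and `__getitem__` protocol, which is also what PyTorch's Dataset classes use. A toy stand-in (the class `ToyPreprocessor` is hypothetical and does no real audio processing) makes the pattern visible:

```python
class ToyPreprocessor:
    """Toy stand-in showing list-like access; not the OpenSoundscape class."""

    def __init__(self, labels):
        # labels: dict mapping file path -> list of 0/1 class labels
        self.paths = list(labels)
        self.labels = labels

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, i):
        path = self.paths[i]
        # a real preprocessor would load and transform the audio here
        return {"X": f"<features for {path}>", "y": self.labels[path]}

toy = ToyPreprocessor({"a.wav": [0, 1], "b.wav": [1, 0]})
print(len(toy))
print(toy[0])
```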


### Subset samples from a Preprocessor

Preprocessors allow you to select a subset of samples using sample() and head() methods (like Pandas DataFrames). For example:

[8]:

len(preprocessor)

[8]:

29


Select the first 5 samples (non-random)

[9]:

len(preprocessor.head(5))

[9]:

5


Randomly select an absolute number of samples

[10]:

len(preprocessor.sample(n=10))

[10]:

10


Randomly select a fraction of samples

[11]:

len(preprocessor.sample(frac=0.5))

[11]:

14
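
The counts above match what pandas itself returns for a 29-row table. Assuming the preprocessor's head() and sample() follow pandas semantics, as the comparison with DataFrames suggests, the behavior can be reproduced on a stand-in table (the column name and `random_state` values here are illustrative):

```python
import pandas as pd

# a stand-in table with 29 rows, like the dataset above
df = pd.DataFrame({"positive": [1] * 29})

first_five = df.head(5)                       # deterministic: the first 5 rows
ten_random = df.sample(n=10, random_state=0)  # 10 rows, reproducible via random_state
half = df.sample(frac=0.5, random_state=0)    # a fraction of the rows

print(len(first_five), len(ten_random), len(half))
```

With 29 rows, `frac=0.5` asks for 14.5 rows, which pandas rounds to a whole number of rows.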


## Pipelines and actions

Each Preprocessor class has two attributes, preprocessor.pipeline and preprocessor.actions. Pipelines are composed of Actions.

The preprocessor’s Pipeline is the ordered list of Actions that the preprocessor performs on each sample.

• The Pipeline is stored in the preprocessor.pipeline attribute.
• You can modify the contents or order of Preprocessor Actions by overwriting the preprocessor’s .pipeline attribute. When you modify this attribute, you must provide a list of Actions, where each Action is an instance of a class that subclasses opensoundscape.preprocess.actions.BaseAction.

Inspect the current pipeline.

[12]:

# inspect the current pipeline (ordered sequence of Actions to take)
preprocessor.pipeline

[12]:

[<opensoundscape.preprocess.actions.AudioLoader at 0x7fd98ef45e50>,
<opensoundscape.preprocess.actions.AudioTrimmer at 0x7fd98ef454f0>,
<opensoundscape.preprocess.actions.AudioToSpectrogram at 0x7fd98ef45970>,
<opensoundscape.preprocess.actions.SpectrogramBandpass at 0x7fd99603ba30>,
<opensoundscape.preprocess.actions.SpecToImg at 0x7fd99603b8e0>,
<opensoundscape.preprocess.actions.BaseAction at 0x7fd99603b1f0>,
<opensoundscape.preprocess.actions.TorchColorJitter at 0x7fd99603b220>,
<opensoundscape.preprocess.actions.ImgToTensor at 0x7fd99603b160>,
<opensoundscape.preprocess.actions.TensorNormalize at 0x7fd98ef45c10>,
<opensoundscape.preprocess.actions.TorchRandomAffine at 0x7fd99603b130>]


Preprocessors come with a set of predefined Actions that are available to the preprocessor. These are not necessarily all included in the preprocessing pipeline; these are just the transformations that are available to be strung together into a pipeline if desired.

• The Actions are stored in the preprocessor.actions attribute. Each Action is an instance of a class (described in more detail below).
• Each Action takes a sample (and its labels), performs some transformation to them, and returns the sample (and its labels). The code for this transformation is stored in the Action’s .go() method.
• You can customize Actions using the .on() and .off() methods to turn the Action on or off, or by changing the action’s parameters. Any customizable parameters for performing the Action are stored in a dictionary, .params. This dictionary can be modified using the Action’s .set() method, e.g. action.set(param=value, param2=value2, ...).
• You can view all the available Actions in a preprocessor using the .list_actions() method.

[13]:

# create a new instance of an AudioToSpectrogramPreprocessor
preprocessor = AudioToSpectrogramPreprocessor(labels)

# print all Actions that have been added to the preprocessor
# (Note that this is not the pipeline, just a collection of available actions)
preprocessor.actions.list_actions()

[13]:

['load_audio',
'trim_audio',
'to_spec',
'bandpass',
'to_img',
'to_tensor',
'normalize']


Notice that the Actions in preprocessor.actions.list_actions() are not identical to the names listed in the pipeline, but are parallel. For example, in this case, preprocessor.actions.to_spec corresponds to an instance of opensoundscape.preprocess.actions.AudioToSpectrogram:

[14]:

preprocessor.actions.to_spec

[14]:

<opensoundscape.preprocess.actions.AudioToSpectrogram at 0x7fd99698aa60>


That’s because of the structure of actions:

• The .actions attribute is an instance of a class called ActionContainer (see below)
• The ActionContainer has an attribute for each possible action, e.g. preprocessor.actions.to_spec
• Each attribute is defined as an instance of an Action class, e.g. AudioToSpectrogram
• Each Action class is a child of a class called BaseAction; see the actions module for examples.

[15]:

preprocessor.actions?

Type:        ActionContainer
String form: <opensoundscape.preprocess.actions.ActionContainer object at 0x7fd99698a280>
File:        ~/Code/opensoundscape/opensoundscape/preprocess/actions.py
Docstring:
this is a container object which holds instances of Action child-classes

the Actions it contains each have .go(), .on(), .off(), .set(), .get()

The actions are un-ordered and may not all be used. In preprocessor objects
such as AudioToSpectrogramPreprocessor, Actions from the action
container are listed in a pipeline(list), which defines their order of use.

To set parameters of actions: action_container.loader.set(param=value,...)

Methods: list_actions()
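
The relationship between the unordered container and the ordered pipeline can be sketched with a toy class. This is an illustration only: `ToyContainer` is hypothetical, and plain strings stand in for Action instances.

```python
class ToyContainer:
    """Toy container whose attributes are the available actions (illustrative only)."""

    def list_actions(self):
        # every instance attribute is treated as one available action
        return list(vars(self))

container = ToyContainer()
container.load_audio = "loader action"     # placeholders stand in for Action instances
container.to_spec = "spectrogram action"

# the pipeline is a separate, ordered list that refers to the same objects
pipeline = [container.load_audio, container.to_spec]

print(container.list_actions())
```

The container answers "which actions exist?", while the pipeline answers "which actions run, and in what order?".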



## Modifying Actions

### View default parameters for an Action

The docstring for an individual action, such as preprocessor.actions.to_spec, gives information on what parameters can be changed and what the defaults are.

[16]:

preprocessor.actions.to_spec?

Type:        AudioToSpectrogram
String form: <opensoundscape.preprocess.actions.AudioToSpectrogram object at 0x7fd99698aa60>
File:        ~/Code/opensoundscape/opensoundscape/preprocess/actions.py
Docstring:
Action child class for Audio.from_file() (Audio -> Spectrogram)

see spectrogram.Spectrogram.from_audio for documentation

Args:
window_type="hann":
see scipy.signal.spectrogram docs for description of window parameter
window_samples=512:
number of audio samples per spectrogram window (pixel)
overlap_samples=256:
number of samples shared by consecutive windows
decibel_limits = (-100,-20) :
limit the dB values to (min,max)
(lower values set to min, higher values set to max)



Any defaults that have been changed will be shown in the .params attribute of the action:

[17]:

preprocessor.actions.to_spec.params

[17]:

{}


### Modify Action parameters

In general, Actions are modified using the set() method, e.g.:

[18]:

preprocessor.actions.to_spec.set(window_samples=256)


We can check that the values were actually changed by printing the action’s params. This is not guaranteed to print the defaults, but will definitely print the parameters that have actively changed.

[19]:

print(preprocessor.actions.to_spec.params)

{'window_samples': 256}


### Turn individual Actions on or off

Each Action has .on() and .off() methods which toggle a bypass of the Action in the pipeline. Note that the Actions will still remain in the same order in the pipeline, and can be turned back on again if desired.

[20]:

#initialize a preprocessor that includes augmentation
preprocessor = CnnPreprocessor(labels)
preprocessor.pipeline

[20]:

[<opensoundscape.preprocess.actions.AudioLoader at 0x7fd9969923a0>,
<opensoundscape.preprocess.actions.AudioTrimmer at 0x7fd996992430>,
<opensoundscape.preprocess.actions.AudioToSpectrogram at 0x7fd996992490>,
<opensoundscape.preprocess.actions.SpectrogramBandpass at 0x7fd9969924f0>,
<opensoundscape.preprocess.actions.SpecToImg at 0x7fd996992550>,
<opensoundscape.preprocess.actions.BaseAction at 0x7fd996992670>,
<opensoundscape.preprocess.actions.TorchColorJitter at 0x7fd9969926d0>,
<opensoundscape.preprocess.actions.ImgToTensor at 0x7fd9969925b0>,
<opensoundscape.preprocess.actions.TensorNormalize at 0x7fd996992640>,
<opensoundscape.preprocess.actions.TorchRandomAffine at 0x7fd996992730>]

[21]:

#turn off the color jitter augmentation (it stays in the pipeline but is bypassed)
preprocessor.actions.color_jitter.off()
preprocessor.pipeline

[21]:

[<opensoundscape.preprocess.actions.AudioLoader at 0x7fd9969923a0>,
<opensoundscape.preprocess.actions.AudioTrimmer at 0x7fd996992430>,
<opensoundscape.preprocess.actions.AudioToSpectrogram at 0x7fd996992490>,
<opensoundscape.preprocess.actions.SpectrogramBandpass at 0x7fd9969924f0>,
<opensoundscape.preprocess.actions.SpecToImg at 0x7fd996992550>,
<opensoundscape.preprocess.actions.BaseAction at 0x7fd996992670>,
<opensoundscape.preprocess.actions.TorchColorJitter at 0x7fd9969926d0>,
<opensoundscape.preprocess.actions.ImgToTensor at 0x7fd9969925b0>,
<opensoundscape.preprocess.actions.TensorNormalize at 0x7fd996992640>,
<opensoundscape.preprocess.actions.TorchRandomAffine at 0x7fd996992730>]

[22]:

print('random affine on')
show_tensor(preprocessor[0])

print('random affine off')
preprocessor.actions.random_affine.off()
show_tensor(preprocessor[0])

random affine on

random affine off


To view whether an individual Action in a pipeline is on or off, inspect its bypass attribute:

[23]:

# The AudioLoader Action that is still on
preprocessor.pipeline[0].bypass

[23]:

False

[24]:

# The TorchRandomAffine Action that we turned off
preprocessor.pipeline[-1].bypass

[24]:

True
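
The on/off behavior described in this section can be sketched with a toy class. This is a simplified illustration under the assumption that a bypassed Action is simply skipped when the pipeline runs; `ToyAction` and `run` are hypothetical names, not the library's code.

```python
class ToyAction:
    """Toy action with an on/off switch and a params dict (illustrative only)."""

    def __init__(self, **kwargs):
        self.params = dict(kwargs)
        self.bypass = False

    def on(self):
        self.bypass = False

    def off(self):
        self.bypass = True

    def set(self, **kwargs):
        self.params.update(kwargs)

    def go(self, x):
        return x + self.params.get("amount", 0)

def run(x, pipeline):
    for action in pipeline:
        if action.bypass:      # skipped, but still present in the list
            continue
        x = action.go(x)
    return x

add_one = ToyAction(amount=1)
add_ten = ToyAction(amount=10)
pipeline = [add_one, add_ten]

print(run(0, pipeline))   # both actions applied
add_ten.off()
print(run(0, pipeline))   # add_ten is bypassed
print(len(pipeline))      # the pipeline itself is unchanged
```

Keeping bypassed actions in the list is what lets you turn them back on later without rebuilding the pipeline.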


## Modifying the pipeline

Sometimes, you may want to change the order or composition of the Preprocessor’s pipeline. You can simply overwrite the .pipeline attribute, as long as the new pipeline is still a list of Action instances from the preprocessor’s .actions ActionContainer.

### Example: return Spectrogram instead of Tensor

Here’s an example where we replace the pipeline with one that just loads audio and converts it to a Spectrogram, returning a Spectrogram instead of a Tensor:

[25]:

#initialize a preprocessor
preprocessor = AudioToSpectrogramPreprocessor(labels)
print('original pipeline:')
[print(p) for p in preprocessor.pipeline]

#overwrite the pipeline with a slice of the original pipeline
print('\nnew pipeline:')
preprocessor.pipeline = preprocessor.pipeline[0:3]

[print(p) for p in preprocessor.pipeline]

print('\nwe now have a preprocessor that returns Spectrograms instead of Tensors:')
print(type(preprocessor[0]['X']))
preprocessor[0]['X'].plot()

original pipeline:
<opensoundscape.preprocess.actions.AudioTrimmer object at 0x7fd9947e1eb0>
<opensoundscape.preprocess.actions.AudioToSpectrogram object at 0x7fd9947e1b50>
<opensoundscape.preprocess.actions.SpectrogramBandpass object at 0x7fd9947e1b80>
<opensoundscape.preprocess.actions.SpecToImg object at 0x7fd9947e1e50>
<opensoundscape.preprocess.actions.ImgToTensor object at 0x7fd996b40fd0>
<opensoundscape.preprocess.actions.TensorNormalize object at 0x7fd996b2ed30>

new pipeline:
<opensoundscape.preprocess.actions.AudioTrimmer object at 0x7fd9947e1eb0>
<opensoundscape.preprocess.actions.AudioToSpectrogram object at 0x7fd9947e1b50>

we now have a preprocessor that returns Spectrograms instead of Tensors:
<class 'opensoundscape.spectrogram.Spectrogram'>


### Example: custom augmentation pipeline

Here’s an example where we add a new Action to the Action container, then overwrite the preprocessing pipeline with one that includes our new action.

Note that each Action requires a specific input type and may return that same type or a different type, so you’ll need to be careful about the order of the Actions in your pipeline.

This custom pipeline first performs a Gaussian noise augmentation, then a random affine, then our second noise augmentation (add_noise_2).

[26]:

#initialize a preprocessor
preprocessor = CnnPreprocessor(labels)

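#add a new Action to the Action container (default noise parameters)
from opensoundscape.preprocess.actions import TensorAddNoise
preprocessor.actions.add_noise_2 = TensorAddNoise()

#overwrite the pipeline with a list of Actions from .actions
preprocessor.pipeline = [
    preprocessor.actions.load_audio,
    preprocessor.actions.trim_audio,
    preprocessor.actions.to_spec,
    preprocessor.actions.bandpass,
    preprocessor.actions.to_img,
    preprocessor.actions.to_tensor,
    preprocessor.actions.normalize,
    preprocessor.actions.add_noise,
    preprocessor.actions.random_affine,
    preprocessor.actions.add_noise_2,
]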

show_tensor(preprocessor[0])
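
Because each Action consumes one type and produces another, a proposed ordering can be checked before it is run. The sketch below is a toy check, not a feature of OpenSoundscape: the `STEPS` table and `check_order` are hypothetical, and the input/output types are inferred from the Action class names shown in this tutorial.

```python
# each toy action declares what it consumes and what it produces (names are illustrative)
STEPS = {
    "load_audio": ("path", "audio"),
    "to_spec": ("audio", "spectrogram"),
    "to_img": ("spectrogram", "image"),
    "to_tensor": ("image", "tensor"),
    "add_noise": ("tensor", "tensor"),
}

def check_order(names, start="path"):
    """Return the first step whose input type does not match, or None if the chain is valid."""
    current = start
    for name in names:
        needs, gives = STEPS[name]
        if needs != current:
            return name
        current = gives
    return None

print(check_order(["load_audio", "to_spec", "to_img", "to_tensor", "add_noise"]))
print(check_order(["load_audio", "add_noise", "to_spec"]))
```

The second call reports `add_noise`, because a tensor augmentation cannot run directly on loaded audio.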


### Use an Action multiple times in a pipeline

If an Action is present multiple times in a pipeline (e.g. multiple overlays), changing the parameters of the Action at one point in the pipeline will change it at all points in the pipeline. For instance, create a pipeline with multiple “add noise” steps:

[27]:

#initialize a preprocessor that includes augmentation
preprocessor = CnnPreprocessor(labels)

# Insert another instance of the "add_noise" action into the pipeline
preprocessor.pipeline.insert(-3, preprocessor.actions.add_noise)
preprocessor.pipeline

[27]:

[<opensoundscape.preprocess.actions.AudioLoader at 0x7fd98e583130>,
<opensoundscape.preprocess.actions.AudioTrimmer at 0x7fd996990040>,
<opensoundscape.preprocess.actions.AudioToSpectrogram at 0x7fd996990f10>,
<opensoundscape.preprocess.actions.SpectrogramBandpass at 0x7fd996990790>,
<opensoundscape.preprocess.actions.SpecToImg at 0x7fd996990730>,
<opensoundscape.preprocess.actions.BaseAction at 0x7fd996ab3eb0>,
<opensoundscape.preprocess.actions.TorchColorJitter at 0x7fd98fdca8b0>,
<opensoundscape.preprocess.actions.ImgToTensor at 0x7fd996990ca0>,
<opensoundscape.preprocess.actions.TensorNormalize at 0x7fd996ae6880>,
<opensoundscape.preprocess.actions.TorchRandomAffine at 0x7fd996ae0460>]


Note that changing the parameter of one of the add_noise steps changes the parameters for both of them.

[28]:

# Print the parameters of both of the TensorAddNoise Actions in the pipeline
print("Parameters of TensorAddNoise actions before changing:")
[print(f"Params of {p}:", p.params) for p in preprocessor.pipeline[-4:-2]]

# Change the parameters of one of the add noise steps
preprocessor.pipeline[-4].set(std=0.01)

# The modification above is the same as:
# preprocessor.actions.add_noise.set(std=0.01)

# See that the parameters for both steps are changed
print("\nParameters of TensorAddNoise actions after changing:")
[print(f"Params of {p}:", p.params) for p in preprocessor.pipeline[-4:-2]];

Parameters of TensorAddNoise actions before changing:
Params of <opensoundscape.preprocess.actions.TensorAddNoise object at 0x7fd996ae0430>: {'std': 0.005}
Params of <opensoundscape.preprocess.actions.TensorAddNoise object at 0x7fd996ae0430>: {'std': 0.005}

Parameters of TensorAddNoise actions after changing:
Params of <opensoundscape.preprocess.actions.TensorAddNoise object at 0x7fd996ae0430>: {'std': 0.01}
Params of <opensoundscape.preprocess.actions.TensorAddNoise object at 0x7fd996ae0430>: {'std': 0.01}
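
Both printed lines show the same memory address, which is the key to this behavior: the pipeline holds two references to one object, not two copies. A plain-Python sketch (the class `ToyNoise` is a hypothetical stand-in) shows the same effect:

```python
class ToyNoise:
    """Toy stand-in for a noise action; only stores its parameters."""

    def __init__(self, std):
        self.params = {"std": std}

shared = ToyNoise(std=0.005)
pipeline = [shared, shared]           # the same object listed twice

pipeline[0].params["std"] = 0.01      # change it through the first slot
print(pipeline[1].params)             # the second slot sees the change
print(pipeline[0] is pipeline[1])

# a separate instance has its own parameters
pipeline[1] = ToyNoise(std=0.005)
print(pipeline[0].params, pipeline[1].params)
```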


To modify the parameters of Actions individually, add them as separate Actions in the pipeline by adding a new named action to the action container.

[29]:

from opensoundscape.preprocess.actions import TensorAddNoise

# Add a new possible action to the ActionContainer
preprocessor.actions.my_new_action = TensorAddNoise(std=0.005)

# Replace one of the old actions in the pipeline with the new one with different parameters
preprocessor.pipeline[-3] = preprocessor.actions.my_new_action


Now notice that the two instances of the TensorAddNoise action can have different parameters.

[30]:

[print(f"Params of {p}:", p.params) for p in preprocessor.pipeline[-4:-2]];

Params of <opensoundscape.preprocess.actions.TensorAddNoise object at 0x7fd996ae0430>: {'std': 0.01}
Params of <opensoundscape.preprocess.actions.TensorAddNoise object at 0x7fd98e598a60>: {'std': 0.005}


## Customizing AudioToSpectrogramPreprocessor

Below are various examples of how to modify parameters of the Actions of the AudioToSpectrogramPreprocessor class, including the AudioLoader, AudioToSpectrogram, and SpectrogramBandpass actions.

### Modify the sample rate

Re-sample all loaded audio to a specified rate during the load_audio action.

[31]:

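preprocessor = AudioToSpectrogramPreprocessor(labels)

# resample audio on load; 24000 Hz is only an illustrative value
preprocessor.actions.load_audio.set(sample_rate=24000)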



### Modify spectrogram window length and overlap

(see Spectrogram.from_audio() for detailed documentation)

[32]:

print('default parameters:')
show_tensor(preprocessor[0])

print('high time resolution, low frequency resolution:')
preprocessor.actions.to_spec.set(window_samples=64,overlap_samples=32)

show_tensor(preprocessor[0])

default parameters:

high time resolution, low frequency resolution:


### Bandpass spectrograms

Trim spectrograms to a specified frequency range:

[33]:

preprocessor = AudioToSpectrogramPreprocessor(labels)

print('default parameters:')
show_tensor(preprocessor[0])

print('bandpassed to 2-4 kHz:')
preprocessor.actions.bandpass.set(min_f=2000,max_f=4000)
preprocessor.actions.bandpass.on()
show_tensor(preprocessor[0])

default parameters:

bandpassed to 2-4 kHz:


### Change the output image

Change the shape of the output image:

[34]:

preprocessor = AudioToSpectrogramPreprocessor(labels)
preprocessor.actions.to_img.set(shape=[1000,500])
show_tensor(preprocessor[0])


## Customizing CnnPreprocessor

The CnnPreprocessor class can be used to perform both audio and spectrogram transformation as well as augmentation for training with CNNs.

This section describes:

• a special property of CnnPreprocessor which allows you to turn all augmentations on or off
• examples of modifying augmentation parameters for standard augmentations
• detailed descriptions of the useful “Overlay” augmentation

### Turn all augmentation on or off

With CnnPreprocessor, we can easily choose between a pipeline that contains augmentations and a pipeline with no augmentations using the shortcut methods augmentation_off() and augmentation_on(). These methods overwrite any changes made to the pipeline, so call them before further customizing an instance of CnnPreprocessor.

[35]:

preprocessor = CnnPreprocessor(labels)
preprocessor.augmentation_off()
preprocessor.pipeline

[35]:

[<opensoundscape.preprocess.actions.AudioLoader at 0x7fd978c0af10>,
<opensoundscape.preprocess.actions.AudioTrimmer at 0x7fd978c0afd0>,
<opensoundscape.preprocess.actions.AudioToSpectrogram at 0x7fd978f9eb80>,
<opensoundscape.preprocess.actions.SpectrogramBandpass at 0x7fd97974c760>,
<opensoundscape.preprocess.actions.SpecToImg at 0x7fd97974c6a0>,
<opensoundscape.preprocess.actions.ImgToTensor at 0x7fd978ce20a0>,
<opensoundscape.preprocess.actions.TensorNormalize at 0x7fd978c217c0>]

[36]:

preprocessor.augmentation_on()
preprocessor.pipeline

[36]:

[<opensoundscape.preprocess.actions.AudioLoader at 0x7fd978c0af10>,
<opensoundscape.preprocess.actions.AudioTrimmer at 0x7fd978c0afd0>,
<opensoundscape.preprocess.actions.AudioToSpectrogram at 0x7fd978f9eb80>,
<opensoundscape.preprocess.actions.SpectrogramBandpass at 0x7fd97974c760>,
<opensoundscape.preprocess.actions.SpecToImg at 0x7fd97974c6a0>,
<opensoundscape.preprocess.actions.BaseAction at 0x7fd978c21670>,
<opensoundscape.preprocess.actions.TorchColorJitter at 0x7fd978c217f0>,
<opensoundscape.preprocess.actions.ImgToTensor at 0x7fd978ce20a0>,
<opensoundscape.preprocess.actions.TensorNormalize at 0x7fd978c217c0>,
<opensoundscape.preprocess.actions.TorchRandomAffine at 0x7fd978c14310>]


### Modify augmentation parameters

CnnPreprocessor includes several augmentations with customizable parameters. Here we provide a couple of illustrative examples - see any action’s documentation for details on how to use its parameters.

[37]:

#initialize a preprocessor
preprocessor = CnnPreprocessor(labels)

#turn off augmentations other than overlay
preprocessor.actions.color_jitter.off()
preprocessor.actions.random_affine.off()

# allow up to 20 horizontal masks, each spanning up to 0.1x the height of the image.
show_tensor(preprocessor[0])

[38]:

#turn off frequency mask and turn on gaussian noise

# increase the intensity of gaussian noise added to the image
show_tensor(preprocessor[0])


### Overlay augmentation

Overlay is a powerful Action that allows additional samples to be overlayed or blended with the original sample.

The additional samples are chosen from the overlay_df that is provided to the preprocessor when it is initialized. The index of the overlay_df must be paths to audio files. The dataframe can be simply an index containing audio files with no other columns, or it can have the same columns as the sample dataframe for the preprocessor.

Samples for overlays are chosen based on their class labels, according to the parameter overlay_class:

• None - Randomly select any file from overlay_df
• "different" - Select a random file from overlay_df containing none of the classes this file contains
• specific class name - always choose files from this class

Samples can be drawn from dataframes in a few general ways (each is demonstrated below):

1. Using a separate dataframe where any sample can be overlayed (overlay_class=None)
2. Using the same dataframe as training, where the overlay class is “different,” i.e., does not contain overlapping labels with the original sample
3. Using the same dataframe as training, where samples from a specific class are used for overlays

By default, the overlay Action does not change the labels of the sample it modifies. However, if you wish to add the labels from overlayed samples to the original sample’s labels, you can set update_labels=True (see example below).

[39]:

#initialize a preprocessor and provide a dataframe with samples to use as overlays
preprocessor = CnnPreprocessor(labels, overlay_df=labels)

#turn off augmentations other than overlay
preprocessor.actions.color_jitter.off()
preprocessor.actions.random_affine.off()


#### Modify overlay_weight

We’ll first overlay a random sample with 30% of the final mix coming from the overlayed sample (70% coming from the original) by using overlay_weight=0.3.

To demonstrate this, let’s show what happens if we overlay samples from the “negative” class, resulting in the final sample having a higher or lower signal-to-noise ratio. By default, the overlay Action chooses a random file from the overlay dataframe. Instead, choose a sample from the class called "negative" using the overlay_class parameter.

[40]:

preprocessor.actions.overlay.set(
overlay_class='negative',
overlay_weight=0.3
)
show_tensor(preprocessor[0])


Now use overlay_weight=0.8 to increase the contribution of the overlayed sample (80%) compared to the original sample (20%).

[41]:

preprocessor.actions.overlay.set(overlay_weight=0.8)
show_tensor(preprocessor[0])
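
One way to read overlay_weight is as the weight in a weighted average of the two samples. The text above gives the proportions (for example 30% overlay, 70% original) but not the implementation, so the sketch below is an assumption about the arithmetic rather than the library's code; `blend` is a hypothetical helper operating on plain lists.

```python
def blend(original, overlay, overlay_weight):
    """Weighted average of two equal-length sequences."""
    w = overlay_weight
    return [(1 - w) * a + w * b for a, b in zip(original, overlay)]

original = [1.0, 1.0, 0.0]
overlay = [0.0, 1.0, 1.0]

print(blend(original, overlay, 0.3))   # mostly the original
print(blend(original, overlay, 0.8))   # mostly the overlay
```

Where the two samples agree, the blend leaves the value unchanged; where they differ, the weight decides which sample dominates.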


#### Overlay samples from a specific class

As demonstrated above, you can choose a specific class to choose samples from. Here, instead, we choose samples from the “positive” class.

[42]:

preprocessor.actions.overlay.set(
overlay_class='positive',
overlay_weight=0.4
)
show_tensor(preprocessor[0])


#### Overlaying samples from any class

By default, or by specifying overlay_class=None, the overlay sample is chosen randomly from the overlay_df with no restrictions.

[43]:

preprocessor.actions.overlay.set(overlay_class=None)
show_tensor(preprocessor[0])


#### Overlaying samples from a “different” class

The 'different' option for overlay_class chooses a sample to overlay that has non-overlapping labels with the original sample.

In the case of this example, this has the same effect as drawing samples from the "negative" class as demonstrated above. In multi-class examples, this would draw from any of the samples not labeled with the class(es) of the original sample.

We’ll again use overlay_weight=0.8 to exaggerate the importance of the overlayed sample (80%) compared to the original sample (20%).

[44]:

preprocessor.actions.overlay.set(update_labels=False,overlay_class='different',overlay_weight=0.8)
show_tensor(preprocessor[0])
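
The "different" rule amounts to filtering the overlay candidates down to those that share no present class with the original sample. A minimal sketch of that filter, using a hypothetical helper and made-up file names (not the library's selection code):

```python
def different_candidates(sample_labels, candidates):
    """Keep candidates that share no present (1) class with the sample."""
    keep = []
    for path, labels in candidates.items():
        shares_class = any(a == 1 and b == 1 for a, b in zip(sample_labels, labels))
        if not shares_class:
            keep.append(path)
    return keep

# columns are [negative, positive]; file names are hypothetical
candidates = {
    "neg1.wav": [1, 0],
    "pos1.wav": [0, 1],
    "neg2.wav": [1, 0],
}
print(different_candidates([0, 1], candidates))
```

For a positive sample in this two-class setup, only the negative files remain, which is why "different" behaves like overlay_class="negative" here.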


#### Updating labels

By default, the overlay Action does not change the labels of the sample it modifies.

For instance, if the overlayed sample has labels [1,0] and the original sample has labels [0,1], the default behavior will return a sample with labels [0,1] not [1,1].

If you wish to add the labels from overlayed samples to the original sample’s labels, you can set update_labels=True.

[45]:

print('default: labels do not update')
preprocessor.actions.overlay.set(update_labels=False,overlay_class='different')
print(f"\t resulting labels: {preprocessor[0]['y'].numpy()}")

print('Using update_labels=True')
preprocessor.actions.overlay.set(update_labels=True,overlay_class='different')
print(f"\t resulting labels: {preprocessor[0]['y'].numpy()}")


default: labels do not update
resulting labels: [0 1]
Using update_labels=True
resulting labels: [1 1]
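
The two behaviors printed above correspond to either keeping the original labels or taking an element-wise "or" of the two label vectors. A small sketch of that logic (`merge_labels` is a hypothetical helper, not part of OpenSoundscape):

```python
def merge_labels(original, overlay, update_labels=False):
    """Return the labels after an overlay under the two behaviours described above."""
    if not update_labels:
        return list(original)                               # labels left untouched
    return [max(a, b) for a, b in zip(original, overlay)]   # element-wise "or"

print(merge_labels([0, 1], [1, 0]))
print(merge_labels([0, 1], [1, 0], update_labels=True))
```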


This example is a single-target problem: the two classes represent “woodcock absent” and “woodcock present.” Because the labels are mutually exclusive, labels [1,1] do not make sense. So, for this single-target problem, we would not want to use update_labels=True, and it would probably make most sense to only overlay recordings without a woodcock, e.g., overlay_class='negative'.

## Creating a new Preprocessor class

If you have a specific augmentation routine you want to perform, you may want to create your own Preprocessor class rather than modifying an existing one.

Your subclass might add a different set of Actions, define a different pipeline, or even override the __getitem__ method of BasePreprocessor.

Here’s an example of a customized preprocessor that subclasses AudioToSpectrogramPreprocessor and creates a pipeline that depends on the magic_parameter input.

[46]:

from opensoundscape.preprocess.actions import TensorAddNoise

class MyPreprocessor(AudioToSpectrogramPreprocessor):
    """Child of AudioToSpectrogramPreprocessor with weird augmentation routine"""

    def __init__(
        self,
        df,
        magic_parameter,
        audio_length=None,
        return_labels=True,
        out_shape=[224, 224],
    ):

        super(MyPreprocessor, self).__init__(
            df,
            audio_length=audio_length,
            out_shape=out_shape,
            return_labels=return_labels,
        )

        # register a noise action (default parameters), since the parent class does not define one
        self.actions.add_noise = TensorAddNoise()

        # repeat the noise action magic_parameter times at the end of the pipeline
        self.pipeline = [
            self.actions.load_audio,
            self.actions.trim_audio,
            self.actions.to_spec,
            self.actions.bandpass,
            self.actions.to_img,
            self.actions.to_tensor,
            self.actions.normalize,
        ] + [self.actions.add_noise for i in range(magic_parameter)]

[47]:

p = MyPreprocessor(labels, magic_parameter=2)
show_tensor(p[0])

[48]:

p = MyPreprocessor(labels, magic_parameter=3)
show_tensor(p[0])


## Defining new Actions

You can define new Actions to include in your Preprocessor pipeline. They should subclass opensoundscape.preprocess.actions.BaseAction.

You will need to define a .go() method for all actions. If you provide default parameter values, you will also need to define an __init__() method.

### Without default parameters

If the Action does not need to have default arguments, it’s trivial to create by defining a go() method.

[49]:

from opensoundscape.audio import Audio
from opensoundscape.preprocess.actions import BaseAction

class SquareSamples(BaseAction):
    """Square values of every audio sample

    Audio in, Audio out
    """
    def go(self, audio):
        samples = np.array(audio.samples)**2
        return Audio(samples, audio.sample_rate)


Test it out:

[50]:

from opensoundscape.audio import Audio

square_action = SquareSamples()

audio = Audio.from_file('./woodcock_labeled_data/01c5d0c90bd4652f308fd9c73feb1bf5.wav')
print(np.mean(audio.samples))
audio = square_action.go(audio)
print(np.mean(audio.samples))

0.012753859
0.008748752


### With default parameters

Here we overwrite the __init__ method to provide a default parameter value. The Action below removes low-amplitude audio samples, acting somewhat as a “denoiser”.

[51]:

class AudioGate(BaseAction):
    """Replace audio samples below a threshold with 0

    Audio in, Audio out

    Args:
        threshold: sample values below this will become 0
    """

    def __init__(self, **kwargs):
        super(AudioGate, self).__init__(**kwargs)

        # default parameters
        self.params["threshold"] = 0.1

        # update/add any parameters passed to __init__
        self.params.update(kwargs)

    def go(self, audio):
        samples = np.array([0 if np.abs(s)<self.params["threshold"] else s for s in audio.samples])
        return Audio(samples, audio.sample_rate)


Test it out:

[52]:

gate_action = AudioGate(threshold=0.2)

print('histogram of samples')
audio = Audio.from_file('./woodcock_labeled_data/01c5d0c90bd4652f308fd9c73feb1bf5.wav')
_ = plt.hist(audio.samples,bins=100)
plt.semilogy()
plt.show()

print('histogram of samples after audio gate')
audio_gated = gate_action.go(audio)
_ = plt.hist(audio_gated.samples,bins=100)
plt.semilogy()

histogram of samples

histogram of samples after audio gate

[52]:

[]
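
The go() method of AudioGate above loops over samples one at a time in Python, which can be slow for long recordings. As a possible alternative, the same gating rule can be written as a single vectorized NumPy expression. The function `gate` below is a standalone sketch for illustration, not part of OpenSoundscape:

```python
import numpy as np

def gate(samples, threshold=0.1):
    """Zero out samples whose absolute value is below `threshold`."""
    samples = np.asarray(samples)
    return np.where(np.abs(samples) < threshold, 0, samples)

print(gate([0.05, -0.3, 0.15, -0.02], threshold=0.2))
```

Inside an Action, the returned array could be wrapped in an Audio object in the same way as in AudioGate.go().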


Clean up files created during this tutorial:

[54]:

folder = Path('./woodcock_labeled_data')