Preprocessing

class opensoundscape.preprocess.preprocessors.AudioLoadingPreprocessor(df, return_labels=True, audio_length=None)

creates Audio objects from file paths

Parameters:
  • df – dataframe of samples. df must have audio paths in the index. If df has labels, the class names should be the columns, and the values of each row should be 0 or 1. If data does not have labels, df will have no columns
  • return_labels – if True, __getitem__ returns {"X": batch_tensors, "y": labels}; if False, __getitem__ returns {"X": batch_tensors} [default: True]
  • audio_length – length in seconds of audio to return. If None, the original audio is not trimmed. If a float (seconds), longer audio is trimmed to this length; shorter audio input will raise a ValueError.
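The layout of df described above can be sketched with pandas. The file paths and class names below are hypothetical; only the layout (audio paths in the index, one 0/1 column per class, no columns when unlabeled) comes from the parameter description.

```python
import pandas as pd

# Hypothetical audio paths and class names, for illustration only.
labels = pd.DataFrame(
    {"species_a": [1, 0, 1], "species_b": [0, 0, 1]},
    index=["clips/rec_001.wav", "clips/rec_002.wav", "clips/rec_003.wav"],
)

# An unlabeled table keeps the paths in the index and has no columns.
unlabeled = pd.DataFrame(index=labels.index)

# Every label value should be 0 or 1.
all_binary = bool(labels.isin([0, 1]).all().all())
```

With a table like unlabeled, return_labels=False would be the matching setting.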
class opensoundscape.preprocess.preprocessors.AudioToSpectrogramPreprocessor(df, audio_length=None, out_shape=[224, 224], return_labels=True)

loads audio paths, creates spectrogram, returns tensor

By default, audio is resampled to sr=22050. This can be changed with .actions.load_audio.set(sample_rate=sr)

Parameters:
  • df – dataframe of samples. df must have audio paths in the index. If df has labels, the class names should be the columns, and the values of each row should be 0 or 1. If data does not have labels, df will have no columns
  • audio_length – length in seconds of audio clips [default: None]. If provided, longer clips are trimmed to this length. By default, shorter clips will not be extended (modify actions.AudioTrimmer to change this behavior).
  • out_shape – output shape of tensor in pixels [default: [224,224]]
  • return_labels – if True, the __getitem__ method will return {X:sample,y:labels}; if False, it will return {X:sample}. If df has no labels (no columns), use return_labels=False [default: True]
class opensoundscape.preprocess.preprocessors.BasePreprocessor(df, return_labels=True)

Base class for Preprocessing pipelines (use in place of torch Dataset)

Custom Preprocessor classes should subclass this class or its children

Parameters:
  • df – dataframe of samples. df must have audio paths in the index. If df has labels, the class names should be the columns, and the values of each row should be 0 or 1. If data does not have labels, df will have no columns
  • return_labels – if True, the __getitem__ method will return {X:sample,y:labels}; if False, it will return {X:sample}. If df has no labels (no columns), use return_labels=False [default: True]
Raises:

PreprocessingError if exception is raised during __getitem__

class_counts_cal()

count number of each label

head(n=5)

out-of-place copy of first n samples

performs df.head(n) on self.df

Parameters:
  • n – number of first samples to return, see pandas.DataFrame.head() [default: 5]
Returns:

a new dataset object

sample(**kwargs)

out-of-place random sample

creates copy of object with n rows randomly sampled from dataframe

Args: see pandas.DataFrame.sample()

Returns: a new dataset object
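A minimal sketch of what "out-of-place" means for head() and sample(): the original object is left untouched and a new object with a smaller table is returned. ToyDataset is a hypothetical stand-in, not the library's class; only the use of pandas.DataFrame.head() and pandas.DataFrame.sample() comes from the descriptions above.

```python
import copy

import pandas as pd


class ToyDataset:
    """Hypothetical stand-in for a preprocessor that stores samples in self.df."""

    def __init__(self, df):
        self.df = df

    def head(self, n=5):
        # Copy the object, then give the copy only the first n rows.
        new = copy.copy(self)
        new.df = self.df.head(n)
        return new

    def sample(self, **kwargs):
        # Keyword arguments are forwarded to pandas.DataFrame.sample().
        new = copy.copy(self)
        new.df = self.df.sample(**kwargs)
        return new


full = ToyDataset(pd.DataFrame({"class_a": [0, 1, 0, 1]}, index=list("wxyz")))
first_two = full.head(2)
random_two = full.sample(n=2, random_state=0)
```

Because each call returns a new object, full still holds all four rows afterwards.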
class opensoundscape.preprocess.preprocessors.CnnPreprocessor(df, audio_length=None, return_labels=True, debug=None, overlay_df=None, out_shape=[224, 224])

Child of AudioToSpectrogramPreprocessor with full augmentation pipeline

loads audio, creates spectrogram, performs augmentations, returns tensor

By default, audio is resampled to sr=22050. This can be changed with .actions.load_audio.set(sample_rate=sr)

Parameters:
  • df – dataframe of samples. df must have audio paths in the index. If df has labels, the class names should be the columns, and the values of each row should be 0 or 1. If data does not have labels, df will have no columns
  • audio_length – length in seconds of audio clips [default: None]. If provided, longer clips are trimmed to this length. By default, shorter clips will not be extended (modify actions.AudioTrimmer to change this behavior).
  • out_shape – output shape of tensor in pixels [default: [224,224]]
  • return_labels – if True, the __getitem__ method will return {X:sample,y:labels}; if False, it will return {X:sample}. If df has no labels (no columns), use return_labels=False [default: True]
  • debug – If a path is provided, generated samples (after all augmentation) will be saved to the path as an image. This is useful for checking that the sample provided to the model matches expectations. [default: None]
augmentation_off()

use pipeline that skips all augmentations

augmentation_on()

use pipeline containing all actions including augmentations

exception opensoundscape.preprocess.utils.PreprocessingError

Custom exception indicating that a Preprocessor pipeline failed
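One common way to surface pipeline failures through a single exception type is to wrap whatever goes wrong inside __getitem__. The sketch below illustrates that pattern with hypothetical names (ToyPreprocessingError, ToyPipeline); it is not the library's implementation.

```python
class ToyPreprocessingError(Exception):
    """Hypothetical stand-in for PreprocessingError."""


class ToyPipeline:
    def __init__(self, samples, steps):
        self.samples = samples
        self.steps = steps  # callables applied in order

    def __getitem__(self, index):
        x = self.samples[index]
        try:
            for step in self.steps:
                x = step(x)
        except Exception as exc:
            # Re-raise as one exception type so callers can catch pipeline failures.
            raise ToyPreprocessingError(f"failed on sample {index!r}") from exc
        return x


pipeline = ToyPipeline(["4", "oops"], steps=[int, lambda v: v * 2])
good = pipeline[0]
try:
    pipeline[1]
    failed = False
except ToyPreprocessingError:
    failed = True
```

Chaining with "from exc" keeps the original traceback available for debugging.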

Preprocessing Actions

Actions for augmentation and preprocessing pipelines

This module contains Action classes which act as the elements in Preprocessor pipelines. Action classes have go(), on(), off(), and set() methods. They take a single sample of a specific type and return the transformed or augmented sample, which may or may not be the same type as the original.

See the preprocessor module and Preprocessing tutorial for details on how to use and create your own actions.
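The go(), on(), off(), and set() interface can be illustrated with a toy action. ToyAction, its factor parameter, its bypass attribute, and run_pipeline are all hypothetical names chosen for this sketch; the real Action classes may store their state differently.

```python
class ToyAction:
    """Hypothetical action: multiplies a number by a configurable factor."""

    def __init__(self, **kwargs):
        self.params = {"factor": 2}
        self.params.update(kwargs)
        self.bypass = False

    def on(self):
        self.bypass = False

    def off(self):
        self.bypass = True

    def set(self, **kwargs):
        self.params.update(kwargs)

    def go(self, x):
        return x * self.params["factor"]


def run_pipeline(pipeline, x):
    # Apply each action in order, skipping any that are switched off.
    for action in pipeline:
        if not action.bypass:
            x = action.go(x)
    return x


double = ToyAction()
triple = ToyAction(factor=3)
result_all = run_pipeline([double, triple], 1)   # both actions applied
triple.off()
result_skip = run_pipeline([double, triple], 1)  # triple is skipped
double.set(factor=10)
result_set = run_pipeline([double, triple], 1)   # double now multiplies by 10
```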

class opensoundscape.preprocess.actions.ActionContainer

this is a container object which holds instances of Action child-classes

the Actions it contains each have .go(), .on(), .off(), .set(), .get()

The actions are un-ordered and may not all be used. In preprocessor objects such as AudioToSpectrogramPreprocessor, Actions from the action container are listed in a pipeline(list), which defines their order of use.

To add actions to the container: action_container.loader = AudioLoader()

To set parameters of actions: action_container.loader.set(param=value,…)

Methods: list_actions()
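The split between an unordered container and an ordered pipeline can be sketched as follows. ToyContainer is a hypothetical stand-in, and plain callables are used in place of Action objects to keep the example short.

```python
class ToyContainer:
    """Hypothetical container: actions are stored as attributes, in no particular order."""

    def list_actions(self):
        return list(vars(self).keys())


container = ToyContainer()
container.strip = str.strip
container.upper = str.upper
container.reverse = lambda s: s[::-1]  # stored in the container but not used below

# The pipeline, not the container, decides which actions run and in what order.
pipeline = [container.strip, container.upper]

x = "  chirp "
for action in pipeline:
    x = action(x)
```

Here reverse stays available in the container even though the pipeline never calls it.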

class opensoundscape.preprocess.actions.AudioLoader(**kwargs)

Action child class for Audio.from_file() (path -> Audio)

Loads an audio file, see Audio.from_file() for documentation.

Parameters:
  • sample_rate (int, None) – resample audio with value and resample_type, if None use source sample_rate (default: None)
  • resample_type – method used to resample (default: kaiser_fast)
  • max_duration – the maximum length of an input file, None is no maximum (default: None)

Note: default sample_rate=None means use file’s sample rate, don’t resample

class opensoundscape.preprocess.actions.AudioToSpectrogram(**kwargs)

Action child class for Spectrogram.from_audio() (Audio -> Spectrogram)

see spectrogram.Spectrogram.from_audio for documentation

Parameters:
  • window_type="hann" – see scipy.signal.spectrogram docs for description of window parameter
  • window_samples=512 – number of audio samples per spectrogram window (pixel)
  • overlap_samples=256 – number of samples shared by consecutive windows
  • decibel_limits – limit the dB values to (min, max) (lower values set to min, higher values set to max)
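As a rough illustration of how window_samples and overlap_samples determine the number of spectrogram columns, and of what clipping to decibel limits does, here is a small sketch. It assumes no padding at the edges of the audio, and the limits passed to clip_decibels are illustrative values rather than a documented default; both function names are hypothetical.

```python
def count_windows(n_samples, window_samples=512, overlap_samples=256):
    """Number of full windows that fit, assuming no padding is applied."""
    hop = window_samples - overlap_samples
    if n_samples < window_samples:
        return 0
    return 1 + (n_samples - window_samples) // hop


def clip_decibels(values, limits):
    """Set values below limits[0] to limits[0] and values above limits[1] to limits[1]."""
    low, high = limits
    return [min(max(v, low), high) for v in values]


# 5 seconds of audio at a sample rate of 22050 Hz.
n_windows = count_windows(5 * 22050)
# Illustrative limits only, not a documented default.
clipped = clip_decibels([-120.0, -60.0, -5.0], limits=(-100.0, -20.0))
```

With a 512-sample window and 256-sample overlap, each new window advances by 256 samples.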
class opensoundscape.preprocess.actions.AudioTrimmer(**kwargs)

Action child class for trimming audio (Audio -> Audio)

Trims an audio file to the desired length. Allows audio to be trimmed from the start or from a random time. Optionally extends audio shorter than audio_length with silence.

Parameters:
  • audio_length – desired final length (sec); if None, no trim is performed
  • extend – if True, clips shorter than audio_length are extended with silence to required length
  • random_trim – if True, a random segment of length audio_length is chosen from the input audio. If False, the file is trimmed from 0 seconds to audio_length seconds.
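The interaction of the three parameters can be sketched on a plain list of samples. trim_samples is a hypothetical helper, and returning short audio unchanged when extend=False is an assumption made for this sketch.

```python
import random


def trim_samples(samples, sample_rate, audio_length=None, extend=False,
                 random_trim=False, rng=None):
    """Trim or pad a list of samples; a toy model of the parameters above."""
    if audio_length is None:
        return list(samples)  # no trim requested
    target = int(round(audio_length * sample_rate))
    if len(samples) <= target:
        # Shorter input: pad with silence only when extend=True (assumed behavior).
        padding = [0.0] * (target - len(samples)) if extend else []
        return list(samples) + padding
    # Longer input: start at 0, or at a random offset when random_trim=True.
    rng = rng or random.Random()
    start = rng.randint(0, len(samples) - target) if random_trim else 0
    return list(samples[start:start + target])


audio = [float(i) for i in range(10)]  # 10 samples at 2 Hz = 5 seconds
trimmed = trim_samples(audio, 2, audio_length=2.0)
padded = trim_samples(audio, 2, audio_length=6.0, extend=True)
random_clip = trim_samples(audio, 2, audio_length=2.0, random_trim=True,
                           rng=random.Random(0))
```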
class opensoundscape.preprocess.actions.BaseAction(**kwargs)

Parent class for all Actions (used in Preprocessor pipelines)

New actions should subclass this class.

Subclasses should set self.requires_labels = True if go() expects (X,y) instead of (X). y is a row of a dataframe (a pd.Series) with index (.name) = original file path, columns=class names, values=labels (0,1). X is the sample, and can be of various types (path, Audio, Spectrogram, Tensor, etc). See ImgOverlay for an example of an Action that uses labels.
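A pipeline runner can use requires_labels to decide how to call each action. The sketch below uses hypothetical classes and a SimpleNamespace in place of the pandas Series row; it only illustrates the dispatch described above.

```python
from types import SimpleNamespace


class ScaleAction:
    requires_labels = False

    def go(self, x):
        return x * 2


class TagWithSource:
    """Hypothetical action that reads the source path from the label row's .name."""
    requires_labels = True

    def go(self, x, x_labels):
        return (x_labels.name, x)


def run(pipeline, x, x_labels):
    for action in pipeline:
        # Pass the label row only to actions that declare they need it.
        x = action.go(x, x_labels) if action.requires_labels else action.go(x)
    return x


# Stand-in for a label row: .name holds the (hypothetical) original file path.
row = SimpleNamespace(name="clips/rec_001.wav", labels={"class_a": 1})
out = run([ScaleAction(), TagWithSource()], 21, row)
```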

class opensoundscape.preprocess.actions.FrequencyMask(**kwargs)

add random horizontal bars over image

Parameters:
  • max_masks – max number of horizontal bars [default: 3]
  • max_width – maximum size of horizontal bars as fraction of image height
go(x)

torch Tensor in, torch Tensor out

class opensoundscape.preprocess.actions.ImgOverlay(overlay_df, audio_length, loader_pipeline, update_labels, **kwargs)

iteratively overlay images on top of each other

Overlays images from overlay_df on top of the sample with probability overlay_prob until stopping condition. If necessary, trims overlay audio to the length of the input audio. Overlays the images on top of each other with a weight.

Overlays can be used in a few general ways:
  1. a separate df where any file can be overlaid (overlay_class=None)
  2. same df as training, where the overlay class is “different”, i.e., does not contain overlapping labels with the original sample
  3. same df as training, where samples from a specific class are used for overlays
Parameters:
  • overlay_df – a labels dataframe with audio files as the index and classes as columns
  • audio_length – length in seconds of original audio sample
  • loader_pipeline – the preprocessing pipeline to load audio -> spec
  • update_labels – if True, add overlaid sample’s labels to original sample
  • overlay_class – how to choose files from overlay_df to overlay [default: “different”]. Options:
      - None: randomly select any file from overlay_df
      - “different”: select a random file from overlay_df containing none of the classes this file contains
      - specific class name: always choose files from this class

  • overlay_prob – the probability of applying each subsequent overlay
  • max_overlay_num – the maximum number of samples to overlay on the original. For example, if overlay_prob=0.5 and max_overlay_num=2, 1/2 of images will receive 1 overlay and 1/4 will receive an additional second overlay
  • overlay_weight – can be a float between 0-1 or range of floats (chooses randomly from within range) such as [0.1,0.7]. An overlay_weight <0.5 means more emphasis on original image.
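The arithmetic behind overlay_prob and max_overlay_num, and the effect of overlay_weight, can be checked with a short sketch. It assumes that overlays are drawn one at a time until a draw fails or the cap is reached, and that the overlay is a weighted average of the two images; both function names are hypothetical.

```python
def overlay_count_distribution(overlay_prob, max_overlay_num):
    """Probability of ending with exactly k overlays, for k = 0..max_overlay_num."""
    probs = []
    for k in range(max_overlay_num):
        # k draws succeed, then the next draw fails.
        probs.append(overlay_prob ** k * (1 - overlay_prob))
    # Reaching the cap requires max_overlay_num successes in a row.
    probs.append(overlay_prob ** max_overlay_num)
    return probs


def blend(original, overlay, overlay_weight):
    """Weighted average; weights below 0.5 emphasize the original (assumed formula)."""
    return [(1 - overlay_weight) * o + overlay_weight * v
            for o, v in zip(original, overlay)]


dist = overlay_count_distribution(overlay_prob=0.5, max_overlay_num=2)
mixed = blend([1.0, 0.0], [0.0, 1.0], overlay_weight=0.25)
```

For overlay_prob=0.5 and max_overlay_num=2, half of the samples get at least one overlay and a quarter get two, which matches the example in the parameter description.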
go(x, x_labels)

Overlay images from overlay_df

class opensoundscape.preprocess.actions.ImgToTensor(**kwargs)

Convert PIL image to RGB Tensor (PIL.Image -> Tensor)

Converts a PIL.Image with range [0,255] to a torch Tensor with range [0,1]. Converts the image to RGB (3 channels).

class opensoundscape.preprocess.actions.ImgToTensorGrayscale(**kwargs)

Convert PIL image to grayscale Tensor (PIL.Image -> Tensor)

Converts a PIL.Image with range [0,255] to a torch Tensor with range [0,1]. Converts the image to grayscale (1 channel).

class opensoundscape.preprocess.actions.SaveTensorToDisk(save_path, **kwargs)

save a torch Tensor to disk (Tensor -> Tensor)

Requires x_labels because the index of the label-row (.name) gives the original file name for this sample.

Uses torchvision.utils.save_image. Creates save_path dir if it doesn’t exist

Parameters: save_path – a directory where tensor will be saved
go(x, x_labels)

we require x_labels because the .name gives origin file name

class opensoundscape.preprocess.actions.SpecToImg(**kwargs)

Action class to transform Spectrogram to PIL image

(Spectrogram -> PIL.Image)

Parameters:
  • destination – a file path (string)
  • shape=None – tuple of image dimensions for 1 channel, eg (224,224)
  • mode="RGB" – RGB for 3-channel color or “L” for 1-channel grayscale
  • spec_range=[-100,-20] – the lowest and highest possible values in the spectrogram
class opensoundscape.preprocess.actions.SpectrogramBandpass(**kwargs)

Action class for Spectrogram.bandpass() (Spectrogram -> Spectrogram)

see opensoundscape.spectrogram.Spectrogram.bandpass() for documentation

To bandpass the spectrogram from 1 kHz to 5 kHz: action = SpectrogramBandpass(1000,5000)

Parameters:
  • min_f – low frequency in Hz for bandpass
  • max_f – high frequency in Hz for bandpass
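Conceptually, bandpassing a spectrogram keeps only the frequency rows between min_f and max_f. The NumPy sketch below assumes rows correspond to frequencies and that both bounds are inclusive; bandpass_rows is a hypothetical helper, not the library's method.

```python
import numpy as np


def bandpass_rows(spectrogram, frequencies, min_f, max_f):
    """Keep only the rows whose frequency lies in [min_f, max_f]."""
    keep = (frequencies >= min_f) & (frequencies <= max_f)
    return spectrogram[keep, :], frequencies[keep]


# Toy spectrogram: 12 frequency rows (0 Hz to 11 kHz) by 4 time columns.
frequencies = np.arange(0, 12000, 1000)
spectrogram = np.zeros((12, 4))
band, band_freqs = bandpass_rows(spectrogram, frequencies, min_f=1000, max_f=5000)
```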
class opensoundscape.preprocess.actions.TensorAddNoise(**kwargs)

Add gaussian noise to sample (Tensor -> Tensor)

Parameters: std – standard deviation for Gaussian noise [default: 1]

Note: be aware that scaling before/after this action will change the effect of a fixed stdev Gaussian noise
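The note about scaling can be made concrete: the same noise standard deviation is large relative to a sample in [0, 1] and negligible relative to the same sample on a 0-255 scale. The helpers below are hypothetical and use NumPy arrays in place of torch Tensors.

```python
import numpy as np

rng = np.random.default_rng(0)


def add_noise(x, std=1.0):
    """Add zero-mean Gaussian noise with the given standard deviation."""
    return x + rng.normal(loc=0.0, scale=std, size=x.shape)


def noise_to_range_ratio(x, std):
    """Noise standard deviation relative to the value range of the sample."""
    return std / (x.max() - x.min())


sample_01 = np.linspace(0.0, 1.0, 5)   # values in [0, 1]
sample_255 = sample_01 * 255.0         # same sample on a 0-255 scale
noisy = add_noise(sample_01, std=0.1)
ratio_01 = noise_to_range_ratio(sample_01, std=0.1)
ratio_255 = noise_to_range_ratio(sample_255, std=0.1)
```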

class opensoundscape.preprocess.actions.TensorAugment(**kwargs)

combination of 3 augmentations with hard-coded parameters

time warp, time mask, and frequency mask

use (bool) time_warp, time_mask, freq_mask to turn each on/off

go(x)

torch Tensor in, torch Tensor out

class opensoundscape.preprocess.actions.TensorNormalize(**kwargs)

torchvision.transforms.Normalize (WARNING: FIXED shift and scale)

(Tensor->Tensor)

WARNING: This does not perform per-image normalization. Instead, it takes as arguments a fixed u and s, i.e., for the entire dataset, and performs X=(X-u)/s.

Parameters:
  • mean=0.5
  • std=0.5
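As a worked example of X=(X-u)/s with the listed values u=0.5 and s=0.5, inputs in [0, 1] are mapped to [-1, 1]. The function below is a plain-Python sketch of the formula, not the torchvision call itself.

```python
def normalize(values, mean=0.5, std=0.5):
    """Apply the same fixed shift and scale to every value: (x - mean) / std."""
    return [(x - mean) / std for x in values]


out = normalize([0.0, 0.5, 1.0])
```

The same shift and scale are applied regardless of the contents of each image, which is why this is not per-image normalization.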
class opensoundscape.preprocess.actions.TimeMask(**kwargs)

add random vertical bars over image (Tensor -> Tensor)

Parameters:
  • max_masks – maximum number of bars [default: 3]
  • max_width – maximum width of vertical bars as fraction of image width [default: 0.2]
class opensoundscape.preprocess.actions.TimeWarp(**kwargs)

Time warp is an experimental augmentation that creates a tilted image.

Parameters: warp_amount – use higher values for more skew and offset (experimental)
class opensoundscape.preprocess.actions.TorchColorJitter(**kwargs)

Action class for torchvision.transforms.ColorJitter

(Tensor -> Tensor) or (PIL Img -> PIL Img)

Parameters:
  • brightness=0.3
  • contrast=0.3
  • saturation=0.3
  • hue=0
class opensoundscape.preprocess.actions.TorchRandomAffine(**kwargs)

Action class for torchvision.transforms.RandomAffine

(Tensor -> Tensor) or (PIL Img -> PIL Img)

Parameters:
  • degrees=0
  • fill
  • =

Note: If applying per-image normalization, we recommend applying RandomAffine after image normalization. In this case, an intermediate gray value is ~0. If normalization is applied after RandomAffine on a PIL image, use an intermediate fill color such as (122,122,122).

Image Augmentation

Transforms and augmentations for PIL.Images

opensoundscape.preprocess.img_augment.time_split(img, seed=None)

Given a PIL.Image, split into left/right parts and swap

Randomly chooses the slicing location. For example, if h is chosen:

abcdefghijklmnop
       ^

hijklmnop + abcdefg

Parameters: img – A PIL.Image
Returns: A PIL.Image
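The split-and-swap operation can be sketched on a string, where each character stands for one column of the image. split_and_swap is a hypothetical helper; the real function operates on a PIL.Image.

```python
import random


def split_and_swap(sequence, cut=None, seed=None):
    """Cut the sequence at a position (random if cut is None) and swap the two parts."""
    if cut is None:
        cut = random.Random(seed).randrange(len(sequence))
    return sequence[cut:] + sequence[:cut], cut


columns = "abcdefghijklmnop"
example, _ = split_and_swap(columns, cut=columns.index("h"))
swapped, used_cut = split_and_swap(columns, seed=1)
```

Cutting at h reproduces the example above: the right part hijklmnop moves in front of abcdefg.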

Tensor Augmentation

Augmentations and transforms for torch.Tensors

These functions were implemented for PyTorch in https://github.com/zcaceres/spec_augment. The original paper is available at https://arxiv.org/abs/1904.08779

opensoundscape.preprocess.tensor_augment.freq_mask(spec, F=30, max_masks=3, replace_with_zero=False)

draws horizontal bars over the image

F: maximum frequency-width of bars in pixels

max_masks: maximum number of bars to draw

replace_with_zero: if True, bars are 0s, otherwise, mean img value

opensoundscape.preprocess.tensor_augment.time_mask(spec, T=40, max_masks=3, replace_with_zero=False)

draws vertical bars over the image

T: maximum time-width of bars in pixels

max_masks: maximum number of bars to draw

replace_with_zero: if True, bars are 0s, otherwise, mean img value
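Both masking functions follow the same idea: pick a few random bars and overwrite them with zeros or with the mean image value. The NumPy sketch below is a toy reimplementation under stated assumptions (uniform random widths and positions, 2D input with rows as frequency and columns as time); mask_bars is a hypothetical name and the library's sampling details may differ.

```python
import numpy as np


def mask_bars(spec, max_width, max_masks=3, axis=0, replace_with_zero=False, seed=None):
    """Overwrite up to max_masks random bars; axis=0 masks rows, axis=1 masks columns."""
    rng = np.random.default_rng(seed)
    out = spec.copy()
    fill = 0.0 if replace_with_zero else spec.mean()
    size = spec.shape[axis]
    for _ in range(int(rng.integers(1, max_masks + 1))):
        width = int(rng.integers(0, max_width + 1))      # bar width in pixels
        start = int(rng.integers(0, size - width + 1))   # bar position
        if axis == 0:
            out[start:start + width, :] = fill  # horizontal bar (frequency mask)
        else:
            out[:, start:start + width] = fill  # vertical bar (time mask)
    return out


spec = np.ones((64, 100))
freq_masked = mask_bars(spec, max_width=30, axis=0, replace_with_zero=True, seed=0)
time_masked = mask_bars(spec, max_width=40, axis=1, replace_with_zero=True, seed=0)
```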

opensoundscape.preprocess.tensor_augment.time_warp(spec, W=5)

apply time stretch and shearing to spectrogram

fills empty space on right side with horizontal bars

W controls amount of warping. Random with occasional large warp.