opensoundscape.preprocess package

Submodules

opensoundscape.preprocess.action_functions module

preprocessing and augmentation functions

these can be passed to the Action class (action_fn=…) to create a preprocessing action that applies the function to a sample

opensoundscape.preprocess.action_functions.adaptive_random_gain(audio, gain_range=(-30, 0), min_output_level=-40, clip_range=(-1, 1))[source]

apply gain while maintaining a minimum resulting dBFS level

applies a randomly selected gain level to an Audio object, while ensuring that the resulting audio has at least min_output_level dBFS (while respecting the maximum gain allowed in the gain_range argument)

Parameters:

audio – an Audio object
gain_range – (min,max) decibels of gain to apply - dB gain applied is chosen from a uniform random distribution in this range
min_output_level – minimum dBFS level of resulting audio - if audio.dBFS + gain < min_output_level, gain_range is restricted to ensure resulting audio is at least min_output_level dBFS
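
The range restriction described above can be sketched in plain Python. This is a minimal illustration of the documented behavior, not the library's implementation; the helper name choose_adaptive_gain and its audio_dbfs argument (standing in for audio.dBFS) are hypothetical.

```python
import random


def choose_adaptive_gain(audio_dbfs, gain_range=(-30, 0), min_output_level=-40):
    """Pick a random gain (dB) so that audio_dbfs + gain >= min_output_level when possible.

    Hypothetical helper: audio_dbfs stands in for the level of an Audio object.
    """
    min_gain, max_gain = gain_range
    # smallest gain that still keeps the output at min_output_level dBFS
    required_gain = min_output_level - audio_dbfs
    # raise the lower bound if needed, but never above the maximum allowed gain
    lower = min(max(min_gain, required_gain), max_gain)
    return random.uniform(lower, max_gain)


# a -20 dBFS clip can be attenuated by at most 20 dB before dropping below -40 dBFS
gain = choose_adaptive_gain(audio_dbfs=-20)
```

If the input is so quiet that even the maximum gain cannot reach min_output_level (for example a -50 dBFS clip with the defaults), this sketch falls back to the maximum gain, which matches the "respecting the maximum gain" wording above.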

opensoundscape.preprocess.action_functions.adaptive_random_noise(audio, snr_range=(-20, 0), input_gain=0, color='white')[source]

apply random noise, selecting from a signal to noise ratio range

Parameters:

audio – an Audio object
snr_range – (min,max) decibels of signal to noise ratio - SNR is defined here as signal_dB - noise_dBFS - SNR is chosen from a uniform random distribution in this range
input_gain – dB (decibels) gain to apply to the incoming Audio before mixing with noise [default: 0 dB]
color – color of noise to add (see Audio.noise() color arg) options: “white”, “pink”, “brownian”, “brown”, “violet”, “blue”

Returns: Audio object with noise added
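
Since SNR is defined here as signal_dB - noise_dBFS, the noise level follows from the drawn SNR by simple subtraction. The sketch below shows that arithmetic under the assumption that the signal level is known in dBFS; the function names are hypothetical and this is not the library's code.

```python
import random


def noise_dbfs_for_snr(signal_dbfs, snr_range=(-20, 0)):
    """Draw an SNR uniformly from snr_range and return (snr, noise level in dBFS)."""
    snr = random.uniform(*snr_range)
    # SNR = signal_dB - noise_dBFS, so noise_dBFS = signal_dB - SNR
    return snr, signal_dbfs - snr


def db_to_amplitude_ratio(db):
    """Convert a level difference in decibels to a linear amplitude ratio."""
    return 10 ** (db / 20)


snr, noise_level = noise_dbfs_for_snr(signal_dbfs=-30)
```

With the default snr_range of (-20, 0), the SNR is never positive, so the generated noise is at least as loud as the signal.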

opensoundscape.preprocess.action_functions.audio_add_noise(audio, noise_dB=-30, signal_dB=0, color='white')[source]

Generates noise and adds to audio object

Parameters:

audio – an Audio object
noise_dB – number or range: dBFS of noise signal generated - if number, creates noise with this dBFS level - if (min,max) tuple, chooses noise dBFS randomly from range with a uniform distribution
signal_dB – dB (decibels) gain to apply to the incoming Audio before mixing with noise [default: 0 dB] - like noise_dB, can specify (min,max) tuple to use random uniform choice in range

Returns: Audio object with noise added

opensoundscape.preprocess.action_functions.audio_random_gain(audio, dB_range=(-30, 0), clip_range=(-1, 1))[source]

Applies a randomly selected gain level to an Audio object

Gain is selected from a uniform distribution in the range dB_range

Parameters:

audio – an Audio object
dB_range – (min,max) decibels of gain to apply - dB gain applied is chosen from a uniform random distribution in this range

Returns: Audio object with gain applied

opensoundscape.preprocess.action_functions.audio_time_mask(audio, max_masks=10, max_width=0.02, noise_to_signal_dB=10, noise_color='white')[source]

randomly replace time slices with noise

Adaptively selects noise level relative to the signal level of the input audio

Parameters:

audio – input Audio object
max_masks – maximum number of white noise time masks [default: 10]
max_width – maximum size of bars as fraction of sample width [default: 0.02]
noise_to_signal_dB – desired noise:signal ratio in dB. Positive values mean noise is louder than signal, negative values mean noise is quieter. Signal level is calculated as audio.dBFS ie the temporal average level.
noise_color – see Audio.noise() dBFS and color args

Returns:

augmented Audio object

opensoundscape.preprocess.action_functions.frequency_mask(tensor, max_masks=3, max_width=0.05)[source]

add random horizontal bars over Tensor

Parameters:

tensor – input torch.Tensor sample
max_masks – max number of horizontal bars [default: 3]
max_width – maximum size of horizontal bars as fraction of sample height

Returns:

augmented tensor
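
As a rough picture of what "random horizontal bars" means, the sketch below masks rows of a 2D list. It assumes, for illustration only, that the number of bars and each bar's height are drawn uniformly and that bars are filled with a constant value; the real function operates on torch tensors and may differ in these details. The name mask_rows is hypothetical.

```python
import random


def mask_rows(image, max_masks=3, max_width=0.05, fill=0.0):
    """Return a copy of a 2D list with up to max_masks horizontal bars set to `fill`.

    Assumptions for illustration: the number of bars and each bar's height are
    drawn uniformly, and bars are filled with a constant value.
    """
    height = len(image)
    masked = [row[:] for row in image]  # copy so the input is unchanged
    max_rows = max(1, int(max_width * height))  # tallest bar, in rows
    for _ in range(random.randint(0, max_masks)):
        bar_height = random.randint(1, max_rows)
        start = random.randint(0, height - bar_height)
        for r in range(start, start + bar_height):
            masked[r] = [fill] * len(masked[r])
    return masked


image = [[1.0] * 8 for _ in range(40)]  # 40 frequency bins x 8 time frames
masked = mask_rows(image, max_masks=3, max_width=0.05)
```

Here max_width=0.05 of a 40-row sample allows bars up to 2 rows tall, so at most 6 rows are masked.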

opensoundscape.preprocess.action_functions.image_to_tensor(img, greyscale=False)[source]

Convert PIL image to RGB or greyscale Tensor (PIL.Image -> Tensor)

convert PIL.Image w/range [0,255] to torch Tensor w/range [0,1]

Parameters:

img – PIL.Image
greyscale – if False, converts image to RGB (3 channels). If True, converts image to one channel.

opensoundscape.preprocess.action_functions.list_action_fns()[source]: return list of available action function keyword strings (can be used to initialize Action class)

opensoundscape.preprocess.action_functions.pcen(s, **kwargs)[source]

opensoundscape.preprocess.action_functions.random_lowpass(audio, cutoff_range=(3000, 9000), probability=0.5, order_range=(1, 1))[source]

randomly apply lowpass filter to audio

Parameters:

audio – an Audio object
cutoff_range – (min,max) frequency range in Hz for cutoff frequency - cutoff frequency is chosen randomly from this range with uniform distribution
probability – probability of applying the lowpass filter
order_range – (min,max) range of filter orders to choose from - order is chosen randomly from this range with uniform distribution - higher order = steeper filter rolloff; the default order of 1 gives a gentle rolloff, while an order of 2 already creates a rolloff steep enough to eliminate most high-frequency content

Returns:

Audio object, possibly lowpass filtered

opensoundscape.preprocess.action_functions.random_wrap_audio(audio, probability=0.5, max_shift=None)[source]

Randomly splits the audio into two parts, swapping their order

useful as a “time shift” augmentation when extra audio beyond the bounds is not available

Parameters:

audio – an Audio object
probability – probability of performing the augmentation
max_shift – max number of seconds to shift, default None means no limit

opensoundscape.preprocess.action_functions.register_action_fn(action_fn)[source]

add function to ACTION_FN_DICT

this allows us to recreate the Action class with a named action_fn

see also: ACTION_DICT (stores list of named classes for preprocessing)
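
The registration pattern can be sketched with a plain dictionary and a decorator. The registry name, the decorator name, and the "module.name" key format below are assumptions for illustration (the key format is modeled on the ‘__main__.my_special_sauce’ example in action_from_dict()); the library's own ACTION_FN_DICT may be organized differently.

```python
ACTION_FN_REGISTRY = {}  # hypothetical stand-in for ACTION_FN_DICT


def register_fn(fn):
    """Decorator: store fn under its 'module.name' key and return it unchanged."""
    key = f"{fn.__module__}.{fn.__qualname__}"
    ACTION_FN_REGISTRY[key] = fn
    return fn


@register_fn
def double_gain(sample, factor=2):
    """Toy action function: the sample is the first argument."""
    return [x * factor for x in sample]


# the function can now be looked up by name, e.g. when re-loading from a dictionary
key = f"{double_gain.__module__}.double_gain"
restored = ACTION_FN_REGISTRY[key]
```

Because the decorator returns the function unchanged, registering it does not affect how it is called elsewhere.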

opensoundscape.preprocess.action_functions.scale_tensor(tensor, input_mean=0.5, input_std=0.5)[source]

linear scaling of tensor values using torchvision.transforms.Normalize

(Tensor->Tensor)

WARNING: This does not perform per-image normalization. Instead, it takes as arguments a fixed mean and standard deviation (mu and sd), ie for the entire dataset, and performs X=(X-input_mean)/input_std.

Parameters:

input_mean – mean of input sample pixels (average across dataset)
input_std – standard deviation of input sample pixels (average across dataset)
(these are NOT the target mu and sd, but the original mu and sd of img, for which the output will have mu=0, std=1)

Returns:

modified tensor
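
The scaling formula X=(X-input_mean)/input_std can be checked on a few plain numbers. This is a minimal sketch on a Python list rather than a tensor; scale_values is a hypothetical name.

```python
def scale_values(values, input_mean=0.5, input_std=0.5):
    """Apply X = (X - input_mean) / input_std to every value using fixed statistics."""
    return [(x - input_mean) / input_std for x in values]


# with the defaults, values in [0, 1] are mapped to [-1, 1]
scaled = scale_values([0.0, 0.5, 1.0])
```

The same fixed mean and standard deviation are applied to every sample, which is why this is not per-image normalization.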

opensoundscape.preprocess.action_functions.tensor_add_noise(tensor, std=1)[source]

Add gaussian noise to sample (Tensor -> Tensor)

Parameters:: std – standard deviation for Gaussian noise [default: 1]

Note: be aware that scaling before/after this action will change the effect of a fixed stdev Gaussian noise

opensoundscape.preprocess.action_functions.time_mask(tensor, max_masks=4, max_width=0.05)[source]

add random vertical bars over sample (Tensor -> Tensor)

Parameters:

tensor – input torch.Tensor sample
max_masks – maximum number of vertical bars [default: 4]
max_width – maximum size of bars as fraction of sample width

Returns:

augmented tensor

opensoundscape.preprocess.action_functions.torch_color_jitter(tensor, brightness=0.3, contrast=0.3, saturation=0.3, hue=0)[source]

Wraps torchvision.transforms.ColorJitter

(Tensor -> Tensor) or (PIL Img -> PIL Img)

Parameters:

tensor – input sample
brightness=0.3
contrast=0.3
saturation=0.3
hue=0

Returns:

modified tensor

opensoundscape.preprocess.action_functions.torch_random_affine(tensor, degrees=0, translate=(0.2, 0.05), fill=0)[source]

Wrapper for torchvision.transforms.RandomAffine

(Tensor -> Tensor) or (PIL Img -> PIL Img)

Parameters:

tensor – torch.Tensor input sample
degrees = 0
translate = (0.2, 0.05)
fill = 0-255, duplicated across channels

Returns:

modified tensor

Note: If applying per-image normalization, we recommend applying RandomAffine after image normalization. In this case, an intermediate gray value is ~0. If normalization is applied after RandomAffine on a PIL image, use an intermediate fill color such as (122,122,122).

opensoundscape.preprocess.actions module

Actions for augmentation and preprocessing pipelines

This module contains Action classes which act as the elements in Preprocessor pipelines. Action classes have a __call__() method that operates on an audio sample, using the .params dictionary of parameter values. They take a single sample of a specific type and return the transformed or augmented sample, which may or may not be the same type as the original.

See the action_functions.py module for functions that can be used to create actions using the Action class. Pass any function to the Action class as the action_fn argument, and pass additional keyword arguments to set parameters in the Action’s .params dictionary.

Note on converting to/from dictionary/json/yaml: This will break if you use non-built-in preprocessing operations, unless you define any custom functions/classes in your code and decorate them with @register_action_cls or @register_action_fn. See the docstring of action_from_dict() for examples.

See the preprocessor module and Preprocessing tutorial for details on how to use and create your own actions.

class opensoundscape.preprocess.actions.Action(fn, is_augmentation=False, **kwargs)[source]

Bases: BaseAction

Action class for an arbitrary function

The function must take the sample as the first argument

Note that this allows two use cases:

(A) regular function that takes an input object as first argument, eg. Audio.from_file(path,**kwargs)

(B) method of a class, which takes ‘self’ as the first argument, eg. Spectrogram.bandpass(self,**kwargs)

Other arguments are an arbitrary list of kwargs.

classmethod from_dict(dict)[source]: initialize from dictionary created by .to_dict()

to_dict(ignore_attributes=())[source]

export current attributes and .params to a dictionary

useful for saving to JSON

re-load with .from_dict(dict)
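
The idea of wrapping an arbitrary function and storing its keyword arguments as parameters can be sketched as follows. MiniAction is a toy stand-in written for this illustration, not the library's Action class, and it works on plain lists rather than AudioSample objects.

```python
class MiniAction:
    """Toy stand-in for an action that wraps an arbitrary function.

    The wrapped function must take the sample as its first argument; all
    other keyword arguments are stored in .params and passed on each call.
    """

    def __init__(self, fn, is_augmentation=False, **kwargs):
        self.fn = fn
        self.is_augmentation = is_augmentation
        self.params = dict(kwargs)

    def set(self, **kwargs):
        # update stored parameters in place
        self.params.update(kwargs)

    def __call__(self, sample):
        return self.fn(sample, **self.params)


def clip_values(sample, low=-1.0, high=1.0):
    """Example function: limit every value to the range [low, high]."""
    return [min(max(x, low), high) for x in sample]


clip = MiniAction(clip_values, low=-0.5, high=0.5)
clipped = clip([-1.0, 0.2, 0.9])
clip.set(high=0.1)  # later calls use the updated parameter
reclipped = clip([-1.0, 0.2, 0.9])
```

Keeping parameters in a dictionary, rather than baking them into the function, is what lets them be inspected, changed, and exported later.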

class opensoundscape.preprocess.actions.AudioClipLoader(out_of_bounds_mode='ignore', **kwargs)[source]

Bases: Action

Action to load clips from an audio file

Loads an audio file or part of a file to an Audio object. Will load entire audio file if sample.start_time and sample.duration are None. If sample.start_time and sample.duration are provided, loads the audio only in the specified interval.

see Audio.from_file() for documentation.

Parameters:: see Audio.from_file()

class opensoundscape.preprocess.actions.AudioToSamplesTensor(is_augmentation=False)[source]

Bases: BaseAction

extract Audio.samples to a PyTorch tensor and add channel dimensions

class opensoundscape.preprocess.actions.AudioToTensor(is_augmentation=False)[source]: Bases: BaseAction

class opensoundscape.preprocess.actions.AudioTrim(**kwargs)[source]

Bases: Action

Action to trim/extend audio to desired length

Parameters:: see actions.trim_audio()

class opensoundscape.preprocess.actions.BaseAction(is_augmentation=False)[source]

Bases: object

Parent class for all Actions (used in Preprocessor pipelines)

New actions should subclass this class (or Action for pre-wired functionality).

classmethod from_dict(dict)[source]

initialize from dictionary created by .to_dict()

override if subclass should be initialized with any arguments

get(arg)[source]

set(**kwargs)[source]

to_dict(ignore_attributes=())[source]

export current attributes and .params to a dictionary

useful for saving to JSON

Parameters:: ignore_attributes – list of str: attributes to not save (useful for skipping large objects to reduce memory usage)

re-load with .from_dict(dict)

class opensoundscape.preprocess.actions.MelScale(*args: Any, **kwargs: Any)[source]

Bases: MelScale

Patch of torchaudio.transforms.MelScale that saves n_stft attribute

This allows re-loading from a dictionary with the correct n_stft value.

class opensoundscape.preprocess.actions.SpectrogramToTensor(fn=<function Spectrogram.to_image>, is_augmentation=False, **kwargs)[source]

Bases: Action

Action to create Tensor of desired shape from Spectrogram

calls .to_image on sample.data, which should be type Spectrogram

**kwargs are passed to Spectrogram.to_image()

class opensoundscape.preprocess.actions.TorchCropFreq(*args: Any, **kwargs: Any)[source]

Bases: Module

forward(tensor)[source]

class opensoundscape.preprocess.actions.TorchTransforms(transforms)[source]

Bases: BaseAction

Action to apply torchvision transforms to sample

Parameters:: transforms – list of torchvision transform objects to apply in sequence; see https://pytorch.org/vision/stable/transforms.html and https://pytorch.org/audio/stable/transforms.html

classmethod from_dict(dict)[source]: initialize from dictionary created by .to_dict()

to_dict(ignore_attributes=())[source]

export the composed transforms and their parameters to a dictionary

useful for saving to JSON

Will fail if any of the transforms or their parameters are not serializable.

property transforms

opensoundscape.preprocess.actions.action_from_dict(dict)[source]

load an action from a dictionary

Parameters:: dict – dictionary created with Action.to_dict() - contains keys ‘class’, ‘params’, and other keys for object attributes

Note: if the dictionary names a ‘class’ or ‘action_fn’ that is not built-in to OpenSoundscape, you should define the class/action in your code and add the decorator @register_action_cls or @register_action_fn

For instance, if we used the Action class and passed a custom action_fn:

    @register_action_fn
    def my_special_sauce(…):
        …

Now we can use action_from_dict() to re-create an action that specifies ‘action_fn’:’__main__.my_special_sauce’

Similarly, say we defined a custom class in a module my_utils.py. We add the decorator before the class definition:

    @register_action_cls
    class Magic(BaseAction):
        …

Now we can use action_from_dict() to re-create the class from a dictionary that has ‘class’ : ‘my_utils.Magic’

opensoundscape.preprocess.actions.deserialize_transform(transform_dict)[source]: Recreate a transform from a serialized dict.

opensoundscape.preprocess.actions.list_actions()[source]: return list of available Action class keyword strings

opensoundscape.preprocess.actions.register_action_cls(action_cls)[source]: add class to ACTION_DICT

opensoundscape.preprocess.actions.register_all_methods(cls, public_only=True)[source]

opensoundscape.preprocess.actions.serialize_transform(transform)[source]: Convert a torchvision/torchaudio transform object into a JSON-serializable dict.

opensoundscape.preprocess.actions.trim_audio(sample, target_duration, extend=True, random_trim=False, tol=1e-10)[source]

trim audio clips from t=0 or random position (Audio -> Audio)

Trims an audio file to desired length.

Allows audio to be trimmed from start or from a random time

Optionally extends audio shorter than clip_length to sample.duration by appending silence.

Parameters:

sample – AudioSample with .data=Audio object, .duration as sample duration
target_duration – length of resulting clip in seconds. If None, no trimming is performed.
extend – if True, clips shorter than sample.duration are extended with silence to required length [Default: True]
random_trim – if True, chooses a random segment of length sample.duration from the input audio. If False, the file is trimmed from 0 seconds to sample.duration seconds. [Default: False]
tol – tolerance for considering a clip to be long enough (sec), when raising an error for short clips [Default: 1e-10]

Effects:: Updates the sample’s .data, .start_time, and .duration attributes
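
The trim-or-extend logic can be sketched on a plain list of samples. This is an illustration under stated assumptions, not the library's code: the helper name is hypothetical, it works on a list instead of an AudioSample, and raising ValueError for short clips when extend=False is a choice made for this example.

```python
import random


def trim_or_extend(samples, sample_rate, target_duration, extend=True, random_trim=False):
    """Trim a list of audio samples to target_duration seconds, padding with zeros if short."""
    target_len = int(round(target_duration * sample_rate))
    if len(samples) >= target_len:
        # start at 0, or at a random offset that still leaves a full-length clip
        start = random.randint(0, len(samples) - target_len) if random_trim else 0
        return samples[start : start + target_len]
    if not extend:
        raise ValueError("clip is shorter than target_duration and extend=False")
    # append silence (zeros) to reach the target length
    return samples + [0.0] * (target_len - len(samples))


long_clip = [0.1] * 10  # 10 samples at 4 Hz = 2.5 s
short_clip = [0.1] * 2  # 2 samples at 4 Hz = 0.5 s
trimmed = trim_or_extend(long_clip, sample_rate=4, target_duration=1.0)
extended = trim_or_extend(short_clip, sample_rate=4, target_duration=1.0)
```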

opensoundscape.preprocess.img_augment module

Transforms and augmentations for PIL.Images

opensoundscape.preprocess.img_augment.time_split(img, seed=None)[source]

Given a PIL.Image, split into left/right parts and swap

Randomly chooses the slicing location. For example, if h is chosen as the split point in abcdefghijklmnop, the result is hijklmnop + abcdefg

Parameters:: img – A PIL.Image
Returns:: A PIL.Image
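
The split-and-swap operation is easy to see on a string, using the letters from the example above. This sketch uses hypothetical helper names and works on any sliceable sequence; the real function operates on a PIL.Image along the time axis.

```python
import random


def swap_at(items, split):
    """Swap the parts of a sequence before and after index `split`."""
    return items[split:] + items[:split]


def split_and_swap(items, seed=None):
    """Choose the split position at random (reproducible when seed is given)."""
    rng = random.Random(seed)
    return swap_at(items, rng.randint(0, len(items) - 1))


# splitting at "h" (index 7) moves "hijklmnop" in front of "abcdefg"
swapped = swap_at("abcdefghijklmnop", 7)
```

No content is lost: the output is a rotation of the input, so it has the same length and the same elements.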

opensoundscape.preprocess.io module

utilities for serializing, reading, and writing Action and Preprocessor objects to/from files and dictionaries

class opensoundscape.preprocess.io.CustomYamlDumper(*args: Any, **kwargs: Any)[source]: Bases: Dumper

class opensoundscape.preprocess.io.CustomYamlLoader(*args: Any, **kwargs: Any)[source]: Bases: Loader

class opensoundscape.preprocess.io.NumpyTypeDecoder(*args, **kwargs)[source]

Bases: JSONDecoder

recursively modify dictionary to change “numpy_dtype_…” strings to numpy dtypes

opensoundscape.preprocess.overlay module

class opensoundscape.preprocess.overlay.Overlay(overlay_samples, break_on_key, is_augmentation=True, sample_duration=None, **kwargs)[source]

Bases: BaseAction

Action Class for augmentation that overlays samples on each other

Overlay is a flavor of “mixup” augmentation, where two samples are overlayed on top of each other. The samples are blended with a weighted average, where the weight may be chosen randomly from a range of values.

In this implementation, the overlayed samples are chosen from a dataframe of audio files and labels. The dataframe must have the audio file paths as the index, and the labels as columns. The labels are used to choose overlayed samples based on an “overlay_class” argument.

Parameters:

overlay_samples – list or dataframe of audio files (index) and labels to use for overlay
update_labels (bool) – if True, labels of sample are updated to include labels of overlayed sample
criterion_fn – function that takes AudioSample and returns True or False - if True, perform overlay - if False, do not perform overlay Default is always_true, perform overlay on all samples
See overlay() for **kwargs and default values

classmethod from_dict(dict)[source]

initialize from dictionary created by .to_dict()

override if subclass should be initialized with any arguments

to_dict()[source]

export current attributes and .params to a dictionary

useful for saving to JSON

Parameters:: ignore_attributes – list of str: attributes to not save (useful for skipping large objects to reduce memory usage)

re-load with .from_dict(dict)

opensoundscape.preprocess.overlay.always_true(x)[source]

opensoundscape.preprocess.overlay.overlay(sample, overlay_df, update_labels, break_on_key, overlay_class=None, overlay_prob=1, max_overlay_num=1, overlay_weight=0.5, criterion_fn=<function always_true>)[source]

iteratively overlay 2d samples on top of each other

Overlays (blends) image-like samples from overlay_df on top of the sample with probability overlay_prob until stopping condition. If necessary, trims overlay audio to the length of the input audio.

Optionally provide criterion_fn which takes sample and returns True/False to determine whether to perform overlay on this sample.

Overlays can be used in a few general ways:

a separate df where any file can be overlayed (overlay_class=None)
same df as training, where the overlay class is “different” ie, does not contain overlapping labels with the original sample
same df as training, where samples from a specific class are used for overlays

Parameters:

sample – AudioSample with .labels: labels of the original sample and .preprocessor: the preprocessing pipeline
overlay_df – a labels dataframe with audio files as the index and classes as columns
update_labels – if True, add overlayed sample’s labels to original sample
break_on_key – the name of a preprocessing step to stop at when preprocessing the overlayed samples (typically this action’s name)
overlay_class – how to choose files from overlay_df to overlay. Options [default: None]: None - randomly select any file from overlay_df; “different” - select a random file from overlay_df containing none of the classes this file contains; specific class name - always choose files from this class
overlay_prob – the probability of applying each subsequent overlay
max_overlay_num – the maximum number of samples to overlay on original - for example, if overlay_prob = 0.5 and max_overlay_num=2, 1/2 of samples will receive 1 overlay and 1/4 will receive an additional second overlay
overlay_weight – a float > 0 and < 1, or a list of 2 floats [min, max] between which the weight will be randomly chosen. e.g. [0.1,0.7] An overlay_weight <0.5 means more emphasis on original sample.
criterion_fn – function that takes AudioSample and returns True or False - if True, perform overlay - if False, do not perform overlay Default is always_true, perform overlay on all samples

Returns:

overlayed sample, (possibly updated) labels

Example

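check if sample is from a xeno canto file (has “XC” in name), and only perform overlay on xeno canto files

```
from pathlib import Path

def is_xc(audio_sample):
    # True if the source file name contains "XC"
    return "XC" in Path(audio_sample.source).stem

# update_labels is required by the signature above; True is an illustrative choice
s = overlay(s, overlay_df, update_labels=True, break_on_key="overlay", criterion_fn=is_xc)
```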
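
The two numeric ideas in overlay(), the weighted blend and the repeated application with probability overlay_prob, can be sketched on plain lists. Both helpers below are hypothetical and only illustrate the documented behavior; the real function blends image-like samples produced by the preprocessing pipeline.

```python
import random


def blend(original, overlay, overlay_weight):
    """Weighted average of two equal-length samples; the weight applies to the overlay."""
    return [(1 - overlay_weight) * a + overlay_weight * b for a, b in zip(original, overlay)]


def count_overlays(overlay_prob, max_overlay_num, rng=random):
    """Keep adding overlays with probability overlay_prob, up to max_overlay_num."""
    n = 0
    while n < max_overlay_num and rng.random() < overlay_prob:
        n += 1
    return n


# overlay_weight < 0.5 keeps more of the original sample
mixed = blend([1.0, 0.0], [0.0, 1.0], overlay_weight=0.25)
```

With overlay_prob=0.5 and max_overlay_num=2, this loop applies at least one overlay half of the time and a second overlay a quarter of the time, matching the max_overlay_num description.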

opensoundscape.preprocess.preprocessors module

Preprocessor classes: tools for preparing and augmenting audio samples

class opensoundscape.preprocess.preprocessors.AudioAugmentationPreprocessor(**kwargs)[source]

Bases: AudioPreprocessor

AudioPreprocessor that applies augmentations to audio samples during training

class opensoundscape.preprocess.preprocessors.AudioPreprocessor(sample_duration, sample_rate, extend_short_clips=True)[source]

Bases: BasePreprocessor

Child of BasePreprocessor that only loads audio and resamples

Parameters:

sample_duration – length in seconds of audio samples generated
sample_rate – target sample rate. [default: None] does not resample
extend_short_clips – if True, clips shorter than sample_duration are extended to sample_duration by adding silence.

class opensoundscape.preprocess.preprocessors.BasePreprocessor(sample_duration=None, sample_rate=None)[source]

Bases: object

Class for defining an ordered set of Actions and a way to run them

Custom Preprocessor classes should subclass this class or its children

Preprocessors have one job: to transform samples from some input (eg a file path) to some output (eg an AudioSample with .data as torch.Tensor) using a specific procedure defined by the .pipeline attribute. The procedure consists of Actions ordered by the Preprocessor’s .pipeline. Preprocessors have a forward() method which sequentially applies the Actions in the pipeline to produce a sample.

Parameters:: sample_duration – length of audio samples to generate (seconds)

forward(sample, break_on_type=None, break_on_key=None, bypass_augmentations=False, trace=False, profile=False)[source]

perform actions in self.pipeline on a sample (until a break point)

Actions with .bypass = True are skipped. Actions with .is_augmentation = True can be skipped by passing bypass_augmentations=True.

Parameters:

sample – any of - (path, start time) tuple - pd.Series with (file, start_time, end_time) as .name (eg index of a pd.DataFrame from which row was taken) - AudioSample object
break_on_type – if not None, the pipeline will be stopped when it reaches an Action of this class. The matching action is not performed.
break_on_key – if not None, the pipeline will be stopped when it reaches an Action whose index equals this value. The matching action is not performed.
clip_times – can be either - None: the file is treated as a single sample - dictionary {“start_time”:float,”end_time”:float}: the start and end time of clip in audio
bypass_augmentations – if True, actions with .is_augmentation=True are skipped
trace (boolean - default False) – if True, saves the output of each pipeline step in the sample_info output argument Can be used for analysis/debugging of intermediate values of the sample during preprocessing
profile (boolean - default False) – if True, saves the runtime of each pipeline step in .runtime (a series indexed like .pipeline)

Returns:

sample (instance of AudioSample class)
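
The control flow of forward() (skip bypassed actions, optionally skip augmentations, stop before a break point) can be sketched with a plain dictionary as the pipeline. This is a simplified illustration: the real pipeline holds Action objects and AudioSample data, and every name below is hypothetical.

```python
def run_pipeline(pipeline, sample, break_on_key=None, bypass_augmentations=False):
    """Apply (key, action) steps in order, stopping before break_on_key.

    `pipeline` is a plain dict here (insertion-ordered); each action is a dict
    with 'fn', 'bypass', and 'is_augmentation' entries.
    """
    for key, action in pipeline.items():
        if key == break_on_key:
            break  # the matching action is not performed
        if action["bypass"]:
            continue
        if bypass_augmentations and action["is_augmentation"]:
            continue
        sample = action["fn"](sample)
    return sample


pipeline = {
    "load": {"fn": lambda s: s + ["loaded"], "bypass": False, "is_augmentation": False},
    "add_noise": {"fn": lambda s: s + ["noisy"], "bypass": False, "is_augmentation": True},
    "to_tensor": {"fn": lambda s: s + ["tensor"], "bypass": False, "is_augmentation": False},
}

training = run_pipeline(pipeline, [])
prediction = run_pipeline(pipeline, [], bypass_augmentations=True)
partial = run_pipeline(pipeline, [], break_on_key="to_tensor")
```

Stopping before a named step is what allows an overlay action to preprocess a second sample only up to its own position in the pipeline.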

classmethod from_dict(dict)[source]

classmethod from_json(path)[source]

load preprocessor from a json file

for instance, file created with .save_json()

classmethod from_yaml(path)[source]

load preprocessor from a YAML file

for instance, file created with .save_yaml()

note that safe_load is not used, so make sure you trust the author of the file

Parameters:: path – path to the .yaml file
Returns:: instance of a preprocessor class
Return type:: preprocessor

insert_action(action_index, action, after_key=None, before_key=None)[source]

insert an action in a specific position

This is an in-place operation

Inserts a new action before or after a specific key. If after_key and before_key are both None, action is appended to the end of the index.

Parameters:

action_index – string key for new action in index
action – the action object, must be subclass of BaseAction
after_key – insert the action immediately after this key in index
before_key – insert the action immediately before this key in index Note: only one of (after_key, before_key) can be specified
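
The positional rules (append by default, otherwise place the new action just before or just after an existing key) can be sketched on an ordinary dictionary. Unlike the method documented above, this hypothetical helper returns a new dictionary instead of modifying the pipeline in place.

```python
def insert_step(pipeline, key, action, after_key=None, before_key=None):
    """Return a new ordered dict with (key, action) placed relative to an existing key."""
    if after_key is not None and before_key is not None:
        raise ValueError("specify only one of after_key and before_key")
    if after_key is None and before_key is None:
        return {**pipeline, key: action}  # append to the end
    target = after_key if after_key is not None else before_key
    if target not in pipeline:
        raise KeyError(target)
    result = {}
    for existing_key, existing_action in pipeline.items():
        if existing_key == before_key:
            result[key] = action
        result[existing_key] = existing_action
        if existing_key == after_key:
            result[key] = action
    return result


steps = {"load_audio": "A", "to_tensor": "C"}
steps = insert_step(steps, "trim_audio", "B", after_key="load_audio")
```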

remove_action(action_index)[source]

alias for self.drop(…,inplace=True), removes an action

This is an in-place operation

Parameters:: action_index – index of action to remove

save(path)[source]

save preprocessor to a file

Parameters:: path – path to the file, with .json or .yaml extension

save_json(path)[source]

save preprocessor to a json file

re-load with load_json(path) or .from_json(path)

save_yaml(path)[source]

save preprocessor to a YAML file

re-load with load_yaml(path) or .from_yaml(path)

to_dict()[source]

class opensoundscape.preprocess.preprocessors.NoiseReduceAudioPreprocessor(sample_duration, sample_rate, extend_short_clips=True, noisereduce_kwargs=None)[source]: Bases: AudioPreprocessor

class opensoundscape.preprocess.preprocessors.NoiseReduceSpectrogramPreprocessor(sample_duration, sample_rate, overlay_samples=None, height=None, width=None, channels=1, noisereduce_kwargs=None)[source]: Bases: SpectrogramPreprocessor

class opensoundscape.preprocess.preprocessors.PCENPreprocessor(*args, **kwargs)[source]: Bases: SpectrogramPreprocessor

class opensoundscape.preprocess.preprocessors.SpectrogramPreprocessor(sample_duration, sample_rate, overlay_samples=None, height=None, width=None, channels=1, bandpass_range=None, use_legacy_spectrogram=False)[source]

Bases: BasePreprocessor

Child of BasePreprocessor that creates spectrogram Tensors w/augmentation

loads audio, creates spectrogram, performs augmentations, creates tensor

by default, does not resample audio, but bandpasses to 0-11.025 kHz (to ensure all outputs have same scale in y-axis); can change with .pipeline.bandpass.set(min_f=,max_f=)

Parameters:

sample_duration – length in seconds of audio samples generated If not None, longer clips are trimmed to this length. By default, shorter clips will be extended (modify random_trim_audio and trim_audio to change behavior).
sample_rate – target sample rate. if None, does not resample
overlay_samples – if not None, will include an overlay (mixup) action samples can be a dataframe of file/start/end times or a set of audio files
height – height of output sample (frequency axis) - default None will use the original height of the spectrogram
width – width of output sample (time axis) - default None will use the original width of the spectrogram
channels – number of channels in output sample (default 1)
bandpass_range – tuple (min_f, max_f) in Hz for cropping spectrogram frequency axis - default None retains full frequency range (0 - sample_rate/2 Hz) - if sample_rate is None and input audio can be multiple sample rates, bandpass_range should be used to ensure specs have a consistent frequency range

class opensoundscape.preprocess.preprocessors.TorchSpectrogramPreprocessor(sample_duration, sample_rate, overlay_samples=None, torch_transforms=None, spec_nfft=512, spec_window_length=None, spec_hop_length=None, lower_dB_range=-80, bandpass_range=None, rescale_mean_sd=None, resize_ft=None, n_mels=None)[source]

Bases: BasePreprocessor

Spectrogram Preprocessor using torchvision.transforms for export to ONNX

opensoundscape.preprocess.preprocessors.load(path)[source]

load preprocessor from a file (json or yaml)

use to load preprocessor definitions saved with .save()

Parameters:: path – path to the file
Returns:: instance of a preprocessor class
Return type:: preprocessor

opensoundscape.preprocess.preprocessors.load_json(path)[source]

load preprocessor from a json file

for instance, file created with .save_json()

opensoundscape.preprocess.preprocessors.load_yaml(path)[source]

load preprocessor from a YAML file

for instance, file created with .save_yaml()

Parameters:: path – path to the .yaml file
Returns:: instance of a preprocessor class
Return type:: preprocessor

opensoundscape.preprocess.preprocessors.preprocessor_from_dict(dict)[source]

load a preprocessor from a dictionary saved with pre.to_dict()

looks up class name using the “class” key in PREPROCESSOR_CLS_DICT; requires that the class was decorated with @register_preprocessor_cls so that it is listed in PREPROCESSOR_CLS_DICT.

If you write a custom preprocessor class, you must decorate it with @register_preprocessor_cls so that it can be looked up by name during from_dict

Parameters:

dict – dictionary created with a preprocessor class’s .to_dict() method

Returns:

initialized preprocessor with same configuration and parameters as original - some caveats: Overlay augmentation will not re-load fully, as overlay sample dataframes and `criterion_fn`s are not saved

See also: BasePreprocessor.from_dict(), .save_json(), load_json()

opensoundscape.preprocess.preprocessors.register_preprocessor_cls(cls)[source]: add class to PREPROCESSOR_CLS_DICT

opensoundscape.preprocess.preprocessors.replace_nones(value)[source]

opensoundscape.preprocess.tensor_augment module

Augmentations and transforms for torch.Tensors

opensoundscape.preprocess.tensor_augment.freq_mask(spec, F=30, max_masks=3, replace_with_zero=False)[source]

draws horizontal bars over the image

Parameters:

spec – a torch.Tensor representing a spectrogram
F – maximum frequency-width of bars in pixels
max_masks – maximum number of bars to draw
replace_with_zero – if True, bars are 0s, otherwise, mean img value

Returns:

Augmented tensor

opensoundscape.preprocess.tensor_augment.time_mask(spec, T=40, max_masks=3, replace_with_zero=False)[source]

draws vertical bars over the image

Parameters:

spec – a torch.Tensor representing a spectrogram
T – maximum time-width of bars in pixels
max_masks – maximum number of bars to draw
replace_with_zero – if True, bars are 0s, otherwise, mean img value

Returns:

Augmented tensor

opensoundscape.preprocess.utils module

Utilities for preprocessing

exception opensoundscape.preprocess.utils.PreprocessingError[source]

Bases: Exception

Custom exception indicating that a Preprocessor pipeline failed

opensoundscape.preprocess.utils.get_args(func)[source]

get list of arguments and default values from a function

ignores ‘kwargs’ argument, which is included in inspect.signature.parameters

opensoundscape.preprocess.utils.get_reqd_args(func)[source]: get list of required arguments from a function

opensoundscape.preprocess.utils.process_tensor_for_display(tensor, channel=None, normalize_from_range=[-1, 1], invert=False, clip=None)[source]

process tensor for display as image

Moves channel axis from first to third position, converts torch.Tensor to numpy array, rescales values from [min,max] to [0,1]

Parameters:

tensor – torch.Tensor of shape [c,w,h]
channel – specify an integer to plot only one channel (axis 0) otherwise will return all channels
normalize_from_range – list of [min,max] values to normalize tensor from
invert – if true, flips value range via x=1-x
clip – if specified, tuple of (min,max) to clip values to after normalization

Returns:

numpy array of shape [w,h] or [w,h,c]
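
The value rescaling step can be sketched for a single number. The helper below is hypothetical and makes one assumption not stated above: inversion is applied before clipping. The real function applies the same kind of arithmetic to whole arrays.

```python
def rescale_for_display(value, normalize_from_range=(-1, 1), invert=False, clip=None):
    """Map a value from [min, max] to [0, 1], then optionally invert and clip."""
    low, high = normalize_from_range
    scaled = (value - low) / (high - low)
    if invert:
        scaled = 1 - scaled  # flip the value range
    if clip is not None:
        scaled = min(max(scaled, clip[0]), clip[1])
    return scaled


# a zero-centered value of 0 lands in the middle of the display range
mid_gray = rescale_for_display(0.0)
```

Values outside normalize_from_range map outside [0, 1], which is where the clip argument becomes useful.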

opensoundscape.preprocess.utils.show_tensor(tensor, channel=None, normalize_from_range=[-1, 1], invert=False, cmap=None, clip=[0, 1], axis=None)[source]

helper function for displaying a sample as an image

Parameters:

tensor – torch.Tensor of shape [c,w,h] with values centered around zero
channel – specify an integer to plot only one channel, otherwise will attempt to plot all channels
normalize_from_range – list of [min,max] values to normalize tensor from
invert – if true, flips value range via x=1-x
cmap – matplotlib colormap passed to plt.imshow() - if None, will choose ‘Greys’ if only one channel
clip – if specified, tuple of (min,max) to clip values to after normalization
axis – matplotlib axis to plot on, if None will create new figure

opensoundscape.preprocess.utils.show_tensor_grid(tensors, columns, labels=None, channel=None, normalize_from_range=[-1, 1], invert=False, cmap=None, clip=[0, 1], axes=None, pad=0.05, gap=0.05, title_height=0.07)[source]

Create a tightly packed image grid of tensors.

Parameters:

tensors – list of torch.Tensor objects to display
columns – number of columns in the grid
labels – optional list of titles for each tensor
channel – specify an integer to plot only one channel, otherwise will attempt to plot all channels
normalize_from_range – list of [min,max] values to normalize tensor from
invert – if true, flips value range via x=1-x
cmap – matplotlib colormap passed to plt.imshow() - if None, will choose ‘Greys’ if only one channel
clip – if specified, tuple of (min,max) to clip values to after normalization
axes – optional matplotlib axes to plot on, if None will create new figure
pad – outer margin around the grid (fraction of figure size)
gap – inner gap between images (fraction of figure size)
title_height – extra top margin for titles (fraction of figure size)

Returns:

numpy array of matplotlib axes objects

Return type:

axes
