opensoundscape.preprocess package
Submodules
opensoundscape.preprocess.action_functions module
preprocessing and augmentation functions
these can be passed to the Action class (action_fn=…) to create a preprocessing action that applies the function to a sample
- opensoundscape.preprocess.action_functions.audio_add_noise(audio, noise_dB=-30, signal_dB=0, color='white')[source]
Generates noise and adds to audio object
- Parameters:
audio – an Audio object
noise_dB – number or range: dBFS of noise signal generated - if number, creates noise at that dBFS level - if (min,max) tuple, chooses noise dBFS randomly from range with a uniform distribution
signal_dB – dB (decibels) gain to apply to the incoming Audio before mixing with noise [default: 0 dB] - like noise_dB, can specify (min,max) tuple to use random uniform choice in range
Returns: Audio object with noise added
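The number-or-range convention used by noise_dB and signal_dB can be sketched in plain Python as follows. This is only an illustration: the helper names resolve_db and db_to_amplitude are hypothetical and not part of OpenSoundscape, and the 10^(dB/20) conversion is the standard decibel-to-amplitude relation rather than something taken from the library source.

```python
import random

def resolve_db(value, rng=random):
    """Return `value` if it is a number, or draw uniformly from a (min, max) pair."""
    if isinstance(value, (tuple, list)):
        low, high = value
        return rng.uniform(low, high)
    return float(value)

def db_to_amplitude(db):
    """Convert a decibel gain to a linear amplitude factor."""
    return 10 ** (db / 20)

rng = random.Random(0)  # seeded only so the sketch is repeatable
noise_db = resolve_db((-40, -20), rng)  # random level within the range
signal_db = resolve_db(0)  # fixed level
print(noise_db, db_to_amplitude(signal_db))
```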
- opensoundscape.preprocess.action_functions.audio_random_gain(audio, dB_range=(-30, 0), clip_range=(-1, 1))[source]
Applies a randomly selected gain level to an Audio object
Gain is selected from a uniform distribution in the range dB_range
- Parameters:
audio – an Audio object
dB_range – (min,max) decibels of gain to apply - dB gain applied is chosen from a uniform random distribution in this range
Returns: Audio object with gain applied
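A minimal sketch of a random gain applied to a plain list of sample values is shown below. It assumes that clip_range (which appears in the signature but is not described above) bounds the output values after the gain is applied; the function random_gain is a hypothetical stand-in, not the library implementation.

```python
import random

def random_gain(samples, dB_range=(-30, 0), clip_range=(-1, 1), rng=random):
    """Scale samples by a gain drawn uniformly from dB_range, then clip."""
    gain_db = rng.uniform(*dB_range)
    factor = 10 ** (gain_db / 20)  # decibels to linear amplitude
    low, high = clip_range  # assumption: values are limited to this interval
    return [min(max(s * factor, low), high) for s in samples]

# a degenerate range makes the result deterministic: -20 dB is a factor of 0.1
quieter = random_gain([0.5, -0.25, 1.0], dB_range=(-20, -20))
print(quieter)
```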
- opensoundscape.preprocess.action_functions.audio_time_mask(audio, max_masks=10, max_width=0.02, noise_dBFS=-15, noise_color='white')[source]
randomly replace time slices with noise
- Parameters:
audio – input Audio object
max_masks – maximum number of white noise time masks [default: 10]
max_width – maximum size of bars as fraction of sample width [default: 0.02]
noise_dBFS & noise_color – see Audio.noise() dBFS and color args
- Returns:
augmented Audio object
- opensoundscape.preprocess.action_functions.frequency_mask(tensor, max_masks=3, max_width=0.2)[source]
add random horizontal bars over Tensor
- Parameters:
tensor – input Torch.tensor sample
max_masks – max number of horizontal bars [default: 3]
max_width – maximum size of horizontal bars as fraction of sample height
- Returns:
augmented tensor
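To illustrate how max_width as a fraction of the sample height translates into bar sizes, here is a rough sketch that operates on a nested list instead of a torch.Tensor. The function frequency_mask_sketch, the fill value of 0.0, and the way the number and position of bars are drawn are all assumptions made for illustration.

```python
import random

def frequency_mask_sketch(rows, max_masks=3, max_width=0.2, fill=0.0, rng=random):
    """Overwrite up to max_masks horizontal bands of a 2D list with `fill`."""
    height = len(rows)
    out = [list(r) for r in rows]  # copy so the input is left untouched
    max_rows = max(1, int(max_width * height))  # widest band, in rows
    for _ in range(rng.randint(0, max_masks)):
        band = rng.randint(1, max_rows)
        start = rng.randint(0, height - band)
        for i in range(start, start + band):
            out[i] = [fill] * len(out[i])
    return out

spec = [[1.0] * 8 for _ in range(10)]  # 10 frequency bins x 8 time steps
masked = frequency_mask_sketch(spec, rng=random.Random(1))
print(sum(1 for row in masked if row[0] == 0.0), "rows masked")
```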
- opensoundscape.preprocess.action_functions.image_to_tensor(img, greyscale=False)[source]
Convert PIL image to RGB or greyscale Tensor (PIL.Image -> Tensor)
convert PIL.Image w/range [0,255] to torch Tensor w/range [0,1]
- Parameters:
img – PIL.Image
greyscale – if False, converts image to RGB (3 channels). If True, converts image to one channel.
- opensoundscape.preprocess.action_functions.list_action_fns()[source]
return list of available action function keyword strings (can be used to initialize Action class)
- opensoundscape.preprocess.action_functions.random_wrap_audio(audio, probability=0.5, max_shift=None)[source]
Randomly splits the audio into two parts, swapping their order
useful as a “time shift” augmentation when extra audio beyond the bounds is not available
- Parameters:
audio – an Audio object
probability – probability of performing the augmentation
max_shift – max number of seconds to shift, default None means no limit
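The split-and-swap idea can be sketched on a plain list of samples. The helpers below are hypothetical and simplified: they assume max_shift (in seconds) limits how far from the start the split point may fall, which may differ from how the library interprets it.

```python
import random

def wrap_samples(samples, split_index):
    """Move the first split_index samples to the end."""
    return samples[split_index:] + samples[:split_index]

def random_wrap_sketch(samples, sample_rate, probability=0.5, max_shift=None, rng=random):
    """With the given probability, split at a random point and swap the two parts."""
    if rng.random() >= probability or len(samples) < 2:
        return list(samples)
    limit = len(samples) - 1
    if max_shift is not None:
        limit = min(limit, int(max_shift * sample_rate))
    if limit < 1:
        return list(samples)
    return wrap_samples(list(samples), rng.randint(1, limit))

print(wrap_samples([1, 2, 3, 4, 5], 2))  # [3, 4, 5, 1, 2]
```

No samples are lost or invented: the output is always a rotation of the input.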
- opensoundscape.preprocess.action_functions.register_action_fn(action_fn)[source]
add function to ACTION_FN_DICT
this allows us to recreate the Action class with a named action_fn
see also: ACTION_DICT (stores list of named classes for preprocessing)
- opensoundscape.preprocess.action_functions.scale_tensor(tensor, input_mean=0.5, input_std=0.5)[source]
linear scaling of tensor values using torchvision.transforms.Normalize
(Tensor->Tensor)
WARNING: This does not perform per-image normalization. Instead, it takes as arguments a fixed mean and standard deviation for the entire dataset, and performs X=(X-input_mean)/input_std.
- Parameters:
input_mean – mean of input sample pixels (average across dataset)
input_std – standard deviation of input sample pixels (average across dataset)
(these are NOT the target mu and sd, but the original mu and sd of img for which the output will have mu=0, std=1)
- Returns:
modified tensor
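The formula X=(X-input_mean)/input_std can be checked with a few numbers. The sketch below applies it to a plain list rather than a tensor; scale_values is a hypothetical helper used only to show the arithmetic. With the default values of 0.5 and 0.5, inputs in [0, 1] map to [-1, 1].

```python
def scale_values(values, input_mean=0.5, input_std=0.5):
    """Apply X = (X - input_mean) / input_std to every value."""
    return [(v - input_mean) / input_std for v in values]

print(scale_values([0.0, 0.5, 1.0]))  # [-1.0, 0.0, 1.0]
```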
- opensoundscape.preprocess.action_functions.tensor_add_noise(tensor, std=1)[source]
Add gaussian noise to sample (Tensor -> Tensor)
- Parameters:
std – standard deviation for Gaussian noise [default: 1]
Note: be aware that scaling before/after this action will change the effect of a fixed stdev Gaussian noise
- opensoundscape.preprocess.action_functions.time_mask(tensor, max_masks=3, max_width=0.2)[source]
add random vertical bars over sample (Tensor -> Tensor)
- Parameters:
tensor – input Torch.tensor sample
max_masks – maximum number of vertical bars [default: 3]
max_width – maximum size of bars as fraction of sample width
- Returns:
augmented tensor
- opensoundscape.preprocess.action_functions.torch_color_jitter(tensor, brightness=0.3, contrast=0.3, saturation=0.3, hue=0)[source]
Wraps torchvision.transforms.ColorJitter
(Tensor -> Tensor) or (PIL Img -> PIL Img)
- Parameters:
tensor – input sample
brightness=0.3
contrast=0.3
saturation=0.3
hue=0
- Returns:
modified tensor
- opensoundscape.preprocess.action_functions.torch_random_affine(tensor, degrees=0, translate=(0.3, 0.1), fill=0)[source]
Wraps torchvision.transforms.RandomAffine
(Tensor -> Tensor) or (PIL Img -> PIL Img)
- Parameters:
tensor – torch.Tensor input sample
degrees = 0
translate =
fill = 0-255, duplicated across channels
- Returns:
modified tensor
Note: If applying per-image normalization, we recommend applying RandomAffine after image normalization. In this case, an intermediate gray value is ~0. If normalization is applied after RandomAffine on a PIL image, use an intermediate fill color such as (122,122,122).
opensoundscape.preprocess.actions module
Actions for augmentation and preprocessing pipelines
This module contains Action classes which act as the elements in Preprocessor pipelines. Action classes have a __call__() method that operates on an audio sample, using the .params dictionary of parameter values. They take a single sample of a specific type and return the transformed or augmented sample, which may or may not be the same type as the original.
See the action_functions.py module for functions that can be used to create actions using the Action class. Pass any function to the Action class’s action_fn argument, and pass additional keyword arguments to set parameters in the Action’s .params dictionary.
Note on converting to/from dictionary/json/yaml: This will break if you use non-built-in preprocessing operations. However, it will work if you provide any custom functions/classes and decorate them with @register_action_cls or @register_action_fn. See the docstring of action_from_dict() for examples.
See the preprocessor module and Preprocessing tutorial for details on how to use and create your own actions.
- class opensoundscape.preprocess.actions.Action(fn, is_augmentation=False, **kwargs)[source]
Bases:
BaseAction
Action class for an arbitrary function
The function must take the sample as the first argument
Note that this allows two use cases:
(A) regular function that takes an input object as first argument, eg. Audio.from_file(path,**kwargs)
(B) method of a class, which takes ‘self’ as the first argument, eg. Spectrogram.bandpass(self,**kwargs)
Other arguments are an arbitrary list of kwargs.
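The two use cases can be illustrated with a small stand-in class. MiniAction below is hypothetical and much simpler than the real Action class (for instance, it passes the sample directly to the function and has no serialization support); it only shows how a function plus stored keyword arguments becomes a callable pipeline step.

```python
class MiniAction:
    """Hypothetical stand-in for an Action: stores a function and its parameters."""

    def __init__(self, fn, is_augmentation=False, **kwargs):
        self.fn = fn
        self.is_augmentation = is_augmentation
        self.params = dict(kwargs)  # keyword arguments passed to fn on each call

    def __call__(self, sample):
        # the sample is always the first positional argument
        return self.fn(sample, **self.params)

def clamp(values, low, high):
    """Case (A): a regular function taking the input object first."""
    return [min(max(v, low), high) for v in values]

clamp_action = MiniAction(clamp, low=0, high=1)
upper_action = MiniAction(str.upper)  # case (B): a method, 'self' comes first
print(clamp_action([-2, 0.5, 3]), upper_action("abc"))
```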
- class opensoundscape.preprocess.actions.AudioClipLoader(**kwargs)[source]
Bases:
Action
Action to load clips from an audio file
Loads an audio file or part of a file to an Audio object. Will load entire audio file if sample.start_time and sample.duration are None. If sample.start_time and sample.duration are provided, loads the audio only in the specified interval.
see Audio.from_file() for documentation.
- Parameters:
see Audio.from_file()
- class opensoundscape.preprocess.actions.AudioTrim(**kwargs)[source]
Bases:
Action
Action to trim/extend audio to desired length
- Parameters:
see actions.trim_audio()
- class opensoundscape.preprocess.actions.BaseAction(is_augmentation=False)[source]
Bases:
object
Parent class for all Actions (used in Preprocessor pipelines)
New actions should subclass this class.
- class opensoundscape.preprocess.actions.SpectrogramToTensor(fn=<function Spectrogram.to_image>, is_augmentation=False, **kwargs)[source]
Bases:
Action
Action to create Tensor of desired shape from Spectrogram
calls .to_image on sample.data, which should be type Spectrogram
**kwargs are passed to Spectrogram.to_image()
- opensoundscape.preprocess.actions.action_from_dict(dict)[source]
load an action from a dictionary
- Parameters:
dict – dictionary created with Action.to_dict() - contains keys ‘class’, ‘params’, and other keys for object attributes
Note: if the dictionary names a ‘class’ or ‘action_fn’ that is not built-in to OpenSoundscape, you should define the class/action in your code and add the decorator @register_action_cls or @register_action_fn
For instance, if we used the Action class and passed a custom action_fn:
    @register_action_fn
    def my_special_sauce(…):
        …
Now we can use action_from_dict() to re-create an action that specifies 'action_fn': '__main__.my_special_sauce'
Similarly, say we defined a custom class in a module my_utils.py, we add the decorator before the class definition:
    @register_action_cls
    class Magic(BaseAction):
        …
Now we can use action_from_dict() to re-create the class from a dictionary that has 'class': 'my_utils.Magic'
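The reason registration is needed can be seen in a generic sketch of the pattern: a dictionary stores only a name, so something must map that name back to the Python object. The names REGISTRY, register, to_dict and from_dict below are hypothetical and do not reproduce the OpenSoundscape implementation.

```python
REGISTRY = {}  # hypothetical stand-in for a name -> function lookup table

def register(fn):
    """Decorator: record fn under '<module>.<name>' and return it unchanged."""
    REGISTRY[f"{fn.__module__}.{fn.__qualname__}"] = fn
    return fn

@register
def my_special_sauce(sample, strength=1):
    return sample * strength

def to_dict(fn, **params):
    """Serialize a function reference by name, plus its parameters."""
    return {"action_fn": f"{fn.__module__}.{fn.__qualname__}", "params": params}

def from_dict(d):
    """Look the function up by name; this fails for unregistered names."""
    fn = REGISTRY[d["action_fn"]]
    return lambda sample: fn(sample, **d["params"])

saved = to_dict(my_special_sauce, strength=3)
restored = from_dict(saved)
print(saved["action_fn"], restored(2))
```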
- opensoundscape.preprocess.actions.list_actions()[source]
return list of available Action class keyword strings
- opensoundscape.preprocess.actions.trim_audio(sample, target_duration, extend=True, random_trim=False, tol=1e-10)[source]
trim audio clips from t=0 or random position (Audio -> Audio)
Trims an audio file to desired length.
Allows audio to be trimmed from start or from a random time
Optionally extends audio shorter than clip_length to sample.duration by appending silence.
- Parameters:
sample – AudioSample with .data=Audio object, .duration as sample duration
target_duration – length of resulting clip in seconds. If None, no trimming is performed.
extend – if True, clips shorter than sample.duration are extended with silence to required length [Default: True]
random_trim – if True, chooses a random segment of length sample.duration from the input audio. If False, the file is trimmed from 0 seconds to sample.duration seconds. [Default: False]
tol – tolerance for considering a clip to be long enough (sec), when raising an error for short clips [Default: 1e-10]
- Effects:
Updates the sample’s .data, .start_time, and .duration attributes
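The trimming and extending behavior can be sketched on a plain list of samples. The function trim_or_extend is a hypothetical simplification: it ignores tol, returns the start time instead of updating a sample object, and assumes that a short clip with extend=False results in an error.

```python
import random

def trim_or_extend(samples, sample_rate, target_duration, extend=True,
                   random_trim=False, rng=random):
    """Return (clip, start_time): cut to target_duration or pad with zeros."""
    if target_duration is None:
        return list(samples), 0.0  # no trimming requested
    n_target = round(target_duration * sample_rate)
    if len(samples) >= n_target:
        start = rng.randint(0, len(samples) - n_target) if random_trim else 0
        return list(samples[start:start + n_target]), start / sample_rate
    if not extend:
        raise ValueError("clip is shorter than target_duration")
    padding = [0.0] * (n_target - len(samples))  # silence appended at the end
    return list(samples) + padding, 0.0

clip, start = trim_or_extend([0.1] * 6, sample_rate=2, target_duration=2)
padded, _ = trim_or_extend([0.1] * 2, sample_rate=2, target_duration=2)
print(len(clip), start, padded)
```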
opensoundscape.preprocess.img_augment module
Transforms and augmentations for PIL.Images
opensoundscape.preprocess.io module
utilities for serializing, reading, and writing Action and Preprocessor objects to/from files and dictionaries
- class opensoundscape.preprocess.io.CustomYamlDumper(*args: Any, **kwargs: Any)[source]
Bases:
Dumper
- class opensoundscape.preprocess.io.CustomYamlLoader(*args: Any, **kwargs: Any)[source]
Bases:
Loader
- class opensoundscape.preprocess.io.NumpyTypeDecoder(*args, **kwargs)[source]
Bases:
JSONDecoder
recursively modify dictionary to change “numpy_dtype_…” strings to numpy dtypes
See also: NumpyTypeEncoder
- class opensoundscape.preprocess.io.NumpyTypeEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]
Bases:
JSONEncoder
replace numpy dtypes with strings & prefix numpy_dtype_
otherwise, can’t serialize numpy dtypes as the value in a dictionary
- default(obj)[source]
Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
For example, to support arbitrary iterators, you could implement default like this:
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return JSONEncoder.default(self, o)
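The encode/decode round trip described for NumpyTypeEncoder and NumpyTypeDecoder can be sketched with the standard library alone. To stay free of dependencies, this example uses built-in Python types as a stand-in for numpy dtypes and a hypothetical "py_type_" prefix in place of "numpy_dtype_"; the class and function names are illustrative only.

```python
import json

PREFIX = "py_type_"  # hypothetical prefix, standing in for "numpy_dtype_"
KNOWN_TYPES = {"int": int, "float": float, "bool": bool}

class TypeEncoder(json.JSONEncoder):
    def default(self, obj):
        # type objects are not JSON serializable, so store a prefixed name
        if isinstance(obj, type) and obj.__name__ in KNOWN_TYPES:
            return PREFIX + obj.__name__
        return super().default(obj)  # raises TypeError for anything else

def restore_types(value):
    """Recursively turn prefixed strings back into type objects."""
    if isinstance(value, dict):
        return {k: restore_types(v) for k, v in value.items()}
    if isinstance(value, list):
        return [restore_types(v) for v in value]
    if isinstance(value, str) and value.startswith(PREFIX):
        return KNOWN_TYPES[value[len(PREFIX):]]
    return value

text = json.dumps({"params": {"dtype": float, "n": 3}}, cls=TypeEncoder)
print(text)
print(restore_types(json.loads(text)))
```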
opensoundscape.preprocess.overlay module
- class opensoundscape.preprocess.overlay.Overlay(is_augmentation=True, **kwargs)[source]
Bases:
Action
Action Class for augmentation that overlays samples on each other
Overlay is a flavor of “mixup” augmentation, where two samples are overlayed on top of each other. The samples are blended with a weighted average, where the weight may be chosen randomly from a range of values.
In this implementation, the overlayed samples are chosen from a dataframe of audio files and labels. The dataframe must have the audio file paths as the index, and the labels as columns. The labels are used to choose overlayed samples based on an “overlay_class” argument.
- Parameters:
overlay_df – dataframe of audio files (index) and labels to use for overlay
update_labels (bool) – if True, labels of sample are updated to include labels of overlayed sample
criterion_fn – function that takes AudioSample and returns True or False - if True, perform overlay - if False, do not perform overlay. Default is always_true, perform overlay on all samples
See overlay() for **kwargs and default values
- opensoundscape.preprocess.overlay.overlay(sample, overlay_df, update_labels, overlay_class=None, overlay_prob=1, max_overlay_num=1, overlay_weight=0.5, criterion_fn=<function always_true>)[source]
iteratively overlay 2d samples on top of each other
Overlays (blends) image-like samples from overlay_df on top of the sample with probability overlay_prob until stopping condition. If necessary, trims overlay audio to the length of the input audio.
Optionally provide criterion_fn which takes sample and returns True/False to determine whether to perform overlay on this sample.
- Overlays can be used in a few general ways:
  - a separate df where any file can be overlayed (overlay_class=None)
  - same df as training, where the overlay class is “different” ie, does not contain overlapping labels with the original sample
  - same df as training, where samples from a specific class are used for overlays
- Parameters:
sample – AudioSample with .labels: labels of the original sample and .preprocessor: the preprocessing pipeline
overlay_df – a labels dataframe with audio files as the index and classes as columns
update_labels – if True, add overlayed sample’s labels to original sample
overlay_class – how to choose files from overlay_df to overlay. Options [default: None]:
  - None: randomly select any file from overlay_df
  - “different”: select a random file from overlay_df containing none of the classes this file contains
  - specific class name: always choose files from this class
overlay_prob – the probability of applying each subsequent overlay
max_overlay_num – the maximum number of samples to overlay on original - for example, if overlay_prob = 0.5 and max_overlay_num=2, 1/2 of samples will receive 1 overlay and 1/4 will receive an additional second overlay
overlay_weight – a float > 0 and < 1, or a list of 2 floats [min, max] between which the weight will be randomly chosen. e.g. [0.1,0.7] An overlay_weight <0.5 means more emphasis on original sample.
criterion_fn – function that takes AudioSample and returns True or False - if True, perform overlay - if False, do not perform overlay. Default is always_true, perform overlay on all samples
- Returns:
overlayed sample, (possibly updated) labels
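The interaction of overlay_weight, overlay_prob and max_overlay_num can be sketched numerically. The helpers blend and count_overlays are hypothetical: they assume the blend is (1 - overlay_weight) times the original plus overlay_weight times the overlay, which matches the statement that a weight below 0.5 emphasizes the original, and that each additional overlay is attempted only if the previous one was applied.

```python
import random

def blend(original, overlay, overlay_weight=0.5):
    """Weighted average; a weight below 0.5 emphasizes the original."""
    return [(1 - overlay_weight) * a + overlay_weight * b
            for a, b in zip(original, overlay)]

def count_overlays(overlay_prob=1, max_overlay_num=1, rng=random):
    """Keep adding overlays with probability overlay_prob, up to the maximum."""
    n = 0
    while n < max_overlay_num and rng.random() < overlay_prob:
        n += 1
    return n

rng = random.Random(0)  # seeded only so the sketch is repeatable
counts = [count_overlays(0.5, 2, rng) for _ in range(20000)]
print(blend([1.0, 0.0], [0.0, 1.0], overlay_weight=0.25))  # [0.75, 0.25]
print(sum(c >= 1 for c in counts) / len(counts))  # close to 1/2
print(sum(c == 2 for c in counts) / len(counts))  # close to 1/4
```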
Example
check if sample is from a xeno canto file (has “XC” in name), and only perform overlay on xeno canto files
```
from pathlib import Path

def is_xc(audio_sample):
    return "XC" in Path(audio_sample.source).stem
```
opensoundscape.preprocess.preprocessors module
Preprocessor classes: tools for preparing and augmenting audio samples
- class opensoundscape.preprocess.preprocessors.AudioAugmentationPreprocessor(**kwargs)[source]
Bases:
AudioPreprocessor
AudioPreprocessor that applies augmentations to audio samples during training
- class opensoundscape.preprocess.preprocessors.AudioPreprocessor(sample_duration, sample_rate, extend_short_clips=True)[source]
Bases:
BasePreprocessor
Child of BasePreprocessor that only loads audio and resamples
- Parameters:
sample_duration – length in seconds of audio samples generated
sample_rate – target sample rate. [default: None] does not resample
extend_short_clips – if True, clips shorter than sample_duration are extended to sample_duration by adding silence.
- class opensoundscape.preprocess.preprocessors.BasePreprocessor(sample_duration=None)[source]
Bases:
object
Class for defining an ordered set of Actions and a way to run them
Custom Preprocessor classes should subclass this class or its children
Preprocessors have one job: to transform samples from some input (eg a file path) to some output (eg an AudioSample with .data as torch.Tensor) using a specific procedure defined by the .pipeline attribute. The procedure consists of Actions ordered by the Preprocessor’s .pipeline. Preprocessors have a forward() method which sequentially applies the Actions in the pipeline to produce a sample.
- Parameters:
sample_duration – length of audio samples to generate (seconds)
- forward(sample, break_on_type=None, break_on_key=None, bypass_augmentations=False, trace=False, profile=False)[source]
perform actions in self.pipeline on a sample (until a break point)
Actions with .bypass = True are skipped. Actions with .is_augmentation = True can be skipped by passing bypass_augmentations=True.
- Parameters:
sample – any of - (path, start time) tuple - pd.Series with (file, start_time, end_time) as .name (eg index of a pd.DataFrame from which row was taken) - AudioSample object
break_on_type – if not None, the pipeline will be stopped when it reaches an Action of this class. The matching action is not performed.
break_on_key – if not None, the pipeline will be stopped when it reaches an Action whose index equals this value. The matching action is not performed.
clip_times – can be either - None: the file is treated as a single sample - dictionary {"start_time":float,"end_time":float}: the start and end time of clip in audio
bypass_augmentations – if True, actions with .is_augmentation=True are skipped
trace (boolean - default False) – if True, saves the output of each pipeline step in the sample_info output argument. Can be used for analysis/debugging of intermediate values of the sample during preprocessing
profile (boolean - default False) – if True, saves the runtime of each pipeline step in .runtime (a series indexed like .pipeline)
- Returns:
sample (instance of AudioSample class)
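The skip and break rules of forward() can be sketched with an ordered dictionary of steps. Step and this forward function are hypothetical simplifications: they operate on a bare number instead of an AudioSample and omit break_on_type, trace and profile.

```python
class Step:
    """Hypothetical minimal action: a function plus the two flags described above."""

    def __init__(self, fn, is_augmentation=False, bypass=False):
        self.fn = fn
        self.is_augmentation = is_augmentation
        self.bypass = bypass

def forward(pipeline, sample, break_on_key=None, bypass_augmentations=False):
    """Apply the steps of an ordered dict in sequence, honoring the skip rules."""
    for key, step in pipeline.items():
        if key == break_on_key:
            break  # the matching step is not performed
        if step.bypass or (bypass_augmentations and step.is_augmentation):
            continue
        sample = step.fn(sample)
    return sample

pipeline = {
    "load": Step(lambda x: x + 1),
    "augment": Step(lambda x: x * 10, is_augmentation=True),
    "to_tensor": Step(lambda x: -x),
}
print(forward(pipeline, 1))  # all steps: -((1 + 1) * 10) = -20
print(forward(pipeline, 1, bypass_augmentations=True))  # -(1 + 1) = -2
print(forward(pipeline, 1, break_on_key="to_tensor"))  # stops early: 20
```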
- classmethod from_json(path)[source]
load preprocessor from a json file
for instance, file created with .save_json()
- classmethod from_yaml(path)[source]
load preprocessor from a YAML file
for instance, file created with .save_yaml()
note that safe_load is not used, so make sure you trust the author of the file
- Parameters:
path – path to the .yaml file
- Returns:
instance of a preprocessor class
- Return type:
preprocessor
- insert_action(action_index, action, after_key=None, before_key=None)[source]
insert an action in a specific position
This is an in-place operation
Inserts a new action before or after a specific key. If after_key and before_key are both None, action is appended to the end of the index.
- Parameters:
action_index – string key for new action in index
action – the action object, must be subclass of BaseAction
after_key – insert the action immediately after this key in index
before_key – insert the action immediately before this key in index
Note: only one of (after_key, before_key) can be specified
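The positioning rules (after a key, before a key, or appended when neither is given) can be sketched with a plain dictionary, which preserves insertion order in Python 3.7+. The function insert_step is hypothetical; unlike insert_action it returns a new dictionary rather than modifying the pipeline in place.

```python
def insert_step(pipeline, key, value, after_key=None, before_key=None):
    """Return a new ordered dict with (key, value) placed relative to another key."""
    if after_key is not None and before_key is not None:
        raise ValueError("specify only one of after_key and before_key")
    anchor = after_key if after_key is not None else before_key
    if anchor is None:
        return {**pipeline, key: value}  # append to the end
    if anchor not in pipeline:
        raise KeyError(anchor)
    result = {}
    for k, v in pipeline.items():
        if k == before_key:
            result[key] = value
        result[k] = v
        if k == after_key:
            result[key] = value
    return result

steps = {"load": 1, "to_tensor": 3}
print(list(insert_step(steps, "bandpass", 2, after_key="load")))
```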
- remove_action(action_index)[source]
alias for self.drop(…,inplace=True), removes an action
This is an in-place operation
- Parameters:
action_index – index of action to remove
- save(path)[source]
save preprocessor to a file
- Parameters:
path – path to the file, with .json or .yaml extension
- save_json(path)[source]
save preprocessor to a json file
re-load with load_json(path) or .from_json(path)
- class opensoundscape.preprocess.preprocessors.NoiseReduceAudioPreprocessor(sample_duration, sample_rate, extend_short_clips=True, noisereduce_kwargs=None)[source]
Bases:
AudioPreprocessor
- class opensoundscape.preprocess.preprocessors.NoiseReduceSpectrogramPreprocessor(sample_duration, overlay_df=None, height=None, width=None, channels=1, noisereduce_kwargs=None)[source]
Bases:
SpectrogramPreprocessor
- class opensoundscape.preprocess.preprocessors.PCENPreprocessor(*args, **kwargs)[source]
Bases:
SpectrogramPreprocessor
- class opensoundscape.preprocess.preprocessors.SpectrogramPreprocessor(sample_duration, overlay_df=None, height=224, width=224, channels=1, sample_shape=None)[source]
Bases:
BasePreprocessor
Child of BasePreprocessor that creates spectrogram Tensors w/augmentation
loads audio, creates spectrogram, performs augmentations, creates tensor
by default, does not resample audio, but bandpasses to 0-11.025 kHz (to ensure all outputs have same scale in y-axis); this can be changed with .pipeline.bandpass.set(min_f=,max_f=)
- Parameters:
sample_duration – length in seconds of audio samples generated. If not None, longer clips are trimmed to this length. By default, shorter clips will be extended (modify random_trim_audio and trim_audio to change behavior).
overlay_df – if not None, will include an overlay action drawing samples from this df
height – height of output sample (frequency axis) - None will use the original height of the spectrogram [default: 224]
width – width of output sample (time axis) - None will use the original width of the spectrogram [default: 224]
channels – number of channels in output sample (default 1)
sample_shape – tuple of (height, width, channels) for output sample. Deprecated in favor of using height, width, channels - if not None, will override height, width, channels. [default: None] means use height, width, channels arguments
- opensoundscape.preprocess.preprocessors.load(path)[source]
load preprocessor from a file (json or yaml)
use to load preprocessor definitions saved with .save()
- Parameters:
path – path to the file
- Returns:
instance of a preprocessor class
- Return type:
preprocessor
- opensoundscape.preprocess.preprocessors.load_json(path)[source]
load preprocessor from a json file
for instance, file created with .save_json()
- opensoundscape.preprocess.preprocessors.load_yaml(path)[source]
load preprocessor from a YAML file
for instance, file created with .save_yaml()
- Parameters:
path – path to the .yaml file
- Returns:
instance of a preprocessor class
- Return type:
preprocessor
- opensoundscape.preprocess.preprocessors.preprocessor_from_dict(dict)[source]
load a preprocessor from a dictionary saved with pre.to_dict()
looks up class name using the “class” key in PREPROCESSOR_CLS_DICT; requires that the class was decorated with @register_preprocessor_cls so that it is listed in PREPROCESSOR_CLS_DICT.
If you write a custom preprocessor class, you must decorate it with @register_preprocessor_cls so that it can be looked up by name during from_dict
- Parameters:
dict – dictionary created with a preprocessor class’s .to_dict() method
- Returns:
initialized preprocessor with same configuration and parameters as original - some caveats: Overlay augmentation will not re-load fully, as overlay sample dataframes and `criterion_fn`s are not saved
See also: BasePreprocessor.from_dict(), .save_json(), load_json()
opensoundscape.preprocess.tensor_augment module
Augmentations and transforms for torch.Tensors
- opensoundscape.preprocess.tensor_augment.freq_mask(spec, F=30, max_masks=3, replace_with_zero=False)[source]
draws horizontal bars over the image
- Parameters:
spec – a torch.Tensor representing a spectrogram
F – maximum frequency-width of bars in pixels
max_masks – maximum number of bars to draw
replace_with_zero – if True, bars are 0s, otherwise, mean img value
- Returns:
Augmented tensor
- opensoundscape.preprocess.tensor_augment.time_mask(spec, T=40, max_masks=3, replace_with_zero=False)[source]
draws vertical bars over the image
- Parameters:
spec – a torch.Tensor representing a spectrogram
T – maximum time-width of bars in pixels
max_masks – maximum number of bars to draw
replace_with_zero – if True, bars are 0s, otherwise, mean img value
- Returns:
Augmented tensor
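The difference between replace_with_zero=True and the default mean fill can be sketched on a nested list. The function time_mask_sketch is a hypothetical illustration, not the tensor_augment implementation; in particular, how the number, width and position of bars are sampled is assumed.

```python
import random

def time_mask_sketch(image, T=40, max_masks=3, replace_with_zero=False, rng=random):
    """Overwrite up to max_masks vertical bars, each at most T columns wide."""
    width = len(image[0])
    flat = [v for row in image for v in row]
    fill = 0.0 if replace_with_zero else sum(flat) / len(flat)  # mean of the input
    out = [list(row) for row in image]  # copy so the input is left untouched
    for _ in range(rng.randint(1, max_masks)):
        bar = rng.randint(1, min(T, width))
        start = rng.randint(0, width - bar)
        for row in out:
            row[start:start + bar] = [fill] * bar
    return out

image = [[0.0, 1.0, 2.0, 3.0] for _ in range(3)]  # mean value is 1.5
masked = time_mask_sketch(image, T=1, max_masks=1, rng=random.Random(0))
print(masked[0])
```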
opensoundscape.preprocess.utils module
Utilities for preprocessing
- exception opensoundscape.preprocess.utils.PreprocessingError[source]
Bases:
Exception
Custom exception indicating that a Preprocessor pipeline failed
- opensoundscape.preprocess.utils.get_args(func)[source]
get list of arguments and default values from a function
ignores ‘kwargs’ argument, which is included in inspect.signature.parameters
- opensoundscape.preprocess.utils.get_reqd_args(func)[source]
get list of required arguments from a function
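Both helpers can be approximated with inspect.signature from the standard library. The functions list_args and list_required_args below are hypothetical sketches of the behavior described for get_args and get_reqd_args; the return types of the library functions may differ.

```python
import inspect

def list_args(func):
    """Map argument names to defaults, skipping a catch-all **kwargs."""
    params = inspect.signature(func).parameters
    return {
        name: p.default
        for name, p in params.items()
        if p.kind is not inspect.Parameter.VAR_KEYWORD
    }

def list_required_args(func):
    """Names of arguments that have no default value."""
    return [name for name, default in list_args(func).items()
            if default is inspect.Parameter.empty]

def example(sample, width=0.2, **kwargs):
    return sample

print(list_args(example))
print(list_required_args(example))  # ['sample']
```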
- opensoundscape.preprocess.utils.process_tensor_for_display(tensor, channel=None, transform_from_zero_centered=True, invert=False)[source]
- opensoundscape.preprocess.utils.show_tensor(tensor, channel=None, transform_from_zero_centered=True, invert=False, cmap=None)[source]
helper function for displaying a sample as an image
- Parameters:
tensor – torch.Tensor of shape [c,w,h] with values centered around zero
channel – specify an integer to plot only one channel, otherwise will attempt to plot all channels
transform_from_zero_centered – if True, transforms values from [-1,1] to [0,1]
invert – if true, flips value range via x=1-x
cmap – matplotlib colormap passed to plt.imshow() - if None, will choose ‘Greys’ if only one channel
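The value transformations named above are simple to state numerically. The sketch below assumes the [-1,1] to [0,1] mapping is the linear map x/2 + 0.5 and applies it to a plain list; prepare_for_display is a hypothetical helper, not the library function.

```python
def prepare_for_display(values, transform_from_zero_centered=True, invert=False):
    """Map values from [-1, 1] to [0, 1] and optionally flip them with x = 1 - x."""
    out = list(values)
    if transform_from_zero_centered:
        out = [v / 2 + 0.5 for v in out]
    if invert:
        out = [1 - v for v in out]
    return out

print(prepare_for_display([-1.0, 0.0, 1.0]))  # [0.0, 0.5, 1.0]
print(prepare_for_display([-1.0, 0.0, 1.0], invert=True))  # [1.0, 0.5, 0.0]
```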
- opensoundscape.preprocess.utils.show_tensor_grid(tensors, columns, channel=None, transform_from_zero_centered=True, invert=False, labels=None)[source]
create image of nxn tensors
- Parameters:
tensors – list of samples
columns – number of columns in grid
labels – title of each subplot
for other args, see show_tensor()