opensoundscape.preprocess package
Submodules
opensoundscape.preprocess.action_functions module
preprocessing and augmentation functions
these can be passed to the Action class (action_fn=…) to create a preprocessing action that applies the function to a sample
- opensoundscape.preprocess.action_functions.adaptive_random_gain(audio, gain_range=(-30, 0), min_output_level=-40, clip_range=(-1, 1))[source]
apply gain while maintaining a minimum resulting dBFS level
applies a randomly selected gain level to an Audio object, while ensuring that the resulting audio has at least min_output_level dBFS (while respecting the maximum gain allowed in the gain_range argument)
- Parameters:
audio – an Audio object
gain_range – (min,max) decibels of gain to apply - dB gain applied is chosen from a uniform random distribution in this range
min_output_level – minimum dBFS level of resulting audio - if audio.dBFS + gain < min_output_level, gain_range is restricted to ensure resulting audio is at least min_output_level dBFS
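The gain restriction described above can be sketched in plain Python. This is only an illustration of the arithmetic, not the library's implementation: the helper name `choose_adaptive_gain` and the `input_dbfs` argument (standing in for `audio.dBFS`) are assumptions made for this example.

```python
import random


def choose_adaptive_gain(input_dbfs, gain_range=(-30, 0), min_output_level=-40, rng=random):
    """Pick a gain (dB) so that input_dbfs + gain stays at or above min_output_level when possible."""
    low, high = gain_range
    # Smallest gain that keeps the output at min_output_level or louder
    lowest_allowed = min_output_level - input_dbfs
    # Raise the lower bound if needed, but never above the maximum allowed gain
    low = min(max(low, lowest_allowed), high)
    return rng.uniform(low, high)


# A clip at -20 dBFS can take any gain in [-20, 0] dB
print(choose_adaptive_gain(-20))
# A clip at -50 dBFS is already below -40 dBFS, so the gain is pinned to the 0 dB maximum
print(choose_adaptive_gain(-50))
```

In the second call the output still ends up below min_output_level, because the maximum of gain_range takes precedence.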
- opensoundscape.preprocess.action_functions.adaptive_random_noise(audio, snr_range=(-20, 0), input_gain=0, color='white')[source]
apply random noise, selecting from a signal to noise ratio range
- Parameters:
audio – an Audio object
snr_range – (min,max) decibels of signal to noise ratio - SNR is defined here as signal_dB - noise_dBFS - SNR is chosen from a uniform random distribution in this range
input_gain – dB (decibels) gain to apply to the incoming Audio before mixing with noise [default: 0 dB]
color – color of noise to add (see Audio.noise() color arg) options: “white”, “pink”, “brownian”, “brown”, “violet”, “blue”
Returns: Audio object with noise added
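With SNR defined as signal_dB - noise_dBFS, the noise level follows from a sampled SNR by rearranging that definition. A small sketch of the arithmetic, assuming a hypothetical helper `noise_dbfs_for_random_snr` and a `signal_dbfs` argument that stands in for the level of the incoming audio (this is not the library's code):

```python
import random


def noise_dbfs_for_random_snr(signal_dbfs, snr_range=(-20, 0), rng=random):
    """Sample an SNR uniformly from snr_range and return (snr, noise level in dBFS)."""
    snr = rng.uniform(*snr_range)
    # SNR = signal_dB - noise_dBFS  =>  noise_dBFS = signal_dB - SNR
    return snr, signal_dbfs - snr


snr, noise_level = noise_dbfs_for_random_snr(signal_dbfs=-25)
print(f"SNR {snr:.1f} dB -> noise at {noise_level:.1f} dBFS")
```

Under this definition, the default range of (-20, 0) never produces noise quieter than the signal.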
- opensoundscape.preprocess.action_functions.audio_add_noise(audio, noise_dB=-30, signal_dB=0, color='white')[source]
Generates noise and adds to audio object
- Parameters:
audio – an Audio object
noise_dB – number or range: dBFS of noise signal generated - if number, creates noise at that dBFS level - if (min,max) tuple, chooses noise dBFS randomly from range with a uniform distribution
signal_dB – dB (decibels) gain to apply to the incoming Audio before mixing with noise [default: 0 dB] - like noise_dB, can specify (min,max) tuple to use random uniform choice in range
Returns: Audio object with noise added
- opensoundscape.preprocess.action_functions.audio_random_gain(audio, dB_range=(-30, 0), clip_range=(-1, 1))[source]
Applies a randomly selected gain level to an Audio object
Gain is selected from a uniform distribution in the range dB_range
- Parameters:
audio – an Audio object
dB_range – (min,max) decibels of gain to apply - dB gain applied is chosen from a uniform random distribution in this range
Returns: Audio object with gain applied
- opensoundscape.preprocess.action_functions.audio_time_mask(audio, max_masks=10, max_width=0.02, noise_to_signal_dB=10, noise_color='white')[source]
randomly replace time slices with noise
Adaptively selects noise level relative to the signal level of the input audio
- Parameters:
audio – input Audio object
max_masks – maximum number of white noise time masks [default: 10]
max_width – maximum size of bars as fraction of sample width [default: 0.02]
noise_to_signal_dB – desired noise:signal ratio in dB. Positive values mean noise is louder than signal, negative values mean noise is quieter. Signal level is calculated as audio.dBFS ie the temporal average level.
noise_color – see Audio.noise() dBFS and color args
- Returns:
augmented Audio object
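The idea of replacing time slices with noise at a level set relative to the signal can be sketched on a plain list of samples. This is an illustrative sketch only: the helper names, the use of RMS level as a stand-in for audio.dBFS, Gaussian (white) noise, and the way the number and width of masks are drawn are all assumptions, not the library's implementation.

```python
import math
import random


def rms_dbfs(samples):
    """RMS level in dB relative to full scale (amplitude 1.0)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def time_mask_with_noise(samples, max_masks=10, max_width=0.02, noise_to_signal_db=10, seed=None):
    """Return a copy of samples with up to max_masks slices replaced by Gaussian noise."""
    rng = random.Random(seed)
    out = list(samples)
    n = len(out)
    # Noise level is set relative to the average level of the input signal
    noise_rms = 10 ** ((rms_dbfs(samples) + noise_to_signal_db) / 20)
    for _ in range(rng.randint(0, max_masks)):
        width = int(rng.uniform(0, max_width) * n)  # width as a fraction of clip length
        start = rng.randint(0, n - width)
        out[start:start + width] = [rng.gauss(0, noise_rms) for _ in range(width)]
    return out


signal = [0.1 * math.sin(2 * math.pi * 5 * i / 1000) for i in range(1000)]
masked = time_mask_with_noise(signal, seed=0)
print(sum(a != b for a, b in zip(signal, masked)), "samples replaced")
```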
- opensoundscape.preprocess.action_functions.frequency_mask(tensor, max_masks=3, max_width=0.05)[source]
add random horizontal bars over Tensor
- Parameters:
tensor – input Torch.tensor sample
max_masks – max number of horizontal bars [default: 3]
max_width – maximum size of horizontal bars as fraction of sample height
- Returns:
augmented tensor
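As a rough picture of what "horizontal bars" with a fractional max_width means, here is a sketch on a 2D nested list (rows are frequency bins). The helper `mask_rows`, the fill value of 0.0, and the random draws are assumptions for illustration; the library operates on torch tensors and may choose bar positions and fill values differently.

```python
import random


def mask_rows(image, max_masks=3, max_width=0.05, fill=0.0, seed=None):
    """Return a copy of a 2D list with up to max_masks horizontal bars set to fill."""
    rng = random.Random(seed)
    height = len(image)
    out = [row[:] for row in image]
    for _ in range(rng.randint(0, max_masks)):
        # Bar height is a fraction of the sample height
        bar_height = int(rng.uniform(0, max_width) * height)
        top = rng.randint(0, height - bar_height)
        for r in range(top, top + bar_height):
            out[r] = [fill] * len(out[r])
    return out


image = [[1.0] * 8 for _ in range(100)]
masked = mask_rows(image, seed=1)
print(sum(row == [0.0] * 8 for row in masked), "rows masked")
```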
- opensoundscape.preprocess.action_functions.image_to_tensor(img, greyscale=False)[source]
Convert PIL image to RGB or greyscale Tensor (PIL.Image -> Tensor)
convert PIL.Image w/range [0,255] to torch Tensor w/range [0,1]
- Parameters:
img – PIL.Image
greyscale – if False, converts image to RGB (3 channels). If True, converts image to one channel.
- opensoundscape.preprocess.action_functions.list_action_fns()[source]
return list of available action function keyword strings (can be used to initialize Action class)
- opensoundscape.preprocess.action_functions.random_lowpass(audio, cutoff_range=(3000, 9000), probability=0.5, order_range=(1, 1))[source]
randomly apply lowpass filter to audio
- Parameters:
audio – an Audio object
cutoff_range – (min,max) frequency range in Hz for cutoff frequency - cutoff frequency is chosen randomly from this range with uniform distribution
probability – probability of applying the lowpass filter
order_range – (min,max) range of filter orders to choose from - order is chosen randomly from this range with uniform distribution - higher order = steeper filter rolloff; the default of 1 gives a gentle rolloff, while an order of 2 already creates a rolloff steep enough to eliminate most high-frequency content
- Returns:
Audio object, possibly lowpass filtered
- opensoundscape.preprocess.action_functions.random_wrap_audio(audio, probability=0.5, max_shift=None)[source]
Randomly splits the audio into two parts, swapping their order
useful as a “time shift” augmentation when extra audio beyond the bounds is not available
- Parameters:
audio – an Audio object
probability – probability of performing the augmentation
max_shift – max number of seconds to shift, default None means no limit
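The split-and-swap operation can be illustrated on a list of samples. This is a minimal sketch under stated assumptions: the helper `wrap_samples` and its `sample_rate` argument are hypothetical, the direction of the shift is a guess, and the probability argument is left out for brevity.

```python
import random


def wrap_samples(samples, sample_rate, max_shift=None, rng=random):
    """Split samples at a random point and swap the two parts."""
    n = len(samples)
    # max_shift is in seconds; None means the split point can be anywhere
    limit = n if max_shift is None else min(n, int(max_shift * sample_rate))
    split = rng.randint(0, limit)
    # The part before the split point moves to the end
    return samples[split:] + samples[:split]


print(wrap_samples(list(range(10)), sample_rate=1, max_shift=3))
```

The output always contains the same samples as the input, only in rotated order, which is why this works as a time shift when no extra audio is available.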
- opensoundscape.preprocess.action_functions.register_action_fn(action_fn)[source]
add function to ACTION_FN_DICT
this allows us to recreate the Action class with a named action_fn
see also: ACTION_DICT (stores list of named classes for preprocessing)
- opensoundscape.preprocess.action_functions.scale_tensor(tensor, input_mean=0.5, input_std=0.5)[source]
linear scaling of tensor values using torchvision.transforms.Normalize
(Tensor->Tensor)
WARNING: This does not perform per-image normalization. Instead, it takes as arguments a fixed mean and standard deviation, ie for the entire dataset, and performs X=(X-input_mean)/input_std.
- Parameters:
input_mean – mean of input sample pixels (average across dataset)
input_std – standard deviation of input sample pixels (average across dataset)
(these are NOT the target mu and sd, but the original mu and sd of img, for which the output will have mu=0, std=1)
- Returns:
modified tensor
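The formula X=(X-input_mean)/input_std can be checked with a few values. The sketch below uses a plain list and a hypothetical helper `scale_values` rather than a torch tensor:

```python
def scale_values(values, input_mean=0.5, input_std=0.5):
    """Apply the fixed linear scaling (x - input_mean) / input_std to each value."""
    return [(x - input_mean) / input_std for x in values]


# With the defaults, values in [0, 1] are mapped to [-1, 1]
print(scale_values([0.0, 0.5, 1.0]))  # [-1.0, 0.0, 1.0]
```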
- opensoundscape.preprocess.action_functions.tensor_add_noise(tensor, std=1)[source]
Add gaussian noise to sample (Tensor -> Tensor)
- Parameters:
std – standard deviation for Gaussian noise [default: 1]
Note: be aware that scaling before/after this action will change the effect of a fixed stdev Gaussian noise
- opensoundscape.preprocess.action_functions.time_mask(tensor, max_masks=4, max_width=0.05)[source]
add random vertical bars over sample (Tensor -> Tensor)
- Parameters:
tensor – input Torch.tensor sample
max_masks – maximum number of vertical bars [default: 4]
max_width – maximum size of bars as fraction of sample width
- Returns:
augmented tensor
- opensoundscape.preprocess.action_functions.torch_color_jitter(tensor, brightness=0.3, contrast=0.3, saturation=0.3, hue=0)[source]
Wraps torchvision.transforms.ColorJitter
(Tensor -> Tensor) or (PIL Img -> PIL Img)
- Parameters:
tensor – input sample
brightness=0.3
contrast=0.3
saturation=0.3
hue=0
- Returns:
modified tensor
- opensoundscape.preprocess.action_functions.torch_random_affine(tensor, degrees=0, translate=(0.2, 0.05), fill=0)[source]
Wraps torchvision.transforms.RandomAffine
(Tensor -> Tensor) or (PIL Img -> PIL Img)
- Parameters:
tensor – torch.Tensor input sample
degrees = 0
translate = (0.2, 0.05)
fill = 0-255, duplicated across channels
- Returns:
modified tensor
Note: If applying per-image normalization, we recommend applying RandomAffine after image normalization. In this case, an intermediate gray value is ~0. If normalization is applied after RandomAffine on a PIL image, use an intermediate fill color such as (122,122,122).
opensoundscape.preprocess.actions module
Actions for augmentation and preprocessing pipelines
This module contains Action classes which act as the elements in Preprocessor pipelines. Action classes have a __call__() method that operates on an audio sample, using the .params dictionary of parameter values. They take a single sample of a specific type and return the transformed or augmented sample, which may or may not be the same type as the original.
See the action_functions.py module for functions that can be used to create actions using the Action class. Pass any function to the Action class’s action_fn argument, and pass additional arguments to set parameters of the Action’s .params dictionary.
Note on converting to/from dictionary/json/yaml: This will break if you use non-built-in preprocessing operations. However, it will work if you provide any custom functions/classes and decorate them with @register_action_cls or @register_action_fn. See the docstring of action_from_dict() for examples.
See the preprocessor module and Preprocessing tutorial for details on how to use and create your own actions.
- class opensoundscape.preprocess.actions.Action(fn, is_augmentation=False, **kwargs)[source]
Bases:
BaseAction
Action class for an arbitrary function
The function must take the sample as the first argument
Note that this allows two use cases:
(A) a regular function that takes an input object as its first argument, eg. Audio.from_file(path,**kwargs)
(B) a method of a class, which takes ‘self’ as the first argument, eg. Spectrogram.bandpass(self,**kwargs)
Other arguments are an arbitrary list of kwargs.
- class opensoundscape.preprocess.actions.AudioClipLoader(out_of_bounds_mode='ignore', **kwargs)[source]
Bases:
Action
Action to load clips from an audio file
Loads an audio file or part of a file to an Audio object. Will load entire audio file if sample.start_time and sample.duration are None. If sample.start_time and sample.duration are provided, loads the audio only in the specified interval.
see Audio.from_file() for documentation.
- Parameters:
see Audio.from_file()
- class opensoundscape.preprocess.actions.AudioToSamplesTensor(is_augmentation=False)[source]
Bases:
BaseAction
extract Audio.samples to a PyTorch tensor and add channel dimensions
- class opensoundscape.preprocess.actions.AudioToTensor(is_augmentation=False)[source]
Bases:
BaseAction
- class opensoundscape.preprocess.actions.AudioTrim(**kwargs)[source]
Bases:
Action
Action to trim/extend audio to desired length
- Parameters:
see actions.trim_audio()
- class opensoundscape.preprocess.actions.BaseAction(is_augmentation=False)[source]
Bases:
object
Parent class for all Actions (used in Preprocessor pipelines)
New actions should subclass this class (or Action for pre-wired functionality).
- class opensoundscape.preprocess.actions.MelScale(*args: Any, **kwargs: Any)[source]
Bases:
MelScale
Patch of torchaudio.transforms.MelScale that saves n_stft attribute
This allows re-loading from a dictionary with the correct n_stft value.
- class opensoundscape.preprocess.actions.SpectrogramToTensor(fn=<function Spectrogram.to_image>, is_augmentation=False, **kwargs)[source]
Bases:
Action
Action to create Tensor of desired shape from Spectrogram
calls .to_image on sample.data, which should be type Spectrogram
**kwargs are passed to Spectrogram.to_image()
- class opensoundscape.preprocess.actions.TorchCropFreq(*args: Any, **kwargs: Any)[source]
Bases:
Module
- class opensoundscape.preprocess.actions.TorchTransforms(transforms)[source]
Bases:
BaseAction
Action to apply torchvision transforms to sample
- Parameters:
transforms – list of torchvision transform objects to apply in sequence; see https://pytorch.org/vision/stable/transforms.html and https://pytorch.org/audio/stable/transforms.html
- to_dict(ignore_attributes=())[source]
export the composed transforms and their parameters to a dictionary
useful for saving to JSON
Will fail if any of the transforms or their parameters are not serializable.
- property transforms
- opensoundscape.preprocess.actions.action_from_dict(dict)[source]
load an action from a dictionary
- Parameters:
dict – dictionary created with Action.to_dict() - contains keys ‘class’, ‘params’, and other keys for object attributes
Note: if the dictionary names a ‘class’ or ‘action_fn’ that is not built-in to OpenSoundscape, you should define the class/action in your code and add the decorator @register_action_cls or @register_action_fn
For instance, if we used the Action class and passed a custom action_fn:
@register_action_fn
def my_special_sauce(…):
    …
Now we can use action_from_dict() to re-create an action that specifies ‘action_fn’:’__main__.my_special_sauce’
Similarly, say we defined a custom class in a module my_utils.py, we add the decorator before the class definition:
@register_action_cls
class Magic(BaseAction):
    …
Now we can use action_from_dict() to re-create the class from a dictionary that has ‘class’ : ‘my_utils.Magic’
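The register-then-look-up-by-name pattern described above can be sketched without the library. Everything in this block is a hypothetical stand-in: `ACTION_FN_REGISTRY` plays the role of ACTION_FN_DICT, `register_fn` the role of @register_action_fn, and `fn_from_dict` is a much-simplified analogue of action_from_dict(). The real dictionary format and lookup may differ.

```python
ACTION_FN_REGISTRY = {}  # hypothetical stand-in for ACTION_FN_DICT


def register_fn(fn):
    """Decorator: store fn under 'module.qualname' and return it unchanged."""
    ACTION_FN_REGISTRY[f"{fn.__module__}.{fn.__qualname__}"] = fn
    return fn


@register_fn
def double_gain(sample, factor=2):
    """Toy action function: multiply every value in the sample."""
    return [x * factor for x in sample]


def fn_from_dict(d):
    """Look up the function named in d['action_fn'] and bind the saved params."""
    fn = ACTION_FN_REGISTRY[d["action_fn"]]
    params = d.get("params", {})
    return lambda sample: fn(sample, **params)


key = f"{double_gain.__module__}.double_gain"
restored = fn_from_dict({"action_fn": key, "params": {"factor": 3}})
print(restored([1, 2]))  # [3, 6]
```

Because the dictionary stores only a name, the function has to be registered again in any session that reloads it, which is why custom functions and classes need the decorator.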
- opensoundscape.preprocess.actions.deserialize_transform(transform_dict)[source]
Recreate a transform from a serialized dict.
- opensoundscape.preprocess.actions.list_actions()[source]
return list of available Action class keyword strings
- opensoundscape.preprocess.actions.serialize_transform(transform)[source]
Convert a torchvision/torchaudio transform object into a JSON-serializable dict.
- opensoundscape.preprocess.actions.trim_audio(sample, target_duration, extend=True, random_trim=False, tol=1e-10)[source]
trim audio clips from t=0 or random position (Audio -> Audio)
Trims an audio file to desired length.
Allows audio to be trimmed from start or from a random time
Optionally extends audio shorter than clip_length to sample.duration by appending silence.
- Parameters:
sample – AudioSample with .data=Audio object, .duration as sample duration
target_duration – length of resulting clip in seconds. If None, no trimming is performed.
extend – if True, clips shorter than sample.duration are extended with silence to required length [Default: True]
random_trim – if True, chooses a random segment of length sample.duration from the input audio. If False, the file is trimmed from 0 seconds to sample.duration seconds. [Default: False]
tol – tolerance for considering a clip to be long enough (sec), when raising an error for short clips [Default: 1e-10]
- Effects:
Updates the sample’s .data, .start_time, and .duration attributes
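The trim/extend behavior can be illustrated on a list of samples. This is a sketch under assumptions, not the library's code: `trim_or_extend` and its `sample_rate` argument are hypothetical, the tol argument is omitted, and raising ValueError for a short clip with extend=False is an assumed behavior.

```python
import random


def trim_or_extend(samples, sample_rate, target_duration, extend=True, random_trim=False, rng=random):
    """Return (samples, start_time) trimmed or zero-padded to target_duration seconds."""
    if target_duration is None:
        return list(samples), 0.0  # no trimming requested
    target_len = round(target_duration * sample_rate)
    if len(samples) >= target_len:
        # Trim from t=0, or from a random start position
        start = rng.randint(0, len(samples) - target_len) if random_trim else 0
        return list(samples[start:start + target_len]), start / sample_rate
    if not extend:
        raise ValueError("clip is shorter than target_duration and extend=False")
    # Extend short clips by appending silence
    return list(samples) + [0.0] * (target_len - len(samples)), 0.0


print(trim_or_extend([1, 2, 3, 4], sample_rate=1, target_duration=2))  # ([1, 2], 0.0)
print(trim_or_extend([1, 2], sample_rate=1, target_duration=4))        # ([1, 2, 0.0, 0.0], 0.0)
```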
opensoundscape.preprocess.img_augment module
Transforms and augmentations for PIL.Images
opensoundscape.preprocess.io module
utilities for serializing, reading, and writing Action and Preprocessor objects to/from files and dictionaries
- class opensoundscape.preprocess.io.CustomYamlDumper(*args: Any, **kwargs: Any)[source]
Bases:
Dumper
- class opensoundscape.preprocess.io.CustomYamlLoader(*args: Any, **kwargs: Any)[source]
Bases:
Loader
- class opensoundscape.preprocess.io.NumpyTypeDecoder(*args, **kwargs)[source]
Bases:
JSONDecoder
recursively modify dictionary to change “numpy_dtype_…” strings to numpy dtypes
See also: NumpyTypeEncoder
- class opensoundscape.preprocess.io.NumpyTypeEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]
Bases:
JSONEncoder
replace numpy dtypes with strings & prefix numpy_dtype_
otherwise, can’t serialize numpy dtypes as the value in a dictionary
- default(obj)[source]
Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
For example, to support arbitrary iterators, you could implement default like this:
def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return super().default(o)
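The encoder/decoder pairing used for numpy dtypes follows a general pattern: replace an unserializable value with a prefixed string on the way out, and reverse it on the way in. The sketch below shows that pattern with the standard library only, using built-in Python types in place of numpy dtypes. The names `PREFIX`, `KNOWN_TYPES`, `TypeEncoder`, and `restore_types` are hypothetical and not part of opensoundscape.

```python
import json

PREFIX = "pytype_"  # hypothetical marker, analogous to the "numpy_dtype_" prefix
KNOWN_TYPES = {"int": int, "float": float, "str": str}


class TypeEncoder(json.JSONEncoder):
    """Serialize known type objects as prefixed strings."""

    def default(self, obj):
        if isinstance(obj, type) and obj.__name__ in KNOWN_TYPES:
            return PREFIX + obj.__name__
        # Let the base class raise TypeError for anything else
        return super().default(obj)


def restore_types(d):
    """object_hook: turn prefixed strings back into type objects."""
    return {
        k: KNOWN_TYPES[v[len(PREFIX):]] if isinstance(v, str) and v.startswith(PREFIX) else v
        for k, v in d.items()
    }


text = json.dumps({"dtype": float, "n": 3}, cls=TypeEncoder)
print(text)  # {"dtype": "pytype_float", "n": 3}
print(json.loads(text, object_hook=restore_types))
```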
opensoundscape.preprocess.overlay module
- class opensoundscape.preprocess.overlay.Overlay(overlay_samples, break_on_key, is_augmentation=True, sample_duration=None, **kwargs)[source]
Bases:
BaseAction
Action Class for augmentation that overlays samples on each other
Overlay is a flavor of “mixup” augmentation, where two samples are overlayed on top of each other. The samples are blended with a weighted average, where the weight may be chosen randomly from a range of values.
In this implementation, the overlayed samples are chosen from a dataframe of audio files and labels. The dataframe must have the audio file paths as the index, and the labels as columns. The labels are used to choose overlayed samples based on an “overlay_class” argument.
- Parameters:
overlay_samples – list or dataframe of audio files (index) and labels to use for overlay
update_labels (bool) – if True, labels of sample are updated to include labels of overlayed sample
criterion_fn – function that takes AudioSample and returns True or False - if True, perform overlay - if False, do not perform overlay Default is always_true, perform overlay on all samples
See overlay() for **kwargs and default values
- opensoundscape.preprocess.overlay.overlay(sample, overlay_df, update_labels, break_on_key, overlay_class=None, overlay_prob=1, max_overlay_num=1, overlay_weight=0.5, criterion_fn=<function always_true>)[source]
iteratively overlay 2d samples on top of each other
Overlays (blends) image-like samples from overlay_df on top of the sample with probability overlay_prob until stopping condition. If necessary, trims overlay audio to the length of the input audio.
Optionally provide criterion_fn which takes sample and returns True/False to determine whether to perform overlay on this sample.
- Overlays can be used in a few general ways:
a separate df where any file can be overlayed (overlay_class=None)
same df as training, where the overlay class is “different”, ie, does not contain overlapping labels with the original sample
same df as training, where samples from a specific class are used for overlays
- Parameters:
sample – AudioSample with .labels: labels of the original sample and .preprocessor: the preprocessing pipeline
overlay_df – a labels dataframe with audio files as the index and classes as columns
update_labels – if True, add overlayed sample’s labels to original sample
break_on_key – the name of a preprocessing step to stop at when preprocessing the overlayed samples (typically this action’s name)
overlay_class – how to choose files from overlay_df to overlay. Options [default: None]: None - randomly select any file from overlay_df; “different” - select a random file from overlay_df containing none of the classes this file contains; specific class name - always choose files from this class
overlay_prob – the probability of applying each subsequent overlay
max_overlay_num – the maximum number of samples to overlay on original - for example, if overlay_prob = 0.5 and max_overlay_num=2, 1/2 of samples will receive 1 overlay and 1/4 will receive an additional second overlay
overlay_weight – a float > 0 and < 1, or a list of 2 floats [min, max] between which the weight will be randomly chosen. e.g. [0.1,0.7] An overlay_weight <0.5 means more emphasis on original sample.
criterion_fn – function that takes AudioSample and returns True or False - if True, perform overlay - if False, do not perform overlay Default is always_true, perform overlay on all samples
- Returns:
overlayed sample, (possibly updated) labels
Example
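check if sample is from a xeno canto file (has "XC" in name), and only perform overlay on xeno canto files
```
from pathlib import Path

def is_xc(audio_sample):
    # True only for files whose name contains "XC"
    return "XC" in Path(audio_sample.source).stem

# update_labels is required by the signature above; False is chosen here only for illustration
s = overlay(s, overlay_df, update_labels=False, break_on_key="overlay", criterion_fn=is_xc)
```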
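The weighted-average blend and the role of overlay_prob and max_overlay_num can be sketched on plain lists. The helpers `blend` and `count_overlays` are hypothetical and only illustrate the arithmetic described in the parameter list. The actual function works on preprocessed samples drawn from overlay_df.

```python
import random


def blend(original, overlay_sample, overlay_weight=0.5, rng=random):
    """Weighted average of two equal-length samples; weight may be a [min, max] range."""
    if isinstance(overlay_weight, (list, tuple)):
        overlay_weight = rng.uniform(*overlay_weight)
    return [(1 - overlay_weight) * a + overlay_weight * b for a, b in zip(original, overlay_sample)]


def count_overlays(overlay_prob=1, max_overlay_num=1, rng=random):
    """Number of overlays applied: each subsequent overlay happens with probability overlay_prob."""
    n = 0
    while n < max_overlay_num and rng.random() < overlay_prob:
        n += 1
    return n


# A weight below 0.5 keeps more of the original sample
print(blend([1.0, 1.0], [0.0, 0.0], overlay_weight=0.25))  # [0.75, 0.75]
print(count_overlays(overlay_prob=0.5, max_overlay_num=2))
```

With overlay_prob=0.5 and max_overlay_num=2, `count_overlays` returns at least 1 about half of the time and 2 about a quarter of the time, matching the example given for max_overlay_num.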
opensoundscape.preprocess.preprocessors module
Preprocessor classes: tools for preparing and augmenting audio samples
- class opensoundscape.preprocess.preprocessors.AudioAugmentationPreprocessor(**kwargs)[source]
Bases:
AudioPreprocessor
AudioPreprocessor that applies augmentations to audio samples during training
- class opensoundscape.preprocess.preprocessors.AudioPreprocessor(sample_duration, sample_rate, extend_short_clips=True)[source]
Bases:
BasePreprocessor
Child of BasePreprocessor that only loads audio and resamples
- Parameters:
sample_duration – length in seconds of audio samples generated
sample_rate – target sample rate. [default: None] does not resample
extend_short_clips – if True, clips shorter than sample_duration are extended to sample_duration by adding silence.
- class opensoundscape.preprocess.preprocessors.BasePreprocessor(sample_duration=None, sample_rate=None)[source]
Bases:
object
Class for defining an ordered set of Actions and a way to run them
Custom Preprocessor classes should subclass this class or its children
Preprocessors have one job: to transform samples from some input (eg a file path) to some output (eg an AudioSample with .data as torch.Tensor) using a specific procedure defined by the .pipeline attribute. The procedure consists of Actions ordered by the Preprocessor’s .pipeline. Preprocessors have a forward() method which sequentially applies the Actions in the pipeline to produce a sample.
- Parameters:
sample_duration – length of audio samples to generate (seconds)
- forward(sample, break_on_type=None, break_on_key=None, bypass_augmentations=False, trace=False, profile=False)[source]
perform actions in self.pipeline on a sample (until a break point)
Actions with .bypass = True are skipped. Actions with .is_augmentation = True can be skipped by passing bypass_augmentations=True.
- Parameters:
sample – any of - (path, start time) tuple - pd.Series with (file, start_time, end_time) as .name (eg index of a pd.DataFrame from which row was taken) - AudioSample object
break_on_type – if not None, the pipeline will be stopped when it reaches an Action of this class. The matching action is not performed.
break_on_key – if not None, the pipeline will be stopped when it reaches an Action whose index equals this value. The matching action is not performed.
clip_times – can be either - None: the file is treated as a single sample - dictionary {"start_time":float,"end_time":float}: the start and end time of clip in audio
bypass_augmentations – if True, actions with .is_augmentation=True are skipped
trace (boolean - default False) – if True, saves the output of each pipeline step in the sample_info output argument. Can be used for analysis/debugging of intermediate values of the sample during preprocessing
profile (boolean - default False) – if True, saves the runtime of each pipeline step in .runtime (a series indexed like .pipeline)
- Returns:
sample (instance of AudioSample class)
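The control flow of forward() (skipping bypassed actions, optionally skipping augmentations, and stopping at a break point) can be sketched with a plain dictionary as the pipeline. This is a simplified illustration: `run_pipeline` and the (fn, is_augmentation, bypass) tuples are hypothetical, and break_on_type, trace, and profile are omitted.

```python
def run_pipeline(sample, pipeline, break_on_key=None, bypass_augmentations=False):
    """Apply actions in order; pipeline maps key -> (fn, is_augmentation, bypass)."""
    for key, (fn, is_augmentation, bypass) in pipeline.items():
        if key == break_on_key:
            break  # the matching action is not performed
        if bypass or (is_augmentation and bypass_augmentations):
            continue  # skip bypassed actions and, if requested, augmentations
        sample = fn(sample)
    return sample


pipeline = {
    "double": (lambda x: x * 2, False, False),
    "jitter": (lambda x: x + 1, True, False),  # marked as an augmentation
    "negate": (lambda x: -x, False, False),
}
print(run_pipeline(5, pipeline))                             # -11
print(run_pipeline(5, pipeline, bypass_augmentations=True))  # -10
print(run_pipeline(5, pipeline, break_on_key="negate"))      # 11
```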
- classmethod from_json(path)[source]
load preprocessor from a json file
for instance, file created with .save_json()
- classmethod from_yaml(path)[source]
load preprocessor from a YAML file
for instance, file created with .save_yaml()
note that safe_load is not used, so make sure you trust the author of the file
- Parameters:
path – path to the .yaml file
- Returns:
instance of a preprocessor class
- insert_action(action_index, action, after_key=None, before_key=None)[source]
insert an action in a specific position
This is an in-place operation
Inserts a new action before or after a specific key. If after_key and before_key are both None, action is appended to the end of the index.
- Parameters:
action_index – string key for new action in index
action – the action object, must be subclass of BaseAction
after_key – insert the action immediately after this key in index
before_key – insert the action immediately before this key in index Note: only one of (after_key, before_key) can be specified
- remove_action(action_index)[source]
alias for self.drop(…,inplace=True), removes an action
This is an in-place operation
- Parameters:
action_index – index of action to remove
- save(path)[source]
save preprocessor to a file
- Parameters:
path – path to the file, with .json or .yaml extension
- save_json(path)[source]
save preprocessor to a json file
re-load with load_json(path) or .from_json(path)
- class opensoundscape.preprocess.preprocessors.NoiseReduceAudioPreprocessor(sample_duration, sample_rate, extend_short_clips=True, noisereduce_kwargs=None)[source]
Bases:
AudioPreprocessor
- class opensoundscape.preprocess.preprocessors.NoiseReduceSpectrogramPreprocessor(sample_duration, sample_rate, overlay_samples=None, height=None, width=None, channels=1, noisereduce_kwargs=None)[source]
Bases:
SpectrogramPreprocessor
- class opensoundscape.preprocess.preprocessors.PCENPreprocessor(*args, **kwargs)[source]
Bases:
SpectrogramPreprocessor
- class opensoundscape.preprocess.preprocessors.SpectrogramPreprocessor(sample_duration, sample_rate, overlay_samples=None, height=None, width=None, channels=1, bandpass_range=None, use_legacy_spectrogram=False)[source]
Bases:
BasePreprocessor
Child of BasePreprocessor that creates spectrogram Tensors w/augmentation
loads audio, creates spectrogram, performs augmentations, creates tensor
by default, does not resample audio, but bandpasses to 0-11.025 kHz (to ensure all outputs have same scale in y-axis); can change with .pipeline.bandpass.set(min_f=,max_f=)
- Parameters:
sample_duration – length in seconds of audio samples generated If not None, longer clips are trimmed to this length. By default, shorter clips will be extended (modify random_trim_audio and trim_audio to change behavior).
sample_rate – target sample rate. if None, does not resample
overlay_samples – if not None, will include an overlay (mixup) action samples can be a dataframe of file/start/end times or a set of audio files
height – height of output sample (frequency axis) - default None will use the original height of the spectrogram
width – width of output sample (time axis) - default None will use the original width of the spectrogram
channels – number of channels in output sample (default 1)
bandpass_range – tuple (min_f, max_f) in Hz for cropping spectrogram frequency axis - default None retains full frequency range (0 - sample_rate/2 Hz) - if sample_rate is None and input audio can be multiple sample rates, bandpass_range should be used to ensure specs have a consistent frequency range
- class opensoundscape.preprocess.preprocessors.TorchSpectrogramPreprocessor(sample_duration, sample_rate, overlay_samples=None, torch_transforms=None, spec_nfft=512, spec_window_length=None, spec_hop_length=None, lower_dB_range=-80, bandpass_range=None, rescale_mean_sd=None, resize_ft=None, n_mels=None)[source]
Bases:
BasePreprocessor
Spectrogram Preprocessor using torchvision.transforms for export to ONNX
- opensoundscape.preprocess.preprocessors.load(path)[source]
load preprocessor from a file (json or yaml)
use to load preprocessor definitions saved with .save()
- Parameters:
path – path to the file
- Returns:
instance of a preprocessor class
- opensoundscape.preprocess.preprocessors.load_json(path)[source]
load preprocessor from a json file
for instance, file created with .save_json()
- opensoundscape.preprocess.preprocessors.load_yaml(path)[source]
load preprocessor from a YAML file
for instance, file created with .save_yaml()
- Parameters:
path – path to the .yaml file
- Returns:
instance of a preprocessor class
- opensoundscape.preprocess.preprocessors.preprocessor_from_dict(dict)[source]
load a preprocessor from a dictionary saved with pre.to_dict()
looks up class name using the “class” key in PREPROCESSOR_CLS_DICT. Requires that the class was decorated with @register_preprocessor_cls so that it is listed in PREPROCESSOR_CLS_DICT.
If you write a custom preprocessor class, you must decorate it with @register_preprocessor_cls so that it can be looked up by name during from_dict
- Parameters:
dict – dictionary created with a preprocessor class’s .to_dict() method
- Returns:
initialized preprocessor with same configuration and parameters as original - some caveats: Overlay augmentation will not re-load fully, as overlay sample dataframes and `criterion_fn`s are not saved
See also: BasePreprocessor.from_dict(), .save_json(), load_json()
opensoundscape.preprocess.tensor_augment module
Augmentations and transforms for torch.Tensors
- opensoundscape.preprocess.tensor_augment.freq_mask(spec, F=30, max_masks=3, replace_with_zero=False)[source]
draws horizontal bars over the image
- Parameters:
spec – a torch.Tensor representing a spectrogram
F – maximum frequency-width of bars in pixels
max_masks – maximum number of bars to draw
replace_with_zero – if True, bars are 0s, otherwise, mean img value
- Returns:
Augmented tensor
- opensoundscape.preprocess.tensor_augment.time_mask(spec, T=40, max_masks=3, replace_with_zero=False)[source]
draws vertical bars over the image
- Parameters:
spec – a torch.Tensor representing a spectrogram
T – maximum time-width of bars in pixels
max_masks – maximum number of bars to draw
replace_with_zero – if True, bars are 0s, otherwise, mean img value
- Returns:
Augmented tensor
opensoundscape.preprocess.utils module
Utilities for preprocessing
- exception opensoundscape.preprocess.utils.PreprocessingError[source]
Bases:
Exception
Custom exception indicating that a Preprocessor pipeline failed
- opensoundscape.preprocess.utils.get_args(func)[source]
get list of arguments and default values from a function
ignores ‘kwargs’ argument, which is included in inspect.signature.parameters
- opensoundscape.preprocess.utils.get_reqd_args(func)[source]
get list of required arguments from a function
- opensoundscape.preprocess.utils.process_tensor_for_display(tensor, channel=None, normalize_from_range=[-1, 1], invert=False, clip=None)[source]
process tensor for display as image
Moves channel axis from first to third position, converts torch.Tensor to numpy array, rescales values from [min,max] to [0,1]
- Parameters:
tensor – torch.Tensor of shape [c,w,h]
channel – specify an integer to plot only one channel (axis 0) otherwise will return all channels
normalize_from_range – list of [min,max] values to normalize tensor from
invert – if true, flips value range via x=1-x
clip – if specified, tuple of (min,max) to clip values to after normalization
- Returns:
numpy array of shape [w,h] or [w,h,c]
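The value rescaling step (from [min,max] to [0,1], with optional inversion and clipping) can be sketched on a flat list of values. The helper `rescale_for_display` is hypothetical. The order of inversion and clipping shown here is an assumption, and the channel-axis move and numpy conversion are omitted.

```python
def rescale_for_display(values, normalize_from_range=(-1, 1), invert=False, clip=None):
    """Map values from [min, max] to [0, 1], then optionally invert and clip."""
    low, high = normalize_from_range
    out = [(v - low) / (high - low) for v in values]
    if invert:
        out = [1 - v for v in out]  # flip value range via x = 1 - x
    if clip is not None:
        out = [min(max(v, clip[0]), clip[1]) for v in out]
    return out


# 3 lies outside [-1, 1]: it maps to 2.0 and is then clipped to 1.0
print(rescale_for_display([-1, 0, 1, 3], clip=(0.0, 1.0)))  # [0.0, 0.5, 1.0, 1.0]
```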
- opensoundscape.preprocess.utils.show_tensor(tensor, channel=None, normalize_from_range=[-1, 1], invert=False, cmap=None, clip=[0, 1], axis=None)[source]
helper function for displaying a sample as an image
- Parameters:
tensor – torch.Tensor of shape [c,w,h] with values centered around zero
channel – specify an integer to plot only one channel, otherwise will attempt to plot all channels
normalize_from_range – list of [min,max] values to normalize tensor from
invert – if true, flips value range via x=1-x
cmap – matplotlib colormap passed to plt.imshow() - if None, will choose ‘Greys’ if only one channel
clip – if specified, tuple of (min,max) to clip values to after normalization
axis – matplotlib axis to plot on, if None will create new figure
- opensoundscape.preprocess.utils.show_tensor_grid(tensors, columns, labels=None, channel=None, normalize_from_range=[-1, 1], invert=False, cmap=None, clip=[0, 1], axes=None, pad=0.05, gap=0.05, title_height=0.07)[source]
Create a tightly packed image grid of tensors.
- Parameters:
tensors – list of torch.Tensor objects to display
columns – number of columns in the grid
labels – optional list of titles for each tensor
channel – specify an integer to plot only one channel, otherwise will attempt to plot all channels
normalize_from_range – list of [min,max] values to normalize tensor from
invert – if true, flips value range via x=1-x
cmap – matplotlib colormap passed to plt.imshow() - if None, will choose ‘Greys’ if only one channel
clip – if specified, tuple of (min,max) to clip values to after normalization
axes – optional matplotlib axes to plot on, if None will create new figure
pad – outer margin around the grid (fraction of figure size)
gap – inner gap between images (fraction of figure size)
title_height – extra top margin for titles (fraction of figure size)
- Returns:
numpy array of matplotlib axes objects
- Return type:
axes