Preprocessing¶
-
class
opensoundscape.preprocess.preprocessors.
AudioLoadingPreprocessor
(df, return_labels=True, audio_length=None)¶ creates Audio objects from file paths
Parameters: - df – dataframe of samples. df must have audio paths in the index. If df has labels, the class names should be the columns, and the values of each row should be 0 or 1. If data does not have labels, df will have no columns
- return_labels – if True, __getitem__ returns {"X": batch_tensors, "y": labels}; if False, __getitem__ returns {"X": batch_tensors} [default: True]
- audio_length – length in seconds of audio to return [default: None]. If None, the original audio is not trimmed. If a number of seconds (float), longer audio is trimmed to this length; shorter audio input will raise a ValueError.
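The dataframe layout described above (audio paths in the index, one 0/1 column per class, or no columns at all for unlabeled data) can be sketched with pandas. The file paths and class names below are made up for illustration.

```python
import pandas as pd

# Hypothetical audio paths and class names, for illustration only.
paths = ["audio/rec_001.wav", "audio/rec_002.wav", "audio/rec_003.wav"]
labels = {
    "robin": [1, 0, 0],
    "wren": [0, 1, 1],
}

# Labeled dataframe: audio paths in the index, one 0/1 column per class.
labeled_df = pd.DataFrame(labels, index=paths)

# Unlabeled dataframe: same index, no columns.
unlabeled_df = pd.DataFrame(index=paths)

print(labeled_df)
print(unlabeled_df.shape)  # (3, 0)
```

A dataframe like unlabeled_df would be paired with return_labels=False, since there are no label columns to return.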
-
class
opensoundscape.preprocess.preprocessors.
AudioToSpectrogramPreprocessor
(df, audio_length=None, out_shape=[224, 224], return_labels=True)¶ loads audio paths, creates spectrogram, returns tensor
By default, resamples audio to sr=22050. This can be changed with .actions.load_audio.set(sample_rate=sr)
Parameters: - df – dataframe of samples. df must have audio paths in the index. If df has labels, the class names should be the columns, and the values of each row should be 0 or 1. If data does not have labels, df will have no columns
- audio_length – length in seconds of audio clips [default: None]. If provided, longer clips are trimmed to this length. By default, shorter clips will not be extended (modify actions.AudioTrimmer to change behavior).
- out_shape – output shape of tensor in pixels [default: [224,224]]
- return_labels – if True, the __getitem__ method will return {X:sample,y:labels}. If False, the __getitem__ method will return {X:sample}. If df has no labels (no columns), use return_labels=False. [default: True]
-
class
opensoundscape.preprocess.preprocessors.
BasePreprocessor
(df, return_labels=True)¶ Base class for Preprocessing pipelines (use in place of torch Dataset)
Custom Preprocessor classes should subclass this class or its children
Parameters: - df – dataframe of samples. df must have audio paths in the index. If df has labels, the class names should be the columns, and the values of each row should be 0 or 1. If data does not have labels, df will have no columns
- return_labels – if True, the __getitem__ method will return {X:sample,y:labels}. If False, the __getitem__ method will return {X:sample}. If df has no labels (no columns), use return_labels=False. [default: True]
Raises: PreprocessingError if exception is raised during __getitem__
-
class_counts_cal
()¶ count number of each label
-
head
(n=5)¶ out-of-place copy of first n samples
performs df.head(n) on self.df
Parameters: n – number of first samples to return, see pandas.DataFrame.head() [default: 5]
Returns: a new dataset object
-
sample
(**kwargs)¶ out-of-place random sample
creates copy of object with n rows randomly sampled from dataframe
Args: see pandas.DataFrame.sample()
Returns: a new dataset object
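The out-of-place behavior of head() and sample() can be sketched as follows. TinyDataset is a hypothetical stand-in rather than the library's class; the only assumption is that samples live in a dataframe stored on self.df.

```python
import copy

import pandas as pd


class TinyDataset:
    """Hypothetical stand-in for a preprocessor that stores samples in self.df."""

    def __init__(self, df):
        self.df = df

    def head(self, n=5):
        # Copy the object, then give the copy only the first n rows.
        new = copy.copy(self)
        new.df = self.df.head(n)
        return new

    def sample(self, **kwargs):
        # Keyword arguments are forwarded to pandas.DataFrame.sample().
        new = copy.copy(self)
        new.df = self.df.sample(**kwargs)
        return new


df = pd.DataFrame(
    {"robin": [1, 0, 1, 0]},
    index=["a.wav", "b.wav", "c.wav", "d.wav"],
)
dataset = TinyDataset(df)
first_two = dataset.head(2)
random_three = dataset.sample(n=3, random_state=0)

# The original object still holds all four rows.
print(len(dataset.df), len(first_two.df), len(random_three.df))  # 4 2 3
```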
-
class
opensoundscape.preprocess.preprocessors.
CnnPreprocessor
(df, audio_length=None, return_labels=True, debug=None, overlay_df=None, out_shape=[224, 224])¶ Child of AudioToSpectrogramPreprocessor with full augmentation pipeline
loads audio, creates spectrogram, performs augmentations, returns tensor
By default, resamples audio to sr=22050. This can be changed with .actions.load_audio.set(sample_rate=sr)
Parameters: - df – dataframe of samples. df must have audio paths in the index. If df has labels, the class names should be the columns, and the values of each row should be 0 or 1. If data does not have labels, df will have no columns
- audio_length – length in seconds of audio clips [default: None]. If provided, longer clips are trimmed to this length. By default, shorter clips will not be extended (modify actions.AudioTrimmer to change behavior).
- out_shape – output shape of tensor in pixels [default: [224,224]]
- return_labels – if True, the __getitem__ method will return {X:sample,y:labels}. If False, the __getitem__ method will return {X:sample}. If df has no labels (no columns), use return_labels=False. [default: True]
- debug – If a path is provided, generated samples (after all augmentation) will be saved to the path as an image. This is useful for checking that the sample provided to the model matches expectations. [default: None]
-
augmentation_off
()¶ use pipeline that skips all augmentations
-
augmentation_on
()¶ use pipeline containing all actions including augmentations
-
exception
opensoundscape.preprocess.utils.
PreprocessingError
¶ Custom exception indicating that a Preprocessor pipeline failed
Preprocessing Actions¶
Actions for augmentation and preprocessing pipelines
This module contains Action classes which act as the elements in Preprocessor pipelines. Action classes have go(), on(), off(), and set() methods. They take a single sample of a specific type and return the transformed or augmented sample, which may or may not be the same type as the original.
See the preprocessor module and Preprocessing tutorial for details on how to use and create your own actions.
-
class
opensoundscape.preprocess.actions.
ActionContainer
¶ this is a container object which holds instances of Action child-classes
the Actions it contains each have .go(), .on(), .off(), .set(), .get()
The actions are un-ordered and may not all be used. In preprocessor objects such as AudioToSpectrogramPreprocessor, Actions from the action container are listed in a pipeline (a list), which defines their order of use.
To add actions to the container: action_container.loader = AudioLoader()
To set parameters of actions: action_container.loader.set(param=value,…)
Methods: list_actions()
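The relationship between an un-ordered container, switchable actions, and an ordered pipeline list can be sketched in plain Python. All class and attribute names here (SketchAction, SketchContainer, bypass, run_pipeline) are hypothetical and only mimic the go(), on(), off(), set(), and get() interface described above; this is not the library's implementation.

```python
class SketchAction:
    """Hypothetical action: holds parameters and can be switched on or off."""

    def __init__(self, **kwargs):
        self.params = dict(kwargs)
        self.bypass = False  # when True, a pipeline skips this action

    def set(self, **kwargs):
        self.params.update(kwargs)

    def get(self, name):
        return self.params[name]

    def on(self):
        self.bypass = False

    def off(self):
        self.bypass = True

    def go(self, x):
        return x


class Scale(SketchAction):
    def go(self, x):
        return [v * self.params["factor"] for v in x]


class Shift(SketchAction):
    def go(self, x):
        return [v + self.params["amount"] for v in x]


class SketchContainer:
    """Un-ordered holder: actions are stored as plain attributes."""

    def list_actions(self):
        return list(vars(self).keys())


def run_pipeline(pipeline, x):
    # The pipeline list, not the container, defines the order of use.
    for action in pipeline:
        if not action.bypass:
            x = action.go(x)
    return x


actions = SketchContainer()
actions.scale = Scale(factor=2)
actions.shift = Shift(amount=1)
pipeline = [actions.scale, actions.shift]

print(run_pipeline(pipeline, [1, 2, 3]))  # [3, 5, 7]
actions.scale.set(factor=10)
actions.shift.off()
print(run_pipeline(pipeline, [1, 2, 3]))  # [10, 20, 30]
```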
-
class
opensoundscape.preprocess.actions.
AudioLoader
(**kwargs)¶ Action child class for Audio.from_file() (path -> Audio)
Loads an audio file, see Audio.from_file() for documentation.
Parameters: - sample_rate (int, None) – resample audio with value and resample_type, if None use source sample_rate (default: None)
- resample_type – method used to resample (default: kaiser_fast)
- max_duration – the maximum length of an input file, None is no maximum (default: None)
Note: default sample_rate=None means use file’s sample rate, don’t resample
-
class
opensoundscape.preprocess.actions.
AudioToSpectrogram
(**kwargs)¶ Action child class for Spectrogram.from_audio() (Audio -> Spectrogram)
see spectrogram.Spectrogram.from_audio for documentation
Parameters: - window_type="hann" – see scipy.signal.spectrogram docs for description of window parameter
- window_samples=512 – number of audio samples per spectrogram window (pixel)
- overlap_samples=256 – number of samples shared by consecutive windows
- decibel_limits – limit the dB values to (min, max) (lower values set to min, higher values set to max)
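To give a feel for how window_samples and overlap_samples determine the spectrogram size, and what limiting dB values means, here is a small sketch in plain Python. It assumes no padding at the signal edges, so only complete windows are counted, and the limits passed to clip_decibels are illustrative values rather than a confirmed library default. Both function names are hypothetical.

```python
def spectrogram_shape(n_samples, window_samples=512, overlap_samples=256):
    """Return (frequency_bins, time_windows) for a real-valued signal.

    Assumes no padding at the edges, so only complete windows are counted.
    """
    hop = window_samples - overlap_samples  # samples between window starts
    time_windows = max(0, (n_samples - overlap_samples) // hop)
    frequency_bins = window_samples // 2 + 1  # one-sided spectrum
    return frequency_bins, time_windows


def clip_decibels(values, decibel_limits):
    """Set values below the minimum to the minimum, above the maximum to the maximum."""
    low, high = decibel_limits
    return [min(max(v, low), high) for v in values]


# One second of audio at 22050 Hz with the documented window settings.
print(spectrogram_shape(22050))  # (257, 85)

# Illustrative limits, not necessarily the library default.
print(clip_decibels([-120.0, -60.0, -5.0], decibel_limits=(-100, -20)))
```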
-
class
opensoundscape.preprocess.actions.
AudioTrimmer
(**kwargs)¶ Action child class for trimming audio (Audio -> Audio)
Trims an audio file to the desired length. Allows audio to be trimmed from the start or from a random time. Optionally extends audio shorter than audio_length with silence.
Parameters: - audio_length – desired final length (sec); if None, no trim is performed
- extend – if True, clips shorter than audio_length are extended with silence to required length
- random_trim – if True, a random segment of length audio_length is chosen from the input audio. If False, the file is trimmed from 0 seconds to audio_length seconds.
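The three parameters above interact as in the following sketch, which operates on a plain list of samples rather than an Audio object. The function name and the sample_rate argument are hypothetical. Returning short clips unchanged when extend is False is an assumption of this sketch; the library may handle that case differently, for example by raising an error.

```python
import random


def trim_samples(samples, sample_rate, audio_length=None, extend=False,
                 random_trim=False, rng=None):
    """Trim (and optionally extend) a list of samples to audio_length seconds."""
    samples = list(samples)
    if audio_length is None:
        return samples  # no trim is performed

    target = int(round(audio_length * sample_rate))

    if len(samples) < target:
        if extend:
            # Pad the end with silence (zeros) up to the required length.
            return samples + [0.0] * (target - len(samples))
        return samples  # assumption: short clips are left unchanged here

    start = 0
    if random_trim:
        rng = rng or random.Random()
        start = rng.randint(0, len(samples) - target)  # inclusive bounds
    return samples[start:start + target]


clip = [float(i) for i in range(10)]  # 10 samples at 2 Hz = 5 seconds
print(trim_samples(clip, sample_rate=2, audio_length=2))  # [0.0, 1.0, 2.0, 3.0]
print(len(trim_samples(clip, sample_rate=2, audio_length=8, extend=True)))  # 16
```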
-
class
opensoundscape.preprocess.actions.
BaseAction
(**kwargs)¶ Parent class for all Actions (used in Preprocessor pipelines)
New actions should subclass this class.
Subclasses should set self.requires_labels = True if go() expects (X,y) instead of (X). y is a row of a dataframe (a pd.Series) with index (.name) = original file path, columns=class names, values=labels (0,1). X is the sample, and can be of various types (path, Audio, Spectrogram, Tensor, etc). See ImgOverlay for an example of an Action that uses labels.
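How a pipeline might treat actions differently depending on requires_labels can be sketched as below. This is an illustration under stated assumptions: the class names are hypothetical, a plain dict stands in for the pd.Series label row, and the convention that a label-aware go() returns both the sample and the labels is assumed rather than taken from the library.

```python
class AddOne:
    """Hypothetical action that only needs the sample."""

    requires_labels = False

    def go(self, x):
        return [v + 1 for v in x]


class MarkClass:
    """Hypothetical label-aware action: returns the sample and updated labels."""

    requires_labels = True

    def __init__(self, class_name):
        self.class_name = class_name

    def go(self, x, x_labels):
        updated = dict(x_labels)  # copy so the caller's labels are not mutated
        updated[self.class_name] = 1
        return x, updated


def apply_actions(pipeline, x, labels):
    for action in pipeline:
        if action.requires_labels:
            x, labels = action.go(x, labels)
        else:
            x = action.go(x)
    return x, labels


original_labels = {"robin": 1, "wren": 0}
x, y = apply_actions([AddOne(), MarkClass("wren")], [0, 1], original_labels)
print(x, y)  # [1, 2] {'robin': 1, 'wren': 1}
```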
-
class
opensoundscape.preprocess.actions.
FrequencyMask
(**kwargs)¶ add random horizontal bars over image
Parameters: - max_masks – max number of horizontal bars [default: 3]
- max_width – maximum size of horizontal bars as fraction of image height
-
go
(x)¶ torch Tensor in, torch Tensor out
-
class
opensoundscape.preprocess.actions.
ImgOverlay
(overlay_df, audio_length, loader_pipeline, update_labels, **kwargs)¶ iteratively overlay images on top of each other
Overlays images from overlay_df on top of the sample with probability overlay_prob until stopping condition. If necessary, trims overlay audio to the length of the input audio. Overlays the images on top of each other with a weight.
Overlays can be used in a few general ways:
- a separate df where any file can be overlaid (overlay_class=None)
- same df as training, where the overlay class is “different”, i.e., does not contain overlapping labels with the original sample
- same df as training, where samples from a specific class are used for overlays
Parameters: - overlay_df – a labels dataframe with audio files as the index and classes as columns
- audio_length – length in seconds of original audio sample
- loader_pipeline – the preprocessing pipeline to load audio -> spec
- update_labels – if True, add overlayed sample’s labels to original sample
- overlay_class – how to choose files from overlay_df to overlay [default: “different”]. Options: None – randomly select any file from overlay_df; “different” – select a random file from overlay_df containing none of the classes this file contains; specific class name – always choose files from this class
- overlay_prob – the probability of applying each subsequent overlay
- max_overlay_num – the maximum number of samples to overlay on the original. For example, if overlay_prob=0.5 and max_overlay_num=2, 1/2 of images will receive 1 overlay and 1/4 will receive an additional second overlay
- overlay_weight – can be a float between 0 and 1 or a range of floats (chooses randomly from within the range) such as [0.1,0.7]. An overlay_weight <0.5 means more emphasis on the original image.
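The arithmetic behind overlay_prob and max_overlay_num, and the effect of overlay_weight, can be worked through in a short sketch. Both function names are hypothetical. The blend is written as a simple weighted sum, which is an assumption about how the weight is applied rather than a statement of the library's code.

```python
def overlay_count_distribution(overlay_prob, max_overlay_num):
    """Return {k: probability that a sample receives exactly k overlays}."""
    distribution = {}
    for k in range(max_overlay_num):
        # k overlays are applied, then the next random draw stops the process.
        distribution[k] = overlay_prob ** k * (1 - overlay_prob)
    # Every draw succeeded, so the maximum number of overlays is reached.
    distribution[max_overlay_num] = overlay_prob ** max_overlay_num
    return distribution


def blend(original, overlay, overlay_weight):
    """Weighted sum of two equally sized lists of pixel values."""
    return [(1 - overlay_weight) * a + overlay_weight * b
            for a, b in zip(original, overlay)]


print(overlay_count_distribution(0.5, 2))  # {0: 0.5, 1: 0.25, 2: 0.25}
print(blend([100.0, 200.0], [0.0, 0.0], overlay_weight=0.25))  # [75.0, 150.0]
```

With overlay_prob=0.5 and max_overlay_num=2, half of the samples get at least one overlay (0.25 + 0.25) and a quarter get two, matching the example in the parameter description.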
-
go
(x, x_labels)¶ Overlay images from overlay_df
-
class
opensoundscape.preprocess.actions.
ImgToTensor
(**kwargs)¶ Convert PIL image to RGB Tensor (PIL.Image -> Tensor)
Converts a PIL.Image with range [0,255] to a torch Tensor with range [0,1]. Converts the image to RGB (3 channels).
-
class
opensoundscape.preprocess.actions.
ImgToTensorGrayscale
(**kwargs)¶ Convert PIL image to grayscale Tensor (PIL.Image -> Tensor)
Converts a PIL.Image with range [0,255] to a torch Tensor with range [0,1]. Converts the image to grayscale (1 channel).
-
class
opensoundscape.preprocess.actions.
SaveTensorToDisk
(save_path, **kwargs)¶ save a torch Tensor to disk (Tensor -> Tensor)
Requires x_labels because the index of the label-row (.name) gives the original file name for this sample.
Uses torchvision.utils.save_image. Creates save_path dir if it doesn’t exist
Parameters: save_path – a directory where tensor will be saved
-
go
(x, x_labels)¶ we require x_labels because the .name gives origin file name
-
-
class
opensoundscape.preprocess.actions.
SpecToImg
(**kwargs)¶ Action class to transform Spectrogram to PIL image
(Spectrogram -> PIL.Image)
Parameters: - destination – a file path (string)
- shape=None – tuple of image dimensions for 1 channel, eg (224,224)
- mode="RGB" – RGB for 3-channel color or “L” for 1-channel grayscale
- spec_range=[-100,-20] – the lowest and highest possible values in the spectrogram
-
class
opensoundscape.preprocess.actions.
SpectrogramBandpass
(**kwargs)¶ Action class for Spectrogram.bandpass() (Spectrogram -> Spectrogram)
see opensoundscape.spectrogram.Spectrogram.bandpass() for documentation
To bandpass the spectrogram from 1 kHz to 5 kHz: action = SpectrogramBandpass(1000,5000)
Parameters: - min_f – low frequency in Hz for bandpass
- max_f – high frequency in Hz for bandpass
-
class
opensoundscape.preprocess.actions.
TensorAddNoise
(**kwargs)¶ Add Gaussian noise to sample (Tensor -> Tensor)
Parameters: std – standard deviation for Gaussian noise [default: 1]
Note: be aware that scaling before/after this action will change the effect of a fixed stdev Gaussian noise
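The note about scaling can be illustrated with a plain-Python sketch that works on lists instead of tensors. The function name is hypothetical. The point is that the same std is large relative to data in [0, 1] but small relative to the same data in [0, 255].

```python
import random


def add_gaussian_noise(values, std=1.0, rng=None):
    """Add zero-mean Gaussian noise with standard deviation std to each value."""
    rng = rng or random.Random()
    return [v + rng.gauss(0.0, std) for v in values]


rng = random.Random(0)
unit_scale = [0.2, 0.5, 0.8]       # sample scaled to [0, 1]
byte_scale = [51.0, 127.5, 204.0]  # the same sample scaled to [0, 255]

# Identical std, very different relative effect on the two scalings.
noisy_unit = add_gaussian_noise(unit_scale, std=1.0, rng=rng)
noisy_byte = add_gaussian_noise(byte_scale, std=1.0, rng=rng)
print(noisy_unit)
print(noisy_byte)
```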
-
class
opensoundscape.preprocess.actions.
TensorAugment
(**kwargs)¶ combination of 3 augmentations with hard-coded parameters
time warp, time mask, and frequency mask
use (bool) time_warp, time_mask, freq_mask to turn each on/off
-
go
(x)¶ torch Tensor in, torch Tensor out
-
-
class
opensoundscape.preprocess.actions.
TensorNormalize
(**kwargs)¶ torchvision.transforms.Normalize (WARNING: FIXED shift and scale)
(Tensor->Tensor)
WARNING: This does not perform per-image normalization. Instead, it takes as arguments a fixed u and s, ie for the entire dataset, and performs X=(X-u)/s.
Parameters: - mean=0.5
- std=0.5
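The fixed shift and scale X=(X-u)/s can be written out directly. This sketch uses a list in place of a tensor and a hypothetical function name; with mean=0.5 and std=0.5, values in [0, 1] are mapped to [-1, 1] regardless of the statistics of any individual image.

```python
def normalize(values, mean=0.5, std=0.5):
    """Apply the same fixed shift and scale to every value: (x - mean) / std."""
    return [(v - mean) / std for v in values]


print(normalize([0.0, 0.5, 1.0]))  # [-1.0, 0.0, 1.0]
```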
-
class
opensoundscape.preprocess.actions.
TimeMask
(**kwargs)¶ add random vertical bars over image (Tensor -> Tensor)
Parameters: - max_masks – maximum number of bars [default: 3]
- max_width – maximum width of vertical bars as fraction of image width [default: 0.2]
-
class
opensoundscape.preprocess.actions.
TimeWarp
(**kwargs)¶ Time warp is an experimental augmentation that creates a tilted image.
Parameters: warp_amount – use higher values for more skew and offset (experimental)
-
class
opensoundscape.preprocess.actions.
TorchColorJitter
(**kwargs)¶ Action class for torchvision.transforms.ColorJitter
(Tensor -> Tensor) or (PIL Img -> PIL Img)
Parameters: - brightness=0.3 –
- contrast=0.3 –
- saturation=0.3 –
- hue=0 –
-
class
opensoundscape.preprocess.actions.
TorchRandomAffine
(**kwargs)¶ Action class for torchvision.transforms.RandomAffine
(Tensor -> Tensor) or (PIL Img -> PIL Img)
Parameters: - degrees=0 –
- fill –
- = –
Note: If applying per-image normalization, we recommend applying RandomAffine after image normalization. In this case, an intermediate gray value is ~0. If normalization is applied after RandomAffine on a PIL image, use an intermediate fill color such as (122,122,122).
Image Augmentation¶
Transforms and augmentations for PIL.Images
-
opensoundscape.preprocess.img_augment.
time_split
(img, seed=None)¶ Given a PIL.Image, split into left/right parts and swap
Randomly chooses the slicing location. For example, if h is chosen:
abcdefghijklmnop
       ^
hijklmnop + abcdefg
Parameters: img – A PIL.Image
Returns: A PIL.Image
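The split-and-swap idea can be shown on any sequence of columns. The sketch below uses a string in place of a PIL.Image so that it reproduces the alphabet example; the function name and the split_index argument are hypothetical.

```python
import random


def split_and_swap(columns, split_index=None, seed=None):
    """Split a sequence at split_index and swap the left and right parts."""
    if split_index is None:
        # Randomly choose the slicing location.
        split_index = random.Random(seed).randrange(len(columns))
    return columns[split_index:] + columns[:split_index]


print(split_and_swap("abcdefghijklmnop", split_index=7))  # hijklmnopabcdefg
```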
Tensor Augmentation¶
Augmentations and transforms for torch.Tensors
These functions were implemented for PyTorch in: https://github.com/zcaceres/spec_augment The original paper is available on https://arxiv.org/abs/1904.08779
-
opensoundscape.preprocess.tensor_augment.
freq_mask
(spec, F=30, max_masks=3, replace_with_zero=False)¶ draws horizontal bars over the image
F: maximum frequency-width of bars in pixels
max_masks: maximum number of bars to draw
replace_with_zero: if True, bars are 0s, otherwise, mean img value
-
opensoundscape.preprocess.tensor_augment.
time_mask
(spec, T=40, max_masks=3, replace_with_zero=False)¶ draws vertical bars over the image
T: maximum time-width of bars in pixels
max_masks: maximum number of bars to draw
replace_with_zero: if True, bars are 0s, otherwise, mean img value
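The bar-drawing idea behind freq_mask and time_mask can be sketched on a 2D list, with rows as frequency and columns as time. This is a simplified illustration, not the library's implementation: the function name, the axis argument, and the way the number and width of bars are drawn are all assumptions.

```python
import random


def mask_bars(image, max_width, max_masks=3, axis="freq",
              replace_with_zero=False, rng=None):
    """Return a copy of a 2D list with up to max_masks random bars drawn over it.

    axis="freq" draws horizontal bars (rows); axis="time" draws vertical bars.
    """
    rng = rng or random.Random()
    n_rows, n_cols = len(image), len(image[0])
    # Bars are filled with zeros or with the mean value of the whole image.
    fill = 0.0 if replace_with_zero else sum(map(sum, image)) / (n_rows * n_cols)
    out = [list(row) for row in image]
    size = n_rows if axis == "freq" else n_cols

    for _ in range(rng.randint(1, max_masks)):
        width = rng.randint(0, min(max_width, size))
        start = rng.randint(0, size - width)
        for r in range(n_rows):
            for c in range(n_cols):
                position = r if axis == "freq" else c
                if start <= position < start + width:
                    out[r][c] = fill
    return out


image = [[1.0] * 6 for _ in range(4)]
masked = mask_bars(image, max_width=2, axis="freq",
                   replace_with_zero=True, rng=random.Random(0))
for row in masked:
    print(row)
```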
-
opensoundscape.preprocess.tensor_augment.
time_warp
(spec, W=5)¶ apply time stretch and shearing to spectrogram
fills empty space on right side with horizontal bars
W controls amount of warping. Random with occasional large warp.