opensoundscape.ml package
Submodules
opensoundscape.ml.cam module
Class activation maps (CAM) for OpenSoundscape models
- class opensoundscape.ml.cam.CAM(base_image, activation_maps=None, gbp_maps=None)[source]
Bases:
object
Object to hold and view Class Activation Maps, including guided backprop
Stores activation maps as .activation_maps, and guided backprop maps as .gbp_maps
each is a Series indexed by class
- create_rgb_heatmaps(class_subset=None, mode='activation', show_base=True, alpha=0.5, color_cycle=('#067bc2', '#43a43d', '#ecc30b', '#f37748', '#d56062'), gbp_normalization_q=99)[source]
create rgb numpy array of heatmaps overlaid on the sample
Can choose a subset of classes and activation/backprop modes
- Parameters:
class_subset – iterable of classes to visualize with activation maps - default None plots all classes - each item must be in the index of self.gbp_maps / self.activation_maps - note that a class None is created by CNN.generate_cams() when classes are not specified
mode – str selecting which maps to visualize, one of: ‘activation’ [default]: overlay activation map; ‘backprop’: overlay guided backpropagation result; ‘backprop_and_activation’: overlay the product of both maps; None: do not overlay anything on the original sample
show_base – if False, does not plot the image of the original sample [default: True]
alpha – opacity of the activation map overlay [default: 0.5]
color_cycle – iterable of colors for activation maps - cycles through the list using one color per class
gbp_normalization_q – guided backprop is normalized such that the q’th percentile of the map is 1 [default: 99]. This helps avoid gbp maps that are too dark to see. Lower values make brighter and noisier maps; higher values make darker and smoother maps.
- Returns:
numpy array of shape [w, h, 3] representing the image with CAM heatmaps; if mode is None, returns the original sample; if show_base is False, returns just the heatmaps; if mode is None and show_base is False, returns None
- plot(class_subset=None, mode='activation', show_base=True, alpha=0.5, color_cycle=('#067bc2', '#43a43d', '#ecc30b', '#f37748', '#d56062'), figsize=None, plt_show=True, save_path=None, gbp_normalization_q=99)[source]
Plot per-class activation maps, guided backpropagations, or their products
Do not pass both mode=None and show_base=False.
- Parameters:
class_subset – see create_rgb_heatmaps
mode – see create_rgb_heatmaps
show_base – see create_rgb_heatmaps
alpha – see create_rgb_heatmaps
color_cycle – see create_rgb_heatmaps
gbp_normalization_q – see create_rgb_heatmaps
figsize – the figure size for the plot [default: None]
plt_show – if True, runs plt.show() [default: True] - ignored if return_numpy=True
save_path – path to save image to [default: None does not save file]
- Returns:
(fig, ax) of matplotlib figure, or np.array if return_numpy=True
Note: if base_image does not have 3 channels, channels are averaged then copied across 3 RGB channels to create a greyscale image
- Note 2: If return_numpy is true, fig and ax are never created; the method simply creates a numpy array representing the image with the CAMs overlaid and returns it
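For illustration, a short sketch assuming `sample` is an AudioSample returned by CNN.generate_cams() (documented in opensoundscape.ml.cnn below):
```python
# plot the activation map overlay for all classes on the sample image
sample.cam.plot(mode="activation", alpha=0.5)

# or get the overlay as a numpy array for custom plotting
rgb = sample.cam.create_rgb_heatmaps(mode="backprop_and_activation")
```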
opensoundscape.ml.cnn module
classes for pytorch machine learning models in opensoundscape
For tutorials, see notebooks on opensoundscape.org
- class opensoundscape.ml.cnn.BaseClassifier(*args: Any, **kwargs: Any)[source]
Bases:
SpectrogramClassifier
alias for SpectrogramClassifier
improves compatibility with older code / previous opso versions, which had a BaseClassifier class as a parent to the CNN class
- class opensoundscape.ml.cnn.BaseModule[source]
Bases:
object
- property classifier_params
return the parameters of the classifier layer of the network
override this method if the classifier parameters should be retrieved in a different way
- configure_optimizers(reset_optimizer=False, restart_scheduler=False)[source]
standard Lightning method to initialize an optimizer and learning rate scheduler
Lightning calls this method at the start of training; it must return {“optimizer”: optimizer, “scheduler”: scheduler}.
Initializes the optimizer and learning rate scheduler using the parameters self.optimizer_params and self.scheduler_params, which are dictionaries with a key “class” and a key “kwargs” (containing a dictionary of keyword arguments to initialize the class with). We initialize the class with the kwargs and the appropriate first argument: optimizer=opt_cls(self.parameters(), **opt_kwargs) and scheduler=scheduler_cls(optimizer, **scheduler_kwargs)
You can also override this method and write one that returns {“optimizer”: optimizer, “scheduler”: scheduler}
Uses the attributes:
- self.optimizer_params: dictionary with “class” key (such as torch.optim.Adam) and “kwargs”, a dict of keyword args for the class’s init
- self.scheduler_params: dictionary with “class” key (such as torch.optim.lr_scheduler.StepLR) and “kwargs”, a dict of keyword args for the class’s init
- self.lr_scheduler_step: int, number of times lr_scheduler.step() has been called; can be set to -1 to restart the learning rate schedule, or to another value to start the lr scheduler from an arbitrary position
Note: when used by lightning, self.optimizer and self.scheduler should not be modified directly, lightning handles these internally. Lightning will call the method without passing reset_optimizer or restart_scheduler, so default=False results in not modifying .optimizer or .scheduler
Documentation: https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.core.LightningModule.html#lightning.pytorch.core.LightningModule.configure_optimizers
- Parameters:
reset_optimizer – if True, initializes the optimizer from scratch even if self.optimizer is not None
restart_scheduler – if True, initializes the scheduler from scratch even if self.scheduler is not None
- Returns:
dictionary with keys “optimizer” and “scheduler” containing the optimizer and learning rate scheduler objects to use during training
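For illustration, a minimal sketch of overriding this method in a subclass, as the docstring permits (the class name and hyperparameter choices are hypothetical):
```python
import torch
from opensoundscape.ml.cnn import SpectrogramClassifier

class MyModel(SpectrogramClassifier):
    def configure_optimizers(self, reset_optimizer=False, restart_scheduler=False):
        # plain Adam with a StepLR schedule; returns the dict format Lightning expects
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
        return {"optimizer": optimizer, "scheduler": scheduler}
```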
- inference_dataloader_cls
a DataLoader class to use for inference, defaults to SafeAudioDataloader
- loss_fn
specify a loss function to use for training, eg BCEWithLogitsLoss_hot,
by initializing a callable object or passing a function
- lr_scheduler_params
dictionary with “class” and “kwargs” to class.__init__
for example, to use Cosine Annealing, set:
```python
model.lr_scheduler_params = {
    "class": torch.optim.lr_scheduler.CosineAnnealingLR,
    "kwargs": {
        "T_max": n_epochs,
        "eta_min": 1e-7,
        "last_epoch": self.current_epoch - 1,
    },
}
```
- type:
learning rate schedule
- network
a pytorch Module such as Resnet18 or a custom object
- optimizer_params
dictionary with “class” and “kwargs” to class.__init__
for example, to use the Adam optimizer, set:
```python
my_instance.optimizer_params = {
    "class": torch.optim.Adam,
    "kwargs": {
        "lr": 0.001,
        "weight_decay": 0.0005,
    },
}
```
- type:
optimizer settings
- predict_dataloader(samples, collate_fn=<function collate_audio_samples>, **kwargs)[source]
generate dataloader for inference (predict/validate/test)
- Args: see self.inference_dataloader_cls docstring for arguments
**kwargs: any arguments to pass to the DataLoader __init__
Note: these arguments are fixed and should not be passed in kwargs:
- shuffle=False: retain original sample order
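A brief usage sketch (file paths hypothetical):
```python
# build an inference dataloader over a list of audio files
loader = model.predict_dataloader(["a.wav", "b.wav"], batch_size=32, num_workers=4)
```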
- preprocessor
an instance of BasePreprocessor or subclass that preprocesses audio samples into tensors
The preprocessor contains .pipeline, an ordered set of Actions to run
preprocessor will have attributes .sample_duration (seconds) and .height, .width, .channels for output shape (input shape to self.network)
The pipeline can be modified by adding or removing actions, and by modifying parameters:
```python
my_obj.preprocessor.remove_action('add_noise')
my_obj.preprocessor.insert_action('add_noise', Action(my_function), after_key='frequency_mask')
```
Or, the preprocessor can be replaced with a different or custom preprocessor, for instance:
```python
from opensoundscape.preprocess import AudioPreprocessor

# this preprocessor returns 1d arrays of the audio signal
my_obj.preprocessor = AudioPreprocessor(sample_duration=5, sample_rate=22050)
```
- scheduler
torch.optim.lr_scheduler object for learning rate scheduling
- score_metric
choose one of the keys in self.torch_metrics to use as the overall score metric
this metric will be used to determine the best model during training
- torch_metrics
dictionary of torchmetrics “name”: object pairs to compute metrics during training/validation
- train_dataloader(samples, bypass_augmentations=False, collate_fn=<function collate_audio_samples>, **kwargs)[source]
generate dataloader for training
train_loader samples batches of images + labels from training set
- Args: see self.train_dataloader_cls docstring for arguments
**kwargs: any arguments to pass to the DataLoader __init__
Note: some arguments are fixed and should not be passed in kwargs:
- shuffle=True: shuffle samples for training
- bypass_augmentations=False: apply augmentations to training samples
- train_dataloader_cls
a DataLoader class to use for training, defaults to SafeAudioDataloader
- training_step(samples, batch_idx)[source]
a standard Lightning method used within the training loop, acting on each batch
returns loss
- Effects:
logs metrics and loss to the current logger
- use_amp
if True, uses automatic mixed precision for training
- class opensoundscape.ml.cnn.CNN(*args: Any, **kwargs: Any)[source]
Bases:
SpectrogramClassifier
alias for SpectrogramClassifier
improves compatibility with older code / previous opso versions
- class opensoundscape.ml.cnn.InceptionV3(*args: Any, **kwargs: Any)[source]
Bases:
SpectrogramClassifier
Child of SpectrogramClassifier class for InceptionV3 architecture
- class opensoundscape.ml.cnn.SpectrogramClassifier(*args: Any, **kwargs: Any)[source]
Bases:
SpectrogramModule, Module
- current_epoch
track number of trained epochs
- property device
- embed(samples, target_layer=None, progress_bar=True, return_preds=False, avgpool=True, return_dfs=True, audio_root=None, **dataloader_kwargs)[source]
Generate embeddings (intermediate layer outputs) for audio files/clips
Note: to capture embeddings on multiple layers, use self.__call__ with intermediate_layers argument directly. This wrapper only allows one target_layer.
Note: Output can be n-dimensional array (return_dfs=False) or pd.DataFrame with multi-index like .predict() (return_dfs=True). If avgpool=False, return_dfs is forced to False since we can’t create a DataFrame with >2 dimensions.
- Parameters:
samples – same as CNN.predict(): list of file paths, OR pd.DataFrame with index containing audio file paths, OR a pd.DataFrame with multi-index (file, start_time, end_time)
target_layer – layer from self.model._modules to extract outputs from - if None, attempts to use self.model.embedding_layer as default
progress_bar – bool, if True, shows a progress bar with tqdm [default: True]
return_preds – bool, if True, returns two outputs (embeddings, logits)
avgpool – bool, if True, applies global average pooling to intermediate outputs i.e. averages across all dimensions except first to get a 1D vector per sample
return_dfs – bool, if True, returns embeddings as pd.DataFrame with multi-index like .predict(); if False, returns np.array of embeddings [default: True]. If avgpool=False, overrides to return np.array since we can’t have a df with >2 dimensions
audio_root – optionally pass a root directory (pathlib.Path or str) - audio_root is prepended to each file path - if None (default), samples must contain full paths to files
**dataloader_kwargs – passed to self.predict_dataloader()
- Returns: (embeddings, preds) if return_preds=True or embeddings if return_preds=False
types are pd.DataFrame if return_dfs=True, or np.array if return_dfs=False
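A brief usage sketch, assuming `model` is a trained classifier and the file paths are hypothetical:
```python
# one embedding vector per clip, as a DataFrame with (file, start_time, end_time) index
embeddings = model.embed(["rec1.wav", "rec2.wav"])

# also return the classification logits alongside the embeddings
embeddings, preds = model.embed(["rec1.wav"], return_preds=True)
```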
- eval(targets=None, scores=None, reset_metrics=True)[source]
compute single-target or multi-target metrics from targets and scores
Or, compute metrics on accumulated values in the TorchMetrics if targets is None
By default, the overall model score is “map” (mean average precision) for multi-target models (self.single_target=False) and “f1” (average of f1 score across classes) for single-target models.
update self.torch_metrics to include the desired metrics
- Parameters:
targets – 0/1 for each sample and each class; if targets is None, runs metric.compute() on each of self.torch_metrics (using accumulated values)
scores – continuous values in [0,1] for each sample and class; ignored if targets is None
reset_metrics – if True, resets the metrics after computing them [default: True]
- Returns:
dictionary of metrics (name: value)
- Raises:
AssertionError – if targets are outside of range [0,1]
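For illustration, a sketch of computing validation metrics from predictions, assuming `val_df` is a multi-hot label dataframe whose columns match model.classes:
```python
scores = model.predict(val_df, batch_size=64)  # rows align with val_df
metrics = model.eval(targets=val_df.values, scores=scores.values)
print(metrics)  # dictionary of metric name: value
```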
- generate_cams(samples, method='gradcam', classes=None, target_layers=None, guided_backprop=False, progress_bar=True, **kwargs)[source]
Generate activation and/or backprop heatmaps for each sample
- Parameters:
samples – (same as CNN.predict()) the files to generate predictions for. Can be: - a dataframe with index containing audio paths, OR - a dataframe with multi-index (file, start_time, end_time), OR - a list (or np.ndarray) of audio file paths
method –
method to use for activation map. Can be a str (choose from below), a class from pytorch_grad_cam (any subclass of BaseCAM), or None. If None, activation maps will not be created. [default: ‘gradcam’]
- str can be any of the following:
“gradcam”: pytorch_grad_cam.GradCAM,
“hirescam”: pytorch_grad_cam.HiResCAM,
“scorecam”: pytorch_grad_cam.ScoreCAM,
“gradcam++”: pytorch_grad_cam.GradCAMPlusPlus,
“ablationcam”: pytorch_grad_cam.AblationCAM,
“xgradcam”: pytorch_grad_cam.XGradCAM,
“eigencam”: pytorch_grad_cam.EigenCAM,
“eigengradcam”: pytorch_grad_cam.EigenGradCAM,
“layercam”: pytorch_grad_cam.LayerCAM,
“fullgrad”: pytorch_grad_cam.FullGrad,
“gradcamelementwise”: pytorch_grad_cam.GradCAMElementWise
classes (list) – list of classes, will create maps for each class [default: None] if None, creates an activation map for the highest scoring class on a sample-by-sample basis
target_layers (list) –
list of target layers for GradCAM - if None [default], attempts to use the architecture’s default target_layer. Note: only architectures created with opensoundscape 0.9.0+ will have a default target layer; see pytorch_grad_cam docs for suggestions. Note: if multiple layers are provided, the activations are merged across layers (rather than returning separate activations per layer)
guided_backprop – bool [default: False] if True, performs guided backpropagation for each class in classes. AudioSamples will have attribute .gbp_maps, a pd.Series indexed by class name
**kwargs – passed to SafeAudioDataloader (incl: batch_size, num_workers, split_files_into_clips, bypass_augmentations, raise_errors, overlap_fraction, final_clip, other DataLoader args)
- Returns:
a list of AudioSample objects with .cam attribute, an instance of the CAM class (visualize with sample.cam.plot()). See the CAM class for more details
See pytorch_grad_cam documentation for references to the source of each method.
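A short sketch of generating and viewing a CAM (file path hypothetical):
```python
samples = model.generate_cams(["call.wav"], method="gradcam", guided_backprop=True)
samples[0].cam.plot(mode="backprop_and_activation")  # see the CAM class above
```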
- generate_samples(samples, invalid_samples_log=None, return_invalid_samples=False, audio_root=None, **dataloader_kwargs)[source]
Generate AudioSample objects. Input options same as .predict()
- Parameters:
samples – (same as CNN.predict()) the files to generate predictions for. Can be: - a dataframe with index containing audio paths, OR - a dataframe with multi-index (file, start_time, end_time), OR - a list (or np.ndarray) of audio file paths - a single file path as str or pathlib.Path
see .predict() documentation for other args
**dataloader_kwargs – any arguments to inference_dataloader_cls.__init__ except samples (uses samples) and collate_fn (uses identity) (Note: default class is SafeAudioDataloader)
- Returns:
a list of AudioSample objects - if return_invalid_samples is True, returns second value: list of paths to samples that failed to preprocess
Example:
```python
from opensoundscape.preprocess.utils import show_tensor_grid
samples = model.generate_samples(['/path/file1.wav', '/path/file2.wav'])
tensors = [s.data for s in samples]
show_tensor_grid(tensors, columns=3)
```
- classmethod load(path, unpickle=True)[source]
load a model saved using CNN.save()
- Parameters:
path – path to file saved using CNN.save()
unpickle – if True, passes weights_only=False to torch.load(). This is necessary if the model was saved with pickle=True, which saves the entire model object. If unpickle=False, this function will work if the model was saved with pickle=False, but will raise an error if the model was saved with pickle=True. [default: True]
- Returns:
new CNN instance
Note: if you used pickle=True when saving, the model object might not load properly across different versions of OpenSoundscape.
- load_weights(path, strict=True)[source]
load network weights state dict from a file
For instance, load weights saved with .save_weights(). This is an in-place operation.
- Parameters:
path – file path with saved weights
strict – (bool) see torch.nn.Module.load_state_dict()
- log_file
specify a path to save output to a text file
- logging_level
amount of logging to self.log_file. 0 for nothing, 1,2,3 for increasing logged info
- loss_hist
dictionary of epoch: mean batch loss during training
- name = 'SpectrogramClassifier'
- predict(samples, batch_size=1, num_workers=0, activation_layer=None, split_files_into_clips=True, clip_overlap=None, clip_overlap_fraction=None, clip_step=None, overlap_fraction=None, final_clip=None, bypass_augmentations=True, invalid_samples_log=None, raise_errors=False, wandb_session=None, return_invalid_samples=False, progress_bar=True, audio_root=None, **dataloader_kwargs)[source]
Generate predictions on a set of samples
Return dataframe of model output scores for each sample. Optional activation layer for scores (softmax, sigmoid, softmax then logit, or None)
- Parameters:
samples – the files to generate predictions for. Can be: - a dataframe with index containing audio paths, OR - a dataframe with multi-index (file, start_time, end_time), OR - a list (or np.ndarray) of audio file paths - a single file path (str or pathlib.Path)
batch_size – Number of files to load simultaneously [default: 1]
num_workers – parallelization (ie cpus or cores), use 0 for current process [default: 0]
activation_layer – Optionally apply an activation layer such as sigmoid or softmax to the raw outputs of the model. options: - None: no activation, return raw scores (ie logit, [-inf:inf]) - ‘softmax’: scores all classes sum to 1 - ‘sigmoid’: all scores in [0,1] but don’t sum to 1 - ‘softmax_and_logit’: applies softmax first then logit [default: None]
split_files_into_clips – If True, internally splits and predicts on clips from longer audio files Otherwise, assumes each row of samples corresponds to one complete sample
clip_overlap_fraction – see opensoundscape.utils.generate_clip_times_df
clip_overlap – see opensoundscape.utils.generate_clip_times_df
clip_step – see opensoundscape.utils.generate_clip_times_df
final_clip – see opensoundscape.utils.generate_clip_times_df
overlap_fraction – deprecated alias for clip_overlap_fraction
bypass_augmentations – If False, Actions with is_augmentation==True are performed. Default True.
invalid_samples_log – if not None, samples that failed to preprocess will be listed in this text file.
raise_errors – if True, raise errors when preprocessing fails if False, just log the errors to unsafe_samples_log
wandb_session – a wandb session to log to - pass the value returned by wandb.init() to progress log to a Weights and Biases run - if None, does not log to wandb
return_invalid_samples – bool, if True, returns second argument, a set containing file paths of samples that caused errors during preprocessing [default: False]
progress_bar – bool, if True, shows a progress bar with tqdm [default: True]
audio_root – optionally pass a root directory (pathlib.Path or str) - audio_root is prepended to each file path - if None (default), samples must contain full paths to files
**dataloader_kwargs – additional arguments to self.predict_dataloader()
- Returns:
df of post-activation_layer scores - if return_invalid_samples is True, returns (df,invalid_samples) where invalid_samples is a set of file paths that failed to preprocess
- Effects:
(1) wandb logging If wandb_session is provided, logs progress and samples to Weights and Biases. A random set of samples is preprocessed and logged to a table. Progress over all batches is logged. After prediction, top scoring samples are logged. Use self.wandb_logging dictionary to change the number of samples logged or which classes have top-scoring samples logged.
(2) unsafe sample logging If unsafe_samples_log is not None, saves a list of all file paths that failed to preprocess in unsafe_samples_log as a text file
- Note: if loading an audio file raises a PreprocessingError, the scores
for that sample will be np.nan
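A typical call, with hypothetical file paths; the sigmoid activation maps each class score into [0,1]:
```python
scores = model.predict(
    ["morning_recording.wav"],
    batch_size=32,
    num_workers=4,
    activation_layer="sigmoid",
)
# scores: DataFrame with (file, start_time, end_time) index and one column per class
```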
- run_validation(validation_df, progress_bar=True, **kwargs)[source]
run validation on a validation set
override this to customize the validation step; e.g., run validation on multiple datasets and save the performance of each in self.valid_metrics[current_epoch][validation_dataset_name]
- Parameters:
validation_df – dataframe of validation samples
progress_bar – if True, show a progress bar with tqdm
**kwargs – passed to self.predict_dataloader()
- Returns:
dictionary of evaluation metrics calculated with self.torch_metrics
- Return type:
metrics
- Effects:
updates self.valid_metrics[current_epoch] with metrics for the current epoch
- save(path, save_hooks=False, pickle=False)[source]
save model with weights using torch.save()
load from saved file with cnn.load_model(path)
- Parameters:
path – file path for saved model object
save_hooks – retain forward and backward hooks on modules [default: False] Note: True can cause issues when using wandb.watch()
pickle – if True, saves the entire model object using torch.save() Note: if using pickle=True, entire object is pickled, which means that saving and loading model objects across OpenSoundscape versions might not work properly. pickle=True is useful for resuming training, because it retains the state of the optimizer, scheduler, loss function, etc pickle=False is recommended for saving models for inference/deployment/sharing [default: False]
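A sketch of the save/load round trip (paths hypothetical): the default pickle=False format is preferred for sharing, while pickle=True retains the training state:
```python
model.save("./ckpts/inference.model")  # weights + metadata; portable across versions
reloaded = SpectrogramClassifier.load("./ckpts/inference.model")

model.save("./ckpts/resume.model", pickle=True)  # full object; for resuming training
```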
- save_weights(path)[source]
save just the weights of the network
This allows the saved weights to be used more flexibly than model.save() which will pickle the entire object. The weights are saved in a pickled dictionary using torch.save(self.network.state_dict())
- Parameters:
path – location to save weights file
- train(train_df, validation_df=None, epochs=1, batch_size=1, num_workers=0, save_path='.', save_interval=1, log_interval=10, validation_interval=1, reset_optimizer=False, restart_scheduler=False, invalid_samples_log='./invalid_training_samples.log', raise_errors=False, wandb_session=None, progress_bar=True, audio_root=None, **dataloader_kwargs)[source]
train the model on samples from train_dataset
If customized loss functions, networks, optimizers, or schedulers are desired, modify the respective attributes before calling .train().
- Parameters:
train_df – a dataframe of files and labels for training the model - either has index file or multi-index (file,start_time,end_time)
validation_df – a dataframe of files and labels for evaluating the model [default: None means no validation is performed]
epochs – number of epochs to train for (1 epoch constitutes 1 view of each training sample)
batch_size – number of training files simultaneously passed through forward pass, loss function, and backpropagation
num_workers – number of parallel CPU tasks for preprocessing Note: use 0 for single (root) process (not 1)
save_path – location to save intermediate and best model objects [default=”.”, ie current location of script]
save_interval – interval in epochs to save model object with weights [default:1] Note: the best model is always saved to best.model in addition to other saved epochs.
log_interval – interval in batches to print training loss/metrics
validation_interval – interval in epochs to test the model on the validation set. Note that the model will only update its best score and save the best.model file on epochs that it performs validation.
reset_optimizer – if True, resets the optimizer rather than retaining state_dict of self.optimizer [default: False]
restart_scheduler – if True, resets the learning rate scheduler rather than retaining state_dict of self.scheduler [default: False]
invalid_samples_log – file path: log all samples that failed in preprocessing (file written when training completes) - if None, does not write a file
raise_errors – if True, raise errors when preprocessing fails if False, just log the errors to unsafe_samples_log
wandb_session – a wandb session to log to - pass the value returned by wandb.init() to log progress to a Weights and Biases run - if None, does not log to wandb. For example:
```python
import wandb
wandb.login(key=api_key)  # find your api_key at https://wandb.ai/settings
session = wandb.init(entity='mygroup', project='project1', name='first_run')
...
model.train(..., wandb_session=session)
session.finish()
```
audio_root – optionally pass a root directory (pathlib.Path or str) - audio_root is prepended to each file path - if None (default), samples must contain full paths to files
progress_bar – bool, if True, shows a progress bar with tqdm [default: True]
**dataloader_kwargs – additional arguments passed to train_dataloader()
- Effects:
If wandb_session is provided, logs progress and samples to Weights and Biases. A random set of training and validation samples are preprocessed and logged to a table. Training progress, loss, and metrics are also logged. Use self.wandb_logging dictionary to change the number of samples logged.
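A minimal training sketch, assuming `train_df` and `val_df` are multi-hot label dataframes with (file, start_time, end_time) indices:
```python
model.train(
    train_df,
    validation_df=val_df,
    epochs=10,
    batch_size=64,
    num_workers=4,
    save_path="./model_ckpts",  # best.model and interval checkpoints saved here
)
```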
- verbose
amount of logging to stdout. 0 for nothing, 1,2,3 for increasing printed output
- class opensoundscape.ml.cnn.SpectrogramModule(architecture, classes, sample_duration, single_target=False, preprocessor_dict=None, preprocessor_cls=<class 'opensoundscape.preprocess.preprocessors.SpectrogramPreprocessor'>, **preprocessor_kwargs)[source]
Bases:
BaseModule
Parent class for both SpectrogramClassifier (pytorch) and LightningSpectrogramModule (lightning)
implements functionality that is shared between both pure PyTorch and Lightning classes/workflows
- Parameters:
architecture – a pytorch Module such as Resnet18 or a custom object
classes – list of class names
sample_duration – duration of audio samples in seconds
single_target – if True, predict only class with max score
channels – number of channels in input data
sample_height – height of input data
sample_width – width of input data
preprocessor_dict – dictionary defining preprocessor and parameters, can be generated with preprocessor.to_dict() if not None, will override other preprocessor arguments (sample_duration, sample_height, sample_width, channels)
preprocessor_cls – a class object that inherits from BasePreprocessor if preprocessor_dict is None, this class will be instantiated to set self.preprocessor
**preprocessor_kwargs – additional arguments to pass to the initialization of the preprocessor class this is ignored if preprocessor_dict is not None
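For example, a minimal instantiation via the CNN alias (class names and duration hypothetical); per the `network` attribute below, the architecture may also be given as a string key from cnn_architectures.ARCH_DICT:
```python
from opensoundscape.ml.cnn import CNN

model = CNN(architecture="resnet18", classes=["species_a", "species_b"], sample_duration=3.0)
```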
- change_classes(new_classes)[source]
change the classes that the model predicts
replaces the network’s final linear classifier layer with a new layer with random weights and the correct number of output features
will raise an error if self.network.classifier_layer is not the name of a torch.nn.Linear layer, since we don’t know how to replace it otherwise
- Parameters:
new_classes – list of class names
- property classifier
return the classifier layer of the network, based on .network.classifier_layer string
- freeze_feature_extractor()[source]
freeze all layers except self.classifier
prepares the model for transfer learning where only the classifier is trained
uses the attribute self.network.classifier_layer (via the .classifier attribute) to identify the classifier layer
if this attribute is not set, raises an Exception; use freeze_layers_except() instead
- freeze_layers_except(train_layers=None)[source]
Freeze all parameters of the model except those in train_layers
Freezing parameters means that the optimizer will not update the weights
Modifies the model in place!
- Parameters:
train_layers – layer or list/iterable of the layers whose parameters should not be frozen. For example: pass model.classifier to train only the classifier
Example 1:
```python
model.freeze_layers_except(model.classifier)
```
Example 2: freeze all but 2 layers
```python
model.freeze_layers_except([model.layer1, model.layer2])
```
- lr_scheduler_step
track number of calls to lr_scheduler.step()
set to -1 to restart learning rate schedule from initial lr
this value is used to initialize the lr_scheduler’s last_epoch parameter it is tracked separately from self.current_epoch because the lr_scheduler might be stepped more or less than 1 time per epoch
Note that the initial learning rate is set via self.optimizer_params[‘kwargs’][‘lr’]
- network
a pytorch Module such as Resnet18 or a custom object
for convenience, __init__ also allows user to provide string matching a key from opensoundscape.ml.cnn_architectures.ARCH_DICT.
List options: opensoundscape.ml.cnn_architectures.list_architectures()
- property single_target
- opensoundscape.ml.cnn.list_model_classes()[source]
return list of available model classes (used to recreate the class when loading a saved model file)
- opensoundscape.ml.cnn.load_model(path, device=None, unpickle=True)[source]
load a saved model object
This function handles models saved either as pickled objects or as a dictionary including weights, preprocessing parameters, architecture name, etc.
Note that pickled objects may not load properly across different versions of OpenSoundscape, while the dictionary format does not retain the full training state for resuming model training.
- Parameters:
path – file path of saved model
device – which device to load into, eg ‘cuda:1’ [default: None] will choose first gpu if available, otherwise cpu
unpickle – if True, passes weights_only=False to torch.load(). This is necessary if the model was saved with pickle=True, which saves the entire model object. If unpickle=False, this function will work if the model was saved with pickle=False, but will raise an error if the model was saved with pickle=True. [default: True]
- Returns:
a model object with loaded weights
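A brief usage sketch (path hypothetical):
```python
from opensoundscape.ml.cnn import load_model

model = load_model("./model_ckpts/best.model", device="cuda:0")
```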
- opensoundscape.ml.cnn.register_model_cls(model_cls)[source]
add class to MODEL_CLS_DICT
this allows us to recreate the class when loading saved model file with load_model()
- opensoundscape.ml.cnn.use_resample_loss(model, train_df)[source]
Modify a model to use ResampleLoss for multi-target training
ResampleLoss may perform better than BCE Loss for multitarget problems in some scenarios.
- Parameters:
model – CNN object
train_df – dataframe of labels, used to calculate class frequency
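A brief usage sketch, assuming `model` and a multi-hot `train_df` as in .train():
```python
from opensoundscape.ml.cnn import use_resample_loss

use_resample_loss(model, train_df)  # modifies the model to use ResampleLoss
# then train as usual: model.train(train_df, ...)
```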
opensoundscape.ml.cnn_architectures module
Module to initialize PyTorch CNN architectures with custom output shape
This module allows the use of several built-in CNN architectures from PyTorch. The architecture refers to the specific layers and layer input/output shapes (including convolution sizes and strides, etc) - such as the ResNet18 or Inception V3 architecture.
We provide wrappers which modify the output layer to the desired shape (to match the number of classes). The way to change the output layer shape depends on the architecture, which is why we need a wrapper for each one. This code is based on pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html
To use these wrappers, for example, if your model has 10 output classes, write
```python
my_arch = resnet18(10)
```
Then you can initialize a model object from opensoundscape.ml.cnn with your architecture:
```python
model = CNN(my_arch, classes, sample_duration)
```
or override an existing model’s architecture:
```python
model.network = my_arch
```
Note: the InceptionV3 architecture must be used differently than other architectures - the easiest way is to simply use the InceptionV3 class in opensoundscape.ml.cnn.
- opensoundscape.ml.cnn_architectures.alexnet(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]
Wrapper for AlexNet architecture
input size = 224
- Parameters:
num_classes – number of output nodes for the final layer
freeze_feature_extractor – if False (default), entire network will have gradients and can train if True, feature block is frozen and only final layer is trained
weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html
num_channels – specify channels in input sample, e.g. [channels, h, w] sample shape
- opensoundscape.ml.cnn_architectures.change_conv2d_channels(conv2d, num_channels=3, reuse_weights=True)[source]
Modify the number of input channels for a pytorch CNN
This function changes the input shape of a torch.nn.Conv2D layer to accommodate a different number of channels. It attempts to retain weights in the following manner: - If num_channels is less than the original, it will average weights across the original channels and apply them to all new channels. - if num_channels is greater than the original, it will cycle through the original channels, copying them to the new channels
- Parameters:
conv2d – the torch.nn.Conv2d layer to modify
num_channels – specify channels in input sample, e.g. [channels, h, w] sample shape
reuse_weights – if True (default), averages across the original channel weights (if num_channels < original) or cycles through them (if num_channels > original) and applies them to the new Conv2d
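For illustration, a sketch under the assumption that the function returns the modified layer:
```python
import torch
from opensoundscape.ml.cnn_architectures import change_conv2d_channels

conv = torch.nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
# average the 3 original channel weights into a single input channel
one_channel_conv = change_conv2d_channels(conv, num_channels=1, reuse_weights=True)
```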
- opensoundscape.ml.cnn_architectures.change_fc_output_size(fc, num_classes)[source]
Modify the number of output nodes of a fully connected layer
- Parameters:
fc – the fully connected layer of the model that should be modified
num_classes – number of output nodes for the new fc
- opensoundscape.ml.cnn_architectures.densenet121(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]
Wrapper for densenet121 architecture
- Parameters:
num_classes – number of output nodes for the final layer
freeze_feature_extractor – if False (default), entire network will have gradients and can train if True, feature block is frozen and only final layer is trained
weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html
num_channels – specify channels in input sample, e.g. [channels, h, w] sample shape
- opensoundscape.ml.cnn_architectures.efficientnet_b0(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]
Wrapper for efficientnet_b0 architecture
- Parameters:
num_classes – number of output nodes for the final layer
freeze_feature_extractor – if False (default), entire network will have gradients and can train if True, feature block is frozen and only final layer is trained
weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html
num_channels – specify channels in input sample, e.g. [channels, h, w] sample shape
- Note: in v0.10.2, changed from the NVIDIA/DeepLearningExamples:torchhub repo implementation to the native pytorch implementation
- opensoundscape.ml.cnn_architectures.efficientnet_b4(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]
Wrapper for efficientnet_b4 architecture
- Parameters:
num_classes – number of output nodes for the final layer
freeze_feature_extractor – if False (default), entire network will have gradients and can train if True, feature block is frozen and only final layer is trained
weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html
num_channels – specify channels in input sample, e.g. [channels, h, w] sample shape
- Note: in v0.10.2, changed from the NVIDIA/DeepLearningExamples:torchhub repo implementation to the native pytorch implementation
- opensoundscape.ml.cnn_architectures.freeze_params(model)[source]
disable gradient updates for all model parameters
- opensoundscape.ml.cnn_architectures.generic_make_arch(constructor, weights, num_classes, embed_layer, cam_layer, name, input_conv2d_layer, linear_clf_layer, freeze_feature_extractor=False, num_channels=3)[source]
Generic architecture constructor; works when the first layer is a Conv2d and the last layer is a fully-connected Linear
input_size = 224
- Parameters:
constructor – function that creates a torch.nn.Module and takes weights argument
weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html Passed to constructor()
num_classes – number of output nodes for the final layer
embed_layer – specify which layer’s outputs should be accessed for “embeddings”
cam_layer – specify a default layer for GradCAM/etc visualizations
name – name of the architecture, used for the constructor_name attribute to re-load from saved version
input_conv2d_layer – name of first Conv2D layer that can be accessed with .get_submodule() string formatted as .-delimited list of attribute names or list indices, e.g. “features.0”
linear_clf_layer – name of final Linear classification fc layer that can be accessed with .get_submodule() string formatted as .-delimited list of attribute names or list indices, e.g. “classifier.0.fc”
freeze_feature_extractor – if False (default), entire network will have gradients and can train if True, feature block is frozen and only final layer is trained
num_channels – specify channels in input sample, e.g. [channels, h, w] sample shape
- opensoundscape.ml.cnn_architectures.inception_v3(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]
Wrapper for Inception v3 architecture
Input: 299x299
WARNING: expects (299,299) sized images and has auxiliary output. See InceptionV3 class in opensoundscape.ml.cnn for use.
- Parameters:
num_classes – number of output nodes for the final layer
freeze_feature_extractor – if False (default), entire network will have gradients and can train if True, feature block is frozen and only final layer is trained
weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html
num_channels – specify channels in input sample, e.g. [channels, h, w] sample shape
- opensoundscape.ml.cnn_architectures.list_architectures()[source]
return list of available architecture keyword strings
- opensoundscape.ml.cnn_architectures.resnet101(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]
Wrapper for ResNet101 architecture
input_size = 224
- Parameters:
num_classes – number of output nodes for the final layer
freeze_feature_extractor – if False (default), entire network will have gradients and can train if True, feature block is frozen and only final layer is trained
weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html
num_channels – specify channels in input sample, e.g. [channels, h, w] sample shape
- opensoundscape.ml.cnn_architectures.resnet152(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]
Wrapper for ResNet152 architecture
input_size = 224
- Parameters:
num_classes – number of output nodes for the final layer
freeze_feature_extractor – if False (default), entire network will have gradients and can train if True, feature block is frozen and only final layer is trained
weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html
num_channels – specify channels in input sample, e.g. [channels, h, w] sample shape
- opensoundscape.ml.cnn_architectures.resnet18(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]
Wrapper for ResNet18 architecture
input_size = 224
- Parameters:
num_classes – number of output nodes for the final layer
freeze_feature_extractor – if False (default), entire network will have gradients and can train if True, feature block is frozen and only final layer is trained
weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html
num_channels – specify channels in input sample, e.g. [channels, h, w] sample shape
- opensoundscape.ml.cnn_architectures.resnet34(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]
Wrapper for ResNet34 architecture
input_size = 224
- Parameters:
num_classes – number of output nodes for the final layer
freeze_feature_extractor – if False (default), entire network will have gradients and can train if True, feature block is frozen and only final layer is trained
weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html
num_channels – specify channels in input sample, e.g. [channels, h, w] sample shape
- opensoundscape.ml.cnn_architectures.resnet50(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]
Wrapper for ResNet50 architecture
input_size = 224
- Parameters:
num_classes – number of output nodes for the final layer
freeze_feature_extractor – if False (default), entire network will have gradients and can train if True, feature block is frozen and only final layer is trained
weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html
num_channels – specify channels in input sample, e.g. [channels, h, w] sample shape
- opensoundscape.ml.cnn_architectures.set_layer_from_name(module, layer_name, new_layer)[source]
assign an attribute of an object using a string name
- Parameters:
module – object to assign attribute to
layer_name – string name of the attribute to assign. The name is formatted with a . delimiter and can contain either attribute names or list indices, e.g. “network.classifier.0.0.fc” sets network.classifier[0][0].fc. This type of string is given by torch.nn.Module.named_modules()
new_layer – replace layer with this torch.nn.Module instance
see also – torch.nn.Module.named_modules(), torch.nn.Module.get_submodule()
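A short sketch: replacing the final fully-connected layer of a wrapped ResNet18 (whose fc layer takes 512 input features):
```python
import torch
from opensoundscape.ml.cnn_architectures import resnet18, set_layer_from_name

arch = resnet18(num_classes=5)
set_layer_from_name(arch, "fc", torch.nn.Linear(512, 10))  # arch.fc is replaced
```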
- opensoundscape.ml.cnn_architectures.squeezenet1_0(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]
Wrapper for squeezenet architecture
input size = 224
- Parameters:
num_classes – number of output nodes for the final layer
freeze_feature_extractor – if False (default), entire network will have gradients and can train if True, feature block is frozen and only final layer is trained
weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html
num_channels – specify channels in input sample, e.g. [channels, h, w] sample shape
- opensoundscape.ml.cnn_architectures.unfreeze_params(model)[source]
enable gradient updates for all model parameters
- opensoundscape.ml.cnn_architectures.vgg11_bn(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]
Wrapper for vgg11 architecture
input size = 224
- Parameters:
num_classes – number of output nodes for the final layer
freeze_feature_extractor – if False (default), entire network will have gradients and can train if True, feature block is frozen and only final layer is trained
weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html
num_channels – specify channels in input sample, e.g. [channels, h, w] sample shape
opensoundscape.ml.dataloaders module
- class opensoundscape.ml.dataloaders.SafeAudioDataloader(*args: Any, **kwargs: Any)[source]
Bases:
DataLoader
Create DataLoader for inference, wrapping a SafeDataset
SafeDataset contains AudioFileDataset or AudioSampleDataset depending on sample type
During inference, we allow the user to pass any of 3 things to samples: - list of file paths - Dataframe with file as index - Dataframe with file, start_time, end_time of clips as index
If file as index, the default split_files_into_clips=True means the loader will automatically determine the number of clips that can be created from each file (with overlap between subsequent clips based on overlap_fraction)
- Parameters:
samples – any of the following: - list of file paths - Dataframe with file as index - Dataframe with file, start_time, end_time of clips as index - CategoricalLabels object
preprocessor – preprocessor object, eg AudioPreprocessor or SpectrogramPreprocessor
split_files_into_clips=True – use AudioSplittingDataset to automatically split audio files into appropriate-length clips
clip_overlap_fraction – see opensoundscape.utils.generate_clip_times_df
clip_overlap – see opensoundscape.utils.generate_clip_times_df
clip_step – see opensoundscape.utils.generate_clip_times_df
final_clip – see opensoundscape.utils.generate_clip_times_df
overlap_fraction – deprecated alias for clip_overlap_fraction
bypass_augmentations – if True, don’t apply any augmentations [default: True]
raise_errors – if True, raise errors during preprocessing [default: False]
collate_fn – function to collate list of AudioSample objects into batches [default: identity, which returns the list of AudioSample objects]; use collate_fn=opensoundscape.sample.collate_audio_samples to return a tuple of (data, labels) tensors
audio_root – optionally pass a root directory (pathlib.Path or str) - audio_root is prepended to each file path - if None (default), samples must contain full paths to files
**kwargs – any arguments to torch.utils.data.DataLoader
- Returns:
DataLoader that returns lists of AudioSample objects when iterated (if collate_fn is identity)
- preprocessor
do not override or modify this attribute, as it will have no effect
- samples
do not override or modify this attribute, as it will have no effect
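A brief usage sketch, assuming `model` provides the preprocessor and the file paths are hypothetical:
```python
from opensoundscape.ml.dataloaders import SafeAudioDataloader

loader = SafeAudioDataloader(
    samples=["a.wav", "b.wav"],
    preprocessor=model.preprocessor,
    batch_size=32,
)
for batch in loader:  # with the default collate_fn, each batch is a list of AudioSample
    pass
```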
opensoundscape.ml.datasets module
Preprocessors: pd.Series child with an action sequence & forward method
- class opensoundscape.ml.datasets.AudioFileDataset(*args: Any, **kwargs: Any)[source]
Bases:
Dataset
Base class for audio datasets with OpenSoundscape (use in place of torch Dataset)
Custom Dataset classes should subclass this class or its children.
Datasets in OpenSoundscape contain a Preprocessor object which is responsible for the procedure of generating a sample for a given input. The DataLoader handles a dataframe of samples (and potentially labels) and uses a Preprocessor to generate samples from them.
- Parameters:
samples –
the files to generate predictions for. Can be: - a dataframe with index containing audio paths, OR - a dataframe with multi-index of (path,start_time,end_time) per clip, OR - a list or np.ndarray of audio file paths
- Notes for input dataframe:
- df must have audio paths in the index.
- If label_df has labels, the class names should be the columns, and the values of each row should be 0 or 1.
- If data does not have labels, label_df will have no columns
preprocessor – an object of BasePreprocessor or its children which defines the operations to perform on input samples
audio_root – optionally pass a root directory (pathlib.Path or str) to prepend to each file path - if None (default), samples must contain full paths to files
- Returns:
sample (AudioSample object)
- Raises:
PreprocessingError – if an exception is raised during __getitem__
- Effects:
self.invalid_samples will contain a set of paths that did not successfully produce a list of clips with start/end times, if split_files_into_clips=True
- audio_root
path to prepend to all audio file paths when loading
- bypass_augmentations
if True, skips Actions with .is_augmentation=True
- classes
list of classes to which multi-hot labels correspond
- classmethod from_categorical_df(categorical_labels, preprocessor, class_list, bypass_augmentations=False)[source]
Create AudioFileDataset from a DataFrame with a column listing categorical labels
e.g. where df[‘labels’] = [[‘a’,’b’], [], [‘a’,’c’]]
- Parameters:
categorical_labels – DataFrame with index (file) or (file, start_time, end_time) and ‘label’ column containing lists of labels or integers corresponding to class names
preprocessor – Preprocessor object
bypass_augmentations – if True, skip augmentations with .is_augmentation=True
- Returns:
AudioFileDataset object
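For illustration, a sketch assuming the label column is named as in the example above and `model` provides a preprocessor:
```python
import pandas as pd
from opensoundscape.ml.datasets import AudioFileDataset

df = pd.DataFrame(
    {"labels": [["a", "b"], [], ["a", "c"]]},
    index=["f1.wav", "f2.wav", "f3.wav"],  # hypothetical audio paths
)
dataset = AudioFileDataset.from_categorical_df(
    df, preprocessor=model.preprocessor, class_list=["a", "b", "c"]
)
```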
- head(n=5)[source]
out-of-place copy of first n samples
performs df.head(n) on self.label_df
- Parameters:
n – number of first samples to return, see pandas.DataFrame.head() [default: 5]
- Returns:
a new dataset object
- invalid_samples
set of file paths that raised exceptions during preprocessing
- label_df
dataframe containing file paths, clip times, and multi-hot labels (one column per class)
- preprocessor
Preprocessor object containing a .pipeline of ordered preprocessing operations
- class opensoundscape.ml.datasets.AudioSplittingDataset(*args: Any, **kwargs: Any)[source]
Bases:
AudioFileDataset
class to load clips of longer files rather than one sample per file
Internally creates even-length clips split from long audio files.
If file labels are provided, applies copied labels to all clips from a file
NOTE: If you’ve already created a dataframe with clip start and end times, you can use AudioFileDataset. This class is only necessary if you wish to automatically split longer files into clips (providing only the file paths).
- Parameters:
samples, preprocessor – passed to AudioFileDataset.__init__
**kwargs – passed to opensoundscape.utils.make_clip_df
opensoundscape.ml.lightning module
- class opensoundscape.ml.lightning.LightningSpectrogramModule(*args: Any, **kwargs: Any)[source]
Bases:
SpectrogramModule, LightningModule
- fit_with_trainer(train_df, validation_df=None, epochs=1, batch_size=1, num_workers=0, save_path='.', invalid_samples_log='./invalid_training_samples.log', raise_errors=False, wandb_session=None, checkpoint_path=None, **kwargs)[source]
train the model on samples from train_dataset
If customized loss functions, networks, optimizers, or schedulers are desired, modify the respective attributes before calling .train().
- Parameters:
train_df – a dataframe of files and labels for training the model - either has index file or multi-index (file,start_time,end_time)
validation_df – a dataframe of files and labels for evaluating the model [default: None means no validation is performed]
batch_size – number of training files simultaneously passed through forward pass, loss function, and backpropagation
num_workers – number of parallel CPU tasks for preprocessing Note: use 0 for single (root) process (not 1)
save_path – location to save intermediate and best model objects [default=”.”, ie current location of script]
save_interval – interval in epochs to save model object with weights [default:1] Note: the best model is always saved to best.model in addition to other saved epochs.
log_interval – interval in batches to print training loss/metrics
validation_interval – interval in epochs to test the model on the validation set. Note that the model will only update its best score and save the best.model file on epochs that it performs validation.
invalid_samples_log – file path: log all samples that failed in preprocessing (file written when training completes) - if None, does not write a file
raise_errors – if True, raise errors when preprocessing fails if False, just log the errors to unsafe_samples_log
wandb_session – a wandb session to log to (Note: can also pass a logger kwarg with any Lightning logger object) - pass the value returned by wandb.init() to log progress to a Weights and Biases run - if None, does not log to wandb. For example:
```python
import wandb
wandb.login(key=api_key)  # find your api_key at https://wandb.ai/settings
session = wandb.init(entity='mygroup', project='project1', name='first_run')
...
model.fit_with_trainer(..., wandb_session=session)
session.finish()
```
**kwargs – any arguments to pytorch_lightning.Trainer(), such as accelerator, precision, logger, accumulate_grad_batches, etc. Note: the max_epochs kwarg is overridden by the epochs argument
- Returns:
a trained pytorch_lightning.Trainer object
- Effects:
If wandb_session is provided, logs progress and samples to Weights and Biases. A random set of training and validation samples are preprocessed and logged to a table. Training progress, loss, and metrics are also logged. Use self.wandb_logging dictionary to change the number of samples logged.
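A minimal sketch; extra kwargs such as accelerator are forwarded to lightning.Trainer per the **kwargs description above:
```python
trainer = model.fit_with_trainer(
    train_df,
    validation_df=val_df,
    epochs=5,
    batch_size=32,
    accelerator="auto",  # passed through to lightning.Trainer
)
```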
- forward(samples)[source]
standard Lightning method defining action to take on each batch for inference
typically returns logits (raw, untransformed model outputs)
- load_weights(path, strict=True)[source]
load network weights state dict from a file
For instance, load weights saved with .save_weights(). This is an in-place operation.
- Parameters:
path – file path with saved weights
strict – (bool) see torch.Module.load_state_dict()
- predict_with_trainer(samples, batch_size=1, num_workers=0, activation_layer=None, split_files_into_clips=True, clip_overlap=None, clip_overlap_fraction=None, clip_step=None, overlap_fraction=None, final_clip=None, bypass_augmentations=True, invalid_samples_log=None, raise_errors=False, return_invalid_samples=False, lightning_trainer_kwargs=None, dataloader_kwargs=None)[source]
Generate predictions on a set of samples
Return dataframe of model output scores for each sample. Optional activation layer for scores (softmax, sigmoid, softmax then logit, or None)
- Parameters:
samples – the files to generate predictions for. Can be: - a dataframe with index containing audio paths, OR - a dataframe with multi-index (file, start_time, end_time), OR - a list (or np.ndarray) of audio file paths - a single file path (str or pathlib.Path)
batch_size – Number of files to load simultaneously [default: 1]
num_workers – parallelization (ie cpus or cores), use 0 for current process [default: 0]
activation_layer – Optionally apply an activation layer such as sigmoid or softmax to the raw outputs of the model. options: - None: no activation, return raw scores (ie logit, [-inf:inf]) - ‘softmax’: scores all classes sum to 1 - ‘sigmoid’: all scores in [0,1] but don’t sum to 1 - ‘softmax_and_logit’: applies softmax first then logit [default: None]
split_files_into_clips – If True, internally splits and predicts on clips from longer audio files Otherwise, assumes each row of samples corresponds to one complete sample
clip_overlap_fraction – see opensoundscape.utils.generate_clip_times_df
clip_overlap – see opensoundscape.utils.generate_clip_times_df
clip_step – see opensoundscape.utils.generate_clip_times_df
final_clip – see opensoundscape.utils.generate_clip_times_df
overlap_fraction – deprecated alias for clip_overlap_fraction
bypass_augmentations – If False, Actions with is_augmentation==True are performed. Default True.
invalid_samples_log – if not None, samples that failed to preprocess will be listed in this text file.
raise_errors – if True, raise errors when preprocessing fails; if False, just log the errors to invalid_samples_log
wandb_session – a wandb session to log to - pass the value returned by wandb.init() to log progress to a Weights and Biases run - if None, does not log to wandb
return_invalid_samples – bool, if True, returns second argument, a set containing file paths of samples that caused errors during preprocessing [default: False]
lightning_trainer_kwargs – dictionary of keyword args to pass to __call__, which are then passed to lightning.Trainer.__init__; see lightning.Trainer documentation for options. [Default: None passes no kwargs]
dataloader_kwargs – dictionary of keyword args to self.predict_dataloader()
- Returns:
df of post-activation_layer scores - if return_invalid_samples is True, returns (df,invalid_samples) where invalid_samples is a set of file paths that failed to preprocess
- Effects:
(1) wandb logging If wandb_session is provided, logs progress and samples to Weights and Biases. A random set of samples is preprocessed and logged to a table. Progress over all batches is logged. After prediction, top scoring samples are logged. Use self.wandb_logging dictionary to change the number of samples logged or which classes have top-scoring samples logged.
(2) invalid sample logging If invalid_samples_log is not None, saves a list of all file paths that failed to preprocess to invalid_samples_log as a text file
- Note: if loading an audio file raises a PreprocessingError, the scores
for that sample will be np.nan
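For illustration, a typical call might look like this sketch (audio_files is a placeholder list of paths; the returned dataframe is typically indexed by (file, start_time, end_time) when files are split into clips):
# sketch: per-clip sigmoid scores for a list of audio files
scores = model.predict_with_trainer(
    audio_files,                 # placeholder: list of audio file paths
    batch_size=32,
    num_workers=4,
    activation_layer='sigmoid',  # all scores in [0, 1]
)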
- save(path, save_hooks=False, weights_only=False)[source]
save model with weights using Trainer.save_checkpoint()
load from saved file with LightningSpectrogramModule.load_from_checkpoint()
Note: saving and loading model objects across OpenSoundscape versions will not work properly. Instead, use .save_weights() and .load_weights() (but note that architecture, customizations to preprocessing, training params, etc will not be retained using those functions).
For maximum flexibility in further use, save the model with both .save() and .save_torch_dict() or .save_weights().
- Parameters:
path – file path for saved model object
save_hooks – retain forward and backward hooks on modules [default: False] Note: True can cause issues when using wandb.watch()
- save_weights(path)[source]
save just the weights of the network
This allows the saved weights to be used more flexibly than model.save() which will pickle the entire object. The weights are saved in a pickled dictionary using torch.save(self.network.state_dict())
- Parameters:
path – location to save weights file
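Because pickled model objects may not load across OpenSoundscape versions, a common pattern is to round-trip only the state dict (paths here are placeholders):
# save only the network weights (a pickled state dict)
model.save_weights('./model_weights.pt')
# later: recreate a model with the same architecture and classes,
# then restore the weights in place
model.load_weights('./model_weights.pt', strict=True)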
opensoundscape.ml.loss module
loss function classes to use with opensoundscape models
- class opensoundscape.ml.loss.BCEWithLogitsLoss_hot(*args: Any, **kwargs: Any)[source]
Bases:
BCEWithLogitsLoss
use pytorch’s nn.BCEWithLogitsLoss for one-hot labels by simply converting y from long to float
- Parameters:
**kwargs – passed to nn.BCEWithLogitsLoss
- class opensoundscape.ml.loss.CrossEntropyLoss_hot(*args: Any, **kwargs: Any)[source]
Bases:
CrossEntropyLoss
use pytorch’s nn.CrossEntropyLoss for one-hot labels by converting labels from 1-hot to integer labels
throws a ValueError if labels are not one-hot
- Parameters:
**kwargs – passed to nn.CrossEntropyLoss
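For example, BCEWithLogitsLoss_hot can be used directly as a criterion with integer (long) one-hot labels, which plain nn.BCEWithLogitsLoss would reject:
import torch
from opensoundscape.ml.loss import BCEWithLogitsLoss_hot

criterion = BCEWithLogitsLoss_hot()
logits = torch.randn(4, 3)  # raw model outputs for 4 samples, 3 classes
labels = torch.tensor([[1, 0, 1], [0, 1, 0], [1, 0, 0], [0, 0, 1]])  # long dtype
loss = criterion(logits, labels)  # labels converted to float internally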
- opensoundscape.ml.loss.binary_cross_entropy(pred, label, weight=None, reduction='mean', avg_factor=None)[source]
helper function for BCE loss in ResampleLoss class
- opensoundscape.ml.loss.reduce_loss(loss, reduction)[source]
Reduce loss as specified.
- Parameters:
loss (Tensor) – Elementwise loss tensor.
reduction (str) – Options are “none”, “mean” and “sum”.
- Returns:
Reduced loss tensor.
- Return type:
Tensor
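A quick illustration of the three reduction options:
import torch
from opensoundscape.ml.loss import reduce_loss

loss = torch.tensor([0.5, 1.5, 1.0])
reduce_loss(loss, 'none')  # tensor([0.5000, 1.5000, 1.0000]) (unchanged)
reduce_loss(loss, 'mean')  # tensor(1.)
reduce_loss(loss, 'sum')   # tensor(3.)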
- opensoundscape.ml.loss.weight_reduce_loss(loss, weight=None, reduction='mean', avg_factor=None)[source]
Apply element-wise weight and reduce loss.
- Parameters:
loss (Tensor) – Element-wise loss.
weight (Tensor) – Element-wise weights.
reduction (str) – Same as built-in losses of PyTorch.
avg_factor (float) – Average factor used when computing the mean of losses.
- Returns:
Processed loss values.
- Return type:
Tensor
opensoundscape.ml.safe_dataset module
Dataset wrapper to handle errors gracefully in Preprocessor classes
A SafeDataset handles errors in a potentially misleading way: If an error is raised while trying to load a sample, the SafeDataset will instead load a different sample. The indices of any samples that failed to load will be stored in ._invalid_indices.
This behavior may be desirable for training a model, but could cause silent errors when generating predictions (replacing a bad file with a different file), so you should always check ._invalid_indices after using a SafeDataset.
based on an implementation by @msamogh in nonechucks (github.com/msamogh/nonechucks/)
- class opensoundscape.ml.safe_dataset.SafeDataset(dataset, invalid_sample_behavior)[source]
Bases:
object
A wrapper for a Dataset that handles errors when loading samples
WARNING: When iterating, will skip the failed sample, but when using within a DataLoader, finds the next good sample and uses it for the current index (see __getitem__).
Note that this class does not subclass Dataset. Instead, it contains a .dataset attribute that is a Dataset (or an AudioFileDataset / AudioSplittingDataset, which subclass Dataset).
- Parameters:
dataset – a torch Dataset instance or child such as AudioFileDataset, AudioSplittingDataset
eager_eval – If True, checks if every file is able to be loaded during initialization (logs _valid_indices and _invalid_indices)
Attributes: _valid_indices and _invalid_indices can be accessed later to check which samples raised Exceptions; _invalid_samples is a set of all index values for samples that raised Exceptions.
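A minimal sketch of wrapping a dataset and inspecting failures afterward (the 'substitute' behavior string is an assumption based on the documented replace-on-failure behavior):
# sketch: wrap a preprocessing dataset so bad files don't crash training
safe_dataset = SafeDataset(audio_dataset, invalid_sample_behavior='substitute')
# ... iterate, or use within a DataLoader ...
# always check which samples failed to load:
print(safe_dataset._invalid_indices)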
opensoundscape.ml.sampling module
classes for strategically sampling within a DataLoader
- class opensoundscape.ml.sampling.ClassAwareSampler(*args: Any, **kwargs: Any)[source]
Bases:
Sampler
In each batch of samples, pick a limited number of classes to include and give even representation to each class
- class opensoundscape.ml.sampling.ImbalancedDatasetSampler(*args: Any, **kwargs: Any)[source]
Bases:
Sampler
Samples elements randomly from a given list of indices for an imbalanced dataset
- Parameters:
indices (list, optional) – a list of indices
num_samples (int, optional) – number of samples to draw
callback_get_label (func) – a callback-like function which takes two arguments: dataset and index
Based on Imbalanced Dataset Sampling by davinnovation (https://github.com/ufoym/imbalanced-dataset-sampler)
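A hedged sketch of plugging the sampler into a DataLoader (constructing the sampler directly from the dataset follows the upstream imbalanced-dataset-sampler implementation and is an assumption here):
from torch.utils.data import DataLoader

# assumption: sampler is constructed from the dataset, as in the upstream project
sampler = ImbalancedDatasetSampler(train_dataset)
loader = DataLoader(train_dataset, batch_size=64, sampler=sampler)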
opensoundscape.ml.shallow_classifier module
- class opensoundscape.ml.shallow_classifier.MLPClassifier(*args: Any, **kwargs: Any)[source]
Bases:
Module
initialize a fully connected NN with ReLU activations
- fit(*args, **kwargs)[source]
fit the weights on features and labels, without batching
Args: see quick_fit()
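A sketch of constructing and fitting the classifier (the constructor arguments input_size, output_size, and hidden_layer_sizes are assumptions, since the signature is not reproduced here):
# assumed constructor: input/output dimensions and optional hidden layer sizes
clf = MLPClassifier(input_size=512, output_size=3, hidden_layer_sizes=(256,))
# fit on in-memory features and one-hot labels; arguments follow quick_fit()
clf.fit(x_train, y_train, steps=500)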
- opensoundscape.ml.shallow_classifier.augmented_embed(embedding_model, sample_df, n_augmentation_variants, batch_size=1, num_workers=0, device=torch.device('cpu'))[source]
Embed samples using augmentation during preprocessing
- Parameters:
embedding_model – a model with an embed() method that takes a dataframe and returns embeddings (e.g. a pretrained opensoundscape model, or a Bioacoustics Model Zoo model like Perch, BirdNET, or HawkEars)
sample_df – dataframe with samples to embed
n_augmentation_variants – number of augmented variants to generate for each sample
batch_size – batch size for embedding; default 1
num_workers – number of workers for embedding; default 0
device – torch.device to use; default is torch.device(‘cpu’)
- Returns:
the embedded training samples and their labels, as torch.tensors
- Return type:
x_train, y_train
- opensoundscape.ml.shallow_classifier.fit_classifier_on_embeddings(embedding_model, classifier_model, train_df, validation_df, n_augmentation_variants=0, embedding_batch_size=1, embedding_num_workers=0, steps=1000, optimizer=None, criterion=None, device=torch.device('cpu'))[source]
Embed samples with an embedding model, then fit a classifier on the embeddings
wraps embedding_model.embed() with quick_fit(clf,…)
Also supports generating augmented variations of the training samples
Note: if embedding takes a while and you may want to fit multiple times, consider embedding the samples first and then running quick_fit(…), rather than calling this function.
- Parameters:
embedding_model – a model with an embed() method that takes a dataframe and returns embeddings (e.g. a pretrained opensoundscape model, or a Bioacoustics Model Zoo model like Perch, BirdNET, or HawkEars)
classifier_model – a torch.nn.Module object to train, e.g. MLPClassifier or final layer of CNN
train_df – dataframe with training samples and labels; see opensoundscape.ml.cnn.train() train_df argument
validation_df – dataframe with validation samples and labels; see opensoundscape.ml.cnn.train() validation_df argument; if None, skips validation
n_augmentation_variants – if 0 (default), embeds training samples without augmentation; if >0, embeds each training sample with stochastic augmentation n_augmentation_variants times
embedding_batch_size – batch size for embedding; default 1
embedding_num_workers – number of workers for embedding; default 0
steps – model fitting parameters, see quick_fit()
optimizer – model fitting parameters, see quick_fit()
criterion – model fitting parameters, see quick_fit()
device – model fitting parameters, see quick_fit()
- Returns:
the embedded training and validation samples and their labels, as torch.tensor
- Return type:
x_train, y_train, x_val, y_val
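Putting the pieces together, an end-to-end sketch (the embedding model, classifier, and dataframes are placeholders):
# sketch: embed with a pretrained model, fit a small classifier on embeddings
x_train, y_train, x_val, y_val = fit_classifier_on_embeddings(
    embedding_model=embedding_model,  # e.g. a Bioacoustics Model Zoo model
    classifier_model=clf,             # e.g. an MLPClassifier
    train_df=train_df,
    validation_df=val_df,
    steps=1000,
)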
- opensoundscape.ml.shallow_classifier.quick_fit(model, train_features, train_labels, validation_features=None, validation_labels=None, steps=1000, optimizer=None, criterion=None, device=torch.device('cpu'))[source]
train a PyTorch model on features and labels without batching
Assumes all data can fit in memory, so that one step includes all data (i.e. step=epoch)
Defaults are for multi-target label problems and assume train_labels is an array of 0/1 of shape (n_samples, n_classes)
- Parameters:
model – a torch.nn.Module object to train
train_features – input features for training, often embeddings; should be a valid input to model(); generally shape (n_samples, n_features)
train_labels – labels for training, generally one-hot encoded with shape (n_samples, n_classes); should be a valid target for criterion()
validation_features – input features for validation; if None, does not perform validation
validation_labels – labels for validation; if None, does not perform validation
steps – number of training steps (epochs); in each step, all data is passed forward and backward and the optimizer updates the weights [Default: 1000]
optimizer – torch.optim optimizer to use; default None uses Adam
criterion – loss function to use; default None uses BCEWithLogitsLoss (appropriate for multi-label classification)
device – torch.device to use; default is torch.device(‘cpu’)
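Because quick_fit holds all data in memory, a self-contained example can use random tensors; shapes follow the documented (n_samples, n_features) and (n_samples, n_classes) conventions (the MLPClassifier constructor arguments are assumptions):
import torch
from opensoundscape.ml.shallow_classifier import MLPClassifier, quick_fit

# toy data: 100 samples, 512 features, 3 classes (multi-label 0/1 targets)
x = torch.randn(100, 512)
y = (torch.rand(100, 3) > 0.5).float()

clf = MLPClassifier(input_size=512, output_size=3)  # constructor args assumed
quick_fit(clf, x, y, steps=100)  # defaults: Adam optimizer, BCEWithLogitsLoss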
opensoundscape.ml.utils module
Utilities for .ml
- opensoundscape.ml.utils.apply_activation_layer(x, activation_layer=None)[source]
applies an activation layer to a set of scores
- Parameters:
x – input values
activation_layer – one of:
None [default]: return original values
‘softmax’: apply softmax activation
‘sigmoid’: apply sigmoid activation
‘softmax_and_logit’: apply softmax then logit transform
- Returns:
values with activation layer applied. Note: if x is None, returns None
Note: casts x to float before applying softmax, since torch’s softmax implementation doesn’t support int or Long type
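For example:
import torch
from opensoundscape.ml.utils import apply_activation_layer

scores = torch.tensor([[2.0, -1.0, 0.5]])
apply_activation_layer(scores, 'softmax')  # scores for all classes sum to 1
apply_activation_layer(scores, 'sigmoid')  # each value in (0, 1)
apply_activation_layer(scores, None)       # returned unchanged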
- opensoundscape.ml.utils.cas_dataloader(dataset, batch_size, num_workers)[source]
Return a dataloader that uses the class aware sampler
Class aware sampler tries to balance the examples per class in each batch. It selects just a few classes to be present in each batch, then samples those classes for even representation in the batch.
- Parameters:
dataset – a pytorch dataset type object
batch_size – see DataLoader
num_workers – see DataLoader
- opensoundscape.ml.utils.check_labels(label_df, classes)[source]
check that classes and label_df.columns are the same, otherwise raise a helpful error
- opensoundscape.ml.utils.collate_audio_samples_to_tensors(batch)[source]
takes a list of AudioSample objects, returns batched tensors
use this collate function with a DataLoader if you want to use AudioFileDataset (or AudioSplittingDataset) but want the traditional output of PyTorch DataLoaders (two tensors: the first is the stacked data with dim 0 as the batch dimension, the second is the stacked labels with dim 0 as the batch dimension)
- Parameters:
batch – a list of AudioSample objects
- Returns:
(Tensor of stacked AudioSample.data, Tensor of stacked AudioSample.label.values)
Example
from torch.utils.data import DataLoader
from opensoundscape import AudioFileDataset, SpectrogramPreprocessor
from opensoundscape.ml.utils import collate_audio_samples_to_tensors

preprocessor = SpectrogramPreprocessor(sample_duration=2, height=224, width=224)
audio_dataset = AudioFileDataset(label_df, preprocessor)  # label_df: your label dataframe
train_dataloader = DataLoader(
    audio_dataset, batch_size=64, shuffle=True, collate_fn=collate_audio_samples_to_tensors
)
- opensoundscape.ml.utils.get_batch(array, batch_size, batch_number)[source]
get a single slice of a larger array
using the batch size and batch number (indexed from zero)
- Parameters:
array – iterable to split into batches
batch_size – num elements per batch
batch_number – index of batch
- Returns:
one batch (subset of array)
Note: the final elements are returned as the last batch even if there are fewer than batch_size
Example
if array=[1,2,3,4,5,6,7] then:
get_batch(array,3,0) returns [1,2,3]
get_batch(array,3,3) returns [7]